Just starred death-to-ie11 [1] by gabLaroche [2]. It’s an exciting project with a lot to offer.
Countdown for IE11 end of support
References:
[1]: https://github.com/gabLaroche/death-to-ie11
[2]: https://github.com/gabLaroche
Publishing rhythm
📝 Packages to Investigate Notes
- jmespath
- Tabnine
Bulwark # [1]
GitHub: https://github.com/zaxr/bulwark
I definitely want to try this out with kedro.
Bulwark is a package for convenient property-based testing of pandas dataframes, supported for Python 3.5+.
Example # [2]
import bulwark.decorators as dc
@dc.IsShape((-1, 10))
@dc.IsMonotonic(strict=True)
@dc.HasNoNans()
def compute(df):
    # complex operations to determine result
    ...
    return result_df
References:
[1]: #bulwark
[2]: #example
I came across awesome-data-engineering [1] from igorbarinov [2], and it’s packed with great features and ideas.
A curated list of data engineering tools for software developers
References:
[1]: https://github.com/igorbarinov/awesome-data-engineering
[2]: https://github.com/igorbarinov
I’m really excited about vscode-python [1], an amazing project by microsoft [2]. It’s worth exploring!
Python extension for Visual Studio Code
References:
[1]: https://github.com/microsoft/vscode-python
[2]: https://github.com/microsoft
Debugging Python
Using pdb # [1]
References:
[1]: #using-pdb
Just Use Pathlib
Pathlib is an amazing cross-platform path tool.
Import # [1]
from pathlib import Path
Create path object # [2]
Current Directory
cwd = Path('.').absolute()
Users Home Directory
home = Path.home()
module directory
module_path = Path(__file__).parent
Others
Let’s create a path relative to our current module.
data_path = Path(__file__).parent / 'data'
Check if files exist # [3]
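A minimal sketch of how existence checks might look with pathlib. The describe helper and the temporary directory are hypothetical, used only so the example runs anywhere; exists(), is_file(), and is_dir() are the standard Path methods.

```python
from pathlib import Path
import tempfile


def describe(path: Path) -> str:
    """Return 'file', 'directory', or 'missing' for the given path."""
    if path.is_file():
        return "file"
    if path.is_dir():
        return "directory"
    return "missing"


# A throwaway directory stands in for the real project layout.
with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "data"
    before = data_path.exists()  # nothing has been created yet
    data_path.mkdir()
    (data_path / "example.csv").write_text("a,b\n1,2\n")
    kinds = [
        describe(data_path),
        describe(data_path / "example.csv"),
        describe(data_path / "nope.csv"),
    ]
```

Checking is_file() or is_dir() rather than only exists() avoids treating a directory as a readable file.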
Make Directories # [4]
data_path.mkdir(parents=True, exist_ok=True)
rename files # [5]
(data_path / 'example.csv').rename(data_path / 'real.csv')
List files # [6]
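One way to list the immediate contents of a directory is Path.iterdir(). The sketch below is an assumption about what this section would show; list_dir and the file names are hypothetical.

```python
from pathlib import Path
import tempfile


def list_dir(path: Path) -> list:
    """Return the sorted names of the immediate children of path."""
    return sorted(child.name for child in path.iterdir())


# Build a small throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp)
    (data_path / "a.csv").write_text("x\n")
    (data_path / "b.txt").write_text("y\n")
    (data_path / "sub").mkdir()
    names = list_dir(data_path)
```

iterdir() yields entries in arbitrary order and does not descend into subdirectories, which is why the helper sorts the names.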
Glob Files # [7]
data_path.glob('*.csv')
recursively
data_path.rglob('*.csv')
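To make the difference between glob and rglob concrete, here is a small sketch on a throwaway directory. The file names are made up for illustration.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp)
    (data_path / "nested").mkdir()
    (data_path / "top.csv").write_text("a\n")
    (data_path / "nested" / "deep.csv").write_text("b\n")
    (data_path / "notes.txt").write_text("c\n")

    # glob only matches inside data_path itself
    top_level = sorted(p.name for p in data_path.glob("*.csv"))
    # rglob also walks every subdirectory
    recursive = sorted(p.name for p in data_path.rglob("*.csv"))
```

Both methods return generators, so wrap them in list() or sorted() when the results are needed more than once.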
Write # [8]
import datetime
Path(data_path / 'meta.txt').write_text(f'created on {datetime.datetime.today()}')
References:
[1]: #import
[2]: #create-path-object
[3]: #check-if-files-exist
[4]: #make-directories
[5]: #rename-files
[6]: #list-files
[7]: #glob-files
[8]: #write
Custom Python Exceptions
Custom Exceptions # [1]
class ProjectNameError(NameError):
    pass
class UserNameError(NameError):
    pass
class CondaEnvironmentError(RuntimeError):
    pass
class BucketNotDefinedError(NameError):
    pass
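A short sketch of how one of these exceptions might be raised and caught. The get_bucket function and the config dictionary are hypothetical, not part of the original notes.

```python
class BucketNotDefinedError(NameError):
    """Raised when no storage bucket has been configured."""


def get_bucket(config: dict) -> str:
    """Return the configured bucket name or raise a descriptive error."""
    try:
        return config["bucket"]
    except KeyError:
        # Swap the generic KeyError for a project-specific exception.
        raise BucketNotDefinedError("no 'bucket' key in config") from None


try:
    get_bucket({})
except NameError as err:
    # Subclassing NameError means existing NameError handlers still catch it.
    caught = type(err).__name__
```

Callers that care can catch BucketNotDefinedError specifically, while broader handlers keep working through the parent class.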
References:
[1]: #custom-exceptions
Filtering Pandas
query # [1]
Good for method chaining, i.e. adding more methods or filters without assigning a new variable.
# is
skus.query('AVAILABILITY == "AVAILABLE"')
# is not
skus.query('AVAILABILITY != "AVAILABLE"')
masking # [2]
general purpose, this is probably the most common method you see in training/examples
# is
skus[skus['AVAILABILITY'] == 'AVAILABLE']
# is not
skus[~(skus['AVAILABILITY'] == 'AVAILABLE')]
isin # [3]
Can match against several values at once.
# is in
df[df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])]
# is not in
df[~df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])]
contains # [4]
Good for partial matches
# contains
df[df.AVAILABILITY.str.contains('AVA')]
# not contains
df[~df.AVAILABILITY.str.contains('AVA')]
MASKS # [5]
anything that we put inside of square brackets can be set as a variable then passed in.
service_mask = skus['AVAILABILITY'] == 'AVAILABLE'
name_mask = skus['NAME'] == 'Dell chromebook 11'
Operators # [6]
& - and
~ - not
| - or
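The operators above can be sketched on a tiny, made-up skus frame. The rows are hypothetical; the column names and mask names follow the earlier snippets.

```python
import pandas as pd

# Hypothetical data, only here to exercise the masks.
skus = pd.DataFrame({
    "NAME": ["Dell chromebook 11", "Dell chromebook 11", "HP laptop"],
    "AVAILABILITY": ["AVAILABLE", "SOLD OUT", "AVAILABLE"],
})

service_mask = skus["AVAILABILITY"] == "AVAILABLE"
name_mask = skus["NAME"] == "Dell chromebook 11"

both = skus[service_mask & name_mask]      # available AND a Dell chromebook 11
either = skus[service_mask | name_mask]    # available OR a Dell chromebook 11
not_available = skus[~service_mask]        # NOT available
```

When a comparison is written inline rather than stored in a variable, it needs its own parentheses, because &, |, and ~ bind more tightly than ==.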
AVAILABLE and ...
Digital Ocean
I love Digital Ocean for its simplicity and its commitment to open source.
If you’re into interesting projects, don’t miss out on Recreation-of-Nature [1], created by Kashu7100 [2].
ALife simulation with Python: patterns, behavior, and cognition.
References:
[1]: https://github.com/Kashu7100/Recreation-of-Nature
[2]: https://github.com/Kashu7100
Quick Progress Bars in python using TQDM
tqdm is one of my favorite general purpose utility libraries in python. It
allows me to see progress of multipart processes as they happen. I really like
this for when I am developing something that takes some amount of time and I am
unsure of performance. It allows me to be patient when the process is going
well and will finish in sufficient time, and allows me to 💥 kill it and find a
way to make it perform better if it will not finish in sufficient time.
[1]
for more gifs like these follow me on twitter
@waylonwalker [2]
Add a simple Progress bar!
from tqdm import tqdm
from time import sleep
for i in tqdm(range(10)):
    sleep(1)
convenience
TQDM also has a convenience function called trange that wraps the range function with a tqdm progress bar automatically.
from tqdm import trange
from time import sleep
for i in trange(10):
    sleep(1)
notebook support
There is also notebook support. If you are bouncing between ipython and jupyter I recommend importing from the auto ...
I’m impressed by bake [1] from kennethreitz [2].
Bake — the strangely familiar workflow utility.
References:
[1]: https://github.com/kennethreitz/bake
[2]: https://github.com/kennethreitz
Check out terminal [1] by microsoft [2]. It’s a well-crafted project with great potential.
The new Windows Terminal and the original Windows console host, all in the same place!
References:
[1]: https://github.com/microsoft/terminal
[2]: https://github.com/microsoft
Clean up Your Data Science with Named Tuples
If you are a regular listener of TalkPython [1] or PythonBytes, you have heard Michael Kennedy talk about Named Tuples many times. But what are they, and how do they fit into my data science workflow?
Example # [2]
As you graduate your scripts into modules and libraries you might start to notice that you need to pass a lot of data around to all of the functions that you have created. For example if you are running some analysis utilizing sales, inventory, and pricing data. You may need to calculate total revenue, inventory on hand. You may need to pass these data sets into various models to drive production or pricing based on predicted volumes.
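As a rough sketch of the idea, a named tuple can bundle the related data sets so functions take one argument instead of three. The Data type, the dictionaries standing in for real data sets, and total_revenue are all hypothetical.

```python
from collections import namedtuple

# One container for the three data sets mentioned above.
Data = namedtuple("Data", ["sales", "inventory", "pricing"])


def total_revenue(data):
    """Sum units sold times unit price for each SKU."""
    return sum(units * data.pricing[sku] for sku, units in data.sales.items())


# Plain dicts stand in for the real sales, inventory, and pricing data.
data = Data(
    sales={"a": 2, "b": 1},
    inventory={"a": 5, "b": 0},
    pricing={"a": 10.0, "b": 4.0},
)
revenue = total_revenue(data)
```

Fields are reachable by name (data.sales) or by position (data[0]), and the tuple is immutable, so a function cannot silently swap out one of the data sets.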
Load data # [3]
Here we set up functions that can load data from the sales database. Assume that we also have similar functions to get_inventory and get_pricing.
from sqlalchemy import create_engine
def get_engine():
    # return the engine so callers can use it
    return create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
def get_sales():
    '''
    gets sales history from the sales database
    '''
    engine = ge...
Background Tasks in Python for Data Science
This post is intended as an extension/update from background tasks in
python [1]. I started using background
the week that Kenneth Reitz released it. It takes away so much boilerplate
from running background tasks that I use it in more places than I probably
should. After taking a look at that post today, I wanted to put a better data
science example in here to help folks get started.
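For readers who have not seen the pattern, here is a rough standard-library sketch of the kind of boilerplate such a decorator hides. This is not the background library's API; task, load, and the worker count are hypothetical stand-ins built on concurrent.futures.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
import time

_executor = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Run the decorated function in a worker thread and return a Future."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return _executor.submit(func, *args, **kwargs)
    return wrapper


@task
def load(name):
    time.sleep(0.1)  # stand-in for a slow database query
    return f"{name} loaded"


# All three loads start immediately and run concurrently.
futures = [load(n) for n in ("sales", "inventory", "pricing")]
results = [f.result() for f in futures]  # block until each one finishes
_executor.shutdown()
```

Calling the decorated function returns right away with a Future, so the slow work overlaps instead of running back to back.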
Before we get into it, I want to make a shout out to Kenneth Reitz for making this so easy. Kenneth is a python God for all that he has given to the community in so many w...
If you’re into interesting projects, don’t miss out on starship [1], created by starship [2].
☄🌌️ The minimal, blazing-fast, and infinitely customizable prompt for any shell!
References:
[1]: https://github.com/starship/starship
[2]: https://github.com/starship
alttch [1] has done a fantastic job with rapidtables [2]. Highly recommend taking a look.
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
References:
[1]: https://github.com/alttch
[2]: https://github.com/alttch/rapidtables
📝 Bash Notes
Bash is super powerful.
File System Full # [1]
Show Remaining Space on Drives
df -h
show largest files in current directory
du . -h --max-depth=1
Move files then symlink them
mkdir /mnt/mounted_drive
mv ~/bigdir /mnt/mounted_drive
ln -s /mnt/mounted_drive/bigdir ~/bigdir
Fuzzy One Liners # [2]
a() {source activate "$(conda info --envs | fzf | awk '{print $
edit in vim
vf() { fzf | xargs -r -I % $EDITOR % ;}
cat a file
cf() { fzf | xargs -r -I % cat % ;}
bash execute
bf() { bash "$(fzf)"; }
git [3] add
gadd() { git status -s | fzf -m | awk '{print $2}' | xargs git add && git status -s; }
git reset
greset() { git status -s | fzf -m | awk '{print $2}' | xargs git reset && git status -s; }
Kill a process
fkill() { kill $(ps aux | fzf | awk '{print($2)}'); }
Finding things # [4]
Files # [5]
fd-find [6] is amazing for finding files, it even respects your .gitignore file 😲. Install with apt install fd-find.
fd md
ag -g python
find . -name "*.md"
++Vanilla Bonus
Content # [7]
** sh...
Autoreload in Ipython
I have used %autoreload for several years now with great success and 🔥 rapid reloads. It allows me to move super fast when developing libraries and modules. They have made some great updates this year that allow class modules to be updated automatically.
What I like about autoreload # [1]
🔥 Blazing Fast
💥 Keeps me in the comfort of my text editor
👏 Allows me to use Jupyter when I need
👟 Extremely Reliable
One of the biggest benefits that I find is that it shortens the distance between my module/library code and test code inside of a terminal/notebook. Now I primarily use jupyter notebooks for the presentation aspect. I develop code from the comfort of my editor with all of the tools I have setup, and run the functions in a notebook to get the output. From there I might do some aggregations or plots, but the 🥩 meat of development is done outside of jupyter.
Enabling Autoreload # [2]
📐 config
This is a sh...
If you’re into interesting projects, don’t miss out on psutil [1], created by giampaolo [2].
Cross-platform lib for process and system monitoring in Python
References:
[1]: https://github.com/giampaolo/psutil
[2]: https://github.com/giampaolo