Posts tagged: python

Ipython-Config

I use my ipython terminal daily. It’s my go to way of running python most of the time. After you use it for a little bit you will probably want to setup a bit of your own configuration.

Activate your virtual environment of choice and pip install it. Any time you are running your project in a virtual environment, you will need to install ipython inside it to access those packages from ipython.

pip install ipython

You are using a virtual environment right? Virtual environments like venv or conda can save you a ton of pain down the road.

...

Custom Ipython Prompt

I’ve grown tired of the standard ipython prompt as it doesn’t do much to give me any useful information. The default one gives out a line number that only seems to add anxiety as I am working on a simple problem and see that number grow to several hundred. I start to question my ability 🤦‍♂️.

If you already have an ipython config you can move on otherwise check out this post on creating an ipython config.

Ipython-Config

...

Ipython Ninjitsu

Stop going to google everytime your stuck and stay in your workflow. The ipython ? is a superhero for productivity and staying on task.

from kedro.pipeline import Pipeline Pipeline? Init signature: Pipeline( nodes: Iterable[Union[kedro.pipeline.node.Node, ForwardRef('Pipeline')]], *, tags: Union[str, Iterable[str]] = None, ) Docstring: A ``Pipeline`` defined as a collection of ``Node`` objects. This class treats nodes as part of a graph representation and provides inputs, outputs and execution order. Init docstring: Initialise ``Pipeline`` with a list of ``Node`` instances. Args: nodes: The iterable of nodes the ``Pipeline`` will be made of. If you provide pipelines among the list of nodes, those pipelines will be expanded and all their nodes will become part of this new pipeline. tags: Optional set of tags to be applied to all the pipeline nodes. Raises: ValueError: When an empty list of nodes is provided, or when not all nodes have unique names. CircularDependencyError: When visiting all the nodes is not possible due to the existence of a circular dependency. :

Note This does jump you into a pager, a j,k or up, down to navigate, q to quit.

Docstring not enough for you use case. I often run into cases where the docstring is not clear enough and I need to see the implementation for myself to see what a function does.

...

Automating my Post Starter

One thing we all dread is mundane work of getting started, and all the hoops it takes to get going. This year I want to post more often and I am taking some steps towards making it easier for myself to just get started.

When I start a new post I need to cd into my blog directory, start neovim in a markdown file with a clever name, copy some frontmatter boilerplate, update the post date, add tags, a description, and a cover.

hot and fast

...

Windowing Python Lists

In python data science we often will reach for pandas a bit more than necessary. While pandas can save us so much there are times where there are alternatives that are much simpler. The itertoolsandmore-itertools` are full of cases of this.

This post is a walkthrough of me solving a problem with more-itertools rather than reaching for a for loop, or pandas.

I am working on a one-line-link expander for my blog. I ended up doing it, just by modifying the markdown with python. I first split the post into lines with content.split('\n'), then look to see if the line appears to be just a link. One more safety net that I wanted to add was to check if there was whitespace around the line, this could not simply be done in a list comprehension by itself. I need just a bit of knowledge of the surrounding lines, enter more-itertools.

...

Testing Data Pipelines

Lint/Format/Doc ¶ black flake8 interrogate mypy Pipeline Assertions ¶ pipeline constructs pipeline as expected nodes pipeline has minimum nodes test minimum tags test alternate tags Catalog Assertions ¶ test catalog follows naming structure Node Tests ¶ test function does the correct operations on test data Great Expectations ¶

reasons-to-kedro

There are many reasons that you should be using kedro. If you are on a team of Data Scientists/Data Engineers processing DataFrames from many data sources should be considering a pipeline framework. Kedro is a great option that provides many benefits for teams to collaborate, develop, and deploy data pipelines

What is Kedro

Kedro makes it super easy to get started with their cli that utilizes cookiecutter under the hood.

...

What's New in Kedro 0.16.6

Kedro 0.16.6 is out! Let’s take a look through the release notes

This is really exciting to see more deployment options coming from the kedro team. It really shows the power of the framework. The power of some of these orchestrations options is incredible.

Most of them hinge on a sweet combination of the kedro cli, docker image, and the pipeline knowing your nodes dependencies.

...

Designing a "Router" for kedro

I released a router-like plugin for kedro back in April 2020. This was not the first design, the idea actually came from one of the QB folks who taught me kedro nearly a year before. We were assembling our pipelines with something called nodes_global. It worked fairly well but did have some issues around being set as a global variable.

But…

One thing in particular that it did not lend itself well to was being able to create a packagable pipeline that I could pip install and append into any of my existing pipelines. Something I am still trying to work out, maybe I don’t need this. I think I have it working for our internal pipelines and it seems like the way to go, but we don’t necessarily end up using it.

...

Reclaim memory usage in Jupyter

Today I ran into an issue where we had a one-off script that just needed to work, but it was just chewing threw memory like nothing.

It started with a colleague asking me How do I clear the memory in a Jupyter notebook, these are the steps we took to debug the issue and free up some memory in their notebook.

How do I clear the memory in a Jupyter notebook?

...

Strip Trailing Whitespace from Git projects

A common linting error thrown by various linters is for trailing whitespace. I most often use flake8. I generally have [pre-commit](https://waylonwalker.com/pre-commit-is-awesome hooks setup to strip this, but sometimes I run into situations where I jump into a project without it, and my editor lights up with errors. A simple fix is to run this one-liner.

bash

git grep -I --name-only -z -e '' | xargs -0 sed -i -e 's/[ \t]\+$\r\?$$/\1/'

...

Three things to Automate with Python using Pandas

Here are three things that I see my non programming counterparts doing every single day. These really sum up so much of what folks do within an office. So many of us dabble in or become power users of spreadsheets without knowing there is an alternative out there that can save us time, automate boring things, and allow us to open up our minds for the part that we add value, Thinking about the data.

Lets face it, stitching together spreadsheets is zero value add by itself, but if you can see something in the data and take action on it, this can be huge value add to your company. Learning just a bit of python will help focus more of your attention on “value add operations” and leave the mundane stuff to your computer.

I see this one all the time. One team gets a spreadsheet from another team once per month and they need to stich all the pieces together. Excel really opens the door for some nasty hidden bugs in your manually stiched together data. It also takes time out of your day that you dont need to spend.

...

How to Install miniconda on linux (from the command line only)

miniconda is a python distribution from continuum. It’s a slimmed-down version of their very popular anaconda distribution. It comes with its own environment manager and has eased the install process for many that do not have a way to compile c-extensions. It made it much easier to install the data science stack on windows a few years ago. These days windows are much better than it was back then at compiling c-extensions. I still like its environment manager, which installs to a global directory rather than a local directory for your project.

Installing miniconda on Linux can be a bit tricky the first time you do it completely from the terminal. The following snippet will create a directory to install miniconda into, download the latest python 3 based install script for Linux 64 bit, run the install script, delete the install script, then add a conda initialize to your bash or zsh shell. After doing this you can restart your shell and conda will be ready to go.

mkdir -p ~/miniconda3 wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3 rm -rf ~/miniconda3/miniconda.sh ~/miniconda3/bin/conda init bash ~/miniconda3/bin/conda init zsh

Options ¶ #

The miniconda.sh script comes with a few basic options. Most notably we used -b to be able...

...

What's New in Kedro 0.16.4

If we take a look at the release notes I see one major feature improvement on the list, auto-discovery of hooks.

## Major features and improvements * Enabled auto-discovery of hooks implementations coming from installed plugins.

This one comes a bit surprising as it was just casually mentioned in #435

...

Integration testing with Python, TestProject.io, and GitHub Actions

As I continue to build out waylonwalker.com I sometimes run into some errors that are not caught because I do not have good testing implemented. I want to explore some integration testing options using GitHub’s actions.

Running integration tests will not prevent bugs from happening completely, but it will allow me to quickly spot them and rollback.

The very first thing that comes to my mind is anything that is loaded or ran client-side. Two things quickly came to mind here. I run gatsby so most of my content is statically rendered, and it yells at me if something isn’t as expected. For performance reasons I lazy load cards on my blogroll, loading all of the header images gets heavy and kills lighthouse (if anyone actually cares). I am also loading some information from the top open-source libraries that I have created. To prevent the need to rebuild the whole site to get the latest information I am just using the GitHub API client-side.

...

🐍 Practice Python Online

When learning a new skill it’s important to practice along the way. In order for me to show up to practice I need to make it easy to show up. An easy way to show up to practice with python is to use an online repl. With these you can try out something quick. Sometimes I see snippets from blogs or tweets and I need to try the out for myself to really understand.

When learning a new skill it’s important to practice along the way. In order for me to show up to practice I need to make it easy to show up. An easy way to show up to practice with python is to use an online repl. With these, you can try out something quick. Sometimes I see snippets from blogs or tweets and I need to try them out for myself to really understand.

Here are three different options that I have used in the past to try out something at some various levels. I am sure there are plenty more, but these are three that I have tried. I am not covering all of them, because It’s been a while since I have used one other than replit.com

...

Kedro Catalog

I am exploring a kedro catalog meta data hook, these are some notes about what I am thinking.

try pandas method -> try spark -> try dict/list -> none

Is there an easy way to create a nosql database in memory from a a list of dictionaries?

How python tools configure

Mypy’s config parser seems to be one of the most complex. This is likely in part to it having the largest backwards compatability of all projects that I looked at.

mypy/config_parser

options/config.py

...

🐍 Parsing RSS feeds with Python

I am looking into a way to replace my google reader experience that I had back in 2013 before google took it from us. I am starting by learning how to parse feeds with python, and without much previous knowledge, it proved to be much easier than anticipated thanks to the feedparser library.

This is how I used python to parse rss and setup my own custom feed.

Install the feedparser library.

...

SLIDES - understanding python \args and \\*kwargs

Python *args and **kwargs are super useful tools, that when used properly can make you code much simpler and easier to maintain. Large manual conversions from a dataset to function arguments can be packed and unpacked into lists or dictionaries. Beware though, this power can lead to some really unreadable/unusable code if done wrong.

I generally post these as a carousel on LinkedIn based on a full article. Let mw know what you think of it shown inside of a blog @_waylonwalker.

...

`j`	Scroll down
`k`	Scroll up
`g` `g`	Scroll to top
`Shift` `G`	Scroll to bottom
`d`	Half-page down
`u`	Half-page up

`j` / `↓`	Next post (in feeds)
`k` / `↑`	Previous post (in feeds)
`Enter` / `o`	Open highlighted post
`Shift` `O`	Open in new tab
`g` `h`	Go to home
`g` `s`	Focus search
`[`	Previous page
`]`	Next page
`s`	Toggle simple/rich feed view

`/`	Focus search input
`⌘CtrlK`	Focus search (alternative)
`y` `y`	Copy URL to clipboard
`?`	Show this help
`Esc`	Close / clear highlight