Posts tagged: python

All posts with the tag "python"

268 posts, latest post 2026-03-31

Using Kedro In Scripts

With the latest releases of kedro 0.17.x, it is now possible to run kedro pipelines from within scripts. While I would not start a project with this technique, it will be a good tool to keep in my back pocket when I want to sprinkle in a bit of kedro goodness in existing projects.

What is Kedro

If you're just learning about kedro, check out this post walking through it.

...

Silence Kedro Logs

Kedro can have a chatty logger. While it is super nice in production to see everything that happened during a pipeline run, it can be troublesome while trying to implement a CLI extension with clean output.

First, how does one silence a python log? Python loggers can be retrieved by the logging module’s getLogger function. Then their log level can be changed. Much of kedro’s chattiness comes from INFO level logs. For my current use case, I don’t want to hear about anything unless it’s essential, i.e., a failure. In this case, I set the log levels to ERROR, as most errors should stop execution anyway.

Getting a python logger is straightforward if we know the name of the logger. The following block will grab the logger object for the logger currently registered under the name passed in.
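As a minimal sketch of that idea using only the standard library: the helper name `silence` is hypothetical, and the logger name "kedro" is an assumption (libraries conventionally name their loggers after the package), so check which names are actually registered in your own session.

```python
import logging


def silence(logger_name: str, level: int = logging.ERROR) -> logging.Logger:
    """Raise the threshold of a named logger so only `level` and above get through."""
    # getLogger returns the logger registered under this name,
    # creating it if it does not exist yet.
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    return logger


# "kedro" is an assumed logger name used for illustration.
kedro_logger = silence("kedro")
```

Child loggers such as "kedro.io" that have no level of their own inherit the effective level from their parent, so setting the top-level logger is often enough. A logger whose level was set explicitly elsewhere would need its own call.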

...

Python Diskcache is locked

Running multiple processes using the same diskcache object can cause issues with locks. As I was trying to set up a rich Live display for markata, I ran into issues where each part could not run simultaneously. As I had followed the instructions from diskcache, the cause was not directly apparent to me, so I had to make a simple example to experiment and play with at a small scale.

Building a minimum reproducible error is one of my superpowers in development. I do this very often to suss out what is really happening. My day-to-day work is processing data with python, so I keep a number of very small data sets handy to break and fix. This helps separate the complexities of the project from the problem.

Markata has a lot going on. It’s a plugins-all-the-way-down static site generator built in python. Trying to find the root cause through the layers of plugin and cli modules can be a pain, but in this case building a very simple minimum reproducible error was much easier.

...

3 min read

Vim Fugitive

    :G
    :G status
    :G commit
    :G add %
    :Gdiff
    :G push
    :Glog

Add current file and commit with diff in a split

    function! s:GitAdd()
        exe "G add %"
        exe "G diff --staged"
        exe "only"
        exe "G commit"
    endfunction
    :command! GitAdd :call s:GitAdd()
    nnoremap gic :GitAdd<CR>

:on[ly]

C-W o

:on[ly] will make the current buffer the only one on the screen. This is super helpful, as many of fugitive's commands open in a split by default.

cycle through the jumplist

...

What is if __name__ == "__main__", and how do I use it?

When a python module is run directly, its __name__ is set to __main__; when it is imported, its __name__ is set to the name of the module.

Let’s create a module to play with __name__ a bit. We will call this module nodes.py. It is a module that we may want to run by itself or import and use in other modules.

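    # nodes.py
    if __name__ == "nodes":
        # This branch runs only when the file is imported as the module "nodes".
        import sys

        print(f"you have imported me {__name__} from {sys.modules['__main__'].__file__}")

    if __name__ == "__main__":
        # This branch runs only when the file is executed directly.
        print("you are running me as main")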

I have set this module up to execute one of two if statements based on whether the module itself is being run or imported.
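To see both cases side by side without leaving one interpreter, here is a small sketch that writes a throwaway module to a temporary directory, executes it once as a script and once as an import, and records the __name__ it saw each time. The module name `demo_nodes` and the helper `observe_names` are hypothetical and are not part of the original post.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A one-line module that records whatever __name__ it was given.
SOURCE = "seen_name = __name__\n"


def observe_names():
    """Return the __name__ seen when run as a script and when imported."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_nodes.py"  # hypothetical module name
        path.write_text(SOURCE)

        # Execute the file the way `python demo_nodes.py` would.
        as_script = runpy.run_path(str(path), run_name="__main__")["seen_name"]

        # Import the same file as a module named "demo_nodes".
        spec = importlib.util.spec_from_file_location("demo_nodes", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        as_import = module.seen_name

    return as_script, as_import


print(observe_names())  # ('__main__', 'demo_nodes')
```

The same source file produces two different values, which is exactly what the `if __name__ == "__main__":` guard relies on.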

...

3 min read

Zev Averbach Interview

Zev Averbach, Frustrated spreadsheet jockey to software developer at 36

Q: Tell me about your journey as a spreadsheet jockey into Data Engineering?

A: First of all, it’s hilarious that I accidentally found your questions for this interview by Googling myself. 😊

...

Pytest capsys

Testing print/log statements in pytest can be a bit tricky. capsys makes it super easy, but I often struggle to find it.

capsys is a builtin pytest fixture that can be passed into any test to capture stdout/stderr. For a more comprehensive description, check out the docs on capsys.

Simply create a test function that accepts capsys as an argument and pytest will give you a capsys object.
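A minimal sketch of what such a test can look like. The function `greet` is a hypothetical stand-in for whatever code you want to test; `capsys.readouterr()` is the fixture method that returns what has been captured so far.

```python
def greet(name):
    """Hypothetical function under test; it only prints."""
    print(f"hello {name}")


def test_greet(capsys):
    # pytest injects the capsys fixture because of the argument name.
    greet("kedro")
    captured = capsys.readouterr()  # captured output so far, as .out and .err
    assert captured.out == "hello kedro\n"
    assert captured.err == ""
```

Save this in a file whose name starts with `test_` and run `pytest` to try it out.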

1 min read

Building a Rich Dev Server

Draft Post

I’ve really been digging @willmcgugan’s rich library for creating TUI-like interfaces in python. I’ve only recently started to take full advantage of it.

I am working on a project in which I want to have a dev server running continuously in the background. I really like dev servers that automatically choose an unused port and list out the running pid so that I can kill it if I need to.
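One common way to get both pieces of information with only the standard library is sketched below. This is not necessarily how the post's dev server does it; the function names `find_free_port` and `server_banner` are hypothetical.

```python
import os
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def server_banner() -> str:
    """Build the line a dev server could print on startup."""
    port = find_free_port()
    return f"serving on http://127.0.0.1:{port} (pid {os.getpid()})"


print(server_banner())
```

The port is released again before the server binds to it, so another process could grab it in between. For a local dev server that small race is usually acceptable.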

...

fix crlf for entire git repo

Final Result

    git checkout main
    git reset --hard
    git rm -rf --cached .
    echo "* text=auto" > .gitattributes
    git add .

1 min read

Automatic Conda Environments

I have automated my process to create virtual environments in my python projects. Here is how I did it.

I’ve really been digging my new tmux session management setup. Now I have leveled it up by adding direnv to my workflow. It will execute a shell script whenever I cd into a directory. One thing I wanted to add to this was automatic activation of python environments whenever I cd into a directory, or creation of a new environment if one does not exist.

https://waylonwalker.com/tmux-nav-2021/

...

3 min read

How I Review Pipeline Code

I have started doing more regular PR reviews on my team's Kedro pipelines. I generally take a two-phase approach to the review in order to give the reviewee both quick and detailed feedback.

What is Kedro

Phase 1 is typically a quick scan over the PR right within the PR window in my browser.

...

2 min read

Kedro pipeline_registry.py

With the release of kedro==0.17.2 came a new module in the project template, pipeline_registry.py. Here are some notes that I learned while playing with this new module.

You should now have something that looks like this in your src/<package-name>/pipeline_registry.py.

    """Project pipelines."""
    from typing import Dict

    from kedro.pipeline import Pipeline


    def register_pipelines() -> Dict[str, Pipeline]:
        """Register the project's pipelines.

        Returns:
            A mapping from a pipeline name to a ``Pipeline`` object.
        """
        return {"__default__": Pipeline([])}

pipeline_registry only works in kedro>=0.17.2

...

🐍 Pluggable Architecture with Python

pytest has open sourced their amazing plugin framework, pluggy. It allows library authors to give their users a way to modify the library's behavior without needing to submit a change that may not make sense for the entire library.

My experience so far as a plugin user and plugin author has been great. Building and using plugins is incredibly intuitive. I wanted to dive a bit deeper and see how they are implemented inside of a library, and it's a bit of a mind bend the first time you try to do it.

A hook is a single function that the PluginManager runs at a specific place.
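To make the idea concrete without depending on pluggy, here is a toy sketch in plain Python. It is deliberately not pluggy's API or implementation; `ToyPluginManager`, its methods, and the two plugins are all hypothetical names used only to show a manager running every registered implementation of a hook at one call site.

```python
from collections import defaultdict


class ToyPluginManager:
    """Toy stand-in for a plugin manager; NOT pluggy's implementation or API."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, plugin):
        # Collect every public callable on the plugin as a hook implementation.
        for name in dir(plugin):
            attr = getattr(plugin, name)
            if callable(attr) and not name.startswith("_"):
                self._hooks[name].append(attr)

    def call(self, hook_name, **kwargs):
        # Run every implementation registered under this hook name.
        return [impl(**kwargs) for impl in self._hooks[hook_name]]


class UppercasePlugin:
    def render(self, text):
        return text.upper()


class ExclaimPlugin:
    def render(self, text):
        return text + "!"


manager = ToyPluginManager()
manager.register(UppercasePlugin())
manager.register(ExclaimPlugin())
print(manager.call("render", text="hello"))  # ['HELLO', 'hello!']
```

The library decides where `manager.call("render", ...)` happens; plugin authors only supply functions with the matching name. A real framework such as pluggy adds hook specifications, validation, and call ordering on top of this basic shape.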

...

4 min read

⚙ How Python Tools Are Configured

There are various ways to configure python tools: config files, code, or environment variables. Let’s look at a few projects that allow users to configure them through the use of config files and how they do it.

This will not include how they are implemented. I’ve looked at a few and it's not simple. This will focus on where config is placed and the order in which duplicates are resolved.

The motivation of this article is to serve as a bit of a reference guide for those who may want to create their own package that needs configuration.
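For anyone building such a package, the "order in which duplicates are resolved" can be sketched with a standard-library `ChainMap`. The precedence used here (environment over project file over user file over defaults) is an assumption for illustration only, not the rule of any particular tool, and `resolve_config` and the keys are hypothetical.

```python
from collections import ChainMap


def resolve_config(defaults, user_file, project_file, env):
    """Merge config sources; earlier maps in the ChainMap win on duplicate keys."""
    # Assumed precedence for illustration: env > project file > user file > defaults.
    return dict(ChainMap(env, project_file, user_file, defaults))


config = resolve_config(
    defaults={"line_length": 88, "verbose": False},
    user_file={"line_length": 100},
    project_file={"line_length": 120, "verbose": True},
    env={"verbose": False},
)
print(config)
```

Here `line_length` resolves to the project file's value and `verbose` to the environment's, because those are the highest-precedence sources that define each key.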

...

5 min read

Minimal Kedro Pipeline

How small can a minimal, ready-to-package kedro pipeline be? I made one in 4 files that you can pip install. It’s only a total of 35 lines of python, 8 in setup.py and 27 in mini_kedro_pipeline.py.

📝 Note this is only a composable pipeline, not a full project; it does not contain a catalog or runner.

I have everything for this post hosted in this GitHub repo. You can fork it, clone it, or just follow along.

...