Posts tagged: python

All posts with the tag "python"

What is if __name__ == "__main__", and how do I use it?

When a python module is run directly, its __name__ is set to "__main__". When it is imported instead, its __name__ is set to the name of the module.

Let’s create a module to play with __name__ a bit. We will call this module nodes.py. It is a module that we may want to run by itself or import and use in other modules.

# nodes.py
if __name__ == "nodes":
    import sys
    import __main__
    print(f"you have imported me {__name__} from {sys.modules['__main__'].__file__}")

if __name__ == "__main__":
    print("you are running me as main")

I have set this module up to execute one of two if statements based on whether the module itself is being run or is being imported.
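To see both branches fire, here is a minimal sketch that writes a trimmed-down copy of nodes.py to a temporary directory, runs it as a script, and then imports it from a second interpreter. The helper name run_both_ways and the shortened messages are assumptions made for illustration, not part of the original module.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Trimmed-down stand-in for nodes.py (messages shortened for the demo).
NODES_SOURCE = '''
if __name__ == "nodes":
    print(f"imported as {__name__}")

if __name__ == "__main__":
    print("running as main")
'''


def run_both_ways():
    """Run nodes.py directly, then import it, and return both outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "nodes.py").write_text(NODES_SOURCE)
        # Case 1: executed as a script, so __name__ == "__main__".
        ran = subprocess.run(
            [sys.executable, "nodes.py"],
            cwd=tmp, capture_output=True, text=True, check=True,
        ).stdout.strip()
        # Case 2: imported by another program, so __name__ == "nodes".
        imported = subprocess.run(
            [sys.executable, "-c", "import nodes"],
            cwd=tmp, capture_output=True, text=True, check=True,
        ).stdout.strip()
    return ran, imported


if __name__ == "__main__":
    print(run_both_ways())
```

Only one of the two messages is printed in each case, which is the whole point of the guard.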

...

3 min read

Zev Averbach Interview

Zev Averbach, Frustrated spreadsheet jockey to software developer at 36

Q: Tell me about your journey as a spreadsheet jockey into Data Engineering?

A: First of all, it’s hilarious that I accidentally found your questions for this interview by Googling myself. 😊

...

Pytest capsys

Testing print/log statements in pytest can be a bit tricky. capsys makes it super easy, but I often struggle to find it.

capsys is a builtin pytest fixture that can be passed into any test to capture stdout/stderr. For a more comprehensive description check out the docs on capsys.

Simply create a test function that accepts capsys as an argument and pytest will give you a capsys object.
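As a rough sketch of what that looks like, the test below calls a function that prints and then checks the captured text. The greet function is a hypothetical example, not something from the original post. capsys.readouterr() is the fixture's documented way to read what has been captured so far.

```python
def greet(name):
    """Hypothetical function under test: writes a greeting to stdout."""
    print(f"hello {name}")


def test_greet_prints_greeting(capsys):
    # pytest injects the fixture because the argument is named "capsys".
    greet("world")
    captured = capsys.readouterr()  # everything written since the test started
    assert captured.out == "hello world\n"
    assert captured.err == ""
```

Save this in a file such as test_greet.py and run pytest. No import of pytest is needed in the test file itself.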

1 min read

Building Rich a Dev Server

Draft Post

I’ve really been digging @willmcgugan’s rich library for creating TUI-like interfaces in python. I’ve only recently started to take full advantage of it.

I am working on a project in which I want to have a dev server running continuously in the background. I really like dev servers that automatically choose an unused port and list out the running pid so that I can kill it if I need to.
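One common way to get an unused port is to bind a socket to port 0 and let the operating system pick one. The sketch below is an assumption about how this could be done with the standard library, not the approach the post ends up taking. The function name find_unused_port is made up for illustration.

```python
import os
import socket


def find_unused_port(host="127.0.0.1"):
    """Ask the OS for a free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "pick any free port"
        return sock.getsockname()[1]


if __name__ == "__main__":
    port = find_unused_port()
    # Print the pid so the process is easy to find and kill later.
    print(f"dev server would run on port {port} (pid {os.getpid()})")
```

The socket is closed before the port is returned, so another process could claim the port in between. For a dev server that is usually an acceptable trade-off.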

...

fix crlf for entire git repo

Final Result

# git checkout main
git reset --hard
git rm -rf --cached .
echo "* text=auto" > .gitattributes
git add .

1 min read

Automatic Conda Environments

I have automated my process to create virtual environments in my python projects. Here is how I did it.

I’ve really been digging my new tmux session management setup. Now I have leveled it up by adding direnv to my workflow. It will execute a shell script whenever I cd into a directory. One thing I wanted to add to this was automatic activation of python environments whenever I cd into a directory, or creation of a new environment if one does not exist.

https://waylonwalker.com/tmux-nav-2021/

...

3 min read

How I Review Pipeline Code

I have started doing more regular PR reviews on my team’s Kedro pipelines. I generally take a two-phase approach to the review in order to give the reviewee both quick and detailed feedback.

What is Kedro

Phase 1 is typically a quick scan over the PR right within the PR window in my browser.

...

2 min read

Kedro pipeline_registry.py

With the release of kedro==0.17.2 came a new module in the project template, pipeline_registry.py. Here are some notes that I learned while playing with this new module.

You should now have something that looks like this in your src/<package-name>/pipeline_registry.py.

"""Project pipelines."""
from typing import Dict

from kedro.pipeline import Pipeline


def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from a pipeline name to a ``Pipeline`` object.
    """
    return {"__default__": Pipeline([])}

pipeline_registry only works in kedro>=0.17.2

...

🐍 Pluggable Architecture with Python

pytest has open sourced their amazing plugin framework, pluggy. It allows library authors to give their users a way to modify the library's behavior without needing to submit a change that may not make sense for the entire library.

My experience so far as a plugin user and plugin author has been great. Building and using plugins is incredibly intuitive. I wanted to dive a bit deeper and see how they are implemented inside of a library, and it's a bit of a mind bend the first time you try to do it.

A hook is a single function that is run at a specific place by the PluginManager.
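To make the idea of a hook concrete, here is a deliberately simplified stand-in written with only the standard library. It is not pluggy's API. TinyPluginManager, call_hook, and the after_load hook name are all made up for illustration, and pluggy itself adds hook specifications, ordering, and validation on top of this idea.

```python
class TinyPluginManager:
    """Simplified stand-in for a plugin manager; this is not pluggy's API."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def call_hook(self, hook_name, **kwargs):
        """Call hook_name on every plugin that implements it; collect results."""
        results = []
        for plugin in self._plugins:
            hook = getattr(plugin, hook_name, None)
            if callable(hook):
                results.append(hook(**kwargs))
        return results


class ShoutPlugin:
    # Implements the hypothetical after_load hook.
    def after_load(self, text):
        return text.upper()


class QuietPlugin:
    # Implements no hooks, so the manager skips it.
    pass


manager = TinyPluginManager()
manager.register(ShoutPlugin())
manager.register(QuietPlugin())
print(manager.call_hook("after_load", text="hello"))  # ['HELLO']
```

The library decides where call_hook runs, and each plugin decides what happens there. That split is what lets users change behavior without patching the library.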

...

4 min read

⚙ How Python Tools Are Configured

There are various ways to configure python tools: config files, code, or environment variables. Let’s look at a few projects that allow users to configure them through the use of config files and how they do it.

This will not include how they are implemented. I’ve looked at a few and it's not simple. This will focus on where config is placed and the order in which duplicates are resolved.

The motivation of this article is to serve as a bit of a reference guide for those who may want to create their own package that needs configuration.

...

5 min read

Minimal Kedro Pipeline

How small can a minimum kedro pipeline ready to package be? I made one within 4 files that you can pip install. It’s only a total of 35 lines of python, 8 in setup.py and 27 in mini_kedro_pipeline.py.

📝 Note this is only a composable pipeline, not a full project, it does not contain a catalog or runner.

I have everything for this post hosted in this GitHub repo. You can fork it, clone it, or just follow along.

...

Markdown Cli

This is a post that may be a work in progress for a while. It's a collection of thoughts on managing my blog, but it could be translated to anything that is just a collection of markdown.

Kedro - My Data Is Not A Table

In python data science/engineering most of our data is in the form of some sort of table, typically a DataFrame from a library like pandas, spark, or dask.

These containers for data contain many convenient methods to manipulate table-like data structures. Sometimes we leverage other data types, namely vanilla types like lists and dicts, or even numpy data types.

What is Kedro

...

Quickly Change Conda Env With Fzf

Changing conda environments is a bit verbose. I use a function with fzf that both lists environments and selects the one I want in one go.

I have used conda as a virtual environment tool for years now. I started using conda because it made installing packages on Windows simple, but that has since gotten so much better that it’s been years since I have run a conda install command. I’m sure that I could use a different environment manager, but it works for me and makes sense.

What environment manager do you use for python?

...

3 min read

Minimal Python Package

What does it take to create an installable python package that can be hosted on pypi?

This post is somewhat inspired by the bottle framework, which is famously created as a single python module. Yes, a whole web framework is written in one file.

.
├── setup.py
└── my_pipeline.py

setup.py

from setuptools import setup

setup(
    name="",
    version="0.1.0",
    py_modules=["my_pipeline"],
    install_requires=["kedro"],
)

name

The name of the package can contain any letters, numbers, “_”, or “-”. Even if it’s for internal/personal consumption only, I usually check that the name is not already taken on pypi so that I don’t run into conflicts.

...

2 min read