Posts tagged: python

What's New in Kedro 0.16.4

If we take a look at the release notes [1] I see one major feature improvement on the list, auto-discovery of hooks. ## Major features and improvements * Enabled auto-discovery of hooks implementations coming from installed plugins. This one comes a bit surprising as it was just casually mentioned in #435 [2] [2] Think pytest # [3] As mentioned in #435 [2] this is the model that pytest uses. Not all plugins automatically start doing things right out of the box but require a CLI argument. simplicity # [4] It feels a bit crazy that simply installing a package will change the way that your pipeline gets executed. I do like that it requires just a bit less reaching into the framework stuff for the average user. Most folks will be able to write in the catalog and nodes without much change to the rest of the project. Implementation # [5] Reading through the docs [6], they show us that we can make our hooks automatically register by adding a kedro.hooks endpoint that points to a ...

Integration testing with Python, TestProject.io, and GitHub Actions

Caution None of the testproject.io urls resolve anymore in JAN 2025, I removed all of the broken links. As I continue to build out waylonwalker.com [1] I sometimes run into some errors that are not caught because I do not have good testing implemented. I want to explore some integration testing options using GitHub’s actions. Running integration tests will not prevent bugs from happening completely, but it will allow me to quickly spot them and rollback. --- 🤔 What to test first? # [2] The very first thing that comes to my mind is anything that is loaded or ran client-side. Two things quickly came to mind here. I run gatsby so most of my content is statically rendered, and it yells at me if something isn’t as expected. For performance reasons I lazy load cards on my blogroll, loading all of the header images gets heavy and kills lighthouse (if anyone actually cares). I am also loading some information from the top open-source libraries that I have created. To prevent the need...

🐍 Practice Python Online

When learning a new skill it’s important to practice along the way. In order for me to show up to practice I need to make it easy to show up. An easy way to show up to practice with python is to use an online repl. With these you can try out something quick. Sometimes I see snippets from blogs or tweets and I need to try the out for myself to really understand. When learning a new skill it’s important to practice along the way. In order for me to show up to practice I need to make it easy to show up. An easy way to show up to practice with python is to use an online repl. With these, you can try out something quick. Sometimes I see snippets from blogs or tweets and I need to try them out for myself to really understand. Three online REPLS # [1] Here are three different options that I have used in the past to try out something at some various levels. I am sure there are plenty more, but these are three that I have tried. I am not covering all of them, because It’s been a while sin...

Kedro Catalog

I am exploring a kedro catalog meta data hook, these are some notes about what I am thinking. Process # [1] - metadata will be attached to the dataset object under a .metadata attribute - metadata will be updated after_node_run - metadata will be empty until a pipeline is ran with the hook on - optionally a function to add metadata will be added - metadata will be stored in a file next to the filepath - meta Problems This Hook Should solve # [2] - what datasets have a columns with sales in the name - what datasets were updated after last tuesday - which pipeline node created this dataset - how many rows are in this dataset (without reloading all datasets) implementation details # [3] - metadata will be attached to each dataset as a dictionary - list/dict comprehensions can be used to make queries Metadata to Capture # [4] try pandas method -> try spark -> try dict/list -> none - column names - length - Null count - created_by node name Database? # [5] Is there...

How python tools configure

mypy # [1] Mypy’s config parser seems to be one of the most complex. This is likely in part to it having the largest backwards compatability of all projects that I looked at. mypy/config_parser [2] flake8 # [3] options/config.py [4] black # [5] black [6] portray # [7] - only uses pyproject.toml portray/config.py [8] interrogate # [9] - only uses pyproject.toml References: [1]: #mypy [2]: https://github.com/python/mypy/blob/master/mypy/config_parser.py [3]: #flake8 [4]: https://github.com/PyCQA/flake8/blob/master/src/flake8/options/config.py [5]: #black [6]: https://github.com/psf/black/blob/master/src/black/__init__.py#L277-L331 [7]: #portray [8]: https://github.com/timothycrosley/portray/blob/main/portray/config.py [9]: #interrogate

🐍 Parsing RSS feeds with Python

I am looking into a way to replace my google reader experience that I had back in 2013 before google took it from us. I am starting by learning how to parse feeds with python, and without much previous knowledge, it proved to be much easier than anticipated thanks to the feedparser library. This is how I used python to parse rss and setup my own custom feed. Install # [1] Install the feedparser library. conda create -n reader python=3.8 -y source activate reader pip install feedparser Get the content # [2] import feedparser feed = feedparser.parse('https://waylonwalker.com/rss.xml') The feed object # [3] The feed is a feedparser.FeedParserDict. For all intents and purposes this seems to just behave like a dict with the following keys(). feed.keys() ['feed', 'entries', 'bozo', 'headers', 'etag', 'href', 'status', 'encoding', 'version', 'namespaces', 'content']) feed has some general information about the rss feed, but the meat of the feed is in entries. The rest of the keys we...

SLIDES - understanding python \args and \\*kwargs

Python *args and **kwargs are super useful tools, that when used properly can make you code much simpler and easier to maintain. Large manual conversions from a dataset to function arguments can be packed and unpacked into lists or dictionaries. Beware though, this power can lead to some really unreadable/unusable code if done wrong. I generally post these as a carousel on LinkedIn based on a full article. Let mw know what you think of it shown inside of a blog @_waylonwalker [1]. [2] See the full article here [2] Slides # [3] --- [4] --- [5] --- [6] --- [7] --- [8] --- [9] --- [10] --- [11] --- [12] --- [13] References: [1]: https://twitter.com/_WaylonWalker [2]: https://waylonwalker.com/python-args-kwargs [3]: #slides [4]: https://images.waylonwalker.com/args-kwargs-slide-1.png [5]: https://images.waylonwalker.com/args-kwargs-slide-2.png [6]: https://images.waylonwalker.com/args-kwargs-slide-3.png [7]: https://images.waylonwalker.com/args-kwargs...

Gracefully adopt kedro, the catalog

Why use kedro catalog? # [1] While using the catalog alone will not reap all of the benefits of the framework, it does get you and your project ready for the full framework eventually. For me the full benefit of the catalog comes when you combine it with the pipeline and dont even touch read/write steps at all. Taking a step into kedro by adopting the catalog first will give you a way to organize all of your data loads in one place, and stop manually writing read/write code, which can be different for each data and storage type. You just don’t need to think about it. --- - iperitive loading style - organizes your data - all file locations can be quickly identified - can be dropped into kedro later --- “can be dropped into kedro later” Let’s talk a bit more about that 2 Ways to Gracefully adopt the catalog # [2] How do I get started with the kedro catalog - add with the code api - load from yaml (recommended) 1. Adding to the catalog with the code api # [3] how to use ...

How to find things in your kedro catalog

kedro 0.16.2 just dropped last week with a long-awaited feature… catalog search! I went as far as monkey patching this into each of my projects. I work jump between a few really big projects that have tons of datasets. Being able to quickly search for what I need is so useful. The Catalog # [1] The kedro data catalog is a key component to the kedro framework. It handles all data loading and saving for you. It is configurable and hackable. Having all your data connections listed in one place make it so easy to pick your project up and move it to a completely new environment. That sweet imperative loading style saves so much read/write overhead. I can load all my data with a single command whether it’s in amazon s3, google cloud platform, or a local file. Kick start a toy project # [2] Just like with most of these articles, I am going to create a conda environment so that I don’t break any existing projects and scaffold up a toy project to learn from. conda create -n kedro0162 py...

How Kedro handles your inputs

Passing inputs into kedro is a key concept. Understanding how it accepts a single catalog key as input is quite trivial that easily makes sense, but passing a list or dictionary of catalog entries can be a bit confusing. *args/**args review # [1] Check out this post for a review of how *args **kwargs work in python. understanding python *args and **kwargs [2] python args and kwargs [3] article by @_waylonwalker [4] All Kedro inputs are catalog Entries # [5] When kedro runs your pipeline it uses the catalog to imperatively load your data, meaning that you don’t tell kedro how to load your data, you tell it where your data is and what type it is. These catalog entries are like a key-value store. You just need to give the key when setting up a node. Single Inputs # [6] These are fairly straightforward to understand. In the example below when kedro runs the pipeline it will load the input from the catalog, then pass that input to the func, then save the returned value to the out...

017

**

018

016

015

**

understanding python \args and \\*kwargs

Python *args and **kwargs are super useful tools, that when used properly can make you code much simpler and easier to maintain. Large manual conversions from a dataset to function arguments can be packed and unpacked into lists or dictionaries. Beware though, this power can lead to some really unreadable/unusable code if done wrong. /* h2 {display: block;} */ h2>img { margin: auto; width: 100%;} Python *args and **kwargs are super useful tools, that when used properly can make you code much simpler and easier to maintain. Large manual conversions from a dataset to function arguments can be packed and unpacked into lists or dictionaries. Beware though, this power can lead to some really unreadable/unusable code if done wrong. *args are for lists # [1] *args are some magical syntax that will collect function arguments into a list, or unpack a list into individual arguments. recieving *args # [2] When recieving variables as a *<varname>, commonly *args, the arguments get packed ...

`j`	Scroll down
`k`	Scroll up
`g` `g`	Scroll to top
`Shift` `G`	Scroll to bottom
`d`	Half-page down
`u`	Half-page up

`j` / `↓`	Next post (in feeds)
`k` / `↑`	Previous post (in feeds)
`Enter` / `o`	Open highlighted post
`Shift` `O`	Open in new tab
`g` `h`	Go to home
`g` `s`	Focus search
`[`	Previous page
`]`	Next page
`s`	Toggle simple/rich feed view

`/`	Focus search input
`⌘CtrlK`	Focus search (alternative)
`y` `y`	Copy URL to clipboard
`?`	Show this help
`Esc`	Close / clear highlight