Posts tagged: python

Writing your first kedro Nodes

https://youtu.be/-gEwU-MrPuA Before we jump in with anything crazy, let’s make some nodes with some vanilla data structures. import node # [1] You will need to import node from kedro.pipeline to start creating nodes. from kedro.pipeline import node func # [2] The func is a callable that will take the inputs and create the outputs. inputs / outputs # [3] Inputs and outputs can be None, a single catalog entry as a string, multiple catalog entries as a List of strings, or a dictionary of strings where the key is the keyword argument of the func and the value is the catalog entry to use for that keyword. our first node # [4] Sometimes in our pipelines our data is coming from an api where we already have python functions built to pull with. That’s ok, kedro supports that with inputs=None. def create_range(): return range(100) make_range = node( func=create_range, inputs=None, outputs='range' ) second node # [5] Now we have some data to work from, let’s use that as our inpu...
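
The four shapes that inputs can take are easier to see side by side. Below is a toy sketch in plain Python, not kedro's implementation: `call_with_inputs`, `total`, `scale`, and the dict used as a catalog are all hypothetical stand-ins, invented here only to show how each shape would map onto a function call.

```python
# Toy stand-in for illustration only; this is NOT how kedro is implemented.
def call_with_inputs(func, inputs, catalog):
    """Call func with arguments looked up by name in a plain-dict catalog."""
    if inputs is None:  # no inputs: the function makes its own data
        return func()
    if isinstance(inputs, str):  # one catalog entry -> one positional arg
        return func(catalog[inputs])
    if isinstance(inputs, list):  # several entries -> positional args, in order
        return func(*[catalog[name] for name in inputs])
    if isinstance(inputs, dict):  # keyword argument -> catalog entry
        return func(**{kwarg: catalog[name] for kwarg, name in inputs.items()})
    raise TypeError(f"unsupported inputs: {inputs!r}")


def create_range():
    return range(100)


def total(numbers):
    return sum(numbers)


def scale(numbers, factor):
    return [n * factor for n in numbers]


catalog = {"factor": 2}
catalog["range"] = call_with_inputs(create_range, None, catalog)
catalog["total"] = call_with_inputs(total, "range", catalog)
catalog["scaled"] = call_with_inputs(scale, ["range", "factor"], catalog)
catalog["scaled_kw"] = call_with_inputs(
    scale, {"numbers": "range", "factor": "factor"}, catalog
)
```

The list and dictionary forms produce the same result here; the dictionary form just makes it explicit which catalog entry feeds which keyword argument.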

Running your Kedro Pipeline from the command line

Running your kedro pipeline from the command line could not be any easier to get started. This is something that you may or may not do often depending on your workflow, but it’s good to have under your belt. I personally do this half the time and run from ipython half the time. In production, I mostly use docker and that is all done with this cli. https://youtu.be/ZmccpLy-OEI What is Kedro [1] 👆 Unsure what kedro is? Check out this post. Kedro run # [2] To run the whole darn project all we need to do is fire up a terminal, activate our environment, and tell kedro to run. kedro run Specific Pipelines # [3] Running a sub pipeline that we have created is as easy as telling kedro which one we want to run. kedro run --pipeline dp Single Nodes # [4] While developing a node or a small list of nodes in a larger pipeline it’s handy to be able to run them one at a time. Besides the use case of developing a single node I would not recommend leaning very heavily on running single nodes, le...

kedro Virtual Environment

Avoid serious version conflict issues and use a virtual environment [1] anytime you are running python. Here are three ways you can set up a kedro virtual environment. https://youtu.be/ZSxc5VVCBhM - conda - venv - pipenv conda # [2] I prefer to use conda as my virtual environment manager of choice as it gives me both the interpreter and the packages I install. I don’t have to rely on the system version of python or another tool to maintain python versions at all, I get everything in one tool. conda create -n my-project python=3.8 -y conda activate my-project python -m pip install --upgrade pip pip install -e src conda info --envs - stores environment in a root directory i.e. ~/miniconda3 - conda can use its own way to manage environments environment.yml - the python interpreter is packaged with the environment virtualenv # [3] Virtual env (venv) is another very respectable option that is built right into python, and requires no additional installs or using a different dis...
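
Because venv ships with python, an environment can also be created from python itself through the standard library's `venv` module. This is a minimal sketch, assuming a throwaway temp directory rather than a real project folder; the `.venv` name is just a common convention, not something the post prescribes.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location: a temp directory stands in for a project folder.
env_dir = Path(tempfile.mkdtemp()) / ".venv"

# with_pip=False keeps this sketch fast and offline; a real project
# environment would normally be created with pip available.
venv.create(env_dir, with_pip=False)

# Every venv carries a pyvenv.cfg recording the interpreter it was built from.
config = (env_dir / "pyvenv.cfg").read_text()
print(config)
```

This is roughly what `python -m venv .venv` does from the shell; activating the environment is still a shell step.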

Kedro Pipeline Create

Kedro pipeline create is a command that makes creating new pipelines much easier. There is much less boilerplate that you need to write yourself. https://youtu.be/HtyIKqlEoNw creating a new pipeline # [1] The kedro cli comes with the following command to scaffold out new pipelines. Note that it will not add the new pipeline to your pipeline_registry (to be covered later); you will need to add it yourself. kedro pipeline create example results # [2] The directory structure that it creates looks like this. tree src/kedro_conda/pipelines src/kedro_conda/pipelines ├── __init__.py └── example ├── __init__.py ├── nodes.py ├── pipeline.py └── README.md References: [1]: #creating-a-new-pipeline [2]: #results

Kedro Install

Kedro comes with an install command to install and manage all of your project’s dependencies. https://youtu.be/IWimEs-hHQg cd into your project directory and activate env # [1] You must start by having your kedro project either cloned down from an existing project or created from kedro new. Then activate your environment. Kedro New [2] this post covers kedro new kedro Virtual Environment [3] This post covers creating your virtual environment [4] for kedro install kedro # [5] Make sure you have kedro installed in your current environment, if you don’t already have it. pip install kedro==0.17.4 pip-tools # [6] Kedro uses the pip-tools package under the hood to pin dependencies in a very robust way to ensure that the project will continue to work on everyone’s machine, including production, day in and day out, no matter what happens to the dependencies you have installed. pip-compile # [7] The command that kedro uses from pip-tools is pip-compile. It will look at what yo...

Kedro Git Init

Immediately after kedro new, before you run kedro install or write your first line of code, the first thing you should always do with a new kedro template is git init. https://youtu.be/IGba3ytf_6U git init # [2] It’s as simple as these three commands to get started. git init git add . git commit -m init I don’t care if this project is for learning or if it will never have a remote, use git. References: [1]: /glossary/git/ [2]: #git-init

Kedro New

https://youtu.be/uqiv5LAiJe0 Kedro new is simply a wrapper around the cookiecutter templating library. The kedro team maintains a ready made template that has everything you need for a kedro project. They also maintain a few kedro starters, which are very similar to the base template. What is Kedro [1] Unsure what kedro is? Check out yesterday’s post on What is Kedro. pipx # [2] I recommend using pipx when running kedro new. pipx is designed for system level cli tools so that you do not need to maintain a virtual environment [3] or worry about version conflicts, pipx manages the environment for you. The kedro team does not recommend pipx in their docs as they already feel like there is a bit of a tool overload for folks that may be less familiar with pipx kedro new I like using pipx as it gives you better control over using a specific version or always the latest version, unlike a system install, where the version you run depends on when you last installed or upgraded. Kedro Ne...

What is Kedro

Kedro is an unopinionated Data Engineering framework that comes with a somewhat opinionated template. It gives the user a way to build pipelines that automatically take care of io through the use of abstract DataSets that the user specifies through Catalog entries. These Catalog entries are loaded, run through a function, and saved by Nodes. The order in which these Nodes are executed is determined by the Pipeline, which is a DAG. It’s the runner’s job to manage the execution of the Nodes. https://youtu.be/Wf4rnFsaFFU --- What is Kedro [1] This is an updated version of my original what-is-kedro article --- Hot Take # [2] If you are doing a series of operations to data with python, especially if you are using something as supported as pandas, you should be using a framework that gives you a pipeline as a DAG and abstracts io. Orchestrators # [3] Like I said, kedro is unopinionated, it does not determine where or how your data should be run. The kedro team does support the following ...
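
The Catalog, Node, Pipeline, and runner roles described above can be sketched in a few lines of plain Python. This is a toy model and not kedro's API: the node tuples, the dict catalog, and the loop acting as the runner are all hypothetical, and the standard library's `graphlib` (Python 3.9+) stands in for the pipeline's DAG ordering.

```python
from graphlib import TopologicalSorter  # Python 3.9+


# Toy stand-ins for illustration only; none of this is kedro's API.
def make_numbers():
    return list(range(5))


def double(numbers):
    return [n * 2 for n in numbers]


def total(numbers):
    return sum(numbers)


# Each "node" is (function, input catalog names, output catalog name).
# They are deliberately listed out of order; the DAG sorts them out.
nodes = {
    "total": (total, ["doubled"], "sum"),
    "double": (double, ["numbers"], "doubled"),
    "make_numbers": (make_numbers, [], "numbers"),
}

# Find which node produces each catalog entry, then map every node
# to the set of nodes it depends on.
producers = {output: name for name, (_, _, output) in nodes.items()}
graph = {
    name: {producers[entry] for entry in inputs if entry in producers}
    for name, (_, inputs, _) in nodes.items()
}

# The "runner": walk the DAG in dependency order; load, run, save.
catalog = {}
order = list(TopologicalSorter(graph).static_order())
for name in order:
    func, inputs, output = nodes[name]
    catalog[output] = func(*[catalog[entry] for entry in inputs])
```

The point of the sketch is that execution order falls out of the catalog names each node reads and writes, not out of the order the nodes were declared in.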

How I Kedro

https://youtu.be/bw5_FWDVRpU Ubuntu # [1] I recently switched over to using Ubuntu, it works well pretty much out of the box for me. I am using gnome with a dark theme. Gnome Terminal # [2] I am still using the built in default gnome terminal, it just works. It does all the things that I need it to do. It supports transparency, renders my fonts, and allows me to highlight things well. - One Dark Theme dotfiles # [3] You can find my dotfiles [4] on github. Feel free to read through and take anything that you find useful. I would encourage you not to steal them, but to integrate the parts that you want into your own dotfiles. dotfiles are a very personal thing. They are an extension of one’s fingertips designed for how you think and type. zsh # [5] I use zsh as my default shell. I like to use it as my interactive shell. It works, and does a bit better with things like tab completion out of the box. starship # [6] I use the starship prompt for my shell. It works well out of the...

Incremental Versioned Datasets in Kedro

Kedro versioned datasets can be mixed with incremental and partitioned datasets to do some timeseries analysis on how our dataset changes over time. Kedro is a very extensible and composable framework that allows us to build solutions from the individual components that it provides. This article is a great example of how you can combine these components in unique ways to achieve some powerful results with very little work. What is Kedro [1] 👆 Unsure what kedro is? Check out this post. How does our dataset change over time?? # [2] This was a question presented to me at work. We had some plots being produced as the output of our pipeline and the user wanted the ability to compare results over time. Luckily this was asked early in the project so we were able to proactively set up versioning on the right datasets. To enable this all we needed to do now was to add versioned: true and we will be able to compare results over time. Yes, kedro makes it that easy to set up. set up a proje...

Manage many git repos with ease

mu-repo pip install mu-repo mu status --short mu rev-parse --abbrev-ref HEAD mu diff --color mu diff -U0 --color

I Started Streaming on Twitch

I recently started streaming on twitch.tv/waylonwalker [1] and it’s been a blast so far. - python - kedro - Data Science - Data Engineering - webdev - digital gardening Kedro Spaceflights # [2] It all started with kedro/issues/606 [3], Yetu called out for users of kedro to record themselves doing a walk through of their tutorials. I wanted to do this, but was really stuck at the fact that recording or editing somewhat polished video is quite time consuming for me. [4] Inspiration # [5] My introduction to twitch came from twitch.tv/theprimeagen [6]. I watched him on YouTube, and then decided to drop into a stream. It was so fun to watch him live that I started following others in the science and tech category. - twitch.tv/teej_dv [7] Brilliant neovim core dev, I learn a bunch about nvim every time I watch. - twitch.tv/cmgriffing [8] Super Chill and engaging chat. - twitch.tv/cassidoo [9] Fantastic discussion/chat. - twitch.tv/anthonywritescode [10] Building the python ...

Upcoming Stream

I'm no longer streaming. As much as I would really love to make streaming work, it's really hard for my family situation to make large blocks of time work for me. https://stackoverflow.com/questions/16720541/python-string-replace-regular-expression I am starting to stream 3 days per week, before I start work in the morning. These streams will likely be me just talking through things I am already doing. Making DAGs do 🔮Magical Things | Open Source 🐍Python | kedro plugins | # [1] Science & Technology | Every Monday • 7:00 AM - 9:00 AM CDT On Mondays I am going to be working on open source packages/plugins for kedro. - kedro-diff - test kedro-diff on pipelines with history - setup deploy pipeline - deploy to pypi 🌱 Digital Gardening | Blogging with 🐍Python | Building 🔮Markata a static site generator in python for waylonwalker.com # [2] Science & Technology | Every Wednesday • 7:00 AM - 9:00 AM CDT On Wednesday morning I will be working on my personal website and the static s...

Kedro Spaceflights - part 2 | Stream replay June 7, 2021

This was my second time ever streaming on twitch.tv/waylonwalker [1], and I completely botched my mic 2x. https://youtu.be/_7MwgKu-844 Links # [2] - Spaceflights Tutorial [3] - my spaceflights repo [4] Notes to get started # [5] pipx run kedro new cd project python -m venv .venv source .venv/bin/activate pip install kedro kedro install References: [1]: https://twitch.tv/waylonwalker [2]: #links [3]: https://kedro.readthedocs.io/en/stable/03_tutorial/01_spaceflights_tutorial.html [4]: https://github.com/WaylonWalker/kedro-spaceflights [5]: #notes-to-get-started

🌱 Digital Gardening | gif to Mp4 | Stream replay June 4, 2021

https://youtu.be/I4VenHqIEng Doing some Digital Gardening on stream - Ahrefs Errors - ahrefs large images - Automatic gif to mp4 gif to mp4 # [1] After this stream all gifs on my site are converted to mp4/webm if they exist. ![tmux-navigation-2021](https://dropper.waylonwalker.com/file/a33aa542-4928-4284-91ce-ca1b73a04f0f.mp4) tmux-navigation-2021 [2] References: [1]: #gif-to-mp4 [2]: https://dropper.waylonwalker.com/file/a33aa542-4928-4284-91ce-ca1b73a04f0f.mp4

Kedro Spaceflights - part 1 | Stream replay June 4, 2021

This was my first time ever streaming on twitch.tv/waylonwalker [1]. I am excited to get going. I have been streaming early in the morning while I am still waking up, so still a bit groggy as I go. https://youtu.be/Y07UBr9Ccjs Kedro Spaceflights # [2] It all started with kedro/issues/606 [3], Yetu called out for users of kedro to record themselves doing a walk through of their tutorials. I wanted to do this, but was really stuck at the fact that recording or editing somewhat polished video is quite time consuming for me. [4] Notes # [5] pipx run kedro new cd project python -m venv .venv source .venv/bin/activate pip install kedro kedro install References: [1]: https://twitch.tv/waylonwalker [2]: #kedro-spaceflights [3]: https://github.com/kedro-org/kedro/issues/606 [4]: https://dropper.wayl.one/file/112f93d0-f521-481b-8a78-3bc583041feb.webp [5]: #notes

Comprehensive guide to creating kedro nodes

The Kedro node is an essential part of the pipeline. It defines what catalog entries get passed in, what function gets run, and the catalog entry to save the results under. does this link work? # [1] https://waylonwalker.com/what-is-kedro/ 👆 Unsure what kedro is? Check out this post. The node function # [2] The node function is the most common and recommended way to define kedro nodes. It is a function that constructs and returns Node objects for you. Creating your first kedro node # [3] from kedro.pipeline import node def identity(df): "a function that returns its input" return df my_first_node = node( func=identity, inputs='raw_cars', outputs='int_cars', tags=['int',] ) function # [4] The func passed into node can be any callable that accepts the inputs you have specified, and returns the correct output that you specify as your output. - any callable - a function you write - a function from a library - class constructor - lambda function - partial function - l...
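
To make "any callable" concrete, here is a small plain-Python sketch covering the kinds of callable named in the list above. It does not use kedro; `Wrapper` and the `candidates` dictionary are hypothetical names made up for the example, and each callable is simply invoked with one argument the way a node's func would be invoked with one input.

```python
import math
from functools import partial


def identity(value):
    "a function that returns its input"
    return value


class Wrapper:
    """Hypothetical class; its constructor is itself a callable."""

    def __init__(self, value):
        self.value = value


candidates = {
    "function you write": identity,
    "library function": math.sqrt,
    "class constructor": Wrapper,
    "lambda": lambda value: value + 1,
    "partial": partial(pow, 2),  # pow with its first argument fixed to 2
}

# Everything here passes the same test a func has to pass: it is callable.
for label, func in candidates.items():
    assert callable(func), label

# Call each one with a single input, as a node would with one catalog entry.
results = {label: func(4) for label, func in candidates.items()}
```

A partial is handy when a function needs an extra fixed argument that does not belong in the catalog.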

Creating pypi-list with kedro

I had an idea come to me via twitter. Short one-word package names are becoming hard to find on pypi. Short one-word readable package names that are not a play on words are easy to remember, easy to spell correctly, and quick to type out. Simple index # [1] I started with the simple index. Pypi provides a single page listing every package hosted on pypi via the simple-index [2]. References: [1]: #simple-index [2]: https://pypi.org/simple/

Using Kedro In Scripts

With the latest releases of kedro 0.17.x, it is now possible to run kedro pipelines from within scripts. While I would not start a project with this technique, it will be a good tool to keep in my back pocket when I want to sprinkle in a bit of kedro goodness in existing projects. New to Kedro # [1] What is Kedro [2] If you’re just learning about kedro check out this post walking through it No More Rabbit Hole of Errors # [3] as of 0.17.2 I’ve tried to do this in kedro 0.16.x, and it turned into a rabbit hole of errors. First kedro needed a conf directory, if you tried to fake one in it would then ask for logging setup. These errors just kept coming to the point it wasn’t worth doing and I might as well use a proper template for real projects and stick to simple function calls for things that are not a kedro project. Kedro in a script # [4] To get kedro running, you will need a pipeline, catalog, and runner at a minimum. Those who have used kedro before the pipeline will look v...

Silence Kedro Logs

Kedro can have a chatty logger. While it is super nice in production to see everything that happened during a pipeline run, this can be troublesome while trying to implement a cli extension with clean output. Silence a Python log # [1] First, how does one silence a python log? Python loggers can be retrieved by the logging module’s getLogger function. Then their log level can be changed. Much of kedro’s chattiness comes from INFO level logs. I don’t want to hear about anything for my current use case unless it’s essential, i.e., a failure. In this case, I set the log levels to ERROR as most errors should stop execution anyways. python logging levels # [2] Level Numeric value CRITICAL 50 ERROR 40 WARNING 30 INFO 20 DEBUG 10 NOTSET 0 Get or Create a logger # [3] Getting a python logger is straightforward if we know the name of the logger. The following block will grab the logger object for the logger currently registered under the name passed in. logger = logging.getLog...
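
The get-then-set-level idea described above can be sketched with only the standard library. The logger names are assumptions: "kedro" is used on the guess that kedro's loggers are named after its modules, and "kedro.io" is a hypothetical child used only to show how the level is inherited.

```python
import logging

# Assumed logger name; swap in whichever logger is being chatty.
logger = logging.getLogger("kedro")
logger.setLevel(logging.ERROR)

# Child loggers without a level of their own inherit from the parent,
# so one call on the parent quiets everything beneath it.
child = logging.getLogger("kedro.io")  # hypothetical child logger

print(child.getEffectiveLevel())           # numeric level the child now uses
print(child.isEnabledFor(logging.INFO))    # INFO records are dropped
print(child.isEnabledFor(logging.ERROR))   # ERROR records still get through
```

Setting the level on the parent is the lighter touch; a child that has had its own level set explicitly would need to be changed separately.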
