Posts tagged: kedro

All posts with the tag "kedro"

I Started Streaming on Twitch

I recently started streaming on twitch.tv/waylonwalker and it’s been a blast so far.

It all started with kedro/issues/606, where Yetu called for users of kedro to record themselves doing a walkthrough of their tutorials. I wanted to do this, but I was really stuck on the fact that recording or editing somewhat polished video is quite time-consuming for me.

My introduction to twitch came from twitch.tv/theprimeagen. I watched him on YouTube, and then decided to drop into a stream. It was so fun to watch him live that I started following others in the science and tech category.

...

Kedro Spaceflights - part 1 | Stream replay June 4, 2021

This was my first time ever streaming on twitch.tv/waylonwalker, and I am excited to get going. I have been streaming early in the morning while I am still waking up, so I'm still a bit groggy as I go.

https://youtu.be/Y07UBr9Ccjs

It all started with kedro/issues/606, where Yetu called for users of kedro to record themselves doing a walkthrough of their tutorials. I wanted to do this, but I was really stuck on the fact that recording or editing somewhat polished video is quite time-consuming for me.

Creating pypi-list with kedro

I had an idea come to me via twitter: short, one-word package names are becoming hard to find on pypi. Short, one-word, readable package names that are not a play on words are easy to remember, easy to spell correctly, and quick to type out.

I started with the simple index. Pypi provides a single page listing every package hosted on pypi via the simple index.
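A minimal sketch of that first step, using only the standard library. It assumes the simple index lives at `https://pypi.org/simple/` and lists each project as one `<a>` tag (the PEP 503 layout); the helper names and the six-character cutoff are hypothetical choices, not necessarily what pypi-list actually does.

```python
"""Sketch: pull short, one-word package names out of pypi's simple index."""
from html.parser import HTMLParser
from urllib.request import urlopen

SIMPLE_INDEX_URL = "https://pypi.org/simple/"  # assumed PEP 503 simple index


class _AnchorTextParser(HTMLParser):
    """Collect the text of every <a> tag; the simple index has one per package."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor:
            self.names.append(data.strip())


def parse_package_names(html):
    """Return package names in the order they appear in the index HTML."""
    parser = _AnchorTextParser()
    parser.feed(html)
    return parser.names


def short_names(names, max_length=6):
    """Keep plain ASCII words (no dashes, digits, underscores) up to max_length chars."""
    return sorted(
        name for name in names
        if name.isascii() and name.isalpha() and len(name) <= max_length
    )


def fetch_simple_index(url=SIMPLE_INDEX_URL):
    """Download the whole index; it is a very large page, so this is slow."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

`short_names(parse_package_names(fetch_simple_index()))` downloads the full index, so it is worth caching the HTML locally while experimenting.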

Using Kedro In Scripts

With the latest releases of kedro 0.17.x, it is now possible to run kedro pipelines from within scripts. While I would not start a project with this technique, it will be a good tool to keep in my back pocket when I want to sprinkle in a bit of kedro goodness in existing projects.

What is Kedro

If you're just learning about kedro, check out this post walking through it.

...

Silence Kedro Logs

Kedro can have a chatty logger. While this is super nice in production to see everything that happened during a pipeline run, it can be troublesome when trying to implement a cli extension with clean output.

First, how does one silence a python log? Python loggers can be retrieved by the logging module’s getLogger function. Then their log level can be changed. Much of kedro’s chattiness comes from INFO level logs. I don’t want to hear about anything for my current use case unless it’s essential, i.e., a failure. In this case, I set the log levels to ERROR as most errors should stop execution anyways.

Getting a python logger is straightforward if we know the name of the logger. The following block will grab the logger object for the logger currently registered under the name passed in.
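A minimal sketch of that idea with only the standard library `logging` module. The helper name `silence_logger` is hypothetical, and it assumes kedro's modules log through loggers named after their module path (for example `kedro.io.data_catalog`), so they inherit their level from the parent `kedro` logger unless a logging config sets a level on them explicitly.

```python
"""Sketch: quiet kedro's INFO chatter so a cli extension can print clean output."""
import logging


def silence_logger(name, level=logging.ERROR):
    """Raise the level of the logger registered under `name` and return it."""
    logger = logging.getLogger(name)  # returns the existing logger, or creates it
    logger.setLevel(level)
    return logger


# Loggers are hierarchical: a child such as "kedro.io.data_catalog" with no level
# of its own uses its parent's level, so raising "kedro" covers those modules too.
silence_logger("kedro")
```

Any child logger that was given its own level (for example by a logging.yml) would need to be silenced by name as well.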

...

Vim Fugitive

:G
:G status
:G commit
:G add %
:Gdiff
:G push
:Glog

Add current file and commit with diff in a split

function! s:GitAdd()
    exe "G add %"
    exe "G diff --staged"
    exe "only"
    exe "G commit"
endfunction
:command! GitAdd :call s:GitAdd()
nnoremap gic :GitAdd<CR>

:on[ly]

C-W o

:on[ly] will make the current window the only one on the screen. This is super helpful, as many of fugitive's commands open in a split by default.

cycle through the jumplist

...

Kedro pipeline_registry.py

With the release of kedro==0.17.2 came a new module in the project template, pipeline_registry.py. Here are some notes that I learned while playing with this new module.

You should now have something that looks like this in your src/<package-name>/pipeline_registry.py.

"""Project pipelines."""
from typing import Dict

from kedro.pipeline import Pipeline


def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from a pipeline name to a ``Pipeline`` object.

    """
    return {"__default__": Pipeline([])}

pipeline_registry only works in kedro>=0.17.2

...

Minimal Kedro Pipeline

How small can a minimal kedro pipeline that is ready to package be? I made one in 4 files that you can pip install. It's only 35 lines of python in total: 8 in setup.py and 27 in mini_kedro_pipeline.py.

📝 Note: this is only a composable pipeline, not a full project; it does not contain a catalog or runner.

I have everything for this post hosted in this GitHub repo; you can fork it, clone it, or just follow along.

...

Kedro - My Data Is Not A Table

In python data science/engineering, most of our data is in the form of some sort of table, typically a DataFrame from a library like pandas, spark, or dask.

These containers for data come with many convenient methods for manipulating table-like data structures. Sometimes we leverage other data types, namely vanilla types like lists and dicts, or even numpy data types.
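Kedro nodes are plain python functions, so nothing forces their inputs or outputs to be DataFrames. A sketch with two hypothetical node functions that pass a list and a dict between each other (the post-dict input shape is an assumption for illustration):

```python
"""Sketch: node functions whose data is a list and a dict, not a table."""
from collections import Counter
from typing import Dict, List


def parse_tags(posts: List[Dict]) -> List[str]:
    """Flatten the tags out of a list of post dicts; posts without tags are skipped."""
    return [tag for post in posts for tag in post.get("tags", [])]


def count_tags(tags: List[str]) -> Dict[str, int]:
    """Count how often each tag appears and return a plain dict."""
    return dict(Counter(tags))
```

In a pipeline these could be wired up as `node(parse_tags, "posts", "tags")` and `node(count_tags, "tags", "tag_counts")`. Datasets that are not in the catalog are kept in memory between nodes, and a JSON or pickle dataset could persist the dict.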

What is Kedro

...

Testing Data Pipelines

Lint/Format/Doc
- black
- flake8
- interrogate
- mypy

Pipeline Assertions
- pipeline constructs
- pipeline has expected nodes
- pipeline has minimum nodes
- test minimum tags
- test alternate tags

Catalog Assertions
- test catalog follows naming structure

Node Tests
- test function does the correct operations on test data

Great Expectations

reasons-to-kedro

There are many reasons you should be using kedro. If you are on a team of Data Scientists/Data Engineers processing DataFrames from many data sources, you should be considering a pipeline framework. Kedro is a great option that provides many benefits for teams to collaborate on, develop, and deploy data pipelines.

What is Kedro

Kedro makes it super easy to get started with its cli, which uses cookiecutter under the hood.

...

Reasons to Kedro

Reasons to Kedro
- collaboration
  - Sharable catalog
  - small nodes over monolithic notebooks
- catalog
  - easily load anything without needing to run
  - No need to write read/write code
- pipeline
  - No need to keep execution order in your head
  - easily run a slice of a pipeline
- plugins
  - pip install
  - make your own
- hooks
  - flexible
- expandable cli

Reasons Not to Kedro
- Already utilizing another DAG framework
- Data is not in a widely supported format
- Micro short-lived project
- Large Project / Deadline
  - Use a lower profile project to learn first
- Team not willing to change
- Need minimal dependencies
- God Project - kedro owns everything??

What's New in Kedro 0.16.6

Kedro 0.16.6 is out! Let’s take a look through the release notes.

It is really exciting to see more deployment options coming from the kedro team. It really shows the power of the framework. The power of some of these orchestration options is incredible.

Most of them hinge on a sweet combination of the kedro cli, a docker image, and the pipeline knowing your nodes' dependencies.

...

A brain dump of stories

I started making stories as kind of a brain dump a few times per day and posting them to [LinkedIn](https://www.linkedin.com/in/waylonwalker/). Here are the last 11 days of stories.

I store all the stories on my website with the hopes of doing something with them on my own platform eventually. For now it makes it easy to make these posts.

cd static/stories
ls | xargs -I {} echo '![](https://waylonwalker.com/stories/{})'

Stories 10-10-2020 - 10-21-2020