Posts tagged: kedro

All posts with the tag "kedro"

What's New in Kedro 0.16.4

If we take a look at the release notes I see one major feature improvement on the list, auto-discovery of hooks.

## Major features and improvements * Enabled auto-discovery of hooks implementations coming from installed plugins.

This one comes a bit surprising as it was just casually mentioned in #435

auto enabled plugins mentioned in issue 435

...

Kedro Catalog

I am exploring a kedro catalog meta data hook, these are some notes about what I am thinking.

try pandas method -> try spark -> try dict/list -> none

Is there an easy way to create a nosql database in memory from a a list of dictionaries?

Gracefully adopt kedro, the catalog

While using the catalog alone will not reap all of the benefits of the framework, it does get you and your project ready for the full framework eventually. For me the full benefit of the catalog comes when you combine it with the pipeline and dont even touch read/write steps at all.

Taking a step into kedro by adopting the catalog first will give you a way to organize all of your data loads in one place, and stop manually writing read/write code, which can be different for each data and storage type. You just don’t need to think about it.

“can be dropped into kedro later” Let’s talk a bit more about that

...

How to find things in your kedro catalog

kedro 0.16.2 just dropped last week with a long-awaited feature… catalog search! I went as far as monkey patching this into each of my projects. I work jump between a few really big projects that have tons of datasets. Being able to quickly search for what I need is so useful.

The kedro data catalog is a key component to the kedro framework. It handles all data loading and saving for you. It is configurable and hackable. Having all your data connections listed in one place make it so easy to pick your project up and move it to a completely new environment. That sweet imperative loading style saves so much read/write overhead. I can load all my data with a single command whether it’s in amazon s3, google cloud platform, or a local file.

Just like with most of these articles, I am going to create a conda environment so that I don’t break any existing projects and scaffold up a toy project to learn from.

...

004

🔥 #kedrotips use find-kedro to assembly your pipelines

1 min

002

** 0.3.0 just launched with _ support 🎉

1 min

Kedro Static Viz 0.3.0 is out with Hooks Support

kedro-static-viz is out with support for the newly released hooks feature. This means that you can have kedro-static-viz automatically deploy a full gatsby site before_pipeline_run keeping your visualization always up to date.

Even though it is a static site there is no functionality lost. The only thing that’s missing is the flask server. With kedro-static-viz you can deploy your visualization to a number of static hosting providers such as GitHub pages free of charge with wicked fast performance

Even though it’s built on gatsbyjs the full site builds in under 2s even on slower hardware. This is because the site is already pre-rendered and stripped of any excess. It’s zipped up right into the python package and is typically used with the cli, but now can be used with python, or as a hook as well.

...

Brainstorming Kedro Hooks

This post is a 🧠 branstorming work in progress. I will likely use it as a storage location/brain dump of hook ideas.

If you are completely unsure what kedro is be sure to check out my what is kedro post

hooks are executed in reverse order of the hooks list.

...

3 min read

Create Custom Kedro Dataset

Kedro provides an efficient way to build out data catalogs with their yaml api. It allows you to be very declaritive about loading and saving your data. For the most part you just need to tell Kedro what connector to use and its filepath. When running Kedro takes care of all of the read/write, you just reference the catalog key.

Under the hood there is an AbstractDataSet that each connector inherits from. It sets up a lot of the behind the scenes structure for us so that we dont have to. For the most part kedro has connectors for about anything that you want to load, csv, parquet, sql, json, from about anywhere, http, s3, localfile system are just some of the examples.

Here is a DataSet implementation from their docs. Here you can see the barebones example straight from the docs. Parameters from the yaml catalog will get passed in

creating the kedro-preflight hook

Kedro Hooks Intro - kedro hooks are an exciting upcoming feature of kedro 0.16.0. They allow you to hook into catalog_created,pipeline_run, and node_run(nouns). With a before, or after (adjective). This really reminds me of reacts lifecycle hooks, that let you hook into various state of react web components. This is going to make kedro so extendable by the community. I am super pumped to see what the community is able to do with this ability.

kedro hooks are an exciting upcoming feature of kedro 0.16.0. They allow you to hook into catalog_created,pipeline_run, and node_run(nouns). With a before, or after (adjective). This really reminds me of reacts lifecycle hooks, that let you hook into various state of react web components. This is going to make kedro so extendable by the community. I am super pumped to see what the community is able to do with this ability.

What is Kedro

...

📝 Kedro Preflight Notes

This is a very rough idea for a kedro package to prevent time lost to get partway through a pipeline run only to realize that you dont have access to data or resources.

📢 Announcing find-kedro

find-kedro is a small library to enhance your kedro experience. It looks through your modules to find kedro pipelines, nodes, and iterables (lists, sets, tuples) of nodes. It then assembles them into a dictionary of pipelines, each module will create a separate pipeline, and __default__ being a combination of all pipelines. This format is compatible with the kedro _create_pipelines format.

Build-Docs

kedro is a ✨ fantastic project that allows for super-fast prototyping of data pipelines, while yielding production-ready pipelines. find-kedro enhances this experience by adding a pytest like node/pipeline discovery eliminating the need to bubble up pipelines through modules.

...

3 min read

What is Kedro

What is Kedro

This is my original what-is-kedro article. There is a brand new one

Kedro is an open source data pipeline framework. It provides guardrails to set your project up right from the start without needing to know deeply how to setup your own python library for data pipelining. It includes really great ways to manipulate catalogs and pipelines. This article will cover the 10K view of kedro, future articles will dive deper into each one.

...

Kedro

See all of my kedro related posts in [[ tag/kedro ]].

I am tweeting out most of these snippets as I add them, you can find them all here #kedrotips.

Below are some quick snippets/notes for when using kedro to build data pipelines. So far I am just compiling snippets. Eventually I will create several posts on kedro. These are mostly things that I use In my everyday with kedro. Some are a bit more essoteric. Some are helpful when writing production code, some are useful more usefule for exploration.

...