Designing a "Router" for kedro
==============================

Date: October 8, 2020

## nodes_global

I released a router-like plugin for kedro back in April 2020. This was not the first design; the idea actually came from one of the QB folks who taught me kedro nearly a year before. We were assembling our pipelines with something called `nodes_global`. It worked fairly well, but it did have some issues around being set as a global variable.

_But..._

One thing in particular that it did not lend itself well to was creating a packageable pipeline that I could pip install and append to any of my existing pipelines. That is something I am still trying to work out; maybe I don't need it. I think I have it working for our internal pipelines and it seems like the way to go, but we don't necessarily end up using it.

_Also..._

With this pattern, all of the nodes needed to be importable by the module containing `nodes_global`. I find that this becomes a big hurdle for new pipelines coming from jupyter, and it can be most infuriating when nodes aren't getting run after they have been added.

> If you are a bit unsure about what kedro is, make sure to check out my [what-is-kedro](https://waylonwalker.com/what-is-kedro/) article.

## @node(inputs='a_raw_cars', outputs='b_int_cars')

I set off to design something that was flask-like. Around November I had something working. You could simply start creating functions and decorate them with a decorator, just like with flask. I even had it set up to auto-name the nodes things like `create_b_int_cars`.

_But..._

This did not lend itself well to pulling in functions from a library or dynamically creating nodes. I didn't realize how few of the nodes I make in real work have a 1:1 relationship between the node and a function. Most examples work this way, but for some reason when I step into a project we end up pulling a lot of functions out of existing libraries, or dynamically creating many datasets from a list of options.

## pytest inspired _simplicity_

The final design ended up being suggested by a colleague of mine who is not using kedro, but is a brilliant python dev. The idea was to walk through the project like pytest does, looking for modules and variables with a certain pattern (`node`, or `pipeline`). I have been using this since April and am loving it. It has had very little change since the first release. When I create a new module, that automatically becomes a new pipeline in my `pipelines` dict, and all of the variables named node get scraped up and put into a single pipeline.

_Beginner Friendly_

Just like with pytest, you just start hacking in modules ending with `_nodes.py` with nodes in them, and they just appear in your final pipeline.
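For a feel of how this style of discovery can work, here is a minimal sketch in the spirit of find-kedro. It is not find-kedro's actual implementation; the `collect_pipelines` helper, the default `src` directory, and the `*_nodes.py` pattern are assumptions for illustration.

``` python
# A simplified sketch of pytest-style discovery, NOT find-kedro's actual
# implementation. The helper name `collect_pipelines`, the default source
# directory, and the file pattern are all assumptions for illustration.
import importlib
from pathlib import Path
from typing import Dict

from kedro.pipeline import Pipeline


def collect_pipelines(
    source_dir: str = "src", pattern: str = "*_nodes.py"
) -> Dict[str, Pipeline]:
    """Import every module matching ``pattern`` under ``source_dir`` and
    collect any ``node``/``nodes`` variables into one pipeline per module."""
    pipelines: Dict[str, Pipeline] = {}
    for path in Path(source_dir).rglob(pattern):
        # "src/my_proj/pipelines/de_nodes.py" -> "my_proj.pipelines.de_nodes"
        module_name = ".".join(path.relative_to(source_dir).with_suffix("").parts)
        module = importlib.import_module(module_name)
        found = []
        for name, value in vars(module).items():
            if name in ("node", "nodes"):
                # a single node or a list of nodes both count
                found.extend(value if isinstance(value, list) else [value])
        pipelines[path.stem] = Pipeline(found)
    # the "__default__" pipeline is simply everything combined
    pipelines["__default__"] = sum(pipelines.values(), Pipeline([]))
    return pipelines
```

The point is simply that each matching module becomes a pipeline and any `node`/`nodes` variables get collected automatically, which is what lets new modules show up without importing anything by hand.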
## How to use it

The [readme](https://github.com/WaylonWalker/find-kedro) has some great examples.

## Install it

``` bash
pip install find-kedro
```

## Enable it

Enable it by changing one line in your `run.py`.

_run.py_

``` python
from typing import Dict

from kedro.context import KedroContext
from kedro.pipeline import Pipeline

from find_kedro import find_kedro


class ProjectContext(KedroContext):
    def _get_pipelines(self) -> Dict[str, Pipeline]:
        return find_kedro()
```

Or, if you're using the new `hooks.py` method, again there is no need to import all of your nodes.

_hooks.py_

``` python
from typing import Dict

from kedro.framework.hooks import hook_impl
from kedro.pipeline import Pipeline

from find_kedro import find_kedro


class ProjectHooks:
    @hook_impl
    def register_pipelines(self) -> Dict[str, Pipeline]:
        """Register the project's pipelines.

        Returns:
            A mapping from a pipeline name to a ``Pipeline`` object.
        """
        return find_kedro()
```

## Use it

Check out the [readme](https://github.com/WaylonWalker/find-kedro) for more examples, but this is the pattern that I use and recommend most often. It helps keep nodes close to the functions that are designed for them.

_my_nodes.py_

``` python
# my-proj/pipelines/data_engineering/my_nodes.py
from typing import Dict

import pandas as pd
from kedro.pipeline import node

nodes = []


def split_data(df: pd.DataFrame, ratio: float) -> Dict[str, pd.DataFrame]:
    ...


nodes.append(
    node(
        split_data,
        ["example_iris_data", "params:example_test_data_ratio"],
        dict(
            train_x="example_train_x",
            train_y="example_train_y",
            test_x="example_test_x",
            test_y="example_test_y",
        ),
    )
)
```

## Want a simple guide to get started with find-kedro?

In [this doc](https://find.kedro.dev/examples/iris/) I transform the kedro iris template to find-kedro.

## Ready to start using kedro?

If you still have not tried out kedro, it's easier than you think. Check out [create-new-kedro-project](https://waylonwalker.com/create-new-kedro-project/) to get a project started in just a few minutes.