Cache a python function with lru_cache
======================================

The easiest way to speed up any code is to run less code. A common technique to reduce the amount of repative work is to implement a cache such that the next...

Date: March 28, 2022

The easiest way to speed up any code is to run less code. A common technique
to reduce the amount of repative work is to implement a cache such that the
next time you need the same work done, you don't need to recompute anything you
can simply retrieve it from a cache.

## lru_cache

The easiest and most common to setup in python is a builtin functools.lru_cache.

```python
from functools import lru_cache

@lru_cache
def get_cars():
 print('pulling cars data')
 return pd.read_csv("https://waylonwalker.com/cars.csv", storage_options = {'User-Agent': 'Mozilla/5.0'})
```

## when to use lru_cache

Any time you have a function where you expect the same results each time a
function is called with the same inputs, you _can_ use lru_cache.

> when same *args, **kwargs always return the same value

lru_cache only works for one python process. If you are running multiple
subprocesses, or running the same script over and over, lru_cache will not
work.

> lru_cache only caches in a single python process

## max_size

lru_cache can take an optional parameter `maxsize` to set the size of your
cache. By default its set to `128`, if you want to store more or less items in
your cache you can adjust this value.

The `get_cars` example is a bit of a unique one. As
[anthonywritescode](https://www.youtube.com/watch?v=K0Q5twtYxWY) points out
this implementation is behaving like a singleton, and we can optimize the size
of the cache by allocating exactly how many items we will ever have in it by
setting its value to 1.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_cars():
 print('pulling cars data')
 return pd.read_csv("https://waylonwalker.com/cars.csv", storage_options = {'User-Agent': 'Mozilla/5.0'})
```

## My example stretches the rule a little bit

The example above does a web request. As a Data Engineer I often write scripts
that run for a short time then stop. I do not expect the output of this
function to change during the runtime of this job, and if it did I may actually
want them to match anyways.

> web request do change their output

If I were building webapps, or some sort of process that was running for a long
time. Something that starts and waits for work, this may not be a good
application of lru_cache. If this process is running for days or months my
assumption that the request does not change is no longer valid.

## There's also a typed kwarg for lru_cache

This one is new to me but you can cache not only on the value, but the type of
the value being passed into your function.

> (from the docstring)
> If *typed* is True, arguments of different types will be cached separately.
> For example, f(3.0) and f(3) will be treated as distinct calls with distinct
> results.