I work with a lot of transactional timeseries data that includes categories. I often want to create timeseries plots with each category as its own line. This is the method that I use almost daily to achieve this result. Typically the data that I am working with changes very slowly, and trends happen over years, not days or weeks. Plotting daily or weekly data tends to be noisy and hides the trend. I use this pattern because it works well with my data and is easy to explain to my stakeholders.
import pandas as pd
import numpy as np
%matplotlib inline
Let's fake some data
Here I am trying to simulate a subset of a large transactional data set. This could be something like sales data, production data, hourly billing, anything that has a date, category, and value. Since we generated this data, we know that it is clean. I am still going to treat it as if it contains some nulls and has an irregular date range.
n = 365 * 5  # five years of daily rows

# Names that reset_index() gives the stacked frame, mapped to readable ones
cols = {'level_0': 'date',
        'level_1': 'item',
        0: 'qty'}

data = (pd.DataFrame(np.random.randint(0, 10, size=(n, 4)),
                     columns=['paper', 'pencils', 'note cards', 'markers'],
                     index=pd.date_range('1/1/2017', periods=n, freq='D'))
        .stack()          # wide to long: one row per (date, item)
        .to_frame()
        .reset_index()
        .rename(columns=cols))

data.head()
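The generated data is clean, so if you want to see the nulls and the irregular date range for real, here is a minimal sketch of one way to inject them into a frame of the same shape. The name `messy`, the fixed seed, and the 5% and 10% rates are all my own illustrative assumptions and are not used elsewhere in this post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) so the sketch is repeatable

# Small stand-in for the long-format frame: one row per (date, item).
dates = pd.date_range('2017-01-01', periods=60, freq='D')
items = ['paper', 'pencils']
messy = pd.DataFrame({
    'date': dates.repeat(len(items)),
    'item': np.tile(items, len(dates)),
    'qty': rng.integers(0, 10, size=len(dates) * len(items)).astype(float),
})

# Blank out roughly 5% of the quantities to mimic missing values (rate is an assumption).
null_mask = rng.random(len(messy)) < 0.05
messy.loc[null_mask, 'qty'] = np.nan

# Drop roughly 10% of the rows so the dates are no longer regular (rate is an assumption).
keep_mask = rng.random(len(messy)) >= 0.10
messy = messy[keep_mask].reset_index(drop=True)

messy.head()
```

Dropping whole rows rather than whole days means some dates lose one item but keep the other, which is closer to what I see in real transactional data.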
...