Getting started

This tutorial teaches you how to load, manipulate, and visualize time series datasets. You can follow it sequentially or jump to specific sections as needed.

Installation

Cumulative requires Python 3.10+ and depends on MLtraq 0.1.36+ and Pandas 1.5.3+, which are installed automatically as dependencies. To install:

pip install cumulative --upgrade

Examples

The code examples are fully self-contained and reproduce the outputs shown. The example below prints the Cumulative version used to compile this tutorial. Make sure you have the latest release installed.

Cumulative version

import cumulative

print(cumulative.__version__)

Output

0.1.23

Key concepts

Collections of time series

The Cumulative class manages collections of time series of varying length, together with their transformations. The data is stored in a Pandas DataFrame, accessible as the df attribute, with one series per row and NumPy arrays as cell values.

Example of time series collection

import numpy as np
import pandas as pd

from cumulative import Cumulative

df = pd.DataFrame(
    {
        "base.x": [np.array([0, 1, 2, 3, 4, 5]), np.array([0, 1, 2, 3])],
        "base.y": [
            np.array([10, 20, 30, 40, 50, 60]),
            np.array([5, 15, 25, 35]),
        ],
        "base.category": ["A", "B"],
    }
)
c = Cumulative(df)
print(c.df)

Output

               base.x                    base.y base.category
0  [0, 1, 2, 3, 4, 5]  [10, 20, 30, 40, 50, 60]             A
1        [0, 1, 2, 3]           [5, 15, 25, 35]             B
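Because each cell holds a whole array, per-series properties come from applying a function to each cell. The sketch below inspects the same collection with plain pandas and NumPy only, independently of the Cumulative API, to show what that layout means in practice.

```python
import numpy as np
import pandas as pd

# Same two-series collection as above: each cell holds a whole NumPy array.
df = pd.DataFrame(
    {
        "base.x": [np.array([0, 1, 2, 3, 4, 5]), np.array([0, 1, 2, 3])],
        "base.y": [np.array([10, 20, 30, 40, 50, 60]), np.array([5, 15, 25, 35])],
        "base.category": ["A", "B"],
    }
)

# Length of each series: len is applied to the array in every cell.
lengths = df["base.x"].apply(len)
print(lengths.tolist())  # [6, 4]

# Within each row, x and y must have the same number of points.
same_length = [len(x) == len(y) for x, y in zip(df["base.x"], df["base.y"])]
print(same_length)  # [True, True]

# NumPy operations act on one row's arrays at a time.
row = df.iloc[1]
print(row["base.y"].mean())  # 20.0
```

Rows can therefore differ in length without padding, since the DataFrame only sees one object per cell.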

Dataframe columns

The column names are organized hierarchically, using the dot as a separator. Prefixes identify the source and destination columns of transformations. Naming conventions:
- The suffixes .x and .y hold the X and Y values of each series.
- The prefix base. is the default source and destination of transformations.
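To make the naming convention concrete, the sketch below selects columns by prefix using only string operations. The helpers prefix_of and columns_with_prefix are hypothetical and not part of the Cumulative API; they assume the prefix is the first dot-separated component of a column name, so "S.x.min" belongs to prefix "S".

```python
# Hypothetical helpers illustrating the dot-separated naming convention;
# they are not part of the Cumulative API.


def prefix_of(column: str) -> str:
    """Return the top-level prefix of a column name, e.g. "S" for "S.x.min"."""
    return column.split(".", 1)[0]


def columns_with_prefix(columns: list[str], prefix: str) -> list[str]:
    """Select the columns that live under the given prefix, preserving order."""
    return [col for col in columns if prefix_of(col) == prefix]


columns = ["base.x", "base.y", "C.x", "C.y", "S.x", "S.x.min", "S.x.max", "S.y"]

print(columns_with_prefix(columns, "base"))  # ['base.x', 'base.y']
print(columns_with_prefix(columns, "S"))  # ['S.x', 'S.x.min', 'S.x.max', 'S.y']
```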

Transformations

Transformations are applied to the subset of columns under a source prefix and may add columns under a destination prefix or reorder the rows. One of the simplest transformations is copy, which duplicates the source columns under the destination prefix.

Example of copy transform

import pandas as pd

from cumulative import Cumulative

df = pd.DataFrame(
    {
        "base.x": [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3]],
        "base.y": [[10, 20, 30, 40, 50, 60], [5, 15, 25, 35]],
    }
)

c = Cumulative(df)
c.copy(src="base", dst="test")

print("Type: ", type(c.df))
print("--")
print(c.df)

Output

Type:  <class 'pandas.core.frame.DataFrame'>
--
               base.x                    base.y              test.x                    test.y
0  [0, 1, 2, 3, 4, 5]  [10, 20, 30, 40, 50, 60]  [0, 1, 2, 3, 4, 5]  [10, 20, 30, 40, 50, 60]
1        [0, 1, 2, 3]           [5, 15, 25, 35]        [0, 1, 2, 3]           [5, 15, 25, 35]
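The effect of copy on the columns can be sketched in plain pandas. The function copy_prefix below is a hypothetical equivalent written for illustration, not the library's implementation; it assumes that copying means duplicating every "src.<suffix>" column as "dst.<suffix>", which matches the output above.

```python
import pandas as pd


def copy_prefix(df: pd.DataFrame, src: str, dst: str) -> pd.DataFrame:
    """Hypothetical plain-pandas equivalent of copying columns from src to dst.

    Every column named "<src>.<suffix>" is duplicated as "<dst>.<suffix>".
    Note: the list/array objects inside the cells are shared, not deep-copied.
    """
    out = df.copy()
    for col in df.columns:
        head, sep, suffix = col.partition(".")
        if head == src and sep:
            out[f"{dst}.{suffix}"] = df[col]
    return out


df = pd.DataFrame(
    {
        "base.x": [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3]],
        "base.y": [[10, 20, 30, 40, 50, 60], [5, 15, 25, 35]],
    }
)

result = copy_prefix(df, src="base", dst="test")
print(list(result.columns))  # ['base.x', 'base.y', 'test.x', 'test.y']
```

Returning a new frame keeps the sketch side-effect free; the Cumulative example instead updates c.df in place.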

Pipelines

Transformations are the building blocks of pipelines. In the following example, cumsum computes the cumulative sum of the y values under the base. prefix and saves the result to the C prefix, carrying x over unchanged. The C columns are then piped as input to the min-max scaler scale, with destination prefix S. With kind="xy", both x and y are scaled, and the minimum and maximum used for each are stored in the .min and .max columns. The result is a DataFrame with additional columns for each step. If the source or destination is omitted, the default prefix base. is used.

Example of piped transforms

import pandas as pd

from cumulative import Cumulative

df = pd.DataFrame(
    {
        "base.x": [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3]],
        "base.y": [[10, 20, 30, 40, 50, 60], [5, 15, 25, 35]],
    }
)

c = Cumulative(df)
c.cumsum(src="base", dst="C").scale(src="C", dst="S", kind="xy")

print(c.df.iloc[0])

Output

base.x                   [0, 1, 2, 3, 4, 5]
base.y             [10, 20, 30, 40, 50, 60]
C.x                      [0, 1, 2, 3, 4, 5]
C.y             [10, 30, 60, 100, 150, 210]
S.x          [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
S.x.min                                   0
S.x.max                                   5
S.y        [0.0, 0.1, 0.25, 0.45, 0.7, 1.0]
S.y.min                                  10
S.y.max                                 210
Name: 0, dtype: object
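The values in row 0 can be re-derived with NumPy alone. This sketch assumes that scale(kind="xy") performs a per-series min-max scaling to [0, 1], which is what the S.*.min and S.*.max columns suggest; the minmax function is a hypothetical helper, not the library's code.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([10, 20, 30, 40, 50, 60])

# Step 1 (C prefix): cumulative sum of y; x is carried over unchanged.
c_x = x
c_y = np.cumsum(y)
print(c_y.tolist())  # [10, 30, 60, 100, 150, 210]


def minmax(values: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Hypothetical min-max scaler: map values to [0, 1] (assumes max > min).

    Also returns the min and max used, mirroring the S.*.min/S.*.max columns.
    """
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi


# Step 2 (S prefix): scale both x and y of the C series.
s_x, x_min, x_max = minmax(c_x)
s_y, y_min, y_max = minmax(c_y)
print(s_x.round(2).tolist(), x_min, x_max)
print(s_y.round(2).tolist(), y_min, y_max)
```

The scaled arrays match S.x and S.y above, for example (60 - 10) / (210 - 10) = 0.25 for the third point of S.y.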

Tip

By default, all existing columns under the destination prefix are dropped before the new ones are added, ensuring a clean and consistent state.
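Why the drop matters can be seen with a stale frame: if an earlier run wrote S.x and S.x.min and a later run writes only S.y, the old columns would otherwise linger. The sketch below uses a hypothetical helper, write_prefix, and scalar cells for brevity; it illustrates the described behavior and is not the library's implementation.

```python
import pandas as pd


def write_prefix(df: pd.DataFrame, dst: str, new_columns: dict) -> pd.DataFrame:
    """Hypothetical helper: drop every column under dst, then add new_columns.

    new_columns maps suffixes (e.g. "y") to column values.
    """
    stale = [col for col in df.columns if col.split(".", 1)[0] == dst]
    out = df.drop(columns=stale)
    for suffix, values in new_columns.items():
        out[f"{dst}.{suffix}"] = values
    return out


# A frame left over from an earlier run that wrote x and y under "S".
df = pd.DataFrame(
    {
        "base.y": [1.0, 2.0],
        "S.x": [0.0, 1.0],
        "S.x.min": [0.0, 0.0],
        "S.y": [0.0, 1.0],
    }
)

# A new run writes only S.y; S.x and S.x.min are removed rather than left stale.
df = write_prefix(df, dst="S", new_columns={"y": [0.5, 1.0]})
print(list(df.columns))  # ['base.y', 'S.y']
```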