Multiple horizons predictive modeling#
Environment setup#
We install some extra dependencies for this notebook if needed (i.e. when running in JupyterLite).
%pip install -q https://pypi.anaconda.org/ogrisel/simple/polars/1.24.0/polars-1.24.0-cp39-abi3-emscripten_3_1_58_wasm32.whl
%pip install -q skrub altair holidays plotly nbformat
ERROR: polars-1.24.0-cp39-abi3-emscripten_3_1_58_wasm32.whl is not a supported wheel on this platform.
Note: you may need to restart the kernel to use updated packages.
import datetime
import warnings
import altair
import cloudpickle
import pyarrow # noqa: F401
import tzdata # noqa: F401
from tutorial_helpers import plot_horizon_forecast
# Ignore warnings from pkg_resources triggered by Python 3.13's multiprocessing.
warnings.filterwarnings("ignore", category=UserWarning, module="pkg_resources")
with open("feature_engineering_pipeline.pkl", "rb") as f:
feature_engineering_pipeline = cloudpickle.load(f)
features = feature_engineering_pipeline["features"]
targets = feature_engineering_pipeline["targets"]
prediction_time = feature_engineering_pipeline["prediction_time"]
horizons = feature_engineering_pipeline["horizons"]
target_column_name_pattern = feature_engineering_pipeline["target_column_name_pattern"]
Predicting multiple horizons with a grid of single output models#
It is very common to predict values for multiple horizons at once. The most
naive approach is to train as many models as there are horizons. To achieve this,
scikit-learn provides a meta-estimator called MultiOutputRegressor that fits one
independent copy of a base estimator per target column.
In short, we only need to provide multiple targets, where each column corresponds to a horizon, and this meta-estimator trains an independent model for each column. However, we can expect the quality of the forecast to degrade as the horizon increases.
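Before wiring this into the pipeline, here is a minimal, self-contained sketch on synthetic data (the toy arrays below are made up for illustration) showing how MultiOutputRegressor fans out one estimator per target column:
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
# Two target columns, standing in for two forecast horizons.
Y_toy = np.column_stack([X_toy.sum(axis=1), X_toy.sum(axis=1) ** 2])

toy_model = MultiOutputRegressor(HistGradientBoostingRegressor(random_state=0))
toy_model.fit(X_toy, Y_toy)
print(len(toy_model.estimators_))  # one fitted estimator per target column -> 2
print(toy_model.predict(X_toy[:2]).shape)  # (2, 2): one prediction per horizon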
Let’s train a gradient boosting regressor for each horizon.
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
multioutput_predictions = features.skb.apply(
MultiOutputRegressor(
estimator=HistGradientBoostingRegressor(random_state=0), n_jobs=-1
),
y=targets.skb.drop(cols=["prediction_time", "load_mw"]).skb.mark_as_y(),
)
Now, let’s rename the prediction columns to make the horizon forecast easier to plot.
target_column_names = [target_column_name_pattern.format(horizon=h) for h in horizons]
predicted_target_column_names = [
f"predicted_{target_column_name}" for target_column_name in target_column_names
]
named_predictions = multioutput_predictions.rename(
    dict(zip(target_column_names, predicted_target_column_names))
)
Let’s plot the horizon forecast on the training data to check the validity of the output.
plot_at_time = datetime.datetime(2021, 4, 19, 0, 0, tzinfo=datetime.timezone.utc)
plot_horizon_forecast(
targets,
named_predictions,
plot_at_time,
target_column_name_pattern,
).skb.preview()
On this curve, the red line corresponds to the observed values before the date at which we make the forecast. The orange line corresponds to the observed values for the next 24 hours, and the blue line corresponds to the predicted values for the next 24 hours.
Since we are using a strong model and very little training data for this check, we observe that our model fits the training data almost perfectly.
We are now ready to assess the performance of this multi-output model, and for that
we need to cross-validate it. Since we do not want to aggregate the metrics over the
different horizons, we create a scikit-learn scorer that sets
multioutput="raw_values" to obtain one score per horizon.
Passing this scorer to the cross_validate function then returns the scores for all horizons.
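To make the role of multioutput="raw_values" concrete, here is a tiny illustration with made-up numbers: instead of averaging the error over the target columns, the metric returns one score per column, that is, one score per horizon.
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([[100.0, 200.0], [110.0, 210.0]])
y_pred = np.array([[105.0, 190.0], [108.0, 220.0]])
# Default behaviour: a single score averaged over the two columns.
print(mean_absolute_percentage_error(y_true, y_pred))
# One MAPE per target column, i.e. per horizon.
print(mean_absolute_percentage_error(y_true, y_pred, multioutput="raw_values"))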
from sklearn.model_selection import TimeSeriesSplit
max_train_size = 2 * 52 * 24 * 7 # max ~2 years of training data
test_size = 24 * 7 * 24 # 24 weeks of test data
gap = 7 * 24 # 1 week gap between train and test sets
ts_cv_5 = TimeSeriesSplit(
n_splits=5, max_train_size=max_train_size, test_size=test_size, gap=gap
)
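To make the windowing concrete, here is a quick sanity check on a toy array (its length is an arbitrary assumption, standing in for the hourly rows of the dataset) that prints the train/test boundaries of each fold:
import numpy as np

n_toy_rows = 40_000  # hypothetical number of hourly rows, for illustration only
for i, (train_idx, test_idx) in enumerate(ts_cv_5.split(np.zeros((n_toy_rows, 1)))):
    print(
        f"fold {i}: train [{train_idx[0]}..{train_idx[-1]}] ({len(train_idx)} rows), "
        f"gap {test_idx[0] - train_idx[-1] - 1} rows, "
        f"test [{test_idx[0]}..{test_idx[-1]}] ({len(test_idx)} rows)"
    )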
from sklearn.metrics import r2_score, mean_absolute_percentage_error
def multioutput_scorer(regressor, X, y, score_func, score_name):
y_pred = regressor.predict(X)
return {
f"{score_name}_horizon_{h}h": score
for h, score in enumerate(
score_func(y, y_pred, multioutput="raw_values"), start=1
)
}
def scoring(regressor, X, y):
return {
**multioutput_scorer(regressor, X, y, mean_absolute_percentage_error, "mape"),
**multioutput_scorer(regressor, X, y, r2_score, "r2"),
}
multioutput_cv_results = multioutput_predictions.skb.cross_validate(
cv=ts_cv_5,
scoring=scoring,
return_train_score=True,
verbose=1,
n_jobs=-1,
)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 1.8min finished
One thing that we observe is that training such a multi-output model is expensive. This is expected, since each horizon involves a different model and thus a separate training.
multioutput_cv_results.round(3)
| | fit_time | score_time | test_mape_horizon_1h | train_mape_horizon_1h | test_mape_horizon_2h | train_mape_horizon_2h | test_mape_horizon_3h | train_mape_horizon_3h | test_mape_horizon_4h | train_mape_horizon_4h | ... | test_r2_horizon_20h | train_r2_horizon_20h | test_r2_horizon_21h | train_r2_horizon_21h | test_r2_horizon_22h | train_r2_horizon_22h | test_r2_horizon_23h | train_r2_horizon_23h | test_r2_horizon_24h | train_r2_horizon_24h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 71.588 | 1.734 | 0.013 | 0.007 | 0.020 | 0.011 | 0.025 | 0.012 | 0.027 | 0.013 | ... | 0.949 | 0.993 | 0.958 | 0.994 | 0.959 | 0.994 | 0.962 | 0.995 | 0.963 | 0.995 |
| 1 | 78.411 | 1.538 | 0.016 | 0.008 | 0.025 | 0.012 | 0.029 | 0.014 | 0.030 | 0.014 | ... | 0.970 | 0.992 | 0.974 | 0.993 | 0.975 | 0.994 | 0.977 | 0.994 | 0.977 | 0.994 |
| 2 | 78.416 | 1.585 | 0.016 | 0.009 | 0.023 | 0.013 | 0.025 | 0.015 | 0.027 | 0.015 | ... | 0.960 | 0.991 | 0.970 | 0.993 | 0.971 | 0.993 | 0.973 | 0.993 | 0.973 | 0.993 |
| 3 | 77.688 | 1.649 | 0.012 | 0.010 | 0.018 | 0.014 | 0.021 | 0.015 | 0.023 | 0.016 | ... | 0.973 | 0.989 | 0.977 | 0.992 | 0.980 | 0.993 | 0.979 | 0.993 | 0.979 | 0.993 |
| 4 | 19.288 | 0.451 | 0.013 | 0.010 | 0.019 | 0.014 | 0.023 | 0.016 | 0.026 | 0.016 | ... | 0.969 | 0.989 | 0.974 | 0.992 | 0.976 | 0.993 | 0.978 | 0.993 | 0.978 | 0.993 |
5 rows × 98 columns
Instead of reading the results from the table, we can plot the scores depending on the dataset type (train or test) and the metric.
import itertools
from IPython.display import display
for metric_name, dataset_type in itertools.product(["mape", "r2"], ["train", "test"]):
columns = multioutput_cv_results.columns[
multioutput_cv_results.columns.str.startswith(f"{dataset_type}_{metric_name}")
]
data_to_plot = multioutput_cv_results[columns]
data_to_plot.columns = [
col.replace(f"{dataset_type}_", "")
.replace(f"{metric_name}_", "")
.replace("_", " ")
for col in columns
]
data_long = data_to_plot.melt(var_name="horizon", value_name="score")
chart = (
altair.Chart(
data_long,
title=f"{dataset_type.title()} {metric_name.upper()} scores by horizon",
)
.mark_boxplot(extent="min-max")
.encode(
x=altair.X(
"horizon:N",
title="Horizon",
sort=altair.Sort(
[f"horizon {h}h" for h in range(1, data_to_plot.shape[1])]
),
),
y=altair.Y("score:Q", title=f"{metric_name.upper()} Score"),
color=altair.Color("horizon:N", legend=None),
)
)
display(chart)
An interesting observation is that the MAPE on the test data first increases with the horizon and then decreases once past the 18h horizon. We would not necessarily expect this behaviour.
Native multi-output handling using RandomForestRegressor#
In the previous section, we showed how to wrap a HistGradientBoostingRegressor
in a MultiOutputRegressor to predict multiple horizons. With this strategy, we
trained independent HistGradientBoostingRegressor models, one for each horizon.
RandomForestRegressor natively supports multi-output regression: instead of
independently training a model per horizon, it will train a joint model that
predicts all horizons at once.
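As a minimal illustration on synthetic data (the toy arrays below are made up), a RandomForestRegressor can be fitted on a 2D target directly, without any MultiOutputRegressor wrapper:
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
Y_toy = np.column_stack([X_toy[:, 0], X_toy[:, 0] + X_toy[:, 1]])

toy_forest = RandomForestRegressor(min_samples_leaf=30, random_state=0)
toy_forest.fit(X_toy, Y_toy)  # a single forest jointly models both columns
print(toy_forest.predict(X_toy[:2]).shape)  # (2, 2)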
Repeat the previous analysis using a RandomForestRegressor. Set the parameter
min_samples_leaf to 30 to limit the depth of the trees.
Once you have created the model, plot the horizon forecast for a given date and time. In addition, run the cross-validation and plot the R2 and MAPE scores for each horizon.
Does this model perform better or worse than the previous one?
from sklearn.ensemble import RandomForestRegressor
# Write your code here.
#
#
#
#
#
#
#
#
#
#
#
multioutput_predictions_rf = features.skb.apply(
RandomForestRegressor(min_samples_leaf=30, random_state=0, n_jobs=-1),
y=targets.skb.drop(cols=["prediction_time", "load_mw"]).skb.mark_as_y(),
)
named_predictions_rf = multioutput_predictions_rf.rename(
    dict(zip(target_column_names, predicted_target_column_names))
)
plot_at_time = datetime.datetime(2021, 4, 24, 0, 0, tzinfo=datetime.timezone.utc)
plot_horizon_forecast(
targets,
named_predictions_rf,
plot_at_time,
target_column_name_pattern,
).skb.preview()
multioutput_cv_results_rf = multioutput_predictions_rf.skb.cross_validate(
cv=ts_cv_5,
scoring=scoring,
return_train_score=True,
verbose=1,
n_jobs=-1,
)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 2.2min finished
multioutput_cv_results_rf.round(3)
| | fit_time | score_time | test_mape_horizon_1h | train_mape_horizon_1h | test_mape_horizon_2h | train_mape_horizon_2h | test_mape_horizon_3h | train_mape_horizon_3h | test_mape_horizon_4h | train_mape_horizon_4h | ... | test_r2_horizon_20h | train_r2_horizon_20h | test_r2_horizon_21h | train_r2_horizon_21h | test_r2_horizon_22h | train_r2_horizon_22h | test_r2_horizon_23h | train_r2_horizon_23h | test_r2_horizon_24h | train_r2_horizon_24h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 85.378 | 0.384 | 0.031 | 0.023 | 0.035 | 0.026 | 0.036 | 0.027 | 0.038 | 0.029 | ... | 0.883 | 0.947 | 0.885 | 0.949 | 0.884 | 0.948 | 0.881 | 0.947 | 0.876 | 0.945 |
| 1 | 105.237 | 0.526 | 0.033 | 0.023 | 0.038 | 0.026 | 0.040 | 0.028 | 0.042 | 0.029 | ... | 0.914 | 0.950 | 0.918 | 0.950 | 0.918 | 0.949 | 0.919 | 0.948 | 0.918 | 0.946 |
| 2 | 109.540 | 0.209 | 0.028 | 0.024 | 0.033 | 0.027 | 0.035 | 0.029 | 0.037 | 0.030 | ... | 0.901 | 0.948 | 0.904 | 0.949 | 0.904 | 0.947 | 0.904 | 0.946 | 0.901 | 0.944 |
| 3 | 109.844 | 0.211 | 0.023 | 0.024 | 0.027 | 0.028 | 0.029 | 0.030 | 0.031 | 0.031 | ... | 0.901 | 0.943 | 0.904 | 0.945 | 0.904 | 0.944 | 0.904 | 0.942 | 0.902 | 0.940 |
| 4 | 46.749 | 0.078 | 0.031 | 0.024 | 0.035 | 0.027 | 0.038 | 0.029 | 0.041 | 0.031 | ... | 0.891 | 0.944 | 0.894 | 0.945 | 0.896 | 0.944 | 0.896 | 0.942 | 0.893 | 0.940 |
5 rows × 98 columns
import itertools
from IPython.display import display
for metric_name, dataset_type in itertools.product(["mape", "r2"], ["train", "test"]):
columns = multioutput_cv_results_rf.columns[
multioutput_cv_results_rf.columns.str.startswith(
f"{dataset_type}_{metric_name}"
)
]
data_to_plot = multioutput_cv_results_rf[columns]
data_to_plot.columns = [
col.replace(f"{dataset_type}_", "")
.replace(f"{metric_name}_", "")
.replace("_", " ")
for col in columns
]
data_long = data_to_plot.melt(var_name="horizon", value_name="score")
chart = (
altair.Chart(
data_long,
title=f"{dataset_type.title()} {metric_name.upper()} Scores by Horizon",
)
.mark_boxplot(extent="min-max")
.encode(
x=altair.X(
"horizon:N",
title="Horizon",
sort=altair.Sort(
[f"horizon {h}h" for h in range(1, data_to_plot.shape[1])]
),
),
y=altair.Y("score:Q", title=f"{metric_name.upper()} Score"),
color=altair.Color("horizon:N", legend=None),
)
)
display(chart)
We observe that the RandomForestRegressor performs no better, either in terms of
scores or of computational cost. The trend of the scores along the horizon also
differs from the HistGradientBoostingRegressor: here, the scores steadily worsen
as the horizon increases.