Feature Engineering

Feature Engineering Module

This module provides classes for time series feature engineering and transformation, including autoregressive features, differencing, seasonality, and trend decomposition.

class predspot.feature_engineering.AR(lags, tfreq, debug=False)

Bases: TimeSeriesFeatures

Autoregressive features implementation.

apply_ts_decomposition(ts)

Apply the autoregressive transformation, which is the identity: the series is returned unchanged.

Parameters:

ts (pandas.Series) – Input time series

Returns:

Original time series

Return type:

pandas.Series

property label

Feature label for autoregressive features

Type:

str

set_transform_request(*, stseries: bool | None | str = '$UNCHANGED$') → AR

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

stseries (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for stseries parameter in transform.

Returns:

self – The updated object.

Return type:

object

class predspot.feature_engineering.Diff(lags, tfreq, debug=False)

Bases: TimeSeriesFeatures

Difference features implementation.

apply_ts_decomposition(ts)

Apply difference transformation.

Parameters:

ts (pandas.Series) – Input time series

Returns:

Differenced time series

Return type:

pandas.Series
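The exact differencing used by Diff is not stated here. A minimal sketch, assuming plain first-order differencing with pandas and a hypothetical helper named difference, could look like this:

```python
import pandas as pd


def difference(ts: pd.Series, periods: int = 1) -> pd.Series:
    """Return ts[t] - ts[t - periods], dropping the undefined leading values.

    Assumption: first-order differencing; the library may differ.
    """
    return ts.diff(periods=periods).dropna()


ts = pd.Series(
    [3.0, 5.0, 4.0, 8.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
diffed = difference(ts)
print(diffed)
```

Differencing shortens the series by `periods` observations, which matters when the result is later turned into lagged features.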

property label

Feature label for difference features

Type:

str

set_transform_request(*, stseries: bool | None | str = '$UNCHANGED$') → Diff

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

stseries (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for stseries parameter in transform.

Returns:

self – The updated object.

Return type:

object

class predspot.feature_engineering.FeatureScaling(estimator, debug=False)

Bases: TransformerMixin, BaseEstimator

Feature scaling transformer.

Parameters:
  • estimator – Scikit-learn compatible scaling estimator

  • debug (bool, optional) – Enable debug printing. Defaults to False

set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → FeatureScaling

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(x)

Transform features using the scaling estimator.

Parameters:

x (pandas.DataFrame) – Input features

Returns:

Scaled features

Return type:

pandas.DataFrame
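Since transform takes and returns a pandas.DataFrame, one plausible reading is that the scaler output is wrapped back into a frame with the original index and columns. The sketch below illustrates that idea with a hypothetical helper named scale_features. It assumes the scaler is fitted on the same data it transforms, which may not match what FeatureScaling does.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def scale_features(estimator, x: pd.DataFrame) -> pd.DataFrame:
    """Scale x and keep its index and column labels.

    Assumption: fit and transform happen in one step on the same data.
    """
    scaled = estimator.fit_transform(x)  # returns a NumPy array
    return pd.DataFrame(scaled, index=x.index, columns=x.columns)


x = pd.DataFrame(
    {"lag_1": [0.0, 5.0, 10.0], "lag_2": [1.0, 2.0, 3.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
scaled = scale_features(MinMaxScaler(), x)
print(scaled)
```

Any scikit-learn compatible scaler (for example StandardScaler) can be passed in place of MinMaxScaler.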

class predspot.feature_engineering.Seasonality(lags, tfreq, debug=False)

Bases: TimeSeriesFeatures

Seasonal decomposition features implementation.

apply_ts_decomposition(ts)

Extract seasonal component from time series.

Parameters:

ts (pandas.Series) – Input time series

Returns:

Seasonal component

Return type:

pandas.Series
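The decomposition method behind Seasonality is not specified here. To make the idea of a seasonal component concrete, the following sketch implements a simple additive decomposition with pandas only: remove a centered moving-average trend, then average the remainder by position within the period. The helper name seasonal_component and the period argument are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def seasonal_component(ts: pd.Series, period: int) -> pd.Series:
    """Estimate an additive seasonal component (illustrative, not predspot's code)."""
    # Centered moving average as a rough trend; edges use shorter windows.
    trend = ts.rolling(window=period, center=True, min_periods=1).mean()
    detrended = ts - trend
    # Position of each observation within its seasonal cycle: 0 .. period-1.
    phase = np.arange(len(ts)) % period
    phase_means = detrended.groupby(phase).mean()
    # Center the pattern so the seasonal effects sum to zero over one cycle.
    phase_means = phase_means - phase_means.mean()
    return pd.Series(phase_means.to_numpy()[phase], index=ts.index, name="seasonal")


index = pd.date_range("2024-01-01", periods=28, freq="D")
weekly_pattern = np.tile([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], 4)
ts = pd.Series(10.0 + weekly_pattern, index=index)
seasonal = seasonal_component(ts, period=7)
print(seasonal.head(7))
```

The returned series has the same index as the input and repeats with the given period.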

property label

Feature label for seasonal features

Type:

str

set_transform_request(*, stseries: bool | None | str = '$UNCHANGED$') → Seasonality

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

stseries (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for stseries parameter in transform.

Returns:

self – The updated object.

Return type:

object

class predspot.feature_engineering.TimeSeriesFeatures(lags, tfreq, debug=False)

Bases: BaseEstimator, TransformerMixin

Base class for time series feature engineering.

Parameters:
  • lags (int) – Number of time lags to use for feature creation

  • tfreq (str) – Time frequency ('D' for daily, 'W' for weekly, 'M' for monthly)

  • debug (bool, optional) – Enable debug printing. Defaults to False

Raises:

AssertionError – If lags is not a positive integer or tfreq is invalid
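The constructor checks can be pictured roughly as follows. This is a sketch under the assumption that the valid tfreq values are exactly the three listed above. The helper name check_params and the error messages are made up for illustration.

```python
VALID_TFREQ = ("D", "W", "M")  # values listed in the parameter description


def check_params(lags, tfreq):
    """Raise AssertionError for invalid arguments (hypothetical helper)."""
    # bool is a subclass of int, so it is excluded explicitly.
    is_positive_int = isinstance(lags, int) and not isinstance(lags, bool) and lags > 0
    assert is_positive_int, f"lags must be a positive integer, got {lags!r}"
    assert tfreq in VALID_TFREQ, f"tfreq must be one of {VALID_TFREQ}, got {tfreq!r}"


check_params(lags=7, tfreq="D")  # passes silently

try:
    check_params(lags=0, tfreq="D")
except AssertionError as err:
    print(f"rejected: {err}")
```

Because the documented error type is AssertionError, note that Python skips assert statements when run with the -O flag.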

abstract apply_ts_decomposition(ts)

Apply time series decomposition.

Parameters:

ts (pandas.Series) – Input time series

Returns:

Transformed time series

Return type:

pandas.Series

property label

Feature label identifier

Type:

str

property lags

Number of time lags

Type:

int

make_lag_df(ts)

Create lagged features dataframe.

Parameters:

ts (pandas.Series) – Input time series

Returns:

(lag_df, aligned_ts) – Lagged features and the original series aligned to them

Return type:

tuple

Raises:

AssertionError – If the series length is less than the number of lags
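A lagged feature frame of this kind can be built with pandas shift. The sketch below is an assumption about the general technique, not the library's code: the standalone function takes lags as an argument, the column names lag_1, lag_2, ... are invented, and the length check is slightly stricter than the documented one so that at least one row survives.

```python
import pandas as pd


def make_lag_df(ts: pd.Series, lags: int):
    """Return (lag_df, aligned_ts) for a single series (illustrative sketch)."""
    assert len(ts) > lags, "series must be longer than the number of lags"
    # Column lag_k holds the value observed k steps earlier.
    lag_df = pd.concat(
        {f"lag_{k}": ts.shift(k) for k in range(1, lags + 1)}, axis=1
    )
    # The first `lags` rows have no complete history, so drop them.
    lag_df = lag_df.dropna()
    # Keep only the observations that have a full set of lagged values.
    aligned_ts = ts.loc[lag_df.index]
    return lag_df, aligned_ts


ts = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
lag_df, aligned_ts = make_lag_df(ts, lags=2)
print(lag_df)
print(aligned_ts)
```

Each row of lag_df lines up with the same-index value of aligned_ts, which makes the pair usable as features and target.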

set_transform_request(*, stseries: bool | None | str = '$UNCHANGED$') → TimeSeriesFeatures

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

stseries (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for stseries parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(stseries)

Transform the input series into lagged features.

Parameters:

stseries (pandas.Series) – Input time series with multi-index (time, places)

Returns:

Transformed features

Return type:

pandas.DataFrame
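For a series indexed by (time, places), lags have to be computed within each place so that values from one location never leak into another. The sketch below shows one way to do that with a grouped shift. The function name lag_features_by_place and the column names are hypothetical, and the actual transform may also apply a decomposition step first.

```python
import pandas as pd


def lag_features_by_place(stseries: pd.Series, lags: int) -> pd.DataFrame:
    """Build lag columns separately for every place (illustrative sketch)."""
    # Sorting puts each place's observations in time order.
    grouped = stseries.sort_index().groupby(level="places")
    # A grouped shift moves values only within the same place.
    columns = {f"lag_{k}": grouped.shift(k) for k in range(1, lags + 1)}
    # Rows without a complete history are dropped.
    return pd.concat(columns, axis=1).dropna()


index = pd.MultiIndex.from_product(
    [pd.date_range("2024-01-01", periods=3, freq="D"), ["a", "b"]],
    names=["time", "places"],
)
stseries = pd.Series([1.0, 10.0, 2.0, 20.0, 3.0, 30.0], index=index)
features = lag_features_by_place(stseries, lags=1)
print(features)
```

The result keeps the (time, places) multi-index, with the earliest `lags` time steps of each place removed.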

class predspot.feature_engineering.Trend(lags, tfreq, debug=False)

Bases: TimeSeriesFeatures

Trend decomposition features implementation.

apply_ts_decomposition(ts)

Extract trend component from time series.

Parameters:

ts (pandas.Series) – Input time series

Returns:

Trend component

Return type:

pandas.Series
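How Trend estimates the trend is not stated here. A common and simple choice is a centered moving average, sketched below with a hypothetical helper named trend_component. The window argument is an assumption and would typically be tied to the seasonal period.

```python
import numpy as np
import pandas as pd


def trend_component(ts: pd.Series, window: int) -> pd.Series:
    """Smooth ts with a centered moving average (illustrative, not predspot's code)."""
    # min_periods=1 keeps the edges defined by averaging over shorter windows.
    return ts.rolling(window=window, center=True, min_periods=1).mean()


ts = pd.Series(
    np.arange(10, dtype=float),
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)
trend = trend_component(ts, window=3)
print(trend)
```

On a purely linear series the interior values are reproduced exactly, while the two edge values are pulled toward their only neighbor.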

property label

Feature label for trend features

Type:

str

set_transform_request(*, stseries: bool | None | str = '$UNCHANGED$') → Trend

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

stseries (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for stseries parameter in transform.

Returns:

self – The updated object.

Return type:

object