ML Modelling
Machine Learning Modelling Module
This module provides classes for machine learning model pipelines, feature selection, and prediction functionality for crime density forecasting.
- class predspot.ml_modelling.FeatureSelection(estimator, debug=False)
Bases: TransformerMixin, BaseEstimator
Feature selection transformer.
- Parameters:
estimator – Scikit-learn compatible feature selector
debug (bool, optional) – Enable debug printing. Defaults to False
- fit(x, y=None)
Fit the feature selector.
- Parameters:
x (pandas.DataFrame) – Input features
y (pandas.Series, optional) – Target variable
- Returns:
The fitted instance
- Return type:
self
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → FeatureSelection
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
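The four request values are easier to compare side by side. The block below is a toy model of the rules listed above, written in plain Python; it is not scikit-learn's routing code. The function `resolve_request`, its arguments, the `ValueError` it raises, and the `sample_weight` metadata name are all hypothetical choices made for illustration.

```python
UNSET = object()  # sentinel meaning "the caller did not supply this metadata"


def resolve_request(param, request, provided):
    """Return the kwargs a meta-estimator would forward for one parameter.

    param    -- name of the method parameter, e.g. "sample_weight"
    request  -- True, False, None, or a string alias
    provided -- dict of metadata the caller handed to the meta-estimator
    """
    if request is True:
        # Requested: forward it if present, silently skip it otherwise.
        value = provided.get(param, UNSET)
        return {} if value is UNSET else {param: value}
    if request is False:
        # Not requested: never forwarded, even if supplied.
        return {}
    if request is None:
        # Not requested, and supplying it is treated as an error.
        if param in provided:
            raise ValueError(f"{param!r} was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: look the value up under the alias, forward it under `param`.
        value = provided.get(request, UNSET)
        return {} if value is UNSET else {param: value}
    raise TypeError(f"unsupported request value: {request!r}")


weights = [1.0, 2.0]
print(resolve_request("sample_weight", True, {"sample_weight": weights}))
print(resolve_request("sample_weight", False, {"sample_weight": weights}))
print(resolve_request("sample_weight", "w", {"w": weights}))
```

In real use the request is set on the estimator with the `set_*_request` methods documented on this page, and the meta-estimator performs the lookup internally.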
- set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → FeatureSelection
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(x)
Transform features using the feature selector.
- Parameters:
x (pandas.DataFrame) – Input features
- Returns:
Selected features
- Return type:
pandas.DataFrame
- class predspot.ml_modelling.Model(estimator, debug=False)
Bases: RegressorMixin, BaseEstimator
Model wrapper for crime density prediction.
- Parameters:
estimator – Scikit-learn compatible regression estimator
debug (bool, optional) – Enable debug printing. Defaults to False
- fit(x, y=None)
Fit the regression model.
- Parameters:
x (pandas.DataFrame) – Input features
y (pandas.Series, optional) – Target variable
- Returns:
The fitted instance
- Return type:
self
- predict(x)
Make predictions using the fitted model.
- Parameters:
x (pandas.DataFrame) – Input features
- Returns:
Predictions with ‘crime_density’ column
- Return type:
pandas.DataFrame
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → Model
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → Model
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Model
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class predspot.ml_modelling.PredictionPipeline(mapping, fextraction, estimator, debug=False)
Bases: RegressorMixin, BaseEstimator
Complete pipeline for crime density prediction.
- Parameters:
mapping – Spatial mapping transformer
fextraction – Feature extraction transformer
estimator – Scikit-learn compatible pipeline or estimator
debug (bool, optional) – Enable debug printing. Defaults to False
- property dataset
Current dataset being used
- Type:
Dataset
- evaluate(scoring, cv=5)
Evaluate model performance using time series cross-validation.
- Parameters:
scoring (str) – Scoring metric (‘r2’ or ‘mse’)
cv (int) – Number of cross-validation folds
- Returns:
Scores for each fold
- Return type:
list
- Raises:
Exception – If scoring metric is invalid
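In time series cross-validation, each fold trains on earlier observations and tests on later ones, so the model never sees the future. The block below is a minimal plain-Python sketch of that idea together with the two documented metrics; it is not predspot's implementation. The helper names (`expanding_window_splits`, `mse`, `r2`, `evaluate_naive`), the fold-sizing rule, the last-value forecaster, and the sample series are all assumptions made for illustration.

```python
def expanding_window_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices); training data is always earlier."""
    test_size = n_samples // (n_folds + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of folds")
    first_test = n_samples - n_folds * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; undefined when y_true is constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


SCORERS = {"mse": mse, "r2": r2}


def evaluate_naive(series, scoring, cv=5):
    """Score a last-value forecaster on each fold and return the fold scores."""
    if scoring not in SCORERS:
        raise ValueError(f"unknown scoring metric: {scoring!r}")
    scores = []
    for train, test in expanding_window_splits(len(series), cv):
        last_seen = series[train[-1]]  # naive forecast: repeat the last training value
        y_true = [series[i] for i in test]
        y_pred = [last_seen] * len(test)
        scores.append(SCORERS[scoring](y_true, y_pred))
    return scores


# Made-up crime counts per time step, used only to exercise the functions.
series = [3, 4, 6, 5, 7, 8, 8, 9, 11, 10, 12, 13]
print(evaluate_naive(series, "mse", cv=5))
```

Because the training window only ever grows forward in time, later folds train on more data than earlier ones; fold scores are therefore not directly comparable in the way shuffled k-fold scores are.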
- property feature_importances
Get feature importance scores.
- Returns:
Feature importance scores
- Return type:
pandas.DataFrame
- Raises:
Exception – If model hasn’t been fitted or doesn’t support feature importances
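The error contract described above (fail if unfitted, fail if the estimator has no importances) can be sketched in plain Python. This is an illustration only, not predspot's code: the `ImportancesSketch` and `StubForest` classes, the feature names, and the exception types are hypothetical, and the sketch returns a sorted list of pairs rather than the documented pandas.DataFrame. The one real convention it relies on is that tree-based scikit-learn estimators expose a `feature_importances_` attribute after fitting.

```python
class ImportancesSketch:
    """Hypothetical wrapper showing only the documented error contract."""

    def __init__(self, estimator, feature_names):
        self.estimator = estimator
        self.feature_names = feature_names
        self._fitted = False

    def fit(self):
        # Real fitting is out of scope here; only record that it happened.
        self._fitted = True
        return self

    @property
    def feature_importances(self):
        if not self._fitted:
            raise RuntimeError("call fit() before reading feature_importances")
        scores = getattr(self.estimator, "feature_importances_", None)
        if scores is None:
            raise TypeError("estimator does not expose feature_importances_")
        # Pair each feature name with its score, highest score first.
        pairs = zip(self.feature_names, scores)
        return sorted(pairs, key=lambda pair: pair[1], reverse=True)


class StubForest:
    """Stand-in for a fitted tree-based estimator."""
    feature_importances_ = [0.2, 0.7, 0.1]


sketch = ImportancesSketch(StubForest(), ["kde", "lag_1", "month"]).fit()
print(sketch.feature_importances)
```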
- fit(dataset, y=None)
Fit the complete prediction pipeline.
- Parameters:
dataset – Input dataset containing crimes and study area
y – Ignored, present for scikit-learn compatibility
- Returns:
The fitted instance
- Return type:
self
- Raises:
Exception – If fitting fails
- property grid
Spatial grid used for mapping
- Type:
GeoDataFrame
- predict()
Make predictions for the next time step.
- Returns:
Predictions for next time step
- Return type:
pandas.DataFrame
- set_fit_request(*, dataset: bool | None | str = '$UNCHANGED$') → PredictionPipeline
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
dataset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for dataset parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PredictionPipeline
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- property stseries
Spatio-temporal series
- Type:
pandas.Series