Pipeline

Pipeline Module

This module provides the main pipeline functionality for crime prediction, including data loading, preprocessing, and model execution.

Example

>>> from predspot.pipeline import generate_testdata, run_prediction_pipeline
>>> crime_data, study_area = generate_testdata(10000, '2020-01-01', '2020-12-31')
>>> predictions, pipeline = run_prediction_pipeline(crime_data, study_area)

predspot.pipeline.evaluate_pipeline(pipeline, scoring='r2', cv=5, debug=False)

Evaluate the prediction pipeline using cross-validation.

Parameters:

pipeline (PredictionPipeline) – Fitted prediction pipeline
scoring (str, optional) – Scoring metric (‘r2’ or ‘mse’). Defaults to ‘r2’
cv (int, optional) – Number of cross-validation folds. Defaults to 5
debug (bool, optional) – Enable debug printing. Defaults to False

Returns:

Cross-validation scores

Return type:

list

Example

>>> scores = evaluate_pipeline(fitted_pipeline, scoring='r2', cv=5)
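The returned list of fold scores is typically reduced to a mean and a spread before two configurations are compared. The following is a minimal sketch of that step, assuming the scores are plain numeric values. `summarize_scores` is a hypothetical helper, not part of predspot, and the sample values are illustrative stand-ins for real output.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Return (mean, sample standard deviation) of cross-validation scores.

    Hypothetical helper for illustration; not part of predspot.
    """
    scores = [float(s) for s in scores]
    if not scores:
        raise ValueError("scores must contain at least one value")
    # stdev needs at least two points; report 0.0 spread for a single fold
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Illustrative values standing in for the list returned by evaluate_pipeline
example_scores = [0.5, 0.25, 0.75, 0.5, 0.5]
avg, spread = summarize_scores(example_scores)
print(f"mean={avg:.3f} std={spread:.3f}")
```

Reporting the spread alongside the mean helps show whether a difference between two runs is larger than the fold-to-fold variation.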

predspot.pipeline.generate_testdata(n_points, start_time, end_time, debug=False)

Generate synthetic crime data for testing.

Parameters:

n_points (int) – Number of crime incidents to generate
start_time (str) – Start date in ‘YYYY-MM-DD’ format
end_time (str) – End date in ‘YYYY-MM-DD’ format
debug (bool, optional) – Enable debug printing. Defaults to False

Returns:

(crimes_df, study_area_gdf) - Generated crime data and study area

Return type:

tuple

Example

>>> crimes, area = generate_testdata(1000, '2020-01-01', '2020-12-31')
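Because start_time and end_time are passed as strings, it can help to check them before generating data. The sketch below parses both values with the standard library and rejects a reversed range. `parse_date_range` is a hypothetical helper, not part of predspot, and the requirement that end_time not precede start_time is an assumption rather than documented behavior.

```python
from datetime import datetime


def parse_date_range(start_time, end_time):
    """Parse two 'YYYY-MM-DD' strings and check their order.

    Hypothetical helper for illustration; not part of predspot.
    """
    fmt = "%Y-%m-%d"
    # strptime raises ValueError when a string does not match the format
    start = datetime.strptime(start_time, fmt)
    end = datetime.strptime(end_time, fmt)
    if end < start:
        # Assumption: a reversed range is treated as an error
        raise ValueError("end_time must not precede start_time")
    return start, end


start, end = parse_date_range("2020-01-01", "2020-12-31")
span_days = (end - start).days
```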

predspot.pipeline.run_prediction_pipeline(crime_data, study_area, crime_tags=None, time_range=None, tfreq='M', grid_resolution=250, debug=False)

Run the complete crime prediction pipeline.

Parameters:

crime_data (pandas.DataFrame) – Crime incident data
study_area (geopandas.GeoDataFrame) – Study area boundaries
crime_tags (list, optional) – List of crime types to include
time_range (list, optional) – Time range as [‘HH:MM’, ‘HH:MM’]
tfreq (str, optional) – Time frequency (‘M’, ‘W’, ‘D’). Defaults to ‘M’
grid_resolution (float, optional) – Spatial grid resolution in km. Defaults to 250
debug (bool, optional) – Enable debug printing. Defaults to False

Returns:

(predictions, pipeline) - Predicted crime densities and fitted pipeline

Return type:

tuple

Raises:

ValueError – If input data is invalid or missing required columns
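The optional arguments can be checked up front so that a malformed value fails early with a clear message. The sketch below validates tfreq against the values listed above and parses time_range as two 'HH:MM' strings. `check_pipeline_args` and `ALLOWED_TFREQ` are hypothetical names, not part of predspot, and the library's own validation may differ.

```python
from datetime import datetime

# Values listed in the tfreq parameter description
ALLOWED_TFREQ = ("M", "W", "D")


def check_pipeline_args(time_range=None, tfreq="M"):
    """Validate time_range and tfreq before calling the pipeline.

    Hypothetical helper for illustration; not part of predspot.
    Returns None when no time_range is given, else two datetime.time objects.
    """
    if tfreq not in ALLOWED_TFREQ:
        raise ValueError(f"tfreq must be one of {ALLOWED_TFREQ}, got {tfreq!r}")
    if time_range is None:
        return None
    if len(time_range) != 2:
        raise ValueError("time_range must hold exactly two 'HH:MM' strings")
    # strptime raises ValueError if a string is not a valid 24-hour 'HH:MM' time
    return [datetime.strptime(t, "%H:%M").time() for t in time_range]


window = check_pipeline_args(["18:00", "23:59"], tfreq="W")
```

Once the arguments pass this check, they can be forwarded unchanged to run_prediction_pipeline, which is documented to raise ValueError for invalid input data.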
