Pipeline
Pipeline Module
This module provides the main pipeline functionality for crime prediction, including data loading, preprocessing, and model execution.
Example
>>> from predspot.pipeline import generate_testdata, run_prediction_pipeline
>>> crime_data, study_area = generate_testdata(10000, '2020-01-01', '2020-12-31')
>>> results = run_prediction_pipeline(crime_data, study_area)
- predspot.pipeline.evaluate_pipeline(pipeline, scoring='r2', cv=5, debug=False)
Evaluate the prediction pipeline using cross-validation.
- Parameters:
pipeline (PredictionPipeline) – Fitted prediction pipeline
scoring (str, optional) – Scoring metric (‘r2’ or ‘mse’). Defaults to ‘r2’
cv (int, optional) – Number of cross-validation folds. Defaults to 5
debug (bool, optional) – Enable debug printing. Defaults to False
- Returns:
Cross-validation scores
- Return type:
list
Example
>>> scores = evaluate_pipeline(fitted_pipeline, scoring='r2', cv=5)
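The returned list holds one score per fold. As a rough illustration of what k-fold scoring with 'r2' or 'mse' computes, here is a minimal standard-library sketch. It is not predspot's implementation: `k_fold_scores`, `fit_line`, `r2_score`, and `mse_score` are hypothetical helpers, and the contiguous, unshuffled fold split is an assumption made for simplicity.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse_score(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def k_fold_scores(xs, ys, fit, scoring="r2", cv=5):
    """Return one held-out score per fold (hypothetical helper, not predspot code).

    xs and ys are plain lists; fit(train_x, train_y) must return a predict function.
    """
    if scoring not in ("r2", "mse"):
        raise ValueError(f"unsupported scoring: {scoring!r}")
    score = r2_score if scoring == "r2" else mse_score
    n = len(xs)
    scores = []
    for k in range(cv):
        # Contiguous fold k is held out; everything else is used for fitting.
        lo, hi = k * n // cv, (k + 1) * n // cv
        train_x, train_y = xs[:lo] + xs[hi:], ys[:lo] + ys[hi:]
        predict = fit(train_x, train_y)
        scores.append(score(ys[lo:hi], [predict(x) for x in xs[lo:hi]]))
    return scores


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free line, so every fold is fit almost exactly
scores = k_fold_scores(xs, ys, fit_line, scoring="r2", cv=5)
```

For 'r2' a higher score is better, while for 'mse' a lower score is better, so scores from the two metrics should not be compared directly.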
- predspot.pipeline.generate_testdata(n_points, start_time, end_time, debug=False)
Generate synthetic crime data for testing.
- Parameters:
n_points (int) – Number of crime incidents to generate
start_time (str) – Start date in ‘YYYY-MM-DD’ format
end_time (str) – End date in ‘YYYY-MM-DD’ format
debug (bool, optional) – Enable debug printing. Defaults to False
- Returns:
(crimes_df, study_area_gdf) - Generated crime data and study area
- Return type:
tuple
Example
>>> crimes, area = generate_testdata(1000, '2020-01-01', '2020-12-31')
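To give a sense of what synthetic incident data of this kind can look like, the sketch below draws random locations and timestamps using only the standard library. It is a stand-in under stated assumptions, not predspot's generator: `sketch_testdata`, the field names `time`, `x`, and `y`, the unit-square bounding box, and the fixed seed are all hypothetical, and plain dicts replace the DataFrame and GeoDataFrame that the documented function returns.

```python
import random
from datetime import datetime, timedelta


def sketch_testdata(n_points, start_time, end_time, bbox=(0.0, 0.0, 1.0, 1.0), seed=0):
    """Return (incidents, bbox) with uniformly random points and timestamps.

    Hypothetical stand-in: field names, the bounding box, and the seed are
    assumptions and do not describe predspot's actual output.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    start = datetime.strptime(start_time, "%Y-%m-%d")
    end = datetime.strptime(end_time, "%Y-%m-%d")
    if end < start:
        raise ValueError("end_time must not precede start_time")
    span_seconds = (end - start).total_seconds()
    min_x, min_y, max_x, max_y = bbox
    incidents = []
    for _ in range(n_points):
        incidents.append({
            # Uniform offset from the start date, in seconds.
            "time": start + timedelta(seconds=rng.uniform(0, span_seconds)),
            # Uniform position inside the bounding box.
            "x": rng.uniform(min_x, max_x),
            "y": rng.uniform(min_y, max_y),
        })
    return incidents, bbox


crimes, area = sketch_testdata(1000, "2020-01-01", "2020-12-31")
```

Uniform sampling is the simplest choice here. Real crime data is typically clustered in space and time, so a uniform sample is only suitable for smoke-testing the data flow.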
- predspot.pipeline.run_prediction_pipeline(crime_data, study_area, crime_tags=None, time_range=None, tfreq='M', grid_resolution=250, debug=False)
Run the complete crime prediction pipeline.
- Parameters:
crime_data (pandas.DataFrame) – Crime incident data
study_area (geopandas.GeoDataFrame) – Study area boundaries
crime_tags (list, optional) – List of crime types to include
time_range (list, optional) – Time range as [‘HH:MM’, ‘HH:MM’]
tfreq (str, optional) – Time frequency (‘M’, ‘W’, ‘D’). Defaults to ‘M’
grid_resolution (float, optional) – Spatial grid resolution in km. Defaults to 250
debug (bool, optional) – Enable debug printing. Defaults to False
- Returns:
(predictions, pipeline) - Predicted crime densities and fitted pipeline
- Return type:
tuple
- Raises:
ValueError – If input data is invalid or missing required columns
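The checks behind the ValueError and the ['HH:MM', 'HH:MM'] form of time_range can be sketched as follows. This is an illustration only: the required column names, the helper names, and the handling of ranges that wrap past midnight are assumptions, and none of it is taken from predspot's source.

```python
from datetime import datetime

# Hypothetical column names; the columns predspot actually requires may differ.
REQUIRED_COLUMNS = {"tag", "t", "lon", "lat"}


def validate_columns(columns, required=REQUIRED_COLUMNS):
    """Raise ValueError listing any required column that is absent."""
    missing = sorted(set(required) - set(columns))
    if missing:
        raise ValueError(f"crime_data is missing required columns: {missing}")


def parse_time_range(time_range):
    """Turn ['HH:MM', 'HH:MM'] into a (start, end) pair of datetime.time values."""
    if time_range is None:
        return None
    try:
        start, end = (datetime.strptime(s, "%H:%M").time() for s in time_range)
    except (TypeError, ValueError) as exc:
        raise ValueError("time_range must be ['HH:MM', 'HH:MM']") from exc
    return start, end


def in_time_range(moment, bounds):
    """Return True if the time of day of `moment` falls inside `bounds`."""
    if bounds is None:
        return True  # no filter requested
    start, end = bounds
    t = moment.time()
    if start <= end:
        return start <= t <= end
    # Assumption: a range such as 22:00 to 02:00 wraps past midnight.
    return t >= start or t <= end


validate_columns(["tag", "t", "lon", "lat", "extra"])  # passes silently
night = parse_time_range(["22:00", "02:00"])
late_incident = in_time_range(datetime(2020, 1, 1, 23, 30), night)
noon_incident = in_time_range(datetime(2020, 1, 1, 12, 0), night)
```

Validating inputs before any fitting starts means a malformed call fails quickly with a message that names the problem, instead of surfacing later as an unrelated error deep inside the pipeline.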