Pipeline

Pipeline Module

This module provides the main pipeline functionality for crime prediction, including data loading, preprocessing, and model execution.

Example

>>> from predspot.pipeline import generate_testdata, run_prediction_pipeline
>>> crime_data, study_area = generate_testdata(10000, '2020-01-01', '2020-12-31')
>>> predictions, pipeline = run_prediction_pipeline(crime_data, study_area)

predspot.pipeline.evaluate_pipeline(pipeline, scoring='r2', cv=5, debug=False)

Evaluate the prediction pipeline using cross-validation.

Parameters:
  • pipeline (PredictionPipeline) – Fitted prediction pipeline

  • scoring (str, optional) – Scoring metric (‘r2’ or ‘mse’). Defaults to ‘r2’

  • cv (int, optional) – Number of cross-validation folds. Defaults to 5

  • debug (bool, optional) – Enable debug printing. Defaults to False

Returns:

Cross-validation scores

Return type:

list

Example

>>> scores = evaluate_pipeline(fitted_pipeline, scoring='r2', cv=5)
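
To make the meaning of the returned scores concrete, here is a minimal, library-free sketch of k-fold cross-validation with 'r2' and 'mse' scoring. It does not reproduce predspot's internals: the names cross_validate, fit_line, r2_score and mse_score are hypothetical, and a least-squares line stands in for the fitted pipeline.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse_score(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


SCORERS = {"r2": r2_score, "mse": mse_score}


def fit_line(train):
    """Stand-in model (assumption): ordinary least-squares line y = a*x + b."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validate(data, scoring="r2", cv=5):
    """Return one score per fold, as a list of length cv."""
    if scoring not in SCORERS:
        raise ValueError(f"unknown scoring metric: {scoring!r}")
    # Interleaved folds: fold i holds rows i, i + cv, i + 2*cv, ...
    folds = [data[i::cv] for i in range(cv)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit_line(train)
        y_true = [y for _, y in test]
        y_pred = [model(x) for x, _ in test]
        scores.append(SCORERS[scoring](y_true, y_pred))
    return scores


# Noise-free linear data, so every fold is predicted almost exactly
data = [(x, 2.0 * x + 1.0) for x in range(20)]
scores = cross_validate(data, scoring="r2", cv=5)
```

With 'r2', values near 1 indicate a good fit; with 'mse', values near 0 do. Whether predspot reports MSE as a positive or negated quantity is not stated here, so check the sign convention before comparing runs.
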

predspot.pipeline.generate_testdata(n_points, start_time, end_time, debug=False)

Generate synthetic crime data for testing.

Parameters:
  • n_points (int) – Number of crime incidents to generate

  • start_time (str) – Start date in ‘YYYY-MM-DD’ format

  • end_time (str) – End date in ‘YYYY-MM-DD’ format

  • debug (bool, optional) – Enable debug printing. Defaults to False

Returns:

(crimes_df, study_area_gdf) - Generated crime data and study area

Return type:

tuple

Example

>>> crimes, area = generate_testdata(1000, '2020-01-01', '2020-12-31')
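
As a rough illustration of what synthetic incident data of this kind can look like, the sketch below draws uniformly random timestamps between two 'YYYY-MM-DD' dates together with random coordinates. It is not predspot's generator: the function name make_synthetic_incidents, the field names (timestamp, lon, lat, tag), the unit-square coordinates and the tag values are all assumptions made for the example.

```python
import random
from datetime import datetime, timedelta


def make_synthetic_incidents(n_points, start_time, end_time, seed=0):
    """Hypothetical helper: n_points random incidents between two dates."""
    start = datetime.strptime(start_time, "%Y-%m-%d")
    end = datetime.strptime(end_time, "%Y-%m-%d")
    if end <= start:
        raise ValueError("end_time must be later than start_time")

    rng = random.Random(seed)  # seeded so repeated runs give the same data
    span_seconds = (end - start).total_seconds()

    incidents = []
    for _ in range(n_points):
        incidents.append(
            {
                # Uniformly distributed moment inside [start, end]
                "timestamp": start + timedelta(seconds=rng.uniform(0, span_seconds)),
                # Placeholder coordinates in a unit square (assumption)
                "lon": rng.uniform(0.0, 1.0),
                "lat": rng.uniform(0.0, 1.0),
                # Placeholder crime types (assumption)
                "tag": rng.choice(["burglary", "assault"]),
            }
        )
    return incidents


incidents = make_synthetic_incidents(1000, "2020-01-01", "2020-12-31")
```

A list of dictionaries like this can be passed to pandas.DataFrame(...) if a tabular form is needed.
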

predspot.pipeline.run_prediction_pipeline(crime_data, study_area, crime_tags=None, time_range=None, tfreq='M', grid_resolution=250, debug=False)

Run the complete crime prediction pipeline.

Parameters:
  • crime_data (pandas.DataFrame) – Crime incident data

  • study_area (geopandas.GeoDataFrame) – Study area boundaries

  • crime_tags (list, optional) – List of crime types to include

  • time_range (list, optional) – Time range as [‘HH:MM’, ‘HH:MM’]

  • tfreq (str, optional) – Time frequency (‘M’, ‘W’, ‘D’). Defaults to ‘M’

  • grid_resolution (float, optional) – Spatial grid resolution in km. Defaults to 250

  • debug (bool, optional) – Enable debug printing. Defaults to False

Returns:

(predictions, pipeline) - Predicted crime densities and fitted pipeline

Return type:

tuple

Raises:

ValueError – If input data is invalid or missing required columns
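
The argument formats listed above can be checked before a long run. The sketch below is one possible pre-flight check, not predspot's own validation: check_pipeline_args is a hypothetical helper, and the required column names are assumptions, since the actual required columns are not listed here.

```python
from datetime import datetime

VALID_TFREQ = {"M", "W", "D"}  # frequencies named in the parameter list


def check_pipeline_args(
    columns,
    time_range=None,
    tfreq="M",
    grid_resolution=250,
    required_columns=("timestamp", "lon", "lat"),  # assumed column names
):
    """Raise ValueError if any argument does not match the documented format."""
    missing = [name for name in required_columns if name not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    if tfreq not in VALID_TFREQ:
        raise ValueError(f"tfreq must be one of {sorted(VALID_TFREQ)}, got {tfreq!r}")

    if grid_resolution <= 0:
        raise ValueError("grid_resolution must be positive")

    if time_range is not None:
        if len(time_range) != 2:
            raise ValueError("time_range must be ['HH:MM', 'HH:MM']")
        for value in time_range:
            # strptime itself raises ValueError on malformed input such as '25:00'
            datetime.strptime(value, "%H:%M")

    return True


ok = check_pipeline_args(
    ["timestamp", "lon", "lat", "tag"], time_range=["18:00", "23:59"], tfreq="W"
)
```

With a DataFrame, the first argument would be list(crime_data.columns).
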