Dataset Preparation

Dataset Preparation Module

This module prepares and manages crime datasets together with their corresponding study areas. It handles spatial processing of crime incidents and their visualization.

class predspot.dataset_preparation.Dataset(crimes, study_area, debug=False)

Bases: object

A class to handle crime datasets and their associated study areas.

Parameters:
  • crimes (pandas.DataFrame) – DataFrame containing crime data with required columns: ‘tag’, ‘t’ (timestamp), ‘lon’ (longitude), and ‘lat’ (latitude)

  • study_area (geopandas.GeoDataFrame) – GeoDataFrame defining the study area boundaries

  • debug (bool, optional) – Enable debug printing. Defaults to False.
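As a minimal sketch of the expected input, the snippet below builds a crimes DataFrame with the four required columns and checks that none is missing. The sample values and the `missing_crime_columns` helper are hypothetical and are not part of predspot; the commented-out lines show how the result might be passed to `Dataset`, assuming predspot and geopandas are installed and a study area file is available.

```python
import pandas as pd

# Column names taken from the parameter description above.
REQUIRED_COLUMNS = {"tag", "t", "lon", "lat"}


def missing_crime_columns(crimes: pd.DataFrame) -> set:
    """Hypothetical helper (not part of predspot): return absent required columns."""
    return REQUIRED_COLUMNS - set(crimes.columns)


# Illustrative, made-up records: one row per crime incident.
crimes = pd.DataFrame({
    "tag": ["burglary", "assault", "burglary"],
    "t": pd.to_datetime(["2020-01-05", "2020-02-11", "2020-03-20"]),
    "lon": [-35.88, -35.90, -35.87],
    "lat": [-7.22, -7.23, -7.21],
})

assert not missing_crime_columns(crimes)

# With predspot and geopandas installed, the dataset could then be built as:
# import geopandas as gpd
# from predspot.dataset_preparation import Dataset
# study_area = gpd.read_file("study_area.shp")  # placeholder path
# dataset = Dataset(crimes, study_area)
```

Checking the columns before construction keeps a malformed input from failing later inside the spatial processing step.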

crimes

Processed crime data with geometry

Type:

geopandas.GeoDataFrame

study_area

Study area boundaries

Type:

geopandas.GeoDataFrame

property crimes

Get the crime incidents data.

Returns:

The processed crime incidents data

Return type:

geopandas.GeoDataFrame

plot(ax=None, crime_samples=1000, **kwargs)

Plot the study area and crime incidents.

Parameters:
  • ax (matplotlib.axes.Axes, optional) – Matplotlib axes for plotting

  • crime_samples (int, optional) – Number of crime samples to plot. Defaults to 1000

  • **kwargs – Additional keyword arguments for plotting, grouped under two keys:
      study_area: kwargs for the study area plot
      crimes: kwargs for the crime incidents plot

Returns:

The plot axes

Return type:

matplotlib.axes.Axes
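The `**kwargs` description above suggests that per-layer options are passed as two nested dictionaries. The sketch below illustrates that calling convention with a stand-in function; `split_plot_kwargs` is hypothetical and only mimics how such keys could be separated, it is not the library's implementation.

```python
def split_plot_kwargs(**kwargs):
    """Hypothetical illustration: separate per-layer options from **kwargs."""
    # Each key is optional; a missing key falls back to an empty dict.
    study_area_kwargs = kwargs.pop("study_area", {})
    crimes_kwargs = kwargs.pop("crimes", {})
    return study_area_kwargs, crimes_kwargs


area_opts, crime_opts = split_plot_kwargs(
    study_area={"edgecolor": "black"},
    crimes={"color": "red", "markersize": 4},
)

# On a real Dataset instance, the equivalent call might look like:
# ax = dataset.plot(
#     crime_samples=500,
#     study_area={"edgecolor": "black"},
#     crimes={"color": "red", "markersize": 4},
# )
```

Grouping options by layer lets the boundary polygons and the incident points be styled independently in a single call.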

property shape

Get the shapes of the dataset components.

Returns:

Dictionary containing the shapes of the crimes and study_area GeoDataFrames

Return type:

dict
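For orientation, a dictionary of shapes could look like the sketch below. Plain pandas DataFrames stand in for the GeoDataFrames, and the key names `"crimes"` and `"study_area"` are an assumption based on the attribute names; the actual keys returned by the property are not stated in this reference.

```python
import pandas as pd

# Stand-ins for the two components (made-up values).
crimes = pd.DataFrame({
    "tag": ["burglary", "assault"],
    "t": pd.to_datetime(["2020-01-05", "2020-02-11"]),
    "lon": [-35.88, -35.90],
    "lat": [-7.22, -7.23],
})
study_area = pd.DataFrame({"name": ["district A"]})

# Each value is a (rows, columns) tuple, as returned by DataFrame.shape.
shapes = {"crimes": crimes.shape, "study_area": study_area.shape}
```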

property study_area

Get the study area boundaries.

Returns:

The study area boundaries

Return type:

geopandas.GeoDataFrame

train_test_split(test_size=0.25)

Split the dataset into training and testing sets.

Parameters:

test_size (float, optional) – Proportion of the dataset to include in the test split. Must be between 0 and 1. Defaults to 0.25.

Returns:

(train_dataset, test_dataset) – Two Dataset objects containing the training and testing splits

Return type:

tuple

Raises:

AssertionError – If test_size is not between 0 and 1
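This reference does not say how the rows are divided between the two splits. The sketch below assumes a chronological split on the `t` column, with the most recent incidents held out for testing, and treats the bounds on `test_size` as exclusive. Both points are assumptions, and `chronological_split` is a hypothetical stand-in that works on a plain DataFrame rather than on a `Dataset`.

```python
import pandas as pd


def chronological_split(crimes: pd.DataFrame, test_size: float = 0.25):
    """Hypothetical sketch: hold out the latest fraction of rows as the test set."""
    # Mirrors the documented AssertionError; exclusive bounds are an assumption.
    assert 0 < test_size < 1, "test_size must be between 0 and 1"
    ordered = crimes.sort_values("t")
    n_test = int(len(ordered) * test_size)
    n_train = len(ordered) - n_test
    return ordered.iloc[:n_train], ordered.iloc[n_train:]


# Eight made-up incidents on consecutive days.
crimes = pd.DataFrame({
    "tag": ["burglary"] * 8,
    "t": pd.date_range("2020-01-01", periods=8, freq="D"),
    "lon": [-35.9] * 8,
    "lat": [-7.2] * 8,
})

train, test = chronological_split(crimes, test_size=0.25)

# On a real Dataset instance, the documented call would be:
# train_dataset, test_dataset = dataset.train_test_split(test_size=0.25)
```

A time-ordered split avoids evaluating a model on incidents that happened before the ones it was trained on, which is why it is a common choice for prediction tasks of this kind.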