Dataset Preparation
Dataset Preparation Module
This module prepares and manages crime datasets together with their study areas. It handles spatial processing of the crime data and visualization of crime incidents.
- class predspot.dataset_preparation.Dataset(crimes, study_area, debug=False)
Bases: object
A class to handle crime datasets and their associated study areas.
- Parameters:
crimes (pandas.DataFrame) – DataFrame containing crime data with required columns: ‘tag’, ‘t’ (timestamp), ‘lon’ (longitude), and ‘lat’ (latitude)
study_area (geopandas.GeoDataFrame) – GeoDataFrame defining the study area boundaries
debug (bool, optional) – Enable debug printing. Defaults to False.
- crimes
Processed crime data with geometry
- Type:
geopandas.GeoDataFrame
- study_area
Study area boundaries
- Type:
geopandas.GeoDataFrame
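The crimes parameter above names four required columns. As a minimal sketch of checking input for them before building a Dataset, the helper below works on plain column names; missing_columns and REQUIRED_COLUMNS are hypothetical names introduced for illustration and are not part of predspot, and the sample records are made up.

```python
# Hypothetical helper, not part of predspot: checks the column names that
# the Dataset documentation lists as required for the crimes input.
REQUIRED_COLUMNS = ("tag", "t", "lon", "lat")


def missing_columns(columns):
    """Return the required column names absent from `columns`, in order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Example records; the values are invented for illustration only.
records = [
    {"tag": "burglary", "t": "2019-01-05 22:10:00", "lon": -35.21, "lat": -5.80},
    {"tag": "robbery", "t": "2019-01-06 03:45:00", "lon": -35.20, "lat": -5.81},
]

columns = list(records[0].keys())
print(missing_columns(columns))         # all four required columns are present
print(missing_columns(["tag", "lon"]))  # reports the absent 't' and 'lat'
```

With a pandas.DataFrame, the same check could be applied to its column labels before passing it as crimes.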
- property crimes
Get the crime incidents data.
- Returns:
The processed crime incidents data
- Return type:
geopandas.GeoDataFrame
- plot(ax=None, crime_samples=1000, **kwargs)
Plot the study area and crime incidents.
- Parameters:
ax (matplotlib.axes.Axes, optional) – Matplotlib axes for plotting
crime_samples (int, optional) – Number of crime samples to plot. Defaults to 1000
**kwargs – Additional keyword arguments for plotting:
study_area – kwargs for the study area plot
crimes – kwargs for the crime incidents plot
- Returns:
The plot axes
- Return type:
matplotlib.axes.Axes
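The **kwargs entry above describes per-layer options keyed by study_area and crimes. One way such options could be separated is sketched below, assuming each key holds a dict of plotting keyword arguments; that shape is an assumption, split_plot_kwargs is a hypothetical name, and this is not predspot's actual implementation.

```python
def split_plot_kwargs(**kwargs):
    """Separate per-layer plot options (hypothetical helper, not predspot code).

    Assumes 'study_area' and 'crimes' each map to a dict of keyword
    arguments; a missing key falls back to an empty dict.
    """
    study_area_kwargs = dict(kwargs.get("study_area", {}))
    crimes_kwargs = dict(kwargs.get("crimes", {}))
    return study_area_kwargs, crimes_kwargs


# Illustrative styling values only.
area_opts, crime_opts = split_plot_kwargs(
    study_area={"color": "white", "edgecolor": "black"},
    crimes={"color": "red", "markersize": 2},
)
print(area_opts)
print(crime_opts)
```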
- property shape
Get the shapes of the dataset components.
- Returns:
Dictionary containing the shapes of crimes and study_area DataFrames
- Return type:
dict
- property study_area
Get the study area boundaries.
- Returns:
The study area boundaries
- Return type:
geopandas.GeoDataFrame
- train_test_split(test_size=0.25)
Split the dataset into training and testing sets.
- Parameters:
test_size (float) – Proportion of the dataset to include in the test split. Must be between 0 and 1. Defaults to 0.25.
- Returns:
(train_dataset, test_dataset) – two Dataset objects containing the training and test splits
- Return type:
tuple
- Raises:
AssertionError – If test_size is not between 0 and 1
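The documentation does not say whether the split is random or ordered. The sketch below illustrates one plausible reading, a chronological split that holds out the most recent records, with an assertion mirroring the documented AssertionError. The function name chronological_split, the exclusive bounds on test_size, and the sort on 't' are all assumptions, not confirmed predspot behavior.

```python
def chronological_split(records, test_size=0.25):
    """Split records into (train, test), holding out the latest ones.

    Hypothetical sketch: assumes an ordered split on the 't' field and
    exclusive bounds on test_size; predspot may behave differently.
    """
    assert 0 < test_size < 1, "test_size must be between 0 and 1"
    ordered = sorted(records, key=lambda record: record["t"])
    n_test = int(len(ordered) * test_size)
    split_at = len(ordered) - n_test
    return ordered[:split_at], ordered[split_at:]


# Eight made-up incidents with integer timestamps 1..8.
records = [{"tag": "theft", "t": day} for day in range(1, 9)]
train, test = chronological_split(records, test_size=0.25)
print(len(train), len(test))
```

Holding out the latest records keeps the test set strictly after the training set in time, which avoids evaluating a predictive model on incidents older than those it was trained on.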