Binning Module¶
The binning module provides functions to aggregate and analyze spatial data using various discrete global grid systems (DGGS) and arbitrary polygons. It supports statistical analysis, data categorization, and spatial binning operations.
Overview¶
The binning module supports multiple types of spatial binning:
- DGGS-based binning: Bin data into hexagonal, triangular, or square grid cells
- Polygon-based binning: Bin data into arbitrary polygon features
- Statistical aggregation: Compute various statistics on binned data
- Categorical analysis: Group and analyze data by categories
Supported Statistics¶
The module supports the following statistical operations:
- count: Number of points in each bin
- sum: Sum of numeric values
- min: Minimum value
- max: Maximum value
- mean: Arithmetic mean
- median: Median value
- std: Standard deviation
- var: Variance
- range: Range (max - min)
- minority: Least frequent value
- majority: Most frequent value
- variety: Number of unique values
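The less common statistics in this list (range, minority, majority, variety) are easiest to understand from a small example. The following is a minimal standard-library sketch of what each name means for the values that fall in a single bin. The `summarize` helper is hypothetical and is not part of the module, and how the module breaks ties for minority/majority is not documented here.

```python
from collections import Counter
from statistics import mean, median


def summarize(values):
    """Compute several of the listed statistics for one bin (hypothetical helper).

    Assumes `values` is a non-empty list of numbers.
    """
    counts = Counter(values)
    ranked = counts.most_common()  # (value, frequency) pairs, most frequent first
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "majority": ranked[0][0],   # most frequent value
        "minority": ranked[-1][0],  # least frequent value
        "variety": len(counts),     # number of unique values
    }


stats = summarize([3, 3, 3, 5, 5, 9])
print(stats)
```

Here the value 3 occurs three times (majority), 9 occurs once (minority), and there are three distinct values (variety).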
Supported DGGS Systems¶
The module supports binning with the following discrete global grid systems:
- H3: Uber's hexagonal hierarchical geospatial indexing system
- S2: Google's spherical geometry library
- A5: Pentagonal DGGS
- RHEALPix: Rearranged Hierarchical Equal Area isoLatitude Pixelization
- ISEA4T: Icosahedral Snyder Equal Area Aperture 4 Triangle
- QTM: Quaternary Triangular Mesh
- OLC: Open Location Code (Plus Codes)
- Geohash: Hierarchical spatial data structure
- Tilecode: Hierarchical tiling system
- Quadkey: Microsoft's hierarchical spatial index
Core Functions¶
H3 Binning¶
h3_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin point data into H3 cells by generating the H3 grid within the points' bounding box, spatially joining points to cells, and aggregating with pandas groupby. Supports custom stats (range, variety, minority, majority). Non-point geometries are ignored.
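The grid generation, spatial join, and groupby steps can be illustrated without any geospatial dependency. The sketch below is not the module's implementation: it swaps the H3 grid for a toy 1-degree square grid, and the names `cell_id` and `bin_points` are hypothetical. It also assumes cells that contain no points are dropped, which the source does not state.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # degrees; a toy square grid standing in for a real DGGS


def cell_id(lat, lon):
    """Toy cell identifier: (row, column) of the 1-degree square holding the point."""
    return (math.floor(lat / CELL_SIZE), math.floor(lon / CELL_SIZE))


def bin_points(points):
    # 1. Bounding box of the input points.
    lats = [p["lat"] for p in points]
    lons = [p["lon"] for p in points]
    # 2. Grid generation: every cell that covers the bounding box.
    row_min, col_min = cell_id(min(lats), min(lons))
    row_max, col_max = cell_id(max(lats), max(lons))
    grid = {
        (r, c)
        for r in range(row_min, row_max + 1)
        for c in range(col_min, col_max + 1)
    }
    # 3. "Spatial join": attach each point to the grid cell that contains it.
    joined = defaultdict(list)
    for p in points:
        cid = cell_id(p["lat"], p["lon"])
        if cid in grid:
            joined[cid].append(p)
    # 4. Group-by aggregation: here, a simple count per cell.
    return {cid: len(members) for cid, members in joined.items()}


points = [
    {"lat": 10.2, "lon": 106.1},
    {"lat": 10.7, "lon": 106.9},
    {"lat": 11.5, "lon": 107.3},
]
counts = bin_points(points)
print(counts)
```

The grid is generated once for the whole dataset, so the cost of step 2 depends on the bounding box and resolution rather than on the number of points.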
S2 Binning¶
s2_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin point data into S2 grid cells using grid generation + spatial join, followed by pandas groupby aggregation (the same approach as a5bin).
s2bin_cli()
Command-line interface for s2bin conversion.
This function provides a command-line interface for binning point data to S2 grid cells. It parses command-line arguments and calls the main s2bin function.
Usage
python s2bin.py -i input.shp -r 10 -stats count -f geojson -o output.geojson
Parameters:
Name | Description | Required |
---|---|---|
-i, --input | Input file path or URL (any supported vector file format) | yes |
-r, --resolution | S2 resolution [0..30] | yes |
-stats, --statistics | Statistic to compute (count, min, max, sum, mean, median, std, var, range, minority, majority, variety) | yes |
-category, --category | Optional category field for grouping | no |
-field, --field | Numeric field to compute statistics on | only if stats != 'count' |
-o, --output | Output file path (auto-generated if not provided) | no |
-f, --output_format | Output format (geojson, gpkg, parquet, csv, shapefile) | yes |
A5 Binning¶
a5_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin point data into A5 grid cells and compute statistics using a single grid generation + spatial join, followed by pandas groupby aggregation.
Returns a GeoDataFrame with A5 cell stats and geometry.
a5bin(data, resolution, stats='count', category=None, numeric_field=None, output_format='gpd', **kwargs)
Bin point data into A5 grid cells and compute statistics from various input formats.
This is the main function that handles binning of point data to A5 grid cells. It supports multiple input formats including file paths, URLs, DataFrames, GeoDataFrames, GeoJSON dictionaries, and lists of features.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | | Input data in one of the following formats: file path (str) to a vector file (shapefile, GeoJSON, etc.); URL (str) to vector data; pandas.DataFrame with a geometry column; geopandas.GeoDataFrame; GeoJSON dict; list of GeoJSON feature dicts | required |
resolution | int | A5 resolution level [0..29] (0=coarsest, 29=finest) | required |
stats | str | Statistic to compute (see Supported Statistics) | 'count' |
category | str | Category field for grouping statistics | None |
numeric_field | str | Numeric field to compute statistics on (required if stats != 'count') | None |
output_format | str | Output format (e.g., 'gpd', 'geojson', 'gpkg', 'parquet', 'csv', 'shapefile') | 'gpd' |
output_path | str | Output file path. If None, uses default naming | None |
**kwargs | | Additional arguments passed to geopandas read functions | {} |
Returns:
Type | Description |
---|---|
dict or str | Output in the specified output_format. Returns the file path if output_path is specified; otherwise returns the data directly. |
Raises:
Type | Description |
---|---|
ValueError | If the input data type is not supported or conversion fails |
TypeError | If resolution is not an integer |
Example
    # Bin from file
    result = a5bin("cities.shp", 10, "count")

    # Bin from GeoDataFrame
    import geopandas as gpd
    gdf = gpd.read_file("cities.shp")
    result = a5bin(gdf, 10, "mean", numeric_field="population")

    # Bin from GeoJSON dict
    geojson = {"type": "FeatureCollection", "features": [...]}
    result = a5bin(geojson, 10, "sum", numeric_field="value")
a5bin_cli()
Command-line interface for a5bin conversion.
This function provides a command-line interface for binning point data to A5 grid cells. It parses command-line arguments and calls the main a5bin function.
Usage
python a5bin.py -i input.shp -r 10 -stats count -f geojson -o output.geojson
Parameters:
Name | Description | Required |
---|---|---|
-i, --input | Input file path or URL (any supported vector file format) | yes |
-r, --resolution | A5 resolution [0..29] | yes |
-stats, --statistics | Statistic to compute (count, min, max, sum, mean, median, std, var, range, minority, majority, variety) | yes |
-category, --category | Optional category field for grouping | no |
-field, --field | Numeric field to compute statistics on | only if stats != 'count' |
-o, --output | Output file path (auto-generated if not provided) | no |
-f, --output_format | Output format (geojson, gpkg, parquet, csv, shapefile) | yes |
RHEALPix Binning¶
ISEA4T Binning¶
isea4t_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin point data into ISEA4T grid cells using grid generation + spatial join and aggregate with pandas groupby. Supports custom stats (range, variety, minority, majority). Only Point/MultiPoint geometries are considered.
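The rule that only Point/MultiPoint geometries are considered can be sketched as a filter over GeoJSON features. This is an illustration under assumptions, not the module's code: the `point_coordinates` helper is hypothetical, and treating each member of a MultiPoint as its own point is an assumption.

```python
def point_coordinates(features):
    """Yield (lon, lat) pairs from Point and MultiPoint features only (hypothetical helper)."""
    for feature in features:
        geometry = feature.get("geometry") or {}
        kind = geometry.get("type")
        if kind == "Point":
            lon, lat = geometry["coordinates"][:2]
            yield (lon, lat)
        elif kind == "MultiPoint":
            # Assumption: every member point is binned individually.
            for lon, lat in (c[:2] for c in geometry["coordinates"]):
                yield (lon, lat)
        # Any other geometry type (LineString, Polygon, ...) is skipped.


features = [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Point", "coordinates": [106.7, 10.8]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "MultiPoint", "coordinates": [[1.0, 2.0], [3.0, 4.0]]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}},
]
coords = list(point_coordinates(features))
print(coords)
```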
QTM Binning¶
OLC Binning¶
olc_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin point data into OLC grid cells using grid generation + spatial join and aggregate with pandas groupby. Supports custom stats (range, variety, minority, majority). Only Point/MultiPoint geometries are considered.
Geohash Binning¶
geohash_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin point data into Geohash grid cells and compute statistics using a single grid generation + spatial join, followed by pandas groupby aggregation.
Returns a GeoDataFrame with Geohash cell stats and geometry.
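To make the idea of a Geohash cell concrete, here is a minimal standard-library sketch that encodes points with the standard Geohash algorithm and counts points per cell. It encodes each point directly rather than generating a grid and joining, so it is a different route to the same kind of result and is not how `geohash_bin` is implemented. It also assumes the `resolution` argument corresponds to the Geohash string length.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a Geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


points = [(57.64911, 10.40744), (57.64, 10.41), (10.8, 106.7)]  # (lat, lon)
counts = Counter(geohash_encode(lat, lon, 3) for lat, lon in points)
print(counts)
```

The first two points share the three-character cell `u4p`, so that cell receives a count of 2.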
Tilecode Binning¶
Quadkey Binning¶
Polygon Binning¶
polygon_bin(polygon_data, point_data, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin points into provided polygons using spatial join + pandas groupby aggregation. No grid generation is performed; the input polygons are used directly.
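The spatial join at the heart of polygon binning is a point-in-polygon test followed by aggregation. The sketch below uses a plain ray-casting test on simple rings instead of a geospatial library, so it only illustrates the idea; `point_in_polygon` and `count_points_per_polygon` are hypothetical names, and whether the module keeps polygons with zero points is not stated in the source.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test against a single exterior ring of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(polygons, points):
    """Count how many (lon, lat) points fall inside each named polygon."""
    counts = {name: 0 for name in polygons}
    for lon, lat in points:
        for name, ring in polygons.items():
            if point_in_polygon(lon, lat, ring):
                counts[name] += 1
    return counts


polygons = {
    "west": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "east": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
points = [(1, 1), (2, 4), (7, 3), (12, 1)]
counts = count_points_per_polygon(polygons, points)
print(counts)
```

The last point lies outside both polygons and therefore contributes to neither count.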
DGGAL Binning¶
dggal_bin(data, dggs_type, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)
Bin point data into DGGAL grid cells and compute statistics using a single grid generation + spatial join, followed by pandas groupby aggregation.
This avoids per-point subprocess calls and is significantly faster.
Returns a GeoDataFrame with DGGAL cell stats and geometry.
dggalbin(data, dggs_type, resolution, stats='count', category=None, numeric_field=None, output_format='gpd', **kwargs)
Bin point data into DGGAL grid cells and compute statistics from various input formats.
dggalbin_cli()
Command-line interface for DGGAL binning.
Usage Examples¶
Basic H3 Binning¶
Categorical Binning¶
Polygon Binning¶
Output Formats¶
All binning functions support multiple output formats:
- GeoPandas: gpd / geopandas / gdf / geodataframe (returns a GeoDataFrame)
- GeoJSON dict: geojson_dict / json_dict (returns a Python dict)
- GeoJSON file: geojson / json
- Shapefile: shapefile / shp
- GeoPackage: gpkg / geopackage
- Parquet: parquet / geoparquet
- CSV: csv
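Because several aliases map to the same output format, one way to reason about the list above is as a lookup table from alias to a canonical name. The sketch below is illustrative only: the `FORMAT_ALIASES` table and `normalize_output_format` helper are hypothetical, the canonical names are chosen for this example, and case-insensitive matching is an assumption.

```python
# Alias -> canonical format name, following the list of aliases in this section.
FORMAT_ALIASES = {
    "gpd": "geodataframe",
    "geopandas": "geodataframe",
    "gdf": "geodataframe",
    "geodataframe": "geodataframe",
    "geojson_dict": "geojson_dict",
    "json_dict": "geojson_dict",
    "geojson": "geojson",
    "json": "geojson",
    "shapefile": "shapefile",
    "shp": "shapefile",
    "gpkg": "gpkg",
    "geopackage": "gpkg",
    "parquet": "parquet",
    "geoparquet": "parquet",
    "csv": "csv",
}


def normalize_output_format(name):
    """Map an output_format alias to its canonical name (hypothetical helper)."""
    key = name.strip().lower()  # assumption: aliases are matched case-insensitively
    if key not in FORMAT_ALIASES:
        raise ValueError(f"Unsupported output format: {name!r}")
    return FORMAT_ALIASES[key]


print(normalize_output_format("GDF"))
print(normalize_output_format("shp"))
```

The first two groups return in-memory objects, while the remaining formats write files.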
Parameters¶
Common Parameters¶
- data: Input data (file path, DataFrame, GeoDataFrame, or GeoJSON)
- resolution: Grid resolution (integer, range varies by DGGS)
- stats: Statistical operation to perform
- category: Optional field for categorical grouping
- numeric_field: Field name for numeric calculations
- lat_col: Latitude column name (default: "lat")
- lon_col: Longitude column name (default: "lon")
- output_format: Output format (an in-memory object such as a GeoDataFrame, or a file format; see Output Formats)
Note: When writing files, outputs are saved in the current folder with an auto-generated name like <base>_<system>bin_<resolution>.<ext>.
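The naming pattern can be read as a simple template. The sketch below shows one plausible way such a name is assembled; `default_output_name` is a hypothetical helper, and it assumes `<base>` is the input file name without its extension and `<system>` is a short DGGS token such as `h3`.

```python
from pathlib import Path


def default_output_name(input_path, system, resolution, extension):
    """Build a name of the form <base>_<system>bin_<resolution>.<ext> (hypothetical helper)."""
    base = Path(input_path).stem  # assumption: <base> is the input file stem
    return f"{base}_{system}bin_{resolution}.{extension}"


name = default_output_name("data/cities.shp", "h3", 8, "geojson")
print(name)  # cities_h3bin_8.geojson
```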
Notes on specific functions:
- polygonbin: uses parameter names stat and field_name (instead of stats and numeric_field).
- dggalbin: requires dggs_type to specify the model (e.g., gnosis, isea3h, rhealpix, ...).
Resolution Ranges¶
- H3: 0-15
- S2: 0-30
- A5: 0-29
- RHEALPix: 0-15
- ISEA4T: 0-39
- QTM: 1-24
- OLC: 2, 4, 6, 8, 10-15
- Geohash: 1-12
- Tilecode: 0-29
- Quadkey: 0-29
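Since the valid range differs by system, and OLC accepts only a discrete set of values, it can help to validate a resolution before running a long binning job. The following is a minimal sketch built from the ranges listed above; `RESOLUTION_RANGES` and `check_resolution` are hypothetical names and not part of the module.

```python
# Valid resolutions per system, transcribed from the list in this section.
RESOLUTION_RANGES = {
    "h3": range(0, 16),
    "s2": range(0, 31),
    "a5": range(0, 30),
    "rhealpix": range(0, 16),
    "isea4t": range(0, 40),
    "qtm": range(1, 25),
    "olc": (2, 4, 6, 8, 10, 11, 12, 13, 14, 15),
    "geohash": range(1, 13),
    "tilecode": range(0, 30),
    "quadkey": range(0, 30),
}


def check_resolution(system, resolution):
    """Return the resolution if it is valid for the system, otherwise raise."""
    if isinstance(resolution, bool) or not isinstance(resolution, int):
        raise TypeError("resolution must be an integer")
    allowed = RESOLUTION_RANGES[system.lower()]
    if resolution not in allowed:
        raise ValueError(f"resolution {resolution} is not valid for {system}")
    return resolution


print(check_resolution("h3", 15))
print(check_resolution("olc", 10))
```

For example, `check_resolution("olc", 9)` raises `ValueError` because 9 is not among the OLC values listed.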
Data Input Formats¶
The binning functions accept various input formats:
- File paths: .geojson, .shp, .gpkg, .csv, .parquet
- GeoDataFrame: Direct GeoPandas DataFrame
- DataFrame: Pandas DataFrame with lat/lon columns
- GeoJSON: Dictionary or string representation
- List of dictionaries: Point features with coordinates
Output¶
All binning functions return or write a dataset that includes:
- geometry: Cell or polygon geometries
- DGGS identifier: Column name varies by system (e.g., h3, s2, a5, isea4t, geohash, tilecode, quadkey; zoneID for DGGAL). For polygon binning, original polygon attributes are preserved.
- resolution: Grid resolution used (for DGGS binning)
- statistics: Computed statistical values
- category columns: Per-category statistics when category is provided (e.g., <category>_count, <category>_mean)
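The per-category column layout can be pictured as a pivot: one row per cell and one `<category>_count` column per category value. The sketch below builds such a table with plain dictionaries. It is an illustration under assumptions: `category_counts` is a hypothetical helper, `<category>` is taken to be the category value, and filling missing combinations with 0 is a choice made for this example.

```python
from collections import defaultdict


def category_counts(records, cell_key, category_key):
    """Pivot records into one row per cell with <category>_count columns (hypothetical helper)."""
    rows = defaultdict(dict)
    for record in records:
        column = f"{record[category_key]}_count"
        row = rows[record[cell_key]]
        row[column] = row.get(column, 0) + 1
    # Give every cell the same set of columns; absent categories count as 0.
    columns = sorted({column for row in rows.values() for column in row})
    return {
        cell: {column: row.get(column, 0) for column in columns}
        for cell, row in rows.items()
    }


records = [
    {"cell": "A", "kind": "school"},
    {"cell": "A", "kind": "school"},
    {"cell": "A", "kind": "clinic"},
    {"cell": "B", "kind": "clinic"},
]
table = category_counts(records, "cell", "kind")
print(table)
```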
All outputs use CRS EPSG:4326 by default and can be exported to multiple spatial formats.