
Binning Module

The binning module provides functions to aggregate and analyze spatial data using various discrete global grid systems (DGGS) and arbitrary polygons. It supports statistical analysis, data categorization, and spatial binning operations.

Overview

The binning module supports multiple types of spatial binning:

  • DGGS-based binning: Bin data into discrete global grid cells (hexagonal, pentagonal, triangular, or quadrilateral, depending on the system)
  • Polygon-based binning: Bin data into arbitrary polygon features
  • Statistical aggregation: Compute various statistics on binned data
  • Categorical analysis: Group and analyze data by categories

Supported Statistics

The module supports the following statistical operations:

  • count: Number of points in each bin
  • sum: Sum of numeric values
  • min: Minimum value
  • max: Maximum value
  • mean: Arithmetic mean
  • median: Median value
  • std: Standard deviation
  • var: Variance
  • range: Range (max - min)
  • minority: Least frequent value
  • majority: Most frequent value
  • variety: Number of unique values
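To make the statistic definitions above concrete, here is a minimal stdlib sketch that computes each one for the values falling in a single bin. It is illustrative only, not vgrid's implementation; the helper name `compute_stat` is hypothetical, and two behaviors are assumptions: `std`/`var` use the sample (n - 1) formulas, and ties in `minority`/`majority` resolve to the value seen first.

```python
import statistics
from collections import Counter


def compute_stat(values, stat):
    """Compute one documented statistic for the values in a single bin.

    Hypothetical helper. Assumptions: std/var are sample statistics
    (n - 1 denominator) and minority/majority ties go to the first value seen.
    """
    if stat == "count":
        return len(values)
    if stat == "variety":
        return len(set(values))  # number of distinct values
    if stat in ("minority", "majority"):
        counts = Counter(values)  # value -> frequency, in first-seen order
        pick = min if stat == "minority" else max
        return pick(counts, key=counts.get)
    numeric = {
        "sum": sum,
        "min": min,
        "max": max,
        "mean": statistics.mean,
        "median": statistics.median,
        "std": statistics.stdev,
        "var": statistics.variance,
        "range": lambda v: max(v) - min(v),
    }
    if stat not in numeric:
        raise ValueError(f"Unsupported statistic: {stat!r}")
    return numeric[stat](values)
```

Note that `count`, `variety`, `minority`, and `majority` also work on non-numeric values, which is why they are handled before the numeric table.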

Supported DGGS Systems

The module supports binning with the following discrete global grid systems:

  • H3: Uber's hexagonal hierarchical geospatial indexing system
  • S2: Google's spherical geometry library
  • A5: Pentagonal DGGS
  • RHEALPix: Equal-area hierarchical grid with quadrilateral cells, based on the rearranged HEALPix projection
  • ISEA4T: Icosahedral Snyder Equal Area Aperture 4 Triangle
  • QTM: Quaternary Triangular Mesh
  • OLC: Open Location Code (Plus Codes)
  • Geohash: Hierarchical geocoding system that encodes latitude/longitude as base-32 strings over rectangular cells
  • Tilecode: Hierarchical tiling system
  • Quadkey: Microsoft's hierarchical spatial index

Core Functions

H3 Binning

h3_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin point data into H3 cells by generating an H3 grid within the points' bounding box, spatially joining the points to the cells, and aggregating with a pandas groupby. Supports the custom stats range, variety, minority, and majority. Non-point geometries are ignored.
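The three steps described above (grid generation over the bounding box, spatial join, groupby aggregation) can be sketched with the standard library alone. This is a simplified stand-in, not the H3 code path: it uses axis-aligned square cells of `size` degrees instead of H3 hexagons, and the helper names `make_grid` and `grid_bin` are hypothetical.

```python
import math
from collections import defaultdict


def make_grid(min_x, min_y, max_x, max_y, size):
    """Generate square cells (as bounds tuples) covering a bounding box.

    Stand-in for H3 grid generation; cell ids are (column, row) pairs.
    """
    ncols = max(1, math.ceil((max_x - min_x) / size))
    nrows = max(1, math.ceil((max_y - min_y) / size))
    return {
        (c, r): (min_x + c * size, min_y + r * size,
                 min_x + (c + 1) * size, min_y + (r + 1) * size)
        for c in range(ncols)
        for r in range(nrows)
    }


def grid_bin(points, size, stat=len):
    """Bin (x, y, value) tuples and apply `stat` to each occupied cell."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # 1. Grid generation restricted to the points' bounding box.
    cells = make_grid(min(xs), min(ys), max(xs), max(ys), size)
    # 2. Spatial join: attach each point to the first cell containing it.
    groups = defaultdict(list)
    for x, y, value in points:
        for cell_id, (x0, y0, x1, y1) in cells.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                groups[cell_id].append(value)
                break
    # 3. Groupby-style aggregation, one result per occupied cell.
    return {cell_id: stat(vals) for cell_id, vals in groups.items()}
```

The brute-force join is quadratic and only meant to show the data flow; a real implementation would use a spatial index (as GeoPandas' spatial join does) and H3 cell boundaries.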

S2 Binning

s2_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin point data into S2 cells using the same grid generation + spatial join + pandas groupby approach as A5 binning.

s2bin_cli()

Command-line interface for s2bin conversion.

This function provides a command-line interface for binning point data to S2 grid cells. It parses command-line arguments and calls the main s2bin function.

Usage

python s2bin.py -i input.shp -r 10 -stats count -f geojson -o output.geojson

Parameters:

  • -i, --input (required): Input file path, URL, or other vector file format
  • -r, --resolution (required): S2 resolution [0..30]
  • -stats, --statistics (required): Statistic to compute (count, min, max, sum, mean, median, std, var, range, minority, majority, variety)
  • -category, --category (optional): Category field for grouping
  • -field, --field: Numeric field used to compute statistics (required if stats is not count)
  • -o, --output (optional): Output file path; auto-generated if not provided
  • -f, --output_format (required): Output format (geojson, gpkg, parquet, csv, shapefile)
Example

Bin a shapefile to S2 cells at resolution 10, counting points per cell:

python s2bin.py -i cities.shp -r 10 -stats count -f geojson
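For readers wiring up a similar command, the flag set above maps naturally onto `argparse`. This is a hypothetical parser that mirrors only the documented flags; it is not vgrid's `s2bin_cli`, and which flags are optional (here `-category`, `-field`, `-o`, `-f`) is an assumption.

```python
import argparse

STATS = ["count", "min", "max", "sum", "mean", "median", "std", "var",
         "range", "minority", "majority", "variety"]


def build_parser():
    """Hypothetical parser mirroring the documented s2bin flags."""
    p = argparse.ArgumentParser(description="Bin point data to S2 cells")
    p.add_argument("-i", "--input", required=True)
    p.add_argument("-r", "--resolution", type=int, required=True,
                   choices=range(0, 31), metavar="[0..30]")
    p.add_argument("-stats", "--statistics", required=True, choices=STATS)
    p.add_argument("-category", "--category")
    p.add_argument("-field", "--field")
    p.add_argument("-o", "--output")
    p.add_argument("-f", "--output_format",
                   choices=["geojson", "gpkg", "parquet", "csv", "shapefile"])
    return p


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Documented rule: a numeric field is needed for every stat except count.
    if args.statistics != "count" and args.field is None:
        parser.error("-field is required when -stats is not 'count'")
    return args
```

argparse accepts single-dash multi-letter options such as `-stats` and matches them exactly before trying prefixes, so `-f` and `-field` do not collide.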

A5 Binning

a5_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin point data into A5 grid cells and compute statistics using a single grid generation + spatial join, followed by pandas groupby aggregation.

Returns a GeoDataFrame with A5 cell stats and geometry.

a5bin(data, resolution, stats='count', category=None, numeric_field=None, output_format='gpd', **kwargs)

Bin point data into A5 grid cells and compute statistics from various input formats.

This is the main function that handles binning of point data to A5 grid cells. It supports multiple input formats including file paths, URLs, DataFrames, GeoDataFrames, GeoJSON dictionaries, and lists of features.

Parameters:

  • data (required): Input data in one of the following formats:
      - File path (str): path to a vector file (shapefile, GeoJSON, etc.)
      - URL (str): URL to vector data
      - pandas.DataFrame: DataFrame with a geometry column
      - geopandas.GeoDataFrame: GeoDataFrame
      - dict: GeoJSON dictionary
      - list: list of GeoJSON feature dictionaries
  • resolution (int, required): A5 resolution level [0..29] (0 = coarsest, 29 = finest)
  • stats (str, default 'count'): Statistic to compute:
      - 'count': number of points in each cell
      - 'sum': sum of field values
      - 'min': minimum field value
      - 'max': maximum field value
      - 'mean': mean field value
      - 'median': median field value
      - 'std': standard deviation of field values
      - 'var': variance of field values
      - 'range': range of field values
      - 'minority': least frequent value
      - 'majority': most frequent value
      - 'variety': number of unique values
  • category (str, default None): Category field for grouping statistics
  • numeric_field (str, default None): Numeric field used to compute statistics (required if stats is not 'count')
  • output_format (str, default 'gpd'): Output format ('gpd', 'geojson', 'gpkg', 'parquet', 'csv', 'shapefile'; see Output Formats)
  • output_path (str, optional): Output file path. If None, uses default naming
  • **kwargs (default {}): Additional arguments passed to the geopandas read functions

Returns:

  • GeoDataFrame, dict, or str: Output in the specified output_format. Returns the file path if output_path is specified; otherwise returns the data directly.

Raises:

  • ValueError: If the input data type is not supported or conversion fails
  • TypeError: If resolution is not an integer
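The documented error conditions can be checked up front, before any grid is generated. The sketch below is hypothetical (`validate_a5_args` is not part of vgrid): the TypeError for a non-integer resolution follows the Raises list, while raising ValueError for an out-of-range resolution, an unknown statistic, or a missing numeric_field is an assumption based on the parameter descriptions.

```python
VALID_STATS = {"count", "sum", "min", "max", "mean", "median", "std", "var",
               "range", "minority", "majority", "variety"}


def validate_a5_args(resolution, stats="count", numeric_field=None):
    """Hypothetical pre-flight checks mirroring the documented a5bin rules."""
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(resolution, int) or isinstance(resolution, bool):
        raise TypeError(
            f"resolution must be an integer, got {type(resolution).__name__}")
    # Assumption: out-of-range resolutions are reported as ValueError.
    if not 0 <= resolution <= 29:
        raise ValueError(f"A5 resolution must be in [0..29], got {resolution}")
    if stats not in VALID_STATS:
        raise ValueError(f"Unsupported statistic: {stats!r}")
    # Documented rule: numeric_field is required unless stats is 'count'.
    if stats != "count" and numeric_field is None:
        raise ValueError(f"numeric_field is required when stats={stats!r}")
```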

Example

Bin from file:

from vgrid.binning import a5bin

result = a5bin("cities.shp", 10, "count")

Bin from GeoDataFrame:

import geopandas as gpd

gdf = gpd.read_file("cities.shp")
result = a5bin(gdf, 10, "mean", numeric_field="population")

Bin from GeoJSON dict:

geojson = {"type": "FeatureCollection", "features": [...]}
result = a5bin(geojson, 10, "sum", numeric_field="value")

a5bin_cli()

Command-line interface for a5bin conversion.

This function provides a command-line interface for binning point data to A5 grid cells. It parses command-line arguments and calls the main a5bin function.

Usage

python a5bin.py -i input.shp -r 10 -stats count -f geojson -o output.geojson

Parameters:

  • -i, --input (required): Input file path, URL, or other vector file format
  • -r, --resolution (required): A5 resolution [0..29]
  • -stats, --statistics (required): Statistic to compute (count, min, max, sum, mean, median, std, var, range, minority, majority, variety)
  • -category, --category (optional): Category field for grouping
  • -field, --field: Numeric field used to compute statistics (required if stats is not count)
  • -o, --output (optional): Output file path; auto-generated if not provided
  • -f, --output_format (required): Output format (geojson, gpkg, parquet, csv, shapefile)
Example

Bin a shapefile to A5 cells at resolution 10, counting points per cell:

python a5bin.py -i cities.shp -r 10 -stats count -f geojson

RHEALPix Binning

ISEA4T Binning

isea4t_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin point data into ISEA4T grid cells using grid generation + spatial join and aggregate with pandas groupby. Supports custom stats (range, variety, minority, majority). Only Point/MultiPoint geometries are considered.
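Restricting input to Point/MultiPoint geometries is a simple filter step. This sketch works on GeoJSON geometry dicts and uses a hypothetical helper `extract_points`; flattening each MultiPoint into its member points (so each member is counted) is an assumption about how MultiPoints are treated.

```python
def extract_points(geometries):
    """Keep Point/MultiPoint GeoJSON geometries, flattening MultiPoints.

    Returns (x, y) tuples; any Z value is dropped and every other geometry
    type (LineString, Polygon, ...) or missing geometry is skipped.
    """
    points = []
    for geom in geometries:
        if geom is None:
            continue
        if geom["type"] == "Point":
            points.append(tuple(geom["coordinates"][:2]))
        elif geom["type"] == "MultiPoint":
            points.extend(tuple(c[:2]) for c in geom["coordinates"])
    return points
```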

QTM Binning

OLC Binning

olc_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin point data into OLC grid cells using grid generation + spatial join and aggregate with pandas groupby. Supports custom stats (range, variety, minority, majority). Only Point/MultiPoint geometries are considered.

Geohash Binning

geohash_bin(data, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin point data into Geohash grid cells and compute statistics using a single grid generation + spatial join, followed by pandas groupby aggregation.

Returns a GeoDataFrame with Geohash cell stats and geometry.

Tilecode Binning

Quadkey Binning

Polygon Binning

polygon_bin(polygon_data, point_data, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin points into provided polygons using spatial join + pandas groupby aggregation. No grid generation is performed; the input polygons are used directly.
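Polygon binning replaces grid generation with a point-in-polygon join. The sketch below uses even-odd ray casting from the standard library in place of a GeoPandas spatial join; `point_in_ring` and `polygon_bin_sketch` are hypothetical names, polygons are single outer rings without holes, and dropping polygons that receive no points is an assumption.

```python
from collections import defaultdict


def point_in_ring(x, y, ring):
    """Even-odd ray casting test against a polygon ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_bin_sketch(polygons, points, stat=len):
    """Assign each (x, y, value) point to the first polygon containing it."""
    groups = defaultdict(list)
    for x, y, value in points:
        for poly_id, ring in polygons.items():
            if point_in_ring(x, y, ring):
                groups[poly_id].append(value)
                break
    # Groupby-style aggregation; empty polygons are omitted here.
    return {pid: stat(vals) for pid, vals in groups.items()}
```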

DGGAL Binning

dggal_bin(data, dggs_type, resolution, stats='count', category=None, numeric_field=None, lat_col='lat', lon_col='lon', **kwargs)

Bin point data into DGGAL grid cells and compute statistics using a single grid generation + spatial join, followed by pandas groupby aggregation.

Generating the grid once avoids a DGGAL subprocess call per point, which makes this approach significantly faster.

Returns a GeoDataFrame with DGGAL cell stats and geometry.

dggalbin(data, dggs_type, resolution, stats='count', category=None, numeric_field=None, output_format='gpd', **kwargs)

Bin point data into DGGAL grid cells and compute statistics from various input formats.

dggalbin_cli()

Command-line interface for DGGAL binning.

Usage Examples

Basic H3 Binning

from vgrid.binning import h3bin

# Bin points by count
result = h3bin(
    data="points.geojson",
    resolution=8,
    stats="count"
)

# Bin points with numeric statistics
result = h3bin(
    data="points.geojson",
    resolution=8,
    stats="mean",
    numeric_field="temperature"
)

Categorical Binning

from vgrid.binning import h3bin

# Bin by category with statistics
result = h3bin(
    data="points.geojson",
    resolution=8,
    stats="count",
    category="type"
)

Polygon Binning

from vgrid.binning import polygonbin

# Bin points into administrative boundaries
result = polygonbin(
    polygon_data="admin_boundaries.geojson",
    point_data="points.geojson",
    stats="count"
)

Output Formats

All binning functions support multiple output formats:

  • GeoPandas: gpd / geopandas / gdf / geodataframe (returns a GeoDataFrame)
  • GeoJSON dict: geojson_dict / json_dict (returns a Python dict)
  • GeoJSON file: geojson / json
  • Shapefile: shapefile / shp
  • GeoPackage: gpkg / geopackage
  • Parquet: parquet / geoparquet
  • CSV: csv
from vgrid.binning import h3bin

# Return a GeoDataFrame
gdf = h3bin(
    data="points.geojson",
    resolution=8,
    stats="count",
    output_format="gpd"
)
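Because each output format accepts several aliases, it helps to normalize the value once before dispatching. This hypothetical `normalize_output_format` helper encodes the alias list above; treating aliases case-insensitively is an assumption, not documented behavior.

```python
FORMAT_ALIASES = {
    "gpd": ["gpd", "geopandas", "gdf", "geodataframe"],
    "geojson_dict": ["geojson_dict", "json_dict"],
    "geojson": ["geojson", "json"],
    "shapefile": ["shapefile", "shp"],
    "gpkg": ["gpkg", "geopackage"],
    "parquet": ["parquet", "geoparquet"],
    "csv": ["csv"],
}

# Invert the table to alias -> canonical name for constant-time lookup.
_CANONICAL = {alias: name
              for name, aliases in FORMAT_ALIASES.items()
              for alias in aliases}


def normalize_output_format(fmt):
    """Map a documented alias to its canonical format name."""
    try:
        return _CANONICAL[fmt.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported output format: {fmt!r}") from None
```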

Parameters

Common Parameters

  • data: Input data (file path, DataFrame, GeoDataFrame, or GeoJSON)
  • resolution: Grid resolution (integer, range varies by DGGS)
  • stats: Statistical operation to perform
  • category: Optional field for categorical grouping
  • numeric_field: Field name for numeric calculations
  • lat_col: Latitude column name (default: "lat")
  • lon_col: Longitude column name (default: "lon")
  • output_format: Output format, either an in-memory object or a file format (see Output Formats)

Note: When writing files, outputs are saved in the current folder with an auto-generated name like <base>_<system>bin_<resolution>.<ext>.
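The naming pattern in the note can be reproduced with `pathlib`. In this sketch, `default_output_name` is a hypothetical helper, using the input file's stem as `<base>` is an assumption, and so is the format-to-extension table (for example, `shapefile` mapping to `.shp`).

```python
from pathlib import Path

# Assumed mapping from output format to file extension.
EXTENSIONS = {"geojson": "geojson", "gpkg": "gpkg", "parquet": "parquet",
              "csv": "csv", "shapefile": "shp"}


def default_output_name(input_path, system, resolution, output_format):
    """Build '<base>_<system>bin_<resolution>.<ext>' in the current folder."""
    base = Path(input_path).stem  # e.g. 'data/cities.shp' -> 'cities'
    ext = EXTENSIONS[output_format]
    return Path.cwd() / f"{base}_{system}bin_{resolution}.{ext}"
```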

Notes on specific functions:

  • polygonbin: uses parameter names stat and field_name (instead of stats and numeric_field).
  • dggalbin: requires dggs_type to specify the model (e.g., gnosis, isea3h, rhealpix, ...).

Resolution Ranges

  • H3: 0-15
  • S2: 0-30
  • A5: 0-29
  • RHEALPix: 0-15
  • ISEA4T: 0-39
  • QTM: 1-24
  • OLC: 2, 4, 6, 8, 10-15
  • Geohash: 1-12
  • Tilecode: 0-29
  • Quadkey: 0-29
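The ranges above can be encoded as a lookup table so that invalid resolutions fail early. `check_resolution` is a hypothetical helper; note that OLC is a discrete set rather than a contiguous range, so it is stored as a tuple.

```python
RESOLUTION_RANGES = {
    "h3": range(0, 16),
    "s2": range(0, 31),
    "a5": range(0, 30),
    "rhealpix": range(0, 16),
    "isea4t": range(0, 40),
    "qtm": range(1, 25),
    "olc": (2, 4, 6, 8, 10, 11, 12, 13, 14, 15),  # not contiguous
    "geohash": range(1, 13),
    "tilecode": range(0, 30),
    "quadkey": range(0, 30),
}


def check_resolution(system, resolution):
    """Return the resolution unchanged if the system supports it."""
    valid = RESOLUTION_RANGES[system.lower()]
    if resolution not in valid:
        raise ValueError(
            f"{system} does not support resolution {resolution}")
    return resolution
```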

Data Input Formats

The binning functions accept various input formats:

  • File paths: .geojson, .shp, .gpkg, .csv, .parquet
  • GeoDataFrame: Direct GeoPandas DataFrame
  • DataFrame: Pandas DataFrame with lat/lon columns
  • GeoJSON: Dictionary or string representation
  • List of dictionaries: Point features with coordinates
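Supporting several input shapes usually means normalizing them to one internal form first. This sketch (hypothetical helper `to_point_records`) handles only the GeoJSON and list-of-dicts cases from the list above and returns `(lon, lat, properties)` tuples; file paths, URLs, and DataFrames are out of scope here, and skipping non-Point features is an assumption consistent with the point-only binning functions.

```python
import json


def to_point_records(data, lat_col="lat", lon_col="lon"):
    """Normalize GeoJSON (dict or string) or a list of dicts to (lon, lat, props)."""
    if isinstance(data, str):
        data = json.loads(data)  # GeoJSON string -> dict
    if isinstance(data, dict):
        if data.get("type") == "FeatureCollection":
            data = data["features"]
        elif data.get("type") == "Feature":
            data = [data]
        else:
            raise ValueError("Unsupported GeoJSON object")
    if not isinstance(data, list):
        raise ValueError(f"Unsupported input type: {type(data).__name__}")
    records = []
    for item in data:
        if item.get("type") == "Feature":
            geom = item.get("geometry") or {}
            if geom.get("type") != "Point":
                continue  # non-point features are skipped
            lon, lat = geom["coordinates"][:2]
            records.append((lon, lat, item.get("properties") or {}))
        else:
            # Plain record: coordinates come from the lat/lon columns.
            props = {k: v for k, v in item.items() if k not in (lat_col, lon_col)}
            records.append((item[lon_col], item[lat_col], props))
    return records
```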

Output

All binning functions return or write a dataset that includes:

  • geometry: Cell or polygon geometries
  • DGGS identifier: Column name varies by system (e.g., h3, s2, a5, isea4t, geohash, tilecode, quadkey; zoneID for DGGAL). For polygon binning, original polygon attributes are preserved.
  • resolution: Grid resolution used (for DGGS binning)
  • statistics: Computed statistical values
  • category columns: Per-category statistics when category is provided (e.g., <category>_count, <category>_mean)
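The per-category columns follow a `<category>_<stat>` naming pattern. This stdlib sketch pivots `(cell_id, category, value)` rows into that shape; `category_columns` is a hypothetical helper, and omitting categories that do not occur in a cell (rather than filling them with zero or NaN) is an assumption.

```python
from collections import defaultdict


def category_columns(rows, stat_name, stat=len):
    """Pivot (cell_id, category, value) rows into '<category>_<stat>' columns."""
    grouped = defaultdict(lambda: defaultdict(list))
    for cell_id, category, value in rows:
        grouped[cell_id][category].append(value)
    return {
        cell_id: {f"{cat}_{stat_name}": stat(vals) for cat, vals in cats.items()}
        for cell_id, cats in grouped.items()
    }
```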

All outputs use CRS EPSG:4326 by default and can be exported to multiple spatial formats.