Reads OBIS occurrences, bins them to H3 cells, and writes an authoritative
DuckDB file with two layers consumed by the h3t tile service:
- idx_h3(res, cell_id, n, sp, shannon, simpson, es): precomputed all-taxa indicators for resolutions 1-7 (fast default tile layers).
- occ_h3(res, cell_id, aphiaid, phylum, class, "order", family, genus, species, date_year, records): species-level counts at resolution tiers 3/5/7 for on-the-fly taxon/year-filtered queries.
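As a rough illustration of the kind of taxon/year-filtered query the occ_h3 layer is meant to serve, the sketch below builds a toy in-memory table with the documented columns and aggregates records per cell. The column types, cell ids, aphiaid values and rows are placeholders invented for the example; the real file is read the same way by pointing dbdir at path_duckdb. It assumes the DBI and duckdb R packages are installed.

```r
library(DBI)

# Toy stand-in for the occ_h3 layer. Column types are assumptions,
# and the cell ids and aphiaid values are placeholders, not real identifiers.
con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dbExecute(con, '
  CREATE TABLE occ_h3 (
    res INTEGER, cell_id VARCHAR, aphiaid INTEGER, phylum VARCHAR,
    class VARCHAR, "order" VARCHAR, family VARCHAR, genus VARCHAR,
    species VARCHAR, date_year INTEGER, records BIGINT
  )')
dbExecute(con, "
  INSERT INTO occ_h3 VALUES
    (5, 'cell_a', 1, 'Chordata', 'Teleostei', 'Gadiformes',   'Gadidae',   'Gadus',         'Gadus morhua',             2005, 12),
    (5, 'cell_a', 2, 'Chordata', 'Teleostei', 'Gadiformes',   'Gadidae',   'Melanogrammus', 'Melanogrammus aeglefinus', 2008,  3),
    (5, 'cell_a', 1, 'Chordata', 'Teleostei', 'Gadiformes',   'Gadidae',   'Gadus',         'Gadus morhua',             1995,  7),
    (5, 'cell_b', 1, 'Chordata', 'Teleostei', 'Gadiformes',   'Gadidae',   'Gadus',         'Gadus morhua',             2010,  4),
    (5, 'cell_b', 3, 'Chordata', 'Teleostei', 'Clupeiformes', 'Clupeidae', 'Clupea',        'Clupea harengus',          2006,  9),
    (3, 'cell_c', 1, 'Chordata', 'Teleostei', 'Gadiformes',   'Gadidae',   'Gadus',         'Gadus morhua',             2005, 20)
")

# Records and species richness per cell for one family, one resolution tier
# and a year range. Placeholders (?) are bound in order from params.
sql <- "
  SELECT cell_id,
         SUM(records)            AS n,
         COUNT(DISTINCT species) AS sp
  FROM occ_h3
  WHERE res = ? AND family = ? AND date_year BETWEEN ? AND ?
  GROUP BY cell_id
  ORDER BY cell_id"
res <- dbGetQuery(con, sql, params = list(5L, "Gadidae", 2000L, 2010L))
print(res)

dbDisconnect(con, shutdown = TRUE)
```

Filtering on res first keeps the scan within a single resolution tier (3, 5 or 7), which is how the layer is organized.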
Usage
build_obis_h3_duckdb(
src,
path_duckdb,
region_bbox = NULL,
esn = 50L,
s3_region = "us-east-1",
s3_anonymous = TRUE,
memory_limit = NULL,
threads = NULL,
temp_dir = NULL,
overwrite = TRUE
)
Arguments
- src
occurrence source. Either a data.frame of occurrences, or a character vector of parquet path(s)/glob(s) readable by DuckDB (e.g. "s3://obis-open-data/occurrence/*.parquet"). Must expose columns decimalLongitude, decimalLatitude, species and (optionally) aphiaid, phylum, class, order, family, genus, date_year, records, dropped, absence. Missing taxonomic columns are filled with NULL.
- path_duckdb
output DuckDB file path.
- region_bbox
optional c(lon_min, lat_min, lon_max, lat_max) to restrict to a region (recommended for a first/demo build).
- esn
expected sample size for ES(n); default 50 (matches ES50).
- s3_region
AWS region for s3:// sources (default "us-east-1").
- s3_anonymous
use anonymous S3 access for public buckets (default TRUE).
- memory_limit
optional DuckDB memory_limit (e.g. "10GB"). Strongly recommended when src is a parquet/S3 glob: a global OBIS scan will otherwise exhaust RAM and can wedge the host. Leave a few GB headroom below physical RAM.
- threads
optional DuckDB thread cap (e.g. 2L) to bound CPU/RAM.
- temp_dir
optional directory for DuckDB to spill to disk when it exceeds memory_limit. Needs ample free space (a global build can spill many GB); point it at a roomy volume, not /tmp.
- overwrite
overwrite an existing path_duckdb (default TRUE).
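A call for a first regional build might look like the sketch below. The argument names come from the usage above; the bounding box, output path and spill directory are hypothetical values chosen for illustration. The call is guarded so the snippet does nothing unless the package providing build_obis_h3_duckdb() is attached.

```r
# Arguments for a small demo build over a bounding box.
args <- list(
  src          = "s3://obis-open-data/occurrence/*.parquet",
  path_duckdb  = file.path(tempdir(), "obis_h3_demo.duckdb"),  # hypothetical output path
  region_bbox  = c(-125, 32, -117, 42),  # hypothetical c(lon_min, lat_min, lon_max, lat_max)
  esn          = 50L,
  s3_region    = "us-east-1",
  s3_anonymous = TRUE,
  memory_limit = "10GB",             # keep a few GB below physical RAM
  threads      = 2L,
  temp_dir     = "/data/duckdb_tmp", # hypothetical roomy volume, not /tmp
  overwrite    = TRUE
)

# Run the build only when the function is available in this session.
if (exists("build_obis_h3_duckdb", mode = "function")) {
  do.call(build_obis_h3_duckdb, args)
}
```

Setting memory_limit, threads and temp_dir together is the conservative choice when src is an S3 glob, since the scan can then spill to disk rather than exhaust RAM.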
Details
The indicator math (ES(n), Shannon, Simpson, richness) is the SQL translation
of calc_indicators(), with esn = 50 (ES50) by default, and is validated by the
package tests.
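For readers unfamiliar with these indicators, here is a minimal R sketch of the standard formulas for a single cell, given per-species record counts. It is not the package's calc_indicators() or its SQL. The helper name indicators_sketch is hypothetical, the Simpson value is taken as the sum of squared proportions (other conventions report one minus that sum), and returning NA when a cell has fewer than esn records is an assumption.

```r
# Diversity indicators for one cell from per-species record counts.
# Hypothetical helper for illustration only.
indicators_sketch <- function(counts, esn = 50L) {
  counts <- counts[counts > 0]
  n <- sum(counts)   # total records in the cell
  p <- counts / n    # relative abundance of each species

  # Hurlbert's ES(n): expected number of species in a random subsample of
  # esn records, sum_i [1 - choose(n - n_i, esn) / choose(n, esn)].
  # lchoose() keeps the ratio in log space; it returns -Inf when
  # n - n_i < esn, so such a species contributes exactly 1.
  es <- if (n >= esn) {
    sum(1 - exp(lchoose(n - counts, esn) - lchoose(n, esn)))
  } else {
    NA_real_  # assumption: undefined when the cell has fewer than esn records
  }

  list(
    n       = n,
    sp      = length(counts),    # richness
    shannon = -sum(p * log(p)),  # natural-log Shannon index
    simpson = sum(p^2),          # assumed form: sum of squared proportions
    es      = es
  )
}

# Two equally abundant species, subsampled at 5 records.
ind <- indicators_sketch(c(10, 10), esn = 5L)
str(ind)
```

When esn equals the total number of records in the cell, ES(n) reduces to the richness sp, which is a quick sanity check on any implementation.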