gridstats
gridstats is an Oceanum library for computing gridded statistics over large oceanographic and climate datasets. It is built on xarray and dask for lazy, out-of-core computation and is driven entirely by YAML configuration files.
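"Lazy, out-of-core" means a reduction such as a time mean can be computed by streaming over blocks of time steps, keeping only one block and a running partial sum in memory. This is similar in spirit to how dask evaluates a chunked mean. The hand-written sketch below only illustrates the idea and is not gridstats code. `chunked_time_mean` and the `read_block` callback are hypothetical names. The sketch assumes time is the leading axis and the data contain no NaNs.

```python
import numpy as np


def chunked_time_mean(read_block, n_time, block_size):
    """Mean over the leading (time) axis, reading block_size time steps at a time.

    read_block(start, stop) is a hypothetical callback returning an array of
    shape (stop - start, ny, nx). Only one block plus a running sum is held in
    memory at once. Assumes n_time > 0 and no NaNs (land masks would need
    nansum plus per-cell counts).
    """
    running_sum = None
    count = 0
    for start in range(0, n_time, block_size):
        stop = min(start + block_size, n_time)
        block = read_block(start, stop)
        partial = block.sum(axis=0)  # reduce this block over time
        running_sum = partial if running_sum is None else running_sum + partial
        count += stop - start
    return running_sum / count


# Usage: a small in-memory array stands in for a large on-disk store.
data = np.arange(24, dtype=float).reshape(6, 2, 2)  # (time, y, x)
result = chunked_time_mean(lambda a, b: data[a:b], n_time=6, block_size=4)
print(np.allclose(result, data.mean(axis=0)))  # True
```

With dask the block size corresponds to the chunking along `time`. The scheduler handles the same combine step across chunks and workers, so you do not write this loop yourself.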
Key features
- YAML-driven — define your full pipeline in a config file, run it with one command
- Out-of-core — processes arbitrarily large datasets via dask without loading into memory
- Extensible — register custom stat functions and loaders with a simple decorator or entry point
- Multiple output formats — write results to NetCDF or Zarr
- CF-compliant metadata — output variables are automatically annotated with standard names, units and long names
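One way to picture the YAML-driven, extensible design is as a name-to-function registry. Each entry under `calls` names a `func`, and a runner looks that function up and applies it to each variable in `data_vars` over `dim`. The sketch below assumes that design. `register_stat`, `STATS`, `run_calls`, the `DIM_AXES` lookup and the `<var>_<func>` output naming are all hypothetical and are not gridstats' actual plugin API (the Custom Plugins page documents that). Plain NumPy arrays with time on axis 0 stand in for xarray objects.

```python
import numpy as np

# Hypothetical registry mapping a config `func` name to a Python callable.
STATS = {}


def register_stat(name):
    """Hypothetical decorator that registers a stat function under `name`."""
    def decorator(func):
        STATS[name] = func
        return func
    return decorator


@register_stat("mean")
def stat_mean(arr, axis):
    return arr.mean(axis=axis)


@register_stat("quantile")
def stat_quantile(arr, axis, q):
    # The result gains a leading quantile axis of length len(q).
    return np.quantile(arr, q, axis=axis)


# Plain arrays have no named dims, so map dimension names to axes by hand.
DIM_AXES = {"time": 0}


def run_calls(data, calls):
    """Apply each call dict (shaped like a YAML `calls` entry) to `data`."""
    out = {}
    for call in calls:
        params = dict(call)  # copy so the caller's config is not mutated
        name = params.pop("func")
        axis = DIM_AXES[params.pop("dim")]
        data_vars = params.pop("data_vars")
        for var in data_vars:
            # Remaining keys (e.g. q) are forwarded as keyword arguments.
            out[f"{var}_{name}"] = STATS[name](data[var], axis=axis, **params)
    return out


data = {
    "hs": np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),  # (time, point)
    "tp": np.array([[8.0, 9.0], [10.0, 11.0], [12.0, 13.0]]),
}
calls = [
    {"func": "mean", "dim": "time", "data_vars": ["hs", "tp"]},
    {"func": "quantile", "dim": "time", "data_vars": ["hs"], "q": [0.5]},
]
results = run_calls(data, calls)
print(sorted(results))  # ['hs_mean', 'hs_quantile', 'tp_mean']
```

In this design, adding a statistic means writing one more decorated function. The config can then refer to it by name without any change to the runner.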
Quick example
Write a config file:
config.yml
```yaml
source:
  type: xarray
  urlpath: gs://my-bucket/hindcast.zarr
  engine: zarr
  mapping:
    tps: tp  # rename variables on load
output:
  outfile: ./wave_stats.zarr
calls:
  - func: mean
    dim: time
    data_vars: [hs, tp]
  - func: quantile
    dim: time
    data_vars: [hs]
    q: [0.5, 0.90, 0.95, 0.99]
  - func: rpv
    dim: time
    data_vars: [hs]
    return_periods: [10, 50, 100]
    distribution: gumbel_r
    duration: 24
```
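The `rpv` call above requests return-period values for `hs` from a `gumbel_r` fit. As background only: if annual maxima follow a Gumbel distribution with location μ and scale β, the T-year return value is x_T = μ - β ln(-ln(1 - 1/T)). The sketch below applies that formula at a single grid point using a method-of-moments fit. It is a textbook illustration, not gridstats' implementation. This page does not describe which fitting method gridstats uses, how it extracts maxima, or what `duration` controls. `fit_gumbel_moments` and `gumbel_return_value` are hypothetical helpers, and the sample values are made-up inputs. The distribution name `gumbel_r` matches `scipy.stats.gumbel_r`.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments estimates (mu, beta) for a right-skewed Gumbel (gumbel_r).

    Uses Var = (pi * beta)**2 / 6 and Mean = mu + gamma * beta.
    """
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_return_value(mu, beta, return_period):
    """Value exceeded on average once per `return_period` blocks (here, years)."""
    non_exceedance = 1.0 - 1.0 / return_period
    return mu - beta * math.log(-math.log(non_exceedance))


# Made-up annual-maximum significant wave heights (m) at one grid point.
annual_max_hs = [6.1, 7.4, 5.8, 8.2, 6.9, 7.7, 6.4, 9.0, 7.1, 6.6]
mu, beta = fit_gumbel_moments(annual_max_hs)
for period in (10, 50, 100):
    print(period, round(gumbel_return_value(mu, beta, period), 2))
```

A maximum-likelihood fit (for example `scipy.stats.gumbel_r.fit`) generally gives somewhat different parameters from the moment estimates used here. Gridded output would repeat this calculation at every grid point.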
Run it:
Installation
To include intake catalog support and cloud storage (GCS):
Next steps
- Getting Started — a complete walkthrough
- Configuration — full YAML schema reference
- Operations — all built-in stat functions
- Custom Plugins — add your own loaders and stats