Changelog¶
2.3.0 (unreleased)¶
2.2.0 (2026-05-04)¶
New Features¶
-
modestat — new generic histogram-based mode stat. Returns the centre of the most-occupied bin along the reduction dimension. Accepts explicitbinsedges and an optionalweight_varfor energy- or Hs-weighted modes. Supports the samegroupparameter as all other aggregations. Particularly suited for ordinal data such as the Douglas Sea Scale — pass half-integer edges (e.g.[-0.5, 0.5, ..., 9.5]) to get one bin per integer degree. Dask-compatible viaapply_ufuncwithvectorize=True. -
Output masking — new
mask:block on the output config applies a spatial mask to all output variables before writing. Two types are supported:notnull(keep points where a source variable is non-null) andthreshold(keep points satisfying a numerical condition). Both accept an optionalisel:dict to reduce the source variable to a 2-D slice before the mask is computed, and xarray broadcasting ensures the mask is applied correctly across any extra dimensions (time, quantile, direction).
Documentation¶
- Douglas scales — corrected scale descriptions in the
douglas_seadocs table (e.g. "Rippled" → "Calm (rippled)", "Wavelets" → "Smooth") and added a full degree/description/range table to thedouglas_swelldocs, including a note on degree 9. - Crossing seas — rewrote the
crossing_seasdocs section: added a direction convention warning (all inputs must use the same coming-from or going-to convention), guidance on threshold choices (Li 2016 defaults and operational variants), and the full Li (2016) reference with DOI.
Bug Fixes¶
- Douglas sea scale — fixed an off-by-one error where all degrees were shifted one lower than the correct Douglas classification (e.g. a 3 m sea was reported as degree 4 "moderate" instead of degree 5 "rough"). Degree 9 (phenomenal, Hs > 14 m) was also never assigned.
- Crossing seas — changed
hs_thresholddefault from0.0to0.5m. A zero threshold allowed the function to flag crossing seas in near-calm conditions where partition directions are undefined and noisy. The Li (2016) criterion is meaningful only when both systems are energetically significant;0.5m is the standard operational floor.
Improvements¶
- Douglas sea scale — replaced the 9-iteration
xr.whereloop with a singlexr.apply_ufunc(np.digitize, …)call. Produces a flat dask graph (one task per chunk instead of nine) and runs ~1.5× faster on eager arrays; gains are larger under dask where the stacked task graph was a scheduling bottleneck. - Douglas swell scale —
DOUGLAS_SWELL_BINSreplaced withDOUGLAS_SWELL_INTERVALS, a degree-keyed dict ofpd.Intervalpairs matching the original onstats definition. Degree 9 ("confused") is documented as a crossing-seas condition that cannot be assigned from height/wavelength thresholds alone; it was also unreachable in the original (pd.Interval(inf, inf)is never satisfied by any finite value). - Douglas swell scale — replaced the 9-iteration
xr.whereloop withxr.apply_ufunc(_classify_swell)backed by twonp.digitizecalls and a 3×3 LUT. Eager compute cost is unchanged (2D classification has similar work to the loop), but the dask task graph shrinks ~7× (54 → 8 tasks per chunk), reducing graph construction and scheduler overhead on large multi-chunk hindcasts.
2.1.0 (2026-05-01)¶
New Features¶
- Derived variables — new top-level
gridstats.derivedsubpackage computes secondary variables (wind speed/direction, current speed/direction, wave parameters, wave orbital velocity, sky conditions) directly from source data before stats are applied. - Modal direction — new
modal_directionstat implemented as a vectorised ufunc. - Zarr append / consolidate — new
appendandconsolidateoutput options enable parallel Zarr writes across tiled or distributed runs. - Custom dataset metadata — new
global_attrsfield on the output config block lets you inject arbitrary CF-compliant global attributes into the output dataset.
Bug Fixes¶
- Derived variables were silently dropped when
data_varswas set on a call; this is now fixed so derived variables survive variable selection.
Improvements¶
- Info-level logging added for loader calls and derived variable definitions, making it easier to trace pipeline execution.
- Docs restructured: configuration, loaders, and derived variables each have their own section with per-topic pages; MathJax support added for equation rendering.
2.0.0 (2026-03-25)¶
A near-complete rewrite that renamed the library from onstats to gridstats and replaced the legacy architecture with a modern, well-typed stack.
Breaking Changes¶
- Library renamed from
onstatstogridstats; all imports must be updated. - Configuration schema rewritten using Pydantic v2 — existing YAML configs require migration (see the Configuration docs).
slice_dictreplaced by explicitsel/iselfields on source config.- Source config now uses a discriminated union: set
urlpathfor xarray orcatalog+dataset_idfor intake — the oldtype:field is gone. - CLI commands changed:
gridstats run config.ymlandgridstats list-stats(powered by Typer instead of the old argparse CLI).
New Features¶
- Pipeline orchestrator — new
Pipelineclass handles loading, stat dispatch, tiling, directional sectorisation, and output in a single object. - Registry with entry-point plugins —
@register_stat/@register_loaderdecorators; third-party packages can contribute stats or loaders via thegridstats.stats/gridstats.loadersentry-point groups. - Spatial tiling —
tiles:on any call keeps peak memory bounded for large grids. - flox integration —
use_floxper-call toggle (defaultTrue); automatically disabled forquantileto avoid ~2× memory overhead. - open_kwargs on
XarraySourceConfig— pass arbitrary kwargs toxarray.open_dataset. - CF attributes —
gridstats/attributes.ymlmaps variable names to CF standard_name / long_name / units; applied automatically on output. - Packaging migrated to
pyproject.toml; mkdocs documentation site published to GitHub Pages; GitHub Actions CI/CD added.
1.2.0 (2022-06-26)¶
New Features¶
- CF attribute support — variable attributes (standard_name, long_name, units) can now be declared in config and are written to the output dataset.
- Global attributes — pipeline-level
global_attrsfor dataset-level metadata.
Bug Fixes¶
- Fixed crash with non-uniform chunk sizes during output writing.
1.1.0 (2022-05-17)¶
New Features¶
- Probability stats — new module with exceedance and non-exceedance probability
functions (
pexc,pcount). - Allow transposing, fixing dtype, and setting chunks on output variables.
- Dask cluster can now be enabled or disabled individually per call.
chunkscan be specified when opening datasets from an intake catalog.
Bug Fixes¶
- Fixed chunking when applying a land mask.
- Fixed bug in distribution function output.
Internal Changes¶
- Removed deprecated KMZ stats code.
1.0.0 (2021-11-25)¶
New Features¶
- Return period values (RPV) — new
rpvstats module supporting groupby time aggregations. - Distribution stats — histogram / empirical distribution functions.
- Count function — count non-NaN observations.
- Directional sectorisation —
nsector/ directional bin support for computing stats per direction sector. - Stepwise decorator — compute stats in time chunks to bound peak memory on large datasets.
- Dask cluster — explicit cluster setup with configurable worker count (including
all/halfshortcuts). - Logging throughout the pipeline.
- Refactored to a CLI entry point.
0.1.0 (2021-06-18)¶
First release on PyPI.