Aggregations

Standard reduction operations that wrap xarray's built-in methods. All except pcount accept a group option for temporal grouping; the pcount signature in the API reference does not list one.


mean

Arithmetic mean along dim.

- func: mean
  dim: time
  data_vars: [hs, tp, wspd]
  group: month           # optional
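
For orientation, the grouped mean above can be approximated in plain xarray. This is a minimal sketch, assuming that group: month corresponds to a groupby on the time.month accessor; the synthetic dataset and the variable values are illustrative and not taken from gridstats.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of daily synthetic data for a single variable named "hs" (illustrative).
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
ds = xr.Dataset(
    {"hs": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

# Ungrouped: the whole time axis collapses to one value per variable.
overall = ds.mean(dim="time")

# Grouped by month (assumed equivalent of `group: month`):
# one mean per calendar month, pooled across both years.
monthly = ds.groupby("time.month").mean(dim="time")

print(dict(monthly.sizes))
```

The grouped result no longer has a time dimension; it is indexed by the calendar month instead.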

max

Maximum value along dim.

- func: max
  dim: time
  data_vars: [hs]
  group: season

min

Minimum value along dim.

- func: min
  dim: time
  data_vars: [hs]

std

Standard deviation along dim.

- func: std
  dim: time
  data_vars: [hs, tp]

count

Count of non-NaN values along dim. Useful as a data-availability metric.

- func: count
  dim: time
  data_vars: [hs]
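
As a rough illustration of count as an availability metric, the sketch below applies xarray's Dataset.count to a small hand-made array. The point dimension and the values are hypothetical; the third point is missing at every step, as a land cell might be in a wave hindcast.

```python
import numpy as np
import xarray as xr

# Five time steps at three points; NaN marks a missing value.
hs = np.array([
    [1.0, np.nan, np.nan],
    [1.2, 0.8, np.nan],
    [np.nan, 0.9, np.nan],
    [1.1, 1.0, np.nan],
    [1.3, np.nan, np.nan],
])
ds = xr.Dataset({"hs": (("time", "point"), hs)})

# Number of non-NaN time steps at each point.
valid = ds.count(dim="time")

print(valid["hs"].values)  # [4 3 0]
```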

quantile

Quantiles at one or more levels.

Parameter  Type         Default   Description
q          list[float]  required  Quantile levels in [0, 1].

The output has a quantile dimension.

- func: quantile
  dim: time
  data_vars: [hs]
  q: [0.5, 0.75, 0.90, 0.95, 0.99]
  chunks:
    time: -1         # quantile requires the full time axis in one chunk
    latitude: 50
    longitude: 50
  tiles:
    latitude: 10     # process 10 rows at a time if memory is tight
  use_flox: false    # flox uses ~2× memory for quantile; disable on large grids

Note

quantile loads the entire time axis into memory per spatial chunk. Use tiles to limit peak memory usage on large grids.
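
The idea behind tiles can be sketched in plain xarray: reduce one slab of latitude rows at a time and stitch the results back together. The helper name quantile_in_tiles and its parameters are hypothetical, and this is not necessarily how gridstats implements tiling. The memory benefit only appears when the input is lazily loaded (for example dask- or file-backed); the in-memory array here just demonstrates that the tiled result matches the untiled one.

```python
import numpy as np
import xarray as xr


def quantile_in_tiles(ds, q, dim="time", tile_dim="latitude", tile_size=10):
    """Compute quantiles over `dim` one slab of `tile_dim` at a time."""
    parts = []
    for start in range(0, ds.sizes[tile_dim], tile_size):
        # Select at most `tile_size` rows so only this slab is reduced at once.
        tile = ds.isel({tile_dim: slice(start, start + tile_size)})
        parts.append(tile.quantile(q, dim=dim))
    # Stitch the per-tile results back together along the tiled dimension.
    return xr.concat(parts, dim=tile_dim)


# Synthetic grid: 20 time steps, 25 latitude rows, 4 longitude columns.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"hs": (("time", "latitude", "longitude"), rng.random((20, 25, 4)))},
    coords={"latitude": np.arange(25.0), "longitude": np.arange(4.0)},
)

tiled = quantile_in_tiles(ds, q=[0.5, 0.95])   # tiles of 10, 10 and 5 rows
whole = ds.quantile([0.5, 0.95], dim="time")   # single pass for comparison
```

Because the quantile is computed independently at each grid point, splitting along a spatial dimension does not change the result.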

flox memory usage

When flox is installed, xarray uses it for groupby reductions by default. This is faster for most operations, but flox's quantile implementation uses approximately 2× the memory of the native xarray path. On large grids (e.g. global or regional hindcasts) this can cause out-of-memory errors. Set use_flox: false on any quantile call that processes a large dataset.
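
In plain xarray, the flox path can be switched off with the use_flox option of xr.set_options. The sketch below shows a quantile reduction and a grouped quantile under that option; whether gridstats maps use_flox: false onto this option internally is an assumption, and the dataset is synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

# 101 daily values 0, 1, ..., 100 so the quantiles are easy to check by hand.
time = pd.date_range("2001-01-01", periods=101, freq="D")
ds = xr.Dataset({"hs": ("time", np.arange(101.0))}, coords={"time": time})

# Ungrouped quantiles: the result gains a "quantile" dimension.
q = ds.quantile([0.5, 0.9], dim="time")
print(q["hs"].values)  # [50. 90.]

# Grouped quantile with flox disabled for the duration of the block
# (assumed to be what `use_flox: false` achieves).
with xr.set_options(use_flox=False):
    by_month = ds.groupby("time.month").quantile([0.5], dim="time")
```

The option has no effect on the numerical result; it only selects which implementation performs the grouped reduction.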


pcount

Percentage of non-NaN values (0–100). Indicates data coverage.

- func: pcount
  dim: time
  data_vars: [hs, tp]
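
The percentage can be understood as the non-NaN count divided by the length of the reduced dimension, scaled to 0–100. The function pcount_sketch below is a hypothetical stand-in written for illustration, not the gridstats implementation.

```python
import numpy as np
import xarray as xr


def pcount_sketch(data, dim="time"):
    """Percentage of non-NaN values along `dim` (illustrative only)."""
    return 100.0 * data.count(dim=dim) / data.sizes[dim]


# Three valid values out of five time steps.
ds = xr.Dataset({"hs": ("time", [1.0, np.nan, 3.0, np.nan, 5.0])})
coverage = pcount_sketch(ds)

print(coverage["hs"].item())  # 60.0
```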

API reference

gridstats.ops.aggregations.mean(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset

Arithmetic mean along a dimension.

Wraps xr.Dataset.mean. Any extra keyword arguments are forwarded to xarray.

Parameters:

Name      Type        Default   Description
data      Dataset     required  Input dataset.
dim       str         'time'    Dimension to reduce along.
group     str | None  None      Time component for grouped climatology: 'month', 'season', or 'year'.
**kwargs  Any         {}        Forwarded to xr.Dataset.mean (e.g. skipna, keep_attrs).

Returns:

Type     Description
Dataset  Reduced dataset. Gains a group dimension when group is set.
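
To make the wrapper pattern concrete, here is a hypothetical re-implementation with the same signature shape. It assumes that group is applied as a groupby on the corresponding datetime accessor of dim and that extra keyword arguments are passed straight through; the real gridstats code may differ.

```python
from typing import Any, Optional

import numpy as np
import pandas as pd
import xarray as xr


def mean_sketch(
    data: xr.Dataset,
    *,
    dim: str = "time",
    group: Optional[str] = None,
    **kwargs: Any,
) -> xr.Dataset:
    """Illustrative stand-in for a grouped-mean wrapper (not gridstats itself)."""
    if group is None:
        # Plain reduction; kwargs such as skipna or keep_attrs go to xarray.
        return data.mean(dim=dim, **kwargs)
    # Grouped reduction, e.g. "time.season" for group="season".
    return data.groupby(f"{dim}.{group}").mean(dim=dim, **kwargs)


# One year of daily synthetic data with a dataset-level attribute.
time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
ds = xr.Dataset(
    {"hs": ("time", np.ones(time.size))},
    coords={"time": time},
    attrs={"source": "synthetic"},
)

plain = mean_sketch(ds, keep_attrs=True)
seasonal = mean_sketch(ds, group="season")
```

Here plain is reduced to a single value per variable, while seasonal carries a season dimension with four entries.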

gridstats.ops.aggregations.max(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset

Maximum value along a dimension.

Wraps xr.Dataset.max. Any extra keyword arguments are forwarded to xarray.

Parameters:

Name      Type        Default   Description
data      Dataset     required  Input dataset.
dim       str         'time'    Dimension to reduce along.
group     str | None  None      Time component for grouped climatology: 'month', 'season', or 'year'.
**kwargs  Any         {}        Forwarded to xr.Dataset.max (e.g. skipna, keep_attrs).

Returns:

Type     Description
Dataset  Reduced dataset. Gains a group dimension when group is set.

gridstats.ops.aggregations.min(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset

Minimum value along a dimension.

Wraps xr.Dataset.min. Any extra keyword arguments are forwarded to xarray.

Parameters:

Name      Type        Default   Description
data      Dataset     required  Input dataset.
dim       str         'time'    Dimension to reduce along.
group     str | None  None      Time component for grouped climatology: 'month', 'season', or 'year'.
**kwargs  Any         {}        Forwarded to xr.Dataset.min (e.g. skipna, keep_attrs).

Returns:

Type     Description
Dataset  Reduced dataset. Gains a group dimension when group is set.

gridstats.ops.aggregations.std(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset

Standard deviation along a dimension.

Wraps xr.Dataset.std. Any extra keyword arguments are forwarded to xarray.

Parameters:

Name      Type        Default   Description
data      Dataset     required  Input dataset.
dim       str         'time'    Dimension to reduce along.
group     str | None  None      Time component for grouped climatology: 'month', 'season', or 'year'.
**kwargs  Any         {}        Forwarded to xr.Dataset.std (e.g. skipna, ddof).

Returns:

Type     Description
Dataset  Reduced dataset. Gains a group dimension when group is set.
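
The ddof keyword mentioned above changes the divisor used in the variance. The sketch below calls xr.Dataset.std directly on a tiny synthetic dataset to show the difference between the xarray default (ddof=0, population) and ddof=1 (sample); it does not go through gridstats.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"hs": ("time", [1.0, 2.0, 3.0, 4.0])})

# Default ddof=0: divide the summed squared deviations by N.
population = ds.std(dim="time")

# ddof=1: divide by N - 1, the usual sample estimate.
sample = ds.std(dim="time", ddof=1)

print(population["hs"].item(), sample["hs"].item())
```

For these four values the summed squared deviation is 5, so the two results are sqrt(5/4) and sqrt(5/3).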

gridstats.ops.aggregations.count(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset

Count of non-NaN values along a dimension.

Wraps xr.Dataset.count. Useful as a data-availability metric.

Parameters:

Name      Type        Default   Description
data      Dataset     required  Input dataset.
dim       str         'time'    Dimension to reduce along.
group     str | None  None      Time component for grouped climatology: 'month', 'season', or 'year'.
**kwargs  Any         {}        Forwarded to xr.Dataset.count.

Returns:

Type     Description
Dataset  Reduced dataset with integer counts. Gains a group dimension when group is set.

gridstats.ops.aggregations.quantile(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, q: list[float], **kwargs: Any) -> xr.Dataset

Quantiles along a dimension.

Wraps xr.Dataset.quantile.

Note

Quantile computation requires the entire time axis to be in memory. Use chunks: {time: -1} together with tiles on the call to control peak memory usage on large grids. On large grids, also set use_flox: false on the call, because flox's quantile path uses roughly 2× the memory of the native xarray implementation.

Parameters:

Name      Type         Default   Description
data      Dataset      required  Input dataset.
dim       str          'time'    Dimension to reduce along.
group     str | None   None      Time component for grouped climatology: 'month', 'season', or 'year'.
q         list[float]  required  Quantile level(s) to compute, in [0, 1].
**kwargs  Any          {}        Forwarded to xr.Dataset.quantile (e.g. method, keep_attrs).

Returns:

Type     Description
Dataset  Reduced dataset with a quantile dimension. Gains a group dimension when group is set.

gridstats.ops.aggregations.pcount(data: xr.Dataset, *, dim: str = 'time', **kwargs) -> xr.Dataset

Percentage of non-NaN values along a dimension.

Values are in [0, 100]. Useful for reporting data coverage.

Parameters:

Name  Type     Default   Description
data  Dataset  required  Input dataset.
dim   str      'time'    Dimension to reduce along.

Returns:

Type     Description
Dataset  Dataset with values in [0, 100].