Aggregations
Standard reduction operations wrapping xarray's built-in methods. All except pcount support temporal grouping via group, as the signatures in the API reference show.
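To make the effect of group concrete, here is a plain-Python sketch (no xarray) of a grouped climatology. It assumes a month-style grouping; the values that group actually accepts are not listed in this section, and `grouped_mean` is a hypothetical helper, not part of gridstats.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def grouped_mean(times, values, component="month"):
    """Average the values that share a time component (e.g. calendar month)."""
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        buckets[getattr(t, component)].append(v)
    # One mean per distinct component value, in sorted order.
    return {key: fmean(vals) for key, vals in sorted(buckets.items())}


# Hypothetical samples: two Januaries and two Julys across two years.
times = [
    datetime(2020, 1, 15),
    datetime(2020, 7, 15),
    datetime(2021, 1, 15),
    datetime(2021, 7, 15),
]
hs = [3.0, 1.0, 5.0, 2.0]

climatology = grouped_mean(times, hs)
print(climatology)  # {1: 4.0, 7: 1.5}
```

Without grouping, the four samples collapse to a single mean; with grouping, the result keeps one value per month.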
mean
Arithmetic mean along dim.
max
Maximum value along dim.
min
Minimum value along dim.
std
Standard deviation along dim.
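One detail worth checking when comparing results with other tools is the divisor. xarray's `std` defaults to `ddof=0` (population standard deviation); the sketch below assumes gridstats leaves that default unchanged, which this section does not confirm. It uses only the standard library to show the difference between the two conventions.

```python
from statistics import pstdev, stdev

hs = [1.0, 2.0, 3.0, 4.0]  # hypothetical values along the reduced dimension

population_std = pstdev(hs)  # divides by N, equivalent to ddof=0
sample_std = stdev(hs)       # divides by N - 1, equivalent to ddof=1

print(round(population_std, 4), round(sample_std, 4))
```

If the sample estimate is needed, passing `ddof=1` as an extra keyword argument is the natural route, given that extra keyword arguments are forwarded to xarray.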
count
Count of non-NaN values along dim. Useful as a data-availability metric.
quantile
Quantiles at one or more levels.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `q` | `list[float]` | required | Quantile levels in [0, 1]. |
The output has a quantile dimension.
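As an illustration of what ends up along the quantile dimension, the following plain-Python sketch computes several levels for a single grid cell. It assumes the linear interpolation that xarray uses by default; `linear_quantile` is a hypothetical helper, and NaN handling is left out for brevity.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


hs = [4.0, 1.0, 3.0, 2.0]  # hypothetical time series at one grid cell
levels = [0.5, 0.75, 1.0]

# One value per requested level: this is the extra "quantile" axis.
result = {q: linear_quantile(hs, q) for q in levels}
print(result)
```

Each requested level adds one entry, so a call with five levels yields an output five times the size of a single-statistic reduction such as mean.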

    - func: quantile
      dim: time
      data_vars: [hs]
      q: [0.5, 0.75, 0.90, 0.95, 0.99]
      chunks:
        time: -1  # quantile requires the full time axis in one chunk
        latitude: 50
        longitude: 50
      tiles:
        latitude: 10  # process 10 rows at a time if memory is tight
      use_flox: false  # flox uses ~2× memory for quantile; disable on large grids

Note
quantile loads the entire time axis into memory per spatial chunk. Use tiles to limit peak memory usage on large grids.
flox memory usage
When flox is installed, xarray uses it for groupby reductions by default. For most operations this is faster, but flox's quantile implementation uses approximately 2× the memory of the native xarray path. On large grids (e.g. global or regional hindcasts) this can cause out-of-memory errors. Set use_flox: false on any quantile call that processes a large dataset.
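A quick back-of-the-envelope calculation helps when choosing chunk and tile sizes. The sketch below estimates only the size of the input held for one spatial block with the full time axis loaded, assuming float64 data and a hypothetical 40-year hourly record. How tiles maps onto chunks inside gridstats is not described here, so the 10-row slab is just an illustrative smaller working set, and real peak usage will be higher than these input sizes.

```python
def block_bytes(n_time, n_lat, n_lon, itemsize=8):
    """Bytes held by one block that spans the full time axis."""
    return n_time * n_lat * n_lon * itemsize


# Hypothetical 40-year hourly hindcast stored as float64.
n_time = 40 * 365 * 24

full_chunk = block_bytes(n_time, 50, 50)  # latitude: 50, longitude: 50
slab = block_bytes(n_time, 10, 50)        # 10 latitude rows at a time

for label, nbytes in [("50 x 50 chunk", full_chunk), ("10 x 50 slab", slab)]:
    gib = nbytes / 2**30
    # The factor of 2 reflects the approximate flox overhead quoted above.
    print(f"{label}: {gib:.2f} GiB input, roughly {2 * gib:.2f} GiB with flox")
```

The estimate scales linearly with each spatial extent, so cutting the number of latitude rows by five cuts the working set by the same factor.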
pcount
Percentage of non-NaN values (0–100). Indicates data coverage.
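The relationship between count and pcount can be shown with a few lines of plain Python. This sketch assumes the percentage is taken relative to the full length of the reduced dimension; `count_valid` and `percent_valid` are hypothetical names used for illustration, not the gridstats functions themselves.

```python
import math


def count_valid(values):
    """Number of entries that are not NaN."""
    return sum(1 for v in values if not math.isnan(v))


def percent_valid(values):
    """Share of non-NaN entries, expressed on a 0 to 100 scale."""
    return 100.0 * count_valid(values) / len(values)


hs = [1.2, math.nan, 0.8, math.nan, 1.5]  # hypothetical series with gaps

n_valid = count_valid(hs)
coverage = percent_valid(hs)
print(n_valid, coverage)  # 3 60.0
```

The percentage is easier to compare across datasets with different record lengths, while the raw count is what matters when a statistic needs a minimum sample size.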
API reference
gridstats.ops.aggregations.mean(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset
Arithmetic mean along a dimension.
Wraps xr.Dataset.mean.
Any extra keyword arguments are forwarded to xarray.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `dim` | `str` | Dimension to reduce along. | `'time'` |
| `group` | `str \| None` | Time component for grouped climatology: | `None` |
| `**kwargs` | `Any` | Forwarded to | `{}` |

Returns:

| Type | Description |
|---|---|
| `Dataset` | Reduced dataset. Gains a |
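The keyword forwarding described for mean can be pictured with a stand-in object. The sketch below is not gridstats' source: `RecordingDataset` and `mean_wrapper` are hypothetical, and the stand-in simply records the arguments it receives so the forwarding is visible without xarray installed. `skipna` is used as the example because it is a real keyword of xarray's `mean`.

```python
from typing import Any


class RecordingDataset:
    """Stand-in for xr.Dataset that records how mean() is called."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def mean(self, dim: Any = None, **kwargs: Any) -> "RecordingDataset":
        self.calls.append({"dim": dim, **kwargs})
        return self


def mean_wrapper(data: Any, *, dim: str = "time", **kwargs: Any) -> Any:
    """Hypothetical thin wrapper: pass dim and any extras straight through."""
    return data.mean(dim=dim, **kwargs)


ds = RecordingDataset()
mean_wrapper(ds, skipna=False)
print(ds.calls)
```

Because the wrapper does not inspect the extra keywords, any option the underlying method accepts can be supplied from the call site without changes to the wrapper.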
gridstats.ops.aggregations.max(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset
Maximum value along a dimension.
Wraps xr.Dataset.max.
Any extra keyword arguments are forwarded to xarray.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `dim` | `str` | Dimension to reduce along. | `'time'` |
| `group` | `str \| None` | Time component for grouped climatology: | `None` |
| `**kwargs` | `Any` | Forwarded to | `{}` |

Returns:

| Type | Description |
|---|---|
| `Dataset` | Reduced dataset. Gains a |
gridstats.ops.aggregations.min(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset
Minimum value along a dimension.
Wraps xr.Dataset.min.
Any extra keyword arguments are forwarded to xarray.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `dim` | `str` | Dimension to reduce along. | `'time'` |
| `group` | `str \| None` | Time component for grouped climatology: | `None` |
| `**kwargs` | `Any` | Forwarded to | `{}` |

Returns:

| Type | Description |
|---|---|
| `Dataset` | Reduced dataset. Gains a |
gridstats.ops.aggregations.std(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset
Standard deviation along a dimension.
Wraps xr.Dataset.std.
Any extra keyword arguments are forwarded to xarray.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `dim` | `str` | Dimension to reduce along. | `'time'` |
| `group` | `str \| None` | Time component for grouped climatology: | `None` |
| `**kwargs` | `Any` | Forwarded to | `{}` |

Returns:

| Type | Description |
|---|---|
| `Dataset` | Reduced dataset. Gains a |
gridstats.ops.aggregations.count(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, **kwargs: Any) -> xr.Dataset
Count of non-NaN values along a dimension.
Wraps xr.Dataset.count.
Useful as a data-availability metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `dim` | `str` | Dimension to reduce along. | `'time'` |
| `group` | `str \| None` | Time component for grouped climatology: | `None` |
| `**kwargs` | `Any` | Forwarded to | `{}` |

Returns:

| Type | Description |
|---|---|
| `Dataset` | Reduced dataset with integer counts. Gains a |
gridstats.ops.aggregations.quantile(data: xr.Dataset, *, dim: str = 'time', group: str | None = None, q: list[float], **kwargs: Any) -> xr.Dataset
Quantiles along a dimension.
Wraps xr.Dataset.quantile.
Note
Quantile computation requires the entire time axis to be in memory.
Use chunks: {time: -1} together with tiles on the call to control
peak memory usage on large grids. On large grids also set
use_flox: false on the call — flox's quantile path uses roughly 2×
the memory of the native xarray implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `dim` | `str` | Dimension to reduce along. | `'time'` |
| `group` | `str \| None` | Time component for grouped climatology: | `None` |
| `q` | `list[float]` | Quantile level(s) to compute, in [0, 1]. | required |
| `**kwargs` | `Any` | Forwarded to | `{}` |

Returns:

| Type | Description |
|---|---|
| `Dataset` | Reduced dataset with a `quantile` dimension. |

gridstats.ops.aggregations.pcount(data: xr.Dataset, *, dim: str = 'time', **kwargs) -> xr.Dataset
Percentage of non-NaN values along a dimension.
Values are in [0, 100]. Useful for reporting data coverage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `dim` | `str` | Dimension to reduce along. | `'time'` |

Returns:

| Type | Description |
|---|---|
| `Dataset` | Dataset with values in [0, 100]. |