Distributions

Joint histograms (counts) of two or three variables. Results are not normalised — they are raw integer counts that can be normalised downstream.
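Because the outputs are raw counts, any normalisation happens downstream. The snippet below is a minimal numpy sketch, assuming the counts have already been pulled out of the returned dataset as an array (e.g. from the `dist` or `dist2` variable). `normalise` is a hypothetical helper, not part of gridstats.

```python
import numpy as np


def normalise(counts, axis=None, percent=False):
    """Hypothetical helper: convert raw histogram counts to relative frequencies.

    axis=None normalises over the whole array so it sums to 1 (or 100 with
    percent=True). Passing the bin axes, e.g. axis=(1, 2) for counts shaped
    (month, var1, var2), normalises each group separately.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=axis, keepdims=True)
    # Empty groups (total == 0) are left as zeros instead of producing NaN.
    freq = np.divide(counts, total, out=np.zeros_like(counts), where=total > 0)
    return freq * 100.0 if percent else freq


counts = np.array([[2, 0], [1, 1]])
print(normalise(counts))  # each cell as a fraction of all 4 samples
```

Normalising per group rather than over the whole array is a choice for the caller: it turns a monthly distribution into a set of per-month probability tables.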

Bin specification

All distribution functions accept bin specifications as either:

- A dict with `start`, `stop` (optional) and `step` keys, equivalent to `numpy.arange(start, stop + step, step)`. If `stop` is omitted, it is inferred from the data maximum.
- A list or array of explicit bin edges.

bins1: {start: 0, step: 0.5}           # 0, 0.5, 1.0, ...  (stop from data)
bins2: {start: 0, stop: 20, step: 1.0} # 0, 1, 2, ..., 20
bins3: [0, 90, 180, 270, 360]           # explicit edges
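The expansion rule above can be sketched as follows. `bin_edges` is a hypothetical helper written for illustration; it follows the documented `numpy.arange(start, stop + step, step)` rule, but the library's own code may differ in details such as NaN handling.

```python
import numpy as np


def bin_edges(spec, data=None):
    """Hypothetical helper: expand a bin specification into explicit edges.

    A dict maps to numpy.arange(start, stop + step, step), with stop
    defaulting to the data maximum; a list or array is used as-is.
    """
    if isinstance(spec, dict):
        start, step = spec["start"], spec["step"]
        stop = spec.get("stop")
        if stop is None:
            if data is None:
                raise ValueError("'stop' omitted: data is needed to infer it")
            # Assumption: NaNs are ignored when inferring the maximum.
            stop = float(np.nanmax(data))
        return np.arange(start, stop + step, step)
    # Explicit edges are passed through unchanged.
    return np.asarray(spec, dtype=float)


print(bin_edges({"start": 0, "stop": 20, "step": 1.0}))  # 0, 1, ..., 20
print(bin_edges({"start": 0, "step": 0.5}, data=[0.1, 1.2]))
print(bin_edges([0, 90, 180, 270, 360]))
```

As with any `numpy.arange` call that uses a fractional step, floating-point rounding can occasionally produce one extra edge; an explicit list of edges avoids that ambiguity.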

`distribution3`

3-D joint histogram over three variables (typically Hs × Tp × Dir).

Parameter	Type	Default	Description
`var1`	str	`"hs"`	First variable name.
`var2`	str	`"tp"`	Second variable name.
`var3`	str	`"dpm"`	Third variable name.
`bins1/2/3`	dict or list	`{start: 0, step: 0.5}` / `{start: 0, step: 1.0}` / `{start: 0, stop: 360, step: 45}`	Bin specifications for each variable.
`isdir1/2/3`	bool	`false/false/true`	Whether each variable is directional (values wrap at 360°).
`group`	str	`null`	Time grouping (e.g. `month` for monthly climatological distributions).

Output variable: dist with dimensions (var1, var2, var3).

- func: distribution3
  dim: time
  data_vars: [hs, tp, dpm]
  var1: hs
  var2: tp
  var3: dpm
  bins1: {start: 0, step: 0.5}
  bins2: {start: 0, step: 1.0}
  bins3: {start: 0, stop: 360, step: 45}
  isdir3: true
  group: month
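The counting that `distribution3` describes can be sketched with `numpy.histogramdd`. This is an illustration, not the library's implementation: the modulo-360 wrap for directional variables, dropping samples that contain NaN, and silently discarding values outside the outermost edges (numpy's behaviour) are all assumptions of the sketch. `joint_histogram` is a hypothetical name.

```python
import numpy as np


def joint_histogram(values, edges, isdir):
    """Sketch of an N-D joint histogram of raw counts.

    values: list of 1-D arrays of equal length (e.g. [hs, tp, dpm]).
    edges:  list of increasing 1-D bin-edge arrays, one per variable.
    isdir:  list of bools; directional variables are wrapped into [0, 360)
            with a modulo (an assumption about how wrapping is handled).
    Returns (counts, centres); centres are the bin-centre coordinates.
    """
    cols = []
    for v, d in zip(values, isdir):
        v = np.asarray(v, dtype=float)
        cols.append(np.mod(v, 360.0) if d else v)
    sample = np.column_stack(cols)
    # Drop samples with NaN in any variable before binning.
    sample = sample[~np.isnan(sample).any(axis=1)]
    # Values outside the outermost edges are not counted by histogramdd.
    counts, _ = np.histogramdd(sample, bins=edges)
    centres = [0.5 * (e[:-1] + e[1:]) for e in edges]
    return counts.astype(np.int64), centres
```

For example, with `hs = [0.2, 0.7, 0.7]`, `tp = [3.5, 4.2, 4.2]` and `dpm = [10, 370, 100]`, the 370° sample wraps to 10° and falls in the same direction bin as the first sample. A `group` option would amount to running the same count once per group.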

`distribution2`

2-D joint histogram over two variables (typically speed × direction).

Parameter	Type	Default	Description
`var1`	str	`"wspd"`	First variable name.
`var2`	str	`"wdir"`	Second variable name.
`bins1/2`	dict or list	`{start: 0, step: 1.0}` / `{start: 0, stop: 360, step: 45}`	Bin specifications for each variable.
`isdir1/2`	bool	`false/true`	Whether each variable is directional.
`group`	str	`null`	Time grouping.

Output variable: dist2 with dimensions (var1, var2).

- func: distribution2
  dim: time
  data_vars: [wspd, wdir]
  var1: wspd
  var2: wdir
  bins1: {start: 0, step: 1.0}
  bins2: {start: 0, stop: 360, step: 22.5}
  isdir2: true
  group: month
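To illustrate what `group: month` means for a speed × direction histogram, here is a minimal numpy sketch that counts separately for each calendar month. Returning a fixed length-12 month axis, and wrapping directions with a modulo, are assumptions of this sketch; `monthly_distribution2` is a hypothetical name and the library may only include months present in the data.

```python
import numpy as np


def monthly_distribution2(times, speed, direction, speed_edges, dir_edges):
    """Sketch of a 2-D speed x direction histogram grouped by calendar month.

    Returns int counts shaped (12, n_speed_bins, n_dir_bins); index 0 is
    January. Directions are wrapped into [0, 360) (an assumption).
    """
    times = np.asarray(times, dtype="datetime64[ns]")
    # datetime64[M] counts months since 1970-01, so mod 12 gives 0 = January.
    month = times.astype("datetime64[M]").astype(np.int64) % 12
    speed = np.asarray(speed, dtype=float)
    direction = np.mod(np.asarray(direction, dtype=float), 360.0)
    counts = np.zeros((12, len(speed_edges) - 1, len(dir_edges) - 1), dtype=np.int64)
    for m in range(12):
        sel = month == m
        if not sel.any():
            continue  # months without data keep all-zero counts
        h, _, _ = np.histogram2d(speed[sel], direction[sel], bins=[speed_edges, dir_edges])
        counts[m] = h.astype(np.int64)
    return counts
```

With `dir_edges = np.arange(0, 360 + 22.5, 22.5)`, this mirrors the 22.5° direction sectors used in the configuration example.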

`distribution3_timestep`

Memory-efficient variant of distribution3 that accumulates histogram counts in time chunks. Suitable for very long timeseries where loading the full dataset at once would exceed memory limits.

All parameters are the same as distribution3, plus:

Parameter	Type	Default	Description
`freq`	str	`"30d"`	Time chunk size as a pandas frequency string (e.g. `"10d"`, `"1ME"`).

- func: distribution3_timestep
  dim: time
  data_vars: [hs, tp, dpm]
  var1: hs
  var2: tp
  var3: dpm
  bins1: {start: 0, step: 0.5}
  bins2: {start: 0, step: 1.0}
  bins3: {start: 0, stop: 360, step: 45}
  group: month
  freq: 10d           # load 10 days at a time
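Chunked accumulation works because histogram counts are additive: if every window is binned against the same edges, summing the per-window counts gives exactly the single-pass result. The sketch below shows that idea with in-memory numpy arrays. The fixed `np.timedelta64` window stands in for a pandas frequency string such as `"10d"`, and `chunked_histogram` is a hypothetical name. In the real function, each window would presumably be read from storage rather than sliced from memory.

```python
import numpy as np


def chunked_histogram(times, values, edges, window):
    """Sketch: accumulate an N-D histogram over consecutive time windows.

    times:  1-D datetime64 array aligned with each array in values.
    values: list of 1-D arrays (e.g. [hs, tp, dpm]).
    edges:  list of bin-edge arrays, fixed before the loop so per-window
            counts share the same bins and can simply be summed.
    window: np.timedelta64 window length (stand-in for freq="10d").
    """
    times = np.asarray(times, dtype="datetime64[ns]")
    values = [np.asarray(v, dtype=float) for v in values]
    total = np.zeros([len(e) - 1 for e in edges], dtype=np.int64)
    start, end = times.min(), times.max()
    while start <= end:
        sel = (times >= start) & (times < start + window)
        if sel.any():
            # Out of core, only this window's data would be loaded here.
            chunk = np.column_stack([v[sel] for v in values])
            h, _ = np.histogramdd(chunk, bins=edges)
            total += h.astype(np.int64)
        start = start + window
    return total
```

Because the edges must be known before the first window is processed, a `stop` inferred from the data maximum needs that maximum up front; explicit `stop` values sidestep the question.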

Tip

Prefer distribution3 for datasets that fit comfortably in memory: it is faster because it avoids repeated I/O. Reserve distribution3_timestep for datasets too large to load at once, such as multi-decade timeseries.

API reference

`gridstats.ops.distribution.distribution2(data: xr.Dataset, *, dim: str = 'time', var1: str = 'wspd', var2: str = 'wdir', bins1: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins2: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = True, group: str | None = None, **kwargs: Any) -> xr.Dataset`

2-D joint histogram over two variables (e.g. speed × direction).

Results are raw integer counts (not normalised).

Parameters:

Name	Type	Description	Default
`data`	`Dataset`	Input dataset containing `var1`, `var2`.	required
`dim`	`str`	Time dimension name.	`'time'`
`var1`	`str`	Name of the first variable (default `'wspd'`).	`'wspd'`
`var2`	`str`	Name of the second variable, often directional (default `'wdir'`).	`'wdir'`
`bins1`	`dict[str, Any] \| list`	Bin specification for `var1`.	`{'start': 0, 'step': 1.0}`
`bins2`	`dict[str, Any] \| list`	Bin specification for `var2`.	`{'start': 0, 'stop': 360, 'step': 45}`
`isdir1`	`bool`	Whether `var1` is directional.	`False`
`isdir2`	`bool`	Whether `var2` is directional (default `True`).	`True`
`group`	`str \| None`	Time component to group by (e.g. `'month'`).	`None`

Returns:

Type	Description
`Dataset`	Dataset with variable `dist2` and dimensions `(var1, var2)`. Coordinates are bin-centre values.

`gridstats.ops.distribution.distribution3(data: xr.Dataset, *, dim: str = 'time', var1: str = 'hs', var2: str = 'tp', var3: str = 'dpm', bins1: dict[str, Any] | list = {'start': 0, 'step': 0.5}, bins2: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins3: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = False, isdir3: bool = True, group: str | None = None, **kwargs: Any) -> xr.Dataset`

3-D joint histogram over three variables (e.g. Hs × Tp × Dir).

Results are raw integer counts (not normalised). Bin specifications can be a dict with start, stop (optional, inferred from data max), and step keys, or a plain list of explicit bin edges.

Parameters:

Name	Type	Description	Default
`data`	`Dataset`	Input dataset containing `var1`, `var2`, `var3`.	required
`dim`	`str`	Time dimension name.	`'time'`
`var1`	`str`	Name of the first variable (default `'hs'`).	`'hs'`
`var2`	`str`	Name of the second variable (default `'tp'`).	`'tp'`
`var3`	`str`	Name of the third variable, often directional (default `'dpm'`).	`'dpm'`
`bins1`	`dict[str, Any] \| list`	Bin specification for `var1`.	`{'start': 0, 'step': 0.5}`
`bins2`	`dict[str, Any] \| list`	Bin specification for `var2`.	`{'start': 0, 'step': 1.0}`
`bins3`	`dict[str, Any] \| list`	Bin specification for `var3`.	`{'start': 0, 'stop': 360, 'step': 45}`
`isdir1`	`bool`	Whether `var1` is directional (wraps at 360°).	`False`
`isdir2`	`bool`	Whether `var2` is directional.	`False`
`isdir3`	`bool`	Whether `var3` is directional (default `True`).	`True`
`group`	`str \| None`	Time component to group by (e.g. `'month'`).	`None`

Returns:

Type	Description
`Dataset`	Dataset with variable `dist` and dimensions `(var1, var2, var3)`. Coordinates are bin-centre values.

`gridstats.ops.distribution.distribution3_timestep(data: xr.Dataset, *, dim: str = 'time', var1: str = 'hs', var2: str = 'tp', var3: str = 'dpm', bins1: dict[str, Any] | list = {'start': 0, 'step': 0.5}, bins2: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins3: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = False, isdir3: bool = True, freq: str = '30d', group: str | None = None, **kwargs: Any) -> xr.Dataset`

3-D joint histogram accumulated over time chunks to limit memory use.

Splits the time axis into windows of size freq, computes a histogram for each window, then sums the counts. Suitable for multi-decade datasets that are too large to load into memory in one pass.

Prefer distribution3 when the dataset fits comfortably in memory — it is faster because it avoids repeated I/O.

Parameters:

Name	Type	Description	Default
`data`	`Dataset`	Input dataset containing `var1`, `var2`, `var3`.	required
`dim`	`str`	Time dimension name.	`'time'`
`var1`	`str`	Name of the first variable (default `'hs'`).	`'hs'`
`var2`	`str`	Name of the second variable (default `'tp'`).	`'tp'`
`var3`	`str`	Name of the third variable (default `'dpm'`).	`'dpm'`
`bins1`	`dict[str, Any] \| list`	Bin specification for `var1`.	`{'start': 0, 'step': 0.5}`
`bins2`	`dict[str, Any] \| list`	Bin specification for `var2`.	`{'start': 0, 'step': 1.0}`
`bins3`	`dict[str, Any] \| list`	Bin specification for `var3`.	`{'start': 0, 'stop': 360, 'step': 45}`
`isdir1`	`bool`	Whether `var1` is directional.	`False`
`isdir2`	`bool`	Whether `var2` is directional.	`False`
`isdir3`	`bool`	Whether `var3` is directional (default `True`).	`True`
`freq`	`str`	Pandas-compatible frequency string for time-chunking (e.g. `'30d'`, `'1ME'`).	`'30d'`
`group`	`str \| None`	Time component to group by (e.g. `'month'`).	`None`

Returns:

Type	Description
`Dataset`	Dataset with accumulated joint distribution counts, with the same structure as `distribution3`.
