Distributions

Joint histograms (counts) of two or three variables. Results are not normalised — they are raw integer counts that can be normalised downstream.
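As a rough illustration of normalising downstream, the sketch below turns a small table of raw counts into joint probabilities. The `counts` array is made-up data, not output from the library.

```python
import numpy as np

# Hypothetical 2-D table of raw counts (rows: speed bins, columns: direction bins).
counts = np.array([[4, 0, 1],
                   [2, 3, 0]])

total = counts.sum()
# Divide by the total to get joint probabilities; guard against an empty table.
prob = counts / total if total > 0 else np.zeros(counts.shape, dtype=float)
```

Keeping the stored result as integer counts means histograms from different periods or sites can be summed first and normalised once at the end.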


Bin specification

All distribution functions accept bin specifications as either:

  • A dict with start, stop (optional) and step keys, equivalent to numpy.arange(start, stop+step, step). If stop is omitted, it is inferred from the data maximum.
  • A list or array of explicit bin edges.

bins1: {start: 0, step: 0.5}           # 0, 0.5, 1.0, ...  (stop from data)
bins2: {start: 0, stop: 20, step: 1.0} # 0, 1, 2, ..., 20
bins3: [0, 90, 180, 270, 360]           # explicit edges
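To make the two forms concrete, here is a minimal sketch of how a specification could be expanded into an array of edges. The helper name `bin_edges` and its `data` argument are hypothetical and are not part of the library; the dict branch simply applies the numpy.arange rule stated above.

```python
import numpy as np

def bin_edges(spec, data=None):
    """Expand a bin specification into an array of bin edges (illustrative helper)."""
    if not isinstance(spec, dict):
        # Explicit edges are used as given.
        return np.asarray(spec, dtype=float)
    start, step = spec["start"], spec["step"]
    stop = spec.get("stop")
    if stop is None:
        # Assumption: a missing stop falls back to the maximum of the data.
        stop = float(np.nanmax(data))
    return np.arange(start, stop + step, step)

edges_fixed = bin_edges({"start": 0, "stop": 20, "step": 1.0})
edges_from_data = bin_edges({"start": 0, "step": 0.5}, data=np.array([0.3, 2.2]))
edges_explicit = bin_edges([0, 90, 180, 270, 360])
```

With a data maximum of 2.2 and a step of 0.5, the last edge lands at 2.5, so the largest value still falls inside a bin. Steps that are not exactly representable in floating point (for example 0.1) can make numpy.arange produce one edge more or fewer than expected, in which case explicit edges are the safer choice.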

distribution3

3-D joint histogram over three variables (typically Hs × Tp × Dir).

Parameter   Type          Default            Description
var1        str           "hs"               First variable name.
var2        str           "tp"               Second variable name.
var3        str           "dpm"              Third variable name.
bins1/2/3   dict or list  see API reference  Bin specifications for each variable.
isdir1/2/3  bool          false/false/true   Whether each variable is directional (values wrap at 360°).
group       str           null               Time grouping (e.g. month for monthly climatological distributions).

Output variable: dist with dimensions (var1, var2, var3).

- func: distribution3
  dim: time
  data_vars: [hs, tp, dpm]
  var1: hs
  var2: tp
  var3: dpm
  bins1: {start: 0, step: 0.5}
  bins2: {start: 0, step: 1.0}
  bins3: {start: 0, stop: 360, step: 45}
  isdir3: true
  group: month
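For readers who want to see what a 3-D joint histogram of this kind contains, the sketch below builds one with numpy.histogramdd on synthetic data. It illustrates the concept only and is not the library's implementation; the random inputs and fixed stop values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
hs = rng.uniform(0.0, 4.0, n)      # synthetic significant wave height
tp = rng.uniform(2.0, 16.0, n)     # synthetic peak period
dpm = rng.uniform(0.0, 360.0, n)   # synthetic direction in degrees

# Edges built with the arange(start, stop + step, step) rule.
edges = [
    np.arange(0.0, 4.0 + 0.5, 0.5),       # Hs: 8 bins
    np.arange(0.0, 16.0 + 1.0, 1.0),      # Tp: 16 bins
    np.arange(0.0, 360.0 + 45.0, 45.0),   # Dir: 8 bins
]

# histogramdd expects one row per sample and one column per variable.
counts, _ = np.histogramdd(np.column_stack([hs, tp, dpm]), bins=edges)
counts = counts.astype(int)   # raw integer counts, not normalised
```

Each cell of `counts` holds the number of time steps whose three values fall in that combination of bins, so the cells sum to the number of samples that lie within the edges.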

distribution2

2-D joint histogram over two variables (typically speed × direction).

Parameter  Type          Default            Description
var1       str           "wspd"             First variable name.
var2       str           "wdir"             Second variable name.
bins1/2    dict or list  see API reference  Bin specifications.
isdir1/2   bool          false/true         Whether each variable is directional.
group      str           null               Time grouping.

Output variable: dist2 with dimensions (var1, var2).

- func: distribution2
  dim: time
  data_vars: [wspd, wdir]
  var1: wspd
  var2: wdir
  bins1: {start: 0, step: 1.0}
  bins2: {start: 0, stop: 360, step: 22.5}
  isdir2: true
  group: month
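The page does not spell out how directional variables are handled internally. One plausible reading of "values wrap at 360°" is that directions are folded into [0, 360) before binning, which the following sketch demonstrates with numpy. The modulo step is an assumption for illustration, not confirmed library behaviour.

```python
import numpy as np

wspd = np.array([3.2, 7.9, 7.1, 12.4])
wdir = np.array([365.0, -10.0, 360.0, 180.0])   # some values lie outside [0, 360)

# Assumed wrapping step: fold every direction into [0, 360).
wdir_wrapped = np.mod(wdir, 360.0)

speed_edges = np.arange(0.0, 15.0 + 5.0, 5.0)    # 0, 5, 10, 15
dir_edges = np.arange(0.0, 360.0 + 90.0, 90.0)   # 0, 90, 180, 270, 360

counts, _, _ = np.histogram2d(wspd, wdir_wrapped, bins=[speed_edges, dir_edges])
counts = counts.astype(int)
```

Without the wrapping step, the samples at 365° and -10° would fall outside the direction edges and be dropped from the histogram.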

distribution3_timestep

Memory-efficient variant of distribution3 that accumulates histogram counts in time chunks. Suitable for very long timeseries where loading the full dataset at once would exceed memory limits.

All parameters are the same as distribution3, plus:

Parameter  Type  Default  Description
freq       str   "30d"    Time chunk size as a pandas frequency string (e.g. "10d", "1ME").

- func: distribution3_timestep
  dim: time
  data_vars: [hs, tp, dpm]
  var1: hs
  var2: tp
  var3: dpm
  bins1: {start: 0, step: 0.5}
  bins2: {start: 0, step: 1.0}
  bins3: {start: 0, stop: 360, step: 45}
  group: month
  freq: 10d           # load 10 days at a time

Tip

Prefer distribution3 for datasets that fit comfortably in memory: it is faster because it avoids repeated I/O. Use distribution3_timestep only when the full dataset is too large to load at once, as is typical for multi-decade timeseries.
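The chunked approach works because histogram counts are additive: summing the histograms of disjoint time windows gives the same result as one histogram over the whole record. Below is a minimal numpy sketch of that idea on synthetic hourly data with fixed 10-day windows. The helper `hist3` and the window indexing are illustrative assumptions, not the library's code.

```python
import numpy as np

rng = np.random.default_rng(1)
time = np.arange("2000-01-01", "2000-03-01", dtype="datetime64[h]")   # hourly stamps
hs = rng.uniform(0.0, 4.0, time.size)
tp = rng.uniform(2.0, 16.0, time.size)
dpm = rng.uniform(0.0, 360.0, time.size)

edges = [np.arange(0.0, 4.5, 0.5), np.arange(0.0, 17.0, 1.0), np.arange(0.0, 405.0, 45.0)]

def hist3(sel):
    """3-D histogram of the samples picked out by a mask or slice (illustrative helper)."""
    sample = np.column_stack([hs[sel], tp[sel], dpm[sel]])
    counts, _ = np.histogramdd(sample, bins=edges)
    return counts.astype(np.int64)

# Index of the 10-day window each time stamp belongs to.
window = (time - time[0]) // np.timedelta64(10, "D")

# Accumulate one window at a time; only that window's samples are histogrammed.
total = np.zeros([len(e) - 1 for e in edges], dtype=np.int64)
for w in np.unique(window):
    total += hist3(window == w)

full = hist3(slice(None))   # single pass over everything, for comparison
```

In this toy case all the data is already in memory, so nothing is saved. The benefit appears when each window is read lazily from disk, at the price of one read per window.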


API reference

gridstats.ops.distribution.distribution2(data: xr.Dataset, *, dim: str = 'time', var1: str = 'wspd', var2: str = 'wdir', bins1: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins2: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = True, group: str | None = None, **kwargs: Any) -> xr.Dataset

2-D joint histogram over two variables (e.g. speed × direction).

Results are raw integer counts (not normalised).

Parameters:

Name    Type                   Description                                                       Default
data    Dataset                Input dataset containing var1, var2.                              required
dim     str                    Time dimension name.                                              'time'
var1    str                    Name of the first variable (default 'wspd').                      'wspd'
var2    str                    Name of the second variable, often directional (default 'wdir').  'wdir'
bins1   dict[str, Any] | list  Bin specification for var1.                                       {'start': 0, 'step': 1.0}
bins2   dict[str, Any] | list  Bin specification for var2.                                       {'start': 0, 'stop': 360, 'step': 45}
isdir1  bool                   Whether var1 is directional.                                      False
isdir2  bool                   Whether var2 is directional (default True).                       True
group   str | None             Time component to group by (e.g. 'month').                        None

Returns:

Type     Description
Dataset  Dataset with variable dist2 and dimensions (var1, var2). Coordinates are bin-centre values.
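As a small illustration of bin-centre coordinates, the sketch below derives centres from an array of edges, assuming each centre is the midpoint of two adjacent edges. The exact convention used by the library is not stated here, so treat this as an assumption.

```python
import numpy as np

edges = np.array([0.0, 90.0, 180.0, 270.0, 360.0])

# Midpoint of each pair of adjacent edges: one coordinate per bin.
centres = 0.5 * (edges[:-1] + edges[1:])
```

Four bins defined by five edges therefore yield four coordinate values, here 45, 135, 225 and 315.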

gridstats.ops.distribution.distribution3(data: xr.Dataset, *, dim: str = 'time', var1: str = 'hs', var2: str = 'tp', var3: str = 'dpm', bins1: dict[str, Any] | list = {'start': 0, 'step': 0.5}, bins2: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins3: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = False, isdir3: bool = True, group: str | None = None, **kwargs: Any) -> xr.Dataset

3-D joint histogram over three variables (e.g. Hs × Tp × Dir).

Results are raw integer counts (not normalised). Bin specifications can be a dict with start, stop (optional, inferred from data max), and step keys, or a plain list of explicit bin edges.

Parameters:

Name    Type                   Description                                                     Default
data    Dataset                Input dataset containing var1, var2, var3.                      required
dim     str                    Time dimension name.                                            'time'
var1    str                    Name of the first variable (default 'hs').                      'hs'
var2    str                    Name of the second variable (default 'tp').                     'tp'
var3    str                    Name of the third variable, often directional (default 'dpm').  'dpm'
bins1   dict[str, Any] | list  Bin specification for var1.                                     {'start': 0, 'step': 0.5}
bins2   dict[str, Any] | list  Bin specification for var2.                                     {'start': 0, 'step': 1.0}
bins3   dict[str, Any] | list  Bin specification for var3.                                     {'start': 0, 'stop': 360, 'step': 45}
isdir1  bool                   Whether var1 is directional (wraps at 360°).                    False
isdir2  bool                   Whether var2 is directional.                                    False
isdir3  bool                   Whether var3 is directional (default True).                     True
group   str | None             Time component to group by (e.g. 'month').                      None

Returns:

Type     Description
Dataset  Dataset with variable dist and dimensions (var1, var2, var3). Coordinates are bin-centre values.
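The group parameter listed above produces one histogram per time component, such as one per calendar month. The sketch below shows the idea with numpy on a single synthetic variable, to keep it short. How the library arranges the grouped result (for example as an extra month dimension) is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(2)
time = np.arange("2000-01-01", "2002-01-01", dtype="datetime64[D]")   # two years, daily
hs = rng.uniform(0.0, 4.0, time.size)

# Calendar month (1..12) of each time stamp, pooled across years.
month = time.astype("datetime64[M]").astype(int) % 12 + 1

edges = np.arange(0.0, 4.0 + 0.5, 0.5)

# One histogram per calendar month, stacked along a leading "month" axis.
by_month = np.stack([np.histogram(hs[month == m], bins=edges)[0] for m in range(1, 13)])
```

Row 0 pools every January in the record, row 1 every February, and so on, which is what makes the result a monthly climatology rather than a month-by-month timeseries.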

gridstats.ops.distribution.distribution3_timestep(data: xr.Dataset, *, dim: str = 'time', var1: str = 'hs', var2: str = 'tp', var3: str = 'dpm', bins1: dict[str, Any] | list = {'start': 0, 'step': 0.5}, bins2: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins3: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = False, isdir3: bool = True, freq: str = '30d', group: str | None = None, **kwargs: Any) -> xr.Dataset

3-D joint histogram accumulated over time chunks to limit memory use.

Splits the time axis into windows of size freq, computes a histogram for each window, then sums the counts. Suitable for multi-decade datasets that cannot be fully rechunked into memory.

Prefer distribution3 when the dataset fits comfortably in memory — it is faster because it avoids repeated I/O.

Parameters:

Name    Type                   Description                                                                Default
data    Dataset                Input dataset containing var1, var2, var3.                                 required
dim     str                    Time dimension name.                                                       'time'
var1    str                    Name of the first variable (default 'hs').                                 'hs'
var2    str                    Name of the second variable (default 'tp').                                'tp'
var3    str                    Name of the third variable (default 'dpm').                                'dpm'
bins1   dict[str, Any] | list  Bin specification for var1.                                                {'start': 0, 'step': 0.5}
bins2   dict[str, Any] | list  Bin specification for var2.                                                {'start': 0, 'step': 1.0}
bins3   dict[str, Any] | list  Bin specification for var3.                                                {'start': 0, 'stop': 360, 'step': 45}
isdir1  bool                   Whether var1 is directional.                                               False
isdir2  bool                   Whether var2 is directional.                                               False
isdir3  bool                   Whether var3 is directional (default True).                                True
freq    str                    Pandas-compatible frequency string for time-chunking (e.g. '30d', '1ME').  '30d'
group   str | None             Time component to group by (e.g. 'month').                                 None

Returns:

Type     Description
Dataset  Dataset with accumulated joint distribution counts, same structure as distribution3.