Distributions
Joint histograms (counts) of two or three variables. Results are not normalised — they are raw integer counts that can be normalised downstream.
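As an illustration of downstream normalisation, the sketch below converts a small count table into joint and conditional probabilities. The array is made-up data standing in for a real `dist` output, which would be an xarray variable rather than a plain NumPy array.

```python
import numpy as np

# Hypothetical 2-D count table (rows: speed bins, columns: direction bins).
counts = np.array([[4, 1, 0],
                   [2, 6, 3],
                   [0, 1, 3]])

total = counts.sum()                 # number of samples that fell inside the bins
joint_prob = counts / total          # joint probability; sums to 1

# Conditional distribution of the row variable within each column bin.
# Columns with no samples are left at zero instead of dividing by zero.
col_totals = counts.sum(axis=0, keepdims=True)
cond_prob = np.divide(counts, col_totals,
                      out=np.zeros(counts.shape, dtype=float),
                      where=col_totals > 0)
```

Keeping the stored result as integer counts means histograms from different periods can be summed before any normalisation is applied.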
Bin specification
All distribution functions accept bin specifications as either:
- A dict with `start`, `stop` (optional), and `step` keys, equivalent to `numpy.arange(start, stop + step, step)`. If `stop` is omitted, it is inferred from the data maximum.
- A list or array of explicit bin edges.
bins1: {start: 0, step: 0.5} # 0, 0.5, 1.0, ... (stop from data)
bins2: {start: 0, stop: 20, step: 1.0} # 0, 1, 2, ..., 20
bins3: [0, 90, 180, 270, 360] # explicit edges
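The following sketch shows one way the two forms could be turned into bin edges. `resolve_bins` is a hypothetical helper written for this illustration, not a function from the library; it only mirrors the `numpy.arange(start, stop + step, step)` rule stated above.

```python
import numpy as np

def resolve_bins(spec, data=None):
    """Turn a bin specification into an array of bin edges (hypothetical helper)."""
    if isinstance(spec, dict):
        start, step = spec["start"], spec["step"]
        stop = spec.get("stop")
        if stop is None:
            # No explicit stop: fall back to the data maximum.
            stop = float(np.nanmax(data))
        return np.arange(start, stop + step, step)
    # Anything else is treated as a sequence of explicit edges.
    return np.asarray(spec, dtype=float)

hs = np.array([0.4, 1.3, 3.2, 2.1])                            # made-up sample values
edges1 = resolve_bins({"start": 0, "step": 0.5}, hs)           # stop inferred from hs
edges2 = resolve_bins({"start": 0, "stop": 20, "step": 1.0})   # explicit stop
edges3 = resolve_bins([0, 90, 180, 270, 360])                  # explicit edges
```

Adding `step` to `stop` makes the final edge reach or pass the requested stop, so the largest data value still falls inside a bin. With non-integer steps, `numpy.arange` can occasionally yield one element more or fewer than expected because of floating-point rounding, so explicit edges are the safer choice when exact bin boundaries matter.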
distribution3
3-D joint histogram over three variables (typically Hs × Tp × Dir).
| Parameter | Type | Default | Description |
|---|---|---|---|
| `var1` | str | `"hs"` | First variable name. |
| `var2` | str | `"tp"` | Second variable name. |
| `var3` | str | `"dpm"` | Third variable name. |
| `bins1/2/3` | dict or list | `{start: 0, step: 0.5}` / `{start: 0, step: 1.0}` / `{start: 0, stop: 360, step: 45}` | Bin specifications for each variable. |
| `isdir1/2/3` | bool | `false/false/true` | Whether each variable is directional (values wrap at 360°). |
| `group` | str | `null` | Time grouping (e.g. `month` for monthly climatological distributions). |
Output variable: dist with dimensions (var1, var2, var3).
- func: distribution3
  dim: time
  data_vars: [hs, tp, dpm]
  var1: hs
  var2: tp
  var3: dpm
  bins1: {start: 0, step: 0.5}
  bins2: {start: 0, step: 1.0}
  bins3: {start: 0, stop: 360, step: 45}
  isdir3: true
  group: month
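To make the shape of the result concrete, here is a minimal NumPy sketch of a 3-D joint histogram using `numpy.histogramdd`. It uses synthetic data and fixed edges, and it is only a conceptual stand-in: the library's actual implementation, its handling of `group`, and its output coordinates are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
hs = rng.uniform(0, 5, n)        # synthetic significant wave height
tp = rng.uniform(2, 16, n)       # synthetic peak period
dpm = rng.uniform(0, 360, n)     # synthetic direction in degrees

# Edges built with the arange(start, stop + step, step) rule.
edges = [
    np.arange(0, 5 + 0.5, 0.5),      # bins1: 0, 0.5, ..., 5
    np.arange(0, 16 + 1.0, 1.0),     # bins2: 0, 1, ..., 16
    np.arange(0, 360 + 45, 45),      # bins3: 0, 45, ..., 360
]

# One row per sample, one column per variable.
sample = np.column_stack([hs, tp, dpm])
counts, _ = np.histogramdd(sample, bins=edges)
counts = counts.astype(np.int64)     # raw integer counts, not normalised
```

Each axis has one fewer bin than it has edges, so the array above has shape `(10, 16, 8)`.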
distribution2
2-D joint histogram over two variables (typically speed × direction).
| Parameter | Type | Default | Description |
|---|---|---|---|
| `var1` | str | `"wspd"` | First variable name. |
| `var2` | str | `"wdir"` | Second variable name. |
| `bins1/2` | dict or list | `{start: 0, step: 1.0}` / `{start: 0, stop: 360, step: 45}` | Bin specifications. |
| `isdir1/2` | bool | `false/true` | Whether each variable is directional. |
| `group` | str | `null` | Time grouping. |
Output variable: dist2 with dimensions (var1, var2).
- func: distribution2
  dim: time
  data_vars: [wspd, wdir]
  var1: wspd
  var2: wdir
  bins1: {start: 0, step: 1.0}
  bins2: {start: 0, stop: 360, step: 22.5}
  isdir2: true
  group: month
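The sketch below illustrates what wrapping at 360° means for a directional variable in a speed × direction histogram. How the library applies `isdir` internally is not documented here, so treat the `numpy.mod` step as an assumption about one common approach, shown on made-up data.

```python
import numpy as np

wspd = np.array([3.2, 7.9, 12.4, 5.0, 0.6])
wdir = np.array([350.0, 360.0, -10.0, 22.5, 725.0])   # some values outside [0, 360)

# Wrap directions into [0, 360): 360 -> 0, -10 -> 350, 725 -> 5.
wdir_wrapped = np.mod(wdir, 360.0)

speed_edges = np.arange(0, 15 + 1.0, 1.0)       # 0, 1, ..., 15
dir_edges = np.arange(0, 360 + 22.5, 22.5)      # 0, 22.5, ..., 360

counts, _, _ = np.histogram2d(wspd, wdir_wrapped, bins=[speed_edges, dir_edges])
counts = counts.astype(np.int64)
```

Without the wrapping step, `numpy.histogram2d` would silently drop the samples at -10° and 725° because they lie outside the direction edges.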
distribution3_timestep
Memory-efficient variant of distribution3 that accumulates histogram counts in time chunks. Suitable for very long timeseries where loading the full dataset at once would exceed memory limits.
All parameters are the same as distribution3, plus:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `freq` | str | `"30d"` | Time chunk size as a pandas frequency string (e.g. `"10d"`, `"1ME"`). |
- func: distribution3_timestep
  dim: time
  data_vars: [hs, tp, dpm]
  var1: hs
  var2: tp
  var3: dpm
  bins1: {start: 0, step: 0.5}
  bins2: {start: 0, step: 1.0}
  bins3: {start: 0, stop: 360, step: 45}
  group: month
  freq: 10d  # load 10 days at a time
Tip
Prefer distribution3 for datasets that fit comfortably in memory — it is faster because it avoids repeated I/O. Use distribution3_timestep only for multi-decade datasets.
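The idea behind chunked accumulation can be sketched with plain NumPy: histogram each time window separately and add up the counts. Everything below is illustrative and uses synthetic data; the 10-day windowing is done by hand here and is not the library's own chunking logic. The sketch also assumes bin edges that are fixed before the loop, since edges inferred per chunk would not line up.

```python
import numpy as np

rng = np.random.default_rng(1)
times = np.arange("2000-01-01", "2000-03-01", dtype="datetime64[h]")  # hourly, 60 days
n = times.size
sample = np.column_stack([
    rng.uniform(0, 5, n),      # synthetic hs
    rng.uniform(2, 16, n),     # synthetic tp
    rng.uniform(0, 360, n),    # synthetic dpm
])
edges = [np.arange(0, 5.5, 0.5), np.arange(0, 17.0, 1.0), np.arange(0, 405, 45)]

# Assign every timestamp to a 10-day window (stand-in for freq="10d").
hours = (times - times[0]).astype("timedelta64[h]").astype(np.int64)
window = hours // (10 * 24)

total = np.zeros([len(e) - 1 for e in edges], dtype=np.int64)
for w in np.unique(window):
    # In a real workflow only this slice would be read from disk.
    chunk = sample[window == w]
    counts, _ = np.histogramdd(chunk, bins=edges)
    total += counts.astype(np.int64)

# Reference result computed in one pass over all samples.
full, _ = np.histogramdd(sample, bins=edges)
```

Because counts are additive, `total` matches `full` exactly; the chunked version trades extra passes over the data for a smaller peak memory footprint.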
API reference
gridstats.ops.distribution.distribution2(data: xr.Dataset, *, dim: str = 'time', var1: str = 'wspd', var2: str = 'wdir', bins1: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins2: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = True, group: str | None = None, **kwargs: Any) -> xr.Dataset
2-D joint histogram over two variables (e.g. speed × direction).
Results are raw integer counts (not normalised).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset containing the variables named by `var1` and `var2`. | required |
| `dim` | `str` | Time dimension name. | `'time'` |
| `var1` | `str` | Name of the first variable. | `'wspd'` |
| `var2` | `str` | Name of the second variable, often directional. | `'wdir'` |
| `bins1` | `dict[str, Any] \| list` | Bin specification for `var1`. | `{'start': 0, 'step': 1.0}` |
| `bins2` | `dict[str, Any] \| list` | Bin specification for `var2`. | `{'start': 0, 'stop': 360, 'step': 45}` |
| `isdir1` | `bool` | Whether `var1` is directional. | `False` |
| `isdir2` | `bool` | Whether `var2` is directional. | `True` |
| `group` | `str \| None` | Time component to group by (e.g. `month`). | `None` |
Returns:
| Type | Description |
|---|---|
| `Dataset` | Dataset with variable `dist2` with dimensions `(var1, var2)`. Coordinates are bin-centre values. |
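As a small illustration of bin-centre coordinates, the snippet below derives centres from a set of edges by taking the midpoint of each adjacent pair. This midpoint convention is an assumption made for the example; the library may compute its coordinates differently, particularly for directional variables.

```python
import numpy as np

edges = np.arange(0, 360 + 45, 45)             # 0, 45, ..., 360
centres = 0.5 * (edges[:-1] + edges[1:])       # midpoint of each pair of edges
```

There is always one fewer centre than edge, which matches the length of the corresponding histogram axis.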
gridstats.ops.distribution.distribution3(data: xr.Dataset, *, dim: str = 'time', var1: str = 'hs', var2: str = 'tp', var3: str = 'dpm', bins1: dict[str, Any] | list = {'start': 0, 'step': 0.5}, bins2: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins3: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = False, isdir3: bool = True, group: str | None = None, **kwargs: Any) -> xr.Dataset
3-D joint histogram over three variables (e.g. Hs × Tp × Dir).
Results are raw integer counts (not normalised). Bin specifications can
be a dict with start, stop (optional, inferred from data max), and
step keys, or a plain list of explicit bin edges.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset containing the variables named by `var1`, `var2`, and `var3`. | required |
| `dim` | `str` | Time dimension name. | `'time'` |
| `var1` | `str` | Name of the first variable. | `'hs'` |
| `var2` | `str` | Name of the second variable. | `'tp'` |
| `var3` | `str` | Name of the third variable, often directional. | `'dpm'` |
| `bins1` | `dict[str, Any] \| list` | Bin specification for `var1`. | `{'start': 0, 'step': 0.5}` |
| `bins2` | `dict[str, Any] \| list` | Bin specification for `var2`. | `{'start': 0, 'step': 1.0}` |
| `bins3` | `dict[str, Any] \| list` | Bin specification for `var3`. | `{'start': 0, 'stop': 360, 'step': 45}` |
| `isdir1` | `bool` | Whether `var1` is directional. | `False` |
| `isdir2` | `bool` | Whether `var2` is directional. | `False` |
| `isdir3` | `bool` | Whether `var3` is directional. | `True` |
| `group` | `str \| None` | Time component to group by (e.g. `month`). | `None` |
Returns:
| Type | Description |
|---|---|
| `Dataset` | Dataset with variable `dist` with dimensions `(var1, var2, var3)`. Coordinates are bin-centre values. |
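The effect of `group` (for example `month`) can be pictured as computing one histogram per calendar month across all years. The sketch below does this for a single variable to keep it short; it uses synthetic data and plain NumPy and is not the library's grouping code.

```python
import numpy as np

rng = np.random.default_rng(2)
times = np.arange("2001-01-01", "2003-01-01", dtype="datetime64[D]")  # two years, daily
hs = rng.uniform(0, 5, times.size)       # synthetic wave heights
edges = np.arange(0, 5.5, 0.5)           # 0, 0.5, ..., 5

# Month number (1..12) of every timestamp, regardless of year.
month = times.astype("datetime64[M]").astype(np.int64) % 12 + 1

# One row of counts per calendar month: a monthly climatological histogram.
monthly = np.zeros((12, len(edges) - 1), dtype=np.int64)
for m in range(1, 13):
    monthly[m - 1], _ = np.histogram(hs[month == m], bins=edges)
```

Samples from both Januaries land in the same row, so the result describes a typical January, not a specific one. The same pattern extends to two or three variables by adding the month as a leading dimension.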
gridstats.ops.distribution.distribution3_timestep(data: xr.Dataset, *, dim: str = 'time', var1: str = 'hs', var2: str = 'tp', var3: str = 'dpm', bins1: dict[str, Any] | list = {'start': 0, 'step': 0.5}, bins2: dict[str, Any] | list = {'start': 0, 'step': 1.0}, bins3: dict[str, Any] | list = {'start': 0, 'stop': 360, 'step': 45}, isdir1: bool = False, isdir2: bool = False, isdir3: bool = True, freq: str = '30d', group: str | None = None, **kwargs: Any) -> xr.Dataset
3-D joint histogram accumulated over time chunks to limit memory use.
Splits the time axis into windows of size freq, computes a histogram
for each window, then sums the counts. Suitable for multi-decade datasets
that cannot be fully rechunked into memory.
Prefer distribution3 when the dataset fits comfortably in memory — it
is faster because it avoids repeated I/O.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset containing the variables named by `var1`, `var2`, and `var3`. | required |
| `dim` | `str` | Time dimension name. | `'time'` |
| `var1` | `str` | Name of the first variable. | `'hs'` |
| `var2` | `str` | Name of the second variable. | `'tp'` |
| `var3` | `str` | Name of the third variable. | `'dpm'` |
| `bins1` | `dict[str, Any] \| list` | Bin specification for `var1`. | `{'start': 0, 'step': 0.5}` |
| `bins2` | `dict[str, Any] \| list` | Bin specification for `var2`. | `{'start': 0, 'step': 1.0}` |
| `bins3` | `dict[str, Any] \| list` | Bin specification for `var3`. | `{'start': 0, 'stop': 360, 'step': 45}` |
| `isdir1` | `bool` | Whether `var1` is directional. | `False` |
| `isdir2` | `bool` | Whether `var2` is directional. | `False` |
| `isdir3` | `bool` | Whether `var3` is directional. | `True` |
| `freq` | `str` | Pandas-compatible frequency string for time-chunking (e.g. `"10d"`, `"1ME"`). | `'30d'` |
| `group` | `str \| None` | Time component to group by (e.g. `month`). | `None` |
Returns:
| Type | Description |
|---|---|
| `Dataset` | Dataset with accumulated joint distribution counts, same structure as `distribution3`. |