Command-Line Interface#

zarrio provides a comprehensive command-line interface for converting data to Zarr format.

Basic Usage#

The basic syntax for the CLI is:

zarrio [OPTIONS] COMMAND [ARGS]...

Getting Help#

To get help on the available commands:

zarrio --help

To get help on a specific command:

zarrio COMMAND --help

Version Information#

To check the version of zarrio:

zarrio --version

Convert Command#

The convert command converts data to Zarr format.

zarrio convert [OPTIONS] INPUT OUTPUT

Options:#

--chunking CHUNKING

Chunking specification (e.g., ‘time:100,lat:50,lon:100’)

--compression COMPRESSION

Compression specification (e.g., ‘blosc:zstd:3’)

--packing

Enable data packing

--packing-bits {8,16,32}

Number of bits for packing (default: 16)

--packing-manual-ranges PACKING_MANUAL_RANGES

Manual min/max ranges as JSON string (e.g., '{"temperature": {"min": 0, "max": 100}}')

--packing-auto-buffer-factor PACKING_AUTO_BUFFER_FACTOR

Buffer factor for automatically calculated ranges (default: 0.01)

--packing-check-range-exceeded

Check if data exceeds specified ranges (default: True)

--packing-range-exceeded-action {warn,error,ignore}

Action when data exceeds range (default: warn)

--variables VARIABLES

Comma-separated list of variables to include

--drop-variables DROP_VARIABLES

Comma-separated list of variables to exclude

--attrs ATTRS

Additional global attributes as JSON string

--time-dim TIME_DIM

Name of time dimension (default: time)

--target-chunk-size-mb TARGET_CHUNK_SIZE_MB

Target chunk size in MB for intelligent chunking (default: 50). Use this option to tune chunk sizes for different environments:

  • Local development: 10-25 MB

  • Production servers: 50-100 MB

  • Cloud environments: 100-200 MB

--datamesh-datasource DATAMESH_DATASOURCE

Datamesh datasource configuration as JSON string

--datamesh-token DATAMESH_TOKEN

Datamesh token for authentication

--datamesh-service DATAMESH_SERVICE

Datamesh service URL

--config CONFIG

Configuration file (YAML or JSON)

--access-pattern ACCESS_PATTERN

Expected access pattern (temporal, spatial, balanced)

-v, --verbose

Increase verbosity (use -v, -vv, or -vvv)

Examples:#

Convert a single NetCDF file to Zarr:

zarrio convert input.nc output.zarr

Convert with chunking:

zarrio convert input.nc output.zarr --chunking "time:100,lat:50,lon:100"

Convert with compression:

zarrio convert input.nc output.zarr --compression "blosc:zstd:3"

Convert with data packing:

zarrio convert input.nc output.zarr --packing --packing-bits 16

Convert with variable selection:

zarrio convert input.nc output.zarr --variables "temperature,pressure"

Convert with variable exclusion:

zarrio convert input.nc output.zarr --drop-variables "humidity"

Convert with additional attributes:

zarrio convert input.nc output.zarr --attrs '{"title": "Demo dataset", "source": "zarrio"}'

Convert with configuration file:

zarrio convert input.nc output.zarr --config config.yaml
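
The options above can be combined in a single invocation. The following sketch is illustrative only; the chunk sizes, compression settings, and variable names are placeholders to adapt to your dataset:

zarrio convert input.nc output.zarr \
    --chunking "time:100,lat:50,lon:100" \
    --compression "blosc:zstd:3" \
    --packing --packing-bits 16 \
    --variables "temperature,pressure" \
    --attrs '{"title": "Demo dataset"}'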

Append Command#

The append command appends data to an existing Zarr store.

zarrio append [OPTIONS] INPUT ZARR

Options:#

  • --chunking TEXT: Chunking specification (e.g., ‘time:100,lat:50,lon:100’)

  • --variables TEXT: Comma-separated list of variables to include

  • --drop-variables TEXT: Comma-separated list of variables to exclude

  • --append-dim TEXT: Dimension to append along (default: time)

  • --time-dim TEXT: Name of time dimension (default: time)

  • --config PATH: Configuration file (YAML or JSON)

  • -v, --verbose: Increase verbosity (use -v, -vv, or -vvv)

Examples:#

Append data to an existing Zarr store:

zarrio append new_data.nc existing.zarr

Append with variable selection:

zarrio append new_data.nc existing.zarr --variables "temperature,pressure"

Append with chunking:

zarrio append new_data.nc existing.zarr --chunking "time:50,lat:25,lon:50"
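
To append a sequence of files in time order, the command can be wrapped in an ordinary shell loop. This is a minimal sketch assuming the file names sort chronologically and that each file extends the time dimension of the existing store:

# Append files in chronological order (file pattern is hypothetical)
for f in data_2024_*.nc; do
    zarrio append "$f" existing.zarr --append-dim time
done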

Create-Template Command#

The create-template command creates a template Zarr archive for parallel writing.

zarrio create-template [OPTIONS] TEMPLATE OUTPUT

Options:#

  • --chunking TEXT: Chunking specification (e.g., ‘time:100,lat:50,lon:100’)

  • --compression TEXT: Compression specification (e.g., ‘blosc:zstd:3’)

  • --packing: Enable data packing

  • --packing-bits INTEGER: Number of bits for packing (8, 16, or 32)

  • --packing-manual-ranges TEXT: Manual min/max ranges as JSON string

  • --packing-auto-buffer-factor FLOAT: Buffer factor for automatically calculated ranges

  • --packing-check-range-exceeded: Check if data exceeds specified ranges

  • --packing-range-exceeded-action [warn|error|ignore]: Action when data exceeds range

  • --global-start TEXT: Start time for full archive (e.g., ‘2020-01-01’)

  • --global-end TEXT: End time for full archive (e.g., ‘2023-12-31’)

  • --freq TEXT: Time frequency (e.g., ‘1D’, ‘1H’, inferred if not provided)

  • --metadata-only: Create metadata only (compute=False)

  • --time-dim TEXT: Name of time dimension (default: time)

  • --intelligent-chunking: Enable intelligent chunking based on full archive dimensions

  • --access-pattern [temporal|spatial|balanced]: Access pattern for chunking optimization (default: balanced)

  • --config PATH: Configuration file (YAML or JSON)

  • -v, --verbose: Increase verbosity (use -v, -vv, or -vvv)

Examples:#

Create template for parallel writing:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31

Create template with intelligent chunking:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31 \
    --intelligent-chunking \
    --access-pattern temporal

Create template with chunking:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31 \
    --chunking "time:100,lat:50,lon:100"

Create template with compression:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31 \
    --compression "blosc:zstd:3"

Create template with data packing:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31 \
    --packing --packing-bits 16

Create template with manual packing ranges:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31 \
    --packing --packing-manual-ranges '{"temperature": {"min": -50, "max": 50}}'

Create template with automatic range calculation:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31 \
    --packing --packing-auto-buffer-factor 0.05

Parallel Processing Example#

To process thousands of NetCDF files in parallel:

  1. Create template:

zarrio create-template template.nc archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31

  2. Write regions in parallel processes:

# Process 1
zarrio write-region data_2020.nc archive.zarr

# Process 2
zarrio write-region data_2021.nc archive.zarr

# Process 3
zarrio write-region data_2022.nc archive.zarr

# Process 4
zarrio write-region data_2023.nc archive.zarr
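
The four invocations above are independent, so they can also be launched concurrently from a single shell session. A minimal sketch using background jobs (the yearly file names are carried over from the example above):

# Launch one writer per yearly file and wait for all of them to finish
for f in data_2020.nc data_2021.nc data_2022.nc data_2023.nc; do
    zarrio write-region "$f" archive.zarr &
done
wait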

Configuration File Support#

zarrio supports configuration files in YAML or JSON format:

YAML Example:#

# config.yaml
chunking:
  time: 150
  lat: 60
  lon: 120
compression:
  method: blosc:zstd:2
  clevel: 2
packing:
  enabled: true
  bits: 16
variables:
  include:
    - temperature
    - pressure
  exclude:
    - humidity
attrs:
  title: YAML Config Demo
  version: 1.0
time:
  dim: time
  append_dim: time
retries_on_missing: 3
missing_check_vars: all

Usage:#

zarrio convert input.nc output.zarr --config config.yaml

JSON Example:#

{
  "chunking": {
    "time": 125,
    "lat": 55,
    "lon": 110
  },
  "compression": {
    "method": "blosc:lz4:1",
    "clevel": 1
  },
  "packing": {
    "enabled": true,
    "bits": 8
  },
  "time": {
    "global_start": "2020-01-01",
    "global_end": "2023-12-31"
  },
  "attrs": {
    "title": "JSON Config Demo",
    "version": "1.0"
  }
}

Usage:#

zarrio convert input.nc output.zarr --config config.json
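
Since this configuration also carries the archive time range (time.global_start / time.global_end), the same file can be passed to create-template, which likewise accepts --config. Whether the command picks the time range up from the file is an assumption here; pass --global-start and --global-end explicitly if in doubt:

zarrio create-template template.nc archive.zarr --config config.json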

Compression Expectations#

When working with scientific datasets, it’s important to understand what to expect from compression:

  • Compression ratios of 1.0-1.1x are normal for scientific datasets

  • Speed improvements (20-40% faster) are often more valuable than size reductions

  • The primary benefit of compression is reduced I/O time, not file size reduction

  • For significant size reductions, consider using data packing in combination with compression

For more detailed information about compression expectations, see the main documentation.
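
For example, to combine packing with compression in a single conversion (the settings shown are placeholders, not tuned recommendations):

zarrio convert input.nc output.zarr \
    --compression "blosc:zstd:3" \
    --packing --packing-bits 16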

Analyze Command#

The analyze command analyzes NetCDF files and provides optimization recommendations.

zarrio analyze [OPTIONS] INPUT

Options:#

  • --target-chunk-size-mb INTEGER: Target chunk size in MB for analysis (default: 50)

  • --test-performance: Show theoretical performance benefits

  • --run-tests: Run actual performance tests to measure real-world benefits

  • -i, --interactive: Interactive mode to guide configuration setup

  • -v, --verbose: Increase verbosity (use -v, -vv, or -vvv)

Examples:#

Analyze a NetCDF file:

zarrio analyze input.nc

Analyze with theoretical performance testing:

zarrio analyze input.nc --test-performance

Analyze with actual performance testing:

zarrio analyze input.nc --run-tests

Analyze with custom target chunk size:

zarrio analyze input.nc --target-chunk-size-mb 100

Analyze with interactive configuration setup:

zarrio analyze input.nc --interactive
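
A typical workflow is to analyze a file first and then convert it with the settings you settle on. The chunk sizes below are placeholders; use the recommendations reported by the analyze step:

# Inspect the file and review the chunking recommendations
zarrio analyze input.nc --target-chunk-size-mb 100

# Convert using the chosen chunk sizes
zarrio convert input.nc output.zarr --chunking "time:100,lat:50,lon:100"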