Docker Support#

zarrio can be deployed and run in Docker containers. This is particularly useful for:

  • Ensuring consistent environments across different systems

  • Simplifying deployment

  • Running zarrio in cloud environments

  • Isolating dependencies

  • Parallel processing workflows

Docker Images#

Two Docker images are provided:

  1. Development Image: Includes all development tools and dependencies

  2. Production Image: Minimal image optimized for production use

Building Images#

To build the Docker images:

Development Image#

# Build the development image
docker build -t zarrio:dev .

Production Image#

# Build the production image
docker build -f Dockerfile.prod -t zarrio:latest .
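
The two build commands above can be wrapped in a single function so that a failed development build stops the production build. This is only a sketch: the function name `build_zarrio_images` and the extra version tag are illustrative assumptions, not part of zarrio. It relies on `docker build` accepting more than one `-t` flag.

```shell
# Hypothetical helper: build both images from the project root.
# $1 is a version string used as an additional tag on the production image.
build_zarrio_images() {
    local version=$1
    # Development image from the default Dockerfile; stop if it fails.
    docker build -t zarrio:dev . || return 1
    # Production image, tagged both as latest and with the given version.
    docker build -f Dockerfile.prod -t zarrio:latest -t "zarrio:$version" .
}
```

For example, `build_zarrio_images 1.2.3` would produce `zarrio:dev`, `zarrio:latest` and `zarrio:1.2.3`.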

Using Docker Compose#

# Build all services
docker-compose build

Running Containers#

Basic Usage#

# Run the container with no arguments (the default command prints the help text)
docker run --rm zarrio:latest

# Convert a NetCDF file to Zarr (assuming files are in the current directory)
docker run --rm -v $(pwd):/data zarrio:latest convert /data/input.nc /data/output.zarr

Development Container#

# Run a bash shell in the development container
docker run --rm -it -v $(pwd):/app zarrio:dev bash

# Or using docker-compose
docker-compose run --rm zarrio-dev bash

Production Container#

# Run the production container
docker run --rm -v $(pwd):/data zarrio:latest --help

# Convert files using the production container
docker run --rm -v $(pwd):/data zarrio:latest convert /data/input.nc /data/output.zarr

Volume Mounting#

To work with files on your host system, you need to mount volumes when running the container:

# Mount current directory as /data in the container
# (the quotes keep the mount intact if the path contains spaces)
docker run --rm -v "$(pwd):/data" zarrio:latest convert /data/input.nc /data/output.zarr

# For Windows (PowerShell)
docker run --rm -v ${PWD}:/data zarrio:latest convert /data/input.nc /data/output.zarr

# For Windows (Command Prompt)
docker run --rm -v %cd%:/data zarrio:latest convert /data/input.nc /data/output.zarr

Alternatively, create a dedicated data directory in your project root and mount it instead of the current directory, so that only your input and output files are visible inside the container.
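
If you mount a dedicated directory often, a small wrapper saves retyping the mount options. The sketch below is an assumption-laden convenience, not something zarrio ships: the function name `zarrio_in_docker` and the `ZARRIO_IMAGE` variable are made up for illustration.

```shell
# Hypothetical helper: run zarrio with a host directory mounted at /data.
# $1 is the host directory; the remaining arguments are passed to zarrio.
zarrio_in_docker() {
    local host_dir=$1 abs_dir
    shift
    # Resolve to an absolute path, since a bare relative name given to -v
    # can be interpreted as a named volume rather than a host directory.
    abs_dir=$(cd "$host_dir" && pwd) || return 1
    docker run --rm -v "$abs_dir:/data" "${ZARRIO_IMAGE:-zarrio:latest}" "$@"
}

# Usage (file arguments are paths inside the container):
# zarrio_in_docker ./data convert /data/input.nc /data/output.zarr
```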

Parallel Processing with Docker#

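Docker is particularly useful for parallel processing workflows. After the archive is created once from a template, each container writes the region covered by its own input file, so the files can be processed concurrently: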

# Create template
docker run --rm -v $(pwd):/data zarrio:latest create-template /data/template.nc /data/archive.zarr \
    --global-start 2020-01-01 \
    --global-end 2023-12-31

# Process multiple files in parallel containers
docker run --rm -v $(pwd):/data zarrio:latest write-region /data/data_2020.nc /data/archive.zarr &
docker run --rm -v $(pwd):/data zarrio:latest write-region /data/data_2021.nc /data/archive.zarr &
docker run --rm -v $(pwd):/data zarrio:latest write-region /data/data_2022.nc /data/archive.zarr &
docker run --rm -v $(pwd):/data zarrio:latest write-region /data/data_2023.nc /data/archive.zarr &

# Wait for all processes to complete
wait
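
A bare `wait` returns success even if one of the background containers failed. A possible variant, sketched below, records each process ID and waits on them individually so that failures are reported. The function name `write_regions_parallel` is hypothetical, and the sketch assumes all input files sit in the current directory and cover non-overlapping regions, as in the example above.

```shell
# Hypothetical helper: one write-region container per input file.
# $1 is the archive name; the remaining arguments are input file names,
# all relative to the current directory (mounted at /data).
write_regions_parallel() {
    local archive=$1
    shift
    local pids="" failed=0 f pid
    for f in "$@"; do
        docker run --rm -v "$(pwd):/data" zarrio:latest \
            write-region "/data/$f" "/data/$archive" &
        pids="$pids $!"
    done
    # wait <pid> returns that job's exit status, unlike a bare wait.
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Usage:
# write_regions_parallel archive.zarr data_2020.nc data_2021.nc || echo "a region failed"
```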

Docker Compose Usage#

# Start a development shell
docker-compose run --rm zarrio-dev bash

# Run the application with specific arguments
docker-compose run --rm zarrio convert /data/input.nc /data/output.zarr

# Build all services
docker-compose build

Security Features#

The production Docker image includes several security features:

  1. Non-root User: Runs as a dedicated onzarr user instead of root

  2. Minimal Dependencies: Only includes necessary runtime dependencies

  3. File Ownership: Proper file ownership management

  4. Slim Base Image: Uses python:3.10-slim for reduced attack surface
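
One way to confirm the non-root claim on an image you built yourself is to override the entrypoint and ask the container for its user ID. This is a rough check rather than an audit. The function name `image_runs_as_nonroot` is hypothetical, and the sketch assumes the image contains the `id` utility, which Debian-based images such as python:3.10-slim normally do.

```shell
# Hypothetical check: succeed only if the image's default user is not root.
# $1 is the image name (defaults to zarrio:latest).
image_runs_as_nonroot() {
    local uid
    # --entrypoint replaces the image's entrypoint, so `id -u` runs
    # instead of zarrio and prints the numeric user ID.
    uid=$(docker run --rm --entrypoint id "${1:-zarrio:latest}" -u) || return 2
    [ "$uid" != "0" ]
}

# Usage:
# image_runs_as_nonroot zarrio:latest && echo "runs as non-root"
```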

Customization#

You can customize the Docker images by modifying the Dockerfiles:

  1. For development: Dockerfile

  2. For production: Dockerfile.prod

Common customizations might include:

  • Adding additional system dependencies

  • Changing the base image

  • Adding specific environment variables

  • Modifying the entrypoint or default command

Troubleshooting#

Common issues and solutions:

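  1. Permission Errors: The production image runs as a non-root user, so the mounted data directory must be readable and writable by that user

  2. File Not Found: Paths passed to zarrio must be paths inside the container (for example /data/input.nc), not host paths; also verify the volume mount

  3. Memory Issues: Monitor container memory usage for large datasets (for example with docker stats)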

  4. Network Issues: For cloud storage, ensure network access from containers

Example with debug logging:

docker run --rm -v $(pwd):/data zarrio:latest convert /data/input.nc /data/output.zarr --log-level DEBUG
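
When chasing permission or memory problems, it may help to run the container as your own host user and with an explicit memory cap. The sketch below combines Docker's standard `--user` and `--memory` options with the debug flag shown above. The function name `zarrio_debug_run` is hypothetical, and whether zarrio behaves well under an arbitrary user ID is an assumption you would need to verify.

```shell
# Hypothetical helper: debug run as the invoking host user with a memory cap.
# $1 is the memory limit (for example 4g); the rest is passed to zarrio.
zarrio_debug_run() {
    local mem_limit=$1
    shift
    # --user makes files written to /data belong to the host user;
    # --memory caps the container's memory at the given limit.
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        --memory "$mem_limit" \
        -v "$(pwd):/data" \
        zarrio:latest "$@" --log-level DEBUG
}

# Usage:
# zarrio_debug_run 4g convert /data/input.nc /data/output.zarr
```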