Docker Support
zarrio can be easily deployed and run using Docker containers. This is particularly useful for:
Ensuring consistent environments across different systems
Simplifying deployment
Running zarrio in cloud environments
Isolating dependencies
Parallel processing workflows
Docker Images
Two Docker images are provided:
Development Image: Includes all development tools and dependencies
Production Image: Minimal image optimized for production use
Building Images
To build the Docker images:
Development Image
# Build the development image
docker build -t zarrio:dev .
Production Image
# Build the production image
docker build -f Dockerfile.prod -t zarrio:latest .
Using Docker Compose
# Build all services
docker-compose build
Running Containers
Basic Usage
# Run the container with no arguments to show the default help output
docker run --rm zarrio:latest
# Convert a NetCDF file to Zarr (assuming files are in the current directory)
docker run --rm -v "$(pwd)":/data zarrio:latest convert /data/input.nc /data/output.zarr
Development Container
# Run a bash shell in the development container
docker run --rm -it -v "$(pwd)":/app zarrio:dev bash
# Or using docker-compose
docker-compose run --rm zarrio-dev bash
Production Container
# Show the CLI help from the production container
docker run --rm -v "$(pwd)":/data zarrio:latest --help
# Convert files using the production container
docker run --rm -v "$(pwd)":/data zarrio:latest convert /data/input.nc /data/output.zarr
Volume Mounting
To work with files on your host system, mount a host directory into the container with -v HOST_DIR:CONTAINER_DIR when you run it:
# Mount current directory as /data in the container
docker run --rm -v "$(pwd)":/data zarrio:latest convert /data/input.nc /data/output.zarr
# For Windows (PowerShell)
docker run --rm -v ${PWD}:/data zarrio:latest convert /data/input.nc /data/output.zarr
# For Windows (Command Prompt)
docker run --rm -v %cd%:/data zarrio:latest convert /data/input.nc /data/output.zarr
Create a data directory in your project root to store input and output files, and mount it at /data (for example, -v "$(pwd)/data":/data) so those files are accessible from the container.
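A small shell wrapper can create the data directory and mount it for every call, so you do not have to repeat the -v flag. This is a sketch only: zarrio_data is a hypothetical function, not part of zarrio, and it assumes the production image is tagged zarrio:latest as in the build step above.

```shell
# Hypothetical helper (not part of zarrio): run a zarrio subcommand in the
# production image with ./data mounted at /data.
zarrio_data() {
  # Create the data directory on the host if it does not exist yet
  mkdir -p data
  # Mount only ./data, so the container sees nothing else from the project
  docker run --rm -v "$(pwd)/data":/data zarrio:latest "$@"
}
```

For example, zarrio_data convert /data/input.nc /data/output.zarr reads data/input.nc on the host and writes data/output.zarr next to it.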
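Parallel Processing with Docker
Because each container runs as an isolated process, you can run several zarrio containers at once, each writing one input file into a shared archive:
# Create the archive template covering the full time range
docker run --rm -v "$(pwd)":/data zarrio:latest create-template /data/template.nc /data/archive.zarr \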
--global-start 2020-01-01 \
--global-end 2023-12-31
# Process multiple files in parallel containers
docker run --rm -v "$(pwd)":/data zarrio:latest write-region /data/data_2020.nc /data/archive.zarr &
docker run --rm -v "$(pwd)":/data zarrio:latest write-region /data/data_2021.nc /data/archive.zarr &
docker run --rm -v "$(pwd)":/data zarrio:latest write-region /data/data_2022.nc /data/archive.zarr &
docker run --rm -v "$(pwd)":/data zarrio:latest write-region /data/data_2023.nc /data/archive.zarr &
# Wait for all processes to complete
wait
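A bare wait returns 0 once all background jobs have finished, even if one of the write-region containers failed. The sketch below starts the same containers in a loop and waits on each process ID so that a failure is reported. write_regions_parallel is a hypothetical function, and it assumes the inputs follow the data_<year>.nc naming used above and that the template already exists.

```shell
# Hypothetical helper (not part of zarrio): start one write-region container per
# year and return non-zero if any of them fails.
# Assumes inputs are named data_<year>.nc in the current directory and that
# /data/archive.zarr was already created with create-template.
write_regions_parallel() {
  local pids="" pid rc=0 year
  for year in "$@"; do
    docker run --rm -v "$(pwd)":/data zarrio:latest \
      write-region "/data/data_${year}.nc" /data/archive.zarr &
    # $! is the PID of the job just started in the background
    pids="$pids $!"
  done
  # A bare "wait" returns 0 even if a job failed, so wait on each PID
  # individually and record any non-zero exit status
  for pid in $pids; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

Usage: write_regions_parallel 2020 2021 2022 2023 || echo "at least one write-region job failed" >&2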
Docker Compose Usage
# Start a development shell
docker-compose run --rm zarrio-dev bash
# Run the application with specific arguments
docker-compose run --rm zarrio convert /data/input.nc /data/output.zarr
# Build all services
docker-compose build
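The commands above use the standalone docker-compose binary. Newer Docker installations ship Compose V2 as a plugin invoked as docker compose instead. The following sketch picks whichever is available; compose is a hypothetical function name, and the zarrio-dev and zarrio service names come from the examples above.

```shell
# Hypothetical helper: prefer the Compose V2 plugin ("docker compose") and
# fall back to the standalone docker-compose binary.
compose() {
  # "docker compose version" succeeds only if the V2 plugin is installed
  if docker compose version >/dev/null 2>&1; then
    docker compose "$@"
  else
    docker-compose "$@"
  fi
}
```

With this defined, compose run --rm zarrio-dev bash and compose build work on either setup.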
Security Features
The production Docker image includes several security features:
Non-root User: Runs as a dedicated onzarr user instead of root
Minimal Dependencies: Only includes necessary runtime dependencies
File Ownership: Proper file ownership management
Slim Base Image: Uses python:3.10-slim for reduced attack surface
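To confirm that an image you built is configured to run as a non-root user, you can read the User field with docker image inspect; an empty value means Docker's default, which is root. image_runs_as_non_root is a hypothetical function, sketched here under that assumption.

```shell
# Hypothetical check (not part of zarrio): succeed only if the image is
# configured with a non-root user.
# Returns 0 for non-root, 1 for root or no user set, 2 if inspect fails.
image_runs_as_non_root() {
  local image="${1:-zarrio:latest}" user
  # Config.User is empty when the Dockerfile sets no USER (i.e. root)
  user="$(docker image inspect --format '{{.Config.User}}' "$image")" || return 2
  case "$user" in
    "" | root | root:* | 0 | 0:*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For the production image, image_runs_as_non_root zarrio:latest should succeed, since the image runs as the onzarr user.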
Customization
You can customize the Docker images by modifying the Dockerfiles:
For development: Dockerfile
For production: Dockerfile.prod
Common customizations might include:
Adding additional system dependencies
Changing the base image
Adding specific environment variables
Modifying the entrypoint or default command
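Some of these changes, environment variables and the entrypoint, can also be applied at run time without rebuilding, using the -e and --entrypoint flags of docker run. Both functions below are hypothetical helpers, sketched to illustrate those flags; note that overriding --entrypoint also clears the image's default command.

```shell
# Hypothetical helpers (not part of zarrio) for run-time customization.

# Start a bash shell in the production image instead of the zarrio CLI.
# --entrypoint replaces the image's entrypoint; python:3.10-slim is
# Debian-based, so bash is available.
zarrio_prod_shell() {
  docker run --rm -it --entrypoint bash -v "$(pwd)":/data zarrio:latest
}

# Forward selected host environment variables to a zarrio run.
# First argument: space-separated variable names; "-e NAME" without a value
# makes docker copy NAME's value from the host environment.
zarrio_with_env() {
  local names="$1" env_args="" name
  shift
  for name in $names; do
    env_args="$env_args -e $name"
  done
  # $env_args is deliberately unquoted so it splits into separate -e flags
  docker run --rm $env_args -v "$(pwd)":/data zarrio:latest "$@"
}
```

For example, zarrio_with_env "VAR1 VAR2" convert /data/input.nc /data/output.zarr passes VAR1 and VAR2 through; the variable names are placeholders. zarrio_prod_shell needs an interactive terminal because of -it.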
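Troubleshooting
Common issues and solutions:
Permission Errors: Ensure the data directory has appropriate permissions. The production image runs as the non-root onzarr user, so the mounted directory must be writable by that user.
File Not Found: Verify file paths and volume mounting. Paths passed to zarrio refer to locations inside the container (for example /data/input.nc), not to host paths.
Memory Issues: Monitor container memory usage for large datasets, for example with docker stats
Network Issues: For cloud storage, ensure the container has network access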
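For the permission and memory issues, one option is to run the container as your own host user and to set an explicit memory limit. The sketch below uses the real docker run flags --user and --memory; zarrio_as_host_user and the ZARRIO_MEMORY variable are hypothetical, and the 4g default is arbitrary. Running under an arbitrary UID replaces the onzarr user, which may matter if a tool inside the container expects a home directory.

```shell
# Hypothetical helper (not part of zarrio): run a zarrio subcommand as the
# host user with a memory limit.
# ZARRIO_MEMORY is an illustrative variable; the 4g default is arbitrary.
zarrio_as_host_user() {
  # --user overrides the image's onzarr user with your UID:GID, so files the
  # container creates under /data are owned by you on the host.
  # --memory caps the container; exceeding it typically gets the container
  # OOM-killed, and docker run then exits with code 137.
  docker run --rm \
    --user "$(id -u):$(id -g)" \
    --memory "${ZARRIO_MEMORY:-4g}" \
    -v "$(pwd)":/data zarrio:latest "$@"
}
```

For example, ZARRIO_MEMORY=8g zarrio_as_host_user convert /data/input.nc /data/output.zarr, while watching usage from another terminal with docker stats.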
Example with debug logging:
docker run --rm -v "$(pwd)":/data zarrio:latest convert /data/input.nc /data/output.zarr --log-level DEBUG
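When reporting a problem, it helps to keep the full debug output. This sketch, using a hypothetical zarrio_debug_convert function, writes everything the container prints (stdout and stderr) to zarrio-debug.log, shows it, and still returns the container's exit status.

```shell
# Hypothetical helper (not part of zarrio): run a conversion with DEBUG logging,
# save everything the container prints to zarrio-debug.log, show it, and
# return the container's exit status.
zarrio_debug_convert() {
  local rc=0
  docker run --rm -v "$(pwd)":/data zarrio:latest \
    convert "$1" "$2" --log-level DEBUG > zarrio-debug.log 2>&1 || rc=$?
  cat zarrio-debug.log
  return "$rc"
}
```

Usage: zarrio_debug_convert /data/input.nc /data/output.zarr, then attach zarrio-debug.log to the issue.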