Troubleshooting
This guide helps you diagnose and resolve common issues when using rompy-oceanum. It covers authentication problems, configuration errors, pipeline failures, and performance issues.
Quick Diagnostics
If you’re experiencing issues, start with these diagnostic commands:
# Check version and installation
rompy-oceanum --version
# Validate configuration
rompy-oceanum config validate
# Test connectivity
rompy-oceanum config test-connection
# Check current configuration
rompy-oceanum config show --hide-secrets
Common Issues
Authentication Problems
Symptoms:
- “Authentication failed” errors
- “Invalid token” messages
- 401/403 HTTP errors
Solutions:
Verify Token Format:
# Check if token is set
echo ${PRAX_TOKEN:0:20}...  # Show first 20 characters
# Token should be a long alphanumeric string
# If empty or wrong format, obtain new token
Check Token Validity:
# Test connection with current token
rompy-oceanum config test-connection
# If fails, token may be expired or invalid
Verify Environment Variables:
# Check all required variables
env | grep PRAX_
# Should show:
# PRAX_TOKEN=your-token
# PRAX_ORG=your-org
# PRAX_PROJECT=your-project
# PRAX_BASE_URL=https://prax.oceanum.science
Common Token Issues:
# Use the correct variable name (PRAX_TOKEN, not PRAX_API_TOKEN)
export PRAX_TOKEN="your-token"
# Strip extra whitespace (a common copy-paste error)
export PRAX_TOKEN="$(echo 'your-token' | xargs)"
# Use the correct base URL (.science, not .com)
export PRAX_BASE_URL="https://prax.oceanum.science"
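The checks above can also be scripted. The following is a minimal sketch, not part of rompy-oceanum: the function name `check_prax_env` is hypothetical, and it assumes the four `PRAX_*` variables listed above are the complete required set.

```python
import os

# Variable names taken from the "Verify Environment Variables" step above.
REQUIRED_VARS = ("PRAX_TOKEN", "PRAX_ORG", "PRAX_PROJECT", "PRAX_BASE_URL")


def check_prax_env(env):
    """Return a list of human-readable problems found in the PRAX_* variables."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            problems.append(f"{name} has leading or trailing whitespace")
    # Flag the misnamed variable only when the correct one is missing.
    if "PRAX_API_TOKEN" in env and not env.get("PRAX_TOKEN"):
        problems.append("PRAX_API_TOKEN is set; the expected name is PRAX_TOKEN")
    return problems


if __name__ == "__main__":
    for problem in check_prax_env(dict(os.environ)):
        print(problem)
```

An empty result only means the variables are present and well-formed; it does not prove that the token is still valid, so `rompy-oceanum config test-connection` remains the definitive check.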
Configuration Errors
Symptoms:
- “Configuration validation failed”
- YAML syntax errors
- Missing required fields
Solutions:
Validate YAML Syntax:
# Check YAML is valid
python -c "import yaml; yaml.safe_load(open('config.yaml'))"
# Common YAML issues:
# - Inconsistent indentation
# - Missing quotes around special characters
# - Incorrect nesting
Validate rompy-oceanum Configuration:
# Use built-in validation
rompy-oceanum config validate config.yaml
# Fix reported errors one by one
Check File Permissions:
# Configuration file should be readable
ls -la ~/.rompy-oceanum/config.yaml
# Fix permissions if needed
chmod 600 ~/.rompy-oceanum/config.yaml
Common Configuration Issues:
# Incorrect (missing required fields):
prax:
  org: "my-org"

# Correct:
prax:
  base_url: "https://prax.oceanum.science"
  org: "my-org"
  project: "my-project"
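To see which fields are missing before running the built-in validator, a configuration that has already been loaded into a dictionary (for example with `yaml.safe_load`) can be checked directly. This is a sketch under the assumption that `base_url`, `org`, and `project` are the required keys of the `prax` section, as in the "Correct" example above; the helper name `missing_prax_fields` is hypothetical and the real validator may enforce more.

```python
# Required keys assumed from the "Correct" example above.
REQUIRED_PRAX_FIELDS = ("base_url", "org", "project")


def missing_prax_fields(config):
    """Return the required prax fields that are absent or empty in config."""
    prax = config.get("prax") if isinstance(config, dict) else None
    if not isinstance(prax, dict):
        # No usable prax section at all: every field is missing.
        return list(REQUIRED_PRAX_FIELDS)
    return [field for field in REQUIRED_PRAX_FIELDS if not prax.get(field)]


# The "Incorrect" example above, expressed as a dictionary.
print(missing_prax_fields({"prax": {"org": "my-org"}}))
```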
Pipeline Submission Failures
Symptoms:
- “Pipeline not found” errors
- Submission timeouts
- “Invalid pipeline configuration”
Solutions:
Verify Pipeline Exists:
# List available pipelines
rompy-oceanum pipelines --org your-org --project your-project
# Check exact pipeline name (case-sensitive)
Check Model Configuration:
# Validate rompy configuration first
python -c "
import rompy
config = rompy.ModelRun.from_file('your-config.yaml')
print('Configuration valid')
"
Verify Resource Requirements:
Check if your model requires more resources than pipeline allows
Large models may need special pipeline templates
Contact admin for resource limit increases
Debug Submission:
# Enable debug mode for detailed error info
rompy-oceanum --debug run --config your-config.yaml --pipeline your-pipeline
Pipeline Execution Failures
Symptoms:
- Pipeline status shows “failed”
- Unexpected termination
- Resource limit errors
Solutions:
Check Pipeline Logs:
# View all logs
rompy-oceanum logs your-run-id
# Search for errors
rompy-oceanum logs your-run-id --grep "ERROR\|FAILED\|Exception"
# Check specific stages
rompy-oceanum logs your-run-id --stage model-execution
Common Execution Issues:
Memory Errors:
# Look for "Out of memory" or "OOMKilled"
rompy-oceanum logs your-run-id --grep "memory\|OOM"
# Solution: Use pipeline with more memory or optimize model
Timeout Errors:
# Check for timeout messages
rompy-oceanum logs your-run-id --grep "timeout\|exceeded"
# Solution: Increase timeout or optimize model
Input Data Issues:
# Check for file not found errors
rompy-oceanum logs your-run-id --grep "No such file\|FileNotFound"
# Solution: Verify input data paths and availability
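When a log has been saved to a file, the three searches above can be run in one pass. The sketch below reuses the same patterns as the `--grep` examples; the function name `classify_log_lines` and the category labels are illustrative, not part of rompy-oceanum.

```python
import re

# Patterns taken from the --grep examples above (case-sensitive, like the originals).
CATEGORIES = {
    "memory": re.compile(r"memory|OOM"),
    "timeout": re.compile(r"timeout|exceeded"),
    "input": re.compile(r"No such file|FileNotFound"),
}


def classify_log_lines(lines):
    """Group log lines under the first failure category whose pattern matches."""
    found = {name: [] for name in CATEGORIES}
    for line in lines:
        for name, pattern in CATEGORIES.items():
            if pattern.search(line):
                found[name].append(line)
                break  # Count each line once, under its first matching category.
    return found
```

A category with no matches comes back as an empty list, which makes it easy to see at a glance which failure mode, if any, the log points to.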
Resource Monitoring:
# Get detailed status including resource usage
rompy-oceanum status your-run-id --json
Network and Connectivity Issues
Symptoms:
- Connection timeouts
- “Network unreachable” errors
- Intermittent failures
Solutions:
Check Network Connectivity:
# Test basic connectivity
curl -I https://prax.oceanum.science
# Test with authentication
curl -H "Authorization: Bearer $PRAX_TOKEN" \
    https://prax.oceanum.science/health
Increase Timeouts:
# Use longer timeout for unstable connections
rompy-oceanum run \
    --config your-config.yaml \
    --pipeline your-pipeline \
    --timeout 7200  # 2 hours
Check Firewall/Proxy:
Ensure HTTPS (port 443) access to oceanum.science domains
Configure proxy settings if behind corporate firewall
Whitelist Oceanum IP ranges if needed
Retry Logic:
# rompy-oceanum has built-in retry logic
# For additional reliability, use wrapper script:
#!/bin/bash
for i in {1..3}; do
    if rompy-oceanum run --config config.yaml --pipeline pipeline; then
        break
    fi
    echo "Attempt $i failed, retrying..."
    sleep 60
done
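The same retry pattern can be written in Python when the submission is part of a larger script. This is a sketch only: `run_with_retries` is a hypothetical helper, and the `runner` and `sleep` parameters exist so the loop can be exercised without launching a real process or waiting.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=60, runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with code 0 or the attempts are exhausted.

    Returns the exit code of the last attempt (None if attempts is 0).
    """
    returncode = None
    for attempt in range(1, attempts + 1):
        returncode = runner(cmd).returncode
        if returncode == 0:
            break
        # Only wait when another attempt is still to come.
        if attempt < attempts:
            print(f"Attempt {attempt} failed, retrying in {delay}s...")
            sleep(delay)
    return returncode
```

A call might look like `run_with_retries(["rompy-oceanum", "run", "--config", "config.yaml", "--pipeline", "pipeline"])`. Unlike the shell loop, this version returns the final exit code, so the caller can tell whether every attempt failed.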
Output Download Issues
Symptoms:
- Download timeouts
- Partial downloads
- “File not found” errors
Solutions:
Check Pipeline Completion:
# Ensure pipeline completed successfully
rompy-oceanum status your-run-id
# Only download from completed pipelines
Verify Output Files:
# Check what outputs are available
rompy-oceanum status your-run-id --json | jq '.outputs'
Resume Interrupted Downloads:
# Use --force to overwrite partial downloads
rompy-oceanum download your-run-id --force --verify
Download Specific Files:
# Download only needed files to save time/space
rompy-oceanum download your-run-id --files "*.nc"
Performance Issues
Slow Pipeline Execution
Diagnosis:
# Monitor execution progress
rompy-oceanum status your-run-id --watch
# Check resource usage in logs
rompy-oceanum logs your-run-id --grep "CPU\|Memory\|Progress"
Solutions:
Optimize Model Configuration:
- Reduce grid resolution for testing
- Simplify physics options
- Use shorter simulation periods
Use Appropriate Pipeline:
- Choose pipeline templates matching your resource needs
- Use GPU-enabled pipelines for supported models
- Consider parallel execution options
Monitor Resource Usage:
# Add resource monitoring to your workflow
import time
from rompy_oceanum.client import PraxClient

run_id = "your-run-id"  # ID of the run to monitor
client = PraxClient()
result = client.get_run_status(run_id)
while not result.is_complete():
    status = result.get_detailed_status()
    print(f"CPU: {status.get('cpu_usage', 'N/A')}")
    print(f"Memory: {status.get('memory_usage', 'N/A')}")
    time.sleep(60)
    # Fetch the status again in case the result object is a snapshot
    result = client.get_run_status(run_id)
High Memory Usage
Symptoms:
- “Out of memory” errors
- Pipeline termination
- Slow execution
Solutions:
Reduce Model Size:
# Reduce grid resolution
grid = SwanGrid(
    x0=115.0,
    y0=-35.0,
    dx=0.1,  # Increase from 0.05
    dy=0.1,  # Increase from 0.05
    nx=50,   # Reduce from 100
    ny=40    # Reduce from 80
)
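The effect of this change can be checked with simple arithmetic: doubling the spacing while halving the point counts keeps the domain the same size but cuts the number of grid points by a factor of four. The sketch below uses the values from the example; the helper names are illustrative, and the extent is approximated as spacing times point count.

```python
def cell_count(nx, ny):
    """Number of grid points in an nx-by-ny regular grid."""
    return nx * ny


def extent(spacing, n):
    """Approximate domain length in degrees, treated here as spacing * n."""
    return spacing * n


fine = cell_count(100, 80)   # dx = dy = 0.05
coarse = cell_count(50, 40)  # dx = dy = 0.1

print(f"fine grid:   {fine} points")
print(f"coarse grid: {coarse} points ({fine // coarse}x fewer)")
print(f"x extent:    {extent(0.05, 100):.1f} vs {extent(0.1, 50):.1f} degrees")
```

Memory use does not necessarily scale exactly with the point count, but the ratio gives a first estimate of the saving to expect.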
Use Memory-Efficient Options:
# In SWAN configuration
physics:
  whitecapping: true
  breaking: true
  memory_optimization: true  # If available
Request More Memory:
# Use high-memory pipeline if available
rompy-oceanum run \
    --config config.yaml \
    --pipeline swan-highmem-from-rompy
Debugging Techniques
Enable Debug Mode
# Global debug flag
rompy-oceanum --debug command args
# Environment variable
export ROMPY_OCEANUM_DEBUG=1
# Configuration file
echo "debug: true" >> ~/.rompy-oceanum/config.yaml
Verbose Logging
import logging
from rompy_oceanum import configure_logging
# Enable detailed logging
configure_logging(level=logging.DEBUG)
# Your rompy-oceanum code here
Inspect API Requests
# Enable HTTP request logging
import logging
import http.client as http_client
http_client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
# Now run your rompy-oceanum code
Test with Mock Mode
# Use mock mode for testing without actual API calls
export ROMPY_OCEANUM_MOCK_MODE=1
rompy-oceanum run --config test-config.yaml --pipeline test-pipeline
Error Codes Reference
Exit Codes
0: Success
1: General error
2: Command line usage error
3: Authentication error
4: Configuration error
5: Network error
6: Pipeline error
7: File system error
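In wrapper scripts it can help to translate the exit code into the description from this table. The following sketch copies the table into a dictionary; the function name `describe_exit_code` and the fallback message for unlisted codes are illustrative.

```python
# Exit codes copied from the table above.
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Command line usage error",
    3: "Authentication error",
    4: "Configuration error",
    5: "Network error",
    6: "Pipeline error",
    7: "File system error",
}


def describe_exit_code(code):
    """Return the documented meaning of an exit code, or a fallback message."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


print(describe_exit_code(3))
```

The code to pass in would typically come from `subprocess.run(...).returncode` after invoking the CLI.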
HTTP Status Codes
400: Bad request (check configuration)
401: Unauthorized (check token)
403: Forbidden (check permissions)
404: Not found (check pipeline/run ID)
429: Rate limited (wait and retry)
500: Server error (contact support)
503: Service unavailable (try later)
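The table suggests that only some statuses are worth retrying automatically. As a sketch, the guidance can be encoded so that a script retries on 429 and 503 and stops on everything else; the name `advise` and the message for unlisted statuses are assumptions made for illustration.

```python
# Guidance copied from the HTTP status table above.
HTTP_ADVICE = {
    400: "Bad request (check configuration)",
    401: "Unauthorized (check token)",
    403: "Forbidden (check permissions)",
    404: "Not found (check pipeline/run ID)",
    429: "Rate limited (wait and retry)",
    500: "Server error (contact support)",
    503: "Service unavailable (try later)",
}

# Statuses whose advice is to wait and try again.
RETRYABLE_STATUSES = frozenset({429, 503})


def advise(status):
    """Return (advice, retryable) for an HTTP status code."""
    advice = HTTP_ADVICE.get(status, "Unlisted status (see debug logs)")
    return advice, status in RETRYABLE_STATUSES
```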
Advanced Troubleshooting
Memory Dumps
For severe issues, record current and peak memory usage with tracemalloc:
import tracemalloc
import gc
# Start tracing
tracemalloc.start()
# Your problematic code
try:
    result = model_run.pipeline(...)
except Exception as e:
    # Get memory usage
    current, peak = tracemalloc.get_traced_memory()
    print(f"Current memory: {current / 1024 / 1024:.1f} MB")
    print(f"Peak memory: {peak / 1024 / 1024:.1f} MB")
    # Force garbage collection
    gc.collect()
    raise
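The snippet above reports memory only when the call fails. To compare peak usage across successful runs as well, the same `tracemalloc` calls can be wrapped in a small helper. This is a sketch; `measure_peak` is a hypothetical name, and it measures only Python-level allocations made while the wrapped function runs.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Call func and return (result, peak_bytes) as measured by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, peak


# Example: allocate a list and report the peak in MB.
data, peak = measure_peak(lambda: [0] * 100_000)
print(f"Peak memory: {peak / 1024 / 1024:.1f} MB")
```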
Network Debugging
Capture network traffic for debugging:
# Install mitmproxy
pip install mitmproxy
# Start proxy
mitmdump --port 8080 --save-stream-file traffic.log &
# Configure rompy-oceanum to use proxy
export HTTPS_PROXY=http://localhost:8080
rompy-oceanum run --config config.yaml --pipeline pipeline
# Stop proxy and analyze traffic.log
System Information Collection
Collect system information for support:
# Create diagnostic report
cat > diagnostic_report.txt << EOF
# rompy-oceanum Diagnostic Report
Date: $(date)
## System Information
OS: $(uname -a)
Python: $(python --version)
## Package Versions
$(pip list | grep -E "(rompy|oceanum|pydantic)")
## Configuration
$(rompy-oceanum config show --hide-secrets)
## Environment Variables
$(env | grep -E "(PRAX|DATAMESH|ROMPY)" | grep -vi "token")
## Recent Logs
$(tail -50 ~/.rompy-oceanum/logs/rompy-oceanum.log)
EOF
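Environment listings can contain secrets such as `PRAX_TOKEN`, so it is worth masking them before a report leaves your machine. The sketch below shows one way to do that from Python; the function name `redacted_env`, the prefix list, and the set of name fragments treated as secret are assumptions chosen for illustration.

```python
import re

# Name fragments treated as secret (an assumption; extend as needed).
SECRET_NAME = re.compile(r"TOKEN|SECRET|PASSWORD|KEY", re.IGNORECASE)
# Variable prefixes relevant to a diagnostic report.
PREFIXES = ("PRAX", "DATAMESH", "ROMPY")


def redacted_env(env):
    """Return the relevant variables with secret-looking values masked."""
    report = {}
    for name, value in sorted(env.items()):
        if not name.startswith(PREFIXES):
            continue
        report[name] = "<redacted>" if SECRET_NAME.search(name) else value
    return report
```

Calling `redacted_env(dict(os.environ))` after `import os` yields a dictionary that can be pasted into a support request. It is still worth reading the result before sharing it, since a secret stored under an unexpected variable name would not be masked.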
Getting Support
Before Contacting Support
Check this troubleshooting guide
Search existing GitHub issues
Enable debug mode and collect error messages
Test with minimal configuration
Verify network connectivity
What to Include in Support Requests
Error messages (full text, not screenshots)
Configuration files (with secrets redacted)
System information (OS, Python version, package versions)
Steps to reproduce the issue
Expected vs actual behavior
Debug logs if available
Support Channels
GitHub Issues: https://github.com/rom-py/rompy-oceanum/issues
Documentation: https://rompy-oceanum.readthedocs.io/
rompy Community: https://rompy.readthedocs.io/
Quick Reference
Diagnostic Commands
# Quick health check
rompy-oceanum config test-connection
# Validate configuration
rompy-oceanum config validate
# Check recent runs
rompy-oceanum list --limit 5
# Monitor active run
rompy-oceanum status run-id --watch
# View error logs
rompy-oceanum logs run-id --grep ERROR
Common Fixes
# Fix authentication
export PRAX_TOKEN="your-new-token"
rompy-oceanum config test-connection
# Reset configuration
rm ~/.rompy-oceanum/config.yaml
rompy-oceanum config init
# Clean cache
rompy-oceanum clean --all
# Force download
rompy-oceanum download run-id --force
Prevention Tips
Regular Testing: Test configurations in development before production
Monitor Resources: Track memory and CPU usage patterns
Update Regularly: Keep rompy-oceanum and dependencies updated
Backup Configurations: Version control your configuration files
Document Changes: Keep track of configuration modifications
Monitor Logs: Set up log monitoring for early issue detection
Remember: Most issues are configuration-related and can be resolved by carefully checking authentication, network connectivity, and configuration syntax.