Docker Deployment Guide
This guide covers deploying the Soliplex Ingester with Docker Compose in development and production environments. While the provided configuration can run as-is, it is primarily intended as a starting point for building customized deployments that match your expected configuration.
Table of Contents
- Quick Start
- Service Overview
- Prerequisites
- Configuration
- Service Details
- Authentication Setup
- Production Deployment
- Monitoring and Maintenance
- Troubleshooting
Quick Start
Starting Services

1. Navigate to the docker directory:
2. Start all services:
3. Verify services are running:
4. Access the application:
   - Web UI: http://localhost:8002
   - API Documentation: http://localhost:8002/docs
   - PostgreSQL: localhost:5432
   - SeaweedFS: http://localhost:8333 (S3) / http://localhost:9333 (Admin)
5. View logs:
6. Stop services:
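The steps above map to the following commands (assuming the compose files live in the repository's `docker/` directory and the service names match the compose configuration shown later in this guide):

```shell
cd docker              # 1. Navigate to the docker directory
docker-compose up -d   # 2. Start all services in the background
docker-compose ps      # 3. Verify services are running
docker-compose logs -f # 5. Follow logs (Ctrl+C to stop following)
docker-compose down    # 6. Stop services
```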
Service Overview
Architecture Diagram
```mermaid
graph TB
    subgraph "Docker Network (soliplex_net)"
        Ingester["Soliplex Ingester<br/>:8000→8002<br/>(API + Worker)"]
        Postgres[("PostgreSQL<br/>:5432<br/>(Database)")]
        HAProxy["HAProxy<br/>:5004<br/>(Load Balancer)"]
        Ollama["Ollama<br/>(GPU)<br/>(Embeddings)"]
        Docling1["Docling 1<br/>(GPU)"]
        Docling2["Docling 2<br/>(GPU)"]
        Docling3["Docling 3<br/>(GPU)"]
        SeaweedFS[("SeaweedFS<br/>:8333 (S3)<br/>:9333 (Admin)")]

        Ingester -->|Stores metadata| Postgres
        Ingester -->|Parse requests| HAProxy
        Ingester -->|Embeddings| Ollama
        Ingester -.->|Optional S3| SeaweedFS
        HAProxy -->|Round-robin<br/>Cookie-based| Docling1
        HAProxy -->|Load balanced| Docling2
        HAProxy -->|Load balanced| Docling3
    end

    User([User]) -->|http://localhost:8002| Ingester

    style Ingester fill:#4a90e2,stroke:#2e5c8a,color:#fff
    style Postgres fill:#336791,stroke:#224466,color:#fff
    style HAProxy fill:#3ba13b,stroke:#2d7a2d,color:#fff
    style Ollama fill:#ff6b6b,stroke:#cc5555,color:#fff
    style Docling1 fill:#f39c12,stroke:#c27d0e,color:#fff
    style Docling2 fill:#f39c12,stroke:#c27d0e,color:#fff
    style Docling3 fill:#f39c12,stroke:#c27d0e,color:#fff
    style SeaweedFS fill:#9b59b6,stroke:#7a4691,color:#fff
```
Required Services
| Service | Purpose | Required |
|---|---|---|
| soliplex_ingester | Main application (API + Worker) | Yes |
| postgres | Document and workflow database | Yes |
Optional Services
| Service | Purpose | Required |
|---|---|---|
| haproxy | Load balancer for Docling instances | No (but recommended for production) |
| docling (1-3) | PDF parsing with GPU acceleration | No (can parse without GPU or use external service) |
| ollama_img | Embedding generation with GPU | No (can use external embedding service) |
| seaweedfs | S3-compatible object storage | No (can use filesystem or cloud S3) |
Resource Requirements
Minimum (Development - No GPU):
- CPU: 4 cores
- RAM: 8 GB
- Disk: 20 GB

Recommended (Production with GPU):
- CPU: 16+ cores
- RAM: 64+ GB
- GPU: NVIDIA GPU with 24+ GB VRAM (for Docling + Ollama)
- Disk: 100+ GB
Prerequisites
Required Software
- Docker Engine 20.10+
- Docker Compose 2.0+
For GPU Support
- NVIDIA Container Toolkit
Install NVIDIA Container Toolkit for GPU support:
Ubuntu/Debian:
```shell
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
Verify GPU access:
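The same check used in the Troubleshooting section works here: run `nvidia-smi` inside a CUDA base container.

```shell
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
```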
Configuration
Environment Variables
Create a .env file in the docker/ directory:
```
# Database Configuration
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_secure_password_here
POSTGRES_DB=soliplex_attrib
DOC_DB_URL=postgresql+psycopg://soliplex_attrib:soliplex_attrib@postgres:5432/soliplex_attrib

# Storage Configuration
FILE_STORE_TARGET=fs  # Options: fs, db, s3
FILE_STORE_DIR=/var/soliplex/file_store
LANCEDB_DIR=/var/soliplex/lancedb

# Worker Configuration
WORKER_TASK_COUNT=10  # Number of concurrent workflow steps
INGEST_WORKER_CONCURRENCY=20  # Document ingestion concurrency

# Docling Configuration
DOCLING_SERVER_URL=http://haproxy:5004/v1
DOCLING_HTTP_TIMEOUT=1200  # Timeout in seconds
DOCLING_CONCURRENCY=4  # Concurrent Docling requests

# Ollama Configuration (if using Ollama for embeddings)
OLLAMA_BASE_URL=http://ollama_img:11434

# SeaweedFS Configuration (if using S3 storage)
S3_ENDPOINT_URL=http://seaweedfs:8333
S3_ACCESS_KEY_ID=your_access_key
S3_SECRET_ACCESS_KEY=your_secret_key
S3_BUCKET_NAME=soliplex-artifacts

# Logging
LOG_LEVEL=INFO  # Options: DEBUG, INFO, WARNING, ERROR

# API Configuration
API_KEY_ENABLED=false  # Enable API key authentication
```
See .env.docker.example for a complete template.
Volume Management
The docker-compose configuration creates persistent volumes:
```yaml
volumes:
  postgres_data:       # PostgreSQL database files
  seaweedfs_data:      # SeaweedFS object storage
  docling_artifacts:   # Docling temporary artifacts
  ollama_img_data:     # Ollama model files
```

Local bind mounts:

```yaml
- ./file_store:/var/soliplex/file_store  # Document artifacts
- ./lancedb:/var/soliplex/lancedb        # Vector database
```
Backup volumes:
```shell
# Backup PostgreSQL
docker-compose exec postgres pg_dump -U postgres soliplex_attrib > backup.sql

# Backup volumes
docker run --rm -v postgres_data:/data -v $(pwd):/backup alpine tar czf /backup/postgres_data.tar.gz -C /data .
```

Restore volumes:

```shell
docker run --rm -v postgres_data:/data -v $(pwd):/backup alpine sh -c "cd /data && tar xzf /backup/postgres_data.tar.gz"
```
GPU Configuration
The docker-compose.yml configures GPU access for Docling and Ollama services.
Important: Adjust device_ids based on your hardware:
```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['3']  # ← Change this to match your GPU ID
          capabilities: [gpu]
```
Check available GPUs:
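Listing GPUs assumes the NVIDIA driver is installed on the host:

```shell
nvidia-smi --list-gpus
```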
Example output:
```
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
```
Multiple services sharing one GPU:
The default configuration runs 3 Docling instances + Ollama all on GPU 3. This works if: - GPU has sufficient VRAM (24+ GB recommended) - Memory limits are properly configured - Workload is I/O bound (services wait on data)
Distribute across GPUs:
```yaml
# Docling on GPU 0
device_ids: ['0']

# Docling_2 on GPU 1
device_ids: ['1']

# Docling_3 on GPU 2
device_ids: ['2']

# Ollama on GPU 3
device_ids: ['3']
```
Network Configuration
All services run on the soliplex_net bridge network for internal communication.
Port Mappings:
| Host Port | Container Port | Service | Purpose |
|---|---|---|---|
| 8002 | 8000 | soliplex_ingester | Web UI & API |
| 5432 | 5432 | postgres | Database |
| 5004 | 5004 | haproxy | Docling load balancer |
| 5000 | 5001 | docling | Direct Docling access |
| 5001 | 5001 | docling_2 | Direct Docling access |
| 8333 | 8333 | seaweedfs | S3 API |
| 9333 | 9333 | seaweedfs | Admin UI |
Change ports if conflicts exist:
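For example, to move the web UI off port 8002 (8080 here is a hypothetical alternative; the container port stays 8000):

```yaml
services:
  soliplex_ingester:
    ports:
      - "8080:8000"  # host port 8080 instead of 8002
```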
Service Details
Soliplex Ingester
Configuration:
```yaml
soliplex_ingester:
  image: soliplex_ingester:latest
  environment:
    DOC_DB_URL: postgresql+psycopg://soliplex_attrib:soliplex_attrib@postgres:5432/soliplex_attrib
    FILE_STORE_TARGET: fs
    FILE_STORE_DIR: /var/soliplex/file_store
    LANCEDB_DIR: /var/soliplex/lancedb
    WORKER_TASK_COUNT: 10
    DOCLING_SERVER_URL: http://haproxy:5004/v1
    DOCLING_HTTP_TIMEOUT: 1200
    DOCLING_CONCURRENCY: 4
  ports:
    - "8002:8000"
  volumes:
    - ./file_store:/var/soliplex/file_store
    - ./lancedb:/var/soliplex/lancedb
```
Building the image:
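A build sketch, assuming the compose file defines a `build:` context for the service, or that the Dockerfile sits at the repository root (adjust to your layout):

```shell
# Build via compose (uses the build: section of docker-compose.yml)
docker-compose build soliplex_ingester

# Or build directly and tag to match the compose image name
docker build -t soliplex_ingester:latest .
```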
Scaling workers:
The default configuration runs the API server with an integrated worker. For production, separate them:
```yaml
# API Server
soliplex_api:
  image: soliplex_ingester:latest
  command: si-cli serve --host 0.0.0.0 --workers 4
  # ... rest of config

# Workers (scale as needed)
soliplex_worker:
  image: soliplex_ingester:latest
  command: si-cli worker
  deploy:
    replicas: 3
  # ... rest of config
```
PostgreSQL
Configuration:
```yaml
postgres:
  image: postgres:18-trixie
  environment:
    POSTGRES_USER: postgres
    POSTGRES_PASSWORD: postgres
    POSTGRES_INITDB_ARGS: "-A scram-sha-256"
  ports:
    - "5432:5432"
  volumes:
    - postgres_data:/var/lib/postgresql
    - ./pgsql/config/init.sql:/docker-entrypoint-initdb.d/init.sql
```
Initialization Script:
The init.sql script creates users and databases:
```sql
-- Create application user
CREATE USER soliplex_attrib WITH PASSWORD 'soliplex_attrib';

-- Create database
CREATE DATABASE soliplex_attrib OWNER soliplex_attrib;

-- Grant permissions
GRANT ALL PRIVILEGES ON DATABASE soliplex_attrib TO soliplex_attrib;
```
Production Security:
⚠️ The example uses weak passwords for development. For production:
- Use strong, randomly generated passwords
- Store credentials in Docker secrets or environment files
- Restrict network access
- Enable SSL/TLS connections
Connection string format:
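Based on the defaults used elsewhere in this guide, the SQLAlchemy-style connection string has the form:

```
postgresql+psycopg://<user>:<password>@<host>:<port>/<database>

# Development default:
postgresql+psycopg://soliplex_attrib:soliplex_attrib@postgres:5432/soliplex_attrib
```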
Docling Services
Docling converts PDF documents to markdown and structured JSON.
Configuration (per instance):
```yaml
docling:
  image: ghcr.io/docling-project/docling-serve-cu128
  environment:
    DOCLING_SERVE_ENG_LOC_NUM_WORKERS: 4
    DOCLING_SERVE_ARTIFACTS_PATH: "/artifacts"
    DOCLING_NUM_THREADS: 16
    UVICORN_WORKERS: 1
    PYTORCH_CUDA_ALLOC_CONF: "expandable_segments:True"
    DOCLING_SERVE_ENABLE_UI: 1
    DOCLING_SERVE_MAX_SYNC_WAIT: 9999
    NVIDIA_VISIBLE_DEVICES: "all"
    DOCLING_SERVE_ENABLE_REMOTE_SERVICES: True
  restart: "unless-stopped"
  runtime: "nvidia"
  volumes:
    - docling_artifacts:/artifacts
  deploy:
    resources:
      limits:
        memory: 32000M
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['3']
            capabilities: [gpu]
```
Key Environment Variables:
- `DOCLING_SERVE_ENG_LOC_NUM_WORKERS: 4` - Layout analysis workers
- `DOCLING_NUM_THREADS: 16` - Processing threads
- `PYTORCH_CUDA_ALLOC_CONF: "expandable_segments:True"` - GPU memory optimization
- `DOCLING_SERVE_MAX_SYNC_WAIT: 9999` - Long timeout for large documents
Memory Management:
⚠️ Docling is prone to memory leaks with long-running processes.
Mitigation strategies:
- Memory limits: `memory: 32000M` prevents OOM from killing other services
- Restart policy: `restart: unless-stopped` recovers from crashes
- Load balancing: multiple instances behind HAProxy provide redundancy
- Health checks: HAProxy routes around unhealthy instances
Without GPU:
Remove GPU configuration and use CPU-only image:
```yaml
docling:
  image: ghcr.io/docling-project/docling-serve  # CPU-only
  runtime: null  # Remove nvidia runtime
  deploy:
    resources:
      limits:
        memory: 16000M
      # Remove GPU reservation
```
HAProxy Load Balancer
HAProxy distributes requests across multiple Docling instances.
Configuration:
```yaml
haproxy:
  image: docker.io/library/haproxy:3.3-alpine
  ports:
    - 5004:5004
  volumes:
    - ./haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
```
HAProxy Configuration (haproxy/haproxy.cfg):
```
global
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 1200000ms
    timeout server 1200000ms

frontend docling_frontend
    bind *:5004
    default_backend docling_backend

backend docling_backend
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server docling1 docling:5001 check cookie docling1
    server docling2 docling_2:5001 check cookie docling2
    server docling3 docling_3:5001 check cookie docling3
```
Load Balancing Strategy:
- Round-robin: Distributes requests evenly
- Cookie-based persistence: Same client goes to same server
- Health checks: Removes failed servers from rotation
Why cookie-based persistence?
Docling parsing is stateful for multi-step conversions. The ingester client uses cookies to ensure all requests for a document go to the same Docling instance.
Ollama
Ollama provides embedding generation for vector search.
Configuration:
```yaml
ollama_img:
  image: ollama/ollama:latest
  container_name: ollama_img
  volumes:
    - ollama_img_data:/root/.ollama
  restart: always
  deploy:
    resources:
      limits:
        memory: 32000M
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['3']
            capabilities: [gpu]
```
Pull models:
```shell
docker-compose exec ollama_img ollama pull nomic-embed-text
docker-compose exec ollama_img ollama pull llama2
```
List installed models:
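`ollama list` shows what is installed inside the container:

```shell
docker-compose exec ollama_img ollama list
```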
Ingester Configuration:
Set in .env:
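At minimum, point the ingester at the Ollama container (matching the value shown in the Environment Variables section):

```
OLLAMA_BASE_URL=http://ollama_img:11434
```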
SeaweedFS
SeaweedFS provides S3-compatible object storage for artifacts.
Configuration:
```yaml
seaweedfs:
  image: ghcr.io/chrislusf/seaweedfs
  command: server -s3 -s3.config=/config/config.json
  ports:
    - 8333:8333  # S3 API
    - 9333:9333  # Admin UI
  volumes:
    - seaweedfs_data:/data
    - ./seaweedfs/config:/config
```
Initialization:
The seaweedfs-init container runs setup:
```yaml
seaweedfs-init:
  image: ghcr.io/chrislusf/seaweedfs
  entrypoint: ["/bin/sh"]
  volumes:
    - ./seaweedfs/config/init.sh:/init.sh
  command: ["/init.sh"]
```
Example init.sh:
```shell
#!/bin/sh
# Wait for SeaweedFS to start
sleep 5

# Create bucket
curl -X PUT http://seaweedfs:8333/soliplex-artifacts

# Configure credentials (in config.json)
```
Ingester Configuration:
Set in .env:
```
FILE_STORE_TARGET=s3
S3_ENDPOINT_URL=http://seaweedfs:8333
S3_ACCESS_KEY_ID=your_access_key
S3_SECRET_ACCESS_KEY=your_secret_key
S3_BUCKET_NAME=soliplex-artifacts
```
Alternative: Use Cloud S3
```
FILE_STORE_TARGET=s3
S3_ENDPOINT_URL=https://s3.amazonaws.com
S3_ACCESS_KEY_ID=your_aws_key
S3_SECRET_ACCESS_KEY=your_aws_secret
S3_BUCKET_NAME=your-bucket-name
AWS_REGION=us-east-1
```
Authentication Setup
For production deployments with authentication, use docker-compose.auth.yml.
OAuth2 Proxy Stack
The auth configuration adds: - NGINX - Reverse proxy with SSL termination - OAuth2 Proxy - OIDC authentication - Soliplex Ingester - Configured to trust proxy headers
Start with authentication:
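A common pattern for layered compose files (assuming docker-compose.auth.yml overlays the base file):

```shell
docker-compose -f docker-compose.yml -f docker-compose.auth.yml up -d
```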
Configuration
- Create a `.env.auth` file (see docker/.env.auth.example for a complete template):
```
# OIDC Provider Configuration
OAUTH2_PROVIDER=oidc
OAUTH2_OIDC_ISSUER_URL=https://your-oidc-provider.com
OAUTH2_CLIENT_ID=your_client_id
OAUTH2_CLIENT_SECRET=your_client_secret
OAUTH2_REDIRECT_URL=https://your-domain.com/oauth2/callback

# OAuth2 Proxy Configuration
OAUTH2_COOKIE_SECRET=random_32_char_secret_here
OAUTH2_COOKIE_DOMAIN=your-domain.com

# Soliplex Ingester Configuration
AUTH_TRUST_PROXY_HEADERS=true
AUTH_USER_HEADER=X-Forwarded-User
AUTH_EMAIL_HEADER=X-Forwarded-Email
```
- Configure NGINX:
Edit docker/nginx/nginx.conf for your domain and SSL certificates.
- Configure OAuth2 Proxy:
Edit docker/oauth2-proxy/oauth2-proxy.cfg for your OIDC provider.
See AUTHENTICATION.md for detailed setup instructions.
Production Deployment
Security Best Practices
- Use strong, unique passwords for all services
- Store credentials in Docker secrets rather than plain environment variables
- Restrict network access to internal services
- Enable SSL/TLS: use NGINX with Let's Encrypt certificates or a cloud load balancer
- Scan images for known vulnerabilities
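A helper for generating strong passwords and cookie secrets (openssl ships on most Linux hosts):

```shell
# Generate a strong random password / cookie secret (44 base64 characters)
openssl rand -base64 32
```

For image scanning, any standard scanner works; for example `trivy image soliplex_ingester:latest`, assuming trivy is installed.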
Scaling Configuration
For high throughput:
```yaml
services:
  soliplex_api:
    command: si-cli serve --host 0.0.0.0 --workers 8
    deploy:
      replicas: 2

  soliplex_worker:
    command: si-cli worker
    environment:
      WORKER_TASK_COUNT: 20
    deploy:
      replicas: 5

  docling:
    deploy:
      replicas: 5
      resources:
        limits:
          memory: 24000M
```
Resource Limits
Set appropriate limits to prevent resource exhaustion:
```yaml
services:
  soliplex_ingester:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 8G
        reservations:
          cpus: '2.0'
          memory: 4G
```
Health Checks
Add health checks for automatic recovery:
```yaml
services:
  soliplex_ingester:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/batch/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```
Logging
Configure logging drivers:
Or use external logging:
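A minimal rotation sketch using Docker's default `json-file` driver with size caps; for external logging, swap in a different driver (e.g. `syslog` or a vendor driver) with its matching options:

```yaml
services:
  soliplex_ingester:
    logging:
      driver: json-file
      options:
        max-size: "50m"
        max-file: "5"
```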
Monitoring and Maintenance
Health Checks
Check service status:
View resource usage:
Check specific service:
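The three checks above, as commands:

```shell
docker-compose ps                    # service status for all services
docker stats                         # live CPU/memory per container
docker-compose ps soliplex_ingester  # status of a specific service
```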
Log Management
View logs:
```shell
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f soliplex_ingester

# Last 100 lines
docker-compose logs --tail=100 docling

# Since timestamp
docker-compose logs --since 2026-01-22T10:00:00
```
Export logs:
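One way to capture logs to a dated file for later analysis:

```shell
docker-compose logs --no-color > soliplex_logs_$(date +%Y%m%d).txt
```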
Database Maintenance
Backup:
```shell
docker-compose exec postgres pg_dump -U postgres soliplex_attrib | gzip > backup_$(date +%Y%m%d).sql.gz
```
Restore:
Vacuum database:
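Restore and vacuum sketches, mirroring the backup command above (replace the dump filename with your own; `-T` disables the TTY so the pipe works):

```shell
# Restore from a gzipped dump
gunzip -c backup_YYYYMMDD.sql.gz | docker-compose exec -T postgres psql -U postgres -d soliplex_attrib

# Reclaim space and refresh planner statistics
docker-compose exec postgres psql -U postgres -d soliplex_attrib -c "VACUUM ANALYZE;"
```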
Vector Database Maintenance
Vacuum LanceDB:
Check database size:
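LanceDB lives under the bind-mounted directory, so a plain `du` covers the size check; compaction/vacuum depends on the LanceDB tooling your version provides.

```shell
du -sh ./lancedb ./file_store
```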
Updates
Update images:
Update single service:
Rebuild custom image:
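Typical update commands (service and image names follow the compose configuration shown earlier):

```shell
# Pull new images and restart everything
docker-compose pull
docker-compose up -d

# Update one service without touching its dependencies
docker-compose pull docling
docker-compose up -d --no-deps docling

# Rebuild the custom ingester image from source
docker-compose build --no-cache soliplex_ingester
docker-compose up -d soliplex_ingester
```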
Troubleshooting
Common Issues
Services Won't Start
Check logs:
Check configuration:
Validate environment variables:
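Commands for the three checks above (`config` validates and prints the merged compose file with variables resolved):

```shell
docker-compose logs --tail=50               # recent logs from all services
docker-compose config                       # validate the compose configuration
docker-compose config | grep -A 20 environment  # inspect resolved environment variables
```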
Database Connection Errors
Error: connection to server at "postgres" (172.18.0.2), port 5432 failed
Solutions:
1. Verify postgres is running: docker-compose ps postgres
2. Check network: docker-compose exec soliplex_ingester ping postgres
3. Verify credentials in DOC_DB_URL
4. Check postgres logs: docker-compose logs postgres
Test connection:
```shell
docker-compose exec soliplex_ingester psql "postgresql://soliplex_attrib:soliplex_attrib@postgres:5432/soliplex_attrib"
```
GPU Not Available
Error: could not select device driver "" with capabilities: [[gpu]]
Solutions:
1. Install NVIDIA Container Toolkit (see Prerequisites)
2. Restart Docker daemon: sudo systemctl restart docker
3. Verify GPU access: docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
Check GPU usage:
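Watching GPU utilization from the host (assumes NVIDIA drivers are installed):

```shell
watch -n 2 nvidia-smi
```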
Docling Memory Errors
Error: CUDA out of memory or container crashes
Solutions:
1. Reduce DOCLING_NUM_THREADS in docker-compose.yml
2. Reduce DOCLING_SERVE_ENG_LOC_NUM_WORKERS
3. Increase GPU memory by using dedicated GPU per instance
4. Process smaller documents or reduce concurrency
Monitor memory:
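GPU and container memory can be watched side by side:

```shell
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5
docker stats --no-stream
```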
Port Conflicts
Error: bind: address already in use
Solutions:
1. Check what's using the port:
2. Change the port mapping in docker-compose.yml:
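For example, to find what is listening on the ingester's host port:

```shell
sudo lsof -i :8002
# or
sudo ss -tlnp | grep 8002
```

Then pick a free host port in the service's `ports:` entry (the container port stays the same).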
Volume Permission Errors
Error: permission denied when accessing volumes
Solutions:
1. Check ownership:
2. Fix permissions:
3. Or use Docker's user mapping:
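The usual sequence for the bind mounts; 1000 is an assumed container UID, so check which user your image actually runs as:

```shell
ls -la ./file_store ./lancedb                     # 1. check ownership
sudo chown -R 1000:1000 ./file_store ./lancedb    # 2. fix permissions (assumed UID)
```

For user mapping, set `user: "1000:1000"` on the service in docker-compose.yml so the container runs as the host user that owns the mounts.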
SeaweedFS Connection Errors
Error: Unable to connect to S3 endpoint
Solutions:
1. Verify SeaweedFS is running: docker-compose ps seaweedfs
2. Check initialization completed: docker-compose logs seaweedfs-init
3. Test S3 endpoint:
- Verify bucket exists:
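Quick checks from the host (exact response codes depend on your S3 credential configuration):

```shell
curl -s http://localhost:8333                     # S3 API responds at all
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8333/soliplex-artifacts  # bucket reachable
```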
HAProxy Routing Issues
Symptom: Requests timing out or routing to unhealthy Docling instances
Debug:
1. Check HAProxy stats (if enabled):
2. View HAProxy logs:
3. Test Docling instances directly:
4. Restart HAProxy:
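Commands for the debug steps above; the stats URL applies only if a stats frontend is configured in haproxy.cfg, and the direct-access ports follow the Port Mappings table (the health path is an assumption; adjust to your docling-serve version):

```shell
docker-compose logs -f haproxy              # 2. HAProxy logs
curl http://localhost:5000/health           # 3. docling instance 1 directly
curl http://localhost:5001/health           # 3. docling_2 directly
docker-compose restart haproxy              # 4. restart HAProxy
```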
Performance Issues
Slow Document Processing
Diagnose:
```shell
# Check worker activity
curl http://localhost:8002/api/v1/workflow/steps?status=RUNNING

# Check CPU/memory
docker stats

# Check Docling queue
docker-compose logs docling | grep -i queue
```
Solutions:
1. Increase worker concurrency: WORKER_TASK_COUNT=20
2. Add more Docling instances
3. Increase DOCLING_CONCURRENCY
4. Scale workers: docker-compose up -d --scale soliplex_worker=5
High Memory Usage
Monitor:
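`docker stats` shows per-container memory against the configured limits:

```shell
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
```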
Solutions:
1. Reduce concurrency settings
2. Increase memory limits
3. Add swap space (not recommended for production)
4. Use smaller batch sizes
Disk Space Issues
Check usage:
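Disk usage on both the Docker side and the bind mounts:

```shell
docker system df               # images, containers, volumes, build cache
du -sh ./file_store ./lancedb  # bind-mounted data
df -h                          # overall filesystem usage
```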
Clean up:
```shell
# Remove unused images
docker image prune -a

# Remove stopped containers
docker container prune

# Remove unused volumes
docker volume prune

# Clean everything
docker system prune -a --volumes
```
Recovery Procedures
Reset Everything
⚠️ This deletes all data!
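A full reset removes containers and named volumes, and therefore all stored data:

```shell
docker-compose down -v --remove-orphans
docker-compose up -d
```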
Reset Database Only
```shell
docker-compose down
docker volume rm docker_postgres_data
docker-compose up -d postgres
# Wait for postgres to initialize
docker-compose exec postgres psql -U postgres -f /docker-entrypoint-initdb.d/init.sql
docker-compose up -d
```
Restart Single Service
Force Recreate Service
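Restart reuses the existing container, while force-recreate rebuilds it from the image and picks up configuration changes:

```shell
docker-compose restart soliplex_ingester
docker-compose up -d --force-recreate soliplex_ingester
```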
Additional Resources
- Soliplex Ingester Documentation: ../README.md
- API Reference: API.md
- Configuration Guide: CONFIGURATION.md
- Authentication Guide: AUTHENTICATION.md
- Docling Documentation: https://docling-project.github.io/docling/
- Docker Compose Documentation: https://docs.docker.com/compose/
- NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
Getting Help
If you encounter issues not covered in this guide:
- Check the main troubleshooting guide
- Review service-specific logs
- Open an issue on GitHub with:
  - Output of `docker-compose ps`
  - Relevant logs from `docker-compose logs`
  - Your docker-compose.yml modifications
  - Environment details (OS, Docker version, GPU info)