Troubleshooting Guide¶

This guide provides solutions to common issues encountered when deploying and operating the FastAPI HTTP/WebSocket application.

Table of Contents¶

Deployment Issues
Service Connectivity
Authentication & Authorization
Performance Issues
Database Problems
Redis Issues
Traefik Routing
Docker Container Issues
WebSocket Connection Problems
Rate Limiting Issues
Log Analysis
Emergency Procedures

Deployment Issues¶

Container Fails to Start¶

Symptoms:
- Container exits immediately after starting
- docker ps shows container not running
- Exit code non-zero

Diagnosis:

# Check container logs
docker logs hw-server

# Check exit code
docker inspect hw-server --format='{{.State.ExitCode}}'

# Check recent events
docker events --since 10m

Common Causes & Solutions:

Missing Environment Variables:

# Check if .env files exist
ls -la .env.production docker/.srv_env docker/.pg_env docker/.kc_env

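# Verify required variables are set (docker inspect also works when the container has exited)
docker inspect hw-server --format='{{range .Config.Env}}{{println .}}{{end}}' | grep -E "DATABASE_URL|KEYCLOAK_BASE_URL|REDIS_IP"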

Fix: Ensure all required variables are set in environment files.
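
The same check can be run against an environment file before the container is started. This is a minimal sketch, assuming the files contain plain KEY=VALUE lines; the helper names `parse_env_text` and `missing_variables` are hypothetical, and the required names are the three used in the grep above.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_variables(text: str, required: list) -> list:
    """Return the required names that are absent or set to an empty value."""
    values = parse_env_text(text)
    return [name for name in required if not values.get(name)]


REQUIRED = ["DATABASE_URL", "KEYCLOAK_BASE_URL", "REDIS_IP"]

# Example file contents (assumed values, for illustration only).
sample = """
# example contents of an env file
DATABASE_URL=postgresql://prod_user:secret@hw-db:5432/fastapi_prod
KEYCLOAK_BASE_URL=
"""
print(missing_variables(sample, REQUIRED))
```

In practice the text would come from reading a file such as .env.production; an empty result means every required variable has a value.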

Port Already in Use:

# Check what's using the port
sudo netstat -tulpn | grep :8000

Fix: Stop conflicting service or change port mapping.

Volume Permission Issues:

# Check volume permissions
docker exec hw-server ls -la /app

# Fix ownership
sudo chown -R 1000:1000 /path/to/volumes

Database Migration Failures¶

Symptoms:
- Migration command fails
- "Target database is not up to date" error
- Duplicate column/table errors

Diagnosis:

# Check current migration version
docker exec hw-server alembic current

# View migration history
docker exec hw-server alembic history

# Show the latest revision(s) in the code; migrations are pending if this differs from the output of alembic current
docker exec hw-server alembic heads

Solutions:

Database Out of Sync:

# Check which migrations are applied
docker exec hw-db psql -U prod_user -d fastapi_prod \
  -c "SELECT * FROM alembic_version;"

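# Mark the database as being at the current code version WITHOUT running any migrations
# (use only when the schema already matches the code)
docker exec hw-server alembic stamp head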

# Or downgrade and re-apply
docker exec hw-server alembic downgrade -1
docker exec hw-server alembic upgrade head

Migration Conflicts:

# Check for multiple heads
docker exec hw-server alembic heads

# Merge branches if needed
docker exec hw-server alembic merge <revision1> <revision2>

Failed Partial Migration:

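# Manual rollback (replace the placeholder comment with the statements that undo the partial changes;
# a "--" comment would also comment out the COMMIT that follows it on the same line)
docker exec hw-db psql -U prod_user -d fastapi_prod \
  -c "BEGIN; /* manually undo changes */ COMMIT;"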

# Update alembic_version table
docker exec hw-db psql -U prod_user -d fastapi_prod \
  -c "UPDATE alembic_version SET version_num='<previous_revision>';"

SSL Certificate Issues¶

Symptoms:
- "Certificate verify failed" errors
- HTTPS connections rejected
- Let's Encrypt challenge fails

Diagnosis:

# Check Traefik logs
docker logs hw-traefik | grep -i certificate

# Check certificate status
docker exec hw-traefik ls -la /letsencrypt/

# Test certificate (prints subject, validity dates, and issuer)
curl -vI https://api.example.com 2>&1 | grep -A 10 "Server certificate"
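
To see how close a certificate is to expiring, the expiry date reported by the tools above can be converted to a number of days. This is a minimal sketch using only the standard library; the helper name `days_until_expiry` is hypothetical, and the date string is a fixed sample rather than a value read from a real server.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter date.

    `not_after` uses the format returned in the "notAfter" field of
    ssl.SSLSocket.getpeercert(), e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


# Fixed timestamps so the example is reproducible.
expiry = "Jan  5 09:34:43 2018 GMT"
ten_days_before = ssl.cert_time_to_seconds(expiry) - 10 * 86400
print(days_until_expiry(expiry, now=ten_days_before))  # 10.0
```

A negative result means the certificate has already expired.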

Solutions:

Let's Encrypt Rate Limiting:
Wait for the rate limit to reset (50 certificates per registered domain per week)

Use staging environment for testing:

# traefik.yml
certificatesResolvers:
  letsencrypt:
    acme:
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory

DNS Not Propagated:

# Check DNS resolution
nslookup api.example.com
dig api.example.com

# Wait for DNS propagation (up to 48 hours)

Port 80 Not Accessible:

# Check firewall
sudo ufw status
sudo iptables -L -n | grep 80

# Test port 80 access
curl -I http://api.example.com/.well-known/acme-challenge/test

Service Connectivity¶

Cannot Connect to Application¶

Symptoms:
- "Connection refused" errors
- "No route to host"
- Timeout errors

Diagnosis:

# Check if service is running
docker ps | grep hw-server

# Check if port is listening
docker exec hw-server netstat -tulpn | grep 8000

# Check health status
curl http://localhost:8000/health

# Check Traefik routing
curl http://localhost:8080/api/http/routers

Solutions:

Service Not Running:

# Restart service
docker-compose -f docker/docker-compose.yml restart hw-server

# Check startup logs
docker logs hw-server --tail 50

Network Issues:

# Check network configuration
docker network inspect hw-network

# Test connectivity between containers (-c 3 stops ping after three packets)
docker exec hw-server ping -c 3 hw-db
docker exec hw-server nc -zv hw-redis 6379

Firewall Blocking:

# Check firewall rules
sudo ufw status

# Allow necessary ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

Inter-Service Communication Fails¶

Symptoms:
- Application cannot reach database
- Redis connection errors
- Keycloak unreachable

Diagnosis:

# Check all services are on same network
docker network inspect hw-network | jq '.[0].Containers'

# Test DNS resolution
docker exec hw-server nslookup hw-db
docker exec hw-server nslookup hw-redis

# Test port connectivity
docker exec hw-server nc -zv hw-db 5432
docker exec hw-server nc -zv hw-redis 6379
docker exec hw-server nc -zv hw-keycloak 8080
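
If nc is not installed in the application image, the same port test can be done with Python's standard library. This is a minimal sketch; the helper name `can_connect` is hypothetical, and the host names and ports are the ones used in the nc commands above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Host names and ports taken from the nc commands above.
SERVICES = {"hw-db": 5432, "hw-redis": 6379, "hw-keycloak": 8080}
```

Run inside the hw-server container, calling `can_connect(name, port)` for each entry in `SERVICES` shows which dependency is unreachable. A False result for a name that also fails nslookup points at DNS or network membership rather than the service itself.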

Solutions:

Services Not on Same Network:

# Ensure all services have same network in docker-compose.yml
services:
  hw-server:
    networks:
      - hw-network
  hw-db:
    networks:
      - hw-network

Wrong Service Names:

# Use container names, not localhost
# ❌ Wrong: DATABASE_URL=postgresql://localhost:5432/db
# ✅ Correct: DATABASE_URL=postgresql://hw-db:5432/db
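
A quick way to catch this mistake is to parse the configured URL and look at its host part. This is a minimal sketch; the helper name `points_at_loopback` is hypothetical, and the two URLs are the examples from above.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def points_at_loopback(url: str) -> bool:
    """Return True if the URL's host is a loopback name or address."""
    return urlsplit(url).hostname in LOOPBACK_HOSTS


print(points_at_loopback("postgresql://localhost:5432/db"))  # True
print(points_at_loopback("postgresql://hw-db:5432/db"))      # False
```

Inside a container, a loopback host refers to the container itself, so a True result for DATABASE_URL or a similar setting usually explains the connection failure.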

Restart All Services:

docker-compose -f docker/docker-compose.yml down
docker-compose -f docker/docker-compose.yml up -d

Authentication & Authorization¶

Keycloak Authentication Fails¶

Symptoms:
- "Invalid token" errors
- "Unauthorized" (401) responses
- "Token signature verification failed"

Diagnosis:

# Check Keycloak is running
docker logs hw-keycloak | tail -50

# Test Keycloak health
curl http://localhost:8080/health

# Verify token endpoint
curl http://localhost:8080/realms/production/.well-known/openid-configuration

# Check application logs for auth errors
docker logs hw-server | grep -i "auth\|token\|keycloak"

Solutions:

Token Expired:

# Check token expiration settings in Keycloak
# Admin Console → Realm Settings → Tokens
# Access Token Lifespan: 5 minutes (default)
# Refresh Token Lifespan: 30 minutes (default)

# Get new token
curl -X POST http://localhost:8080/realms/production/protocol/openid-connect/token \
  -d "client_id=fastapi-app" \
  -d "client_secret=YOUR_SECRET" \
  -d "grant_type=password" \
  -d "username=user@example.com" \
  -d "password=password"

Wrong Keycloak Configuration:

# Verify environment variables
docker exec hw-server printenv | grep KEYCLOAK

# Should match:
# KEYCLOAK_BASE_URL=http://hw-keycloak:8080
# KEYCLOAK_REALM=production
# KEYCLOAK_CLIENT_ID=fastapi-app

Client Secret Mismatch:

# Get client secret from Keycloak
# Admin Console → Clients → fastapi-app → Credentials

# Update in .env.production
KEYCLOAK_CLIENT_SECRET=<secret-from-keycloak>

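# Recreate the application container so it loads the new value
# (a plain restart keeps the environment the container was created with)
docker-compose up -d --force-recreate hw-server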

Permission Denied Errors¶

Symptoms:
- "Permission denied" (403) responses
- "Insufficient permissions" errors
- User cannot access expected endpoints

Diagnosis:

# Check user roles in Keycloak
# Admin Console → Users → <user> → Role Mappings

# Check handler code for required roles
# WebSocket: @pkg_router.register(PkgID.*, roles=["role-name"])
# HTTP: dependencies=[Depends(require_roles("role-name"))]

# Check application logs
docker logs hw-server | grep -i "permission\|rbac"

Solutions:

User Missing Required Role:

# Add role to user in Keycloak
# Admin Console → Users → <user> → Role Mappings → Assign role

# Or via kcadm.sh
docker exec hw-keycloak /opt/keycloak/bin/kcadm.sh \
  add-roles -r production --uusername user@example.com --rolename admin

Check Handler Role Requirements:

# Example WebSocket handler
@pkg_router.register(
    PkgID.CREATE_AUTHOR,
    roles=["create-author", "admin"]  # Requires BOTH roles
)

# Example HTTP endpoint
@router.post(
    "/authors",
    dependencies=[Depends(require_roles("create-author", "admin"))]
)

# User must have ALL specified roles to access the endpoint
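
The all-roles rule can be illustrated with a small set comparison. This is a sketch of the semantics only, not the project's `require_roles` implementation; the helper name `missing_roles` is hypothetical.

```python
def missing_roles(user_roles, required):
    """Return the required roles the user does not have, sorted by name."""
    return sorted(set(required) - set(user_roles))


required = ["create-author", "admin"]
print(missing_roles(["admin"], required))                             # ['create-author']
print(missing_roles(["admin", "create-author", "viewer"], required))  # []
```

An empty list means access would be granted under the all-roles rule; anything else names the roles to assign in Keycloak.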

Token Not Decoded Properly:

# Check token contents (JWT segments are unpadded base64url, so base64 -d may report invalid input for some tokens)
echo "eyJhbGc..." | cut -d'.' -f2 | base64 -d | jq

# Verify 'realm_access.roles' field exists
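
Because JWT segments are base64url-encoded without padding, decoding them in Python is often more reliable than the shell pipeline. This is a minimal sketch that restores the padding and reads `realm_access.roles`; it does not verify the signature, the helper names are hypothetical, and the demo token is built locally rather than issued by Keycloak.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # Restore the padding that the JWT encoding strips.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def realm_roles(token: str) -> list:
    """Return realm_access.roles, or an empty list if the claim is missing."""
    return decode_jwt_payload(token).get("realm_access", {}).get("roles", [])


# Build an unsigned demo token so the example does not need a real one.
claims = {"sub": "user123", "realm_access": {"roles": ["admin"]}}
segment = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
demo_token = f"header.{segment}.signature"
print(realm_roles(demo_token))  # ['admin']
```

An empty list for a real token suggests the roles claim is not included in the token, which is worth checking before looking at handler code.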

Performance Issues¶

Slow Response Times¶

Symptoms:
- API requests take > 1 second
- WebSocket messages delayed
- Timeout errors

Diagnosis:

# Check application metrics
curl http://localhost:8000/metrics | grep http_request_duration

# Check database query performance
docker exec hw-db psql -U prod_user -d fastapi_prod \
  -c "SELECT query, calls, total_time, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;"

# Check CPU/memory usage
docker stats hw-server hw-db hw-redis

# Check network latency (-c 3 stops ping after three packets)
docker exec hw-server ping -c 3 hw-db
docker exec hw-server time nc -zv hw-db 5432

Solutions:

Database Query Optimization:

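-- Enable pg_stat_statements
-- (the module must also be listed in shared_preload_libraries, which requires a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slow queries
-- (on PostgreSQL 13 and later, the columns are named total_exec_time and mean_exec_time)
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;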

-- Add indexes
CREATE INDEX idx_author_name ON author(name);
CREATE INDEX idx_book_author_id ON book(author_id);

Increase Connection Pool:

# In .env.production
DB_POOL_SIZE=30  # Increase from 20
DB_MAX_OVERFLOW=20  # Increase from 10

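# Recreate the application container so it loads the new values
# (a plain restart keeps the environment the container was created with)
docker-compose up -d --force-recreate hw-server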
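
Before raising the pool size, it helps to check the total against the database's connection limit. This is a back-of-the-envelope sketch, assuming each worker process keeps its own pool (typical for SQLAlchemy-style pools) and using PostgreSQL's default max_connections of 100; the worker count of 4 is the figure mentioned in the Reduce Workers step of this guide, and the helper name is hypothetical.

```python
def peak_connections(pool_size: int, max_overflow: int, workers: int, replicas: int = 1) -> int:
    """Upper bound on connections opened by all application processes."""
    return replicas * workers * (pool_size + max_overflow)


POSTGRES_MAX_CONNECTIONS = 100  # PostgreSQL default; confirm with SHOW max_connections

before = peak_connections(pool_size=20, max_overflow=10, workers=4)
after = peak_connections(pool_size=30, max_overflow=20, workers=4)
print(before, after)                      # 120 200
print(after <= POSTGRES_MAX_CONNECTIONS)  # False
```

If the bound exceeds the server limit, requests can fail with "too many clients" errors under load, so the pool increase may need to be paired with a higher max_connections or a connection pooler.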

Scale Horizontally:

# docker-compose.yml
services:
  hw-server:
    deploy:
      replicas: 3  # Run 3 instances

Enable Caching:

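# Add Redis caching for expensive queries
import json

from app.storage.redis import RRedis

# `redis` is assumed to be an already-connected async Redis client
# (for example, one obtained via RRedis); `fetch_from_db` stands in for the real query.
async def get_popular_authors():
    cache_key = "popular_authors"
    cached = await redis.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: query the database and cache the result for 5 minutes
    authors = await fetch_from_db()
    await redis.setex(cache_key, 300, json.dumps(authors))
    return authors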

High Memory Usage¶

Symptoms:
- OOM (Out of Memory) errors
- Container restarts
- Memory usage > 90%

Diagnosis:

# Check memory usage
docker stats hw-server --no-stream

# Check container memory limit
docker inspect hw-server | jq '.[0].HostConfig.Memory'

# Check memory as reported inside the container (requires psutil; shows system-wide figures, not only the Python process)
docker exec hw-server python -c "import psutil; print(psutil.virtual_memory())"

Solutions:

Increase Memory Limit:

# docker-compose.yml
services:
  hw-server:
    deploy:
      resources:
        limits:
          memory: 4G  # Increase from 2G

Check for Memory Leaks:

# Use memory profiler
from memory_profiler import profile

@profile
def problematic_function():
    ...

# Run and check output
docker exec hw-server python -m memory_profiler app.py

Reduce Workers:

# CMD in Dockerfile or docker-compose command
# Reduce from 4 (Dockerfile comments must be on their own line)
CMD ["uvicorn", "app:application", "--workers", "2"]

Database Problems¶

Cannot Connect to Database¶

Symptoms:
- "could not connect to server" errors
- "FATAL: password authentication failed"
- "database does not exist"

Diagnosis:

# Check PostgreSQL is running
docker ps | grep hw-db

# Check PostgreSQL logs
docker logs hw-db | tail -50

# Test connection from application container
docker exec hw-server psql -h hw-db -U prod_user -d fastapi_prod -c "SELECT 1;"

# Check connection string
docker exec hw-server printenv DATABASE_URL

Solutions:

Database Not Ready:

# Wait for database to be healthy
docker-compose -f docker/docker-compose.yml up -d hw-db

# Check health status
docker inspect hw-db --format='{{.State.Health.Status}}'

# Wait and retry
sleep 10
docker-compose restart hw-server

Wrong Credentials:

# Verify credentials match
# .env.production: DATABASE_URL=postgresql://prod_user:PASSWORD@hw-db:5432/fastapi_prod
# docker/.pg_env: POSTGRES_USER=prod_user, POSTGRES_PASSWORD=PASSWORD

# Reset password if needed
docker exec hw-db psql -U postgres \
  -c "ALTER USER prod_user WITH PASSWORD 'new_password';"

Database Does Not Exist:

# Create database
docker exec hw-db psql -U postgres -c "CREATE DATABASE fastapi_prod;"

# Or recreate the database container
# WARNING: removing the volume permanently deletes all data stored in it
docker-compose rm -sf hw-db
docker volume rm postgres-hw-data
docker-compose up -d hw-db

Database Locks/Deadlocks¶

Symptoms:
- "deadlock detected" errors
- Queries hanging indefinitely
- "could not obtain lock" errors

Diagnosis:

-- Check active locks
SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks
WHERE NOT granted;

-- Check blocking queries (uses pg_blocking_pids, available in PostgreSQL 9.6 and later)
SELECT blocked.pid AS blocked_pid,
       blocked.usename AS blocked_user,
       blocking.pid AS blocking_pid,
       blocking.usename AS blocking_user,
       blocked.query AS blocked_statement,
       blocking.query AS blocking_statement
FROM pg_catalog.pg_stat_activity AS blocked
JOIN pg_catalog.pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));

Solutions:

Kill Blocking Query:

-- Terminate blocking process
SELECT pg_terminate_backend(<blocking_pid>);

Prevent Long Transactions:

# Use short-lived transactions
async with async_session() as session:
    async with session.begin():
        # Keep transaction scope small
        await session.execute(stmt)
        # Don't do expensive operations here

Set Statement Timeout:

-- Set timeout for long-running queries
ALTER DATABASE fastapi_prod SET statement_timeout = '30s';
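
When PostgreSQL resolves a deadlock it aborts one of the transactions, and the usual application-side remedy is to retry that transaction. This is a minimal sketch of a retry wrapper with jittered backoff; `DeadlockDetected` is a stand-in for whatever error the database driver raises for SQLSTATE 40P01, and `run_with_retry` is a hypothetical helper, not part of the application.

```python
import asyncio
import random


class DeadlockDetected(Exception):
    """Stand-in for the driver error raised on SQLSTATE 40P01."""


async def run_with_retry(operation, attempts: int = 3, base_delay: float = 0.05):
    """Run an async callable, retrying with jittered backoff on deadlock."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except DeadlockDetected:
            if attempt == attempts:
                raise
            # Exponential backoff with jitter so competing transactions
            # do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5)
            await asyncio.sleep(delay)


async def demo():
    calls = {"count": 0}

    async def flaky_transaction():
        # Fails twice, then succeeds, to simulate a transient deadlock.
        calls["count"] += 1
        if calls["count"] < 3:
            raise DeadlockDetected("deadlock detected")
        return "committed"

    result = await run_with_retry(flaky_transaction)
    return result, calls["count"]


print(asyncio.run(demo()))  # ('committed', 3)
```

The whole transaction has to be re-run inside `operation`, not just the failing statement, because the aborted transaction's earlier work has been rolled back.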

Redis Issues¶

Cannot Connect to Redis¶

Symptoms:
- "Connection refused" errors
- "NOAUTH Authentication required"
- Rate limiting not working

Diagnosis:

# Check Redis is running
docker ps | grep hw-redis

# Test connection
docker exec hw-redis redis-cli ping

# Test from application container
docker exec hw-server redis-cli -h hw-redis ping

# Check Redis logs
docker logs hw-redis | tail -50

Solutions:

Redis Not Running:

# Restart Redis
docker-compose restart hw-redis

# Check health
docker exec hw-redis redis-cli ping

Authentication Required:

# Check if Redis requires password
docker exec hw-redis redis-cli CONFIG GET requirepass

# If yes, ensure REDIS_PASSWORD is set in .env.production
REDIS_PASSWORD=your_redis_password

# Test with password
docker exec hw-redis redis-cli -a your_redis_password ping

Wrong Redis DB:

# Check which DB application is using
docker exec hw-server printenv | grep REDIS

# Should be:
# MAIN_REDIS_DB=0
# AUTH_REDIS_DB=1

Redis Memory Issues¶

Symptoms:
- "OOM command not allowed" errors
- Redis crashes
- High memory usage

Diagnosis:

# Check Redis memory usage
docker exec hw-redis redis-cli INFO memory

# Check max memory setting
docker exec hw-redis redis-cli CONFIG GET maxmemory

Solutions:

Increase Max Memory:

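# docker/redis/redis.conf (takes effect after a restart)
maxmemory 2gb

# Restart Redis to apply the config file change
docker-compose restart hw-redis

# Or set at runtime (no restart needed; the value is lost on restart unless it is also in redis.conf)
docker exec hw-redis redis-cli CONFIG SET maxmemory 2gb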

Configure Eviction Policy:

# docker/redis/redis.conf
# Evict least recently used keys (keep comments on their own line in redis.conf)
maxmemory-policy allkeys-lru

# Or set at runtime
docker exec hw-redis redis-cli CONFIG SET maxmemory-policy allkeys-lru

Clear Unused Keys:

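# Find keys by pattern (--scan iterates incrementally; KEYS blocks the server on large datasets)
docker exec hw-redis redis-cli --scan --pattern "rate_limit:*"

# Delete EVERY key in the current database, not only old ones (be careful!)
docker exec hw-redis redis-cli FLUSHDB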

Traefik Routing¶

404 Not Found Errors¶

Symptoms:
- Traefik returns 404 for valid endpoints
- "Service not found" errors

Diagnosis:

# Check Traefik dashboard
curl http://localhost:8080/api/http/routers | jq

# Check container labels
docker inspect hw-server | jq '.[0].Config.Labels'

# Check Traefik logs
docker logs hw-traefik | grep -i error

Solutions:

Missing Labels:

# docker-compose.yml
services:
  hw-server:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.fastapi.rule=Host(`api.example.com`)"
      - "traefik.http.routers.fastapi.entrypoints=websecure"
      - "traefik.http.services.fastapi.loadbalancer.server.port=8000"

Restart Traefik:

docker-compose restart hw-traefik

# Verify routing rules
curl http://localhost:8080/api/http/routers

Check Service Discovery:

# Ensure service is on same network as Traefik
docker network inspect hw-network | jq '.[0].Containers'

SSL/TLS Redirect Loop¶

Symptoms:
- Browser shows "too many redirects"
- Infinite redirect between HTTP and HTTPS

Solutions:

# docker-compose.yml - Ensure Traefik knows it's behind a proxy
services:
  hw-server:
    labels:
      - "traefik.http.middlewares.secure-headers.headers.sslproxyheaders.X-Forwarded-Proto=https"
      - "traefik.http.routers.fastapi.middlewares=secure-headers"

Docker Container Issues¶

Container Keeps Restarting¶

Symptoms:
- Container in restart loop
- docker ps shows "Restarting" status

Diagnosis:

# Check restart count
docker inspect hw-server | jq '.[0].RestartCount'

# Check last exit code
docker inspect hw-server | jq '.[0].State.ExitCode'

# View all logs (before restart)
docker logs hw-server --timestamps
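
The exit code often narrows down the cause. Codes above 128 conventionally mean the process was terminated by a signal (code minus 128), so 137 corresponds to SIGKILL, which is what the kernel's out-of-memory killer sends. This is a minimal sketch of that convention; the helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code using the 128 + signal convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"terminated by {name}"
    return f"application error (exit status {code})"


for code in (1, 137, 143):
    print(code, describe_exit_code(code))
```

For exit code 137, `docker inspect hw-server --format='{{.State.OOMKilled}}'` shows whether the kill came from the container's memory limit.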

Solutions:

Application Crash:

# Check for Python exceptions
docker logs hw-server | grep -i "exception\|error\|traceback"

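# Run a fresh container from the same image interactively to debug
# (docker run expects an image name, so look it up from the existing container)
docker run -it --rm --entrypoint /bin/bash "$(docker inspect hw-server --format='{{.Config.Image}}')"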

Health Check Failing:

# Test health check manually
docker exec hw-server curl -f http://localhost:8000/health

# Adjust health check parameters
# In Dockerfile:
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=5 \
  CMD curl -f http://localhost:8000/health || exit 1

Resource Limits:

# Check if hitting resource limits
docker stats hw-server --no-stream

# Increase limits in docker-compose.yml
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 4G

Volume Permission Issues¶

Symptoms:
- "Permission denied" errors when writing files
- Cannot create directories

Solutions:

# Fix volume ownership
docker exec --user root hw-server chown -R appuser:appuser /app

# Or set ownership on host
sudo chown -R 1000:1000 /path/to/volume

# Ensure user ID matches
docker exec hw-server id
# uid=1000(appuser) gid=1000(appuser)

WebSocket Connection Problems¶

WebSocket Connection Rejected¶

Symptoms:
- "Connection closed: 1006"
- "Connection closed: 1008 (policy violation)"
- Cannot establish WebSocket connection

Diagnosis:

# Check application logs
docker logs hw-server | grep -i websocket

# Test WebSocket connection (quote the URL so the shell does not interpret the ?)
wscat -c "ws://localhost:8000/web?access_token=TOKEN"

# Check Traefik WebSocket configuration
curl http://localhost:8080/api/http/routers | jq '.[] | select(.name=="fastapi")'

Solutions:

Missing Access Token:

# WebSocket requires token in query string
wscat -c "ws://localhost:8000/web?access_token=YOUR_JWT_TOKEN"

Connection Limit Reached:

# Check active connections in Redis
docker exec hw-redis redis-cli SCARD "ws_connections:user123"

# Increase limit in .env.production
WS_MAX_CONNECTIONS_PER_USER=10  # Increase from 5

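# Recreate the application container so it loads the new value
# (a plain restart keeps the environment the container was created with)
docker-compose up -d --force-recreate hw-server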

Traefik Not Forwarding WebSocket:

# docker-compose.yml
services:
  hw-server:
    labels:
      # Ensure WebSocket headers are preserved
      - "traefik.http.routers.fastapi.rule=Host(`api.example.com`)"
      # Traefik v3 handles WebSocket automatically, but verify:
      - "traefik.http.services.fastapi.loadbalancer.passhostheader=true"

WebSocket Messages Not Received¶

Symptoms:
- Messages sent but no response
- Connection stays open but silent

Diagnosis:

# Check application logs for message processing
docker logs hw-server | grep "pkg_id\|req_id"

# Check rate limiting
docker logs hw-server | grep "rate limit"

# Test with wscat
wscat -c "ws://localhost:8000/web?access_token=TOKEN"
> {"pkg_id": 1, "req_id": "test-123", "data": {}}

Solutions:

Invalid Message Format:

// Correct format
{
  "pkg_id": 1,
  "req_id": "550e8400-e29b-41d4-a716-446655440000",
  "data": {}
}
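
A message can be checked against this shape on the client side before it is sent. This is a minimal sketch; the field types (integer `pkg_id`, string `req_id`, object `data`) are inferred from the example above rather than from the server's schema, and the helper name `validate_message` is hypothetical.

```python
import json


def validate_message(raw: str) -> list:
    """Return a list of problems; an empty list means the shape looks valid."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(message, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    pkg_id = message.get("pkg_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(pkg_id, int) or isinstance(pkg_id, bool):
        problems.append("pkg_id must be an integer")
    if not isinstance(message.get("req_id"), str):
        problems.append("req_id must be a string")
    if not isinstance(message.get("data"), dict):
        problems.append("data must be an object")
    return problems


print(validate_message('{"pkg_id": 1, "req_id": "test-123", "data": {}}'))  # []
print(validate_message('{"pkg_id": "1", "data": {}}'))
```

The second call reports both the string `pkg_id` and the missing `req_id`, two mistakes that could leave a connection open but silent.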

Handler Not Registered:

# Check registered handlers
make ws-handlers

# Or check logs at startup
docker logs hw-server | grep "Registered handler"

Rate Limit Hit:

# Check rate limit settings
docker exec hw-server printenv WS_MESSAGE_RATE_LIMIT

# Increase if needed
WS_MESSAGE_RATE_LIMIT=200  # In .env.production

Rate Limiting Issues¶

False Positive Rate Limits¶

Symptoms:
- Users getting 429 errors incorrectly
- Rate limit triggers too quickly

Diagnosis:

# Check rate limit settings
docker exec hw-server printenv | grep RATE_LIMIT

# Check Redis rate limit keys
docker exec hw-redis redis-cli KEYS "rate_limit:*"

# Check specific user's rate limit
docker exec hw-redis redis-cli GET "rate_limit:user:user123"

Solutions:

Increase Rate Limits:

# .env.production
RATE_LIMIT_PER_MINUTE=120  # Increase from 60
RATE_LIMIT_BURST=20  # Increase from 10
WS_MESSAGE_RATE_LIMIT=200  # Increase from 100

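# Recreate the application container so it loads the new values
# (a plain restart keeps the environment the container was created with)
docker-compose up -d --force-recreate hw-server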
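
To reason about how a per-minute rate and a burst allowance interact, a token bucket is a useful mental model. This is an in-memory sketch with an injected clock; the application's limiter is Redis-backed and this guide does not specify its algorithm, so the class below is an illustration, not the actual implementation.

```python
class TokenBucket:
    """Token bucket refilled at `rate_per_minute`, holding at most `burst` tokens."""

    def __init__(self, rate_per_minute: float, burst: int, now: float = 0.0):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.updated_at = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_minute=60, burst=10)
# 12 requests at the same instant: the burst covers 10, the other 2 are rejected.
results = [bucket.allow(now=0.0) for _ in range(12)]
print(results.count(True), results.count(False))     # 10 2
# One second later exactly one token has been refilled.
print(bucket.allow(now=1.0), bucket.allow(now=1.0))  # True False
```

Under this model, a client that fires many requests on page load can be rejected even when its average rate is well below the per-minute limit, which is one reason raising the burst value can matter more than raising the rate.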

Clear Rate Limit Keys:

# Clear specific user
docker exec hw-redis redis-cli DEL "rate_limit:user:user123"

# Clear all rate limit keys (careful!)
# --scan avoids blocking Redis; xargs -r skips DEL when no keys match
docker exec hw-redis redis-cli --scan --pattern "rate_limit:*" | \
  xargs -r docker exec hw-redis redis-cli DEL

Exclude Specific Endpoints:

# app/middlewares/rate_limit.py
EXCLUDED_PATHS = [
    r"^/health$",
    r"^/metrics$",
    r"^/docs$",
    r"^/internal/.*",  # Add internal endpoints
]

Log Analysis¶

Finding Errors in Logs¶

Common LogQL Queries:

# Recent errors
{service="shell"} | json | level="ERROR"

# Authentication failures
{service="shell"} | json | logger=~"app.auth.*" |~ "(?i)(failed|invalid|denied)"

# Database errors
{service="shell"} | json |~ "(?i)(database|postgres|sqlalchemy)" | level="ERROR"

# Slow queries (requires duration_ms field)
{service="shell"} | json | duration_ms > 1000

# WebSocket errors
{service="shell"} | json | logger=~"app.api.ws.*" | level="ERROR"

# Rate limit violations
{service="shell"} | json |~ "(?i)(rate limit|429|too many requests)"

# Specific user activity
{service="shell"} | json | user_id="user123"

# Specific endpoint
{service="shell"} | json | endpoint=~"/api/authors.*"

Analyzing Performance Issues¶

# HTTP request duration
{service="shell"} | json | line_format "{{.method}} {{.endpoint}} {{.duration_ms}}ms"

# Database query performance
{service="shell"} | json |~ "(?i)query" | line_format "{{.message}} {{.duration_ms}}ms"

# Top error messages (over the last hour)
topk(10, sum by (message) (count_over_time({service="shell"} | json | level="ERROR" [1h])))
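
When Loki is unavailable, the same kind of summary can be produced from raw container logs. This is a minimal sketch, assuming one JSON object per line with the `level` and `message` fields used in the queries above; the helper name `top_error_messages` and the sample lines are hypothetical.

```python
import json
from collections import Counter


def top_error_messages(lines, limit: int = 5):
    """Count ERROR-level messages in JSON log lines, most frequent first."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as traceback text
        if isinstance(record, dict) and record.get("level") == "ERROR":
            counts[record.get("message", "<no message>")] += 1
    return counts.most_common(limit)


sample_logs = [
    '{"level": "ERROR", "message": "db timeout"}',
    '{"level": "INFO", "message": "request ok"}',
    '{"level": "ERROR", "message": "db timeout"}',
    'Traceback (most recent call last):',
    '{"level": "ERROR", "message": "invalid token"}',
]
print(top_error_messages(sample_logs))  # [('db timeout', 2), ('invalid token', 1)]
```

In practice the lines would come from the output of `docker logs hw-server`, for example by passing `sys.stdin` as the `lines` argument in a small script.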

Emergency Procedures¶

Application Down¶

Immediate Actions:

Check service health:

docker ps | grep hw-
curl http://localhost:8000/health

Restart failed services:

docker-compose -f docker/docker-compose.yml restart hw-server

Check recent logs:

docker logs hw-server --tail 100
docker logs hw-traefik --tail 100

If restart fails, rollback:

git log --oneline -5
git checkout <previous-working-commit>
docker-compose down
docker-compose up -d

Database Corruption¶

Immediate Actions:

Stop application:
```
docker-compose stop hw-server
```

Take an emergency dump (a dump that fails partway also shows which table is damaged):

docker exec hw-db pg_dump -U prod_user fastapi_prod > emergency_backup.sql

Restore from backup:

docker exec hw-db psql -U postgres -c "DROP DATABASE fastapi_prod;"
docker exec hw-db psql -U postgres -c "CREATE DATABASE fastapi_prod;"
docker exec -i hw-db psql -U prod_user fastapi_prod < latest_backup.sql

Restart application:
```
docker-compose start hw-server
```

Security Incident¶

Immediate Actions:

Isolate affected services:

# Disconnect from network
docker network disconnect hw-network hw-server

Review audit logs:

docker logs hw-server | grep -i "suspicious\|attack\|unauthorized"

Block malicious IPs (if applicable):
```
sudo ufw deny from <malicious-ip>
```

Rotate credentials:

# Generate new secrets
openssl rand -hex 32

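# Update .env.production, then recreate the containers so they load the new values
# (a plain restart keeps the environment the containers were created with)
docker-compose up -d --force-recreate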

Complete System Failure¶

Recovery Steps:

Document current state:

docker ps -a > system_state.txt
docker logs hw-server > logs_server.txt
docker logs hw-db > logs_db.txt

Stop all services:

docker-compose -f docker/docker-compose.yml down

Restore from backups:

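# Restore database
docker volume rm postgres-hw-data
docker volume create postgres-hw-data
docker-compose up -d hw-db
# Wait until PostgreSQL accepts connections before restoring
until docker exec hw-db pg_isready -U prod_user; do sleep 2; done
docker exec -i hw-db psql -U prod_user fastapi_prod < backup.sql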

# Restore Redis data if needed
docker volume rm redis-hw-data
docker volume create redis-hw-data

Start services gradually:

docker-compose up -d hw-db hw-redis
sleep 10
docker-compose up -d hw-keycloak
sleep 10
docker-compose up -d hw-server
docker-compose up -d hw-traefik

Verify system health:

curl http://localhost:8000/health
docker ps
docker-compose logs --tail 50

Getting Help¶

If issues persist after trying these solutions:

Check application logs in Grafana:
- http://localhost:3000/d/application-logs
- Filter by service, level, endpoint

Review metrics in Prometheus:
- http://localhost:9090
- Check for anomalies

Consult documentation:
- Monitoring Guide
- Backup/Recovery Guide
- Security Guide

Contact support:
- GitHub Issues: https://github.com/acikabubo/fastapi-http-websocket/issues
- Internal documentation: Confluence/Wiki
- On-call rotation: PagerDuty
