Troubleshooting Guide

Tags: Reference, Troubleshooting

A comprehensive guide for diagnosing and resolving common issues in the EgyGeeks infrastructure.

404 Errors

Problem

Requests to a service return 404 Not Found errors, or the service is unreachable.

Causes & Solutions

Container Health Issues

Diagnosis:

# Check if container is running
docker ps | grep SERVICE_NAME

# Check container logs
docker logs CONTAINER_ID

# Inspect container health status
docker ps --format "table {{.Names}}\t{{.Status}}"

Solution:

  • Ensure the container is in Up state
  • Check for restart loops: docker inspect CONTAINER_ID | grep -A 5 State
  • Review container logs for startup errors
  • Restart the container if necessary: docker restart CONTAINER_ID
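The restart-loop check can be scripted rather than read off the grep output. The sketch below assumes the docker CLI is on PATH; the function name restart_loop_check and the default threshold of 3 are illustrative choices, not part of the EgyGeeks tooling. RestartCount reflects restarts performed by the container's restart policy, so a container without such a policy will usually report 0.

```shell
# Report a container's status and restart count, and flag a likely restart loop.
# Hypothetical helper: the name and the default threshold (3) are assumptions.
# Exit codes: 0 = looks fine, 1 = probable restart loop, 2 = inspect failed.
restart_loop_check() (
  name=$1
  threshold=${2:-3}
  count=$(docker inspect --format '{{.RestartCount}}' "$name") || exit 2
  status=$(docker inspect --format '{{.State.Status}}' "$name") || exit 2
  echo "$name: status=$status restarts=$count"
  if [ "$status" = "restarting" ] || [ "$count" -ge "$threshold" ]; then
    echo "$name looks like it is in a restart loop"
    exit 1
  fi
)
```

Usage: restart_loop_check CONTAINER_ID 5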

Traefik Labels & Routing Configuration

Diagnosis:

# Verify Traefik is running
docker ps | grep traefik

# Check Traefik dashboard (typically http://server-ip:8080)
# Look for routes in the HTTP section

# Verify container has proper labels
docker inspect CONTAINER_ID | grep -A 20 Labels

Required labels for Traefik:

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.SERVICE.rule=Host(`domain.com`)"
  - "traefik.http.routers.SERVICE.entrypoints=web,websecure"
  - "traefik.http.services.SERVICE.loadbalancer.server.port=PORT"

Solution:

  • Verify all required labels are present in docker-compose.yml
  • Check that the service name in labels matches the container service name
  • Ensure the target port matches the application's listening port
  • Redeploy with docker-compose up -d to apply label changes
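Checking the four required labels by eye is error-prone, so a small filter can help. This is a sketch under assumptions: container_labels and missing_traefik_labels are hypothetical helper names, and the check only confirms that each label key is present, not that its value is correct.

```shell
# Print a container's labels one per line as key=value (docker CLI assumed on PATH).
container_labels() {
  docker inspect \
    --format '{{range $k, $v := .Config.Labels}}{{$k}}={{$v}}{{"\n"}}{{end}}' "$1"
}

# Read key=value label lines on stdin and report which required Traefik
# label keys are absent for the router/service name given as $1.
# Exits 1 if anything is missing, 0 otherwise.
missing_traefik_labels() (
  service=$1
  labels=$(cat)
  status=0
  for key in \
    "traefik.enable" \
    "traefik.http.routers.$service.rule" \
    "traefik.http.routers.$service.entrypoints" \
    "traefik.http.services.$service.loadbalancer.server.port"
  do
    # Fixed-string search for "key=" in the collected label lines
    if ! printf '%s\n' "$labels" | grep -qF "$key="; then
      echo "missing: $key"
      status=1
    fi
  done
  exit "$status"
)
```

Usage: container_labels CONTAINER_ID | missing_traefik_labels SERVICE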

Network Connectivity

Diagnosis:

# Check if container is on correct network
docker network inspect traefik_public | grep CONTAINER_NAME

# Test connectivity from Traefik to service
docker exec TRAEFIK_CONTAINER ping SERVICE_CONTAINER_NAME

# Check DNS resolution within network
docker exec TRAEFIK_CONTAINER nslookup SERVICE_CONTAINER_NAME

Solution:

  • Verify the service is attached to the traefik_public network in docker-compose.yml (networks: [traefik_public])
  • Check that the network exists: docker network ls | grep traefik_public
  • Create the network if it is missing: docker network create traefik_public
  • Ensure Traefik can reach the service on the configured port
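The membership check and the fix can be combined into one step. The following is a minimal sketch, assuming the docker CLI is on PATH; on_network and ensure_on_network are hypothetical names. Note that docker network connect is a runtime fix only: the attachment is lost when the container is recreated unless docker-compose.yml is updated as well.

```shell
# Succeed if the named container is attached to the given network.
on_network() (
  network=$1
  container=$2
  docker network inspect \
    --format '{{range .Containers}}{{.Name}}{{"\n"}}{{end}}' "$network" \
    | grep -qxF "$container"
)

# Attach the container to the network only if it is not already attached.
ensure_on_network() {
  if on_network "$1" "$2"; then
    echo "$2 is already attached to $1"
  else
    echo "attaching $2 to $1"
    docker network connect "$1" "$2"
  fi
}
```

Usage: ensure_on_network traefik_public SERVICE_CONTAINER_NAME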


Unhealthy Containers

Problem

Containers show as (unhealthy) in the STATUS column of docker ps output, or fail to start.

Diagnosis & Solutions

Check health status:

docker ps --format "table {{.Names}}\t{{.Status}}"

View detailed health check output:

docker inspect CONTAINER_ID | grep -A 10 '"Health"'

Common issues:

| Issue | Diagnosis | Solution |
| --- | --- | --- |
| Failed health checks | docker logs CONTAINER_ID shows errors | Fix the underlying service error; check ports, dependencies |
| Startup takes too long | Container exits before passing health check | Increase start_period in health check configuration |
| Port not listening | docker exec CONTAINER_ID netstat -tlnp or lsof -i | Verify application started; check port in health check URL |
| Dependency not ready | Container depends on database/service not yet running | Use depends_on condition or wait script |

Example health check configuration:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:PORT/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
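For the "wait script" approach to dependencies, one option is to poll the health status that the configuration above produces. This is a sketch, not the project's own script: wait_healthy is a hypothetical name, the 2-second poll interval and 60-second default timeout are arbitrary, and it treats unhealthy as final even though a container can recover later. A container without a healthcheck reports no status, so the loop simply runs until the timeout.

```shell
# Poll a container's health status until it is healthy, unhealthy, or the
# timeout (seconds, default 60) expires. Exits 0 only when healthy.
wait_healthy() (
  name=$1
  timeout=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    case "$status" in
      healthy)   echo "$name is healthy after ${elapsed}s"; exit 0 ;;
      unhealthy) echo "$name reported unhealthy"; exit 1 ;;
    esac
    # Status is "starting" (or empty): wait and try again
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out after ${timeout}s waiting for $name"
  exit 1
)
```

Usage: wait_healthy DATABASE_CONTAINER_ID 120 && docker-compose up -d SERVICE_NAME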


SSL/TLS Issues

Problem

SSL certificates not provisioning, HTTPS not working, or Let's Encrypt errors.

Diagnosis

Check Traefik certificate storage:

docker exec traefik ls -la /letsencrypt/
docker exec traefik cat /letsencrypt/acme.json | jq '.[] | keys'

View Traefik logs for Let's Encrypt activity:

docker logs traefik | grep -i "tls\|ssl\|certificate\|letsencrypt"

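Check the validity dates of the certificate currently being served (Traefik renews ACME certificates automatically and has no renew subcommand):

echo | openssl s_client -connect domain.com:443 -servername domain.com 2>/dev/null | openssl x509 -noout -dates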

Solutions

Certificate Not Provisioning

  • Verify domain is publicly accessible and resolves to server IP
  • Check that Traefik has internet access for Let's Encrypt validation
  • Ensure the acme.json file exists, is writable, and has 600 permissions (Traefik refuses to use it with more permissive modes): docker exec traefik ls -la /letsencrypt/
  • Verify Traefik configuration includes ACME/Let's Encrypt provider
  • Check DNS propagation: nslookup domain.com
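The DNS check in the list above can be made explicit by comparing the resolved address with the server's public IP. A minimal sketch, assuming dig is installed on the machine running it; resolves_to is a hypothetical helper name and the IP address in the usage line is a documentation placeholder.

```shell
# Succeed if any A record for the domain equals the expected IP address.
resolves_to() (
  domain=$1
  expected_ip=$2
  actual=$(dig +short A "$domain")
  if printf '%s\n' "$actual" | grep -qxF "$expected_ip"; then
    echo "$domain resolves to $expected_ip"
  else
    echo "$domain resolves to: ${actual:-nothing} (expected $expected_ip)"
    exit 1
  fi
)
```

Usage: resolves_to domain.com 203.0.113.10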

Certificate Renewal Failing

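  • Check certificate expiry from the host (ACME certificates are stored in acme.json, not as separate .crt files): echo | openssl s_client -connect domain.com:443 -servername domain.com 2>/dev/null | openssl x509 -noout -dates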
  • Manually trigger renewal by redeploying Traefik
  • Review Traefik logs for rate limiting or DNS validation errors

HTTPS Not Working

  • Verify entrypoints include websecure: traefik.http.routers.SERVICE.entrypoints=web,websecure
  • Check that port 443 is accessible and not blocked by firewall
  • Ensure certificate is valid: curl -vI https://domain.com

Database Connection Issues

Problem

Application cannot connect to database; "Connection refused" or timeout errors.

Diagnosis

Check if database container is running:

docker ps | grep -i database
docker logs DATABASE_CONTAINER_ID

Test connectivity from application container:

docker exec APP_CONTAINER_ID nc -zv DATABASE_HOST DATABASE_PORT

Verify database service is listening:

docker exec DATABASE_CONTAINER_ID netstat -tlnp | grep PORT

Check connection environment variables:

docker exec APP_CONTAINER_ID env | grep -i db

Solutions

| Issue | Solution |
| --- | --- |
| Container not running | Check docker ps; restart with docker-compose up -d DATABASE_SERVICE_NAME |
| Port mismatch | Verify DATABASE_PORT matches listening port in container |
| Wrong hostname | Use service name from docker-compose (not localhost): DATABASE_SERVICE_NAME:PORT |
| Network not shared | Ensure both containers are on the same network (networks: [traefik_public]) |
| Credentials incorrect | Verify environment variables: docker inspect APP_CONTAINER_ID \| grep -E "MYSQL\|POSTGRES" |
| Database not initialized | Check logs for initialization errors; may need to reset volume |

Test connection manually:

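# For MySQL (no space after -p; with a space, PASSWORD is read as the database name)
docker exec APP_CONTAINER_ID mysql -h DATABASE_HOST -u USER -pPASSWORD -e "SELECT 1;"

# For PostgreSQL (password passed via PGPASSWORD because docker exec has no TTY to prompt on)
docker exec -e PGPASSWORD=PASSWORD APP_CONTAINER_ID psql -h DATABASE_HOST -U USER -d DATABASE -c "SELECT 1;"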
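When the database is merely slow to start, a retry loop around the nc connectivity test from the Diagnosis section distinguishes "not ready yet" from "never reachable". This is a sketch under assumptions: wait_for_db is a hypothetical name, nc must exist inside the application container, and the 3-second pause and default of 10 attempts are arbitrary.

```shell
# Retry a TCP connectivity test from the app container to the database.
# Arguments: app container, database host, database port, attempts (default 10).
wait_for_db() (
  app=$1
  host=$2
  port=$3
  attempts=${4:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if docker exec "$app" nc -z "$host" "$port" 2>/dev/null; then
      echo "$host:$port reachable from $app (attempt $i)"
      exit 0
    fi
    # Pause between attempts, but not after the last one
    [ "$i" -lt "$attempts" ] && sleep 3
    i=$((i + 1))
  done
  echo "$host:$port not reachable from $app after $attempts attempts"
  exit 1
)
```

Usage: wait_for_db APP_CONTAINER_ID DATABASE_HOST DATABASE_PORT 20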


Port Conflicts

Problem

A container fails to start with a "port is already allocated" or "address already in use" error because another container or host process is already bound to the same host port.

Diagnosis

Find what's using a port:

# macOS/Linux
lsof -i :PORT

# Docker level
docker ps --format "table {{.Names}}\t{{.Ports}}"

Check all exposed ports:

docker ps --format "table {{.Names}}\t{{.Ports}}" | column -t

Solutions

  1. Identify conflicting service:

    lsof -i :PORT

  2. Choose action:

     • If old container: docker rm CONTAINER_ID (stop first if needed)
     • If system process: kill -9 PID (use with caution)
     • If different service: Change port mapping in docker-compose.yml

  3. Update docker-compose.yml:

    services:
      service:
        ports:
          - "NEW_PORT:INTERNAL_PORT"

  4. Redeploy:

    docker-compose down
    docker-compose up -d
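The first step, deciding whether the conflict comes from a container or a host process, can be sketched with the publish filter of docker ps. The helper name port_owner is hypothetical, and the filter only sees running containers, so a host process still has to be found with lsof.

```shell
# Report which running container, if any, publishes the given host port.
port_owner() (
  port=$1
  owner=$(docker ps --filter "publish=$port" --format '{{.Names}}')
  if [ -n "$owner" ]; then
    echo "port $port is published by container: $owner"
  else
    echo "no running container publishes port $port; check host processes with: lsof -i :$port"
  fi
)
```

Usage: port_owner 8080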
    


Build Failures

Problem

docker-compose build or docker build fails with errors.

Diagnosis

View full build logs:

docker-compose build --no-cache SERVICE_NAME

Check Dockerfile syntax:

docker run --rm -i hadolint/hadolint < Dockerfile

Common Build Issues

| Error | Cause | Solution |
| --- | --- | --- |
| FROM image not found | Base image doesn't exist or wrong registry | Verify image name; check Docker Hub; use docker pull to verify access |
| ADD/COPY failed | File path incorrect or file doesn't exist | Check relative paths; verify file exists in build context |
| RUN command not found | Command doesn't exist in base image | Check base image; install package with package manager |
| Permission denied | Script not executable | Add chmod +x before copying or in RUN command |
| Out of disk space | Build creating large intermediate layers | Clean up with docker system prune |

Solutions

Clean build:

docker-compose build --no-cache SERVICE_NAME

Remove build cache:

docker builder prune

Check available disk space:

docker system df

Build verbosely to see where it fails:

docker-compose build --progress=plain SERVICE_NAME
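Plain build output can be long, so it may help to save it and jump to the first suspicious line. The sketch below is a heuristic only: first_build_error is a hypothetical name, build.log is an assumed file name, and the keyword list (error, failed, not found, denied) is a guess that mirrors the table of common issues rather than anything Docker guarantees.

```shell
# Print the first log line that matches a common failure keyword,
# preceded by up to three lines of context.
first_build_error() (
  log=$1
  line=$(grep -n -i -E 'error|failed|not found|denied' "$log" | head -n 1 | cut -d: -f1)
  if [ -z "$line" ]; then
    echo "no obvious error lines in $log"
    exit 0
  fi
  start=$((line - 3))
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  sed -n "${start},${line}p" "$log"
)
```

Usage: docker-compose build --progress=plain SERVICE_NAME 2>&1 | tee build.log, then first_build_error build.log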


General Troubleshooting Tips

  1. Always check logs first:

    docker logs CONTAINER_ID -f --tail=100
    

  2. View service status across the stack:

    docker-compose ps
    

  3. Inspect actual vs expected configuration:

    docker inspect CONTAINER_ID
    docker-compose config
    

  4. Restart everything (nuclear option):

    docker-compose down
    docker-compose up -d
    

  5. Network inspection:

    docker network inspect traefik_public
    


Related Documentation

  • Server Setup: See server.md for server details and installed software
  • Infrastructure Architecture: See docs/journeys/infrastructure.md for system overview
  • Docker Compose Configuration: See docker-compose.yml for current service definitions