Chapter 3: Monitoring & Logs

Learning Objectives

By the end of this chapter, you will:

- Access and read Docker container logs
- Monitor application health and status
- Debug common issues using logs
- Set up health checks for automatic restarts
- Monitor system resources and performance
- Understand when and how to troubleshoot

Prerequisites

- Completed Chapters 1 & 2
- Application deployed and running
- Basic understanding of logs and debugging
- Docker and Docker Compose installed

Step 1: Understanding Application Logs

What Are Logs?

Logs are timestamped records of events that happen in your application. They help you:

- Understand what your application is doing
- Debug issues and errors
- Monitor performance
- Audit user activity

Log Levels

Understanding log levels helps you prioritize issues:

| Level | Purpose | Example |
|-------|---------|---------|
| ERROR | Critical problems | Database connection failed |
| WARN | Potential problems that do not stop the application | Deprecated API usage |
| INFO | General information | Request received, user logged in |
| DEBUG | Detailed debugging info | Variable values, function calls |
| TRACE | Very detailed info | Every step of execution |
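
To illustrate how a level acts as a filter, here is a minimal Python sketch using the standard `logging` module. The logger name `demo` and the messages are made up for illustration, and Python has no built-in TRACE level, so it is left out.

```python
import io
import logging


def emit_at_threshold(threshold: int) -> list[str]:
    """Log one message per level and return only the lines that pass the threshold."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    logger = logging.getLogger("demo")  # hypothetical logger name
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(threshold)

    logger.debug("cart contents: 3 items")
    logger.info("request received")
    logger.warning("deprecated API usage")
    logger.error("database connection failed")

    return stream.getvalue().splitlines()


# With the threshold at WARNING, the DEBUG and INFO messages are dropped
print(emit_at_threshold(logging.WARNING))
```

Running production at INFO or WARNING and switching to DEBUG only while investigating keeps log volume manageable.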

Step 2: Access Docker Logs

View Logs from Running Container

See all logs from a service:

# View logs from your application
docker-compose logs app

# View logs from a specific service
docker-compose logs database
docker-compose logs traefik

See logs in real-time (follow mode):

# Watch logs as they happen (like tail -f)
docker-compose logs -f app

# Exit with Ctrl+C

Limit logs by number of lines:

# Show only the last 50 lines
docker-compose logs --tail=50 app

# Show last 100 lines and follow
docker-compose logs -f --tail=100 app

Filter logs by service:

# View logs from multiple services
docker-compose logs app database

# View logs from all services
docker-compose logs

Get Logs with Timestamps

# Show timestamps (helpful for debugging)
docker-compose logs --timestamps app

# Show the last hour of logs (--since requires Docker Compose v2)
docker-compose logs --since 1h --timestamps app
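
If the logs have already been exported to a file, the same time filter can be applied offline. The sketch below assumes each line carries the UTC timestamp that `--timestamps` adds (for example `2024-01-15T10:45:12.123456789Z`); the sample lines are made up.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 timestamp with optional fractional seconds, as added by --timestamps
TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_ts(line: str):
    """Return the first timestamp in the line as an aware datetime, or None."""
    m = TS.search(line)
    if not m:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    frac = (m.group(2) or "")[:6].ljust(6, "0")  # nanoseconds truncated to microseconds
    return base + timedelta(microseconds=int(frac))


def lines_since(lines, cutoff):
    """Keep lines stamped at or after cutoff; lines without a timestamp are skipped."""
    kept = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and ts >= cutoff:
            kept.append(line)
    return kept


sample = [
    "app-1  | 2024-01-15T09:10:00.000000001Z INFO started",
    "app-1  | 2024-01-15T10:45:12.123456789Z ERROR database connection failed",
]
cutoff = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
print(lines_since(sample, cutoff))
```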

Real-World Examples

Find errors in your application:

# Show last 100 lines
docker-compose logs --tail=100 app

# Look for ERROR or Exception lines
docker-compose logs app | grep -i error
docker-compose logs app | grep -i exception
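
The same search can be done in Python when you want line numbers or want to reuse the result in a script. This is a minimal sketch; `find_problem_lines` is a hypothetical helper and the sample lines are invented.

```python
import re


def find_problem_lines(lines, keywords=("error", "exception")):
    """Return (line_number, line) pairs that contain any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [(i, line) for i, line in enumerate(lines, start=1) if pattern.search(line)]


sample = [
    "app-1  | INFO  server listening on port 3000",
    "app-1  | ERROR database connection failed",
    "app-1  | Unhandled Exception: timeout while reading response",
]
for number, line in find_problem_lines(sample):
    print(number, line)
```

To use it on real output, the lines could come from a file written with `docker-compose logs app > app.log`.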

Debug database connectivity:

# Check database service logs
docker-compose logs database

# Check application's database connection attempts
docker-compose logs app | grep -i "database\|connection\|postgres"

Monitor request traffic:

# Watch Traefik logs for incoming requests
docker-compose logs -f traefik

# Watch application request logs (method names are upper-case, so match case-sensitively)
docker-compose logs -f app | grep "GET\|POST\|PUT\|DELETE"

Step 3: Monitor Service Health

Check Running Services

# See status of all services
docker-compose ps

# Expected output:
# NAME                 COMMAND              SERVICE      STATUS
# myapp-app-1         "npm start"          app          Up 2 hours
# myapp-database-1    "postgres"           database     Up 2 hours
# myapp-traefik-1     "/traefik"           traefik      Up 2 hours

Interpret Service Status

| Status | Meaning | Action |
|--------|---------|--------|
| Up X hours | Service running normally | No action needed |
| Restarting | Service crashing and restarting | Check logs immediately |
| Exited | Service stopped | Restart: `docker-compose up -d` |
| Dead | Service encountered fatal error | Check logs: `docker-compose logs` |
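
As a sketch of how the table could be applied in a script, the hypothetical helper below maps a STATUS string to the suggested action. The `(unhealthy)` branch is an addition that assumes a health check is configured, in which case Docker appends the health state to the status.

```python
def suggest_action(status: str) -> str:
    """Map a STATUS value from `docker-compose ps` to the action in the table."""
    s = status.strip().lower()
    if s.startswith("up"):
        # With a health check configured, Docker appends "(healthy)" or "(unhealthy)"
        if "(unhealthy)" in s:
            return "Running but failing its health check: check logs"
        return "No action needed"
    if s.startswith("restarting"):
        return "Check logs immediately"
    if s.startswith("exited"):
        return "Restart: docker-compose up -d"
    if s.startswith("dead"):
        return "Check logs: docker-compose logs"
    return "Unknown status: check logs"


for status in ("Up 2 hours", "Restarting (1) 5 seconds ago", "Exited (137) 1 minute ago"):
    print(f"{status!r:35} -> {suggest_action(status)}")
```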

Step 4: Set Up Health Checks

Add Health Checks to Services

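Health checks let Docker detect a service that is running but not working. On their own, Docker and Docker Compose only mark such a container as unhealthy; the `restart` policy restarts a container when its process exits, and restarting on an unhealthy status additionally needs an orchestrator such as Docker Swarm or a watchdog container: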

version: '3.8'

services:
  app:
    image: your-app:latest
    healthcheck:
      # Command to check if service is healthy (curl must exist in the image)
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      # Grace period: failed checks in the first 30 seconds do not count toward retries
      start_period: 30s
      # Check every 10 seconds
      interval: 10s
      # Wait 5 seconds for response
      timeout: 5s
      # Mark unhealthy after 3 consecutive failed checks
      retries: 3
    restart: unless-stopped

  database:
    image: postgres:15
    healthcheck:
      # Check if PostgreSQL is ready
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      start_period: 10s
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
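
How `start_period`, `interval`, and `retries` interact is easier to see in a small simulation. The sketch below is a simplified model, not Docker's implementation: it assumes the first check runs one `interval` after start, ignores `timeout`, and treats a success during the grace period as ending it. `HealthConfig` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HealthConfig:
    start_period: float = 30.0  # seconds during which failures are ignored
    interval: float = 10.0      # seconds between checks
    retries: int = 3            # consecutive failures before "unhealthy"


def simulate(results, cfg=None):
    """Return [(time, state)] for a sequence of check results (True = check passed)."""
    cfg = cfg or HealthConfig()
    state, streak, started = "starting", 0, False
    timeline = []
    for n, ok in enumerate(results, start=1):
        t = n * cfg.interval
        if ok:
            state, streak, started = "healthy", 0, True
        else:
            in_grace = not started and t <= cfg.start_period
            if not in_grace:
                streak += 1
                if streak >= cfg.retries:
                    state = "unhealthy"
        timeline.append((t, state))
    return timeline


# Three failures while starting are ignored; three later failures flip the state
for t, state in simulate([False, False, False, True, False, False, False]):
    print(f"t={t:>4.0f}s  {state}")
```

With the values from the app service, a service that stops responding is flagged after roughly `retries` × `interval`, about 30 seconds, plus any time spent waiting on timeouts.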

Create Health Endpoint in Application

Your application should expose a /health endpoint:

Node.js/Express Example:

// Health check endpoint
app.get('/health', async (req, res) => {
  // Check database connection
  const dbHealthy = await checkDatabaseConnection();

  if (dbHealthy) {
    res.status(200).json({ status: 'healthy', timestamp: new Date().toISOString() });
  } else {
    res.status(503).json({ status: 'unhealthy', reason: 'database' });
  }
});

// Assumes `db` is a promise-based client (for example a `pg` Pool).
// The query must be awaited; otherwise a failed query is never caught.
async function checkDatabaseConnection() {
  try {
    // Quick database query to verify connection
    await db.query('SELECT 1');
    return true;
  } catch (err) {
    console.error('Health check failed:', err);
    return false;
  }
}

Python/Flask Example:

from datetime import datetime, timezone

from flask import Flask, jsonify
from sqlalchemy import text

app = Flask(__name__)
# Assumes `db` is your Flask-SQLAlchemy instance, e.g. db = SQLAlchemy(app)

@app.route('/health', methods=['GET'])
def health_check():
    try:
        # Check database connection
        if check_db_connection():
            return jsonify({
                'status': 'healthy',
                'timestamp': datetime.now(timezone.utc).isoformat()
            }), 200
        else:
            return jsonify({
                'status': 'unhealthy',
                'reason': 'database'
            }), 503
    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'error': str(e)
        }), 503

def check_db_connection():
    try:
        # SQLAlchemy 2.0 requires raw SQL strings to be wrapped in text()
        db.session.execute(text('SELECT 1'))
        return True
    except Exception:
        return False

Monitor Health Check Status

# Health status appears in the STATUS column, e.g. "Up 2 hours (healthy)"
docker-compose ps

# Get detailed health info (status, failing streak, recent check output)
docker inspect --format '{{json .State.Health}}' myapp-app-1

# Monitor health checks in real-time
watch 'docker-compose ps'
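
The health details in `docker inspect` output are JSON, so they can also be summarized with a short script. This is a sketch under assumptions: `summarize_health` is a hypothetical helper, and the sample document is a trimmed, invented example of the `State.Health` section.

```python
import json


def summarize_health(inspect_json: str) -> dict:
    """Summarize State.Health from `docker inspect <container>` output (a JSON array)."""
    container = json.loads(inspect_json)[0]
    health = container.get("State", {}).get("Health")
    if health is None:
        return {"status": "no healthcheck configured", "failing_streak": 0, "last_output": ""}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status", "unknown"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_output": (last.get("Output") or "").strip(),
    }


# Invented sample, trimmed to the fields used above
SAMPLE = json.dumps([{
    "Name": "/myapp-app-1",
    "State": {
        "Status": "running",
        "Health": {
            "Status": "unhealthy",
            "FailingStreak": 3,
            "Log": [
                {"ExitCode": 0, "Output": "ok\n"},
                {"ExitCode": 22, "Output": "curl: (22) The requested URL returned error: 503\n"},
            ],
        },
    },
}])

print(summarize_health(SAMPLE))
```

On a real host the input could come from `subprocess.run(["docker", "inspect", "myapp-app-1"], capture_output=True, text=True).stdout`.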

Step 5: Common Issues and Debugging

Application Not Starting

Problem: Service shows "Exited" or "Restarting"

Debug steps:

# 1. Check logs for error messages
docker-compose logs app

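# 2. Look for the exit code (if shown)
# Exit code 0 = normal stop
# Exit code 1-125 = application error (the meaning is defined by your app)
# Exit code 126/127 = command not executable / command not found
# Exit code 128+N = stopped by signal N (137 = SIGKILL, often out of memory; 143 = SIGTERM)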

# 3. Check if port is already in use
netstat -tulpn | grep :3000

# 4. Check whether environment variables are missing
docker-compose config | grep -A 10 "environment:"

# 5. Start the service in the foreground to watch its output directly
docker-compose up app
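
Exit codes above 128 encode the signal that stopped the process (code minus 128). The sketch below decodes them with Python's standard `signal` module; `describe_exit_code` is a hypothetical helper, and signal names are resolved for the platform the script runs on.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a short, human-readable reading of a container exit code."""
    if code == 0:
        return "normal stop"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "application error"


for code in (0, 1, 127, 137, 143):
    print(code, "->", describe_exit_code(code))
```

An exit code of 137 (SIGKILL) together with high memory usage in `docker stats` often points to the container being killed for running out of memory.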

Memory Leaks / High CPU Usage

Problem: Service using too much memory or CPU

Monitor resource usage:

# Show real-time resource usage
docker stats

# Monitor specific container
docker stats myapp-app-1

# Output example:
# CONTAINER      CPU %   MEM USAGE / LIMIT   MEM %
# myapp-app-1    2.15%   250MiB / 1GiB       24.41%
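
The `MEM USAGE / LIMIT` column can be turned into a percentage and compared with a threshold, which is useful for simple alerting scripts. This is a minimal sketch: the helper names are hypothetical, and the 80% threshold is only an example value.

```python
import re

# Binary units as printed by `docker stats`
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_bytes(text: str) -> float:
    """Convert a value such as '250MiB' to bytes."""
    m = re.fullmatch(r"\s*([\d.]+)\s*(B|KiB|MiB|GiB|TiB)\s*", text)
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def mem_percent(usage_limit: str) -> float:
    """Compute memory usage in percent from a 'used / limit' string."""
    used, limit = (to_bytes(part) for part in usage_limit.split("/"))
    return round(100 * used / limit, 2)


def over_threshold(usage_limit: str, threshold: float = 80.0) -> bool:
    """True when usage exceeds the threshold percentage."""
    return mem_percent(usage_limit) > threshold


print(mem_percent("250MiB / 1GiB"))     # 250 / 1024 of the limit
print(over_threshold("900MiB / 1GiB"))
```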

Check logs for issues:

# Look for memory-related errors
docker-compose logs app | grep -i "memory\|oom\|heap"

# Check for infinite loops or excessive logging (the most recent 50 lines)
docker-compose logs --tail=50 app

# Look for repeated error messages
docker-compose logs app | sort | uniq -c | sort -rn | head -20
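
The `sort | uniq -c` pipeline counts lines as identical only when they match character for character, so messages that differ just by timestamp are counted separately. The sketch below strips an assumed `service | ` prefix and a leading ISO-style timestamp before counting; the prefix pattern and sample lines are assumptions about your log format.

```python
import re
from collections import Counter

# Optional "service | " prefix, then an optional leading ISO-style timestamp
PREFIX = re.compile(r"^(?:[\w.-]+\s*\|\s*)?(?:\d{4}-\d{2}-\d{2}[T ][\d:.,]+Z?\s+)?")


def top_messages(lines, n=20):
    """Count messages after removing prefix and timestamp; most frequent first."""
    counts = Counter(PREFIX.sub("", line).strip() for line in lines if line.strip())
    return counts.most_common(n)


sample = [
    "app-1  | 2024-01-15T10:00:01Z ERROR database connection failed",
    "app-1  | 2024-01-15T10:00:02Z ERROR database connection failed",
    "app-1  | 2024-01-15T10:00:03Z INFO request received",
]
for message, count in top_messages(sample):
    print(count, message)
```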

Database Connection Issues

Problem: Application can't connect to database

Debug:

# 1. Verify database is running
docker-compose ps database

# 2. Check database logs
docker-compose logs database

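# 3. Verify network connectivity (-c 3 stops after three packets; requires ping in the app image)
docker-compose exec app ping -c 3 database

# 4. Test connection manually (requires the psql client in the app image)
docker-compose exec app psql -h database -U postgres -d myapp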

# 5. Check environment variables
docker-compose exec app env | grep DATABASE
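
Minimal images often lack `ping` and `psql`. If Python is available in the app container (an assumption), a plain TCP connection attempt answers the more relevant question: is the database port reachable? `can_connect` is a hypothetical helper, and port 5432 is the PostgreSQL default.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and hostnames that do not resolve
        return False


# Inside the app container this would typically be:
#   can_connect("database", 5432)
print(can_connect("127.0.0.1", 1, timeout=0.5))
```

A successful connection only shows that the port is open; authentication and database name problems still need the application logs.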

Networking Issues

Problem: Services can't communicate with each other

Debug:

# 1. Check networks are defined
docker network ls | grep myapp

# 2. Verify services are on same network
docker network inspect myapp_app-network

# 3. Test connectivity between services
docker-compose exec app ping -c 3 database
docker-compose exec app curl -v http://traefik:8080

# 4. Check service names are correct
# Services communicate using service names as hostnames
# In docker-compose, "database" is resolved to the database service
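
Because services reach each other by name, a quick way to separate DNS problems from port problems is to check what a service name resolves to. The sketch below uses the standard `socket` module and assumes Python is available in the container; `resolve` is a hypothetical helper.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted IP addresses a hostname resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


# Inside the app container, resolve("database") should return the
# database container's address on the shared network.
print(resolve("localhost"))
```

An empty list for a service name usually means the two services are not attached to the same network or the name is misspelled.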

Step 6: Debugging Workflow

Systematic Troubleshooting

Follow this workflow when something goes wrong:

1. CHECK SERVICE STATUS
   └─ docker-compose ps
   └─ Is service running? If not → Start it: docker-compose up -d

2. EXAMINE LOGS
   └─ docker-compose logs service-name
   └─ Look for errors, stack traces, error codes
   └─ Use grep to filter: docker-compose logs | grep -i error

3. CHECK CONFIGURATION
   └─ docker-compose config
   └─ Verify environment variables, ports, volumes
   └─ Ensure service dependencies are running

4. TEST CONNECTIVITY
   └─ docker-compose exec app ping -c 3 other-service
   └─ docker-compose exec app curl http://localhost:3000
   └─ Verify ports are correct

5. CHECK RESOURCES
   └─ docker stats
   └─ Is service out of memory? Disk full?
   └─ Is CPU maxed out?

6. RESTART IF NEEDED
   └─ docker-compose restart service-name
   └─ Or: docker-compose down && docker-compose up -d
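
The read-only steps of this workflow (1 to 5) can be collected into a small script so they run in the same order every time. This is a sketch: `triage_commands` and `run_triage` are hypothetical helpers, the peer service name `database` is an assumption, and the restart step is left out on purpose so the script never changes anything.

```python
import subprocess


def triage_commands(service: str, peer: str = "database") -> list[list[str]]:
    """Build the diagnostic commands for steps 1-5 without running them."""
    return [
        ["docker-compose", "ps"],                                            # 1. status
        ["docker-compose", "logs", "--tail=100", service],                   # 2. logs
        ["docker-compose", "config"],                                        # 3. configuration
        ["docker-compose", "exec", "-T", service, "ping", "-c", "3", peer],  # 4. connectivity
        ["docker", "stats", "--no-stream"],                                  # 5. resources
    ]


def run_triage(service: str) -> None:
    """Run each command in order, printing it first; failures do not stop the run."""
    for argv in triage_commands(service):
        print("$", " ".join(argv))
        subprocess.run(argv, check=False)


for argv in triage_commands("app"):
    print(" ".join(argv))
```

Calling `run_triage("app")` from the project directory executes the commands; `-T` disables the pseudo-terminal so the output can be captured or redirected.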

Useful Debug Commands

# View entire application configuration
docker-compose config

# Show only environment variables
docker-compose config | grep -A 100 "environment:"

# Execute command in running container
docker-compose exec app sh
# Then explore inside container:
ls -la
env
ps aux
netstat -tulpn

# View container file system
docker-compose exec app ls -la /app
docker-compose exec app cat /app/config.json

# Check container's network
docker-compose exec app ifconfig
docker-compose exec app ip addr

Step 7: Log Aggregation Setup (Optional)

Centralize Logs for Easier Access

For production, consider centralizing logs:

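Option 1: ELK Stack (Elasticsearch, Logstash, Kibana)

The configuration for this option only starts the ELK services. To get container logs into Logstash you also need a log shipper (for example Filebeat) or a logging driver that sends to a Logstash input, plus a matching logstash.conf. Elasticsearch 8 images also enable security by default, so Kibana and Logstash need credentials unless you disable it for local testing.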

version: '3.8'

services:
  app:
    image: your-app:latest
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  logstash:
    image: docker.elastic.co/logstash/logstash:8.0.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000"

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
    environment:
      - discovery.type=single-node

  kibana:
    image: docker.elastic.co/kibana/kibana:8.0.0
    ports:
      - "5601:5601"

Option 2: Simple Log Rotation

services:
  app:
    image: your-app:latest
    logging:
      driver: json-file
      options:
        # Keep at most 10 log files per container, each up to 10 MB (about 100 MB in total)
        max-size: "10m"
        max-file: "10"
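
The rotation settings put an upper bound on disk usage: `max-size` × `max-file` per container. The sketch below does that arithmetic; treating the `k`/`m`/`g` suffixes as decimal multiples is an assumption, so read the result as an approximate bound. Both function names are hypothetical.

```python
# Assumed decimal multipliers; the bound is approximate either way
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_size(text: str) -> int:
    """Convert a size such as '10m' to bytes."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def max_log_bytes(max_size: str, max_file: str, containers: int = 1) -> int:
    """Upper bound on json-file log usage for the given rotation settings."""
    return parse_size(max_size) * int(max_file) * containers


# Option 2 above: 10 files of 10 MB each, for three services
print(max_log_bytes("10m", "10", containers=3) / 10**6, "MB")
```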

Verification Checklist

Verify your monitoring setup:

# ✓ All services running
docker-compose ps
# All should show "Up" status

# ✓ Can access logs
docker-compose logs app
# Should show recent application output

# ✓ Health checks working
docker inspect --format '{{.State.Health.Status}}' myapp-app-1
# Should print "healthy"

# ✓ Can execute commands in container
docker-compose exec app echo "test"
# Should print "test"

# ✓ Services can communicate
docker-compose exec app ping -c 3 database
# Should get responses, no timeouts

# ✓ Resource usage is reasonable
docker stats --no-stream
# Should show healthy CPU and memory usage

Common Monitoring Issues

Can't Access Logs

Problem: docker-compose logs shows nothing or old logs

Solutions:

# Logs might be rotated - increase max-file
# Or check if service is actually running
docker-compose ps app

# View live logs instead
docker-compose logs -f app

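# Restart the service and follow its output
# (a restart keeps existing logs; they are only discarded when the container is recreated)
docker-compose restart app
docker-compose logs -f app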

Health Check Always Failing

Problem: Service is always reported as unhealthy (or keeps being restarted by an orchestrator or watchdog) because the health check fails

Solutions:

# Test health endpoint manually
docker-compose exec app curl http://localhost:3000/health

# Check if endpoint exists in your application
docker-compose logs app | grep health

# Increase start_period to give app more time to start
# Update docker-compose.yml:
# start_period: 60s  # Give 60 seconds to start

# Verify curl or healthcheck command exists in container
docker-compose exec app which curl
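
If `curl` is not in the image but Python is, the health check command can be a small script instead. The sketch below is one possible approach, not the only one: `is_healthy` and `main` are hypothetical names, the default URL mirrors the `/health` endpoint used in this chapter, and proxies are bypassed on purpose because the check targets the container itself.

```python
import sys
import urllib.error
import urllib.request

# Never route the local health probe through an HTTP proxy
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the URL answers with a 2xx status within the timeout."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors such as 503, refused connections, and timeouts all count as unhealthy
        return False


def main(argv: list[str]) -> int:
    """Exit status for Docker: 0 = healthy, 1 = unhealthy."""
    url = argv[1] if len(argv) > 1 else "http://localhost:3000/health"
    return 0 if is_healthy(url) else 1
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, it could be referenced from the Compose file as `test: ["CMD", "python", "/app/healthcheck.py"]`, where the path is a placeholder for wherever the script lives in your image.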

Logs Growing Too Large

Problem: Log files consuming too much disk space

Solutions:

# Enable log rotation in docker-compose.yml
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"

# Or use syslog driver
logging:
  driver: syslog
  options:
    syslog-address: "udp://127.0.0.1:514"

AI Prompts for This Lesson

Viewing & Filtering Logs

Find Specific Errors in Logs

I need to find and analyze errors in my application logs.

Service: [app/database/traefik]
Time period: [last hour/today/specific time range]

Generate commands to:
1. View last 200 lines of logs
2. Filter for ERROR level messages
3. Search for specific error text: [error message]
4. Show logs with timestamps
5. Export logs to a file for analysis

Also suggest what to look for in the output.

Monitor Real-Time Logs

I want to monitor my application in real-time.

Services to monitor:
- app (Node.js on port 3000)
- database (PostgreSQL)
- traefik (reverse proxy)

Generate:
1. Command to tail all service logs together
2. How to filter for specific log levels
3. How to follow logs from multiple services
4. How to search logs as they appear
5. Best practices for production monitoring

Include color-coding tips if possible.

Health Check Configuration

Create Health Check Endpoint

I need to add health checks to my application.

Technology: [Node.js/Python/Go/etc]
Framework: [Express/Django/Gin/etc]
Dependencies to check:
- PostgreSQL database
- Redis cache
- External API: [service name]

Generate:
1. Health check endpoint code (/health route)
2. Check database connection
3. Check Redis connection
4. Check external service availability
5. Return proper HTTP status codes (200/503)

Include complete implementation with error handling.

Docker Compose Health Check Setup

Add health checks to my docker-compose.yml.

Services:
- app (web server on port 3000, has /health endpoint)
- database (PostgreSQL 15)
- redis (cache server)

Generate complete docker-compose.yml configuration with:
1. Health check commands for each service
2. Appropriate intervals and timeouts
3. Retry policies
4. Restart policies
5. Dependencies between services

Make it production-ready with best practices.

Debugging Common Issues

Application Keeps Restarting

My application container keeps restarting.

Output from `docker-compose ps`:
[paste output - shows Restarting status]

Output from `docker-compose logs --tail=100 app`:
[paste last 100 lines of logs]

Exit code (if shown): [number]

Service configuration:
- Image: [image name]
- Port: [port]
- Dependencies: [database, redis, etc]

Help me diagnose and fix the restart loop.

Database Connection Failed

My app can't connect to the database.

Error message from logs:
[paste error message]

Output from `docker-compose ps database`:
[paste output]

Output from `docker-compose logs database | tail -50`:
[paste database logs]

Output from `docker-compose exec app ping database`:
[paste ping result]

My DATABASE_URL: postgresql://user:****@database:5432/dbname

What's wrong with the database connection?

High Memory Usage

My container is using too much memory.

Output from `docker stats --no-stream`:
[paste output showing memory usage]

Service: [app/database/etc]
Expected memory: [e.g., 500MB]
Actual memory: [e.g., 2GB]

Output from `docker-compose logs app | grep -i "memory\|oom"`:
[paste any memory-related logs]

Help me:
1. Identify what's causing high memory usage
2. Find memory leaks in logs
3. Set appropriate memory limits
4. Fix the issue

Performance Monitoring

Setup Resource Monitoring

I want to monitor resource usage for all my services.

Services:
- app (Node.js)
- database (PostgreSQL)
- redis (cache)
- traefik (proxy)

Generate:
1. Commands to monitor CPU and memory usage
2. How to set resource limits in docker-compose.yml
3. Alerts for high resource usage
4. Log rotation configuration
5. Best practices for production monitoring

Include specific limits for each service type.

Analyze Slow Performance

My application is running slowly.

Symptoms:
- Slow response times: [e.g., 5-10 seconds]
- High CPU usage: [percentage]
- Users reporting: [specific issues]

Output from `docker stats`:
[paste output]

Recent logs from `docker-compose logs --tail=100 app`:
[paste logs]

Help me:
1. Identify the bottleneck
2. Analyze logs for performance issues
3. Check database query performance
4. Optimize resource allocation
5. Fix the slow performance

Technology: [your stack]

Log Rotation & Management

Setup Log Rotation

My Docker logs are growing too large.

Current log sizes:
[paste output from `du -sh /var/lib/docker/containers/*`]

Services:
- app (high traffic application)
- database (PostgreSQL)
- traefik (proxy for all requests)

Generate docker-compose.yml logging configuration:
1. Limit log file size (max 10MB per file)
2. Keep last 5 log files
3. Rotate logs automatically
4. Compress old logs
5. Prevent disk space issues

Include configuration for all services.

Logs Consuming All Disk Space

My server is running out of disk space due to logs.

Output from `df -h`:
[paste output showing disk usage]

Output from `du -sh /var/lib/docker/containers/* | sort -h`:
[paste largest log files]

Immediate actions needed:
1. Clear old logs safely
2. Setup log rotation
3. Prevent this from happening again
4. Monitor disk space

Provide emergency cleanup commands and permanent solution.

Debugging Workflow

Create Debugging Checklist

Generate a systematic debugging checklist for my stack.

Application:
- Technology: [Node.js/Python/etc]
- Framework: [Express/Django/etc]
- Database: [PostgreSQL/MySQL/etc]
- Other services: [Redis, etc]

Create step-by-step debugging workflow:
1. How to check if services are running
2. Commands to examine logs
3. Network connectivity tests
4. Database connection verification
5. Resource usage checks
6. Common issues and solutions

Include all Docker commands I'll need.

Service Not Responding

My service is not responding to requests.

Service: [app/api/etc]
URL: [http://yourdomain.com]

Output from `docker-compose ps`:
[paste - shows service status]

Output from `docker-compose logs --tail=50 app`:
[paste recent logs]

Output from `curl -v http://localhost:3000`:
[paste curl output or error]

Output from `docker-compose exec app ps aux`:
[paste process list]

Walk me through debugging this step by step.

Production Monitoring Setup

Setup Production Monitoring

I'm deploying to production and need comprehensive monitoring.

Stack:
- Application: [technology]
- Database: [PostgreSQL/MySQL/etc]
- Cache: [Redis/Memcached/etc]
- Reverse Proxy: Traefik

Setup monitoring for:
1. Application health checks
2. Database performance
3. Resource usage (CPU, memory, disk)
4. Request/response times
5. Error rates
6. Log aggregation

Generate docker-compose.yml configuration and monitoring strategy.
Include free/open-source tools if possible.

Setup Alerts & Notifications

I want to be notified when issues occur.

Scenarios to monitor:
- Service crashes or restarts
- High memory usage (>80%)
- High CPU usage (>90%)
- Disk space low (<10% free)
- Database connection failures
- SSL certificate expiring

How to setup:
1. Health check alerts
2. Resource threshold alerts
3. Notification channels (email, Slack, etc)
4. Alert rules in docker-compose
5. Testing alerts work

Prefer simple, lightweight solutions.

Traefik Specific Issues

Traefik Not Routing Requests

Traefik is running but not routing to my application.

Domain: [yourdomain.com]
Expected: Route to app service on port 3000

Output from `docker-compose logs traefik | tail -100`:
[paste logs]

My Traefik labels in docker-compose.yml:
[paste your Traefik labels]

Output from `curl -v http://yourdomain.com`:
[paste response]

Traefik dashboard (http://localhost:8080) shows:
[describe what you see]

What's wrong with the Traefik configuration?

SSL Certificate Errors in Traefik

Traefik can't obtain SSL certificates.

Domain: [yourdomain.com]

Output from `docker-compose logs traefik | grep -i "letsencrypt\|acme"`:
[paste relevant logs]

Browser error: [SSL/certificate error message]

My Traefik ACME configuration:
[paste ACME config from docker-compose.yml]

Output from `ls -la acme.json`:
[paste file permissions]

Help me fix the SSL certificate issue.

Advanced Debugging

Debug Network Issues Between Containers

My containers can't communicate with each other.

Services:
- app (needs to connect to database)
- database (PostgreSQL)
- redis (cache)

Output from `docker network ls`:
[paste output]

Output from `docker network inspect [network-name]`:
[paste network details]

Output from `docker-compose exec app ping database`:
[paste result]

Help me:
1. Verify network configuration
2. Test connectivity between services
3. Fix network issues
4. Ensure proper DNS resolution

Export Logs for Analysis

I need to export and analyze logs offline.

Services: [app, database, traefik]
Time period: [last 24 hours]

Generate commands to:
1. Export logs from all services
2. Filter logs by time range
3. Save to files with timestamps
4. Compress for transfer
5. Analyze patterns in logs

Format: [JSON/plain text/etc]
Destination: [local file/remote server]

What's Next

Congratulations! You've completed Module 3: Going Live!

→ Module 4: Team Workflow

Learn how to invite team members, manage permissions, and set up collaborative deployment workflows.

Production Monitoring Best Practices

| Practice | Why | Frequency |
|----------|-----|-----------|
| Check logs | Catch errors early | Daily |
| Monitor resources | Prevent outages | Continuous |
| Review health checks | Ensure reliability | Hourly |
| Rotate logs | Save disk space | Weekly |
| Update alerts | Stay informed | Real-time |

Help & Support

- Log confusion? Use `docker-compose logs --timestamps -f app` for clear, time-stamped output
- Performance issues? Check `docker stats` first, then the logs
- Health checks failing? Make sure your endpoint is correct and accessible
- Need aggregation? Consider ELK stack or cloud logging solutions for production

You're now production-ready! Congratulations on completing Module 3.