Chapter 3: Monitoring & Logs
Learning Objectives
By the end of this chapter, you will:
- Access and read Docker container logs
- Monitor application health and status
- Debug common issues using logs
- Set up health checks for automatic restarts
- Monitor system resources and performance
- Understand when and how to troubleshoot
Prerequisites
- Completed Chapters 1 & 2
- Application deployed and running
- Basic understanding of logs and debugging
- Docker and Docker Compose installed
Step 1: Understanding Application Logs
What Are Logs?
Logs are timestamped records of events that happen in your application. They help you:
- Understand what your application is doing
- Debug issues and errors
- Monitor performance
- Audit user activity
Log Levels
Understanding log levels helps you prioritize issues:
| Level | Purpose | Example |
|---|---|---|
| ERROR | Critical problems | Database connection failed |
| WARN | Warnings and issues | Deprecated API usage |
| INFO | General information | Request received, user logged in |
| DEBUG | Detailed debugging info | Variable values, function calls |
| TRACE | Very detailed info | Every step of execution |
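The ordering in this table can be made concrete with a small filter. This is a minimal sketch, assuming each log line contains its level as an uppercase word (your application's format may differ). `filter_by_level` is a hypothetical helper written for this chapter, not part of Docker.

```python
import re

# Severity order from the table above, least to most severe.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]
LEVEL_RE = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|ERROR)\b")


def filter_by_level(lines, min_level="WARN"):
    """Keep lines whose level is at least as severe as min_level.

    Assumes the level appears as an uppercase word somewhere in the line;
    lines without a recognizable level are dropped.
    """
    threshold = LEVELS.index(min_level)
    kept = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and LEVELS.index(match.group(1)) >= threshold:
            kept.append(line)
    return kept


sample = [
    "2024-05-01 10:00:00 INFO Request received",
    "2024-05-01 10:00:01 WARN Deprecated API usage",
    "2024-05-01 10:00:02 ERROR Database connection failed",
]
print(filter_by_level(sample, "WARN"))
```

With the sample lines, only the WARN and ERROR entries are kept.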
Step 2: Access Docker Logs
View Logs from Running Container
See all logs from a service:
# View logs from your application
docker-compose logs app
# View logs from a specific service
docker-compose logs database
docker-compose logs traefik
See logs in real-time (follow mode):
# Follow logs as they are written (press Ctrl+C to stop)
docker-compose logs -f app
Limit logs by number of lines:
# Show only the last 50 lines
docker-compose logs --tail=50 app
# Show last 100 lines and follow
docker-compose logs -f --tail=100 app
Filter logs by service:
# View logs from multiple services
docker-compose logs app database
# View logs from all services
docker-compose logs
Get Logs with Timestamps
# Show timestamps (helpful for debugging)
docker-compose logs --timestamps app
# Show the last hour of logs (--since requires Docker Compose v2)
docker-compose logs --since 1h --timestamps app
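Timestamped output can also be filtered by time after the fact, for example on logs saved to a file. The sketch below assumes the line layout `<container> | <timestamp> <message>` with RFC 3339 timestamps ending in `Z`. The helper names and sample lines are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_docker_timestamp(ts):
    """Parse a timestamp such as 2024-05-01T10:15:30.123456789Z.

    Docker prints nanoseconds; datetime keeps microseconds, so the
    extra digits are cut off.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def lines_since(lines, cutoff):
    """Return log lines whose timestamp is at or after cutoff."""
    recent = []
    for line in lines:
        # Assumed layout: "<container> | <timestamp> <message>"
        _, _, rest = line.partition("| ")
        ts = rest.split(" ", 1)[0]
        try:
            when = parse_docker_timestamp(ts)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if when >= cutoff:
            recent.append(line)
    return recent


sample = [
    "app-1  | 2024-05-01T09:10:00.000000001Z Server started",
    "app-1  | 2024-05-01T10:45:30.123456789Z GET /health 200",
]
now = datetime(2024, 5, 1, 11, 0, tzinfo=timezone.utc)
print(lines_since(sample, now - timedelta(hours=1)))
```

Only the second sample line falls inside the one-hour window.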
Real-World Examples
Find errors in your application:
# Show last 100 lines
docker-compose logs --tail=100 app
# Look for ERROR or Exception lines
docker-compose logs app | grep -i error
docker-compose logs app | grep -i exception
Debug database connectivity:
# Check database service logs
docker-compose logs database
# Check application's database connection attempts
docker-compose logs app | grep -i "database\|connection\|postgres"
Monitor request traffic:
# Watch Traefik logs for incoming requests
docker-compose logs -f traefik
# Watch application request logs
docker-compose logs -f app | grep -i "GET\|POST\|PUT\|DELETE"
Step 3: Monitor Service Health
Check Running Services
# See status of all services
docker-compose ps
# Expected output:
# NAME COMMAND SERVICE STATUS
# myapp-app-1 "npm start" app Up 2 hours
# myapp-database-1 "postgres" database Up 2 hours
# myapp-traefik-1 "/traefik" traefik Up 2 hours
Interpret Service Status
| Status | Meaning | Action |
|---|---|---|
| Up X hours | Service running normally | No action needed |
| Restarting | Service crashing and restarting | Check logs immediately |
| Exited | Service stopped | Restart: docker-compose up -d |
| Dead | Service encountered fatal error | Check logs: docker-compose logs |
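If you script your checks, the table translates into a simple lookup. This is a sketch only: the exact wording of the STATUS column varies between Compose versions, and `classify_status` is a hypothetical helper, so treat the string matching as an assumption.

```python
def classify_status(status):
    """Map a STATUS string from `docker-compose ps` to a suggested action."""
    text = status.strip().lower()
    if text.startswith("up"):
        if "unhealthy" in text:
            return "running but unhealthy: check the health endpoint and logs"
        return "no action needed"
    if text.startswith("restarting"):
        return "crash loop: check logs immediately"
    if text.startswith("exited"):
        return "stopped: inspect logs, then run docker-compose up -d"
    if text.startswith("dead"):
        return "fatal error: check logs"
    return "unknown status: inspect the container manually"


for status in ["Up 2 hours", "Restarting (1) 5 seconds ago", "Exited (137) 2 minutes ago"]:
    print(f"{status!r}: {classify_status(status)}")
```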
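Step 4: Set Up Health Checks
Add Health Checks to Services
Health checks let Docker detect a service that is running but no longer working. Note that plain Docker Compose only marks such a container as unhealthy; the restart policy acts when the container exits, so restarting on an unhealthy status needs an orchestrator such as Docker Swarm or a separate watchdog: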
version: '3.8'
services:
  app:
    image: your-app:latest
    healthcheck:
      # Command to check if service is healthy
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      # Wait 30 seconds before starting health checks
      start_period: 30s
      # Check every 10 seconds
      interval: 10s
      # Wait 5 seconds for response
      timeout: 5s
      # Mark unhealthy after 3 failed checks
      retries: 3
    restart: unless-stopped
  database:
    image: postgres:15
    healthcheck:
      # Check if PostgreSQL is ready
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      start_period: 10s
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
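The `interval`, `timeout`, and `retries` values together decide how long a hung service can go unnoticed. The following is a back-of-the-envelope estimate, not Docker's exact scheduling: it assumes every failing attempt runs into the full timeout and that the next attempt starts one interval after the previous one finishes.

```python
def worst_case_detection_seconds(interval, timeout, retries):
    """Rough upper bound on the time from a service hanging to 'unhealthy'.

    Each failed attempt can take up to `timeout` seconds, and the next
    attempt starts `interval` seconds after the previous one finishes.
    """
    return retries * (interval + timeout)


# Values from the app and database services above.
print(worst_case_detection_seconds(interval=10, timeout=5, retries=3))  # 45
print(worst_case_detection_seconds(interval=10, timeout=5, retries=5))  # 75
```

Lowering `interval` or `retries` shortens detection time at the cost of more frequent checks and a higher chance of false alarms.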
Create Health Endpoint in Application
Your application should expose a /health endpoint:
Node.js/Express Example:
// Health check endpoint
app.get('/health', async (req, res) => {
  // Check database connection
  const dbHealthy = await checkDatabaseConnection();
  if (dbHealthy) {
    res.status(200).json({ status: 'healthy', timestamp: new Date() });
  } else {
    res.status(503).json({ status: 'unhealthy', reason: 'database' });
  }
});

// Assumes `db.query` returns a Promise (as in node-postgres).
// Without `await`, a failed query would not be caught by try/catch.
async function checkDatabaseConnection() {
  try {
    // Quick database query to verify connection
    await db.query('SELECT 1');
    return true;
  } catch (err) {
    console.error('Health check failed:', err);
    return false;
  }
}
Python/Flask Example:
from datetime import datetime, timezone

from flask import Flask, jsonify
from sqlalchemy import text

app = Flask(__name__)
# Assumes `db` is your Flask-SQLAlchemy instance, created elsewhere in the app.


@app.route('/health', methods=['GET'])
def health_check():
    try:
        # Check database connection
        if check_db_connection():
            return jsonify({
                'status': 'healthy',
                'timestamp': datetime.now(timezone.utc).isoformat()
            }), 200
        else:
            return jsonify({
                'status': 'unhealthy',
                'reason': 'database'
            }), 503
    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'error': str(e)
        }), 503


def check_db_connection():
    try:
        # SQLAlchemy 2.x requires raw SQL strings to be wrapped in text()
        db.session.execute(text('SELECT 1'))
        return True
    except Exception:
        return False
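Once an application depends on more than the database, the same pattern extends to several checks. Here is a framework-agnostic sketch: `build_health_response` and the stand-in checks are hypothetical, and in a real app each callable would wrap an actual client call.

```python
def build_health_response(checks):
    """Run each named check and return (body, http_status).

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is reachable. A check that raises
    is treated as failed so the endpoint itself never crashes.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    healthy = all(results.values())
    body = {"status": "healthy" if healthy else "unhealthy", "checks": results}
    return body, 200 if healthy else 503


def failing_cache_check():
    raise ConnectionError("cache unreachable")  # stand-in for a real client call


body, status = build_health_response({
    "database": lambda: True,  # stand-in for a real SELECT 1
    "cache": failing_cache_check,
})
print(status, body)
```

Returning the per-dependency results in the body makes a 503 easier to diagnose from the health check log alone.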
Monitor Health Check Status
# View health status in the STATUS column, e.g. "Up 2 hours (healthy)"
docker-compose ps
# Get detailed health info
docker inspect myapp-app-1 | grep -A 20 "Health"
# Monitor health checks in real-time
watch 'docker-compose ps'
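Instead of grepping, the JSON printed by `docker inspect` can be parsed directly. The sketch below reads the `State.Health` object. The sample input is a trimmed, made-up stand-in for real output, and `summarize_health` is a hypothetical helper.

```python
import json


def summarize_health(inspect_output):
    """Extract health info from the JSON printed by `docker inspect <container>`.

    Returns None when the container defines no health check.
    """
    container = json.loads(inspect_output)[0]  # docker inspect prints a JSON array
    health = container.get("State", {}).get("Health")
    if health is None:
        return None
    log = health.get("Log") or []
    last_output = log[-1].get("Output", "").strip() if log else ""
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_output": last_output,
    }


# Trimmed, made-up sample for illustration only.
sample = json.dumps([{
    "State": {
        "Health": {
            "Status": "unhealthy",
            "FailingStreak": 3,
            "Log": [{"ExitCode": 1, "Output": "health check failed\n"}],
        }
    }
}])
print(summarize_health(sample))
```

In practice you would feed it the output of `docker inspect myapp-app-1`, for example through `subprocess.run(..., capture_output=True)`.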
Step 5: Common Issues and Debugging
Application Not Starting
Problem: Service shows "Exited" or "Restarting"
Debug steps:
# 1. Check logs for error messages
docker-compose logs app
# 2. Look for the exit code (if shown)
# Exit code 0 = normal stop
# Exit code 1-127 = application or startup error
# Exit code 128+ = killed by a signal (e.g., 137 = SIGKILL, often out of memory; 143 = SIGTERM)
# 3. Check if port is already in use
netstat -tulpn | grep :3000
# 4. Check whether required environment variables are set
docker-compose config | grep -A 10 "environment:"
# 5. Run the service in the foreground to see its startup output
docker-compose up app
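The exit code ranges in step 2 follow a fixed rule: codes above 128 are 128 plus the signal number. A small sketch that decodes them, with a hand-picked table of common Linux signals. `explain_exit_code` is a hypothetical helper, and the special meanings of 125 to 127 follow Docker's documented conventions for `docker run`.

```python
# A few common signal numbers on Linux.
SIGNAL_NAMES = {2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def explain_exit_code(code):
    """Return a short, human-readable reading of a container exit code."""
    if code == 0:
        return "normal stop"
    if code == 125:
        return "docker run itself failed"
    if code == 126:
        return "container command could not be invoked"
    if code == 127:
        return "container command not found"
    if code > 128:
        number = code - 128
        name = SIGNAL_NAMES.get(number, f"signal {number}")
        hint = " (often the out-of-memory killer)" if number == 9 else ""
        return f"terminated by {name}{hint}"
    return "application error"


for code in (0, 1, 137, 143):
    print(code, "=", explain_exit_code(code))
```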
Memory Leaks / High CPU Usage
Problem: Service using too much memory or CPU
Monitor resource usage:
# Show real-time resource usage
docker stats
# Monitor specific container
docker stats myapp-app-1
# Output example:
# CONTAINER CPU % MEM USAGE / LIMIT MEM %
# myapp-app-1 2.15% 250MiB / 1GiB 24.41%
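For scripted alerts, the MEM USAGE / LIMIT column can be converted to a number. This is a sketch that assumes binary units (KiB, MiB, GiB) in the column; `parse_size` and `memory_percent` are hypothetical helpers.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text):
    """Convert a size such as '250MiB' or '1GiB' to bytes."""
    match = re.fullmatch(r"([\d.]+)\s*(B|KiB|MiB|GiB)", text.strip())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def memory_percent(mem_column):
    """Compute the percentage from a 'USAGE / LIMIT' column value."""
    usage, limit = (parse_size(part) for part in mem_column.split("/"))
    return round(100 * usage / limit, 2)


print(memory_percent("250MiB / 1GiB"))  # 24.41
```

A threshold check such as `memory_percent(column) > 80` is then enough to drive a simple warning.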
Check logs for issues:
# Look for memory-related errors
docker-compose logs app | grep -i "memory\|oom\|heap"
# Check for infinite loops or excessive logging
docker-compose logs --tail=1000 app | tail -50
# Look for repeated error messages
docker-compose logs app | sort | uniq -c | sort -rn | head -20
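One caveat with `sort | uniq -c`: if lines carry timestamps, no two lines are identical and nothing is grouped. The sketch below strips the container prefix and an optional leading timestamp before counting. The prefix layout is an assumption about Compose output, and `top_messages` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed prefix: "<container> | " optionally followed by an RFC 3339 timestamp.
PREFIX_RE = re.compile(r"^\S+\s*\|\s*(\d{4}-\d{2}-\d{2}T\S+\s+)?")


def top_messages(lines, n=3):
    """Count messages after stripping the container prefix and timestamp."""
    counts = Counter(PREFIX_RE.sub("", line).strip() for line in lines)
    return counts.most_common(n)


sample = [
    "app-1  | 2024-05-01T10:00:00.1Z ERROR db timeout",
    "app-1  | 2024-05-01T10:00:05.2Z ERROR db timeout",
    "app-1  | 2024-05-01T10:00:07.9Z INFO request ok",
]
print(top_messages(sample))
```

With the sample lines, the repeated "ERROR db timeout" message is counted twice and listed first.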
Database Connection Issues
Problem: Application can't connect to database
Debug:
# 1. Verify database is running
docker-compose ps database
# 2. Check database logs
docker-compose logs database
# 3. Verify network connectivity
docker-compose exec app ping database
# 4. Test connection manually
docker-compose exec app psql -h database -U postgres -d myapp
# 5. Check environment variables
docker-compose exec app env | grep DATABASE
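Keep in mind that `ping` only shows the host is reachable, not that PostgreSQL accepts connections, and slim images often ship without `ping` or `psql`. If Python is available in the app container (an assumption), a plain TCP check is a lightweight alternative. `port_open` is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Inside the app container this would be, for example:
# port_open("database", 5432)
```

A `False` result together with a working `ping` usually points at the database process or its port rather than the network.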
Networking Issues
Problem: Services can't communicate with each other
Debug:
# 1. Check networks are defined
docker network ls | grep myapp
# 2. Verify services are on same network
docker network inspect myapp_app-network
# 3. Test connectivity between services
docker-compose exec app ping database
docker-compose exec app curl -v http://traefik:8080
# 4. Check service names are correct
# Services communicate using service names as hostnames
# In docker-compose, "database" is resolved to the database service
Step 6: Debugging Workflow
Systematic Troubleshooting
Follow this workflow when something goes wrong:
1. CHECK SERVICE STATUS
└─ docker-compose ps
└─ Is service running? If not → Start it: docker-compose up -d
2. EXAMINE LOGS
└─ docker-compose logs service-name
└─ Look for errors, stack traces, error codes
└─ Use grep to filter: docker-compose logs | grep -i error
3. CHECK CONFIGURATION
└─ docker-compose config
└─ Verify environment variables, ports, volumes
└─ Ensure service dependencies are running
4. TEST CONNECTIVITY
└─ docker-compose exec app ping other-service
└─ docker-compose exec app curl http://localhost:3000
└─ Verify ports are correct
5. CHECK RESOURCES
└─ docker stats
└─ Is service out of memory? Disk full?
└─ Is CPU maxed out?
6. RESTART IF NEEDED
└─ docker-compose restart service-name
└─ Or: docker-compose down && docker-compose up -d
Useful Debug Commands
# View entire application configuration
docker-compose config
# Show only environment variables
docker-compose config | grep -A 100 "environment:"
# Execute command in running container
docker-compose exec app sh
# Then explore inside container:
ls -la
env
ps aux
netstat -tulpn
# View container file system
docker-compose exec app ls -la /app
docker-compose exec app cat /app/config.json
# Check container's network
docker-compose exec app ifconfig
docker-compose exec app ip addr
Step 7: Log Aggregation Setup (Optional)
Centralize Logs for Easier Access
For production, consider centralizing logs:
Option 1: ELK Stack (Elasticsearch, Logstash, Kibana)
version: '3.8'
services:
  app:
    image: your-app:latest
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
  logstash:
    image: docker.elastic.co/logstash/logstash:8.0.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000"
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
    environment:
      - discovery.type=single-node
  kibana:
    image: docker.elastic.co/kibana/kibana:8.0.0
    ports:
      - "5601:5601"
Option 2: Simple Log Rotation
services:
  app:
    image: your-app:latest
    logging:
      driver: json-file
      options:
        # Keep at most 10 log files, max 10MB each
        max-size: "10m"
        max-file: "10"
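The two options cap disk usage per container at roughly `max-size` times `max-file`. A tiny sketch of that arithmetic, in the same "m" units as the config. The result is an approximation, since a file can slightly exceed its limit before it is rotated, and `max_log_megabytes` is a hypothetical helper.

```python
def max_log_megabytes(max_size_m, max_file, containers=1):
    """Approximate upper bound on json-file log usage, in the config's 'm' units."""
    return max_size_m * max_file * containers


print(max_log_megabytes(10, 3))                 # 30
print(max_log_megabytes(10, 10))                # 100
print(max_log_megabytes(10, 10, containers=3))  # 300
```

Multiply by the number of containers that share the setting to estimate the total for a host.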
Verification Checklist
Verify your monitoring setup:
# ✓ All services running
docker-compose ps
# All should show "Up" status
# ✓ Can access logs
docker-compose logs app
# Should show recent application output
# ✓ Health checks working
docker inspect myapp-app-1 | grep -A 5 "Health"
# Should show "Status": "healthy"
# ✓ Can execute commands in container
docker-compose exec app echo "test"
# Should print "test"
# ✓ Services can communicate
docker-compose exec app ping database
# Should get responses, no timeouts
# ✓ Resource usage is reasonable
docker stats --no-stream
# Should show healthy CPU and memory usage
Common Monitoring Issues
Can't Access Logs
Problem: docker-compose logs shows nothing or old logs
Solutions:
# Logs might be rotated - increase max-file
# Or check if service is actually running
docker-compose ps app
# View live logs instead
docker-compose logs -f app
# Restart and follow the new output (a restart does not clear existing logs)
docker-compose restart app
docker-compose logs -f app
Health Check Always Failing
Problem: Service is marked unhealthy (or, under an orchestrator, keeps restarting) because the health check fails
Solutions:
# Test health endpoint manually
docker-compose exec app curl http://localhost:3000/health
# Check if endpoint exists in your application
docker-compose logs app | grep health
# Increase start_period to give app more time to start
# Update docker-compose.yml:
# start_period: 60s # Give 60 seconds to start
# Verify curl or healthcheck command exists in container
docker-compose exec app which curl
Logs Growing Too Large
Problem: Log files consuming too much disk space
Solutions:
# Enable log rotation in docker-compose.yml
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"
# Or use syslog driver
logging:
  driver: syslog
  options:
    syslog-address: "udp://127.0.0.1:514"
AI Prompts for This Lesson
Viewing & Filtering Logs
Find Specific Errors in Logs
I need to find and analyze errors in my application logs.
Service: [app/database/traefik]
Time period: [last hour/today/specific time range]
Generate commands to:
1. View last 200 lines of logs
2. Filter for ERROR level messages
3. Search for specific error text: [error message]
4. Show logs with timestamps
5. Export logs to a file for analysis
Also suggest what to look for in the output.
Monitor Real-Time Logs
I want to monitor my application in real-time.
Services to monitor:
- app (Node.js on port 3000)
- database (PostgreSQL)
- traefik (reverse proxy)
Generate:
1. Command to tail all service logs together
2. How to filter for specific log levels
3. How to follow logs from multiple services
4. How to search logs as they appear
5. Best practices for production monitoring
Include color-coding tips if possible.
Health Check Configuration
Create Health Check Endpoint
I need to add health checks to my application.
Technology: [Node.js/Python/Go/etc]
Framework: [Express/Django/Gin/etc]
Dependencies to check:
- PostgreSQL database
- Redis cache
- External API: [service name]
Generate:
1. Health check endpoint code (/health route)
2. Check database connection
3. Check Redis connection
4. Check external service availability
5. Return proper HTTP status codes (200/503)
Include complete implementation with error handling.
Docker Compose Health Check Setup
Add health checks to my docker-compose.yml.
Services:
- app (web server on port 3000, has /health endpoint)
- database (PostgreSQL 15)
- redis (cache server)
Generate complete docker-compose.yml configuration with:
1. Health check commands for each service
2. Appropriate intervals and timeouts
3. Retry policies
4. Restart policies
5. Dependencies between services
Make it production-ready with best practices.
Debugging Common Issues
Application Keeps Restarting
My application container keeps restarting.
Output from `docker-compose ps`:
[paste output - shows Restarting status]
Output from `docker-compose logs --tail=100 app`:
[paste last 100 lines of logs]
Exit code (if shown): [number]
Service configuration:
- Image: [image name]
- Port: [port]
- Dependencies: [database, redis, etc]
Help me diagnose and fix the restart loop.
Database Connection Failed
My app can't connect to the database.
Error message from logs:
[paste error message]
Output from `docker-compose ps database`:
[paste output]
Output from `docker-compose logs database | tail -50`:
[paste database logs]
Output from `docker-compose exec app ping database`:
[paste ping result]
My DATABASE_URL: postgresql://user:****@database:5432/dbname
What's wrong with the database connection?
High Memory Usage
My container is using too much memory.
Output from `docker stats --no-stream`:
[paste output showing memory usage]
Service: [app/database/etc]
Expected memory: [e.g., 500MB]
Actual memory: [e.g., 2GB]
Output from `docker-compose logs app | grep -i "memory\|oom"`:
[paste any memory-related logs]
Help me:
1. Identify what's causing high memory usage
2. Find memory leaks in logs
3. Set appropriate memory limits
4. Fix the issue
Performance Monitoring
Setup Resource Monitoring
I want to monitor resource usage for all my services.
Services:
- app (Node.js)
- database (PostgreSQL)
- redis (cache)
- traefik (proxy)
Generate:
1. Commands to monitor CPU and memory usage
2. How to set resource limits in docker-compose.yml
3. Alerts for high resource usage
4. Log rotation configuration
5. Best practices for production monitoring
Include specific limits for each service type.
Analyze Slow Performance
My application is running slowly.
Symptoms:
- Slow response times: [e.g., 5-10 seconds]
- High CPU usage: [percentage]
- Users reporting: [specific issues]
Output from `docker stats`:
[paste output]
Recent logs from `docker-compose logs --tail=100 app`:
[paste logs]
Help me:
1. Identify the bottleneck
2. Analyze logs for performance issues
3. Check database query performance
4. Optimize resource allocation
5. Fix the slow performance
Technology: [your stack]
Log Rotation & Management
Setup Log Rotation
My Docker logs are growing too large.
Current log sizes:
[paste output from `du -sh /var/lib/docker/containers/*`]
Services:
- app (high traffic application)
- database (PostgreSQL)
- traefik (proxy for all requests)
Generate docker-compose.yml logging configuration:
1. Limit log file size (max 10MB per file)
2. Keep last 5 log files
3. Rotate logs automatically
4. Compress old logs
5. Prevent disk space issues
Include configuration for all services.
Logs Consuming All Disk Space
My server is running out of disk space due to logs.
Output from `df -h`:
[paste output showing disk usage]
Output from `du -sh /var/lib/docker/containers/* | sort -h`:
[paste largest log files]
Immediate actions needed:
1. Clear old logs safely
2. Setup log rotation
3. Prevent this from happening again
4. Monitor disk space
Provide emergency cleanup commands and permanent solution.
Debugging Workflow
Create Debugging Checklist
Generate a systematic debugging checklist for my stack.
Application:
- Technology: [Node.js/Python/etc]
- Framework: [Express/Django/etc]
- Database: [PostgreSQL/MySQL/etc]
- Other services: [Redis, etc]
Create step-by-step debugging workflow:
1. How to check if services are running
2. Commands to examine logs
3. Network connectivity tests
4. Database connection verification
5. Resource usage checks
6. Common issues and solutions
Include all Docker commands I'll need.
Service Not Responding
My service is not responding to requests.
Service: [app/api/etc]
URL: [http://yourdomain.com]
Output from `docker-compose ps`:
[paste - shows service status]
Output from `docker-compose logs --tail=50 app`:
[paste recent logs]
Output from `curl -v http://localhost:3000`:
[paste curl output or error]
Output from `docker-compose exec app ps aux`:
[paste process list]
Walk me through debugging this step by step.
Production Monitoring Setup
Setup Production Monitoring
I'm deploying to production and need comprehensive monitoring.
Stack:
- Application: [technology]
- Database: [PostgreSQL/MySQL/etc]
- Cache: [Redis/Memcached/etc]
- Reverse Proxy: Traefik
Setup monitoring for:
1. Application health checks
2. Database performance
3. Resource usage (CPU, memory, disk)
4. Request/response times
5. Error rates
6. Log aggregation
Generate docker-compose.yml configuration and monitoring strategy.
Include free/open-source tools if possible.
Setup Alerts & Notifications
I want to be notified when issues occur.
Scenarios to monitor:
- Service crashes or restarts
- High memory usage (>80%)
- High CPU usage (>90%)
- Disk space low (<10% free)
- Database connection failures
- SSL certificate expiring
How to setup:
1. Health check alerts
2. Resource threshold alerts
3. Notification channels (email, Slack, etc)
4. Alert rules in docker-compose
5. Testing alerts work
Prefer simple, lightweight solutions.
Traefik Specific Issues
Traefik Not Routing Requests
Traefik is running but not routing to my application.
Domain: [yourdomain.com]
Expected: Route to app service on port 3000
Output from `docker-compose logs traefik | tail -100`:
[paste logs]
My Traefik labels in docker-compose.yml:
[paste your Traefik labels]
Output from `curl -v http://yourdomain.com`:
[paste response]
Traefik dashboard (http://localhost:8080) shows:
[describe what you see]
What's wrong with the Traefik configuration?
SSL Certificate Errors in Traefik
Traefik can't obtain SSL certificates.
Domain: [yourdomain.com]
Output from `docker-compose logs traefik | grep -i "letsencrypt\|acme"`:
[paste relevant logs]
Browser error: [SSL/certificate error message]
My Traefik ACME configuration:
[paste ACME config from docker-compose.yml]
Output from `ls -la acme.json`:
[paste file permissions]
Help me fix the SSL certificate issue.
Advanced Debugging
Debug Network Issues Between Containers
My containers can't communicate with each other.
Services:
- app (needs to connect to database)
- database (PostgreSQL)
- redis (cache)
Output from `docker network ls`:
[paste output]
Output from `docker network inspect [network-name]`:
[paste network details]
Output from `docker-compose exec app ping database`:
[paste result]
Help me:
1. Verify network configuration
2. Test connectivity between services
3. Fix network issues
4. Ensure proper DNS resolution
Export Logs for Analysis
I need to export and analyze logs offline.
Services: [app, database, traefik]
Time period: [last 24 hours]
Generate commands to:
1. Export logs from all services
2. Filter logs by time range
3. Save to files with timestamps
4. Compress for transfer
5. Analyze patterns in logs
Format: [JSON/plain text/etc]
Destination: [local file/remote server]
What's Next
Congratulations! You've completed Module 3: Going Live!
Learn how to invite team members, manage permissions, and set up collaborative deployment workflows.
Production Monitoring Best Practices
| Practice | Why | Frequency |
|---|---|---|
| Check logs | Catch errors early | Daily |
| Monitor resources | Prevent outages | Continuous |
| Review health checks | Ensure reliability | Hourly |
| Rotate logs | Save disk space | Weekly |
| Update alerts | Stay informed | Real-time |
Help & Support
- Log confusion? Use `docker-compose logs --timestamps -f app` for clear, time-stamped output
- Performance issues? Check `docker stats` first, then logs second
- Health checks failing? Make sure your endpoint is correct and accessible
- Need aggregation? Consider ELK stack or cloud logging solutions for production
You're now production-ready! Congratulations on completing Module 3.