ADVANCED GUIDE Β· LEVEL 2

Docker Networking
& Service Architecture

Deep dive into container-to-container communication, custom networks, service discovery, and real-world AI microservice stacks with OpenClaw + Kimi 2.5.

Container Networks
Service Discovery
API Gateways
AI Microservices
OpenClaw + Kimi 2.5
Windows 11 WSL2
01 Β· Foundation

How Docker Networking Works

By default, containers are isolated. Docker networking is the layer that lets them talk to each other β€” and to the outside world β€” with full control over what can reach what.

πŸ”‘
Core idea: Every container gets its own virtual network interface. Docker creates software-defined networks (like virtual LANs) that containers can join. Containers on the same network can reach each other by service name β€” Docker's built-in DNS handles the resolution automatically.
Full Multi-Container Network Architecture
🌐 Browser / Client (external)
      β”‚ :443 / :80
      β–Ό
β—† NETWORK: frontend-net (bridge)
   πŸ”€ nginx-proxy (:80/:443)
      β”‚ HTTP :3000
      β–Ό
   βš›οΈ react-app (:3000)
      β”‚ HTTP :8000
      β–Ό
   🐍 api-gateway (:8000)

β—† NETWORK: backend-net (bridge) β€” isolated from frontend!
   🐍 api-gateway (:8000) ── gRPC :50051 β†’ πŸ€– ai-service (:50051) ── HTTP :8001 β†’ πŸ”Ž search-svc (:8001) ── :6379 β†’ ⚑ redis (:6379)
   🐍 api-gateway ── TCP :5432 β†’ 🐘 postgres (:5432) ── internal only β†’ πŸ“‚ pgdata volume (persisted)
   πŸ€– ai-service ── HTTPS :443 β†’ ☁️ Kimi API (Moonshot, external)
πŸ’‘
Notice that api-gateway sits on both networks β€” it's the only bridge between the secure backend and the frontend. The database (postgres) is only on backend-net and is completely unreachable from the outside world.
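In Compose terms, the dual-homed gateway is simply a service listed on two networks. A minimal sketch using the network and service names from the diagram above (only the network wiring is shown; images and ports are omitted):

```yaml
services:
  api-gateway:
    networks:
      - frontend-net   # reachable from the proxy / React tier
      - backend-net    # can reach ai-service, redis, postgres

  postgres:
    networks:
      - backend-net    # backend only β€” no route from the frontend tier

networks:
  frontend-net:
    driver: bridge
  backend-net:
    driver: bridge
```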
02 Β· Network Types

The Four Network Drivers

Docker ships with multiple network drivers. Choosing the right one changes security, performance, and connectivity significantly.

Default Β· Most Used
πŸŒ‰

bridge (user-defined)

Creates an isolated software-defined network. Containers on the same bridge can reach each other by service name (automatic DNS). Containers on different bridges are fully isolated. Use this for production multi-service apps.

High Performance
πŸš€

host

Container shares the host's network stack directly β€” no NAT overhead. The container's ports are host ports. Much faster for throughput-heavy services but you lose network isolation. Use for performance-critical services like Nginx or Prometheus.

Multi-Host
πŸ•ΈοΈ

overlay

Spans multiple Docker hosts (used in Docker Swarm or Kubernetes). Creates a virtual network that lets containers on different machines talk as if they're local. Required for distributed deployments.

Physical Network
πŸ”Œ

macvlan

Assigns a real MAC address to the container, making it appear as a physical device on your network. Useful for legacy apps that need to be directly on the LAN. Requires promiscuous mode on the NIC.

Driver                 | DNS by name | Isolation | Multi-host | Best For
-----------------------|-------------|-----------|------------|-------------------
bridge (default)       | No (by IP)  | Shared    | No         | Quick testing
bridge (user-defined)  | Yes         | Isolated  | No         | Production apps βœ“
host                   | Yes (host)  | None      | No         | Max performance
overlay                | Yes         | Isolated  | Yes        | Swarm/Kubernetes
none                   | No          | Full      | No         | Batch jobs, no net

Creating & Managing Networks

Network Commands
# Create a custom bridge network
$ docker network create frontend-net
a3c9d1e7f...
# Create with custom subnet and gateway
$ docker network create \
  --driver bridge \
  --subnet 172.20.0.0/16 \
  --gateway 172.20.0.1 \
  backend-net
# List all networks
$ docker network ls
NETWORK ID     NAME           DRIVER    SCOPE
a3c9d1e7f2b8   frontend-net   bridge    local
b8f2e4c1d3a9   backend-net    bridge    local
# Connect a running container to a network
$ docker network connect backend-net api-gateway
# Inspect network β€” see all connected containers and IPs
$ docker network inspect backend-net
# Remove unused networks
$ docker network prune
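The `--subnet` and `--gateway` flags above are easy to get wrong. Python's standard `ipaddress` module can sanity-check a planned subnet before you create the network:

```python
import ipaddress

# The subnet and gateway passed to `docker network create` above
net = ipaddress.ip_network("172.20.0.0/16")
gateway = ipaddress.ip_address("172.20.0.1")

print(net.num_addresses)   # 65536 β€” plenty of container IPs
print(gateway in net)      # True β€” the gateway must live inside the subnet

# Make sure a second network won't collide with this one:
print(net.overlaps(ipaddress.ip_network("172.21.0.0/16")))  # False β€” safe
```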
03 Β· Inter-Service Communication

How Containers Call Each Other

Once containers are on the same Docker network, Docker's embedded DNS server lets them find each other by service name. No IPs needed β€” the name IS the address.

πŸ”
Docker DNS: When you name a service api-gateway in docker-compose, every other container on that network can resolve http://api-gateway:8000 automatically. Docker runs an internal DNS server at 127.0.0.11 inside each container.
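To make that concrete, here is a minimal, hypothetical compose file: the service names `api` and `worker` (and the images) are placeholders, and the name `worker` doubles as the hostname the other service calls:

```yaml
services:
  api:
    image: myorg/api:1.0               # hypothetical image
    environment:
      WORKER_URL: http://worker:9000   # "worker" resolves via Docker DNS
    networks: [app-net]

  worker:
    image: myorg/worker:1.0            # hypothetical image
    networks: [app-net]

networks:
  app-net:
    driver: bridge
```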
🌐

HTTP / REST

Most common. Service A calls http://service-b:8001/api/data. Simple, universal, great for synchronous request-response. Use for API calls between services.

⚑

gRPC

Google's high-performance RPC protocol over HTTP/2. Binary protocol (protobuf) β€” much faster and smaller than JSON. Ideal for internal microservices with strict latency needs.

πŸ“¨

Message Queue

Async communication via RabbitMQ or Redis Streams. Service A drops a message; Service B picks it up when ready. Decouples services completely β€” great for AI inference queues.

πŸ”Œ

WebSocket

Persistent bidirectional connections. Service A keeps a socket open to Service B for streaming data β€” like streaming LLM token output back to the frontend in real time.

πŸ—ƒοΈ

Shared Volume

Containers mount the same named volume. One writes files; another reads them. Common for batch ML pipelines where a preprocessor writes data that a model container reads.
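A sketch of that shared-volume pattern in compose terms (image names are hypothetical; the point is the shared named volume, mounted read-only on the consumer side):

```yaml
services:
  preprocessor:
    image: myorg/preprocess:1.0   # hypothetical: writes features into the volume
    volumes:
      - dataset:/data

  model:
    image: myorg/model:1.0        # hypothetical: reads the same files
    volumes:
      - dataset:/data:ro          # read-only mount on the consumer

volumes:
  dataset:
```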

⚑

Redis Pub/Sub

Redis acts as a message broker. Services publish events to channels; subscribers receive them instantly. Very low latency, simple to set up, great for real-time coordination.

Practical DNS Demo
Testing inter-container DNS from inside a container
# Exec into the api-gateway container
$ docker exec -it api-gateway sh
# Ping another service by NAME β€” Docker DNS resolves it!
/app # ping ai-service
PING ai-service (172.20.0.4): 56 data bytes
64 bytes from 172.20.0.4: icmp_seq=0 ttl=64 time=0.087 ms
# Curl an HTTP service by name (search-svc speaks plain HTTP; :50051 is gRPC-only)
/app # curl http://search-svc:8001/health
{"status":"ok"}
# Check what DNS server Docker injected
/app # cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
04 Β· AI Microservice Stack

Building an AI Service Mesh

Modern AI applications aren't a single container. They're composed of multiple specialized services β€” an API gateway, an LLM proxy, a vector database, a cache layer, and the AI model API itself.

AI Application Architecture β€” Full Stack
πŸ‘€ User / App
      β”‚ HTTPS
      β–Ό
β—† NETWORK: public-net
   πŸ”€ Traefik (reverse proxy, :80 / :443) β†’ 🌐 Next.js frontend (:3000) β†’ 🐍 FastAPI gateway (:8000)

β—† NETWORK: ai-net (secured)
   🐍 FastAPI gateway ── HTTP β†’ πŸ”„ LiteLLM proxy (:4000) ── HTTPS (external) β†’ πŸŒ™ Kimi API (Moonshot, api.moonshot.cn)
   🐍 FastAPI gateway ── TCP β†’ 🧠 Qdrant vector DB (:6333)
   🐍 FastAPI gateway ── TCP β†’ ⚑ Redis cache (:6379)
   🐍 FastAPI gateway ── TCP β†’ 🐘 Postgres (:5432)

β—† NETWORK: monitoring-net
   πŸ“Š Prometheus (:9090) β†’ πŸ“ˆ Grafana (:3001) β†’ πŸ“‹ Loki logs (:3100)
05 Β· Real-World Project

OpenClaw + Kimi 2.5 on Docker

Let's build a complete, production-style AI stack using LiteLLM (OpenAI-compatible proxy) connecting to Kimi k2 (Moonshot AI) β€” all wired together through Docker networks. The pattern works with any OpenAI-compatible endpoint.

🧩
What is LiteLLM? LiteLLM is an open-source proxy that exposes a unified OpenAI-compatible API in front of 100+ LLMs β€” including Kimi/Moonshot, Anthropic, Groq, Ollama, and more. Your app calls http://litellm:4000/v1/chat/completions and LiteLLM routes it to the right provider. Think of it as the "OpenClaw" / universal LLM adapter layer.
Your App β†’ LiteLLM Proxy (:4000) β†’ Kimi k2 API (moonshot.cn)
Project Structure
ai-stack/ project structure
ai-stack/
β”œβ”€β”€ docker-compose.yml    # orchestrates everything
β”œβ”€β”€ .env                  # secrets (never commit this!)
β”œβ”€β”€ litellm/
β”‚   └── config.yaml       # LiteLLM model routing config
β”œβ”€β”€ gateway/
β”‚   β”œβ”€β”€ Dockerfile
β”‚   β”œβ”€β”€ main.py           # FastAPI gateway
β”‚   └── requirements.txt
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ Dockerfile
β”‚   └── src/
└── nginx/
    └── nginx.conf
.env β€” Secrets File
.env β€” Never commit to Git!
# Kimi / Moonshot API key β€” get from platform.moonshot.cn
MOONSHOT_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
# LiteLLM master key (any string you choose β€” acts as your proxy's API key)
LITELLM_MASTER_KEY=sk-my-local-master-key-1234
# Postgres credentials
POSTGRES_USER=aiapp
POSTGRES_PASSWORD=supersecret
POSTGRES_DB=aidb
# Redis URL (used by gateway for caching)
REDIS_URL=redis://redis:6379/0
# Full Postgres URL for LiteLLM (note the Docker DNS name "postgres")
DATABASE_URL=postgresql://aiapp:supersecret@postgres:5432/aidb
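A missing variable in .env tends to surface as a confusing crash deep inside a container, so it pays to fail fast at startup. A small sketch of such a check (the variable list mirrors the file above; where you call it is up to you):

```python
import os

# Variables the stack cannot run without (mirrors .env above)
REQUIRED = ["MOONSHOT_API_KEY", "LITELLM_MASTER_KEY", "POSTGRES_PASSWORD"]

def missing_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

# Example: one key present, the rest absent
print(missing_env({"MOONSHOT_API_KEY": "sk-test"}))
# ['LITELLM_MASTER_KEY', 'POSTGRES_PASSWORD']
```

Call `missing_env()` at process start and exit with a clear error message if the list is non-empty.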
litellm/config.yaml β€” Model Router
πŸ“‹ litellm/config.yaml β€” routes "kimi" β†’ Moonshot AI
model_list:
  # Kimi k2 β€” the flagship Moonshot model
  - model_name: kimi-k2
    litellm_params:
      model: moonshot/moonshot-v1-8k
      api_key: os.environ/MOONSHOT_API_KEY
      api_base: https://api.moonshot.cn/v1
  # Kimi long context (128k tokens)
  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY
      api_base: https://api.moonshot.cn/v1

litellm_settings:
  success_callback: []
  cache: true
  cache_params:
    type: redis
    host: redis       # ← Docker DNS resolves this!
    port: 6379

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  # This file is not template-expanded by Compose, so use LiteLLM's
  # os.environ/ syntax and define DATABASE_URL in .env
  database_url: os.environ/DATABASE_URL
docker-compose.yml β€” The Full Stack
πŸ“‹ docker-compose.yml β€” complete AI stack
version: '3.9'   # top-level "version" is obsolete in Compose v2; kept for compatibility

##################################################
# NETWORKS β€” explicit isolation between tiers
##################################################
networks:
  public-net:
    driver: bridge
  ai-net:
    driver: bridge
    internal: false   # false = can reach the external internet (for the Kimi API)
  monitoring-net:
    driver: bridge

##################################################
# VOLUMES β€” persistent data
##################################################
volumes:
  postgres_data:
  redis_data:
  qdrant_data:
  prometheus_data:

##################################################
# SERVICES
##################################################
services:
  # ─── Reverse Proxy ───────────────────────────
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    networks:
      - public-net
    depends_on:
      - gateway

  # ─── FastAPI Gateway ─────────────────────────
  gateway:
    build:
      context: ./gateway
      dockerfile: Dockerfile
    environment:
      LITELLM_URL: http://litellm:4000   # ← service-name DNS!
      LITELLM_API_KEY: ${LITELLM_MASTER_KEY}
      REDIS_URL: ${REDIS_URL}
      DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
    networks:
      - public-net      # reachable from nginx
      - ai-net          # can reach litellm, redis, postgres
      - monitoring-net
    depends_on:
      - litellm
      - redis
      - postgres
    healthcheck:
      # python:3.12-slim ships without curl β€” probe with the stdlib instead
      test: ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/health')\""]
      interval: 30s
      retries: 3

  # ─── LiteLLM Proxy (routes to Kimi) ──────────
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    volumes:
      - ./litellm/config.yaml:/app/config.yaml:ro
    env_file: .env   # passes MOONSHOT_API_KEY etc.
    networks:
      - ai-net       # NOT on public-net β€” hidden from outside!
    depends_on:
      - redis
      - postgres

  # ─── Redis Cache ─────────────────────────────
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    networks:
      - ai-net
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

  # ─── PostgreSQL ──────────────────────────────
  postgres:
    image: postgres:16-alpine
    env_file: .env
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - ai-net

  # ─── Qdrant Vector DB ────────────────────────
  qdrant:
    image: qdrant/qdrant:latest
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - ai-net
gateway/Dockerfile
🐳 gateway/Dockerfile
FROM python:3.12-slim AS base
WORKDIR /app
# Install dependencies layer (cached unless requirements.txt changes)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy app code
COPY . .
EXPOSE 8000
# Non-root user for security
RUN adduser --disabled-password --gecos '' appuser
USER appuser
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
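One caveat with `COPY . .` above: the entire build context is sent to the Docker daemon and ends up in the image. A `gateway/.dockerignore` (a minimal sketch) keeps secrets and junk out:

```
# gateway/.dockerignore β€” excluded from the build context
.env
.env.*
__pycache__/
*.pyc
.venv/
.git/
```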
gateway/main.py β€” FastAPI calling Kimi via LiteLLM
gateway/main.py β€” FastAPI service
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from openai import AsyncOpenAI

# OpenAI SDK pointed at our LiteLLM proxy β€” NOT the real OpenAI!
client = AsyncOpenAI(
    # Docker DNS: "litellm" resolves to the litellm container's IP
    base_url=os.getenv("LITELLM_URL", "http://litellm:4000") + "/v1",
    api_key=os.getenv("LITELLM_API_KEY")
)
app = FastAPI(title="AI Gateway")
class ChatRequest(BaseModel):
    message: str
    model: str = "kimi-k2"  # default to Kimi k2
    stream: bool = False
@app.get("/health")
async def health(): return {"status": "ok"}
@app.post("/chat")
async def chat(req: ChatRequest):
    # This call goes: gateway β†’ litellm container β†’ Kimi API
    response = await client.chat.completions.create(
        model=req.model,
        messages=[{"role": "user", "content": req.message}],
        stream=False   # non-streaming here; /chat/stream handles streaming
    )
    return {"reply": response.choices[0].message.content}
@app.post("/chat/stream")
async def chat_stream(req: ChatRequest):
    # Streaming: tokens flow back as SSE (Server-Sent Events)
    async def generate():
        stream = await client.chat.completions.create(
            model=req.model,
            messages=[{"role": "user", "content": req.message}],
            stream=True
        )
        async for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield f"data: {delta}\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")
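The `data: ...\n\n` framing used by the streaming endpoint is plain text and easy to verify in isolation. A pure-Python sketch of a frame writer plus a deliberately minimal parser (assumption: chunks contain no newlines):

```python
def sse_frames(chunks):
    """Wrap text chunks in the Server-Sent Events wire format used above."""
    for chunk in chunks:
        yield f"data: {chunk}\n\n"

def sse_parse(payload):
    """Reassemble text from a stream of single-line `data:` frames."""
    return "".join(
        line[len("data: "):]
        for line in payload.split("\n")
        if line.startswith("data: ")
    )

wire = "".join(sse_frames(["Docker ", "networking ", "rocks"]))
print(sse_parse(wire))  # Docker networking rocks
```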
Start & Test the Stack
Windows Terminal β€” Run it all
# Build and start everything
C:\ai-stack> docker-compose up -d --build
[+] Running 7/7
βœ“ Container ai-stack-postgres-1 Started
βœ“ Container ai-stack-redis-1 Started
βœ“ Container ai-stack-qdrant-1 Started
βœ“ Container ai-stack-litellm-1 Started
βœ“ Container ai-stack-gateway-1 Started
βœ“ Container ai-stack-nginx-1 Started
# Test the gateway health
C:\ai-stack> curl http://localhost/health
{"status":"ok"}
# Send a chat message to Kimi via the whole chain!
C:\ai-stack> curl -X POST http://localhost/chat \
  -H "Content-Type: application/json" \
  -d "{\"message\": \"What is Docker networking?\", \"model\": \"kimi-k2\"}"
{"reply":"Docker networking enables containers to communicate..."}
# Check LiteLLM proxy is reachable from gateway container (inter-service DNS)
C:\ai-stack> docker exec ai-stack-gateway-1 curl http://litellm:4000/health
{"status":"healthy","litellm_version":"1.x.x"}
# Verify the network isolation β€” redis should NOT be reachable from nginx
C:\ai-stack> docker exec ai-stack-nginx-1 curl http://redis:6379
curl: (6) Could not resolve host: redis  β† βœ“ Isolation works!
πŸ”’
Security win: Redis and Postgres are only on ai-net. Nginx is only on public-net. Even if Nginx was compromised, it cannot reach the database β€” they're on completely different Docker networks with no route between them.
06 Β· Observability

Logs, Metrics & Health Checks

Production containers need visibility. Docker provides built-in logging, and integrates cleanly with Prometheus + Grafana for metrics.

Health Checks

πŸ“‹ Health check patterns
# HTTP health check
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 15s

# Redis ping check
healthcheck:
  test: ["CMD", "redis-cli", "ping"]

# Postgres check
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER"]
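The curl-based check assumes curl exists in the image, but slim Python images ship without it. A stdlib-only probe works anywhere Python does (the `/health` path is this stack's convention, not a Docker requirement):

```python
import urllib.request
import urllib.error

def check_health(url, timeout=2.0):
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False

# Nothing listens on port 1, so this fails fast instead of hanging:
print(check_health("http://127.0.0.1:1/health", timeout=0.5))  # False
```

Wire it into a HEALTHCHECK as a `python -c` one-liner or a tiny `healthcheck.py` copied into the image.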

Log Management

Log Commands
# Follow all service logs
$ docker-compose logs -f
# Follow specific service only
$ docker-compose logs -f gateway
# Last 100 lines with timestamps
$ docker logs --tail 100 -t \
  ai-stack-gateway-1
# Inspect container status
$ docker inspect \
  ai-stack-gateway-1 \
  --format '{{.State.Health.Status}}'
healthy
# Live resource stats
$ docker stats --no-stream
07 Β· Production Patterns

Production-Grade Best Practices

Patterns that separate hobby projects from real deployments.

πŸ—οΈ

Multi-Stage Builds

Use multiple FROM stages to separate build dependencies from the final runtime image. A Go app that builds to 1.4GB can ship as a 12MB final image.

πŸ”’

Non-Root Users

Never run containers as root. Add RUN adduser --disabled-password appuser and USER appuser to your Dockerfile. Limits blast radius if a container is compromised.

πŸ“Œ

Pin Image Tags

Use python:3.12.4-slim not python:latest. Pinned tags make builds reproducible and prevent surprise breakage when base images update.

⚑

Layer Caching Strategy

Order Dockerfile instructions from least-to-most-changing. Copy requirements.txt and install deps before copying your app code β€” so code changes don't invalidate the deps layer.

πŸ“Š

Resource Limits

Always set memory and CPU limits in compose. Without limits, one runaway container can starve all others on the same host.

πŸ”„

Restart Policies

Set restart: unless-stopped on critical services so they survive reboots and crashes automatically without manual intervention.

Resource Limits + Restart Policy
πŸ“‹ Resource limits in docker-compose.yml
services:
  gateway:
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '0.5'      # max 50% of one CPU core
          memory: 512M     # hard cap: 512MB RAM
        reservations:
          cpus: '0.1'      # guaranteed minimum
          memory: 128M
    logging:
      driver: json-file
      options:
        max-size: "10m"    # rotate logs at 10MB
        max-file: "3"      # keep 3 rotated files
Multi-Stage Dockerfile
🐳 Multi-stage Dockerfile for production
# ── Stage 1: builder ── install all dev deps
FROM python:3.12-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# ── Stage 2: runtime ── only copy what we need
FROM python:3.12-slim AS runtime
WORKDIR /app
# Create the non-root user first so copied files can be owned by it
RUN adduser --disabled-password --gecos '' appuser
# Copy installed packages from the builder stage into appuser's home
# (/root is not readable by appuser, so don't leave them there)
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
# Copy only app source
COPY --chown=appuser:appuser . .
USER appuser
ENV PATH=/home/appuser/.local/bin:$PATH
EXPOSE 8000
# slim images ship without curl β€” probe with the Python stdlib instead
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
08 Β· Security

Container Security Hardening

Docker's default settings are not production-secure. Here's the hardening checklist every deployment needs.

🚫

Never expose database ports

Postgres and Redis should have NO ports: section in compose. They're accessible by service name within Docker networks β€” there's never a reason to expose them to the host.

πŸ”

Secrets via environment / files

Never hardcode API keys in Dockerfiles or compose files. Use .env files, Docker secrets, or environment injection from a vault at runtime.

πŸ‘€

Read-only filesystems

Add read_only: true to services that don't need to write. If an attacker gets code execution, they can't modify the container filesystem.
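In compose terms, the read-only pattern is one flag plus a tmpfs for anything that genuinely must be written (the `/tmp` path here is a typical choice, not a requirement):

```yaml
services:
  gateway:
    read_only: true   # root filesystem becomes immutable
    tmpfs:
      - /tmp          # in-memory scratch space, wiped on restart
```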

πŸ”

Scan images for vulnerabilities

Use docker scout cves myimage:tag or Trivy to scan for known CVEs in your base images and dependencies before deploying.

Security scanning & hardening commands
# Scan image for CVEs with Docker Scout (built into Docker Desktop)
$ docker scout cves myapp:latest
# Or use Trivy (open source, very thorough)
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image myapp:latest
# Run with read-only filesystem + drop capabilities
$ docker run --read-only \
  --cap-drop ALL \
  --security-opt no-new-privileges:true \
  myapp:latest
# Check which user a container is running as
$ docker exec mycontainer whoami
appuser  β† good. "root" = bad!
⚠️
Critical: Add .env and any file containing secrets to your .gitignore immediately. The most common Docker security breach is accidentally committing API keys to a public repo where Docker images are built in CI.
Full .gitignore for Docker projects
.gitignore
# Secrets β€” NEVER commit
.env
.env.*
secrets/
*.key
*.pem
# Docker build artifacts
# (note: .dockerignore itself SHOULD be committed β€” don't list it here)
# Python
__pycache__/
*.pyc
.venv/
# Node
node_modules/
Windows 11 WSL2 Tips
πŸͺŸ
Windows 11 + Docker Desktop Tips:

β€’ Store your project files inside WSL2 filesystem (/home/yourname/projects) not Windows filesystem (C:\Users\...) β€” file I/O is 10-20x faster.

β€’ Enable Resource Saver in Docker Desktop settings to free RAM when idle.

β€’ Use wsl --shutdown + wsl to restart WSL2 if Docker acts up.

β€’ In %USERPROFILE%\.wslconfig, set memory=8GB and processors=4 to cap WSL2 resource usage.