ADVANCED GUIDE Β· LEVEL 2

Docker Networking
& Service Architecture

Deep dive into container-to-container communication, custom networks, service discovery, and real-world AI microservice stacks with OpenClaw + Kimi 2.5.

Container Networks
Service Discovery
API Gateways
AI Microservices
OpenClaw + Kimi 2.5
Windows 11 WSL2
01 Β· Foundation

How Docker Networking Works

By default, containers are isolated. Docker networking is the layer that lets them talk to each other β€” and to the outside world β€” with full control over what can reach what.

πŸ”‘
Core idea: Every container gets its own virtual network interface. Docker creates software-defined networks (like virtual LANs) that containers can join. Containers on the same network can reach each other by service name β€” Docker's built-in DNS handles the resolution automatically.
Full Multi-Container Network Architecture
🌐 Browser / Client (external)
      β”‚ :443 / :80
      β–Ό
β—† NETWORK: frontend-net (bridge)
   πŸ”€ nginx-proxy (:80/:443)
      β”‚ HTTP :3000
      β–Ό
   βš›οΈ react-app (:3000)
      β”‚ HTTP :8000
      β–Ό
   🐍 api-gateway (:8000)

β—† NETWORK: backend-net (bridge) β€” isolated from frontend!
   🐍 api-gateway (:8000) ── gRPC :50051 β†’ πŸ€– ai-service (:50051) ── HTTP :8001 β†’ πŸ”Ž search-svc (:8001) ── :6379 β†’ ⚑ redis (:6379)
   🐍 api-gateway ── TCP :5432 β†’ 🐘 postgres (:5432) ── internal only β†’ πŸ“‚ pgdata volume (persisted)
   πŸ€– ai-service ── HTTPS :443 β†’ ☁️ Kimi API (Moonshot, external)
πŸ’‘
Notice that api-gateway sits on both networks β€” it's the only bridge between the secure backend and the frontend. The database (postgres) is only on backend-net and is completely unreachable from the outside world.
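In Compose terms, the dual-homed gateway is simply a service listed on two networks. A minimal sketch using the network and service names from the diagram above (only the network wiring is shown; images and ports are omitted):

```yaml
services:
  api-gateway:
    networks:
      - frontend-net   # reachable from the proxy / React tier
      - backend-net    # can reach ai-service, redis, postgres

  postgres:
    networks:
      - backend-net    # backend only β€” no route from the frontend tier

networks:
  frontend-net:
    driver: bridge
  backend-net:
    driver: bridge
```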
02 Β· Network Types

The Four Network Drivers

Docker ships with multiple network drivers. Choosing the right one changes security, performance, and connectivity significantly.

Default Β· Most Used
πŸŒ‰

bridge (user-defined)

Creates an isolated software-defined network. Containers on the same bridge can reach each other by service name (automatic DNS). Containers on different bridges are fully isolated. Use this for production multi-service apps.

High Performance
πŸš€

host

Container shares the host's network stack directly β€” no NAT overhead. The container's ports are host ports. Much faster for throughput-heavy services but you lose network isolation. Use for performance-critical services like Nginx or Prometheus.

Multi-Host
πŸ•ΈοΈ

overlay

Spans multiple Docker hosts (used in Docker Swarm or Kubernetes). Creates a virtual network that lets containers on different machines talk as if they're local. Required for distributed deployments.

Physical Network
πŸ”Œ

macvlan

Assigns a real MAC address to the container, making it appear as a physical device on your network. Useful for legacy apps that need to be directly on the LAN. Requires promiscuous mode on the NIC.

Driver                 | DNS by name | Isolation | Multi-host | Best For
-----------------------|-------------|-----------|------------|-------------------
bridge (default)       | No (by IP)  | Shared    | No         | Quick testing
bridge (user-defined)  | Yes         | Isolated  | No         | Production apps βœ“
host                   | Yes (host)  | None      | No         | Max performance
overlay                | Yes         | Isolated  | Yes        | Swarm/Kubernetes
none                   | No          | Full      | No         | Batch jobs, no net

Creating & Managing Networks

Network Commands
# Create a custom bridge network
$ docker network create frontend-net
a3c9d1e7f...
# Create with custom subnet and gateway
$ docker network create \
  --driver bridge \
  --subnet 172.20.0.0/16 \
  --gateway 172.20.0.1 \
  backend-net
# List all networks
$ docker network ls
NETWORK ID     NAME           DRIVER    SCOPE
a3c9d1e7f2b8   frontend-net   bridge    local
b8f2e4c1d3a9   backend-net    bridge    local
# Connect a running container to a network
$ docker network connect backend-net api-gateway
# Inspect network β€” see all connected containers and IPs
$ docker network inspect backend-net
# Remove unused networks
$ docker network prune
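The `--subnet` and `--gateway` flags above are easy to get wrong. Python's standard `ipaddress` module can sanity-check a planned subnet before you create the network:

```python
import ipaddress

# The subnet and gateway passed to `docker network create` above
net = ipaddress.ip_network("172.20.0.0/16")
gateway = ipaddress.ip_address("172.20.0.1")

print(net.num_addresses)   # 65536 β€” plenty of container IPs
print(gateway in net)      # True β€” the gateway must live inside the subnet

# Make sure a second network won't collide with this one:
print(net.overlaps(ipaddress.ip_network("172.21.0.0/16")))  # False β€” safe
```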
03 Β· Inter-Service Communication

How Containers Call Each Other

Once containers are on the same Docker network, Docker's embedded DNS server lets them find each other by service name. No IPs needed β€” the name IS the address.

πŸ”
Docker DNS: When you name a service api-gateway in docker-compose, every other container on that network can resolve http://api-gateway:8000 automatically. Docker runs an internal DNS server at 127.0.0.11 inside each container.
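To make that concrete, here is a minimal, hypothetical compose file: the service names `api` and `worker` (and the images) are placeholders, and the name `worker` doubles as the hostname the other service calls:

```yaml
services:
  api:
    image: myorg/api:1.0               # hypothetical image
    environment:
      WORKER_URL: http://worker:9000   # "worker" resolves via Docker DNS
    networks: [app-net]

  worker:
    image: myorg/worker:1.0            # hypothetical image
    networks: [app-net]

networks:
  app-net:
    driver: bridge
```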
🌐

HTTP / REST

Most common. Service A calls http://service-b:8001/api/data. Simple, universal, great for synchronous request-response. Use for API calls between services.

⚑

gRPC

Google's high-performance RPC protocol over HTTP/2. Binary protocol (protobuf) β€” much faster and smaller than JSON. Ideal for internal microservices with strict latency needs.

πŸ“¨

Message Queue

Async communication via RabbitMQ or Redis Streams. Service A drops a message; Service B picks it up when ready. Decouples services completely β€” great for AI inference queues.

πŸ”Œ

WebSocket

Persistent bidirectional connections. Service A keeps a socket open to Service B for streaming data β€” like streaming LLM token output back to the frontend in real time.

πŸ—ƒοΈ

Shared Volume

Containers mount the same named volume. One writes files; another reads them. Common for batch ML pipelines where a preprocessor writes data that a model container reads.
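A sketch of that shared-volume pattern in compose terms (image names are hypothetical; the point is the shared named volume, mounted read-only on the consumer side):

```yaml
services:
  preprocessor:
    image: myorg/preprocess:1.0   # hypothetical: writes features into the volume
    volumes:
      - dataset:/data

  model:
    image: myorg/model:1.0        # hypothetical: reads the same files
    volumes:
      - dataset:/data:ro          # read-only mount on the consumer

volumes:
  dataset:
```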

⚑

Redis Pub/Sub

Redis acts as a message broker. Services publish events to channels; subscribers receive them instantly. Very low latency, simple to set up, great for real-time coordination.

Practical DNS Demo
Testing inter-container DNS from inside a container
# Exec into the api-gateway container
$ docker exec -it api-gateway sh
# Ping another service by NAME β€” Docker DNS resolves it!
/app # ping ai-service
PING ai-service (172.20.0.4): 56 data bytes
64 bytes from 172.20.0.4: icmp_seq=0 ttl=64 time=0.087 ms
# Curl an HTTP service by name (search-svc speaks plain HTTP; :50051 is gRPC-only)
/app # curl http://search-svc:8001/health
{"status":"ok"}
# Check what DNS server Docker injected
/app # cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
04 Β· AI Microservice Stack

Building an AI Service Mesh

Modern AI applications aren't a single container. They're composed of multiple specialized services β€” an API gateway, an LLM proxy, a vector database, a cache layer, and the AI model API itself.

AI Application Architecture β€” Full Stack
πŸ‘€ User / App
      β”‚ HTTPS
      β–Ό
β—† NETWORK: public-net
   πŸ”€ Traefik (reverse proxy, :80 / :443) β†’ 🌐 Next.js frontend (:3000) β†’ 🐍 FastAPI gateway (:8000)

β—† NETWORK: ai-net (secured)
   🐍 FastAPI gateway ── HTTP β†’ πŸ”„ LiteLLM proxy (:4000) ── HTTPS (external) β†’ πŸŒ™ Kimi API (Moonshot, api.moonshot.cn)
   🐍 FastAPI gateway ── TCP β†’ 🧠 Qdrant vector DB (:6333)
   🐍 FastAPI gateway ── TCP β†’ ⚑ Redis cache (:6379)
   🐍 FastAPI gateway ── TCP β†’ 🐘 Postgres (:5432)

β—† NETWORK: monitoring-net
   πŸ“Š Prometheus (:9090) β†’ πŸ“ˆ Grafana (:3001) β†’ πŸ“‹ Loki logs (:3100)
05 Β· Real-World Project

OpenClaw + Kimi 2.5 on Docker

Let's build a complete, production-style AI stack using LiteLLM (OpenAI-compatible proxy) connecting to Kimi k2 (Moonshot AI) β€” all wired together through Docker networks. The pattern works with any OpenAI-compatible endpoint.

🧩
What is LiteLLM? LiteLLM is an open-source proxy that exposes a unified OpenAI-compatible API in front of 100+ LLMs β€” including Kimi/Moonshot, Anthropic, Groq, Ollama, and more. Your app calls http://litellm:4000/v1/chat/completions and LiteLLM routes it to the right provider. Think of it as the "OpenClaw" / universal LLM adapter layer.
Your App β†’ LiteLLM Proxy (:4000) β†’ Kimi k2 API (moonshot.cn)
Project Structure
ai-stack/ project structure
ai-stack/
β”œβ”€β”€ docker-compose.yml    # orchestrates everything
β”œβ”€β”€ .env                  # secrets (never commit this!)
β”œβ”€β”€ litellm/
β”‚   └── config.yaml       # LiteLLM model routing config
β”œβ”€β”€ gateway/
β”‚   β”œβ”€β”€ Dockerfile
β”‚   β”œβ”€β”€ main.py           # FastAPI gateway
β”‚   └── requirements.txt
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ Dockerfile
β”‚   └── src/
└── nginx/
    └── nginx.conf
.env β€” Secrets File
.env β€” Never commit to Git!
# Kimi / Moonshot API key β€” get from platform.moonshot.cn
MOONSHOT_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
# LiteLLM master key (any string you choose β€” acts as your proxy's API key)
LITELLM_MASTER_KEY=sk-my-local-master-key-1234
# Postgres credentials
POSTGRES_USER=aiapp
POSTGRES_PASSWORD=supersecret
POSTGRES_DB=aidb
# Redis URL (used by gateway for caching)
REDIS_URL=redis://redis:6379/0
# Full Postgres URL for LiteLLM (note the Docker DNS name "postgres")
DATABASE_URL=postgresql://aiapp:supersecret@postgres:5432/aidb
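A missing variable in .env tends to surface as a confusing crash deep inside a container, so it pays to fail fast at startup. A small sketch of such a check (the variable list mirrors the file above; where you call it is up to you):

```python
import os

# Variables the stack cannot run without (mirrors .env above)
REQUIRED = ["MOONSHOT_API_KEY", "LITELLM_MASTER_KEY", "POSTGRES_PASSWORD"]

def missing_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

# Example: one key present, the rest absent
print(missing_env({"MOONSHOT_API_KEY": "sk-test"}))
# ['LITELLM_MASTER_KEY', 'POSTGRES_PASSWORD']
```

Call `missing_env()` at process start and exit with a clear error message if the list is non-empty.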
litellm/config.yaml β€” Model Router
πŸ“‹ litellm/config.yaml β€” routes "kimi" β†’ Moonshot AI
model_list:
  # Kimi k2 β€” the flagship Moonshot model
  - model_name: kimi-k2
    litellm_params:
      model: moonshot/moonshot-v1-8k
      api_key: os.environ/MOONSHOT_API_KEY
      api_base: https://api.moonshot.cn/v1
  # Kimi long context (128k tokens)
  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY
      api_base: https://api.moonshot.cn/v1

litellm_settings:
  success_callback: []
  cache: true
  cache_params:
    type: redis
    host: redis       # ← Docker DNS resolves this!
    port: 6379

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  # This file is not template-expanded by Compose, so use LiteLLM's
  # os.environ/ syntax and define DATABASE_URL in .env
  database_url: os.environ/DATABASE_URL
docker-compose.yml β€” The Full Stack
πŸ“‹ docker-compose.yml β€” complete AI stack
version: '3.9'   # top-level "version" is obsolete in Compose v2; kept for compatibility

##################################################
# NETWORKS β€” explicit isolation between tiers
##################################################
networks:
  public-net:
    driver: bridge
  ai-net:
    driver: bridge
    internal: false   # false = can reach the external internet (for the Kimi API)
  monitoring-net:
    driver: bridge

##################################################
# VOLUMES β€” persistent data
##################################################
volumes:
  postgres_data:
  redis_data:
  qdrant_data:
  prometheus_data:

##################################################
# SERVICES
##################################################
services:
  # ─── Reverse Proxy ───────────────────────────
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    networks:
      - public-net
    depends_on:
      - gateway

  # ─── FastAPI Gateway ─────────────────────────
  gateway:
    build:
      context: ./gateway
      dockerfile: Dockerfile
    environment:
      LITELLM_URL: http://litellm:4000   # ← service-name DNS!
      LITELLM_API_KEY: ${LITELLM_MASTER_KEY}
      REDIS_URL: ${REDIS_URL}
      DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
    networks:
      - public-net      # reachable from nginx
      - ai-net          # can reach litellm, redis, postgres
      - monitoring-net
    depends_on:
      - litellm
      - redis
      - postgres
    healthcheck:
      # python:3.12-slim ships without curl β€” probe with the stdlib instead
      test: ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/health')\""]
      interval: 30s
      retries: 3

  # ─── LiteLLM Proxy (routes to Kimi) ──────────
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    volumes:
      - ./litellm/config.yaml:/app/config.yaml:ro
    env_file: .env   # passes MOONSHOT_API_KEY etc.
    networks:
      - ai-net       # NOT on public-net β€” hidden from outside!
    depends_on:
      - redis
      - postgres

  # ─── Redis Cache ─────────────────────────────
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    networks:
      - ai-net
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

  # ─── PostgreSQL ──────────────────────────────
  postgres:
    image: postgres:16-alpine
    env_file: .env
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - ai-net

  # ─── Qdrant Vector DB ────────────────────────
  qdrant:
    image: qdrant/qdrant:latest
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - ai-net
gateway/Dockerfile
🐳 gateway/Dockerfile
FROM python:3.12-slim AS base
WORKDIR /app
# Install dependencies layer (cached unless requirements.txt changes)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy app code
COPY . .
EXPOSE 8000
# Non-root user for security
RUN adduser --disabled-password --gecos '' appuser
USER appuser
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
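One caveat with `COPY . .` above: the entire build context is sent to the Docker daemon and ends up in the image. A `gateway/.dockerignore` (a minimal sketch) keeps secrets and junk out:

```
# gateway/.dockerignore β€” excluded from the build context
.env
.env.*
__pycache__/
*.pyc
.venv/
.git/
```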
gateway/main.py β€” FastAPI calling Kimi via LiteLLM
gateway/main.py β€” FastAPI service
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from openai import AsyncOpenAI

# OpenAI SDK pointed at our LiteLLM proxy β€” NOT the real OpenAI!
client = AsyncOpenAI(
    # Docker DNS: "litellm" resolves to the litellm container's IP
    base_url=os.getenv("LITELLM_URL", "http://litellm:4000") + "/v1",
    api_key=os.getenv("LITELLM_API_KEY")
)
app = FastAPI(title="AI Gateway")
class ChatRequest(BaseModel):
    message: str
    model: str = "kimi-k2"  # default to Kimi k2
    stream: bool = False
@app.get("/health")
async def health(): return {"status": "ok"}
@app.post("/chat")
async def chat(req: ChatRequest):
    # This call goes: gateway β†’ litellm container β†’ Kimi API
    response = await client.chat.completions.create(
        model=req.model,
        messages=[{"role": "user", "content": req.message}],
        stream=False   # non-streaming here; /chat/stream handles streaming
    )
    return {"reply": response.choices[0].message.content}
@app.post("/chat/stream")
async def chat_stream(req: ChatRequest):
    # Streaming: tokens flow back as SSE (Server-Sent Events)
    async def generate():
        stream = await client.chat.completions.create(
            model=req.model,
            messages=[{"role": "user", "content": req.message}],
            stream=True
        )
        async for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield f"data: {delta}\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")
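The `data: ...\n\n` framing used by the streaming endpoint is plain text and easy to verify in isolation. A pure-Python sketch of a frame writer plus a deliberately minimal parser (assumption: chunks contain no newlines):

```python
def sse_frames(chunks):
    """Wrap text chunks in the Server-Sent Events wire format used above."""
    for chunk in chunks:
        yield f"data: {chunk}\n\n"

def sse_parse(payload):
    """Reassemble text from a stream of single-line `data:` frames."""
    return "".join(
        line[len("data: "):]
        for line in payload.split("\n")
        if line.startswith("data: ")
    )

wire = "".join(sse_frames(["Docker ", "networking ", "rocks"]))
print(sse_parse(wire))  # Docker networking rocks
```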
Start & Test the Stack
Windows Terminal β€” Run it all
# Build and start everything
C:\ai-stack> docker-compose up -d --build
[+] Running 7/7
βœ“ Container ai-stack-postgres-1 Started
βœ“ Container ai-stack-redis-1 Started
βœ“ Container ai-stack-qdrant-1 Started
βœ“ Container ai-stack-litellm-1 Started
βœ“ Container ai-stack-gateway-1 Started
βœ“ Container ai-stack-nginx-1 Started
# Test the gateway health
C:\ai-stack> curl http://localhost/health
{"status":"ok"}
# Send a chat message to Kimi via the whole chain!
C:\ai-stack> curl -X POST http://localhost/chat \
  -H "Content-Type: application/json" \
  -d "{\"message\": \"What is Docker networking?\", \"model\": \"kimi-k2\"}"
{"reply":"Docker networking enables containers to communicate..."}
# Check LiteLLM proxy is reachable from gateway container (inter-service DNS)
C:\ai-stack> docker exec ai-stack-gateway-1 curl http://litellm:4000/health
{"status":"healthy","litellm_version":"1.x.x"}
# Verify the network isolation β€” redis should NOT be reachable from nginx
C:\ai-stack> docker exec ai-stack-nginx-1 curl http://redis:6379
curl: (6) Could not resolve host: redis  β† βœ“ Isolation works!
πŸ”’
Security win: Redis and Postgres are only on ai-net. Nginx is only on public-net. Even if Nginx was compromised, it cannot reach the database β€” they're on completely different Docker networks with no route between them.
06 Β· Observability

Logs, Metrics & Health Checks

Production containers need visibility. Docker provides built-in logging, and integrates cleanly with Prometheus + Grafana for metrics.

Health Checks

πŸ“‹ Health check patterns
# HTTP health check
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 15s

# Redis ping check
healthcheck:
  test: ["CMD", "redis-cli", "ping"]

# Postgres check
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER"]
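The curl-based check assumes curl exists in the image, but slim Python images ship without it. A stdlib-only probe works anywhere Python does (the `/health` path is this stack's convention, not a Docker requirement):

```python
import urllib.request
import urllib.error

def check_health(url, timeout=2.0):
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False

# Nothing listens on port 1, so this fails fast instead of hanging:
print(check_health("http://127.0.0.1:1/health", timeout=0.5))  # False
```

Wire it into a HEALTHCHECK as a `python -c` one-liner or a tiny `healthcheck.py` copied into the image.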

Log Management

Log Commands
# Follow all service logs
$ docker-compose logs -f
# Follow specific service only
$ docker-compose logs -f gateway
# Last 100 lines with timestamps
$ docker logs --tail 100 -t \
  ai-stack-gateway-1
# Inspect container status
$ docker inspect \
  ai-stack-gateway-1 \
  --format '{{.State.Health.Status}}'
healthy
# Live resource stats
$ docker stats --no-stream
07 Β· Production Patterns

Production-Grade Best Practices

Patterns that separate hobby projects from real deployments.

πŸ—οΈ

Multi-Stage Builds

Use multiple FROM stages to separate build dependencies from the final runtime image. A Go app that builds to 1.4GB can ship as a 12MB final image.

πŸ”’

Non-Root Users

Never run containers as root. Add RUN adduser --disabled-password appuser and USER appuser to your Dockerfile. Limits blast radius if a container is compromised.

πŸ“Œ

Pin Image Tags

Use python:3.12.4-slim not python:latest. Pinned tags make builds reproducible and prevent surprise breakage when base images update.

⚑

Layer Caching Strategy

Order Dockerfile instructions from least-to-most-changing. Copy requirements.txt and install deps before copying your app code β€” so code changes don't invalidate the deps layer.

πŸ“Š

Resource Limits

Always set memory and CPU limits in compose. Without limits, one runaway container can starve all others on the same host.

πŸ”„

Restart Policies

Set restart: unless-stopped on critical services so they survive reboots and crashes automatically without manual intervention.

Resource Limits + Restart Policy
πŸ“‹ Resource limits in docker-compose.yml
services:
  gateway:
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '0.5'      # max 50% of one CPU core
          memory: 512M     # hard cap: 512MB RAM
        reservations:
          cpus: '0.1'      # guaranteed minimum
          memory: 128M
    logging:
      driver: json-file
      options:
        max-size: "10m"    # rotate logs at 10MB
        max-file: "3"      # keep 3 rotated files
Multi-Stage Dockerfile
🐳 Multi-stage Dockerfile for production
# ── Stage 1: builder ── install all dev deps
FROM python:3.12-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# ── Stage 2: runtime ── only copy what we need
FROM python:3.12-slim AS runtime
WORKDIR /app
# Create the non-root user first so copied files can be owned by it
RUN adduser --disabled-password --gecos '' appuser
# Copy installed packages from the builder stage into appuser's home
# (/root is not readable by appuser, so don't leave them there)
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
# Copy only app source
COPY --chown=appuser:appuser . .
USER appuser
ENV PATH=/home/appuser/.local/bin:$PATH
EXPOSE 8000
# slim images ship without curl β€” probe with the Python stdlib instead
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
08 Β· Security

Container Security Hardening

Docker's default settings are not production-secure. Here's the hardening checklist every deployment needs.

🚫

Never expose database ports

Postgres and Redis should have NO ports: section in compose. They're accessible by service name within Docker networks β€” there's never a reason to expose them to the host.

πŸ”

Secrets via environment / files

Never hardcode API keys in Dockerfiles or compose files. Use .env files, Docker secrets, or environment injection from a vault at runtime.

πŸ‘€

Read-only filesystems

Add read_only: true to services that don't need to write. If an attacker gets code execution, they can't modify the container filesystem.
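In compose terms, the read-only pattern is one flag plus a tmpfs for anything that genuinely must be written (the `/tmp` path here is a typical choice, not a requirement):

```yaml
services:
  gateway:
    read_only: true   # root filesystem becomes immutable
    tmpfs:
      - /tmp          # in-memory scratch space, wiped on restart
```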

πŸ”

Scan images for vulnerabilities

Use docker scout cves myimage:tag or Trivy to scan for known CVEs in your base images and dependencies before deploying.

Security scanning & hardening commands
# Scan image for CVEs with Docker Scout (built into Docker Desktop)
$ docker scout cves myapp:latest
# Or use Trivy (open source, very thorough)
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image myapp:latest
# Run with read-only filesystem + drop capabilities
$ docker run --read-only \
  --cap-drop ALL \
  --security-opt no-new-privileges:true \
  myapp:latest
# Check which user a container is running as
$ docker exec mycontainer whoami
appuser  β† good. "root" = bad!
⚠️
Critical: Add .env and any file containing secrets to your .gitignore immediately. The most common Docker security breach is accidentally committing API keys to a public repo where Docker images are built in CI.
Full .gitignore for Docker projects
.gitignore
# Secrets β€” NEVER commit
.env
.env.*
secrets/
*.key
*.pem
# Docker build artifacts
# (note: .dockerignore itself SHOULD be committed β€” don't list it here)
# Python
__pycache__/
*.pyc
.venv/
# Node
node_modules/
Windows 11 WSL2 Tips
πŸͺŸ
Windows 11 + Docker Desktop Tips:

β€’ Store your project files inside WSL2 filesystem (/home/yourname/projects) not Windows filesystem (C:\Users\...) β€” file I/O is 10-20x faster.

β€’ Enable Resource Saver in Docker Desktop settings to free RAM when idle.

β€’ Use wsl --shutdown + wsl to restart WSL2 if Docker acts up.

β€’ In %USERPROFILE%\.wslconfig, set memory=8GB and processors=4 to cap WSL2 resource usage.