Production

Health checks, readiness, Prometheus metrics, and operational notes for running Defend behind load balancers.

Health versus readiness

GET /v1/health - cheap liveness; confirms the process is up and lists provider capabilities.
GET /v1/ready - verifies that heavy subsystems (classifier initialization and the session accumulator) have completed startup. Use this for readiness probes so cold instances do not receive traffic while the model is still loading.
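
As a rough illustration, a deploy script could hold back traffic until the readiness endpoint answers. The sketch below assumes /v1/ready returns HTTP 200 once startup has finished and some other status (or no response) before that; the exact status codes are not stated here. The helper names, retry counts, and the example address are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status for a GET, or 0 if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx answer, e.g. while subsystems are still starting (assumed behavior).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return 0


def wait_until_ready(
    probe: Callable[[], int],
    attempts: int = 30,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no sleep after the final failed attempt
    return False
```

A caller would pass something like `lambda: http_status("http://localhost:8000/v1/ready")` as the probe, with the host and port replaced by the instance being brought up. Keeping the probe injectable makes the retry loop easy to test without a running service.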

Metrics

GET /v1/metrics exposes Prometheus metrics via the FastAPI instrumentator. Scrape it from your observability stack; the path is intentionally versioned under /v1.
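
Before wiring up a scraper, it can help to confirm that the endpoint serves the Prometheus text exposition format. The sketch below reads the `# TYPE` comment lines from a response body and returns each metric family with its type. The function name is hypothetical, and the metric name in the usage note is a placeholder, not one the service is known to emit.

```python
from typing import Dict


def metric_types(exposition: str) -> Dict[str, str]:
    """Map metric family name to type using the '# TYPE <name> <type>' lines."""
    types: Dict[str, str] = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types
```

For a body containing the line `# TYPE example_requests_total counter`, the result includes the entry `example_requests_total` with type `counter`. An empty result from a real scrape suggests the path or content type is wrong.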

Logging

The service uses structured logging on startup and shutdown. Forward container stdout to your log aggregator and correlate with session_id from application logs (never log raw user secrets).
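
For the application side, one way to make log lines easy to correlate is to emit JSON with a session_id field. This is a minimal sketch using only the Python standard library; it is not the service's own logging setup, and the logger name, field names, and `build_logger` helper are assumptions for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # session_id arrives via `extra=`; it is None when not supplied.
            "session_id": getattr(record, "session_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger().info("guard call finished", extra={"session_id": "abc123"})` then produces a single JSON line carrying the session identifier. Log identifiers and outcomes only, not the guarded text itself.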

Capacity and limits

The input guard rejects oversized payloads (413 when text length ≥ 20 000 characters). Size your ingress timeouts to tolerate LLM provider latency on the output path.
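
Callers can avoid a round trip by checking length before sending. The sketch below mirrors the limit stated above and assumes the service counts characters the same way Python's `len` does (Unicode code points), which is not confirmed here. The constant and function names are hypothetical.

```python
MAX_TEXT_CHARS = 20_000  # text at or above this length is rejected with 413


def fits_input_guard(text: str, limit: int = MAX_TEXT_CHARS) -> bool:
    """True when the text is strictly shorter than the limit."""
    return len(text) < limit
```

A string of 19 999 characters passes this check and one of 20 000 does not, matching the "≥ 20 000" rule.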

Secrets

When output guarding uses claude or openai, model output (and optional session input context) is sent to that vendor. Apply your data-handling policy before enabling modules in production.
