~60 min read · updated 2026-05-10

Production

Observability, cost control, retries, guardrails, human-in-the-loop, audit logging — what separates a demo agent from a production one.

A demo agent works on a happy-path scenario in front of an audience. A production agent has to handle the long tail: flaky APIs, prompt injection hiding in tool results, runaway costs, latency budgets, regulatory audit requirements, and humans who need to approve specific actions before they happen.

This module is being expanded.

Coming in the next revision:

  • Observability. LangSmith, LangFuse, Helicone, Phoenix, OpenTelemetry GenAI conventions. Tracing every LLM call and every tool call with inputs/outputs/latency/cost.
  • Cost control. Per-task ceilings, per-user rate limits, caching, routing easy tasks to small models. The runaway-loop bug that costs $5K in 30 minutes — and how to prevent it.
  • Retries and circuit-breakers. API errors, rate limits, malformed JSON, tool timeouts. Per-tool retry policy.
  • Guardrails. Input filters (prompt injection detection), output filters (PII, toxicity). NeMo Guardrails, Lakera Guard, Guardrails AI, Llama Guard.
  • Human-in-the-loop. Required confirmation for irreversible actions (sending email, transactions, deployments). Designed in, not bolted on.
  • Audit logging. Every prompt, every tool call, every result — to immutable storage. Compliance + incident response.
  • Prompt injection from tool results. A web page or document containing “ignore previous instructions” — and the agent obeys. Treat all tool outputs as untrusted input.
  • Versioning. When you change the agent’s prompts or tools, what happens to in-flight tasks?
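A per-task cost ceiling can be as simple as a counter that every LLM call reports into, raising once the task's cumulative spend crosses the limit. A minimal sketch; the class name and the per-1K-token prices are placeholder assumptions — substitute your provider's actual rates:

```python
class BudgetExceededError(Exception):
    pass


class TaskBudget:
    """Tracks cumulative spend for one task and enforces a hard ceiling.

    The token prices below are hypothetical placeholders, not any
    provider's real pricing.
    """

    def __init__(self, ceiling_usd, price_per_1k_input=0.003, price_per_1k_output=0.015):
        self.ceiling_usd = ceiling_usd
        self.price_in = price_per_1k_input
        self.price_out = price_per_1k_output
        self.spent_usd = 0.0

    def record(self, input_tokens, output_tokens):
        """Record one LLM call's usage; raise if the ceiling is now exceeded."""
        cost = (input_tokens / 1000) * self.price_in + (output_tokens / 1000) * self.price_out
        self.spent_usd += cost
        if self.spent_usd > self.ceiling_usd:
            raise BudgetExceededError(
                f"task spent ${self.spent_usd:.4f}, ceiling ${self.ceiling_usd:.2f}"
            )
        return cost
```

A runaway loop then dies at the ceiling instead of 30 minutes later: the exception surfaces on the first call that pushes the task over budget, which is exactly where you want the loop to stop.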
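A per-tool retry policy with exponential backoff plus a simple circuit breaker might look like the sketch below. `ToolGuard`, its thresholds, and the backoff schedule are illustrative choices, not a reference to any particular library:

```python
import random
import time


class CircuitOpenError(Exception):
    pass


class ToolGuard:
    """Per-tool retry policy: exponential backoff with jitter, plus a
    circuit breaker that fails fast after repeated consecutive failures."""

    def __init__(self, max_retries=3, base_delay=0.1, failure_threshold=5, cooldown=30.0):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.consecutive_failures = 0
        self.opened_at = None

    def call(self, tool, *args, **kwargs):
        # Circuit open: fail fast until the cooldown expires.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; failing fast")
            self.opened_at = None  # half-open: let one attempt through

        last_exc = None
        for attempt in range(self.max_retries + 1):
            try:
                result = tool(*args, **kwargs)
                self.consecutive_failures = 0
                return result
            except Exception as exc:
                last_exc = exc
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpenError("failure threshold reached") from exc
                if attempt < self.max_retries:
                    # Backoff with jitter: base * 2^attempt, scaled 1x-2x.
                    time.sleep(self.base_delay * (2 ** attempt) * (1 + random.random()))
        raise last_exc
```

One guard instance per tool keeps the policies independent: a flaky search API can trip its breaker without taking the calculator tool down with it.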
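One way to design required confirmation in rather than bolt it on is to make the tool layer itself refuse irreversible calls until a human has approved that exact call with those exact arguments. `ToolRegistry` and `ApprovalRequired` are made-up names for the sketch:

```python
class ApprovalRequired(Exception):
    """Raised when an irreversible tool call has no human approval yet."""

    def __init__(self, tool_name, kwargs):
        super().__init__(f"{tool_name} needs human approval")
        self.tool_name = tool_name
        self.kwargs = kwargs


class ToolRegistry:
    """Tool dispatcher with a confirmation gate built into the call path."""

    def __init__(self):
        self._tools = {}        # name -> (fn, irreversible flag)
        self._approved = set()  # (name, frozen kwargs) approved by a human

    def register(self, name, fn, irreversible=False):
        self._tools[name] = (fn, irreversible)

    def approve(self, name, **kwargs):
        # A human approves this specific call; any change in args voids it.
        self._approved.add((name, tuple(sorted(kwargs.items()))))

    def call(self, name, **kwargs):
        fn, irreversible = self._tools[name]
        key = (name, tuple(sorted(kwargs.items())))
        if irreversible and key not in self._approved:
            raise ApprovalRequired(name, kwargs)
        return fn(**kwargs)
```

The agent loop catches `ApprovalRequired`, surfaces the pending call to a human, and resumes only after `approve` is recorded — the gate cannot be skipped because there is no other path to the tool.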
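Audit records become much harder to tamper with silently if each one chains the hash of the previous record, so altering any entry invalidates everything after it. The sketch keeps records in memory for illustration; production systems append to immutable storage:

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only log where each record embeds the previous record's hash."""

    def __init__(self):
        self.records = []
        self._prev_hash = "0" * 64  # genesis value

    def append(self, event_type, payload):
        record = {
            "ts": time.time(),
            "type": event_type,     # e.g. "prompt", "tool_call", "tool_result"
            "payload": payload,
            "prev_hash": self._prev_hash,
        }
        # Canonical JSON so the hash is deterministic.
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self._prev_hash = record["hash"]
        self.records.append(record)
        return record["hash"]

    def verify(self):
        """Recompute the chain; False means some record was altered."""
        prev = "0" * 64
        for r in self.records:
            if r["prev_hash"] != prev:
                return False
            body = {k: v for k, v in r.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != r["hash"]:
                return False
            prev = r["hash"]
        return True
```

For incident response this gives you an ordered, verifiable record of every prompt, tool call, and result; for compliance, the chain proves the record was not edited after the fact.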
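Treating tool output as untrusted input can start with fencing it as quoted data and flagging obvious injection phrases before it reaches the model. The marker list and wrapper format below are assumptions for the sketch, and a keyword scan like this catches only the crudest attacks — it is a first layer, not a defense:

```python
# Hypothetical marker list; real injections are far more varied.
INJECTION_MARKERS = [
    "ignore previous instructions",
    "ignore all previous instructions",
    "you are now",
    "system prompt",
]


def wrap_tool_result(tool_name, raw_text):
    """Fence tool output as untrusted data and flag suspicious phrases.

    Returns (fenced_text, flags). The fence tells the model this is quoted
    content, not instructions; flags let the caller log or block the result.
    """
    lowered = raw_text.lower()
    flags = [m for m in INJECTION_MARKERS if m in lowered]
    fenced = (
        f"<tool_result name={tool_name!r} trusted=false>\n"
        f"{raw_text}\n"
        f"</tool_result>"
    )
    return fenced, flags
```

The more important habit is the fencing itself: the agent's system prompt should state that nothing inside a `tool_result` block is ever an instruction, and flagged results should be logged even when they are allowed through.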

Next: Module 11 — Build a project.