Overview
What this track covers, who it's for, and how to use it.
This is a self-paced track on agentic AI — the practice of building LLM applications that operate in loops, use tools, hold state, and produce results over minutes to hours of autonomous work. By the end you’ll have written agents in code, exposed tools via the Model Context Protocol (MCP), implemented memory, evaluated agent quality, and operated an agent in production.
The track is opinionated. It teaches the patterns that ship in real products as of 2026, not every research idea that’s appeared in a paper. Where there are multiple ways to do something, I’ll tell you what most teams pick and why — then mention the alternatives if the dominant choice doesn’t fit your situation.
Who this is for
Engineers who:
- Have called LLM APIs for chat completions and want to graduate to agents.
- Know one of Python or TypeScript well enough to write a small program.
- Have built or maintained a backend service (HTTP, database, queue) — not necessarily ML.
- Are not aiming to train models — only to use them well, in production.
If you’ve never called an LLM API at all, work through the OpenAI Quickstart or Anthropic Quickstart first. Twenty minutes; come back here.
What you’ll learn
After completing the track:
- The agent loop as a primitive, written from scratch (~60 lines of code) — sketched just after this list — and what production runtimes add on top.
- Tool use via OpenAI function calling, Anthropic tool use, and the Model Context Protocol (MCP) — the open standard that has consolidated how LLMs talk to tools.
- Memory patterns — short-term (context window), long-term (vector stores), episodic (chat history), and the trade-offs between them.
- Planning patterns — ReAct, plan-and-execute, reflection, Tree of Thoughts, and when modern reasoning models replace these.
- The framework landscape — LangChain, LangGraph, Pydantic AI, Claude Agent SDK, OpenAI Agents SDK — what each is good at, what to pick when.
- Multi-agent systems — orchestrator-worker, role-based crews, A2A handoffs.
- Specialized agents — code agents (Cursor, Claude Code), browser/computer-use agents, research agents.
- Evaluation — how to write tests for agents, regression suites, LLM-as-judge.
- Production — observability, cost control, retries, guardrails, human-in-the-loop, audit logging.
- A capstone project: building an end-to-end agent that does real work for you.
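To give you a feel for where Module 01 starts, here is a minimal sketch of that loop. It uses the Anthropic Python SDK with a single `run_shell` tool; the tool, the model name, and the absent error handling are all illustrative (a real agent sandboxes anything that touches a shell), so treat it as a shape, not a reference implementation.

```python
# The minimal agent loop: call the model, execute any tools it requests,
# feed the results back, and repeat until it stops asking for tools.
# Illustrative sketch -- run_shell is deliberately naive and unsandboxed.
import subprocess

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "run_shell",
    "description": "Run a shell command and return its combined output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def run_shell(command: str) -> str:
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.stdout + proc.stderr

def agent_loop(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # substitute a current model
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # No tool calls left: the text block is the final answer.
            return next(b.text for b in response.content if b.type == "text")
        # Execute each requested tool and return the results to the model.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_shell(**block.input),
                })
        messages.append({"role": "user", "content": tool_results})

print(agent_loop("How many files are in the current directory?"))
```

That’s the whole trick: a `while` loop around a model call. Everything else in the track — memory, planning, evaluation, guardrails — is scaffolding around this loop.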
The 12-module map
| # | Module | What you build |
|---|---|---|
| 00 | Overview (this page) | — |
| 01 | Foundations | The minimal agent loop, from scratch |
| 02 | Tools and function calling | Calling an LLM with tools, parsing tool calls |
| 03 | MCP — the Model Context Protocol | Your own MCP server, connected to a client |
| 04 | Memory | Short-term + a small vector-backed long-term memory |
| 05 | Planning patterns | ReAct vs Plan-and-Execute, with reasoning models |
| 06 | Frameworks | LangGraph + one model-vendor SDK side by side |
| 07 | Multi-agent | An orchestrator-worker example |
| 08 | Specialized agents | Code, browser, research — patterns and trade-offs |
| 09 | Evaluation | A regression suite + LLM-as-judge eval |
| 10 | Production | Observability, cost control, guardrails, HITL |
| 11 | Build a project | Capstone: an agent that does real work for you |
Each module is self-contained enough to read on its own, but it assumes the concepts from the previous ones. Expect 30-90 minutes per module, depending on whether you do the exercises.
How to use this track
Three patterns work:
- Sequential. Start at 01, walk through 11. Best for first-timers.
- Reference. Use the index sidebar; jump to the module you need. Best if you already build agents and want a specific topic.
- Project-driven. Skip to 11, pick a project, work backwards through the modules whose content you need. Best for the practically minded.
Prerequisites and setup
- Python 3.11+ or Node 20+. Examples will be in both where it matters.
- An LLM API key — Anthropic Claude or OpenAI. ~$10-$20 of credit covers the entire track.
- Git and a code editor. Cursor or Claude Code are useful for the later modules and demonstrate agentic coding directly.
- Optional: Docker for running MCP servers locally. Not required for early modules.
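Before starting Module 01, it’s worth a one-minute smoke test to confirm the key and SDK work. A minimal check, assuming the Anthropic Python SDK (`pip install anthropic`) and an `ANTHROPIC_API_KEY` in your environment — the OpenAI equivalent is just as short:

```python
# Setup smoke test: verifies the SDK is installed and the API key works.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute a current model name
    max_tokens=50,
    messages=[{"role": "user", "content": "Reply with exactly: setup works"}],
)
print(response.content[0].text)
```

If that prints a response, you’re ready to start.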
A note on what’s not here
This track doesn’t cover:
- Model training or fine-tuning. Different discipline. See the data scientist path post.
- Prompt engineering as a craft. Important but covered widely elsewhere. We use prompting in this track but don’t dedicate a module to it.
- Specific vertical applications (legal AI, medical AI, etc.). The patterns transfer; the regulatory and data context doesn’t.
- Frontier-research material — Tree of Thoughts paper variants, novel planning algorithms, RAG benchmarks. Mentioned where relevant; not the focus.
The goal is to make you the engineer who can build, ship, and operate agents — not the researcher who can publish a new method.
Ready? Module 01 is the agent loop. Let’s go.