Overview
What this track covers, who it's for, and how to use it.
This is a self-paced track on agentic AI — the practice of building LLM applications that operate in loops, use tools, hold state, and produce results over minutes to hours of autonomous work. By the end you’ll have written agents in code, exposed tools via the Model Context Protocol (MCP), implemented memory, evaluated agent quality, and operated an agent in production.
The track is opinionated. It teaches the patterns that ship in real products as of 2026, not every research idea that’s appeared in a paper. Where there are multiple ways to do something, I’ll tell you what most teams pick and why — then mention the alternatives if the dominant choice doesn’t fit your situation.
Who this is for
Engineers who:
- Have called LLM APIs for chat completions and want to graduate to agents.
- Know one of Python or TypeScript well enough to write a small program.
- Have built or maintained a backend service (HTTP, database, queue) — not necessarily ML.
- Are not aiming to train models — only to use them well, in production.
If you’ve never called an LLM API at all, work through the OpenAI Quickstart or Anthropic Quickstart first. Twenty minutes; come back here.
What you’ll learn
After completing the track:
- The agent loop as a primitive, written from scratch (~60 lines of code) — sketched just after this list — and what production runtimes add on top.
- Tool use via OpenAI function calling, Anthropic tool use, and the Model Context Protocol (MCP) — the open standard that has consolidated how LLMs talk to tools.
- Memory patterns — short-term (context window), long-term (vector stores), episodic (chat history), and the trade-offs between them.
- Planning patterns — ReAct, plan-and-execute, reflection, Tree of Thoughts, and when modern reasoning models replace these.
- The framework landscape — LangChain, LangGraph, Pydantic AI, Claude Agent SDK, OpenAI Agents SDK — what each is good at, what to pick when.
- Multi-agent systems — orchestrator-worker, role-based crews, A2A handoffs.
- Specialized agents — code agents (Cursor, Claude Code), browser/computer-use agents, research agents.
- Evaluation — how to write tests for agents, regression suites, LLM-as-judge.
- Production — observability, cost control, retries, guardrails, human-in-the-loop, audit logging.
- A capstone project: building an end-to-end agent that does real work for you.
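To give you a feel for where Module 01 starts, here is a minimal sketch of that loop. It uses the Anthropic Python SDK with a single `run_shell` tool; the tool, the model name, and the absent error handling are all illustrative (a real agent sandboxes anything that touches a shell), so treat it as a shape, not a reference implementation.

```python
# The minimal agent loop: call the model, execute any tools it requests,
# feed the results back, and repeat until it stops asking for tools.
# Illustrative sketch -- run_shell is deliberately naive and unsandboxed.
import subprocess

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "run_shell",
    "description": "Run a shell command and return its combined output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def run_shell(command: str) -> str:
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.stdout + proc.stderr

def agent_loop(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # substitute a current model
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # No tool calls left: the text block is the final answer.
            return next(b.text for b in response.content if b.type == "text")
        # Execute each requested tool and return the results to the model.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_shell(**block.input),
                })
        messages.append({"role": "user", "content": tool_results})

print(agent_loop("How many files are in the current directory?"))
```

That’s the whole trick: a `while` loop around a model call. Everything else in the track — memory, planning, evaluation, guardrails — is scaffolding around this loop.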
The 12-module map
| # | Module | What you build |
|---|---|---|
| 00 | Overview (this page) | — |
| 01 | Foundations | The minimal agent loop, from scratch |
| 02 | Tools and function calling | Calling an LLM with tools, parsing tool calls |
| 03 | MCP — the Model Context Protocol | Your own MCP server, connected to a client |
| 04 | Memory | Short-term + a small vector-backed long-term memory |
| 05 | Planning patterns | ReAct vs Plan-and-Execute, with reasoning models |
| 06 | Frameworks | LangGraph + one model-vendor SDK side by side |
| 07 | Multi-agent | An orchestrator-worker example |
| 08 | Specialized agents | Code, browser, research — patterns and trade-offs |
| 09 | Evaluation | A regression suite + LLM-as-judge eval |
| 10 | Production | Observability, cost control, guardrails, HITL |
| 11 | Build a project | Capstone: an agent that does real work for you |
Each module is self-contained enough to read on its own, but it assumes the concepts from the previous ones. Expect 30-90 minutes per module, depending on whether you do the exercises.
How to use this track
Three patterns work:
- Sequential. Start at 01, walk through 11. Best for first-timers.
- Reference. Use the index sidebar; jump to the module you need. Best if you already build agents and want a specific topic.
- Project-driven. Skip to 11, pick a project, work backwards through the modules whose content you need. Best for the practically minded.
Prerequisites and setup
- Python 3.11+ or Node 20+. Examples will be in both where it matters.
- An LLM API key — Anthropic Claude or OpenAI. ~$10-$20 of credit covers the entire track.
- Git and a code editor. Cursor or Claude Code are useful for the later modules and demonstrate agentic coding directly.
- Optional: Docker for running MCP servers locally. Not required for early modules.
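Before starting Module 01, it’s worth a one-minute smoke test to confirm the key and SDK work. A minimal check, assuming the Anthropic Python SDK (`pip install anthropic`) and an `ANTHROPIC_API_KEY` in your environment — the OpenAI equivalent is just as short:

```python
# Setup smoke test: verifies the SDK is installed and the API key works.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute a current model name
    max_tokens=50,
    messages=[{"role": "user", "content": "Reply with exactly: setup works"}],
)
print(response.content[0].text)
```

If that prints a response, you’re ready to start.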
A note on what’s not here
This track doesn’t cover:
- Model training or fine-tuning. Different discipline. See the data scientist path post.
- Prompt engineering as a craft. Important but covered widely elsewhere. We use prompting in this track but don’t dedicate a module to it.
- Specific vertical applications (legal AI, medical AI, etc.). The patterns transfer; the regulatory and data context doesn’t.
- Frontier-research material — Tree of Thoughts paper variants, novel planning algorithms, RAG benchmarks. Mentioned where relevant; not the focus.
The goal is to make you the engineer who can build, ship, and operate agents — not the researcher who can publish a new method.
Ready? Module 01 is the agent loop. Let’s go.