How LLMs Actually Work

Module 2, Lecture 2.1 | Introduction to Agentic Systems

This lecture builds the mental model every agent developer needs: how LLMs process input, generate output, and why certain design decisions matter. It covers neural networks and the transformer architecture at a high level, explains next-token prediction as the single fundamental operation of all LLMs, introduces attention as the mechanism that lets models focus on relevant parts of their input, and frames the context window as the agent developer's primary design space. The core takeaway: LLMs are stateless functions, and the agent engineer's job is to assemble the right context on every call.
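The "stateless function" framing above can be sketched in a few lines. This is a toy illustration, not a real API: `toy_llm` and `run_turn` are hypothetical names, and the model is a deterministic stand-in. What matters is the shape of the loop — the caller re-sends the entire history on every call, because the model itself remembers nothing between calls.

```python
def toy_llm(context: list[dict]) -> str:
    """Stateless stand-in for an LLM call: the output depends only on
    the context passed in, never on any prior call."""
    return f"reply to {len(context)} messages"

def run_turn(history: list[dict], user_msg: str) -> list[dict]:
    """One agent turn: append the user input, send the FULL history to
    the model, append its reply. Returns the new history."""
    history = history + [{"role": "user", "content": user_msg}]
    reply = toy_llm(history)  # entire context reassembled every call
    return history + [{"role": "assistant", "content": reply}]

history: list[dict] = []
history = run_turn(history, "hello")
history = run_turn(history, "and again")

# Statelessness: the same context always produces the same output.
assert toy_llm(history[:1]) == toy_llm(history[:1])
print(len(history))  # 4 messages: two user turns, two replies
```

The design consequence is the lecture's core takeaway: since nothing persists inside the model, everything the agent "knows" on a given call is whatever the developer chose to put into `history`.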

Additional Resources