class: center, middle, inverse
count: false

# How LLMs Are Trained

---

# Why Does Training Matter for Agent Developers?

You know *what* LLMs do — predict the next token using attention.

--

Now: **how did they learn to do that?**

--

More importantly for you as agent developers — how does the training process explain the specific ways LLMs behave?

--

.info[Understanding how LLMs were trained lets you **predict** their behavior and design more effective agents.]

???

Understanding the training pipeline gives a lasting advantage in predicting and managing LLM behavior.

---
class: center, middle, inverse

# Stage 1: Pre-Training
## Read the Internet

---

# Pre-Training

The model reads an enormous amount of text — essentially a significant portion of the publicly available internet — and learns to predict the next token.

--

Billions of iterations across **trillions of tokens**.

--

By the end, it has absorbed the patterns, knowledge, and — critically — the **style** of the internet.

???

The mechanism (next-token prediction) was covered in 02-01. The focus here is on what the training data does to behavior.

---

# How Does Training Data Shape Behavior?

> **The model's behavior is a reflection of its training data.**

Not of intelligence. Not of understanding. Of *patterns in the data it was trained on.*

--

Once you internalize this, a lot of LLM behavior that seems mysterious starts making perfect sense.

???

This is the key insight. Pause here.

---

# Why LLM Prose Is a Little Dramatic

People write differently on the internet than they talk in conversation.

--

Blog posts, articles, social media — online writing tends to be **sensational, emphatic, emotionally charged**. The internet rewards attention-grabbing language.

--

The slightly breathless tone... the overuse of words like *"crucial"* and *"revolutionary"* and *"game-changing"*...

--

.callout[The LLM mirrors its training data. That breathless tone comes from web content written to get clicks.]

???
Students start connecting LLM behaviors they've noticed to training data patterns.

---

# Why LLMs Are So Good at Code

**GitHub exists.**

--

The training data includes millions of public repositories — working code, with context. Not random code, but code good enough that someone put it online.

--

But it goes deeper. GitHub has **issues, pull requests, commit messages, and code reviews**.

--

An issue describes a problem in natural language. A commit provides the code that fixes it.

--

.info[That's essentially millions of examples of: *"here's a human-written prompt, and here's the code that solves it."* Exactly the task we ask LLMs to do.]

???

The issue→commit→PR structure is essentially supervised training data for "turn natural language into code."

---

# Why LLMs Sometimes Over-Engineer

People don't just post code online — they **write about** code. Blog posts, tutorials, conference talks, tweets.

--

When people write about code, they showcase **interesting** solutions. Nobody writes a blog post about using a simple for loop.

--

They write about the clever library. The elegant design pattern. The cutting-edge framework.

--

.warning[The LLM has seen disproportionately many **sophisticated, complex solutions** — and fewer examples of the boring, simple approach that would have been fine. It's reflecting what gets discussed online.]

???

This explains a very common frustration. Students will recognize this pattern immediately from their own LLM interactions.

---
class: center, middle, inverse

# Stage 2: Instruction Tuning
## Learn to Be Helpful

---

# From Text Predictor to Assistant

A raw pre-trained model is impressive but not very useful. If you type a question, it might generate *another* question — because on the internet, questions are often followed by more questions.

--

**Instruction tuning** is the second stage. Humans write thousands of example conversations: an instruction and a helpful, well-structured response. The model is fine-tuned on these.
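
Mechanically, this reuses pre-training's objective on curated data. A minimal sketch, assuming an invented chat template and character-level "tokens" (real systems use model-specific templates and subword tokenizers):

```python
# Illustrative sketch only: the template below is invented, and characters
# stand in for subword tokens to keep the example self-contained.

def format_example(instruction: str, response: str) -> str:
    # A chat template marks who is speaking, so the model learns to
    # respond as an assistant rather than continue the user's text.
    return f"<|user|>{instruction}<|assistant|>{response}"

def to_next_token_pairs(text: str) -> list[tuple[str, str]]:
    # Same objective as pre-training: each prefix predicts what comes next.
    return [(text[:i], text[i]) for i in range(1, len(text))]

pairs = to_next_token_pairs(format_example("Say hi.", "Hello!"))
# Every (prefix, next-character) pair is one supervised training target.
```

The data changes, not the objective: the model is still predicting the next token, now on examples shaped like helpful conversations.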
--

This is where the model learns to:

- Answer questions instead of just continuing text
- Follow formatting instructions
- Stay on topic
- Structure responses with introductions and conclusions

???

The behavioral consequences matter more than the mechanism.

---

# The Side Effects of Being "Helpful"

The instruction-tuning examples tend to be thorough, well-structured, comprehensive responses. The humans writing them were rewarded for being helpful.

--

**Why LLMs give you more than you asked for:** The model learns that a good response is a *long* response. Ask a yes-or-no question, get a paragraph — because in its training, that's what "helpful" looked like.

--

**Why LLMs are so agreeable:** The training examples model cooperative behavior. The model rarely saw examples of pushing back or saying "no."

--

.callout[When you write system prompts, you're adding a layer on top of instruction tuning. You can counteract these tendencies — "be concise," "push back if the approach has problems" — but only if you know they exist.]

???

System prompts let you override instruction-tuning defaults.

---
class: center, middle, inverse

# Stage 3: RLHF
## Learn What Humans Like

---

# Reinforcement Learning from Human Feedback

Humans are shown pairs of model responses and asked: **which one is better?**

--

The model is then trained to produce more responses like the ones humans preferred.

--

This is where the model learns *style* and *judgment*. It already knows how to follow instructions. Now it learns to do so in the way humans find most **satisfying**.

???

Focus on consequences rather than mechanism.

---

# The Side Effects of Human Preferences

**Sycophancy:** Human evaluators prefer responses that agree with them. So the model learns: **agreement is good**. It tells you what you want to hear, even if you're wrong.

--

**Hedging:** Evaluators penalize wrong answers.
So the model learns to hedge — *"it depends," "there are several approaches."* Often appropriate, but it can make agents indecisive.

--

**The confident error:** A well-structured, confident response that's *factually wrong* can rate higher than a hesitant response that's correct. The model learns this too.

--

.warning[When it hallucinates, it does so **with conviction**. Confidence ≠ accuracy.]

???

Sycophancy is critical for agent builders. Understanding the cause helps write system prompts that counteract it.

---
class: center, middle, inverse

# Think Like a Training Data Detective

---

# The Framework

When an LLM does something surprising — good or bad — ask yourself:

> **What was in the training data that would produce this behavior?**

???

Students should leave with this mental framework.

---

# Rapid-Fire Examples

.small[
**LLMs are great at translating between programming languages** — GitHub has tons of similar projects in different languages, and plenty of "port X to Y" discussions

**LLMs struggle with very recent information** — the training data has a cutoff; the model literally hasn't seen it

**LLMs are weirdly good at cover letters and marketing copy** — enormous amounts of this content online, with many examples of "good" versions

**LLMs sometimes produce code with subtle security issues** — most code online doesn't demonstrate security best practices; the vulnerable pattern is more common in training data

**LLMs prefer popular libraries over niche ones** — popular libraries appear far more frequently in the training data
]

???

Move through these quickly. Each one reinforces the framework: behavior comes from training data.
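---

# Aside: "Which One Is Better?", Sketched

How does Stage 3 turn pairwise votes into behavior? A minimal sketch, assuming a toy scoring function (length stands in for a learned reward model; no real RLHF system scores raw length):

```python
import math

def score(response: str) -> float:
    # Hypothetical reward: length. Thorough, confident-sounding answers
    # tend to be longer, which is one way verbosity gets reinforced.
    return float(len(response))

def preference_loss(preferred: str, rejected: str) -> float:
    # Bradley-Terry-style pairwise loss: small when the preferred
    # response scores higher than the rejected one.
    margin = score(preferred) - score(rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Training lowers this loss, pushing the model toward whatever evaluators
# picked: agreeable, confident, well-structured answers included.
loss = preference_loss("A thorough, agreeable, confident answer.", "No.")
```

The loss encodes only the evaluators' choices. If agreeable answers win votes, agreement is what gets optimized.

???

Optional slide. The point is that the evaluators' preferences, not truth, are the training signal.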
---

# Why This Matters for Agents

As an agent developer, this mental model lets you:

--

- **Predict** where an LLM will excel and where it will struggle — before you build

--

- **Design context** that compensates for training biases — give examples of the behavior you want

--

- **Set appropriate autonomy** — more for tasks with strong training data, less for sparse or biased areas

--

- **Debug intelligently** — when your agent misbehaves, reason about *why*, don't just guess

--

.callout[Understanding the training pipeline lets you predict and manage LLM behavior. This is a practical engineering skill you'll use every time you write a system prompt, design a tool, or decide how much to trust your agent's output.]

???

They'll use this framework constantly in agent development.

---

# Coming Up Next

**Lecture 2.3: From Language Models to Agents**

You understand what LLMs do and how they learned to do it.

Next: what was added on top to make tool-using, action-taking agents possible — and what that means for how you'll build.

???

Brief transition. The next lecture closes the loop back to the agent concept from 01-01.