Module 3, Lecture 3.2 | Working with LLMs in Practice
This lecture unpacks the model parameter from your first API call. It covers the three-layer hierarchy of providers, brands, and models; the consistent three-tier pattern across providers (fast/cheap, balanced, frontier); and the four axes of model selection: capability, speed, cost, and context window. It explains frontier model infrastructure, per-token billing mechanics with ballpark cost estimates, and the hardware realities of running models locally. A live demo compares Haiku, Sonnet, and Opus on the same prompt to make the tradeoffs concrete.
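The fast/balanced/frontier pattern can be sketched as a simple lookup that maps a tier to a model ID. This is a minimal illustration, not a production routing scheme; the model IDs and use-case notes below are illustrative placeholders, so check the Claude docs (linked below) for the current names.

```python
# Sketch of the three-tier pattern: fast/cheap, balanced, frontier.
# Model IDs are illustrative placeholders; see the provider docs for current names.
TIERS = {
    "fast":     {"model": "claude-haiku-example",  "use_for": "classification, routing"},
    "balanced": {"model": "claude-sonnet-example", "use_for": "most everyday tasks"},
    "frontier": {"model": "claude-opus-example",   "use_for": "hard reasoning"},
}

def pick_model(tier: str) -> str:
    """Return the model ID for a tier, falling back to the balanced middle tier."""
    return TIERS.get(tier, TIERS["balanced"])["model"]

print(pick_model("fast"))     # fast/cheap tier
print(pick_model("unknown"))  # falls back to the balanced tier
```

Defaulting to the middle tier mirrors common advice: start balanced, then move down for cost or up for capability once you can measure the tradeoff.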
Claude Models Overview — Anthropic Docs — The complete list of available Claude models, their API IDs, context window sizes, and capabilities. The authoritative reference for current model names and versions.
Anthropic Pricing — Per-token costs for each Claude tier (Haiku, Sonnet, Opus) and context window sizes. The first place to look when estimating the cost of an agent task.
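Per-token billing is plain arithmetic: input and output tokens are metered separately, each at a per-million-token rate. A sketch of the estimate, using hypothetical rates rather than current pricing (look those up on the pricing page above):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Dollar cost of one call, given per-million-token rates.

    The rates are caller-supplied placeholders, not actual pricing.
    """
    return (input_tokens / 1e6) * in_rate_per_m + (output_tokens / 1e6) * out_rate_per_m

# e.g. 10k input + 2k output tokens at a hypothetical $3 / $15 per million:
cost = estimate_cost(10_000, 2_000, 3.00, 15.00)
print(f"${cost:.3f}")  # $0.060
```

Note that output tokens typically cost several times more than input tokens, so verbose responses dominate the bill for long-running agent tasks.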
Artificial Analysis LLM Leaderboard — An independent leaderboard comparing 300+ models across quality, speed, price, and context window. Useful for understanding the tradeoff landscape across providers.
Chatbot Arena Leaderboard — Crowdsourced leaderboard based on blind human preference comparisons between models. The closest thing to a real-world quality ranking, updated regularly.
LLM Benchmarks Explained — DataCamp — A guide to the major benchmarks (MMLU, GPQA, HumanEval, GSM8K) and what they actually measure. Essential reading for interpreting model comparisons critically.
The Complete Guide to LLM Quantization — LocalLLM — How reducing weight precision (FP16 to INT8/INT4) shrinks model memory requirements with modest quality tradeoffs. Covers GPTQ, AWQ, and GGUF methods.
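The memory math behind quantization is simple: weight memory is roughly parameter count times bytes per weight, so halving precision halves the footprint. A back-of-the-envelope sketch that ignores KV-cache and activation overhead:

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B-parameter model at each precision the guide covers:
for bits, name in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    print(f"7B at {name}: ~{weight_memory_gb(7, bits):.1f} GB")  # 14.0, 7.0, 3.5 GB
```

This is why INT4 quantization is the usual entry point for local experimentation: it brings a 7B model's weights from ~14 GB down to ~3.5 GB, within reach of consumer GPUs and laptops.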
Ollama — Tool for running open-source LLMs locally. The easiest on-ramp for local model experimentation mentioned in the lecture.