Lecture 5.3 introduced the agent loop while building the coding agent system prompt. This lecture goes deeper — not to introduce new tools or prompting techniques, but to fully internalize the architecture that makes tool-calling agents work. The two-loop structure is the skeleton of every agent you will ever build; understanding it clearly removes a lot of confusion that shows up later when agents behave unexpectedly.
The naive mental model of an agent is a single loop: ask the user for input, call the LLM, print the result, repeat. That works for a chatbot. It breaks for an agent.
The problem is that an agent's work happens at two different timescales. From the user's perspective, one exchange is one turn: they type a request and eventually receive a response. But inside that single turn, the LLM may decide it needs to call three tools before it can answer. Each tool call requires a round-trip to the LLM: call the API, get a tool request, execute the tool, send the result back, call the API again. The user is unaware of this; they're waiting. The agent is running.
A single loop cannot express both concerns cleanly. You'd end up with a mess of flags and conditionals tracking whether you're in the "getting user input" phase or the "handling tool calls" phase. The two-loop design names these concerns and separates them:
The outer loop runs once per user message. It gets input, appends it to the messages array, delegates everything else to the inner loop, and then surfaces the final reply. One iteration of the outer loop is one visible exchange.
The inner loop runs once per API call. It calls the LLM, inspects the response, and branches on stop_reason. If the model requested tools, execute them and loop again. If the model is done, break out to the outer loop. Multiple inner loop iterations are invisible to the user — they're the conversation between the agent code and the model.
In pseudocode:
# Outer loop: one iteration per user message
while True:
    get user input
    append to messages
    run inner loop        ← invisible to user
    print reply

# Inner loop: one iteration per API call
while True:
    call API
    check stop_reason
    if tool_use → execute tools, loop
    if end_turn → surface reply, break
This separation also maps directly onto what researchers call the ReAct pattern (Reasoning + Acting). The model reasons about what to do, acts by requesting a tool, observes the result, and reasons again. The inner loop is the mechanism that makes this cycle possible.
The inner loop has exactly one decision point: response.stop_reason. There are two values that matter:
"tool_use": the model is not done. It has issued one or more structured tool requests. Execute them, package the results, send them back.
"end_turn": the model is done. Extract the text reply, break out of the inner loop.
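Two states. One branch. That is the entire agent. Tool-calling agents built on the Anthropic API, from a 200-line teaching script to a production tool like Claude Code, are organized around this same branch. The API can return other stop_reason values as well, such as "max_tokens" when a reply is cut off; the loop in this lecture treats anything other than "tool_use" as the end of the turn.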
Students sometimes expect agent loops to be elaborate. The insight is that the core is trivially simple. The complexity lives in the tools, the system prompt, and the context management around the loop — not in the loop itself.
The complete implementation for this lecture is in agent.py. The system prompt and tool schemas are carried over unchanged from Lecture 5.3. For now, the tool implementations are stubs — they return strings like "(stub) reading filename" rather than actually touching the filesystem. The stubs are intentional: they let the loop run correctly and be verified before real filesystem operations add any complexity.
def run_agent():
    messages = []
    print("Coding Agent — type your request. Empty line to quit.\n")
    while True:
        user_input = input("You: ").strip()
        if not user_input:
            print("Goodbye.")
            break
        messages.append({"role": "user", "content": user_input})
        # inner loop runs here
The outer loop maintains the messages array across the entire session. Each new user message gets appended before the inner loop runs. When the inner loop returns, the outer loop goes back to input() and waits for the next message.
while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        tools=TOOLS,
        messages=messages
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason == "tool_use":
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = dispatch_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        messages.append({"role": "user", "content": tool_results})
    else:
        for block in response.content:
            if block.type == "text":
                print(f"\nAssistant: {block.text}\n")
        break
Five lines drive everything meaningful here. Each one deserves attention.
messages.append({"role": "assistant", "content": response.content})
response.content is a list of content blocks — it may contain tool_use blocks, text blocks, or both. The entire list goes into the messages array as the assistant turn, not just the text. This is critical: on the next API call, the model needs to see its own tool requests in the history. If you only preserved the text, the model would lose track of what it asked for.
if response.stop_reason == "tool_use"
The only branch. Everything else follows from this.
"tool_use_id": block.id
Each tool result is tagged with the ID of the tool call that requested it. The model may issue multiple tool calls in a single response — it might want to read three files at once. The IDs allow the model to match each result to the request that generated it. Without IDs, a response with three tool results would be ambiguous.
messages.append({"role": "user", "content": tool_results})
Tool results enter the conversation as role: "user". This surprises people. The reason is the API's view of the world: there are only two speakers — the user and the assistant. Tool results come from outside the model (the agent code ran them), so they re-enter from the user side. The model did not produce these results; the external world did.
break
The only exit from the inner loop. The loop runs until stop_reason is not "tool_use". In principle, a model could keep requesting tools indefinitely. Production agents set an iteration limit (typically 25–50 tool calls per user message); if the limit is hit, the agent surfaces an error rather than looping forever. This lecture's implementation omits the limit for clarity; Lecture 6.4 adds instrumentation and discusses failure modes.
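One way such a cap could be added is sketched below. The names `MAX_ITERATIONS`, `IterationLimitError`, and `run_inner_loop` are hypothetical and not part of agent.py, and the two callables stand in for the real API call and tool dispatch. The cap here counts API calls rather than individual tool calls, and the value 25 is only illustrative.

```python
MAX_ITERATIONS = 25  # illustrative cap, not a value taken from agent.py


class IterationLimitError(RuntimeError):
    """Raised when the model keeps requesting tools past the cap."""


def run_inner_loop(call_model, execute_tools, messages, max_iterations=MAX_ITERATIONS):
    """Run the inner loop, giving up after max_iterations API calls.

    call_model(messages)    -> response with .stop_reason and .content
    execute_tools(response) -> list of tool_result dicts
    Both callables are stand-ins for the real API call and tool dispatch.
    """
    for _ in range(max_iterations):
        response = call_model(messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response  # normal exit: the model is done
        messages.append({"role": "user", "content": execute_tools(response)})
    # The loop ended without a final reply: surface an error instead of looping forever.
    raise IterationLimitError(f"no final reply after {max_iterations} API calls")
```

The outer loop can catch `IterationLimitError` and print a message to the user, which keeps the session alive even when one request runs away.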
A common question is why tool errors are returned as strings rather than raised as exceptions. The answer is that the model needs to read the error. If a tool throws an uncaught exception that crashes the agent, the model never learns what went wrong. By returning an error string — "Error: file not found: main.py" — the error enters the conversation as a tool_result, and the model can report it to the user, try an alternative approach, or ask for clarification.
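A minimal sketch of that idea follows, assuming `dispatch_tool` looks tools up in a dictionary. The registry name `TOOL_FUNCTIONS` and the body of `read_file` are illustrative assumptions; the real agent.py may be organized differently.

```python
def read_file(filename):
    """Illustrative tool: return the file's text (may raise OSError)."""
    with open(filename, encoding="utf-8") as f:
        return f.read()


# Hypothetical registry mapping tool names to Python functions.
TOOL_FUNCTIONS = {"read_file": read_file}


def dispatch_tool(name, tool_input):
    """Run a tool and always return a string, even on failure."""
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        return f"Error: unknown tool: {name}"
    try:
        return str(func(**tool_input))
    except FileNotFoundError:
        return f"Error: file not found: {tool_input.get('filename')}"
    except Exception as exc:  # any other failure is reported as text too
        return f"Error: {type(exc).__name__}: {exc}"
```

Because every path returns a string, the inner loop never needs its own try/except around tool execution. The Messages API also accepts an optional `is_error` flag on a tool_result block, which could be set alongside the error text.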
The full code for this lecture is in agent.py. Real tool implementations, with filesystem operations and proper error handling, replace the stubs in Lecture 6.2.
Reading the code is one thing; watching the messages array grow through an actual exchange makes the mechanics concrete.
User: "Create hello.py with a hello world function"
[0] role: user "Create hello.py with a hello world function"
[1] role: assistant [tool_use: edit_file(path="hello.py", old_str="", ...)]
[2] role: user [tool_result: "Created hello.py"]
[3] role: assistant [text: "Done — created hello.py."]
The inner loop ran twice. First iteration: the model requested a tool (entry [1]). Second iteration: the model produced the final reply (entry [3]). The user sees only [0] and [3]. Entries [1] and [2] are the internal conversation between the agent code and the model.
User: "Add a main block to hello.py"
The model needs to read the file before it can edit it:
[0] role: user "Add a main block to hello.py"
[1] role: assistant [tool_use: read_file(filename="hello.py")]
[2] role: user [tool_result: "def hello():\n print('Hello!')"]
[3] role: assistant [tool_use: edit_file(path="hello.py", ...)]
[4] role: user [tool_result: "Edited hello.py"]
[5] role: assistant [text: "Done — added a main block."]
The inner loop ran three times. The model on its third API call can see the file contents from [2] and the edit confirmation from [4]. The user sees only [0] and [5].
This is the read-before-edit pattern from Lecture 5.3, now visible in the messages array. The model isn't following the rule because of willpower — it follows it because the messages array gives it all the information it needs to do the right thing, and the system prompt reinforces the behavior.
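To watch this trace form without an API key, the inner loop can be driven by a scripted stand-in for the model. This is only a sketch: `ScriptedClient`, `inner_loop`, and the canned responses are invented for illustration, and the `SimpleNamespace` objects merely mimic the attributes (`type`, `id`, `name`, `input`, `text`) that the loop reads from real content blocks.

```python
from types import SimpleNamespace as NS


def tool_use(tool_id, name, **tool_input):
    """Build a fake tool_use content block."""
    return NS(type="tool_use", id=tool_id, name=name, input=tool_input)


def text(s):
    """Build a fake text content block."""
    return NS(type="text", text=s)


class ScriptedClient:
    """Stand-in for the API: replays canned responses in order."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def create(self, messages):
        return next(self._responses)


def dispatch_tool(name, tool_input):
    return f"(stub) {name} {tool_input}"


def inner_loop(client, messages):
    """Same branching as the lecture's inner loop; returns the number of API calls."""
    calls = 0
    while True:
        response = client.create(messages)
        calls += 1
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return calls
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": dispatch_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})


script = [
    NS(stop_reason="tool_use", content=[tool_use("tu_001", "read_file", filename="hello.py")]),
    NS(stop_reason="tool_use", content=[tool_use("tu_002", "edit_file", path="hello.py")]),
    NS(stop_reason="end_turn", content=[text("Done. Added a main block.")]),
]
messages = [{"role": "user", "content": "Add a main block to hello.py"}]
calls = inner_loop(ScriptedClient(script), messages)
print(calls, [m["role"] for m in messages])
# 3 ['user', 'assistant', 'user', 'assistant', 'user', 'assistant']
```

Three scripted responses produce three inner-loop iterations and six entries in the messages array, matching entries [0] through [5] in the trace.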

Each API call sends the complete messages array. The model has no persistent state between calls — it reads the full array fresh on every iteration of the inner loop.
The consequence is monotonic growth. Every tool result appended in iteration N is still present in iteration N+5. A read_file that returns 2,000 tokens of file contents adds 2,000 tokens to every subsequent API call in the session. The system prompt and tool schemas are a fixed overhead on every call; the messages array grows on top of that.
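A rough back-of-the-envelope calculation shows how this compounds. The numbers below (a 1,500-token fixed overhead and a 2,000-token tool result per iteration) are made up for illustration, and the sketch ignores the tokens in the user message and in the model's own replies.

```python
FIXED_OVERHEAD = 1_500  # system prompt + tool schemas (illustrative figure)
TOOL_RESULT = 2_000     # tokens added by each tool result (illustrative figure)


def input_tokens_per_call(num_calls, overhead=FIXED_OVERHEAD, per_result=TOOL_RESULT):
    """Input size of each API call when every earlier call added one tool result."""
    return [overhead + i * per_result for i in range(num_calls)]


per_call = input_tokens_per_call(5)
print(per_call)       # [1500, 3500, 5500, 7500, 9500]
print(sum(per_call))  # 27500
```

Under these assumptions the size of each call grows linearly, so the total input sent across the turn grows roughly quadratically with the number of iterations.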
This is the context growth problem introduced in Module 4, now visible in the code that produces it. The inner loop creates the problem. Context management strategies — sliding window, compaction, progressive disclosure — are the solutions. Lecture 6.4 instruments this growth with real token counts. Section 4 of the course addresses it systematically.
A single API response can contain multiple tool_use blocks when the model determines the calls are independent:
[1] role: assistant [tool_use: read_file("utils.py"),
                     tool_use: read_file("tests.py")]
[2] role: user [tool_result id=tu_004: "...",
                tool_result id=tu_005: "..."]
The for block in response.content loop in the agent handles this naturally — it collects every tool_use block in the response, executes each, and packages all results into a single user turn. Both results re-enter as one message.
The tool_use_id field becomes especially important here. The agent may execute tools sequentially or in parallel; the order in which results are added to tool_results may not match the order the model requested them. The IDs allow the model to sort it out regardless of ordering.
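As a sketch of the parallel case, the tool calls in one response could be run on a thread pool and collected in whatever order they finish. `run_tools_in_parallel` and the stub `dispatch_tool` are hypothetical names; the loop shown earlier in this lecture runs tools sequentially.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from types import SimpleNamespace


def dispatch_tool(name, tool_input):
    """Stub tool runner, standing in for the real dispatch."""
    return f"(stub) {name} {tool_input['filename']}"


def run_tools_in_parallel(content_blocks):
    """Execute every tool_use block concurrently; results arrive in completion order."""
    calls = [b for b in content_blocks if b.type == "tool_use"]
    results = []
    with ThreadPoolExecutor() as pool:
        # Map each future back to the ID of the tool call that produced it.
        futures = {pool.submit(dispatch_tool, b.name, b.input): b.id for b in calls}
        for future in as_completed(futures):
            results.append({
                "type": "tool_result",
                "tool_use_id": futures[future],  # ties the result to its request
                "content": future.result(),
            })
    return results


# Fake content blocks mimicking the two read_file requests in the trace.
blocks = [
    SimpleNamespace(type="tool_use", id="tu_004", name="read_file", input={"filename": "utils.py"}),
    SimpleNamespace(type="tool_use", id="tu_005", name="read_file", input={"filename": "tests.py"}),
]
tool_results = run_tools_in_parallel(blocks)
by_id = {r["tool_use_id"]: r["content"] for r in tool_results}
```

The order of `tool_results` may differ from run to run, but `by_id` is always the same, which is the property the IDs exist to provide.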
With three simple tools, parallel calls are uncommon. As agents gain more tools and take on more complex tasks, the model issues parallel calls frequently to minimize round-trips.
stop_reason is the only decision point in the inner loop. "tool_use" means keep going; "end_turn" means break.
Tool results use role: "user". They come from outside the model, so they re-enter from the user side of the conversation.
tool_use_id ties each result to its request. The API requires it on every tool_result, and it is essential when the model issues multiple tool calls in one response.
Lecture 6.2 replaces the stub tools in agent.py with real filesystem implementations and adds directory safety enforcement.