Section 4 Lab | Agent Engineering
Duration: ~1.5 hours
Prerequisites: Lab 4 (Extended Coding Agent) + Module 8 (Lectures 8.1–8.3)
Module 8 covered Retrieval-Augmented Generation as a category — the broad pattern of fetching external information and injecting it into the model's context for the current generation. Lecture 8.2 went deep on the vector-based implementation. Lecture 8.3 made the case that RAG is broader than vectors and that web search, database queries, and file reads are all valid retrieval mechanisms.
This lab gives your coding agent its first external-information capability that is not a file read: a web search tool backed by the Brave Search API. After this lab, your agent will be able to answer questions about current information, recent releases, and topics outside its training data.
The point is not to build something elaborate. The tool itself is small. What matters is the conceptual move: extending the agent with a retrieval mechanism that has nothing to do with vectors.
- agent.py: Your coding agent from Lab 4 with a new web_search tool registered alongside the existing tools
- web_search.py: The function that calls the Brave API and returns results
- reflection.md: A short writeup answering the three questions in Part 4

Brave Search offers a free tier suitable for this lab.
Add the key to your .env file at the project root:
BRAVE_API_KEY=your-key-here
Make sure .env is in your .gitignore — never commit API keys.
Verify the key works with a one-line curl test before integrating it into the agent:
curl -s "https://api.search.brave.com/res/v1/web/search?q=python+packaging" \
-H "X-Subscription-Token: $BRAVE_API_KEY" \
-H "Accept: application/json" | head -100
You should see a JSON response with a web.results array. If you get an authentication error, check the key. If you get a rate-limit error, wait a minute and retry.
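Once the curl check passes, the same key has to be readable from Python. The following is a minimal sketch, assuming python-dotenv may or may not be installed. The function name get_brave_api_key and the choice to treat the placeholder your-key-here as unset are illustrative, not part of the lab spec.

```python
import os


def get_brave_api_key() -> str:
    """Return BRAVE_API_KEY from the environment, or raise a clear error."""
    try:
        # Optional: if python-dotenv is installed, load .env into os.environ.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # fall back to variables already exported in the shell

    key = os.getenv("BRAVE_API_KEY", "").strip()
    # Assumption: the .env placeholder counts as "not configured".
    if not key or key == "your-key-here":
        raise RuntimeError("BRAVE_API_KEY is not set. Add it to .env or export it.")
    return key
```

Failing early with a readable message is one way to make a missing key obvious before the agent ever issues a search.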
Create web_search.py alongside your agent.py. The wrapper handles the HTTP call, error handling, and result formatting.
web_search(query, count=5)
Requirements:
- query (str): the search query
- count (int, default 5): number of results to return; the API allows up to 20
- Read BRAVE_API_KEY from the environment (use os.getenv with python-dotenv)

API details:
- Endpoint: https://api.search.brave.com/res/v1/web/search
- Query parameters: q (the search), count (1-20)
- Headers: X-Subscription-Token: <api key>, Accept: application/json
- Results: response.json()["web"]["results"], each with title, url, and description fields

Output format suggestion:
[1] Python 3.13 Released — What's New
https://docs.python.org/3.13/whatsnew/3.13.html
Python 3.13.0 was released on October 7, 2024, with a free-threaded build,
a new interactive interpreter, and improvements to typing...
[2] Python 3.13 release notes
https://www.python.org/downloads/release/python-3130/
Major new features: PEP 703 (Making the Global Interpreter Lock optional),
PEP 744 (JIT Compilation), improved error messages...
[3] ...
Numbered, with the title on its own line, the URL on the second line, and the snippet wrapped to a reasonable width on subsequent lines. The model handles this format well and can cite specific results.
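The format described above could be produced by a small pure function. This is a sketch under stated assumptions: the name format_results, the snippet_limit and width parameters, and the "(no title)" fallback are hypothetical choices, and the input is assumed to be the list of result dicts with title, url, and description keys.

```python
import textwrap


def format_results(results, query, snippet_limit=300, width=78):
    """Render result dicts as numbered title / URL / wrapped-snippet blocks."""
    if not results:
        return f"No web search results for '{query}'."

    blocks = []
    for i, item in enumerate(results, start=1):
        # Collapse runs of whitespace so wrapping behaves predictably.
        snippet = " ".join(item.get("description", "").split())
        if len(snippet) > snippet_limit:
            snippet = snippet[:snippet_limit].rstrip() + "..."
        lines = [f"[{i}] {item.get('title', '(no title)')}", item.get("url", "")]
        lines.extend(textwrap.wrap(snippet, width=width))
        blocks.append("\n".join(lines))
    return "\n".join(blocks)
```

Keeping formatting separate from the HTTP call means it can be checked with hand-written dicts, without spending any API queries.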
Your tool should handle:
- An empty results array. Return a clear message like No web search results for '<query>'.
- Long snippets. description can be hundreds of characters. Truncate to ~300 characters per snippet to keep token usage reasonable.
- Special characters in the query. Use urllib.parse.quote_plus or requests' params argument so spaces and punctuation are encoded.

Add an if __name__ == "__main__": block so the tool runs standalone:
python3 web_search.py
It should perform one search (e.g., "latest Anthropic Claude release") and print the formatted results. This lets you verify the tool independently before wiring it into the agent.
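The request and error-handling side of the wrapper might look like the sketch below. It uses the standard library's urllib rather than requests purely to stay dependency-free; requests with its params argument would work equally well. The names build_request and fetch_results, the opener parameter (there so the function can be exercised without network access), the 10-second timeout, and the assumption that rate limiting arrives as HTTP 429 are all illustrative.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

BRAVE_URL = "https://api.search.brave.com/res/v1/web/search"


def build_request(query, count, api_key):
    """Build the GET request; urlencode escapes spaces and punctuation."""
    count = max(1, min(int(count), 20))  # keep count inside the API's 1-20 range
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BRAVE_URL}?{params}",
        headers={"X-Subscription-Token": api_key, "Accept": "application/json"},
    )


def fetch_results(query, count=5, opener=urllib.request.urlopen):
    """Return (results, error_message); error_message is None on success."""
    api_key = os.getenv("BRAVE_API_KEY")
    if not api_key:
        return [], "Error: BRAVE_API_KEY is not set."
    try:
        with opener(build_request(query, count, api_key), timeout=10) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as exc:
        if exc.code == 429:  # assumption: rate limiting is signalled with 429
            return [], "Error: Brave rate limit hit. Wait a minute and retry."
        return [], f"Error: Brave API returned HTTP {exc.code}."
    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError) as exc:
        return [], f"Error: web search failed ({exc})."
    # A missing "web" key is treated the same as an empty result list.
    return payload.get("web", {}).get("results", []), None
```

In this arrangement web_search itself would call fetch_results, return the error string if there is one, and otherwise format the results. The standalone block then only needs to call web_search once and print what comes back.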
Define the tool schema for the Anthropic API in agent.py. Add it to your existing TOOLS list:
{
"name": "web_search",
"description": (
"Search the public web for current information using the Brave "
"Search API. Use this when the user asks about recent events, "
"current versions of software, news, or any topic where the "
"answer might not be in your training data. Do not use for "
"questions about the user's local files — use read_file for that."
),
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query. Phrase as you would in a search engine."
},
"count": {
"type": "integer",
"description": "Number of results to return (1-10). Default 5.",
"default": 5
}
},
"required": ["query"]
}
}
The description matters. The model uses it to decide when to call the tool. Two key phrases: "current information" (so the model picks this for time-sensitive questions) and the explicit contrast with read_file (so the model doesn't reach for web search when the user is asking about local code).
Add web_search to your tool dispatch (registry from Lecture 6.3 or the if/elif from earlier lectures, whichever your agent uses):
from web_search import web_search
# In your tool registry or dispatch function:
"web_search": web_search,
Add a brief mention of the new capability to your agent's system prompt:
## Available Tools
...
**web_search(query, count=5)**: Search the public web for current
information. Use this when the user asks about recent events, current
versions, news, or anything that might not be in your training data.
Each result includes a title, URL, and snippet.
If your system prompt has a "How to Work" section, add guidance like:
- For questions about current information (today's news, recent
releases, current best practices), use web_search rather than
answering from training data.
- When you use web search, mention the source URLs in your reply
so the user can verify.
The "mention the source URLs" instruction encourages source attribution, which is one of the practical advantages of RAG over training-data-only answers.
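The dispatch step from earlier in this part can be sketched as a single function. The shape of your registry from Lecture 6.3 may differ, so treat this as one possible arrangement: dispatch_tool, TOOL_REGISTRY, the optional call_log argument, and the fake_web_search stand-in are all hypothetical names introduced for illustration.

```python
def dispatch_tool(registry, name, tool_input, call_log=None):
    """Call the named tool with the model-supplied arguments; always return a string."""
    if call_log is not None:
        # Record every attempted call, useful when writing up the test queries.
        call_log.append((name, dict(tool_input)))
    func = registry.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'."
    try:
        return str(func(**tool_input))
    except Exception as exc:  # report failures to the model instead of crashing the loop
        return f"Error: {name} failed: {exc}"


def fake_web_search(query, count=5):
    """Hypothetical stand-in so this sketch runs without a network call."""
    return f"(pretend results for {query!r}, count={count})"


# In the real agent this entry would point at web_search from web_search.py.
TOOL_REGISTRY = {"web_search": fake_web_search}
```

Passing a list as call_log gives a simple record of which tool the model chose and how it phrased each query, which is the information the test queries in the next part ask you to write down.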
Run the agent with each of the following queries. For each one, observe what the agent does and write it down.
"What is the latest version of Python?" — Time-sensitive. The model's training data may have an out-of-date answer. Does the agent use web_search? Does it cite the source?
"List the files in this directory." — Local. The agent should NOT use web_search here — it should use list_files. This tests whether the tool descriptions correctly steer the model.
"What were the major announcements at the most recent Anthropic event?" — Recent and specific. Should pull from the web. Look at how the agent phrases its query.
"What is the time complexity of binary search?" — General knowledge, well-covered in training data. The agent should answer directly without web search. If it does call web_search, that suggests your tool description is over-broad.
"What's the current weather in Mahwah, New Jersey?" — Tests something Brave may or may not handle well. Snippet results probably show a weather widget summary. Does the agent extract a useful answer from the snippet?
For each query, note:
- Did the agent call web_search?

Write reflection.md answering these three questions. Two to three sentences each is enough.
In which of your test queries did the agent correctly choose between web_search and other tools? In which did it choose poorly? What about your tool descriptions or system prompt could you change to improve those decisions?
Lecture 8.3 framed retrieval as a category — vector search, web search, database queries, file reads, and parametric "retrieval" from the model itself are all RAG. After implementing the web search tool, where do you see the conceptual overlap with the other tools your agent already has (read_file, search_file)? What do they have in common as retrieval mechanisms?
What kind of agent application would benefit most from web search as its primary retrieval mechanism, vs. one that would be better served by vector RAG over a private corpus? Give one concrete example of each.
Submit:
- agent.py: Your extended agent with the web_search tool registered
- web_search.py: The Brave Search wrapper
- reflection.md: Your answers to the three reflection questions
- … web_search and using the results

| Component | Weight | Criteria |
|---|---|---|
| Tool implementation | 30% | web_search works, handles errors and edge cases (no results, rate limits, missing key, special characters), formats results clearly |
| Agent integration | 25% | Tool schema is well-written, dispatch works, system prompt update is clear and steers the model correctly |
| Test coverage | 20% | All five test queries attempted and observations recorded |
| Reflection quality | 25% | Three answers are specific, draw on actual lab observations, and demonstrate understanding of the broader retrieval framing from Lecture 8.3 |
Stay within the free tier. 2,000 queries/month is plenty for this lab, but if you iterate aggressively you can burn through them. The standalone test in 2.3 lets you verify the tool without going through the agent, so you do not spend extra queries on tool calls whose results you can already predict.
One result vs. many. Returning more results gives the model more to work with but uses more tokens. Five is a reasonable default; one or two is enough for many simple questions; ten only if the user is asking for a survey.
The model decides what to search for. Notice how the model rephrases the user's question into a search query — sometimes it makes good choices, sometimes not. This is the same skill humans bring to using search engines.
No vector embeddings here. This lab deliberately avoids the vector-based implementation from Lecture 8.2. The point is to demonstrate that RAG works without vectors. If you want to build a vector RAG system, ChromaDB and LangChain make it straightforward — but it is its own project, not this lab.