Lab 6: Build a Web Search Tool

Section 4 Lab | Agent Engineering
Duration: ~1.5 hours
Prerequisites: Lab 4 (Extended Coding Agent) + Module 8 (Lectures 8.1–8.3)

Overview

Module 8 covered Retrieval-Augmented Generation as a category — the broad pattern of fetching external information and injecting it into the model's context for the current generation. Lecture 8.2 went deep on the vector-based implementation. Lecture 8.3 made the case that RAG is broader than vectors and that web search, database queries, and file reads are all valid retrieval mechanisms.

This lab gives your coding agent its first external-information capability that is not a file read: a web search tool backed by the Brave Search API. After this lab, your agent will be able to answer questions about current information, recent releases, and topics outside its training data.

The point is not to build something elaborate. The tool itself is small. What matters is the conceptual move: extending the agent with a retrieval mechanism that has nothing to do with vectors.

What You'll Produce

Extended agent code (agent.py) — Your coding agent from Lab 4 with a new web_search tool registered alongside the existing tools
Brave Search wrapper (web_search.py) — The function that calls the Brave API and returns results
Reflection (reflection.md) — A short writeup answering the three questions in Part 4

Part 1: Set Up the Brave Search API (~15 minutes)

1.1 Get an API Key

Brave Search offers a free tier suitable for this lab.

Go to https://api.search.brave.com/ and sign up for an account
Subscribe to the Free plan (2,000 queries per month, 1 query per second)
From the dashboard, create an API key. Copy it.

1.2 Store the Key

Add the key to your .env file at the project root:

BRAVE_API_KEY=your-key-here

Make sure .env is in your .gitignore — never commit API keys.
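
For reference, here is a minimal sketch of how the key could be read at runtime. It assumes python-dotenv is installed, but guards the import so the snippet still runs without it. The helper name get_brave_api_key is hypothetical and not something the lab prescribes.

```python
import os

try:
    # python-dotenv is a third-party package (pip install python-dotenv).
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None  # fall back to variables already set in the shell


def get_brave_api_key():
    """Return the Brave API key, or None if it is not configured.

    Hypothetical helper for illustration only.
    """
    if load_dotenv is not None:
        # Loads variables from a .env file if one is found. By default,
        # variables already present in the environment are not overridden.
        load_dotenv()
    key = os.getenv("BRAVE_API_KEY", "").strip()
    return key or None
```

Returning None for a missing or empty key lets the caller turn that case into a readable message instead of sending a request that is bound to fail.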

1.3 Quick Sanity Check

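Verify the key works with a quick curl test before integrating it into the agent. The command reads the key from the BRAVE_API_KEY shell variable, which your shell does not load from .env automatically, so export it in the current session first: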

curl -s "https://api.search.brave.com/res/v1/web/search?q=python+packaging" \
     -H "X-Subscription-Token: $BRAVE_API_KEY" \
     -H "Accept: application/json" | head -100

You should see a JSON response with a web.results array. If you get an authentication error, check the key. If you get a rate-limit error, wait a minute and retry.

Part 2: Build the Web Search Tool (~30 minutes)

2.1 The Wrapper Function

Create web_search.py alongside your agent.py. The wrapper handles the HTTP call, error handling, and result formatting.

web_search(query, count=5)

Requirements:

query (str): the search query
count (int, default 5): number of results to return; the API allows up to 20
Returns a string formatted for the model: each result on its own block with title, URL, and snippet
Reads BRAVE_API_KEY from the environment (load .env with python-dotenv's load_dotenv, then read the key with os.getenv)
Returns a clean error message as a string (rather than raising an exception, so the agent can read it) on:
- Missing API key
- Network errors
- Non-200 HTTP responses
- Rate-limit (429) responses
Times out after 10 seconds
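
One way to meet the network, HTTP-status, rate-limit, and timeout requirements above is to funnel every failure into a returned string. The sketch below uses the standard-library urllib instead of requests, purely to stay dependency-free; that swap is an assumption, not the lab's recommendation. The name fetch_json, its opener parameter, and the exact error wording are all hypothetical.

```python
import json
import urllib.error
import urllib.request


def fetch_json(request, timeout=10, opener=urllib.request.urlopen):
    """Send `request` and return (payload, error_message).

    Exactly one of the two values is None. `opener` is injectable so the
    function can be exercised without touching the network.
    """
    try:
        with opener(request, timeout=timeout) as response:
            return json.load(response), None
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx statuses.
        if exc.code == 429:
            return None, ("Error: Brave Search rate limit hit (HTTP 429). "
                          "Wait before searching again.")
        if exc.code in (401, 403):
            return None, (f"Error: authentication failed (HTTP {exc.code}). "
                          "Check BRAVE_API_KEY.")
        return None, f"Error: Brave Search returned HTTP {exc.code}."
    except OSError as exc:
        # URLError and socket timeouts are OSError subclasses, so this covers
        # DNS failures, refused connections, and the 10-second timeout.
        return None, f"Error: network problem contacting Brave Search: {exc}"
    except json.JSONDecodeError:
        return None, "Error: Brave Search returned a response that was not valid JSON."
```

The HTTPError clause has to come before the OSError clause because HTTPError is itself a subclass of URLError, and therefore of OSError.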

API details:

Endpoint: https://api.search.brave.com/res/v1/web/search
Method: GET
Query string: q (the search), count (1-20)
Headers: X-Subscription-Token: <api key>, Accept: application/json
Response: JSON; results are at response.json()["web"]["results"], each with title, url, and description fields
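
The API details above translate into a request roughly like the following sketch. It builds the request without sending it and pulls the result list out of an already-decoded response. It uses the standard-library urllib as an assumption to keep the example dependency-free (with requests, the same values would go into params and headers). The names build_request and extract_results are hypothetical.

```python
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_request(query, count, api_key):
    """Build (but do not send) the GET request for a web search."""
    # Clamp to the 1-20 range the API accepts.
    count = max(1, min(int(count), 20))
    # urlencode percent-encodes punctuation and turns spaces into '+'.
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BRAVE_ENDPOINT}?{params}",
        headers={
            "X-Subscription-Token": api_key,
            "Accept": "application/json",
        },
        method="GET",
    )


def extract_results(payload):
    """Return the list at payload["web"]["results"], or [] if it is absent."""
    return (payload.get("web") or {}).get("results") or []
```

Tolerating a missing "web" key means an empty response leads to the "no results" path rather than a KeyError.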

Output format suggestion:

[1] Python 3.13 Released — What's New
    https://docs.python.org/3.13/whatsnew/3.13.html
    Python 3.13.0 was released on October 7, 2024, with a free-threaded build,
    a new interactive interpreter, and improvements to typing...

[2] Python 3.13 release notes
    https://www.python.org/downloads/release/python-3130/
    Major new features: PEP 703 (Making the Global Interpreter Lock optional),
    PEP 744 (JIT Compilation), improved error messages...

[3] ...

Each result is numbered, with the title on the first line, the URL on the second, and the snippet wrapped to a reasonable width on the lines that follow. Models generally handle this layout well, and the numbering lets them cite specific results.
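
A formatter for this layout can be quite short. The sketch below assumes each result is a dict with the title, url, and description fields listed in the API details. The function name format_results and the 78-column width are illustrative choices, not requirements.

```python
import textwrap


def format_results(results, width=78):
    """Format a list of result dicts as numbered, indented text blocks."""
    blocks = []
    for index, item in enumerate(results, start=1):
        title = item.get("title", "(no title)")
        url = item.get("url", "")
        snippet = item.get("description", "")
        lines = [f"[{index}] {title}", f"    {url}"]
        # Wrap the snippet and indent every wrapped line by four spaces.
        wrapped = textwrap.fill(
            snippet,
            width=width,
            initial_indent="    ",
            subsequent_indent="    ",
        )
        if wrapped:
            lines.append(wrapped)
        blocks.append("\n".join(lines))
    # Separate results with a blank line.
    return "\n\n".join(blocks)
```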

2.2 Edge Cases

Your tool should handle:

No results. Brave can return an empty results array. Return a clear message like No web search results for '<query>'.
Long snippets. The API's description can be hundreds of characters. Truncate to ~300 characters per snippet to keep token usage reasonable.
Special characters in the query. Use urllib.parse.quote_plus or requests' params argument so spaces and punctuation are encoded.
Rate limiting. On a 429 response, return a string explaining that the rate limit was hit and the agent should wait. Do not silently retry — the agent should know.
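
The snippet-length and no-results cases can each be handled with a small helper, sketched below. The names truncate_snippet and no_results_message are hypothetical. Cutting at a word boundary and appending "..." is one reasonable choice among several.

```python
def truncate_snippet(text, limit=300):
    """Shorten `text` to roughly `limit` characters without splitting a word."""
    # Collapse newlines and repeated spaces so the length check is meaningful.
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Drop the partial word at the cut point, if there is one.
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut + "..."


def no_results_message(query):
    """Message returned to the agent when the results array is empty."""
    return f"No web search results for '{query}'."
```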

2.3 Standalone Test

Add an if __name__ == "__main__": block so the tool runs standalone:

python3 web_search.py

It should perform one search (e.g., "latest Anthropic Claude release") and print the formatted results. This lets you verify the tool independently before wiring it into the agent.
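
A standalone entry point might look like the sketch below. The web_search function here is only a placeholder so the snippet runs on its own; in your file, the real wrapper takes its place. Accepting the query from the command line is an optional convenience, not a lab requirement.

```python
import sys


def web_search(query, count=5):
    """Placeholder standing in for the real wrapper in web_search.py."""
    return f"(formatted results for {query!r}, count={count}, would appear here)"


def main(argv):
    # Use the command-line arguments as the query, or fall back to a fixed one.
    query = " ".join(argv[1:]) or "latest Anthropic Claude release"
    print(web_search(query, count=3))


if __name__ == "__main__":
    main(sys.argv)
```

Because the call sits under the __main__ guard, importing web_search from agent.py does not trigger a search.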

Part 3: Add the Tool to the Agent (~20 minutes)

3.1 Tool Schema

Define the tool schema for the Anthropic API in agent.py. Add it to your existing TOOLS list:

{
    "name": "web_search",
    "description": (
        "Search the public web for current information using the Brave "
        "Search API. Use this when the user asks about recent events, "
        "current versions of software, news, or any topic where the "
        "answer might not be in your training data. Do not use for "
        "questions about the user's local files — use read_file for that."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query. Phrase as you would in a search engine."
            },
            "count": {
                "type": "integer",
                "description": "Number of results to return (1-10). Default 5.",
                "default": 5
            }
        },
        "required": ["query"]
    }
}

The description matters. The model uses it to decide when to call the tool. Two key phrases: "current information" (so the model picks this for time-sensitive questions) and the explicit contrast with read_file (so the model doesn't reach for web search when the user is asking about local code).

3.2 Dispatch

Add web_search to your tool dispatch (registry from Lecture 6.3 or the if/elif from earlier lectures, whichever your agent uses):

from web_search import web_search

# In your tool registry or dispatch function:
"web_search": web_search,

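If your agent uses a registry, the dispatch path could look roughly like the sketch below. TOOL_REGISTRY, run_tool, and make_tool_result are hypothetical names, and your Lab 4 code may be organized differently. The web_search function is a placeholder for the real import. The returned dict follows the tool_result content-block shape of the Anthropic Messages API.

```python
def web_search(query, count=5):
    """Placeholder for the real function imported from web_search.py."""
    return f"(results for {query!r}, count={count})"


# Maps tool names from the schema to the Python callables that implement them.
TOOL_REGISTRY = {
    "web_search": web_search,
}


def run_tool(name, tool_input):
    """Call the named tool with the model-supplied arguments; return a string."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'."
    try:
        return handler(**tool_input)
    except TypeError as exc:
        # Typically a missing required argument or an unexpected one.
        return f"Error: invalid arguments for {name}: {exc}"


def make_tool_result(tool_use_id, name, tool_input):
    """Build the tool_result block sent back in the next user message."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": run_tool(name, tool_input),
    }
```

When the model omits count, the function's own default of 5 applies, since the "default" entry in the schema is only a hint to the model.
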
3.3 System Prompt Update

Add a brief mention of the new capability to your agent's system prompt:

## Available Tools
...
**web_search(query, count=5)**: Search the public web for current
information. Use this when the user asks about recent events, current
versions, news, or anything that might not be in your training data.
Each result includes a title, URL, and snippet.

If your system prompt has a "How to Work" section, add guidance like:

- For questions about current information (today's news, recent
  releases, current best practices), use web_search rather than
  answering from training data.
- When you use web search, mention the source URLs in your reply
  so the user can verify.

The "mention the source URLs" instruction encourages source attribution, which is one of the practical advantages of RAG over training-data-only answers.

Part 4: Test and Reflect (~25 minutes)

4.1 Test Queries

Run the agent with each of the following queries. For each one, observe what the agent does and write it down.

"What is the latest version of Python?" — Time-sensitive. The model's training data may have an out-of-date answer. Does the agent use web_search? Does it cite the source?
"List the files in this directory." — Local. The agent should NOT use web_search here — it should use list_files. This tests whether the tool descriptions correctly steer the model.
"What were the major announcements at the most recent Anthropic event?" — Recent and specific. Should pull from the web. Look at how the agent phrases its query.
"What is the time complexity of binary search?" — General knowledge, well-covered in training data. The agent should answer directly without web search. If it does call web_search, that suggests your tool description is over-broad.
"What's the current weather in Mahwah, New Jersey?" — Tests something Brave may or may not handle well. Snippet results probably show a weather widget summary. Does the agent extract a useful answer from the snippet?

For each query, note:

Did the agent call web_search?
If so, what query did it pass?
Did the answer use the retrieved information?
Were the URLs cited?
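
If you would rather check the last question mechanically than by eye, a small helper can compare the URLs in the tool output with those in the agent's reply. This is an optional sketch. The names extract_urls and cited_urls are hypothetical, and the regular expression is deliberately simple (it stops at parentheses and brackets, so URLs containing them are cut short).

```python
import re

# Matches http(s) URLs up to the next whitespace, angle bracket,
# parenthesis, or square bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Return the distinct URLs found in `text`, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        # Strip punctuation that commonly trails a URL in prose.
        url = match.rstrip(".,;:'\"")
        if url not in seen:
            seen.append(url)
    return seen


def cited_urls(tool_output, reply):
    """Return the URLs from the tool output that also appear in the reply."""
    reply_urls = set(extract_urls(reply))
    return [url for url in extract_urls(tool_output) if url in reply_urls]
```

An empty list from cited_urls for a query where web_search was called suggests the source-attribution guidance in the system prompt is not taking effect.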

4.2 Reflection

Write reflection.md answering these three questions. Two to three sentences each is enough.

In which of your test queries did the agent correctly choose between web_search and other tools? In which did it choose poorly? What about your tool descriptions or system prompt could you change to improve those decisions?
Lecture 8.3 framed retrieval as a category — vector search, web search, database queries, file reads, and parametric "retrieval" from the model itself are all RAG. After implementing the web search tool, where do you see the conceptual overlap with the other tools your agent already has (read_file, search_file)? What do they have in common as retrieval mechanisms?
What kind of agent application would benefit most from web search as its primary retrieval mechanism, vs. one that would be better served by vector RAG over a private corpus? Give one concrete example of each.

Submission

Submit:

agent.py — Your extended agent with the web_search tool registered
web_search.py — The Brave Search wrapper
reflection.md — Your answers to the three reflection questions
A short transcript of one of your test sessions (any format) showing the agent calling web_search and using the results

Grading Rubric

Component	Weight	Criteria
Tool implementation	30%	`web_search` works, handles errors and edge cases (no results, rate limits, missing key, special characters), formats results clearly
Agent integration	25%	Tool schema is well-written, dispatch works, system prompt update is clear and steers the model correctly
Test coverage	20%	All five test queries attempted and observations recorded
Reflection quality	25%	Three answers are specific, draw on actual lab observations, and demonstrate understanding of the broader retrieval framing from Lecture 8.3

Notes and Tips

Stay within the free tier. 2,000 queries/month is plenty for this lab, but aggressive iteration can burn through them. The standalone test in 2.3 lets you verify the tool directly, without spending additional queries on agent runs whose tool calls you can already predict.
One result vs. many. Returning more results gives the model more to work with but uses more tokens. Five is a reasonable default; one or two is enough for many simple questions; ten only if the user is asking for a survey.
The model decides what to search for. Notice how the model rephrases the user's question into a search query — sometimes it makes good choices, sometimes not. This is the same skill humans bring to using search engines.
No vector embeddings here. This lab deliberately avoids the vector-based implementation from Lecture 8.2. The point is to demonstrate that RAG works without vectors. If you want to build a vector RAG system, ChromaDB and LangChain make it straightforward — but it is its own project, not this lab.