Lab 4: Extend Your Coding Agent

Section 3 Lab | Agent Engineering
Duration: ~3 hours
Prerequisites: Labs 1-3 + Modules 5-6 (Lectures 5.1 through 6.3)

Overview

In Module 5 you built the system prompt. In Module 6 you built the agent loop, implemented the tools, added a tool registry, and instrumented token usage. You have a working coding agent with three tools: list_files, read_file, and edit_file.

This lab extends that agent in three directions:

New tools — Replace the naive read_file with token-efficient search_file + read_lines. Add delete_file (with confirmation) and mkdir.
Safety enforcement — Implement an approved-folders map so all file operations are restricted to explicitly allowed directories.
Real workload measurement — Use the agent to build a multi-file Node.js application from a specification, then measure context growth across the full session.

The deliverable is your extended agent code, a context growth analysis, and a comparison of token usage between naive and efficient tool designs.

What You'll Produce

Extended agent code (agent.py) — Your coding agent with the new tools, approved-folders enforcement, and token tracking from Lecture 6.3
Context growth analysis (analysis.md) — Token usage data, growth curves, and cost estimates comparing naive vs. efficient tool design
Generated application — The Node.js/Express project your agent built (in a workspace/ directory)

Part 1: New Tools (~50 minutes)

1.1 Replace `read_file` with `search_file` + `read_lines`

The naive read_file returns entire file contents — every token enters the context and stays there for every subsequent API call. Lecture 4.3 introduced the progressive disclosure pattern. Now implement it.

search_file(filename, pattern)

Search a file for lines matching a pattern. Return matching line numbers and a snippet of context around each match.

Requirements:

filename (str): Path to the file to search
pattern (str): Text or regex pattern to search for
Returns a string with matching lines in the format: line_number: content
Include 1 line of context before and after each match
If no matches found, return "No matches for '{pattern}' in {filename}"
If the file doesn't exist, return an appropriate error
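
One way the requirements above could be met is sketched below. It is a minimal illustration, not the reference solution: the context parameter, the "..." gap marker between non-adjacent snippets, and the fallback to literal matching when the pattern is not a valid regex are all assumptions. Path validation is left out here and would be added once Part 2 is in place.

```python
import os
import re


def search_file(filename: str, pattern: str, context: int = 1) -> str:
    """Return matching lines, plus `context` lines around each, as 'line_number: content'."""
    if not os.path.isfile(filename):
        return f"Error: file not found: {filename}"
    try:
        regex = re.compile(pattern)
    except re.error:
        # Assumption: an invalid regex is treated as literal text instead of failing the call.
        regex = re.compile(re.escape(pattern))

    with open(filename, "r", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()

    hits = [i for i, line in enumerate(lines) if regex.search(line)]
    if not hits:
        return f"No matches for '{pattern}' in {filename}"

    # Collect each match and its neighbours; the set removes overlap between nearby matches.
    wanted = set()
    for i in hits:
        wanted.update(range(max(0, i - context), min(len(lines), i + context + 1)))

    out = []
    previous = None
    for i in sorted(wanted):
        if previous is not None and i > previous + 1:
            out.append("...")  # gap marker between non-adjacent snippets
        out.append(f"{i + 1}: {lines[i]}")  # report 1-based line numbers
        previous = i
    return "\n".join(out)
```

Returning line numbers with every snippet matters because the agent uses them as the start and end arguments of its next read_lines call.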

read_lines(filename, start, end)

Read a specific range of lines from a file.

Requirements:

filename (str): Path to the file
start (int): Starting line number (1-based, inclusive)
end (int): Ending line number (1-based, inclusive)
Returns the specified lines with line numbers prefixed
If the range is out of bounds, clamp to file boundaries and note it in the output
If the file doesn't exist, return an appropriate error
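
The range-reading requirements above might be sketched as follows. The wording of the messages is an assumption, and so is the decision to return an error when the requested range lies entirely outside the file (there is nothing sensible to clamp to in that case).

```python
import os


def read_lines(filename: str, start: int, end: int) -> str:
    """Return lines start..end (1-based, inclusive), each prefixed with its line number."""
    if not os.path.isfile(filename):
        return f"Error: file not found: {filename}"
    with open(filename, "r", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()

    total = len(lines)
    if total == 0:
        return f"{filename} is empty"
    if start > end:
        return f"Error: start ({start}) is greater than end ({end})"

    # Clamp the requested range to the file boundaries.
    lo = max(1, start)
    hi = min(total, end)
    if lo > hi:
        return f"Error: lines {start}-{end} are outside {filename} (1-{total})"

    out = []
    if (lo, hi) != (start, end):
        # Tell the model the range was adjusted so it does not assume missing lines exist.
        out.append(f"Note: requested {start}-{end}, clamped to {lo}-{hi} (file has {total} lines)")
    out.extend(f"{n}: {lines[n - 1]}" for n in range(lo, hi + 1))
    return "\n".join(out)
```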

Update your system prompt to describe the new tools and their workflow:

Remove read_file from the tool descriptions
Add search_file and read_lines descriptions
Add workflow guidance: "To understand a file's contents, first use search_file to find relevant sections, then use read_lines to read the specific lines you need."

Keep read_file in your codebase (but unregistered) — you'll need it for the comparison in Part 3.

1.2 `delete_file` with Confirmation

Implement the confirmation pattern from Lecture 6.1.

delete_file(path, confirmed)

Requirements:

path (str): Path to the file to delete
confirmed (bool, default False): Whether the user has confirmed deletion
If confirmed is False, return a warning message asking for confirmation — do NOT delete the file
If confirmed is True, delete the file and return a success message
The tool must go through validate_path (after Part 2, the approved-folders check with operation "write"), so it can never delete files outside approved directories

Add the tool to your system prompt with a behavioral note: "Always show the user what you're about to delete and wait for confirmation before calling with confirmed=True."
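
A minimal sketch of the two-step confirmation flow is shown below. The validate_path defined here is only a stand-in that restricts paths to the current working directory, as described for Lecture 6.1; its body and all message strings are assumptions, and your own validator should take its place.

```python
import os


def validate_path(path: str) -> str:
    """Stand-in validator (assumption): the path must resolve inside the current directory."""
    root = os.path.realpath(os.getcwd())
    resolved = os.path.realpath(path)
    if resolved != root and not resolved.startswith(root + os.sep):
        raise ValueError(f"Access denied: {path} is outside the working directory")
    return resolved


def delete_file(path: str, confirmed: bool = False) -> str:
    """Delete a file only when the call carries confirmed=True."""
    try:
        resolved = validate_path(path)
    except ValueError as exc:
        return f"Error: {exc}"
    if not os.path.isfile(resolved):
        return f"Error: {path} does not exist or is not a file"
    if not confirmed:
        # First call: describe the action and stop. Nothing is deleted on this branch.
        return (
            f"Confirmation required: about to delete {path}. "
            "Ask the user, then call delete_file again with confirmed=True."
        )
    try:
        os.remove(resolved)
    except OSError as exc:
        return f"Error: could not delete {path}: {exc}"
    return f"Deleted {path}"
```

Validating before the confirmation check means an out-of-bounds path is rejected even on the unconfirmed call, so the agent never asks the user to approve a deletion it could not perform.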

1.3 `mkdir`

mkdir(path)

Requirements:

path (str): Path of the directory to create
Creates the directory (and any parent directories) within the approved folder tree
Returns a success or error message
Must pass validate_path (after Part 2, the approved-folders check with operation "write"), so it can never create directories outside approved areas
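
The mkdir requirements could be sketched along these lines. As with the other tool sketches, validate_path is a stand-in limited to the current working directory, and the message strings are assumptions.

```python
import os


def validate_path(path: str) -> str:
    """Stand-in validator (assumption): the path must resolve inside the current directory."""
    root = os.path.realpath(os.getcwd())
    resolved = os.path.realpath(path)
    if resolved != root and not resolved.startswith(root + os.sep):
        raise ValueError(f"Access denied: {path} is outside the working directory")
    return resolved


def mkdir(path: str) -> str:
    """Create a directory and any missing parents, returning a status message."""
    try:
        resolved = validate_path(path)
    except ValueError as exc:
        return f"Error: {exc}"
    if os.path.isdir(resolved):
        return f"Directory already exists: {path}"
    try:
        # exist_ok only tolerates an existing directory; an existing file still raises.
        os.makedirs(resolved, exist_ok=True)
    except OSError as exc:
        return f"Error: could not create {path}: {exc}"
    return f"Created directory {path}"
```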

Part 2: Approved-Folders Map (~30 minutes)

2.1 The Concept

In Lecture 6.1, validate_path enforced that all paths resolve within the current working directory. This lab extends that to an approved-folders map — a set of directories the agent is allowed to access.

Why a map instead of a single root?

The agent might need to read files in one directory and write to another
Different tools might have different access levels (read-only vs. read-write)
In production, agents are often restricted to specific workspace directories, not the entire project

2.2 Implementation

Create an ApprovedFolders class:

import os


class ApprovedFolders:
    """Manages a set of approved directories for agent file operations."""

    def __init__(self):
        self.folders = {}  # resolved path -> {"read": bool, "write": bool}

    def approve(self, path, read=True, write=True):
        """Add a directory to the approved set."""
        # realpath resolves symlinks and ".." so later comparisons use canonical paths
        resolved = os.path.realpath(path)
        self.folders[resolved] = {"read": read, "write": write}

    def validate(self, path, operation="read"):
        """Validate that a path falls within an approved directory.

        Returns the resolved absolute path.
        Raises ValueError if access is denied.
        """
        resolved = os.path.realpath(path)
        # Check the longest (most specific) approved path first so that a nested
        # entry's permissions take precedence over its parent's.
        for approved in sorted(self.folders, key=len, reverse=True):
            permissions = self.folders[approved]
            # The os.sep suffix keeps "/workspace-old" from matching "/workspace"
            if resolved.startswith(approved + os.sep) or resolved == approved:
                if not permissions.get(operation, False):
                    raise ValueError(
                        f"Access denied: {operation} not permitted in {approved}"
                    )
                return resolved
        raise ValueError(f"Access denied: {path} is outside approved directories")

Replace the validate_path function from Lecture 6.1 with calls to your ApprovedFolders instance. Configure it at agent startup:

folders = ApprovedFolders()
folders.approve("./workspace", read=True, write=True)
folders.approve("./reference", read=True, write=False)  # Read-only

2.3 Testing

Verify your implementation handles these cases:

Reading a file in an approved directory → succeeds
Writing a file in a read-only directory → fails with clear error
Accessing a file outside all approved directories → fails with clear error
Path traversal attempt (../../../etc/passwd) → fails with clear error
Creating a subdirectory inside an approved writable directory → succeeds
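
The cases above can be checked with a short script such as the following sketch. It carries a condensed copy of the class from 2.2 so that it runs on its own; the is_allowed and run_checks helpers and the temporary directory layout are illustrative assumptions. One extra case (a sibling directory whose name merely starts with "workspace") is included to show why the os.sep suffix in the prefix check matters.

```python
import os
import tempfile


class ApprovedFolders:
    """Condensed copy of the class from 2.2 so this snippet runs on its own."""

    def __init__(self):
        self.folders = {}

    def approve(self, path, read=True, write=True):
        self.folders[os.path.realpath(path)] = {"read": read, "write": write}

    def validate(self, path, operation="read"):
        resolved = os.path.realpath(path)
        for approved, permissions in self.folders.items():
            if resolved == approved or resolved.startswith(approved + os.sep):
                if not permissions.get(operation, False):
                    raise ValueError(f"Access denied: {operation} not permitted in {approved}")
                return resolved
        raise ValueError(f"Access denied: {path} is outside approved directories")


def is_allowed(folders, path, operation):
    """Hypothetical helper: True if validate() accepts the path, False if it raises."""
    try:
        folders.validate(path, operation)
        return True
    except ValueError:
        return False


def run_checks():
    """Build a throwaway workspace/reference layout and evaluate each test case."""
    with tempfile.TemporaryDirectory() as base:
        workspace = os.path.join(base, "workspace")
        reference = os.path.join(base, "reference")
        os.makedirs(workspace)
        os.makedirs(reference)

        folders = ApprovedFolders()
        folders.approve(workspace, read=True, write=True)
        folders.approve(reference, read=True, write=False)

        traversal = os.path.join(workspace, "..", "..", "etc", "passwd")
        return {
            "read in approved dir": is_allowed(folders, os.path.join(workspace, "server.js"), "read"),
            "write in read-only dir": is_allowed(folders, os.path.join(reference, "notes.md"), "write"),
            "outside all approved dirs": is_allowed(folders, os.path.join(base, "other", "x.txt"), "read"),
            "path traversal": is_allowed(folders, traversal, "read"),
            "subdirectory in writable dir": is_allowed(folders, os.path.join(workspace, "routes"), "write"),
            "sibling with shared prefix": is_allowed(folders, os.path.join(base, "workspace-old", "x"), "read"),
        }


if __name__ == "__main__":
    for case, allowed in run_checks().items():
        print(f"{case}: {'allowed' if allowed else 'denied'}")
```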

Part 3: The Test Workload (~80 minutes)

3.1 The Task Manager API Specification

Your agent will build the following Node.js/Express application from scratch. Create a workspace/ directory as the approved write directory for this task.

Application: Task Manager REST API

A simple REST API for managing tasks (to-do items) with categories. The application should be split across multiple files following standard Express project conventions.

Required file structure:

workspace/
├── package.json
├── server.js          # Express app setup, middleware, listen
├── routes/
│   ├── tasks.js       # Task CRUD routes
│   └── categories.js  # Category CRUD routes
├── models/
│   ├── task.js        # Task data model (in-memory array)
│   └── category.js    # Category data model (in-memory array)
└── middleware/
    └── logger.js      # Request logging middleware

Task model fields: id (auto-increment), title (string), description (string), categoryId (integer), completed (boolean, default false), createdAt (ISO timestamp)

Category model fields: id (auto-increment), name (string)

Required endpoints:

Method	Path	Description
GET	/api/tasks	List all tasks
GET	/api/tasks/:id	Get a single task
POST	/api/tasks	Create a task
PUT	/api/tasks/:id	Update a task
DELETE	/api/tasks/:id	Delete a task
GET	/api/tasks?category=:id	Filter tasks by category
GET	/api/categories	List all categories
POST	/api/categories	Create a category
DELETE	/api/categories/:id	Delete a category (fail if tasks exist)

Requirements:

Use Express 5 with express.json() middleware
Each model manages its own in-memory array and provides CRUD functions
Routes should validate required fields and return appropriate HTTP status codes (400 for missing fields, 404 for not found)
The logger middleware should print method, path, and status code for every request
Seed the data with 2 categories ("Work", "Personal") and 3 sample tasks on startup

3.2 Baseline Run (Naive Tools)

Before switching to efficient tools, temporarily re-register read_file (the naive whole-file version) and unregister search_file/read_lines.

Give the agent the specification above and let it build the application. Your TokenTracker from Lecture 6.3 should be running.

Record:

Total API calls made
Input tokens per API call (the growth curve)
Total input and output tokens
Peak input token count
Approximate cost at Sonnet pricing

Save this data — you'll compare it against the efficient run.
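
If your tracker stores one record per API call, the metrics listed above can be derived with a small helper like this sketch. The CallUsage record and the summarize function are hypothetical names and do not reflect the TokenTracker interface from Lecture 6.3. Prices are passed in as parameters (US dollars per million tokens) so that no rate is hard-coded; use the currently published pricing for the model you ran.

```python
from dataclasses import dataclass


@dataclass
class CallUsage:
    """Hypothetical per-call record; adapt the field names to your own tracker."""
    input_tokens: int
    output_tokens: int


def summarize(calls, input_price_per_mtok, output_price_per_mtok):
    """Compute the run-level metrics from a list of per-call usage records."""
    total_in = sum(c.input_tokens for c in calls)
    total_out = sum(c.output_tokens for c in calls)
    cost = (
        total_in / 1_000_000 * input_price_per_mtok
        + total_out / 1_000_000 * output_price_per_mtok
    )
    return {
        "api_calls": len(calls),
        "input_tokens_per_call": [c.input_tokens for c in calls],  # the growth curve
        "total_input_tokens": total_in,
        "total_output_tokens": total_out,
        "peak_input_tokens": max((c.input_tokens for c in calls), default=0),
        "estimated_cost_usd": cost,
    }
```

Running the same function over the naive and the efficient run gives two dictionaries whose fields line up directly with the comparison you will make later.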

3.3 Efficient Run

Switch back to search_file + read_lines (unregister read_file). Delete the workspace/ directory and start fresh.

Give the agent the same specification. Record the same metrics.

3.4 Modification Requests

After the agent builds the application (using the efficient tools), make these additional requests in the same session — do not restart the agent. This tests context growth across a multi-turn session.

"Add a PATCH /api/tasks/:id/toggle endpoint that flips the completed status of a task." — Requires reading the tasks route file, understanding the existing pattern, and adding a new route.
"Add input validation to the POST /api/tasks endpoint: title must be non-empty and under 100 characters, categoryId must reference an existing category." — Requires reading both the task routes and the category model.
"The logger middleware should also log the response time in milliseconds." — Requires reading and editing the middleware.

Record token usage for each modification request. Note how context grows as the session continues.

Part 4: Analysis (~30 minutes)

4.1 Context Growth Report

Create analysis.md with the following:

Token usage comparison table:

Metric	Naive (`read_file`)	Efficient (`search_file` + `read_lines`)
Total API calls
Total input tokens
Total output tokens
Peak input tokens
Estimated cost (Sonnet)

Context growth data:

For the efficient run (build + modifications), create a table showing input tokens per API call across the full session. Identify:

Where the biggest jumps occur and why (which tool calls caused them)
The growth rate during the initial build vs. during modifications
Whether any tool calls could have been avoided
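
To locate the jumps and compare growth rates, it may help to work from the differences between consecutive calls. The sketch below assumes you have the input token count of each API call as a plain list in call order; both function names are illustrative.

```python
def biggest_jumps(input_tokens, top_n=3):
    """Return (call_number, increase) pairs for the largest call-to-call increases.

    Call numbers are 1-based, so (3, 1300) means call 3 sent 1300 more input
    tokens than call 2. Whatever entered the context between those two calls
    (usually a tool result) is the cause of the jump.
    """
    deltas = [
        (i + 1, input_tokens[i] - input_tokens[i - 1])
        for i in range(1, len(input_tokens))
    ]
    return sorted(deltas, key=lambda d: d[1], reverse=True)[:top_n]


def growth_rate(input_tokens):
    """Average input-token increase per call over a phase (0.0 if fewer than two calls)."""
    if len(input_tokens) < 2:
        return 0.0
    return (input_tokens[-1] - input_tokens[0]) / (len(input_tokens) - 1)
```

Slicing the per-call list at the point where the first modification request was sent and calling growth_rate on each slice gives the build-phase and modification-phase rates separately.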

Analysis questions (answer each in 2-3 sentences):

What percentage of input tokens were saved by using search_file + read_lines instead of read_file?
During the modification phase, where did most of the context growth come from — the accumulated history, the tool results, or the system prompt?
At what point (if any) did the agent's behavior degrade — forgetting earlier context, repeating work, or producing lower-quality edits?
Based on your token data, estimate the cost of a 30-minute coding session with 15 user messages. Is this sustainable for regular use?
What context management strategy from Module 4 would be most effective for reducing the growth you observed? Why?
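
Two of these questions come down to arithmetic that is easy to get subtly wrong, so a worked sketch may be useful. The first function computes the percentage saved relative to the naive run. The second projects total input tokens for a longer session under the simplifying assumption that each call's input grows by a constant amount, which gives n * base + growth * n * (n - 1) / 2 over n calls. Real sessions will deviate from a straight line, so treat the projection as a rough estimate. Function and parameter names are illustrative.

```python
def percent_saved(naive_input_tokens, efficient_input_tokens):
    """Percentage of input tokens saved by the efficient run, relative to the naive run."""
    if naive_input_tokens <= 0:
        raise ValueError("naive_input_tokens must be positive")
    return (naive_input_tokens - efficient_input_tokens) / naive_input_tokens * 100


def projected_input_tokens(base_tokens, growth_per_call, calls):
    """Total input tokens over `calls` API calls, assuming linear context growth.

    Call k (0-based) is assumed to send base_tokens + k * growth_per_call tokens,
    so the total is an arithmetic series.
    """
    return calls * base_tokens + growth_per_call * calls * (calls - 1) / 2
```

Because every call resends the accumulated history, the total grows roughly with the square of the number of calls, which is why extrapolating from an average cost per call tends to underestimate long sessions.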

Submission

Submit the following files:

agent.py — Your extended agent with all new tools, registry, approved-folders, and token tracking
analysis.md — Your context growth report with data tables and analysis
workspace/ — The generated Node.js application (from the efficient run)
Any supporting scripts you wrote for data collection or visualization

Grading Rubric

Component	Weight	Criteria
New tools implementation	25%	`search_file`, `read_lines`, `delete_file` (with confirmation), `mkdir` all work correctly with proper error handling
Approved-folders enforcement	15%	Path validation works for all operations; traversal attacks fail; read/write permissions enforced
Token tracking and data collection	20%	Complete token data for both runs and modification phase; data is presented clearly
Analysis quality	25%	Comparison is data-driven; growth sources identified; cost estimates reasonable; Module 4 connection made
Code quality	15%	Clean registry pattern; error messages are informative; code is well-organized
