Lab 4: Extend Your Coding Agent

Section 3 Lab | Agent Engineering
Duration: ~3 hours
Prerequisites: Labs 1-3 + Modules 5-6 (Lectures 5.1 through 6.3)


Overview

In Module 5 you built the system prompt. In Module 6 you built the agent loop, implemented the tools, added a tool registry, and instrumented token usage. You have a working coding agent with three tools: list_files, read_file, and edit_file.

This lab extends that agent in three directions:

  1. New tools — Replace the naive read_file with token-efficient search_file + read_lines. Add delete_file (with confirmation) and mkdir.
  2. Safety enforcement — Implement an approved-folders map so all file operations are restricted to explicitly allowed directories.
  3. Real workload measurement — Use the agent to build a multi-file Node.js application from a specification, then measure context growth across the full session.

The deliverable is your extended agent code, a context growth analysis, and a comparison of token usage between naive and efficient tool designs.


What You'll Produce

  1. Extended agent code (agent.py) — Your coding agent with the new tools, approved-folders enforcement, and token tracking from Lecture 6.3
  2. Context growth analysis (analysis.md) — Token usage data, growth curves, and cost estimates comparing naive vs. efficient tool design
  3. Generated application — The Node.js/Express project your agent built (in a workspace/ directory)

Part 1: New Tools (~50 minutes)

1.1 Replace read_file with search_file + read_lines

The naive read_file returns entire file contents — every token enters the context and stays there for every subsequent API call. Lecture 4.3 introduced the progressive disclosure pattern. Now implement it.

search_file(filename, pattern)

Search a file for lines matching a pattern. Return matching line numbers and a snippet of context around each match.
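
As a starting point, the behavior described above could be sketched as follows. The context and max_matches parameters, the regular-expression matching, and the exact output format are assumptions for illustration, not part of the lab's stated signature. Path validation is left out for brevity.

```python
import re


def search_file(filename, pattern, context=1, max_matches=20):
    """Return matching line numbers with a little surrounding context.

    `context` and `max_matches` are assumed parameters, not part of the
    lab's stated signature.
    """
    try:
        regex = re.compile(pattern)
    except re.error as exc:
        return f"Error: invalid pattern: {exc}"
    try:
        with open(filename, encoding="utf-8", errors="replace") as handle:
            lines = handle.read().splitlines()
    except OSError as exc:
        return f"Error: cannot read {filename}: {exc}"

    # Zero-based indices of every line that matches the pattern.
    hits = [i for i, line in enumerate(lines) if regex.search(line)]
    if not hits:
        return f"No matches for {pattern!r} in {filename}"

    blocks = []
    for i in hits[:max_matches]:
        lo, hi = max(0, i - context), min(len(lines), i + context + 1)
        snippet = "\n".join(f"{n + 1}: {lines[n]}" for n in range(lo, hi))
        blocks.append(f"match at line {i + 1}\n{snippet}")
    if len(hits) > max_matches:
        blocks.append(f"... {len(hits) - max_matches} more matches not shown")
    return "\n\n".join(blocks)
```

Capping the number of matches keeps a broad pattern from pulling most of the file into the context, which is the cost this tool is meant to avoid.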

Requirements:

read_lines(filename, start, end)

Read a specific range of lines from a file.
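
A minimal sketch of this tool is shown below. It assumes line numbers are 1-indexed and the range is inclusive. The max_lines cap is a hypothetical safeguard, not something the lab specifies.

```python
def read_lines(filename, start, end, max_lines=200):
    """Return lines start..end (1-indexed, inclusive), prefixed with numbers.

    `max_lines` is an assumed safety cap, not part of the lab's signature.
    """
    if start < 1 or end < start:
        return f"Error: invalid range {start}-{end}; need 1 <= start <= end"
    # Clamp oversized requests so one call cannot return a whole large file.
    end = min(end, start + max_lines - 1)
    try:
        with open(filename, encoding="utf-8", errors="replace") as handle:
            lines = handle.read().splitlines()
    except OSError as exc:
        return f"Error: cannot read {filename}: {exc}"
    if start > len(lines):
        return f"Error: {filename} has only {len(lines)} lines"
    selected = lines[start - 1:end]
    return "\n".join(f"{start + k}: {text}" for k, text in enumerate(selected))
```

Returning line numbers with the text lets the agent pass them straight into a follow-up edit or a narrower read.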

Requirements:

Update your system prompt to describe the new tools and their workflow:

Keep read_file in your codebase (but unregistered) — you'll need it for the comparison in Part 3.

1.2 delete_file with Confirmation

Implement the confirmation pattern from Lecture 6.1.

delete_file(path, confirmed)
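
One possible shape for this tool is sketched below. It assumes the confirmation pattern is a two-step call: without confirmed=True the tool only reports what would be removed, and the deletion happens on a second call. The message wording is illustrative, and approved-folder validation is omitted here.

```python
import os


def delete_file(path, confirmed=False):
    """Delete a file only when the caller passes confirmed=True."""
    if not os.path.exists(path):
        return f"Error: {path} does not exist"
    if os.path.isdir(path):
        return f"Error: {path} is a directory; delete_file only removes files"
    if not confirmed:
        # First call: describe the target and hand control back to the user.
        size = os.path.getsize(path)
        return (
            f"Confirmation required: about to delete {path} ({size} bytes). "
            "Ask the user, then call again with confirmed=True."
        )
    try:
        os.remove(path)
    except OSError as exc:
        return f"Error: could not delete {path}: {exc}"
    return f"Deleted {path}"
```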

Requirements:

Add the tool to your system prompt with a behavioral note: "Always show the user what you're about to delete and wait for confirmation before calling with confirmed=True."

1.3 mkdir

mkdir(path)
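
A minimal sketch, assuming the tool should create missing parent directories and report (rather than fail) when the directory already exists. Both behaviors are assumptions, since the lab text does not state them.

```python
import os


def mkdir(path):
    """Create a directory, including any missing parents."""
    if os.path.isdir(path):
        return f"Directory already exists: {path}"
    if os.path.exists(path):
        return f"Error: {path} exists and is not a directory"
    try:
        os.makedirs(path)
    except OSError as exc:
        return f"Error: could not create {path}: {exc}"
    return f"Created directory {path}"
```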

Requirements:


Part 2: Approved-Folders Map (~30 minutes)

2.1 The Concept

In Lecture 6.1, validate_path enforced that all paths resolve within the current working directory. This lab extends that to an approved-folders map — a set of directories the agent is allowed to access.

Why a map instead of a single root?

2.2 Implementation

Create an ApprovedFolders class:

import os


class ApprovedFolders:
    """Manages a set of approved directories for agent file operations."""

    def __init__(self):
        self.folders = {}  # path -> {"read": bool, "write": bool}

    def approve(self, path, read=True, write=True):
        """Add a directory to the approved set."""
        resolved = os.path.realpath(path)
        self.folders[resolved] = {"read": read, "write": write}

    def validate(self, path, operation="read"):
        """Validate that a path falls within an approved directory.
        
        Returns the resolved absolute path.
        Raises ValueError if access is denied.
        """
        resolved = os.path.realpath(path)
        for approved, permissions in self.folders.items():
            if resolved.startswith(approved + os.sep) or resolved == approved:
                if not permissions.get(operation, False):
                    raise ValueError(
                        f"Access denied: {operation} not permitted in {approved}"
                    )
                return resolved
        raise ValueError(f"Access denied: {path} is outside approved directories")

Replace the validate_path function from Lecture 6.1 with calls to your ApprovedFolders instance. Configure it at agent startup:

folders = ApprovedFolders()
folders.approve("./workspace", read=True, write=True)
folders.approve("./reference", read=True, write=False)  # Read-only

2.3 Testing

Verify your implementation handles these cases:
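
Checks of this kind can be scripted. The sketch below repeats a compact copy of the ApprovedFolders class so that it runs on its own. The specific cases are assumptions drawn from the grading rubric (traversal attempts, read/write permissions), plus one lookalike directory that shares a name prefix with an approved folder.

```python
import os
import tempfile


class ApprovedFolders:
    """Compact copy of the class from section 2.2, repeated so this runs alone."""

    def __init__(self):
        self.folders = {}

    def approve(self, path, read=True, write=True):
        self.folders[os.path.realpath(path)] = {"read": read, "write": write}

    def validate(self, path, operation="read"):
        resolved = os.path.realpath(path)
        for approved, permissions in self.folders.items():
            if resolved == approved or resolved.startswith(approved + os.sep):
                if not permissions.get(operation, False):
                    raise ValueError(f"Access denied: {operation} not permitted in {approved}")
                return resolved
        raise ValueError(f"Access denied: {path} is outside approved directories")


def is_denied(folders, path, operation):
    """Return True when validate() rejects the path for this operation."""
    try:
        folders.validate(path, operation)
    except ValueError:
        return True
    return False


def run_checks():
    """Exercise a few cases and return {description: passed}."""
    with tempfile.TemporaryDirectory() as root:
        workspace = os.path.join(root, "workspace")
        reference = os.path.join(root, "reference")
        lookalike = os.path.join(root, "workspace_backup")  # shares a name prefix
        for directory in (workspace, reference, lookalike):
            os.mkdir(directory)

        folders = ApprovedFolders()
        folders.approve(workspace, read=True, write=True)
        folders.approve(reference, read=True, write=False)

        return {
            "write inside workspace allowed": not is_denied(
                folders, os.path.join(workspace, "server.js"), "write"),
            "read inside reference allowed": not is_denied(
                folders, os.path.join(reference, "notes.md"), "read"),
            "write inside reference denied": is_denied(
                folders, os.path.join(reference, "notes.md"), "write"),
            "parent traversal denied": is_denied(
                folders, os.path.join(workspace, "..", "secret.txt"), "read"),
            "prefix lookalike denied": is_denied(
                folders, os.path.join(lookalike, "x.txt"), "read"),
        }


if __name__ == "__main__":
    for description, passed in run_checks().items():
        print(("PASS" if passed else "FAIL"), description)
```

The lookalike case is why validate compares against approved + os.sep: a bare prefix test would accept workspace_backup as if it were inside workspace.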


Part 3: The Test Workload (~80 minutes)

3.1 The Task Manager API Specification

Your agent will build the following Node.js/Express application from scratch. Create a workspace/ directory as the approved write directory for this task.

Application: Task Manager REST API

A simple REST API for managing tasks (to-do items) with categories. The application should be split across multiple files following standard Express project conventions.

Required file structure:

workspace/
├── package.json
├── server.js          # Express app setup, middleware, listen
├── routes/
│   ├── tasks.js       # Task CRUD routes
│   └── categories.js  # Category CRUD routes
├── models/
│   ├── task.js        # Task data model (in-memory array)
│   └── category.js    # Category data model (in-memory array)
└── middleware/
    └── logger.js      # Request logging middleware

Task model fields: id (auto-increment), title (string), description (string), categoryId (integer), completed (boolean, default false), createdAt (ISO timestamp)

Category model fields: id (auto-increment), name (string)

Required endpoints:

Method  Path                      Description
GET     /api/tasks                List all tasks
GET     /api/tasks/:id            Get a single task
POST    /api/tasks                Create a task
PUT     /api/tasks/:id            Update a task
DELETE  /api/tasks/:id            Delete a task
GET     /api/tasks?category=:id   Filter tasks by category
GET     /api/categories           List all categories
POST    /api/categories           Create a category
DELETE  /api/categories/:id       Delete a category (fail if tasks exist)

Requirements:

3.2 Baseline Run (Naive Tools)

Before switching to efficient tools, temporarily re-register read_file (the naive whole-file version) and unregister search_file/read_lines.
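
If your registry exposes register and unregister operations, the swap can be wrapped in one helper so that both runs are configured the same way. The ToolRegistry class and the configure_read_tools function below are hypothetical stand-ins. Your registry from Module 6 may have a different interface.

```python
class ToolRegistry:
    """Hypothetical stand-in for the registry built in Module 6."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func):
        self._tools[name] = func

    def unregister(self, name):
        # Ignore names that are not currently registered.
        self._tools.pop(name, None)

    def names(self):
        return sorted(self._tools)


def configure_read_tools(registry, mode, read_file, search_file, read_lines):
    """Switch the registry between the naive and efficient read tools."""
    if mode == "naive":
        registry.unregister("search_file")
        registry.unregister("read_lines")
        registry.register("read_file", read_file)
    elif mode == "efficient":
        registry.unregister("read_file")
        registry.register("search_file", search_file)
        registry.register("read_lines", read_lines)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

Whichever mechanism you use, the tool descriptions in the system prompt should match the tools that are registered for each run, or the comparison will be skewed by failed tool calls.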

Give the agent the specification above and let it build the application. Your TokenTracker from Lecture 6.3 should be running.

Record:

Save this data — you'll compare it against the efficient run.
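
If your tracker does not already keep per-call records, a small log like the one below is enough for the comparison. UsageLog is a hypothetical helper, not the TokenTracker from Lecture 6.3. Its summary fields mirror the rows of the comparison table in Part 4, and the phase label is an assumed convenience for separating the build from later requests.

```python
from dataclasses import dataclass, field


@dataclass
class UsageLog:
    """Hypothetical per-call log; not the TokenTracker from Lecture 6.3."""

    calls: list = field(default_factory=list)

    def record(self, input_tokens, output_tokens, phase="build"):
        """Store the token counts reported for one API call."""
        self.calls.append(
            {"phase": phase, "input": input_tokens, "output": output_tokens}
        )

    def summary(self):
        """Totals and peak for the whole session."""
        inputs = [c["input"] for c in self.calls]
        return {
            "api_calls": len(self.calls),
            "total_input": sum(inputs),
            "total_output": sum(c["output"] for c in self.calls),
            "peak_input": max(inputs, default=0),
        }

    def growth(self):
        """Change in input tokens between consecutive calls."""
        inputs = [c["input"] for c in self.calls]
        return [later - earlier for earlier, later in zip(inputs, inputs[1:])]
```

Call record once per API response with the token counts that response reports, then save summary() and the raw calls list at the end of each run.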

3.3 Efficient Run

Switch back to search_file + read_lines (unregister read_file). Delete the workspace/ directory and start fresh.

Give the agent the same specification. Record the same metrics.

3.4 Modification Requests

After the agent builds the application (using the efficient tools), make these additional requests in the same session — do not restart the agent. This tests context growth across a multi-turn session.

  1. "Add a PATCH /api/tasks/:id/toggle endpoint that flips the completed status of a task." — Requires reading the tasks route file, understanding the existing pattern, and adding a new route.

  2. "Add input validation to the POST /api/tasks endpoint: title must be non-empty and under 100 characters, categoryId must reference an existing category." — Requires reading both the task routes and the category model.

  3. "The logger middleware should also log the response time in milliseconds." — Requires reading and editing the middleware.

Record token usage for each modification request. Note how context grows as the session continues.


Part 4: Analysis (~30 minutes)

4.1 Context Growth Report

Create analysis.md with the following:

Token usage comparison table:

Metric                   Naive (read_file)   Efficient (search_file + read_lines)
Total API calls
Total input tokens
Total output tokens
Peak input tokens
Estimated cost (Sonnet)
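
The estimated cost row is a weighted sum of the two token totals. The sketch below takes the per-million-token rates as parameters. The rates and token counts in the usage line are placeholders for illustration only, so substitute the currently published prices for the model you used.

```python
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return the cost in dollars, given rates in dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Placeholder figures: 1.2M input tokens, 40k output tokens,
    # and assumed rates of $3 (input) and $15 (output) per million tokens.
    print(f"${estimate_cost(1_200_000, 40_000, input_rate=3.0, output_rate=15.0):.2f}")
```

Input tokens usually dominate in an agent session because the full history is resent on every call, so the input rate matters more than the headline output rate.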

Context growth data:

For the efficient run (build + modifications), create a table showing input tokens per API call across the full session. Identify:

Analysis questions (answer each in 2-3 sentences):

  1. What percentage of input tokens were saved by using search_file + read_lines instead of read_file?
  2. During the modification phase, where did most of the context growth come from — the accumulated history, the tool results, or the system prompt?
  3. At what point (if any) did the agent's behavior degrade — forgetting earlier context, repeating work, or producing lower-quality edits?
  4. Based on your token data, estimate the cost of a 30-minute coding session with 15 user messages. Is this sustainable for regular use?
  5. What context management strategy from Module 4 would be most effective for reducing the growth you observed? Why?
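
For questions 1 and 4, the arithmetic can be kept in two small helpers. Both function names are hypothetical. The projection scales the observed cost linearly with the number of user messages, which likely underestimates the real figure when context keeps growing during the session.

```python
def percent_saved(naive_input, efficient_input):
    """Percentage reduction in input tokens relative to the naive run."""
    if naive_input <= 0:
        raise ValueError("naive_input must be positive")
    return 100.0 * (naive_input - efficient_input) / naive_input


def project_session_cost(cost_so_far, messages_so_far, target_messages=15):
    """Scale the observed cost linearly to a session with more user messages."""
    if messages_so_far <= 0:
        raise ValueError("messages_so_far must be positive")
    return cost_so_far / messages_so_far * target_messages


if __name__ == "__main__":
    # Placeholder numbers, not measured results.
    print(percent_saved(400_000, 100_000))  # 75.0
    print(round(project_session_cost(1.20, 4), 2))
```

If your per-call data shows input tokens rising steadily, state that in your answer and treat the linear projection as a floor.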

Submission

Submit the following files:


Grading Rubric

Component                           Weight  Criteria
New tools implementation            25%     search_file, read_lines, delete_file (with confirmation), mkdir all work correctly with proper error handling
Approved-folders enforcement        15%     Path validation works for all operations; traversal attacks fail; read/write permissions enforced
Token tracking and data collection  20%     Complete token data for both runs and modification phase; data is presented clearly
Analysis quality                    25%     Comparison is data-driven; growth sources identified; cost estimates reasonable; Module 4 connection made
Code quality                        15%     Clean registry pattern; error messages are informative; code is well-organized