Working With Coding Agents

A coding agent is not a chatbot, an autocomplete, or a replacement for craft. It is a tool that can act inside your project, and it is useful only if you know how to direct and supervise it.

TL;DR

Coding agents are loops that read, act, observe, and try again. To use them well, you need to understand what they can do, how to give them direction, and when to interrupt them before they drift.

A lot of people still approach coding agents with the wrong mental model.

They treat them like chatbots. Or smarter autocomplete. Or a magic junior developer that can be left alone with a vague request and a repository. None of those models survive contact with the tool.

A coding agent is something more specific: a system that can read your project, decide what to do next, act on files or tools, observe the result, and keep looping. Claude Code, Codex, Cursor agent mode, Copilot agent mode: the details differ, but the shape is the same.

It does not just answer. It works.

That makes it powerful. It also makes it easy to misuse.

Reads: your request, rules, and files.

Before acting, the agent builds a picture of the task and the project around it.

Acts: it changes real things.

Files, commands, browser sessions, tickets, and external tools can all become part of the loop.

Narrates: the trace is visible.

Its running explanation is how you follow the work before the final diff exists.

Not a chatbot. An agent.

A chatbot replies to a message.

A coding agent takes a goal and tries to move the project toward it. It can inspect files, run commands, edit code, read errors, search for where something is defined, and try again after a failure. It can also narrate what it is doing while it works.

That narrative matters. It is not decoration. It is the surface you supervise.

When an agent says it is reading a test file, then changing a component, then running a build, it is giving you the trace of its current understanding. You do not have to read every line of code immediately, but you do need to follow where it thinks it is going.

If the narrative slows down, repeats itself, or starts circling the same failed idea, that is signal. The agent may still be producing text and running commands, but it has stopped making useful progress.

That is when you step in.

coding-agent
> Reading ticket JIRA-1234...
✓ Loaded: "Add user export to CSV"
> Looking for the User model in the codebase...
✓ Found in src/models/user.ts
> Adding exportToCsv() to the model...
✓ Method added. Running tests to verify...
> 3 tests pass. Now checking the API route...
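
A narrative that circles the same failed idea can be spotted mechanically, at least roughly. The sketch below is a heuristic, not how any particular agent reports progress. It assumes the trace is available as a list of narration lines and flags a run when one line keeps recurring in the last few steps. The name `looks_stalled`, the window size, the repeat threshold, and the "circling" trace are all made up for illustration.

```python
from collections import Counter


def looks_stalled(trace: list[str], window: int = 6, max_repeats: int = 3) -> bool:
    """Return True if one narration line appears max_repeats or more
    times within the last `window` lines of the trace."""
    recent = [line.strip().lower() for line in trace[-window:]]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= max_repeats


# A trace that keeps moving: every step is different.
healthy = [
    "Reading ticket JIRA-1234...",
    "Looking for the User model in the codebase...",
    "Adding exportToCsv() to the model...",
    "Running tests to verify...",
]

# A hypothetical trace that circles: same test run, same retry.
circling = [
    "Running tests to verify...",
    "Test failed. Retrying the same fix...",
    "Running tests to verify...",
    "Test failed. Retrying the same fix...",
    "Running tests to verify...",
]

print(looks_stalled(healthy))   # False
print(looks_stalled(circling))  # True
```

A check like this only catches literal repetition. An agent can rephrase the same dead end ten different ways, so reading the narrative yourself remains the real check.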

The loop is the interface

The basic loop is simple:

  • it reads what you asked for;
  • it inspects the surrounding project;
  • it plans the next move;
  • it acts;
  • it observes what happened;
  • it starts again.
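
The steps above can be sketched as a small loop. This is a toy under stated assumptions, not the implementation of any real agent: `plan` and `act` are hypothetical callables standing in for the model and its tools, and `max_steps` is an assumed safety limit so the loop cannot run forever.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopState:
    goal: str                                   # what was asked for
    observations: list[str] = field(default_factory=list)
    done: bool = False


def run_loop(
    goal: str,
    plan: Callable[[LoopState], Optional[str]],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> LoopState:
    """Plan, act, observe, repeat, until the planner has nothing left
    to do or the step limit is reached."""
    state = LoopState(goal=goal)
    for _ in range(max_steps):
        step = plan(state)            # decide the next move
        if step is None:              # planner considers the goal met
            state.done = True
            break
        result = act(step)            # change something or run something
        state.observations.append(f"{step} -> {result}")  # observe
    return state


# Hypothetical stand-ins: a scripted planner and a tool that always succeeds.
def scripted_plan(state: LoopState) -> Optional[str]:
    steps = ["read ticket", "find User model", "add export method", "run tests"]
    index = len(state.observations)
    return steps[index] if index < len(steps) else None


def fake_act(step: str) -> str:
    return "ok"


final = run_loop("Add user export to CSV", scripted_plan, fake_act, max_steps=10)
print(final.done, len(final.observations))  # True 4
```

Note that the planner sees every earlier observation. That is what lets a real agent correct itself, and also what lets an early misreading shape everything after it.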

This is the part many people miss. The agent is not making one perfect decision. It is making a sequence of imperfect decisions and correcting along the way.

That is why small errors in direction matter. If the first interpretation is wrong, the later work can become a very competent execution of the wrong task.

You do not supervise the final answer only. You supervise the loop.

How to give direction

Good agent use starts before the agent writes code.

The prompt is not a spell. It is a briefing. You are telling a capable outsider what world they are operating in, what change matters, and how they should know when they are done.

Three things make the difference: context, specificity, and small steps.

Context says what the world is. Specificity says what to do. Small steps say how to proceed.
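
One way to make those three parts hard to forget is to write the briefing as a structure before writing it as a prompt. The sketch below is only an illustration: the `Briefing` class, its field names, and the rendered layout are assumptions, not a format any agent requires. The stopping condition in the example is also invented.

```python
from dataclasses import dataclass


@dataclass
class Briefing:
    context: list[str]       # what the world is
    task: str                # what to do
    constraints: list[str]   # what must stay the same
    done_when: str           # how the agent knows to stop (the small step)

    def render(self) -> str:
        """Turn the briefing into plain prompt text."""
        lines = ["Context:"]
        lines += [f"- {item}" for item in self.context]
        lines += ["", f"Task: {self.task}", "", "Constraints:"]
        lines += [f"- {item}" for item in self.constraints]
        lines += ["", f"Stop when: {self.done_when}"]
        return "\n".join(lines)


briefing = Briefing(
    context=[
        "A UserRepository already exists.",
        "The current list endpoint returns a paginated shape.",
    ],
    task="Add a search endpoint at /users that filters by partial name and exact email.",
    constraints=[
        "Use the existing UserRepository.",
        "Return the same paginated shape as the current list endpoint.",
    ],
    done_when="the endpoint works and its tests pass",  # assumed criterion
)

print(briefing.render())
```

An empty field is the useful part. If you cannot fill in `done_when`, the agent cannot know either.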

Context is not a dump

Context is what the agent needs to know about the world around the task.

Bad context is either missing or excessive.

Without context: "Add user search."
The agent invents the rest.

With context: "Add a search endpoint at /users. Filter by partial name and exact email. Use the existing UserRepository. Return the same paginated shape as the current list endpoint."
The task has a world around it.

That gives the agent the surrounding world without giving it the whole repository as a document dump.

More context is not automatically better. A full README, old architecture notes, unrelated historical decisions, and every file in a folder can make the agent worse. Noise costs twice: it distracts the agent and consumes the limited context window.

Focused context is better than abundant context.

Too much:
  • The whole README
  • Full architecture docs
  • Historical decisions
  • Every file in the folder

Focused:
  • The environment it runs in
  • The stack and the task constraints
  • What already exists, so it is not redone
  • Only what this task needs

Specificity beats adjectives

Vague requests produce vague work.

Vague: "Make this code better."
The agent picks what better means.

Specific: "Reduce the complexity of this function by extracting validation into a helper. Keep the same input and output. Do not touch the API route."
The work has boundaries.

The vague version sounds normal to a person, because a person can ask follow-up questions or infer taste from a shared environment. An agent often just chooses an interpretation and proceeds. The specific version names the change, what must stay the same, and what is off limits.

Words like "improve", "optimize", "clean up", and "make it robust" are not useless, but they are incomplete. They need criteria attached to them. Faster how? Cleaner by what standard? More robust against which failure?

The agent cannot preserve your intention if you never made the intention visible.

Small steps keep drift visible

The larger the request, the later you notice the wrong turn.

This is especially dangerous with coding agents because they can do a lot of work quickly. They can touch ten files before you realize they misunderstood the first sentence.

Instead of asking for everything at once, ask for one step first.

All at once: "Build authentication: login, signup, password reset, email verification, and Google OAuth."
If it goes wrong, you notice late.

One step first: "Implement email and password login first. Use the existing auth service. Stop after the login flow works and the tests pass."
You get a checkpoint.

Small steps are not about being slow. They are about preserving control. You get a checkpoint before the agent has committed to a large direction.

Undoing a wrong direction usually costs more than taking a smaller first step.


It still gets simple things wrong

The strange thing about these systems is that they can modify a complex codebase and still fail at something that looks trivial.

Ask a model how many times the letter "r" appears in "strawberry", and some runs will answer two. Ask whether 9.11 or 9.9 is larger, and some runs will pick 9.11.

Q: How many r's are in "strawberry"?
Model: There are 2 r's in strawberry.
Correct: It is 3.

Q: Which is bigger, 9.11 or 9.9?
Model: 9.11 is bigger than 9.9.
Correct: 9.9 is bigger.
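
Both answers are easy to check by running code instead of trusting text. The last part of the sketch shows one way to arrive at the wrong answer: reading the digits after the dot as whole numbers, the way version numbers are read. That is an illustration of the mistake, not a claim about what happens inside a model. The helper `version_key` is a made-up name.

```python
# Count the letter directly instead of guessing.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # 3

# Compared as decimal numbers, 9.9 is larger.
as_numbers = 9.11 > 9.9
print(as_numbers)  # False


def version_key(text: str) -> tuple[int, ...]:
    """Split '9.11' into (9, 11), the way a version number is read."""
    return tuple(int(part) for part in text.split("."))


# Compared as versions, 11 beats 9, so "9.11" looks larger.
as_versions = version_key("9.11") > version_key("9.9")
print(as_versions)  # True
```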

The point is not that the model is stupid. The point is that it does not work the way your intuition wants it to work. It predicts and reasons through patterns in text. It does not automatically see every problem the way a person sees it.

With code, this matters even more. The output can look plausible while being subtly wrong. A test can pass while the product behavior is still off. A refactor can be syntactically clean while changing a contract the agent did not understand.

Always verify.
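
Here is a small, made-up case of a test passing while the behavior is still off. It assumes a CSV export like the one in the trace earlier. Both functions and the sample data are hypothetical. The naive version joins fields with commas, and a test with simple data passes. A name that contains a comma breaks it.

```python
import csv
import io


def export_to_csv_naive(users: list[dict]) -> str:
    """Plausible-looking export: joins fields with commas, no quoting."""
    lines = ["name,email"]
    for user in users:
        lines.append(f"{user['name']},{user['email']}")
    return "\n".join(lines)


def export_to_csv(users: list[dict]) -> str:
    """Export through the csv module, which quotes fields when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "email"])
    for user in users:
        writer.writerow([user["name"], user["email"]])
    return buffer.getvalue()


def parse(text: str) -> list[list[str]]:
    """Read CSV text back into rows, as a consumer of the file would."""
    return list(csv.reader(io.StringIO(text)))


simple = [{"name": "Ada", "email": "ada@example.com"}]
tricky = [{"name": "Lovelace, Ada", "email": "ada@example.com"}]

# The shallow test passes for the naive version.
assert parse(export_to_csv_naive(simple))[1] == ["Ada", "ada@example.com"]

# The same function splits one name into two columns.
print(parse(export_to_csv_naive(tricky))[1])  # ['Lovelace', ' Ada', 'ada@example.com']

# The quoted export survives the round trip.
print(parse(export_to_csv(tricky))[1])        # ['Lovelace, Ada', 'ada@example.com']
```

The green test said the function runs. It did not say the export is correct. Verifying means choosing inputs that could actually fail.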

Skills are borrowed expertise

Many agent systems have some version of skills, instructions, modes, rules, or specialized agents.

The naming changes. The idea is stable.

A skill is not magic. It is a packaged way of working. Someone has written down how the agent should approach a certain class of task: what to inspect first, what constraints to respect, what examples to imitate, what mistakes to avoid, and how to know when the work is done.

That matters because agent performance is variable. The same vague request can produce a good run once and a bad run the next time. A skill reduces that variance by giving the agent a more reliable starting method.

8 good runs, 2 bad ones. You do not know which until you look.

Think of it as borrowed expertise. Not expertise transferred into the tool forever, but expertise made available to the person using it.

For someone new to coding agents, this is one of the most important ideas: the tool works better when the method is explicit.

Connectors expand reach, not judgment

Agents can also connect to external systems.

Depending on the environment, they may read a file system, open a browser, query GitHub, inspect a ticket, call a database, or use a connector exposed through something like MCP, the Model Context Protocol.

This expands what the agent can touch. It does not change who is responsible for direction.

If an agent can read a ticket, it can misunderstand the ticket. If it can edit files, it can edit the wrong files. If it can comment on an issue, it can post a comment that sounds reasonable and misses the point.

Connectors make the agent more useful because they give it access to the real work surface. They also make supervision more important, not less.
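
One common way to keep supervision in place as reach grows is to let the agent read freely and ask a person before it writes. The sketch below is an assumption-heavy illustration, not part of MCP or of any specific agent. The action names, the `READ_ONLY` set, and the `gated_call` function are all hypothetical.

```python
from typing import Callable

# Hypothetical action names: reading is allowed without asking.
READ_ONLY = {"read_ticket", "read_file", "search_code"}


def gated_call(
    action: str,
    run: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    """Run read-only actions directly. Anything else needs approval first."""
    if action in READ_ONLY or approve(action):
        return run()
    return f"blocked: {action}"


def deny_all(action: str) -> bool:
    """Stand-in for a human who has not approved anything yet."""
    return False


def load_ticket() -> str:
    return "ticket loaded"


def post_comment() -> str:
    return "comment posted"


print(gated_call("read_ticket", load_ticket, deny_all))    # ticket loaded
print(gated_call("post_comment", post_comment, deny_all))  # blocked: post_comment
```

A gate like this limits what a wrong interpretation can touch. It does not stop the agent from misreading the ticket in the first place.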

Supervision is not optional

The central mistake is to think supervision means distrust.

It does not. Supervision is how the tool is used well.

You supervise because the agent is non-deterministic. You supervise because it can drift. You supervise because it can confidently follow the wrong interpretation. You supervise because the final output is not the only thing that matters: the path matters too.

That path is where contracts get changed, tests get skipped, assumptions get introduced, and small mistakes become architecture.

The person supervising does not need to be afraid of the tool. They need to understand what kind of tool it is.

A coding agent amplifies work. It does not replace judgment.

That is the useful mental model. Not magic. Not autocomplete. Not a colleague you can fully ignore. A loop that can act inside your project, with enough power to help and enough freedom to drift.

Core idea: A coding agent is not a chatbot. It is a loop you have to supervise.

The tool can move quickly, but judgment still belongs to the person who understands what should be built.

Your job is to give it direction, watch the loop, and know when to stop it.