how I worked with agents in 2025
December 31, 2025
2025 has transformed how I work as an engineer. Many assumptions I held about my profession have been upended, yet some have survived. What follows is a reflection on some of these new ways of working.
I have been an active user of Cursor since about June 2024, especially of its early Composer feature, which let me make changes across many files in a single prompt session. But this year I relinquished control over code far more than I had previously expected.
The key thing about this change, for me, was that working chat-only turned my job into editing context, not files: whatever fits in the window is the main leverage I have.
At the start of a session the agent is nearly useless. It doesn’t know your codebase, your conventions, your intent. You spend effort orienting it—loading files and writing instructions. Then you hit a peak: the model has enough context to be genuinely useful, and the work flies. But the window keeps filling. Tangents, stack traces, abandoned approaches, half-relevant state. Performance decays. Eventually you’re better off compacting the history or starting clean.
Here are the workflows I’ve been exploring.
1) Load, Run, Compact
This is the loop baked into most coding agents. I open Claude Code in a repo, tell it what I need (read files, change files, test changes), and when the context window fills up, the tool compacts it, starts a fresh session, and carries a summary forward. I let it manage memory. I accept the cold-start tax at the beginning of each session while it reloads context, enjoy the sweet spot in the middle where edits and tests snap into place, and ride out the eventual compaction as the conversation fills. The carry-over summary is automatically generated, usually adequate, occasionally leaky.
It’s low-ceremony and flies while the task is local and linear. But I have zero say in what gets compacted, so subtlety can just disappear. I end up writing extra-pedantic cold-start instructions to claw back context, and I still ride the same slow-great-meh arc—rising latency, sudden misreads (like random dynamic imports in TypeScript). For small fixes or code I already understand, this “good-enough memory” path wins by staying out of the way. The moment requirements sprawl, I feel the lack of control.
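Those cold-start instructions live in a CLAUDE.md, which Claude Code loads into context at the start of every session. A minimal sketch of the shape, with every path and convention invented for illustration:

```markdown
# CLAUDE.md (hypothetical project)

## Read before editing
- src/server/router.ts and src/server/middleware/ before touching any endpoint
- Conventions: no default exports, no dynamic imports, schemas live next to handlers

## Workflow
- Run the server test suite after every change; do not batch fixes
- Never edit generated files under src/generated/

## Known traps
- Middleware registration is order-sensitive; see src/server/middleware/README.md
```

The pedantry pays for itself: every rule here is one I’d otherwise re-type after each compaction.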
2) Plan and Execute
For more control, I switch to a plan‑driven loop popularized by Cline and now present in most tools. The agent first drafts a plan—problem statement, scope, files to touch, phases, test strategy. I edit it, then the agent executes against that spec.
Plans can be ephemeral or persisted to disk. Ephemeral plans are useful for keeping the coding agent on track; persisted plans detach key context from the chat session and make it reloadable.
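A persisted plan doesn’t need to be elaborate. It just needs the spec elements above in a reloadable file; everything concrete in this sketch is invented:

```markdown
# Plan: move session storage to Redis (hypothetical)

## Problem
Sessions live in Postgres and add latency to every authenticated request.

## Scope
- Touch: src/session/store.ts, src/session/middleware.ts, config/redis.ts
- Out of scope: auth token issuance

## Phases
1. Add a Redis-backed store behind the existing SessionStore interface
2. Dual-write; keep reads on Postgres
3. Flip reads to Redis, keeping a Postgres fallback for one release

## Test strategy
Integration tests against a local Redis; soak test before flipping reads.
```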
I will often pair the planning with an implementation log: as the agent advances through phases, it records decisions, deltas, and gotchas. I explicitly instruct it to update the log at each step. That log is my portable working memory across resets and tool restarts—useful to me and legible to the next agent session.
The upside is a clean artifact trail that lets me pick up exactly where I left off after a crash or a weekend away. The downside is that logs swell fast, stay half-structured, and burn tokens when I feed them back in; the agent rarely surfaces the currently hot files first, so I have to keep the thing pruned—short sections, fixed headings, and a pinned “load this first” line at the top.
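Concretely, the pruned shape I aim for looks something like this, continuing the invented example above:

```markdown
# Impl log: session storage migration

> Load this first: plan.md, src/session/store.ts, src/session/middleware.ts

## Where we are
Phase 2 of 3: dual-write enabled, reads still on Postgres.

## Decisions
- SessionStore interface unchanged; the Redis store is a drop-in
- TTLs mirror the existing expiry column instead of a new config knob

## Gotchas
- config/redis.ts must load before middleware registration or startup fails
```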
One thing: by the end of the year I noticed myself almost never using the official plan modes (the ones that restrict file editing and bash tools). Models have gotten really good at following instructions, and the programmatic restrictions are more disruptive than they are useful. Plan mode got bitter-lesson’ed.
3) Subagents
This one’s risky but cool.
The goal is to treat the main session like you would a terminal session: keep it as long-lived as possible. The main agent holds stable project context; you delegate discrete tasks to short-lived subagents with the exact inputs they need. They run for 2–20 minutes, return artifacts, then disappear. Your primary context stays tight and predictable.
I load project scaffolding and constraints once at the start of the session. For each task, I spawn a subagent with a narrow plan. The subagent loops privately, writes changes, and returns a short completion summary (what changed, why, where to look). The main agent consumes the summary and reports back.
I’ve built specific subagents for tasks where individual steering matters less than the final output: a librarian that researches API docs, package documentation, and source code to retrieve information about particular packages or APIs; an implementer that executes a plan while adhering to codebase patterns.
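Claude Code lets you define these as markdown files with frontmatter under .claude/agents/. Here is roughly what the librarian looks like; the prompt and tool list are an illustrative reconstruction, not my exact file:

```markdown
---
name: librarian
description: Researches package docs and source to answer a specific API question. Use when the main agent needs facts about a third-party package.
tools: Read, Grep, Glob, WebFetch
---

You are a research subagent. Given a question about a package or API:
1. Locate the relevant material (node_modules, the package repo, official docs).
2. Answer only the question asked; do not summarize the whole package.
3. Return the answer, citations (file paths or URLs), and any version caveats.
Never edit files.
```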
The payoff is a main window that stays fresh: fewer compactions and a longer stretch of high-quality mid-session work. But subagents (in Claude Code at least) are black boxes: I can’t nudge one mid-run, and I can’t cleanly pause, abort, or redirect it without starting over.
4) Back to IDE
All of the above optimizes for agent context. But there’s another kind of context that matters: the mental model I carry as the engineer. The vibe right now is “IDE is dead,” and my own IDE usage has decreased substantially. Yet there are cases where I want to stay much closer to the loop than Claude Code allows: unusual business logic, or code where I need a solid mental model (“programming is theory-building” and all that). In these cases I found the “tutorial doc” workflow, first described by Geoffrey Litt here, to be incredibly productive.
The core of it is that you have Claude Code prepare a tutorial document for you, with all the necessary implementation details. Then you flip the script: instead of tasking out the agent, you read the document and implement the code yourself, using AI tools to handle the drudge work. Cursor’s Tab and Composer models are incredibly useful here for moving fast.
This allows me to see and touch everything: the abstractions, the edge cases, the data models. I stay close to the code and build a mental model that I wouldn’t get from reviewing agent-generated diffs.
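The request itself is short; the leverage is in what you ask the doc to contain. Mine look roughly like this (my wording and an invented task, not Litt’s exact recipe):

```text
Write me a tutorial doc for adding rate limiting to this codebase.
Don't write the implementation. Cover:
- where the middleware chain lives and where this slots in
- the data model you'd pick and why (token bucket vs sliding window)
- the edge cases I will hit (bursts after deploy, per-key quotas)
- a suggested implementation order, with the test to write at each step
- existing files to mirror for conventions
Target: a doc I can implement from in one sitting.
```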
Going forward: a task manager with a review tool?
Coding agents are getting better every 6 weeks. Opus 4.5 was a game changer: after it landed, I found myself embracing hands-off coding a lot more, delegating larger chunks with better (but still mixed) results.
I’m curious what my work looks like going forward. I’ve noticed I’m leaning harder into local version control tools (namely jj): reviewing diffs as my primary feedback loop, and structuring commits to tell a story so review stays manageable.
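The jj side of that loop is short. After an agent run, a review pass looks roughly like this (real jj commands, invented scenario):

```sh
jj st        # what did the agent touch?
jj diff      # read the working-copy change hunk by hunk
jj split     # interactively carve one blob of changes into reviewable commits
jj describe -m "extract SessionStore interface"  # name each step of the story
jj new       # start a fresh change for the next agent run
```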
Maybe the end state is something like a task manager with a review tool: I queue up work, agents execute in the background, and my job becomes reviewing diffs and shaping commits. We’re not there yet—but every 6 weeks, it feels a little closer.
I’m sure I could get a lot done with this. I’m not so sure if I’d have a lot of fun doing it.