The appealing part
Agents running in parallel can quickly produce multiple PRs that need review, each requiring attention and a deep dive into a large codebase or complex application logic.
That is the part I keep coming back to.
The impressive part is obvious. You can split work across multiple agents, give each one a focused task, and suddenly several things are moving at once. It is challenging at first, but also rewarding: you can ship features that would take many more hours, or even days, without AI.
For a while, it feels like a pure productivity gain.
When the PRs start arriving
Then the PRs start arriving.
One PR is manageable. Two still feel good. By the third, I am usually switching between implementation details, test results, product intent, edge cases, and whether the agent understood the original direction correctly.
For me, by the fourth or fifth PR in under an hour, maintaining focus becomes a real challenge. Mental fatigue sets in, and I catch myself wanting to postpone the remaining reviews. I can push through, but the feeling of creativity, of building something that works, starts to fade.
That is the human limit I did not expect to hit so quickly.
The system keeps going, but I don’t
The system can keep going. The agents can keep producing output. The tooling can keep creating branches, commits, diffs, and summaries.
But someone still has to understand what changed.
Someone still has to decide whether the change fits the codebase, the product, and the original intent.
And that work is not free.
The real bottleneck
Addy Osmani describes this as a cognitive overhead problem: more parallel agents do not mean more human attention is available.
I think that is the important distinction.
The constraint is not only how fast agents can produce work, but how much context the developer can responsibly hold while reviewing it.
Simon Willison has made a similar point from his own experience with coding agents. The work can become faster, but also more mentally exhausting.
That resonates with me because the tiring part is not just reading code.
It is repeatedly rebuilding enough context to make good decisions.
Reviewing is the real work
Reviewing an AI-generated PR is not the same as watching progress happen.
It requires judgment.
Sometimes it requires reconstructing the agent’s path through the problem.
Sometimes it means noticing that the code technically works but moves the design in the wrong direction.
This is where parallel agents can create a misleading feeling of speed.
The output arrives quickly, but the validation still has to happen at human speed.
Planning with the human limit in mind
My take is that this should be factored in during the planning and estimation phase, especially in environments where there is high pressure to ship new features quickly.
If a team assumes that five agents mean five times the output, it may miss the review bottleneck.
The work does not end when the PR is opened. In many cases, that is when the most important human work starts.
The limits of handling multiple PRs, and more broadly of interacting with several agents at once, need to be taken into account when estimating the actual productivity boost that parallel agents can provide.
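
To make the estimation point concrete, here is a minimal back-of-envelope sketch. Every number in it is an invented assumption for illustration (an agent opening roughly one PR per hour, a reviewer needing about 25 minutes of focused attention per PR); the only point is that merged throughput plateaus at the reviewer's capacity while the unreviewed backlog keeps growing.

```python
# Toy model (invented numbers, not measurements): effective throughput
# when N parallel agents feed a single human reviewer.

AGENT_PRS_PER_HOUR = 1.0    # assumption: each agent opens ~1 PR per hour
REVIEW_MINUTES_PER_PR = 25  # assumption: focused review time per PR
REVIEWER_CAPACITY = 60 / REVIEW_MINUTES_PER_PR  # PRs reviewable per hour

for n_agents in range(1, 7):
    produced = n_agents * AGENT_PRS_PER_HOUR    # PRs opened per hour
    merged = min(produced, REVIEWER_CAPACITY)   # PRs actually reviewed per hour
    backlog = produced - merged                 # unreviewed PRs accumulating per hour
    print(f"{n_agents} agents: {produced:.1f} opened/h, "
          f"{merged:.1f} reviewed/h, backlog +{backlog:.1f}/h")
```

With these made-up numbers, reviewed throughput flattens somewhere between two and three agents, which happens to line up with the comfort zone described in the next section.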
What seems to work (for now)
For now, I try to keep the number of active agents low enough that I can still review their work properly.
Two feels comfortable.
Three can work if the tasks are clearly separated.
Beyond that, I may still get more output, but I do not necessarily get better throughput.
The constraint is not only how many agents I can run.
It is how many decisions I can still make well.
References
- Addy Osmani, "Your parallel agent limit"
- Simon Willison, "Highlights from my conversation about agentic engineering on Lenny's Podcast"
