How should developers control Claude Code subagent spend?

Set manual permissions, cap concurrency, route research and lint-like tasks to cheaper models, keep expensive models for final synthesis, and monitor token use per task.

Claude Code Background Agents Guide (2026): Safer Subagents, Permissions, and API Cost Controls

Published July 4, 2026 · 10 min read

On July 3, 2026, Anthropic shipped Claude Code v2.1.200 with a small-looking change that matters a lot for real developer workflows: the default permission mode is now Manual across the CLI, VS Code, and JetBrains. The same release also fixed several background-agent failures, including stalled sessions after sleep/wake and subagents cut off by rate limits.

That lands right after the July 1 v2.1.198 release, where Claude Code made background subagents the default. Put those two releases together and the message is clear: coding agents are moving from “one chat in a terminal” to parallel background work. That’s powerful. It can also get expensive and messy if you let every agent roam with full permissions and a frontier model.

This guide is the practical version: how to use Claude Code background agents without turning your repo into a token bonfire.

What changed, in plain English

Release	Date	Why developers should care
Claude Code v2.1.198	July 1, 2026	Background subagents became the default, with notifications when agents finish or need input.
Claude Code v2.1.200	July 3, 2026	Permission mode changed to Manual by default, and several background-session reliability bugs were fixed.
Claude Code v2.1.201	July 3, 2026	Sonnet 5 sessions stopped using mid-conversation system-role harness reminders.

The Manual permission default is the right direction. I’d rather approve one risky file edit than discover later that three background agents made overlapping changes while I was reading logs. Parallel agent work is useful only when the boundaries are boring and explicit.

The cost trap: parallel agents multiply context

A normal Claude Code task already has several token sinks: repository context, instructions, command output, tool results, and final summaries. Background agents multiply those sinks. If you launch five agents and each reads the same files, each one may build a separate context window and produce its own summaries.

Rule of thumb: use subagents for independent work, not for vague thinking. “Inspect auth, billing, and tests separately” is good. “Figure out what’s wrong” is how you pay five agents to discover the same thing.
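One way to catch overlapping work before launch is to compare the file scopes you plan to hand each agent. This is a minimal sketch: the agent names and file paths in PLAN are made up for illustration, and counting files is only a rough stand-in for counting tokens.

```python
from itertools import combinations

# Hypothetical plan: agent name -> set of files it is allowed to read.
PLAN = {
    "auth": {"src/auth/login.ts", "src/auth/session.ts"},
    "billing": {"src/billing/invoice.ts", "src/auth/session.ts"},
    "tests": {"tests/auth.test.ts"},
}


def overlapping_scopes(plan):
    """Return (agent_a, agent_b, shared_files) for every pair that shares files."""
    overlaps = []
    for (a, files_a), (b, files_b) in combinations(sorted(plan.items()), 2):
        shared = files_a & files_b
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps


def duplicated_reads(plan):
    """Count file reads beyond the first: context you pay for more than once."""
    total_reads = sum(len(files) for files in plan.values())
    unique_files = len(set().union(*plan.values()))
    return total_reads - unique_files
```

If overlapping_scopes returns anything, either move the shared file into one agent's scope or accept that you are paying for it twice.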

Here’s a rough planning table for API-backed coding-agent work:

Task type	Recommended agent count	Model tier	Budget posture
Repo exploration	1-2	Cheap / fast	Strict token cap
Independent bug areas	2-4	Mid-tier	Stop after findings
Risky code edits	1 lead + reviewers	Strong model for lead	Manual permissions
Final synthesis	1	Best available	Short context, no duplicate logs
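If you want the table to be more than a suggestion, it can live in code as a lookup that clamps whatever agent count someone asks for. A rough sketch follows; the task-type keys and tier labels are my own placeholder names, and "1 lead + reviewers" is simplified to the lead only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    max_agents: int
    model_tier: str
    budget_note: str


# Values copied from the planning table; keys and tier labels are placeholders.
PLANNING = {
    "repo_exploration": Posture(2, "cheap", "strict token cap"),
    "independent_bug_areas": Posture(4, "mid", "stop after findings"),
    # One lead only; reviewers are counted separately in this sketch.
    "risky_code_edits": Posture(1, "strong", "manual permissions"),
    "final_synthesis": Posture(1, "best", "short context, no duplicate logs"),
}


def clamp_agents(task_type, requested):
    """Never launch more agents than the table allows for this task type."""
    posture = PLANNING[task_type]
    return max(1, min(requested, posture.max_agents))
```

Asking for five exploration agents quietly becomes two, which is the whole point.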

A safer Claude Code configuration pattern

Because v2.1.200 accepts defaultMode: "manual", make that explicit in shared project config instead of trusting whichever local default a teammate has installed.

{
  "permissions": {
    "defaultMode": "manual"
  },
  "backgroundAgents": {
    "maxConcurrent": 3,
    "notifyOnCompletion": true
  },
  "modelPolicy": {
    "explore": "fast-cheap",
    "edit": "balanced",
    "review": "strong"
  }
}

Treat the backgroundAgents and modelPolicy blocks as illustrative: the exact keys depend on your wrapper or team tooling. The policy is what matters: manual writes, capped parallelism, and model routing by task type.
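A shared policy only helps if something checks it. Here is a small sketch of a CI-style check for a policy file shaped like the example above. The key names mirror that example and are not guaranteed to match any real Claude Code schema, and MAX_CONCURRENT_CEILING is an assumed team limit.

```python
import json

MAX_CONCURRENT_CEILING = 4  # assumed team ceiling, not a Claude Code limit


def policy_violations(config_text):
    """Return human-readable problems found in a policy file."""
    config = json.loads(config_text)
    problems = []

    mode = config.get("permissions", {}).get("defaultMode")
    if mode != "manual":
        problems.append(f"permissions.defaultMode is {mode!r}, expected 'manual'")

    concurrent = config.get("backgroundAgents", {}).get("maxConcurrent")
    in_range = isinstance(concurrent, int) and 1 <= concurrent <= MAX_CONCURRENT_CEILING
    if not in_range:
        problems.append(
            f"backgroundAgents.maxConcurrent is {concurrent!r}, "
            f"expected 1..{MAX_CONCURRENT_CEILING}"
        )

    # Every phase needs an explicit model so nothing falls through to a default.
    missing = {"explore", "edit", "review"} - set(config.get("modelPolicy", {}))
    if missing:
        problems.append(f"modelPolicy is missing: {', '.join(sorted(missing))}")

    return problems
```

An empty list means the file matches the policy; anything else is a message you can print and fail the build on.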

Use a task envelope before spawning subagents

Most token waste starts with a fuzzy task. Before delegating, write a short task envelope. It keeps each background agent from reading the whole project and producing a novel.

Task: Inspect retry handling in API client only.
Scope: src/api/client.ts, src/api/retry.ts, tests/api/*
Do not edit files.
Return:
- bug list with line references
- one minimal fix proposal
- tests that should be added
Stop after 800 words.

This sounds fussy. It isn’t. It’s cheaper than paying for three agents to dump unrelated architecture advice. Claude Code’s newer behavior around partial results is helpful too: v2.1.199 and v2.1.200 fixed cases where subagents hit rate limits or server errors and silently failed or returned empty output. You still need clear scopes, but now failures are less likely to look like success.
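If you delegate often, it can help to generate the envelope from structured fields so nobody forgets the scope or the stop condition. A minimal sketch, assuming you build prompts in Python; TaskEnvelope and its field names are hypothetical, not part of Claude Code.

```python
from dataclasses import dataclass


@dataclass
class TaskEnvelope:
    task: str
    scope: list
    deliverables: list
    read_only: bool = True
    word_limit: int = 800

    def render(self):
        """Produce the plain-text envelope handed to a subagent."""
        if not self.scope:
            # An empty scope is how agents end up reading the whole project.
            raise ValueError("Refusing to delegate: scope is empty")
        lines = [f"Task: {self.task}", f"Scope: {', '.join(self.scope)}"]
        if self.read_only:
            lines.append("Do not edit files.")
        lines.append("Return:")
        lines.extend(f"- {item}" for item in self.deliverables)
        lines.append(f"Stop after {self.word_limit} words.")
        return "\n".join(lines)
```

Rendering the retry-handling task with its three scope entries and three deliverables reproduces the envelope shown earlier, line for line.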

API routing: don’t use the same model for every phase

The best coding-agent stack in 2026 is not “use the strongest model everywhere.” It’s a routing stack.

Fast model: file search, test discovery, dependency mapping.
Balanced model: small patches, refactors, fixture updates.
Strong model: final design calls, security-sensitive edits, complex debugging.

If your setup supports OpenAI-compatible endpoints, you can put this behind one API surface and route by model name. KissAPI is useful here because teams can keep a single endpoint while switching between Claude, GPT, and other models as workload changes. The point isn’t vendor gymnastics; it’s not paying flagship prices for grep-shaped work.
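Routing by phase can be as small as a lookup table. In this sketch the phase names are my own labels for the three tiers above, and the model names are placeholders: swap in whatever your endpoint actually exposes.

```python
# Placeholder model names: substitute whatever your gateway actually exposes.
TIER_MODELS = {"fast": "fast-cheap", "balanced": "balanced", "strong": "strong"}

# Hypothetical phase labels grouped by the three tiers described above.
PHASE_TIERS = {
    "file_search": "fast",
    "test_discovery": "fast",
    "dependency_mapping": "fast",
    "small_patch": "balanced",
    "refactor": "balanced",
    "fixture_update": "balanced",
    "design_call": "strong",
    "security_edit": "strong",
    "complex_debugging": "strong",
}


def model_for(phase, security_sensitive=False):
    """Pick a model name for a phase; security-sensitive work always escalates."""
    if security_sensitive:
        return TIER_MODELS["strong"]
    # Unknown phases get the middle tier rather than the most expensive one.
    tier = PHASE_TIERS.get(phase, "balanced")
    return TIER_MODELS[tier]
```

Defaulting unknown phases to the balanced tier is a judgment call. If your team would rather fail loudly, raise on an unknown phase instead.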

Minimal Node.js budget guard for agent runs

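If you wrap Claude Code or run agents from CI, add a basic budget guard. This example blocks a run when the planned budget (agent count times the per-agent token cap) exceeds the limit for the branch type, and falls back to 75,000 tokens for branch types it doesn’t know.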

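// Token ceilings per branch type; tune these to your own usage data.
const LIMITS = {
  docs: 50_000,
  feature: 180_000,
  hotfix: 90_000
};

// Ceiling applied when the branch type is not listed in LIMITS.
const DEFAULT_LIMIT = 75_000;

// Throws before any agent starts if the worst-case plan exceeds the limit.
function estimateBudget({ agents, maxTokensPerAgent, branchType }) {
  // Worst case: every agent spends its full per-agent cap.
  const planned = agents * maxTokensPerAgent;
  const limit = LIMITS[branchType] ?? DEFAULT_LIMIT;
  if (planned > limit) {
    throw new Error(
      `Planned ${planned} tokens exceeds ${branchType} limit ${limit}. ` +
      `Reduce agents, scope, or model tier.`
    );
  }
  return { planned, limit };
}

// 3 agents x 45,000 tokens = 135,000, which fits the 180,000 feature limit.
estimateBudget({
  agents: 3,
  maxTokensPerAgent: 45_000,
  branchType: "feature"
});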

It’s intentionally simple. You can wire it to real usage logs later. The first win is cultural: no background-agent run should be unlimited by default.
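Once you do have real usage numbers, the same idea works at runtime: add up what each agent actually spent and stop the run when the total crosses the limit. A Python sketch of that tracker follows. RunBudget and BudgetExceeded are hypothetical names, and where the token counts come from (API usage fields, parsed logs) depends on your setup.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recorded usage for a run passes its limit."""


class RunBudget:
    def __init__(self, limit):
        self.limit = limit
        self.used_by_agent = {}

    @property
    def used(self):
        """Total tokens recorded so far across all agents."""
        return sum(self.used_by_agent.values())

    def record(self, agent, input_tokens, output_tokens):
        """Add one call's usage and return the tokens still available."""
        spent = input_tokens + output_tokens
        self.used_by_agent[agent] = self.used_by_agent.get(agent, 0) + spent
        if self.used > self.limit:
            raise BudgetExceeded(
                f"Run used {self.used} tokens, limit is {self.limit}. "
                f"Last agent: {agent}."
            )
        return self.limit - self.used
```

Keeping usage per agent also tells you which scope was too wide when a run does blow the limit.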

Python: summarize logs before handing off to the final agent

One expensive anti-pattern is pasting every subagent transcript into the lead model. Summarize first, then pass only decisions, evidence, and open questions.

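def clip(value, limit):
    """Coerce a field to text and cap its length; missing fields become 'None'."""
    return str("None" if value is None else value)[:limit]


def compact_agent_result(name, result):
    """Keep only decisions, evidence, and open questions from one subagent."""
    return f"""
Agent: {name}
Status: {result.get('status', 'unknown')}
Files inspected: {', '.join(result.get('files', [])[:12])}
Findings:
{clip(result.get('findings'), 1200)}
Recommended patch:
{clip(result.get('patch_summary'), 800)}
Open questions:
{clip(result.get('open_questions'), 500)}
""".strip()


def build_final_context(agent_results):
    """Join the compact results; agent_results maps agent name to result dict."""
    return "\n\n---\n\n".join(
        compact_agent_result(name, result)
        for name, result in agent_results.items()
    )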

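This one habit keeps the final pass small: the caps above limit each agent to roughly 2,500 characters of findings, patch summary, and open questions plus a short header, however long its transcript ran.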

When to stop a background-agent run

Stop early when agents converge on the same finding, when one agent proves the original hypothesis false, or when the next step is a human product decision. More parallelism won’t fix an unclear requirement. It just makes the bill taller.
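The first stop condition, agents converging on the same finding, is easy to check mechanically. This is a deliberately crude sketch: it treats two findings as the same only if they match after lowercasing and whitespace cleanup, so reworded findings will slip through. The function names are hypothetical.

```python
from collections import Counter


def normalize(finding):
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(finding.lower().split()).rstrip(".")


def converged_findings(findings_by_agent, min_agents=2):
    """Return findings reported independently by at least min_agents agents."""
    counts = Counter()
    for findings in findings_by_agent.values():
        # Use a set so one agent repeating itself only counts once.
        counts.update({normalize(finding) for finding in findings})
    return sorted(f for f, seen in counts.items() if seen >= min_agents)
```

If this returns anything while other agents are still running, that is a good moment to ask whether the remaining agents are worth their tokens.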

My preferred workflow is:

1. One exploration agent maps the repo.
2. Two or three scoped agents inspect independent areas.
3. The lead chooses one patch path.
4. A reviewer agent checks only the diff.
5. The final model gets a compact summary, not raw transcripts.

That keeps the speed benefit of background agents while avoiding most duplicate context.
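For anyone scripting this, the shape is a sequential pipeline with one parallel stage in the middle. A sketch under heavy assumptions: run_agent is a hypothetical stub standing in for however you actually launch an agent, and MAX_CONCURRENT is an assumed cap.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # assumed cap on parallel agents


def run_agent(role, scope):
    """Hypothetical stand-in for launching one agent and getting a compact result."""
    return {"role": role, "scope": scope, "status": "ok"}


def run_workflow(areas):
    """Run the five stages in order; only the scoped inspections run in parallel."""
    repo_map = run_agent("explore", "whole repo, read-only")

    # The pool size is the concurrency cap: extra areas wait for a free worker.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        inspections = list(pool.map(lambda area: run_agent("inspect", area), areas))

    patch = run_agent("lead", "pick one patch path")
    review = run_agent("review", "diff only")
    summary = run_agent("final", "compact summary, no raw transcripts")
    return [repo_map, *inspections, patch, review, summary]
```

Only the inspection stage fans out. Everything that makes a decision stays single-threaded, which is where overlapping edits would otherwise come from.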

FAQ

What changed in Claude Code v2.1.200?

Claude Code v2.1.200 changed the default permission mode to Manual across CLI, VS Code, and JetBrains. It also fixed multiple background-agent reliability issues, including stalled sessions and subagent failures around rate limits.

Do background agents increase API cost?

They can. Each background agent may read files, call tools, retry, and summarize independently. Without scoping and concurrency caps, parallel agents can multiply token use fast.

Should every Claude Code task use subagents?

No. Use subagents when work can be split cleanly. For small edits, final reviews, or unclear product decisions, one focused agent is usually cheaper and safer.

Need a cheaper route for coding-agent traffic?

Use KissAPI as an OpenAI-compatible model gateway for Claude, GPT, and fallback routes. Start free at kissapi.ai/register.
