Five distinct prompt surfaces. Each one shapes either when the main agent dispatches, or how well the subagent executes.
What's actually in production (dev cluster) right now.
A generic, role-less assistant with no context that it is:
Note: A separate file at claude_agents/prompts/subagent/subagent.txt contains the old Claude-Agent-SDK subagent's preamble (narration protocol, batching, GetTime-first, etc.). That file is loaded by claude_agents/subagent/subagent.py:71 — not by the new subagent_service. Some or all of its content should be ported into the new SUBAGENT_SYSTEM_PROMPT.
These flows already have direct voice tools in the main agent. Dispatching them to the subagent adds latency, breaks UI cards, and routes the user through the wrong code path.
web_gemini / web_xai / web_exa directly in V.I.
extended_thinking_search directly in V.I.
Stops V.I. from calling create_task for these requests in the first place. Highest leverage.
Belt-and-suspenders. If V.I. still dispatches, the subagent itself politely declines and tells V.I. to use the direct tool. Prevents the subagent from "trying its best" with the wrong tools.
Today: user says "install Notion" → main agent responds "I can't install apps on your iPhone." The agent has no idea that app = integration in the subagent's app registry.
install_app's description (subagent side) does explain itself well, but V.I. never gets to see it — V.I. only sees create_task's description + the assembled capabilities text.create_task — do NOT say you can't install things on their phone."get_tool_description() at dispatch_runner.py:677: prefix with one line — "The agent can install / connect the following third-party integrations on the user's behalf:".Each system has a 1-sentence summary() shown to the subagent LLM. The system's author is the right person to enrich it. Below: David's take + the questions to send each owner.
Reinforce that apps are integrations / 3rd-party tools. Not all are (utils, workspace, shared_tools), but many are. Fixes the "install Notion → can't install on your phone" failure.
"Fork vs. inline" probably doesn't matter much. The unique value of tasks is reusable, start/stoppable. The current one sentence kind of covers it but is ambiguous.
"Request UI surfaces on them" is overly technical — the LLM has no purchase on it. Needs plain English plus a concrete example.
Not clear when the subagent would request this. Unclear what data can flow between the VM and the subagent. Also: commented out in subagent_instance.py:298–301 — may not even be live.
low priority An example would help but not critical. Defer.
out of scope Internal plumbing / monitoring only. No questions needed.
Each block has every question that recipient (and only that recipient) needs to answer. No duplication across blocks.
Hey Jake — for the subagent prompting cleanup (PMF-3101), can you help me sharpen the description of the `tasks` system so the subagent LLM knows when to reach for it? Right now the one-line summary is: "Spawn forked tasks to handle parallel or long-running work." A few questions: 1. What is the defining property of a task — long-running? Resumable? Cancellable? Re-attachable across sessions? 2. When should the subagent reach for spawn_task vs. just doing the work inline in its current turn? 3. Are tasks shared across users / agents, or strictly per-agent? 4. What happens to a running task if the subagent finishes its turn — does the parent agent see the final result? 5. What's a concrete example of a task the LLM should spawn? (e.g. "watch this Apollo article for updates over the next week" — yes? no?) Goal is a 3–5 sentence description with one concrete example.
Hey Jake / Ankit — for the subagent prompting cleanup (PMF-3101), I need help defining the `agent_core` system clearly. Right now the summary is: "Execute bash commands, read/write/edit files on a persistent VM." I'm not sure when the subagent would actually reach for this, and the data contract between the VM and the subagent isn't obvious. Questions: 1. What is the persistent VM actually for? Per-user? Per-agent? Per-task? How long does it live? 2. What's the data contract between subagent ↔ VM? What can the subagent send in (text? files? structured payloads?), and what comes back (stdout? file contents? exit codes?) 3. What's a realistic user request that should route through the VM? (e.g. "convert this PDF the user shared to text"? "run this Python script"?) 4. Is this currently enabled in production? subagent_instance.py:298–301 shows it commented out — is it even live? 5. If it's not live, should we remove the system summary entirely so the LLM doesn't think it's an option? Goal is a clear 3–5 sentence description (or a decision to remove it from the surface entirely until it ships).
Hey all — for the subagent prompting cleanup (PMF-3101), I want to sharpen the `apps` system description. Today it says: "Install, uninstall, and list apps — each app provides its own tools once installed." My take is we should reinforce that apps are integrations / 3rd-party tools — most of them are (Notion, Spotify, Apple Music, Google Calendar, Google Docs). This would fix the "install Notion → can't install on your phone" failure we keep seeing. Questions: 1. Is "integration / 3rd-party tool" the right framing for the majority of apps? Which ones explicitly are NOT (utils, workspace, shared_tools)? 2. Should we split the description into "integration apps" vs. "built-in capability apps"? 3. When should the subagent proactively install vs. ask the user first? 4. What's the cost (latency / OAuth handshake) of an install_app call so we can teach the LLM when an install is "free"? Goal is a 3–5 sentence description with the right mental model so the LLM stops treating "install" as a phone-app action.
Hey Ankit / Andres — we co-own `agent_peers` so I'd like a quick sync before rewriting its description. Today the summary is: "Get information about connected Agent instances and request UI surfaces on them." "Request UI surfaces on them" is overly technical — the LLM has no purchase on what that means. Questions: 1. What is an "agent peer" in user terms? (The V.I. instance that dispatched me? Sibling subagents? Another user's agent?) 2. What does "request a UI surface" actually mean in practice — open a card on the user's phone? Push a status pill? Send a text message back to V.I.? 3. What's a concrete example: "when the user says X, the subagent calls Y on the peer to produce Z"? 4. When should the subagent prefer talking to a peer vs. answering directly? Goal is a plain-English description + one concrete example. Happy to draft once you confirm the model.
Today's preamble describes what the subagent does but gives no trigger heuristics, no anti-patterns, and no app-vocabulary disambiguation.
You have a powerful agent that you can delegate complex tasks to. Call this tool with a detailed prompt describing what you want done — the agent will reason, use tools, and return the result. Be specific in your prompt: include all relevant context, names, dates, and constraints. The agent works independently and may take a few moments for complex tasks. Some of the agent's tools may also be available directly in your own tool list; prefer calling one of those when it fits the request. Use create_task when the request needs multi-step reasoning, when no direct tool fits, or when the work may span multiple operations.
You can delegate to a background research/action
agent via create_task.
DISPATCH AUTONOMOUSLY when:
✓ User wants to act on a 3rd-party integration
— Notion, Spotify, Apple Music, Google
Calendar, Google Docs, Apple Music, etc.
✓ User says "install/add/connect" any of those
→ those are SESAME INTEGRATIONS, NOT iPhone
apps. NEVER tell the user you can't install
apps on their phone — dispatch instead.
✓ User asks about something they previously
shared, saved, or briefed with Sesame.
✓ Work spans >1 tool call or external lookup.
NEVER dispatch for:
✗ Web search of any kind — use your direct
search tools (lower latency, card UI).
✗ Extended / deep search — same reason.
✗ Reminders — handled by a direct voice tool.
✗ Pure conversation / chit-chat.
✗ Asking a clarifying question — do that
yourself before dispatching.
Be specific in your prompt: include names, dates,
the user's exact phrasing, and what "done" means.
The agent can install / connect the following
third-party integrations and act on them:
App-prompt cleanup belongs to individual app developers — explicitly out of scope here. We own the dispatch preamble + the subagent system prompt + the system descriptions we authored.
Add (a) trigger heuristics, (b) NEVER-list for search / deep-search / reminders, (c) "app = integration" disambiguation, (d) capabilities-section framing line.
SUBAGENT_SYSTEM_PROMPTReplace "You are a helpful assistant." Port relevant bits from the old claude_agents/prompts/subagent/subagent.txt (narration to parent, batch ops, brevity). Add the same NEVER-list as belt-and-suspenders.
One thread per system owner (apps, tasks, agent_peers, agent_core). Use the questions in section 5. Skip gen_response, debug, and (for now) diagnostics.
Add "when to use / when not to use / one concrete example" for each. agent_peers is the most urgent (plain-English rewrite).
% of user turns that should have triggered create_task and did, without the user having to explicitly ask.
Count of "install on iPhone" refusals + count of subagent-handled search / deep-search / reminders. Both should drop to ~0.
% of dispatched tasks that produce a correct, user-acceptable result on the first try.