Consul Agent Architecture Refactor Plan

Executive Summary

This document outlines the refactoring of Consul's agent system from a rigid multi-agent orchestrator pattern (28 agents, 114 tools, 13 workflows, 5 layers of indirection) to a single intelligent agent with direct tool access, dynamic tool discovery, and tool-level human-in-the-loop (HITL) confirmation. The goal: an AI assistant that reasons and acts instead of routing and following scripts.

The Problem

The current architecture turns a powerful LLM into a dumb router. When a user says "archive the newsletters," this is what happens:

  1. The AI identifies the newsletters through LLM intelligence (reading sender names).
  2. That intelligence is lost when the request is routed to emailActionWorkflow.
  3. The workflow searches label:Newsletter, a label that doesn't exist, and fails.
  4. The user gets "I couldn't find any emails matching the newsletter category."

This is a symptom of a systemic issue: over-structured agent design that strips the AI of its reasoning ability.

The Solution

One agent. Direct tool access. Let the AI think.

| Dimension | Current | Target |
| --- | --- | --- |
| Agents | 28 (17 active) | 1 main + 2-3 specialists |
| Routing | 170 lines of instruction rules | LLM reasoning + tool descriptions |
| Tools | Split across sub-agents | All tools on main agent (via ToolSearchProcessor) |
| HITL | 450-line workflow state machines | Tool-level suspend() + autoResumeSuspendedTools |
| Layers | 5 (request → orchestrator → sub-agent → workflow → step → API) | 2 (request → agent → tool → API) |

Why This Will Be Better

Anthropic: "Find the simplest solution possible, and only increase complexity when needed." (Building Effective Agents)

OpenAI: "A single agent can handle many tasks by incrementally adding tools." (Building Agents Guide)

Microsoft: "Don't assume role separation requires multiple agents. Often, a single agent using persona switching and context-aware policies can satisfy role-based behavior." (Cloud Adoption Framework)

LangChain: "Single agents are simpler to build, reason about, and debug." (Multi-Agent Architecture Guide)

Multi-agent is justified only for: security/compliance boundaries, separate teams, or proven single-agent failures. Consul has none of these.


Current Architecture Analysis

Layer Map

    User Message
      ↓ (1) Chat Endpoint (chat-with-logging.ts)
      ↓ (2) Web Orchestrator (170 lines of routing rules)
      ↓ (3) Sub-Agent (gmailQueryAgent, gmailActionAgent, etc.)
      ↓ (4) HITL Workflow (emailActionWorkflow: 1256 LOC state machine)
      ↓ (5) Tool → Gmail API

5 layers. 5 handoff points. 5 places where intelligence gets lost.

Registered Components

Agents (24):

  • 3 orchestrators (web, iMessage, email)
  • 10 domain agents (Gmail query/action, Calendar query/action, Drive query/action, Slack query/action, Contacts, Docs)
  • 5 utility agents (scheduling, triage, sales, planning, analysis)
  • 6 support agents (validator, reminder, onboarding, iMessage, etc.)

Workflows (13):

  • 4 HITL approval workflows (email action, calendar action, drive action, slack action)
  • 2 conversational workflows (compose email, schedule meeting)
  • 7 background workflows (daily brief, email triage, sales processing, etc.)

Tools (80+):

  • Gmail: 19 tools
  • Calendar: 14 tools
  • Drive: 20 tools
  • Slack: 13 tools
  • Contacts: 9 tools
  • Docs: 6+ tools
  • Orchestration: 10+ tools (compose, schedule, complex task, resume, etc.)
  • Misc: reminders, feedback, etc.

Where Intelligence Gets Lost

  1. Orchestrator routing (web-orchestrator-agent.ts:176-269): 170 lines of hardcoded if-then rules that turn reasoning into pattern matching.

  2. Query/Action agent split: Forces the orchestrator to decide read vs. write before understanding the request. "Find newsletters then archive them" requires two different agents.

  3. HITL workflows strip AI agency: email-action-workflow.ts is 1256 lines of manual state machine. When the search returns 0 results, the workflow throws an error — it can't reason about alternatives.

  4. smartInboxTool data gap (gmail-tools.ts:2438-2566): Marketing emails get count-only (maxResults: 1), details skipped (if (category === "Marketing") continue), newsletters hardcoded to 0.

  5. Label-based search disconnect: The AI identifies newsletters by sender inference, but the workflow searches by Gmail label (which doesn't exist).

What Works Well (Keep)

  • Tool implementations: Well-isolated, clear input/output schemas, good error handling
  • Three-tier token resolution (token-resolver.ts): Input → RequestContext → Supabase with auto-refresh
  • Middleware pipeline: Auth → body parsing → context population → datetime — clean and necessary
  • MessageDeduplicator processor: Prevents OpenAI Responses API duplicate errors
  • Connected integrations cache: 5-min TTL prevents 6 Supabase queries per request
  • Memory configurations: Web (lastMessages=0 for useChat) vs iMessage (lastMessages=15) is intentional
  • Background workflows: Daily brief, email triage, sales processing — these are batch jobs, not interactive
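
To make the connected-integrations cache concrete, here is a minimal TTL cache sketch. It is illustrative only: `createTtlCache`, the injected clock, and the loader signature are hypothetical names rather than the project's actual implementation, and only the 5-minute TTL comes from the list above.

```typescript
// Minimal TTL cache sketch (hypothetical; not the project's real cache).
// The clock is injectable so expiry can be exercised without waiting.
function createTtlCache<V>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: V; expiresAt: number }>();
  return {
    // Return the cached value while it is fresh; otherwise reload and store it.
    getOrLoad(key: string, load: () => V): V {
      const hit = entries.get(key);
      if (hit !== undefined && hit.expiresAt > now()) return hit.value;
      const value = load();
      entries.set(key, { value, expiresAt: now() + ttlMs });
      return value;
    },
  };
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
```

Keyed by user ID, a cache like this would run the integration lookup at most once per TTL window instead of on every request.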

Target Architecture

Single Agent with Dynamic Tools

    User Message
      ↓ Chat Endpoint (chat-with-logging.ts, keep as-is)
      ↓ Consul Agent (single agent, all tools accessible via ToolSearchProcessor)
      ↓ Tool calls directly (search, archive, send, create event, etc.)
      ↓ Tool-level suspend() for confirmations
      ↓ User confirms → tool resumes → API call

2-3 layers instead of 5. The AI reasons about what tools to use.

Core Agent Definition

    import { Agent } from "@mastra/core/agent";
    import { Memory } from "@mastra/memory";
    import { ToolSearchProcessor, TokenLimiter, ToolCallFilter } from "@mastra/core/processors";
    import { MessageDeduplicator } from "../processors/message-deduplicator";

    // Core tools (always available) and all domain tools (for ToolSearchProcessor
    // discovery), both exported from the tool index file (tools/index.ts)
    import { coreTools, allDiscoverableTools } from "../tools";

    // imessageTools, userPreferencesSchema, and buildInstructions are assumed to be
    // defined elsewhere in the project; their imports are omitted here.

    export const consulAgent = new Agent({
      id: "consul-agent",
      model: "openai/gpt-4.1-mini",

      instructions: async ({ requestContext }) => {
        // Dynamic, context-aware instructions (see detailed spec below)
        return buildInstructions(requestContext);
      },

      // Core tools always loaded (reminders, feedback: used every session)
      tools: async ({ requestContext }) => {
        return {
          ...coreTools,
          // Channel-specific tools
          ...(requestContext?.get("channel") === "imessage" ? imessageTools : {}),
        };
      },

      memory: new Memory({
        options: {
          lastMessages: false, // useChat manages history client-side
          semanticRecall: false,
          workingMemory: {
            enabled: true,
            scope: "resource", // Cross-session user preferences
            schema: userPreferencesSchema, // Structured merge semantics
          },
        },
      }),

      inputProcessors: [
        new MessageDeduplicator(), // CRITICAL: OpenAI Responses API dedup
        new ToolSearchProcessor({
          tools: allDiscoverableTools, // ~80 tools discoverable on demand
          search: { topK: 8, minScore: 0.1 },
        }),
        new ToolCallFilter({
          exclude: ["compose-email", "schedule-meeting"],
        }),
        new TokenLimiter(127000), // Must be LAST
      ],

      defaultOptions: {
        autoResumeSuspendedTools: true, // Conversational HITL
        maxSteps: 10,
        providerOptions: {
          openai: { parallel_tool_calls: false },
        },
      },
    });

Tool Organization

Core Tools (Always Loaded)

These are high-frequency tools that don't need discovery:

    - createReminder, listReminders, cancelReminder, editReminder
    - submitFeedback
    - resolveRecipient (contact lookup, needed for many flows)

Discoverable Tools (Via ToolSearchProcessor)

Organized by domain, loaded on demand:

    Gmail (19 tools):
      smartInbox, getEmail, getThread, searchEmails, resolveEmail, listDrafts, listLabels
      sendEmail*, createDraft, updateDraft, sendDraft
      archiveEmails*, applyLabel, removeLabel, markAsRead, markAsUnread, star
      trashEmails*, deleteEmail, deleteDraft

    Calendar (14 tools):
      getEvent, listEvents, listCalendars, searchEvents, getFreeBusy
      createEvent*, quickAddEvent, updateEvent*
      moveEvent, setEventColor, addAttendees, removeAttendees
      deleteEvent*, cancelEvent*

    Drive (20 tools): [list, search, download, upload*, create*, share*, delete*, ...]
    Slack (13 tools): [listChannels, getHistory, searchMessages, sendMessage*, ...]
    Contacts (9 tools): [list, get, search, create*, update*, delete*, ...]
    Docs (6+ tools): [get, search, create*, update*, ...]

    * = tools with suspend() for HITL confirmation

The Agent Discovers and Loads Tools Dynamically

When user says "archive my newsletters":

  1. Agent calls search_tools("archive email") → finds archiveEmails, searchEmails
  2. Agent calls load_tool("archiveEmails") and load_tool("searchEmails")
  3. Agent calls searchEmails({ query: "from:sam OR from:ai-collective OR from:zulie" })
  4. Agent calls archiveEmails({ messageIds: [...] }) → tool suspends with preview
  5. User confirms → tool resumes → emails archived

The AI reasons about how to find the newsletters (by sender, since it identified them earlier) instead of blindly searching label:Newsletter.
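
As a small illustration of the sender-based search in step 3, the sketch below turns a list of inferred senders into a Gmail query string. `buildSenderQuery` is a hypothetical helper, not an existing function in the codebase; in practice the agent may simply write the query itself. The `from:` and `OR` operators are standard Gmail search syntax.

```typescript
// Hypothetical helper: build a Gmail search query matching any of the given senders.
function buildSenderQuery(senders: string[]): string {
  // Trim, drop empties, and de-duplicate while preserving order.
  const unique = Array.from(
    new Set(senders.map((s) => s.trim()).filter((s) => s.length > 0)),
  );
  return unique.map((s) => `from:${s}`).join(" OR ");
}
```

For the senders in the example above, this produces the same query the agent passes to searchEmails.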

HITL: Tool-Level Suspend/Resume

Replace 1256-line HITL workflows with tool-level suspend/resume.

Example: Archive Emails Tool

    import { createTool } from "@mastra/core/tools";
    import { z } from "zod";
    // resolveGoogleToken, batchFetchMetadata, and batchModify are existing project helpers.

    export const archiveEmailsTool = createTool({
      id: "archive-emails",
      description: "Archive one or more emails. Removes them from inbox. Suspends for user confirmation before executing.",
      inputSchema: z.object({
        messageIds: z.array(z.string()).describe("Gmail message IDs to archive"),
      }),
      outputSchema: z.object({
        archived: z.number(),
        messageIds: z.array(z.string()),
      }),
      suspendSchema: z.object({
        preview: z.string().describe("Formatted preview of emails to archive"),
        count: z.number(),
        emails: z.array(z.object({
          subject: z.string().optional(),
          from: z.string().optional(),
        })),
      }),
      resumeSchema: z.object({
        approved: z.boolean(),
      }),
      execute: async (input, context) => {
        const { resumeData, suspend } = context?.agent ?? {};
        const accessToken = await resolveGoogleToken("gmail", undefined, context?.requestContext);

        if (!accessToken) {
          throw new Error("Gmail not connected. Connect your Gmail account in Settings.");
        }

        // First call (no resume data yet): fetch previews and suspend
        if (!resumeData) {
          const emails = await batchFetchMetadata(input.messageIds, accessToken);
          const preview = emails
            .map(e => `- **${e.subject}** from ${e.from}`)
            .join("\n");

          return suspend?.({
            preview,
            count: input.messageIds.length,
            emails: emails.map(e => ({ subject: e.subject, from: e.from })),
          });
        }

        // Resumed but declined: do nothing instead of suspending again
        if (!resumeData.approved) {
          return { archived: 0, messageIds: [] };
        }

        // Approved: execute
        await batchModify(accessToken, input.messageIds, { removeLabelIds: ["INBOX"] });
        return { archived: input.messageIds.length, messageIds: input.messageIds };
      },
    });

With autoResumeSuspendedTools: true, the conversation flows naturally:

    User: "Archive all the newsletters"
    Agent: [searches emails, finds 7] "I found 7 newsletters to archive:
      - **AI Weekly** from The AI Collective
      - **Product Update** from Zulie
      - ... (5 more)
      Archive these 7 emails?"
    User: "Yes"
    Agent: [auto-resumes tool] "Done! Archived 7 newsletters."

No workflow. No state machine. The AI handles the entire flow.

Which Current Workflows Convert to Tool Suspend

| Current Workflow | Lines | Replacement | Rationale |
| --- | --- | --- | --- |
| emailActionWorkflow | 1256 | Tool-level suspend on action tools (archive, trash, delete, label) | Simple approve/decline. AI handles search/preview. |
| calendarActionWorkflow | 1779 | Tool-level suspend on mutating calendar tools (create, update, delete) | Same pattern: preview, then confirm. |
| driveActionWorkflow | ~800 | Tool-level suspend on drive mutation tools | Same pattern. |
| slackActionWorkflow | ~600 | Tool-level suspend on sendSlackMessage | Same pattern. |
| composeEmailWorkflow | 849 | Consolidated compose-email tool with multi-turn suspend | Keep as sophisticated tool with suspend/resume for draft preview + edit cycles. |
| scheduleMeetingWorkflow | Inngest | Keep as Inngest function | Complex async scheduling with external availability; genuinely needs a workflow. |

Which Workflows to Keep

| Workflow | Keep? | Why |
| --- | --- | --- |
| dailyBriefWorkflow | Yes | Batch cron job, no user interaction |
| emailTriageWorkflow | Yes | Background classification |
| salesProcessingWorkflow | Yes | Background email handling |
| imessageSendWorkflow | Yes | Gateway integration |
| tagNotificationWorkflow | Yes | Background notification |
| scheduleMeetingFunction (Inngest) | Yes | Complex async scheduling |

These are background/batch workflows, not interactive HITL gates. They stay.

Instructions Design

The agent instructions should be focused on identity and behavior, not routing. Tool descriptions handle the "when to use this tool" question.

    function buildInstructions(requestContext: RequestContext): string {
      const userName = requestContext?.get("userName");
      const userTimezone = requestContext?.get("userTimezone");
      const dateTime = requestContext?.get("currentDateTime");
      const connected = requestContext?.get("connectedIntegrations");
      const agentPrefs = requestContext?.get("agentPreferences");
      const assistantName = agentPrefs?.displayName || "Consul";

      return `You are ${assistantName}, a personal AI executive assistant${userName ? ` for ${userName}` : ""}.

    ## Identity
    Confident, direct, helpful, not robotic. Brief for actions, thorough for questions.
    ${userName ? `Use "${userName.split(" ")[0]}" occasionally, not every message.` : ""}

    ## Context
    ${dateTime ? `Now: ${dateTime.dayOfWeek}, ${dateTime.date} at ${dateTime.time} (${dateTime.timezone})` : `Timezone: ${userTimezone}`}

    ## Connected Services
    ${connected?.gmail ? "Gmail: Connected" : "Gmail: Not connected"}
    ${connected?.calendar ? "Google Calendar: Connected" : "Google Calendar: Not connected"}
    ${connected?.drive ? "Google Drive: Connected" : "Google Drive: Not connected"}
    ${connected?.slack ? "Slack: Connected" : "Slack: Not connected"}
    ${connected?.contacts ? "Google Contacts: Connected" : "Google Contacts: Not connected"}
    ${connected?.docs ? "Google Docs: Connected" : "Google Docs: Not connected"}

    ## How to Work
    - Use **search_tools** to find tools for the task, then **load_tool** to make them available.
    - For read operations, call tools directly. For write/delete operations, tools will ask for confirmation.
    - When searching for emails, be creative: search by sender, subject, or date, not just labels.
    - If a tool search returns no results, try different keywords.
    - Chain tool calls to accomplish complex tasks (you have up to 10 steps).
    - ALWAYS fetch fresh data; never reuse stale data from conversation history.

    ## Confirmation Behavior
    - Emails, calendar events, file operations: tools automatically suspend for user confirmation.
    - You'll see a preview. Present it clearly to the user and ask if they want to proceed.
    - When the user confirms, the tool resumes automatically.

    ## Formatting
    - Use markdown for clarity: **bold** for emphasis, bullet lists for items.
    - Keep responses concise. "Done! Archived 7 newsletters." not "I have successfully completed the archival process for 7 newsletter emails."

    ## Reminders
    - Pass times in LOCAL format (${userTimezone}) with NO 'Z' suffix.

    ## Errors
    - Service not connected: "I don't have access to [service]. Connect it in **Settings**."
    - Tool fails: friendly message + offer to retry.
    `;
    }

Key change: No routing rules. No "if gmail connected, route to gmailQueryAgent." The agent sees what's connected and uses tools accordingly. Tool descriptions tell the AI when each tool is appropriate.


Detailed Implementation Plan

Phase 0: Preparation

Estimated scope: Small. No code changes to production.

0.1 — Create feature branch

    git checkout -b refactor/single-agent-architecture

0.2 — Audit tool descriptions

Review all 80+ tools and ensure each has a clear, specific description that tells the AI when to use it. This is critical for ToolSearchProcessor accuracy.

Good: "Search Gmail for emails matching criteria. Use Gmail query syntax: from:, subject:, is:unread, newer_than:, etc."

Bad: "Search emails"
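
A first pass of this audit could be automated. The sketch below flags descriptions that are probably too thin for search to rank well. Everything here is an assumption for illustration: `findWeakDescriptions`, the `ToolInfo` shape, and the six-word threshold are hypothetical, and a word count is only a rough proxy that does not replace a manual review.

```typescript
// Hypothetical audit helper: flag tool descriptions that are likely too short
// to tell the model when the tool applies.
interface ToolInfo {
  id: string;
  description: string;
}

// Assumed threshold; tune it against real search results.
const MIN_WORDS = 6;

function countWords(text: string): number {
  return text.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

function findWeakDescriptions(tools: ToolInfo[], minWords: number = MIN_WORDS): string[] {
  return tools.filter((t) => countWords(t.description) < minWords).map((t) => t.id);
}
```

Run against the two examples above, the "Bad" description is flagged and the "Good" one passes.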

0.3 — Create tool index file

Create apps/agents/src/mastra/tools/index.ts that exports all tools grouped by domain:

    // Core tools: always loaded
    export const coreTools = {
      createReminder,
      listReminders,
      cancelReminder,
      editReminder,
      resolveRecipient,
      submitFeedback,
    };

    // All discoverable tools: for ToolSearchProcessor
    export const allDiscoverableTools = {
      // Gmail
      smartInbox: smartInboxTool,
      getEmail: getEmailTool,
      getThread: getThreadTool,
      searchEmails: searchEmailsTool,
      // ... all other tools
    };

Phase 1: Build the New Agent

Estimated scope: Medium. Create new files alongside existing ones.

1.1 — Create the Consul agent

New file: apps/agents/src/mastra/agents/consul-agent.ts

Implements the single agent with:

  • Dynamic instructions via requestContext
  • Core tools always loaded
  • ToolSearchProcessor for discoverable tools
  • Memory with working memory (schema-based)
  • autoResumeSuspendedTools: true
  • Existing processors (MessageDeduplicator, ToolCallFilter, TokenLimiter)

1.2 — Add suspend/resume to write tools

Modify existing tool files to add suspendSchema, resumeSchema, and suspend logic to write/delete tools:

| Tool File | Tools to Modify |
| --- | --- |
| gmail-tools.ts | sendEmail, archiveEmails (new consolidated), trashEmails (new), applyLabel |
| google-calendar-tools.ts | createEvent, updateEvent, deleteEvent, cancelEvent |
| google-drive-tools.ts | uploadFile, deleteFile, shareFile, trashFile |
| slack-tools.ts | sendSlackMessage |

Pattern for each:

    // Add to existing tool definition:
    suspendSchema: z.object({ preview: z.string(), count: z.number() }),
    resumeSchema: z.object({ approved: z.boolean() }),

    // Wrap execute with suspend logic:
    execute: async (input, context) => {
      const { resumeData, suspend } = context?.agent ?? {};
      // First call: show a preview and wait for the user
      if (!resumeData) {
        const preview = await buildPreview(input);
        return suspend?.({ preview, count: input.items.length });
      }
      // Declined: return without re-suspending (shape must match the tool's outputSchema)
      if (!resumeData.approved) {
        return { cancelled: true };
      }
      // Approved: original execute logic
      return await originalLogic(input, context);
    },
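
Because the same three branches (first call, declined, approved) recur in every write tool, the pattern can be factored into a wrapper. The sketch below is a self-contained simulation, not Mastra code: `withConfirmation`, `ToolRunContext`, and `Outcome` are stand-in names defined here, and the real `suspend`/`resumeData` come from the tool's execution context. It is written synchronously for brevity, whereas real tool executors are async.

```typescript
// Stand-in types for illustration only; these are not Mastra types.
interface Preview {
  preview: string;
  count: number;
}

interface ToolRunContext {
  resumeData?: { approved: boolean };
  suspend: (payload: Preview) => { suspended: true; payload: Preview };
}

type Outcome<R> =
  | { suspended: true; payload: Preview }
  | { suspended: false; declined: true }
  | { suspended: false; declined: false; result: R };

function withConfirmation<I, R>(
  buildPreview: (input: I) => Preview,
  run: (input: I) => R,
) {
  return (input: I, ctx: ToolRunContext): Outcome<R> => {
    // First call: nothing to resume from, so show a preview and pause.
    if (!ctx.resumeData) return ctx.suspend(buildPreview(input));
    // Resumed with a "no": stop without touching anything.
    if (!ctx.resumeData.approved) return { suspended: false, declined: true };
    // Resumed with a "yes": run the original logic.
    return { suspended: false, declined: false, result: run(input) };
  };
}
```

The key property is that a declined resume returns immediately, so the user is never asked the same question twice.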

1.3 — Create consolidated archive/trash/organize tools

Instead of separate removeInboxLabelTool, markAsReadTool, etc., create higher-level tools:

    // New: Consolidates archive, mark read, mark unread, star, unstar
    export const organizeEmailsTool = createTool({
      id: "organize-emails",
      description: "Organize emails: archive, mark read/unread, star/unstar. Suspends for confirmation on bulk operations.",
      inputSchema: z.object({
        messageIds: z.array(z.string()),
        action: z.enum(["archive", "markRead", "markUnread", "star", "unstar"]),
      }),
      // ... suspend for bulk, execute directly for single
    });

    // New: Consolidates trash and delete
    export const deleteEmailsTool = createTool({
      id: "delete-emails",
      description: "Move emails to trash or permanently delete. Always suspends for confirmation.",
      inputSchema: z.object({
        messageIds: z.array(z.string()),
        permanent: z.boolean().default(false),
      }),
      // ... always suspend (destructive)
    });
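
One way to fill in the elided body of organize-emails is to map each action onto Gmail label changes and decide separately whether to ask first. The sketch below is an assumption-laden illustration: `planOrganize`, `LABEL_CHANGES`, and the "two or more messages counts as bulk" threshold are hypothetical. INBOX, UNREAD, and STARRED are Gmail system label IDs.

```typescript
type OrganizeAction = "archive" | "markRead" | "markUnread" | "star" | "unstar";

interface LabelChange {
  addLabelIds?: string[];
  removeLabelIds?: string[];
}

// Each action is expressed as adding or removing a Gmail system label.
const LABEL_CHANGES: Record<OrganizeAction, LabelChange> = {
  archive: { removeLabelIds: ["INBOX"] },
  markRead: { removeLabelIds: ["UNREAD"] },
  markUnread: { addLabelIds: ["UNREAD"] },
  star: { addLabelIds: ["STARRED"] },
  unstar: { removeLabelIds: ["STARRED"] },
};

// Assumed policy: a single message runs directly, anything larger asks first.
const BULK_THRESHOLD = 2;

function planOrganize(action: OrganizeAction, messageIds: string[]) {
  return {
    change: LABEL_CHANGES[action],
    needsConfirmation: messageIds.length >= BULK_THRESHOLD,
  };
}
```

Keeping the mapping in a table means a new action only needs one new entry, and the confirmation policy can be tuned without touching it.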

1.4 — Refactor composeEmailTool

The current compose email workflow (849 lines) becomes a sophisticated tool with multi-turn suspend:

    export const composeEmailTool = createTool({
      id: "compose-email",
      description: "Compose and send an email. Handles recipient lookup, AI drafting, preview, edits, and sending. Suspends for user to review draft before sending.",
      inputSchema: z.object({
        recipient: z.string().describe("Recipient name or email"),
        contentHint: z.string().describe("What the email should be about"),
        mode: z.enum(["compose", "reply", "reply_all"]).default("compose"),
        replyToMessageId: z.string().optional(),
      }),
      suspendSchema: z.object({
        draft: z.object({
          to: z.string(),
          subject: z.string(),
          body: z.string(),
        }),
        message: z.string(),
      }),
      resumeSchema: z.object({
        approved: z.boolean(),
        edits: z.object({
          subject: z.string().optional(),
          body: z.string().optional(),
        }).optional(),
      }),
      execute: async (input, context) => {
        const { resumeData, suspend } = context?.agent ?? {};

        if (!resumeData) {
          // First call: resolve recipient, generate draft, suspend for preview
          const recipient = await resolveRecipient(input.recipient, context);
          const draft = await generateDraft(input, recipient, context);
          return suspend?.({
            draft: { to: recipient.email, subject: draft.subject, body: draft.body },
            message: "Review this email before sending:",
          });
        }

        if (!resumeData.approved) {
          return { sent: false, reason: "User declined" };
        }

        // Apply edits if any, then send.
        // NOTE: resumeData only carries `approved` and `edits`, so applyEdits must
        // also be able to reach the draft shown in the preview (how that draft is
        // persisted across the suspend is still to be decided).
        const finalDraft = applyEdits(resumeData);
        await sendViaGmail(finalDraft, context);
        return { sent: true, to: finalDraft.to, subject: finalDraft.subject };
      },
    });
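
The merge step itself is simple once the previewed draft is available. The sketch below shows one possible shape for it, taking the draft explicitly as a first argument; this two-argument signature is an assumption and differs from the single-argument `applyEdits(resumeData)` call above.

```typescript
interface Draft {
  to: string;
  subject: string;
  body: string;
}

interface DraftEdits {
  subject?: string;
  body?: string;
}

// Hypothetical merge: user edits override the drafted subject and body.
// The recipient is left untouched because the resume schema has no `to` field.
function applyEdits(draft: Draft, edits?: DraftEdits): Draft {
  return {
    to: draft.to,
    subject: edits?.subject ?? draft.subject,
    body: edits?.body ?? draft.body,
  };
}
```

Using `??` rather than `||` means an intentionally empty subject or body supplied by the user is respected instead of being replaced by the original.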

1.5 — Fix the smartInboxTool

Address the root bug: return marketing/newsletter message IDs, not just counts.

    - // MARKETING: Just get count (don't need details)
    - fetchMessageIds(`label:${marketingTag.gmail_label_id}`, 1)
    + // MARKETING: Get message IDs (needed for follow-up actions)
    + fetchMessageIds(`label:${marketingTag.gmail_label_id}`, 20)

    - if (category === "Marketing") continue; // Skip marketing details
    + // Include marketing details so agent can act on them

    - newsletters: 0, // Could add separate newsletter detection
    + newsletters: newsletterCount, // Detected via List-Unsubscribe header
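
The `newsletterCount` in the diff is described as coming from the List-Unsubscribe header. A minimal sketch of that detection follows, assuming message headers arrive as name/value pairs (the shape the Gmail API uses for message payload headers). `isLikelyNewsletter` and `countNewsletters` are hypothetical names, and the heuristic will also match other bulk mail such as marketing messages, so it cannot separate the two categories on its own.

```typescript
interface EmailHeader {
  name: string;
  value: string;
}

// Heuristic: bulk mail usually carries a List-Unsubscribe header (RFC 2369).
// Header names are case-insensitive, so compare in lower case.
function isLikelyNewsletter(headers: EmailHeader[]): boolean {
  return headers.some(
    (h) => h.name.toLowerCase() === "list-unsubscribe" && h.value.trim().length > 0,
  );
}

function countNewsletters(messages: { id: string; headers: EmailHeader[] }[]): number {
  return messages.filter((m) => isLikelyNewsletter(m.headers)).length;
}
```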

Phase 2: Register and Route

Estimated scope: Small. Wire up the new agent.

2.1 — Update Mastra config (index.ts)

Register the new consul agent alongside existing agents (don't remove old ones yet):

    agents: {
      consulAgent, // NEW single agent
      // Keep existing for iMessage and email orchestrators
      imessageOrchestratorAgent,
      emailOrchestratorAgent,
      // Keep background-only agents
      emailTriageAgent,
      salesAgent,
    },

2.2 — Update chat route

Modify chat-with-logging.ts to use the new agent for web chat:

    // Change agent reference from webOrchestratorAgent to consulAgent
    const agent = mastra.getAgent("consul-agent");

2.3 — Update frontend client

Update apps/web/lib/mastra/client.ts to point to the new agent ID if needed.

Phase 3: Handle Channel Differences

Estimated scope: Medium. Ensure iMessage and email channels still work.

3.1 — iMessage orchestrator

The iMessage orchestrator has unique requirements:

  • Server-side memory (lastMessages: 15)
  • sendResponse tool for replying via gateway
  • One-tool-call-at-a-time for message ordering

Options:

  • Option A (Recommended): Create imessageConsulAgent that extends the core pattern with iMessage-specific memory and tools. Shares the same ToolSearchProcessor and tool library.
  • Option B: Keep existing iMessage orchestrator but point it at consolidated tools instead of sub-agents.

    // Option A: iMessage variant of the Consul agent
    export const imessageConsulAgent = new Agent({
      id: "imessage-consul-agent",
      model: "openai/gpt-4.1-mini",
      instructions: async ({ requestContext }) => {
        // Assumes buildInstructions is extended to accept an optional options argument
        return buildInstructions(requestContext, { channel: "imessage" });
      },
      tools: async ({ requestContext }) => ({
        ...coreTools,
        sendResponse: sendResponseTool,
        startScheduleMeeting: startScheduleMeetingTool,
      }),
      memory: new Memory({
        options: {
          lastMessages: 15,
          workingMemory: { enabled: true, scope: "resource" },
        },
      }),
      inputProcessors: [
        new MessageDeduplicator(),
        new ToolSearchProcessor({
          tools: allDiscoverableTools,
          search: { topK: 8, minScore: 0.1 },
        }),
        new ToolCallFilter({ exclude: ["compose-email", "schedule-meeting"] }),
        new TokenLimiter(127000),
      ],
      defaultOptions: {
        autoResumeSuspendedTools: true,
        maxSteps: 8,
      },
    });

3.2 — Email orchestrator

The email orchestrator handles inbound email triage. This is a specialized flow that may benefit from staying as a focused agent with a subset of tools. Evaluate after web + iMessage are migrated.

Phase 4: Cleanup

Estimated scope: Medium. Remove dead code.

4.1 — Remove old orchestrators

Once the new agents are stable:

  • Delete agents/orchestrator/web-orchestrator-agent.ts
  • Delete agents/orchestrator/imessage-orchestrator-agent.ts (if Option A)

4.2 — Remove domain agent pairs

  • Delete agents/gmail/gmail-query-agent.ts and gmail-action-agent.ts
  • Delete agents/google-calendar/ query/action pair
  • Delete agents/google-drive/ query/action pair
  • Delete agents/slack/ query/action pair
  • Keep: agents/scheduling/, agents/google-contacts/, agents/google-docs/ (if any contain unique logic not in tools)

4.3 — Remove HITL workflows replaced by tool suspend

  • Delete workflows/hitl/email-action-workflow.ts
  • Delete workflows/hitl/calendar-action-workflow.ts
  • Delete workflows/hitl/drive-action-workflow.ts
  • Delete workflows/hitl/slack-action-workflow.ts
  • Delete workflows/compose-email-workflow.ts

Keep: All background workflows (daily brief, triage, sales, iMessage send, tag notification).

4.4 — Remove unused utility agents

Delete any agent not registered in index.ts:

  • planningAgent, validatorAgent, analysisAgent (if only used by removed orchestrators)

4.5 — Update Mastra config

Remove the deleted agents and workflows from the index.ts registration.

Phase 5: Polish and Optimize

Estimated scope: Small. Fine-tune after migration.

5.1 — Tune tool descriptions

After deploying, monitor which tools get selected incorrectly and refine descriptions. Per Anthropic: "We spent more time optimizing tools than the overall prompt."

5.2 — Tune ToolSearchProcessor parameters

  • Adjust topK (start with 8, may need more or fewer)
  • Adjust minScore (start with 0.1, increase if irrelevant tools load)
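
To build intuition for how these two parameters interact, here is a toy search over tool descriptions. It is not how ToolSearchProcessor ranks tools: the word-overlap score, `scoreTool`, and `searchTools` are all hypothetical stand-ins, and a real ranker will produce different scores. Only the roles of `topK` (cap on results) and `minScore` (relevance floor) carry over.

```typescript
interface ToolEntry {
  name: string;
  description: string;
}

// Toy relevance score: fraction of query words found in the name or description.
function scoreTool(query: string, tool: ToolEntry): number {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const haystack = `${tool.name} ${tool.description}`.toLowerCase();
  const hits = words.filter((w) => haystack.includes(w)).length;
  return hits / words.length;
}

function searchTools(
  query: string,
  tools: ToolEntry[],
  topK: number,
  minScore: number,
): string[] {
  return tools
    .map((tool) => ({ name: tool.name, score: scoreTool(query, tool) }))
    .filter((r) => r.score >= minScore) // relevance floor
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, topK) // cap on how many tools are returned
    .map((r) => r.name);
}
```

Raising `minScore` drops weak matches entirely, while lowering `topK` only trims the tail of an already ranked list, so the two are worth tuning separately.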

5.3 — Add EnsureFinalResponseProcessor

Prevent empty responses when hitting maxSteps:

    new EnsureFinalResponseProcessor(10) // maxSteps = 10

5.4 — Consider observational memory

For long conversations, add observational memory to compress old messages:

    memory: new Memory({
      options: {
        observationalMemory: {
          enabled: true,
          scope: "resource",
          observation: { messageTokens: 30_000 },
          reflection: { observationTokens: 40_000 },
        },
      },
    });

5.5 — KV-Cache optimization

Per Manus team's guidance: cached tokens cost 10x less than uncached. Ensure:

  • System prompt prefix is stable (no timestamps in instructions — pass via tool/context)
  • Context is append-only where possible
  • Tool definitions don't change between steps (ToolSearchProcessor handles this)
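
The stable-prefix point can be sketched as splitting the prompt into a part that never changes between requests and a part that does. The function names and the exact split below are assumptions for illustration; whether the volatile part is appended to the instructions or delivered through a tool or context message is a design decision this sketch does not settle.

```typescript
interface PromptParts {
  stablePrefix: string;
  volatileSuffix: string;
}

// Hypothetical split: per-request values go last, so the leading text is
// identical across requests and remains eligible for prefix caching.
function buildPromptParts(assistantName: string, nowIso: string): PromptParts {
  const stablePrefix = [
    `You are ${assistantName}, a personal AI executive assistant.`,
    "## How to Work",
    "- Use search_tools to find tools, then load_tool to make them available.",
  ].join("\n");
  const volatileSuffix = `## Context\nNow: ${nowIso}`;
  return { stablePrefix, volatileSuffix };
}

function renderPrompt(parts: PromptParts): string {
  return `${parts.stablePrefix}\n\n${parts.volatileSuffix}`;
}
```

A timestamp placed near the top of the instructions would change the prefix on every request, which is the situation the first bullet above warns against.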

Migration Strategy

Parallel Deploy (Safe)

  1. Deploy new agent alongside existing agents
  2. Route web chat to new agent, keep iMessage/email on existing
  3. Monitor for 1-2 weeks
  4. Migrate iMessage channel
  5. Clean up old code

Rollback Plan

Keep old orchestrators registered but unused. If the new agent has issues:

  • Switch chat route back to webOrchestratorAgent (one-line change)
  • No data migration needed — same tools, same APIs, same storage
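
One way to keep both the staged rollout and the rollback to a single switch is to resolve the agent per channel from a small set of flags. This is a sketch under assumptions: `agentForChannel` and the `RolloutFlags` fields are hypothetical, and the agent identifiers are written as they appear in this plan (the exact registered IDs of the old orchestrators may differ).

```typescript
type Channel = "web" | "imessage" | "email";

// Hypothetical rollout flags, for example read from environment or config.
interface RolloutFlags {
  useConsulForWeb: boolean;
  useConsulForIMessage: boolean;
}

function agentForChannel(channel: Channel, flags: RolloutFlags): string {
  if (channel === "web") {
    return flags.useConsulForWeb ? "consul-agent" : "webOrchestratorAgent";
  }
  if (channel === "imessage") {
    return flags.useConsulForIMessage ? "imessage-consul-agent" : "imessageOrchestratorAgent";
  }
  // Email stays on the existing orchestrator until it is evaluated separately.
  return "emailOrchestratorAgent";
}
```

With this shape, step 2 of the parallel deploy is `useConsulForWeb: true`, step 4 flips `useConsulForIMessage`, and a rollback is setting the relevant flag back to `false`.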

Testing Strategy

  1. Tool discovery: Verify ToolSearchProcessor finds correct tools for common queries
  2. HITL flow: Verify suspend/resume works for all write operations
  3. autoResumeSuspendedTools: Verify natural conversation flow for confirmations
  4. Context preservation: Verify working memory persists user preferences
  5. Edge cases: Multi-step requests, error recovery, missing integrations

Files Changed Summary

New Files

    agents/consul-agent.ts: Main single agent
    agents/imessage-consul-agent.ts: iMessage variant (if Option A)
    tools/index.ts: Tool registry with core + discoverable

Modified Files

    index.ts: Register new agents
    routes/chat-with-logging.ts: Point to new agent
    tools/gmail-tools.ts: Add suspend to write tools, fix smartInbox
    tools/google-calendar-tools.ts: Add suspend to write tools
    tools/google-drive-tools.ts: Add suspend to write tools
    tools/slack-tools.ts: Add suspend to sendMessage
    tools/compose-email-tool.ts: Refactor with suspend/resume

Deleted Files (Phase 4)

    agents/orchestrator/web-orchestrator-agent.ts
    agents/gmail/gmail-query-agent.ts
    agents/gmail/gmail-action-agent.ts
    agents/google-calendar/google-calendar-query-agent.ts
    agents/google-calendar/google-calendar-action-agent.ts
    agents/google-drive/google-drive-query-agent.ts
    agents/google-drive/google-drive-action-agent.ts
    agents/slack/slack-query-agent.ts
    agents/slack/slack-action-agent.ts
    workflows/hitl/email-action-workflow.ts
    workflows/hitl/calendar-action-workflow.ts
    workflows/hitl/drive-action-workflow.ts
    workflows/hitl/slack-action-workflow.ts
    workflows/compose-email-workflow.ts

Unchanged Files

    middleware/index.ts: Keep entire middleware pipeline
    lib/token-resolver.ts: Keep three-tier resolution
    lib/token-fetcher.ts: Keep Supabase fetch
    services/*: Keep all services
    processors/message-deduplicator.ts: Keep (critical for OpenAI)
    workflows/daily-brief-workflow.ts: Keep (background)
    workflows/email-triage-workflow.ts: Keep (background)
    workflows/sales-processing-workflow.ts: Keep (background)
    workflows/imessage-send-workflow.ts: Keep (gateway)
    tools/reminder-tools.ts: Keep as core tools
    All tool implementations: Keep (just add suspend where needed)

Key Risks and Mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| ToolSearchProcessor returns wrong tools | Medium: agent uses wrong tool | Tune descriptions, adjust topK/minScore, monitor in production |
| Tool suspend state lost on restart | High: HITL confirmations fail | Ensure LibSQLStore configured for snapshot persistence (already have Turso) |
| Context window overflow with many tools | Medium: degraded responses | TokenLimiter + ToolSearchProcessor keeps context bounded |
| Agent makes mistakes without routing guardrails | Medium: wrong actions taken | Tool-level suspend catches dangerous operations before execution |
| MessageDeduplicator compatibility | High: OpenAI API errors | Port processor as-is, test thoroughly |
| useChat compatibility | High: web chat breaks | Test that autoResumeSuspendedTools works with @ai-sdk/react |
| iMessage channel differences | Medium: different memory needs | Separate iMessage agent variant with its own memory config |

Sources