Agent Refactor

Executive Summary

This branch replaces a multi-agent orchestrator architecture (20 agents, 113 tools, 13 workflows) with a single-agent-with-direct-tools architecture. The core thesis: instead of an LLM deciding which sub-agent to route to (an extra hop costing latency and tokens), all ~96 tools are loaded directly onto one agent, with a processor pipeline handling safety, policy, and intelligence concerns.

Key outcomes:

10 specialized agents deleted, replaced by 1 unified agent factory
7 workflows deleted (4 HITL + 3 orchestration), replaced by tool-level suspend/resume
2,904-line monolithic messaging router decomposed into a 9-stage pipeline
12 new input/output processors for safety, dedup, compaction, and error recovery
19 composable prompt sections with channel-aware rendering
4-layer cascading tool policy pipeline
Per-tool mutation fingerprinting for dedup

Table of Contents

1. Agent Architecture
2. Tool System
3. Processor Pipeline
4. Prompt Architecture
5. Middleware
6. Messaging Gateway
7. Deleted Workflows & Services
8. Web App Changes
9. New Documentation
10. Tone & Voice Refinements

1. Agent Architecture

Before (main): 20 agents

Agent	Model	Purpose
`webOrchestratorAgent`	gpt-4.1-mini	Central router for web chat, delegated to 7 sub-agents + 4 HITL workflows
`imessageOrchestratorAgent`	gpt-4.1-mini	Central router for iMessage/SMS
`planningAgent`	gpt-4o	ReWOO-style task decomposition
`validatorAgent`	gpt-4o	Maker-Checker result validation
`analysisAgent`	gpt-4o-mini	Content extraction/classification
`googleContactsAgent`	gpt-4o-mini	9 contact tools
`googleDocsAgent`	gpt-4o-mini	13 doc tools
`googleDriveQueryAgent`	gpt-4o-mini	6 drive read tools
`googleDriveActionAgent`	gpt-4o-mini	14 drive write tools
`slackQueryAgent`	gpt-4o-mini	7 slack read tools
`slackActionAgent`	gpt-4o-mini	6 slack write tools
`imessageAgent`	gpt-4o-mini	2 iMessage tools
`reminderAgent`	gpt-4o-mini	Reminder tools
+ 7 retained agents	various	email orchestrator, gmail, calendar, scheduling, triage, sales, onboarding

After (branch): 10 agents

Agent	Model	Purpose
`consulAgent` (NEW)	gpt-4.1	Unified web chat agent, ~96 tools loaded directly
`imessageConsulAgent` (NEW)	gpt-4.1-mini	iMessage/SMS variant, same factory
`emailOrchestratorAgent`	(kept)	Inbound email triage routing
`gmailQueryAgent`	(kept)	Used by email orchestrator
`gmailActionAgent`	(kept)	Used by email orchestrator
`googleCalendarQueryAgent`	(kept)	Used by email orchestrator
`googleCalendarActionAgent`	(kept)	Used by email orchestrator
`schedulingAgent`	(kept)	Scheduling workflow
`emailTriageAgent`	(kept)	Email triage workflow
`salesAgent`	(kept)	Sales processing
`onboardingDemoAgent`	(kept)	New user onboarding

Unified Agent Factory (`agents/consul-agent.ts`, 253 lines)

`createConsulAgent(channel)` generates both variants from a single code path:

Setting	Web	iMessage
Model	gpt-4.1 (fallback: gpt-4.1-mini)	gpt-4.1-mini (fallback: gpt-4o-mini)
Memory	lastMessages: false	lastMessages: 10
maxSteps	10	8
autoResumeSuspendedTools	false (UI buttons)	true (auto-resume)
Extra processors	--	ConfirmationGate, SendOnceGuard
Compaction thresholds	0.70 / 0.85	0.55 / 0.70
Semantic recall	topK: 3, messageRange: 2	topK: 2, messageRange: 1
Observational memory	30k / 50k tokens	15k / 30k tokens
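
As a rough illustration of the "single code path" idea, the per-channel differences in the table above can be held in one lookup that a factory reads from. This is a minimal sketch, not the contents of `agents/consul-agent.ts`: `ChannelSettings`, `CHANNEL_SETTINGS`, and `resolveChannelSettings` are hypothetical names, and only a subset of the table's rows is modeled.

```typescript
// Hypothetical shapes; the real factory in agents/consul-agent.ts is not shown in this summary.
type Channel = "web" | "imessage";

interface ChannelSettings {
  model: string;
  fallbackModel: string;
  lastMessages: number | false;
  maxSteps: number;
  autoResumeSuspendedTools: boolean;
  extraProcessors: string[];
  compactionThresholds: [number, number]; // the two thresholds as listed in the table
}

// Values copied from the settings table above.
const CHANNEL_SETTINGS: Record<Channel, ChannelSettings> = {
  web: {
    model: "gpt-4.1",
    fallbackModel: "gpt-4.1-mini",
    lastMessages: false,
    maxSteps: 10,
    autoResumeSuspendedTools: false,
    extraProcessors: [],
    compactionThresholds: [0.7, 0.85],
  },
  imessage: {
    model: "gpt-4.1-mini",
    fallbackModel: "gpt-4o-mini",
    lastMessages: 10,
    maxSteps: 8,
    autoResumeSuspendedTools: true,
    extraProcessors: ["ConfirmationGate", "SendOnceGuard"],
    compactionThresholds: [0.55, 0.7],
  },
};

// Returns a copy so callers cannot mutate the shared table.
function resolveChannelSettings(channel: Channel): ChannelSettings {
  const settings = CHANNEL_SETTINGS[channel];
  return { ...settings, extraProcessors: settings.extraProcessors.slice() };
}
```

A factory built this way would branch on data rather than on `if (channel === ...)` checks, which keeps the two variants from drifting apart.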

Memory is a three-tier system:

Working memory (structured user preferences, resource-scoped)
Semantic recall (cross-thread RAG via LibSQL vector)
Observational memory (auto-summaries via gpt-4.1-nano, resource-scoped)

2. Tool System

+5,282 / -2,866 lines across 34 files in tools/

New Infrastructure

Component	File(s)	Purpose
Tool Registry	`tools/registry.ts` (261 lines)	Single source of truth — flat map of all ~96 tools
Tool Index	`tools/index.ts` (101 lines)	Public API: exports registry, groups, policy, legacy compat
Tool Groups	`tools/groups.ts` (268 lines)	Semantic grouping: `"<service>:<level>"` (read/write/confirm). Supports wildcards (`"gmail:*"`) and exclusions (`"gmail:!confirm"`)
Tool Classification	`tools/tool-classification.ts` (269 lines)	Every tool classified as `read`, `write`, or `confirm`. Plus `MUTATION_FINGERPRINT_FIELDS` and `READ_IDENTITY_FIELDS`
Tool Metadata	`tools/tool-metadata.ts` (1,763 lines)	Per-tool: `schemaDescription`, `promptSummary`, `triggers`, `notToBeConfusedWith`, `parameterGuidance`. Includes `buildDisambiguationMatrix()` for system prompt injection
Policy Pipeline	`tools/policy/` (5 files, 402 lines)	4-layer cascade: Channel → ConnectedIntegrations → Safety → UserOverrides. Resolves allowed tools per request
Mutation Fingerprinting	`tools/mutation/` (4 files, 251 lines)	Per-field fingerprinting with SHA-256, `MutationTracker` with 5-min TTL
Confirmation Helper	`tools/lib/with-confirmation.ts` (64 lines)	`requireConfirmation()` — uses Mastra's native suspend/resume. No-op on iMessage (agent prompt handles it)
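
To make the group selector syntax from the Tool Groups row concrete, here is a minimal sketch of how `"<service>:<level>"`, `"gmail:*"`, and `"gmail:!confirm"` could be expanded. It is not the code in `tools/groups.ts`: `expandSelector` and `SAMPLE_GROUPS` are hypothetical, the service and level assignments of the sample tools are assumed, and `labelEmail` is a placeholder name.

```typescript
type Level = "read" | "write" | "confirm";
const LEVELS: Level[] = ["read", "write", "confirm"];

// Illustrative subset only. Tool names come from this document, except
// "labelEmail", which is a placeholder for some write-level tool.
// Which group each tool belongs to is an assumption.
const SAMPLE_GROUPS: Record<string, string[]> = {
  "gmail:read": ["draftEmail", "smartInboxTool"],
  "gmail:write": ["labelEmail"],
  "gmail:confirm": ["sendEmail", "trashEmail"],
  "slack:confirm": ["sendSlackMessage"],
};

// Expands one selector into a flat list of tool names.
function expandSelector(selector: string, groups: Record<string, string[]>): string[] {
  const sep = selector.indexOf(":");
  if (sep < 0) throw new Error(`Invalid selector "${selector}"`);
  const service = selector.slice(0, sep);
  const levelPart = selector.slice(sep + 1);

  let levels: Level[];
  if (levelPart === "*") {
    levels = LEVELS; // wildcard: every level
  } else if (levelPart.charAt(0) === "!") {
    const excluded = levelPart.slice(1); // exclusion: every level except one
    levels = LEVELS.filter((l) => l !== excluded);
  } else {
    levels = LEVELS.filter((l) => l === levelPart); // exact level
  }

  const tools: string[] = [];
  for (const level of levels) {
    for (const tool of groups[`${service}:${level}`] ?? []) tools.push(tool);
  }
  return tools;
}
```

With the sample data, `expandSelector("gmail:!confirm", SAMPLE_GROUPS)` yields the read and write tools but leaves out `sendEmail` and `trashEmail`.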

Deleted Tools (-2,538 lines)

Tool	Lines	Replaced By
`complex-task-tool.ts`	215	Single agent's direct tool chaining
`compose-email-tool.ts`	567	`draft-email-tool.ts` (271 lines)
`resume-workflow-tool.ts`	551	Mastra native suspend/resume
`scheduling/schedule-meeting-tool.ts`	1,116	`find-available-slots-tool.ts` (388 lines)
`start-schedule-meeting-tool.ts`	89	Direct Inngest workflow trigger

New Tools

Tool	Lines	Purpose
`draft-email-tool.ts`	271	Resolves recipient, generates AI draft, returns preview (classified as "read" — no side effects)
`find-available-slots-tool.ts`	388	Fetches scheduling prefs, resolves attendees, queries FreeBusy, finds available slots
`bulkDismissRemindersTool`	~40	Dismiss multiple reminders at once (in `reminder-tools.ts`)

Modified Tools

Confirmation gates added to all confirm-classified tools using requireConfirmation():

Gmail: sendEmail, sendDraft, trashEmail, batchModifyEmails
Calendar: createEvent, updateEvent, quickAddEvent, addAttendees, removeAttendees, deleteEvent, cancelEvent
Drive: shareFile, updatePermission, removePermission, trashFile
Docs: deleteDocument
Slack: sendSlackMessage
Contacts: deleteContact
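
The gate that these tools share can be pictured roughly as follows. This is a sketch under stated assumptions, not Mastra's API and not the actual `requireConfirmation()` signature: `GateContext`, `suspend`, `resumeData`, and `requireConfirmationSketch` are hypothetical stand-ins for whatever the framework passes to a tool. Only the `preview` payload field and the iMessage no-op behavior come from this document.

```typescript
type Channel = "web" | "imessage";

interface SuspendPayload {
  preview: string; // shown to the user as the confirmation preview
}

// Hypothetical execution context; the real framework context will differ.
interface GateContext {
  channel: Channel;
  resumeData?: { approved: boolean }; // present only when the tool is resumed
  suspend: (payload: SuspendPayload) => void;
}

type GateResult = "proceed" | "suspended" | "rejected";

function requireConfirmationSketch(ctx: GateContext, preview: string): GateResult {
  // No-op on iMessage: the agent prompt handles confirmation there.
  if (ctx.channel === "imessage") return "proceed";

  // First invocation: pause the tool and surface a preview to the UI.
  if (ctx.resumeData === undefined) {
    ctx.suspend({ preview });
    return "suspended";
  }

  // Resumed invocation: act on the user's decision.
  return ctx.resumeData.approved ? "proceed" : "rejected";
}
```

A confirm-classified tool would call the gate at the top of its execute function and perform the side effect only on `"proceed"`.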

Gmail tools (+902 lines): New batchFetchFullMessages(), gzip compression, new fetchSentRepliesTool, awaitingReplyTool, smartInboxTool.

Resolve recipient tool: Now uses tiered search (curated relationships → AI recommendations → broader sources) with early return on high-confidence match.
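
The tiered search with early return might look roughly like this. It is a sketch, not the tool's implementation: `SearchTier`, `resolveRecipient`, and the 0.9 confidence threshold are assumptions chosen for illustration.

```typescript
interface Candidate {
  name: string;
  email: string;
  confidence: number; // 0..1
}

interface SearchTier {
  name: string;
  search: (query: string) => Candidate[];
}

interface Resolution {
  best: Candidate | undefined;
  tiersSearched: string[];
}

// Walks tiers in order and stops as soon as the best candidate so far
// clears the threshold. The default threshold is an assumed value.
function resolveRecipient(query: string, tiers: SearchTier[], threshold: number = 0.9): Resolution {
  let best: Candidate | undefined;
  const tiersSearched: string[] = [];

  for (const tier of tiers) {
    tiersSearched.push(tier.name);
    for (const candidate of tier.search(query)) {
      if (best === undefined || candidate.confidence > best.confidence) best = candidate;
    }
    if (best !== undefined && best.confidence >= threshold) break; // early return
  }
  return { best, tiersSearched };
}
```

The point of the ordering is cost: the cheaper, higher-trust tiers are tried first, and the broader sources are only queried when those tiers produce nothing convincing.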

All tool descriptions rewritten to concise, parameter-focused format matching the metadata registry.

3. Processor Pipeline

12 new processors form a layered safety and intelligence system.

Input Processors (per-step)

#	Processor	Hook	Purpose
1	DateTimeInjector	processInput	Injects date/time into user message (not system prompt) for prompt cache stability
2	MessageDeduplicator	processInput	Prevents OpenAI Responses API duplicate `item_reference` errors
3	ToolPolicyProcessor	processInputStep	Resolves allowed tools via 4-layer policy cascade. Caches after step 0
4	ConfirmationGateProcessor (iMessage only)	processInputStep	Blocks CONFIRM tools unless prior turn communicated with user
5	MutationGuardProcessor	processInputStep	Prevents duplicate mutations via fingerprinting. `warn` for writes, `block` for confirms
6	RepeatCallDetector	processInputStep	Prevents redundant read calls via identity-based dedup
7	SendOnceGuard (iMessage only)	processInputStep	After `sendResponse`, disables all tools and forces empty return
8	ErrorRecoveryProcessor	processInputStep	Truncates oversized tool results (4k/8k chars) + classifies errors with recovery hints
9	StagedCompactionProcessor	processInputStep	Summarize-then-prune via gpt-4.1-nano. Channel-aware thresholds
10	ContextWindowGuard	processInputStep	Last-resort: warns at ~32k remaining, strips tools at ~16k remaining
11	EnsureFinalResponseProcessor	processInputStep	On final step, removes tools and forces response
12	TokenLimiter	processInputStep	Hard truncation safety net. Always LAST
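
The 4-layer cascade that ToolPolicyProcessor resolves (Channel → ConnectedIntegrations → Safety → UserOverrides) can be sketched as a chain of filters, each narrowing the output of the previous one. What each layer actually checks is not described in this summary, so the rules below are placeholders: the channel denylist is assumed, the safety layer is a pass-through, and all type and function names are hypothetical.

```typescript
type Channel = "web" | "imessage";
type Level = "read" | "write" | "confirm";

interface ToolInfo {
  name: string;
  service: string;
  level: Level;
}

interface PolicyRequest {
  channel: Channel;
  connectedServices: string[];
  userDisabledTools: string[];
}

type PolicyLayer = (tools: ToolInfo[], req: PolicyRequest) => ToolInfo[];

// Assumed rule: sendResponse is only meaningful on iMessage.
const CHANNEL_DENYLIST: Record<Channel, string[]> = { web: ["sendResponse"], imessage: [] };

const channelLayer: PolicyLayer = (tools, req) =>
  tools.filter((t) => CHANNEL_DENYLIST[req.channel].indexOf(t.name) < 0);

const connectedIntegrationsLayer: PolicyLayer = (tools, req) =>
  tools.filter((t) => req.connectedServices.indexOf(t.service) >= 0);

// Placeholder: the real safety rules are not described here.
const safetyLayer: PolicyLayer = (tools) => tools;

const userOverridesLayer: PolicyLayer = (tools, req) =>
  tools.filter((t) => req.userDisabledTools.indexOf(t.name) < 0);

// Order matters: later layers can only remove what earlier layers allowed.
const POLICY_LAYERS: PolicyLayer[] = [
  channelLayer,
  connectedIntegrationsLayer,
  safetyLayer,
  userOverridesLayer,
];

function resolveAllowedTools(all: ToolInfo[], req: PolicyRequest): string[] {
  let tools = all;
  for (const layer of POLICY_LAYERS) tools = layer(tools, req);
  return tools.map((t) => t.name);
}
```

Because the result depends only on request-level facts, computing it once and reusing it for later steps is consistent with the "caches after step 0" note in the table.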

Output Processors (post-generation)

Processor	Purpose
ToolResultTrimmer	Head+tail truncation (1.5k/3k chars) before memory saves. Strips verbose fields

Two-Tier Truncation

Layer	When	Limits	Purpose
ErrorRecoveryProcessor	Per-step (LLM view)	4,000 / 8,000 chars	Rich data during current generation
ToolResultTrimmer	Post-generation (memory)	1,500 / 3,000 chars	Lean storage across turns
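
A head+tail truncation helper of the kind ToolResultTrimmer is described as using could look like the sketch below. `truncateHeadTail` and its marker string are hypothetical. This summary does not say what distinguishes the two limits in each tier, so the usage note applies only the lower one, and using head+tail for the per-step tier as well is an assumption.

```typescript
// Keeps the start and end of a long string and replaces the middle with a marker.
// The result never exceeds maxChars.
function truncateHeadTail(
  text: string,
  maxChars: number,
  marker: string = "\n...[truncated]...\n",
): string {
  if (text.length <= maxChars) return text;
  if (maxChars <= marker.length) return text.slice(0, maxChars); // no room for a marker

  const budget = maxChars - marker.length;
  const headLen = Math.ceil(budget / 2);
  const tailLen = budget - headLen;
  const tail = tailLen > 0 ? text.slice(text.length - tailLen) : "";
  return text.slice(0, headLen) + marker + tail;
}
```

Under those assumptions, the per-step view would call `truncateHeadTail(result, 4000)` so the model sees richer data during generation, while the memory path would call `truncateHeadTail(result, 1500)` before saving.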

Mutation Safety Layering

Three distinct processors at different levels:

ToolPolicyProcessor — which tools are available at all
ConfirmationGateProcessor — which tools need prior user communication (iMessage)
MutationGuardProcessor — which mutations have already been executed
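
The third layer relies on per-field fingerprints. A minimal sketch of SHA-256 fingerprinting with a 5-minute TTL tracker follows; it assumes a Node.js runtime, and the field lists, the `fingerprint` and `guardAction` helpers, and the `MutationTracker` internals are illustrative guesses rather than the code in `tools/mutation/`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical subset: which argument fields identify "the same mutation" per tool.
const MUTATION_FINGERPRINT_FIELDS: Record<string, string[]> = {
  sendEmail: ["to", "subject", "body"],
  createEvent: ["summary", "start", "end"],
};

// Hashes only the identifying fields, so incidental arguments do not defeat dedup.
// Note: nested objects are serialized as-is, so their key order matters here.
function fingerprint(tool: string, args: Record<string, unknown>): string {
  const fields = MUTATION_FINGERPRINT_FIELDS[tool] ?? Object.keys(args).sort();
  const canonical = fields.map((f) => [f, args[f] ?? null]);
  return createHash("sha256").update(JSON.stringify([tool, canonical])).digest("hex");
}

class MutationTracker {
  private readonly seen = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 5 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns true if the fingerprint was already recorded within the TTL;
  // otherwise records it and returns false.
  checkAndRecord(fp: string): boolean {
    const t = this.now();
    const previous = this.seen.get(fp);
    if (previous !== undefined && t - previous < this.ttlMs) return true;
    this.seen.set(fp, t);
    return false;
  }
}

// Mirrors the table: duplicates of writes warn, duplicates of confirms block.
function guardAction(level: "read" | "write" | "confirm", isDuplicate: boolean): "allow" | "warn" | "block" {
  if (!isDuplicate || level === "read") return "allow";
  return level === "write" ? "warn" : "block";
}
```

Injecting the clock through the constructor is a testing convenience in this sketch, not something the summary attributes to the real tracker.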

4. Prompt Architecture

19 composable sections assembled by a builder pattern (prompts/builder.ts).

Priority Bands

Range	Category	Sections
0-99	Identity & context	`identity` (0), `context` (10)
100-199	Tool sections	`tool-listing` (100), `tool-call-style` (110), `task-planning` (115), `tool-capability-hints` (120), `contextual-references` (130)
200-299	Capability instructions	`email-composition` (200), `meeting-scheduling` (210)
300-399	Behavioral rules	`confirmation-behavior` (300), `cross-service` (310), `memory-recall` (315), `working-memory` (320), `web-formatting` (330), `imessage-formatting` (330), `response-delivery` (335), `reminders` (340), `greetings` (350)
400-499	Error handling & safety	`errors` (400), `critical-rules` (410)

Key Design Decisions

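Cache-stable system prompt: Date/time is NOT in the system prompt; DateTimeInjector puts it in user messages. This keeps the system prompt prefix stable across requests, which OpenAI's prompt caching depends on (cached input tokens are billed at a discount of 50% or more, depending on the model).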
Channel-aware rendering: Many sections render different content per channel (identity tone, confirmation flow, formatting rules).
Conditional sections: email-composition only renders when Gmail connected, meeting-scheduling when Calendar connected, cross-service when 2+ services connected.
Voice calibration: identity section applies toneStyle (formal/brief/casual/balanced) from agent preferences with channel-specific adjustments.
Disambiguation matrix: tool-capability-hints dynamically builds a disambiguation matrix (which tools are easily confused with which) from tool-metadata.ts for connected services only.
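
The priority bands and conditional sections described above suggest a builder along these lines. This is a sketch, not `prompts/builder.ts`: `PromptSection`, `buildPrompt`, and the bracketed placeholder texts are hypothetical, and only five of the sections are modeled.

```typescript
interface PromptContext {
  channel: "web" | "imessage";
  connectedServices: string[];
}

interface PromptSection {
  id: string;
  priority: number; // lower renders first
  when?: (ctx: PromptContext) => boolean; // omitted means always rendered
  render: (ctx: PromptContext) => string;
}

// Deliberately listed out of order; the builder sorts by priority.
const SAMPLE_SECTIONS: PromptSection[] = [
  { id: "critical-rules", priority: 410, render: () => "[critical-rules]" },
  { id: "identity", priority: 0, render: () => "[identity]" },
  {
    id: "formatting",
    priority: 330,
    // Channel-aware rendering: same slot, different content per channel.
    render: (ctx) => (ctx.channel === "web" ? "[web-formatting]" : "[imessage-formatting]"),
  },
  {
    id: "email-composition",
    priority: 200,
    when: (ctx) => ctx.connectedServices.indexOf("gmail") >= 0,
    render: () => "[email-composition]",
  },
  {
    id: "cross-service",
    priority: 310,
    when: (ctx) => ctx.connectedServices.length >= 2,
    render: () => "[cross-service]",
  },
];

function buildPrompt(sections: PromptSection[], ctx: PromptContext): string {
  return sections
    .filter((s) => s.when === undefined || s.when(ctx))
    .sort((a, b) => a.priority - b.priority) // filter() returned a copy, so this is safe
    .map((s) => s.render(ctx))
    .filter((text) => text.length > 0)
    .join("\n\n");
}
```

Because nothing in the context carries the current time, the same context always produces the same string, which is the property the cache-stable design relies on.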

5. Middleware

Decomposed into 5 focused middleware functions in middleware/index.ts:

Middleware	Scope	Purpose
`authMiddleware`	Global	JWT verification, gateway secret auth, sets userId
`bodyParsingMiddleware`	POST requests	Extracts whitelisted context (22 keys) from request body
`contextPopulationMiddleware`	API + custom routes	Fetches profile, preferences, connected integrations from Supabase (5-min cache)
`dateTimeMiddleware`	Global	Formats current date/time in user's timezone
`sessionLoggingMiddleware`	POST /chat	Records chat session activity (non-blocking)

6. Messaging Gateway

The largest single change: the monolithic router.ts (2,904 lines) is decomposed into a pipeline architecture coordinated by a 197-line thin orchestrator.

New Architecture

A. Channel Plugin System (channels/)

ChannelPlugin interface with capability metadata (supportsTypingIndicator, supportsReactions, supportsEffects, supportsMarkdown)
ChannelRegistry for lifecycle management
Plugins: IMessagePlugin, SMSPlugin, AgentMailPlugin
Replaces hardcoded if (channel === "imessage") with capability checks
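
A capability check in place of a hardcoded channel comparison might look like this sketch. The capability field names come from the list above; the registry methods, the `prepareReply` helper, the crude markdown stripper, and the capability values given to each plugin in the usage note are assumptions.

```typescript
interface ChannelCapabilities {
  supportsTypingIndicator: boolean;
  supportsReactions: boolean;
  supportsEffects: boolean;
  supportsMarkdown: boolean;
}

interface ChannelPlugin {
  id: string;
  capabilities: ChannelCapabilities;
}

// Hypothetical registry; the real ChannelRegistry also manages plugin lifecycle.
class ChannelRegistry {
  private readonly plugins = new Map<string, ChannelPlugin>();

  register(plugin: ChannelPlugin): void {
    this.plugins.set(plugin.id, plugin);
  }

  get(id: string): ChannelPlugin {
    const plugin = this.plugins.get(id);
    if (plugin === undefined) throw new Error(`Unknown channel: ${id}`);
    return plugin;
  }
}

// Very rough markdown stripper, for illustration only.
function stripMarkdown(text: string): string {
  return text.replace(/[*_`#]/g, "");
}

// Decides by capability, not by channel name.
function prepareReply(plugin: ChannelPlugin, text: string): string {
  return plugin.capabilities.supportsMarkdown ? text : stripMarkdown(text);
}
```

For example, an SMS plugin registered with `supportsMarkdown: false` would receive `"hi x"` for the input ``"**hi** `x`"``, and a new channel gets the right behavior by declaring its capabilities rather than by adding another branch.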

B. Pipeline Stages (pipeline/)

IncomingMessage
  → CapabilityResolver    (resolve userId/orgId)
  → EmailDeduplicator     (skip duplicates)
  → ProfileEnricher       (timezone, name, email, EA identity)
  → ContextBuilder        (datetime, identities, scheduling, integrations)
  → RouteResolver         (priority-based bindings)
  → AgentCaller           (HTTP call with retry + presence)
  → ResponseProcessor     (messaging-handled detection, markdown strip)
  → SessionRecorder       (token attribution)
  → ProspectPostProcessor (lead scoring)
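
One way to picture the thin orchestrator is a loop over stage objects that pass a context along and can halt early, as EmailDeduplicator does for duplicates. The sketch below is not the code in `pipeline/`: the `Stage` and `PipelineContext` shapes are hypothetical, only three of the nine stages are stubbed, and the stages are synchronous for brevity even though the real ones presumably perform I/O.

```typescript
interface PipelineContext {
  message: { id: string; channel: string; text: string };
  userId?: string;
  halted?: string; // reason the pipeline stopped early, if any
  trace: string[]; // names of the stages that ran
}

interface Stage {
  name: string;
  run: (ctx: PipelineContext) => PipelineContext;
}

function runPipeline(stages: Stage[], initial: PipelineContext): PipelineContext {
  let ctx = initial;
  for (const stage of stages) {
    if (ctx.halted !== undefined) break; // an earlier stage asked to stop
    ctx = stage.run({ ...ctx, trace: ctx.trace.concat(stage.name) });
  }
  return ctx;
}

// Stubbed stages; each call creates a fresh dedup set.
function createSampleStages(): Stage[] {
  const seenIds = new Set<string>();
  return [
    { name: "CapabilityResolver", run: (ctx) => ({ ...ctx, userId: "user-1" }) },
    {
      name: "EmailDeduplicator",
      run: (ctx) => {
        if (seenIds.has(ctx.message.id)) return { ...ctx, halted: "duplicate" };
        seenIds.add(ctx.message.id);
        return ctx;
      },
    },
    { name: "AgentCaller", run: (ctx) => ctx },
  ];
}
```

Keeping each stage behind the same small interface is what lets the orchestrator stay thin: adding or reordering stages changes the array, not the loop.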

C. Route Binding System (pipeline/bindings/)

Priority	Binding	Action
100	SuspendedSchedulingWorkflow	Resume suspended workflow (email)
90	InngestApproval	Route to Inngest for meeting confirmation
80	CCScheduling	Start new scheduling workflow
50	Prospect	Route to sales-agent
30	EmailChannel	Route to email-orchestrator-agent
30	IMessageChannel	Route to imessage-consul-agent
0	Fallback	Route to consul-agent
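
Priority-based resolution over the table above can be sketched as "sort descending, take the first binding that matches". The priorities and agent names below are copied from the table; the `RouteContext` fields, the match predicates (including treating SMS like iMessage), and the `resume-suspended-workflow` target label are assumptions, and two of the seven bindings are omitted.

```typescript
interface RouteContext {
  channel: "email" | "imessage" | "sms" | "web";
  isProspect: boolean;
  hasSuspendedSchedulingWorkflow: boolean;
}

interface RouteBinding {
  name: string;
  priority: number;
  target: string;
  matches: (ctx: RouteContext) => boolean; // predicates here are illustrative guesses
}

const SAMPLE_BINDINGS: RouteBinding[] = [
  { name: "Fallback", priority: 0, target: "consul-agent", matches: () => true },
  {
    name: "IMessageChannel",
    priority: 30,
    target: "imessage-consul-agent",
    matches: (c) => c.channel === "imessage" || c.channel === "sms",
  },
  { name: "EmailChannel", priority: 30, target: "email-orchestrator-agent", matches: (c) => c.channel === "email" },
  { name: "Prospect", priority: 50, target: "sales-agent", matches: (c) => c.isProspect },
  {
    name: "SuspendedSchedulingWorkflow",
    priority: 100,
    target: "resume-suspended-workflow",
    matches: (c) => c.channel === "email" && c.hasSuspendedSchedulingWorkflow,
  },
];

function resolveRoute(bindings: RouteBinding[], ctx: RouteContext): RouteBinding {
  const ordered = bindings.slice().sort((a, b) => b.priority - a.priority);
  for (const binding of ordered) {
    if (binding.matches(ctx)) return binding;
  }
  throw new Error("No binding matched; a priority-0 fallback is expected");
}
```

The two priority-30 bindings never compete because their predicates are mutually exclusive, so their relative order does not affect the result.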

D. Session Queue (pipeline/session-queue.ts)

Concurrency management for rapid-fire messages per session
4 modes: queue (FIFO), collect (batch with timeout), interrupt (cancel + restart), followup (queue + combine)

E. ReplyDispatcher (lib/reply-dispatcher.ts)

Unified message delivery: dedup (5s window), presence management, tapback delivery, paced multi-message, markdown stripping, iMessage effects

F. Extracted Libraries (lib/)

agentmail-history.ts — AgentMail conversation fetcher
encryption.ts — AES-256-GCM decryption
intent-detection.ts — Reschedule intent detection
markdown.ts — Markdown stripping
scheduling-api.ts — Scheduling workflow HTTP helpers
tool-response.ts — Tool response parsing helpers

7. Deleted Workflows & Services

Deleted Workflows (-5,532 lines)

Workflow	Lines	Replacement
`hitl/calendar-action-workflow.ts`	1,778	Tool-level `requireConfirmation()` + `autoResumeSuspendedTools`
`hitl/email-action-workflow.ts`	1,218	Same
`hitl/drive-action-workflow.ts`	689	Same
`hitl/slack-action-workflow.ts`	449	Same
`compose-email-workflow.ts`	848	Single agent chains `draftEmail` → `sendEmail` directly
`complex-task-workflow.ts`	501	Single agent with `maxSteps: 10` and direct tool access
`schedule-meeting-workflow.ts`	49	Direct Inngest trigger

Deleted Services (-983 lines)

Service	Lines	Replacement
`approval-service.ts`	557	Mastra native tool suspend/resume
`artifact-approval-handler.ts`	426	Tool-level suspend/resume

Deleted Utilities (-523+ lines)

Utility	Lines	Reason
`utils/step-executor.ts`	313	Only used by deleted `complex-task-workflow.ts`
`utils/variable-resolver.ts`	~100+	Only used by `step-executor.ts`
`types/planning.ts`	318	Type definitions for deleted plan/execute pattern

Retained Workflows (unchanged)

emailTriageWorkflow, dailyBriefWorkflow, salesProcessingWorkflow, tagNotificationWorkflow, imessageSendWorkflow
Inngest workflows (email scheduling, schedule meeting, relationships)

8. Web App Changes

Chat Interface (`chat-interface.tsx`)

Added data-tool-call-suspended rendering: Displays confirmation previews from suspended tools (HITL). Shows suspendPayload.preview or suspendPayload.message as assistant message.
Simplified tool activity display: Removed streamingToolName state, onData callback parsing, and associated useEffect. Now relies solely on useToolActivity hook.

Tool Display Names

Added: draft-email, find-available-slots
Renamed: get-freebusy → get-free-busy, cancel-reminder → dismiss-reminder
Removed: execute-complex-task, executeComplexTask

Relationships Components

Refactored 4 dialog components to use React key prop pattern instead of useEffect for form state sync.

9. New Documentation

File	Lines	Purpose
`docs/AGENT_REFACTOR_PLAN.md`	970	Comprehensive refactor plan with architecture analysis, problem statement, and phased implementation
`docs/PRODUCT_OVERVIEW.md`	234	Product documentation
`docs/TOOL_SUSPENSION_COMPLETE.md`	218	Completed tool-level HITL implementation docs
`docs/TOOL_SUSPENSION_IMPLEMENTATION.md`	95	Technical implementation guide for tool suspension

10. Tone & Voice Refinements

Across all remaining agents, exclamation marks and corporate pleasantries have been replaced with professional, direct language:

Agent	Before	After
Onboarding	"Keep the experience magical"	"Demonstrate practical utility"
Onboarding	"Show the user how valuable you can be"	"Let the results speak for themselves"
Scheduling	"Happy to help find a time, here are some available:"	"Here are some available times:"
Email Composer	"Got it! Here are some updated times"	"Understood. Here are some updated times"
Email Triage	"Thanks for reaching out!"	"Thank you for reaching out."
iMessage Greetings	"What do you need?"	"How can I be of assistance?"
iMessage Formatting	(none)	Multi-bubble calendar events with meet links

Commit History

Hash	Message
`c07dcea`	refactor(agents): single-agent architecture with intelligence improvements
`b496eb6`	refactor(agents): harden processor pipeline, fix tool classification, and improve tool disambiguation
`5389033`	refactor(agents): improve iMessage greeting tone and add multi-bubble calendar formatting