ContentEngine — Executive System Overview
Author: Alton Wells
Date: March 2026
Status: Final Architecture Specification
Consolidates: Technical Specification v3 + Addendums #1, #2, #3
Table of Contents
- Executive Summary
- Technology Stack
- Data Architecture
- Master Workflow
- Agent System
- Strategy Layer
- Content Layer
- Production Layer
- Content Type Registry
- Refresh Queue System
- Observability & Audit Trail
- Cost Model
- Risk Matrix
- Future: Filesystem-as-Context
1. Executive Summary
ContentEngine is an autonomous AI content production system that replaces manual content marketing workflows with a three-layer agentic pipeline. The system is built on Mastra (TypeScript agent framework), LangExtract (structured document extraction), and Firecrawl (web crawling and sitemap intelligence). It manages the full content lifecycle from competitive intelligence through publication and performance monitoring.
The system is designed for Consul, a B2B SaaS AI executive assistant targeting CEOs and founders at a $200/month price point. Content must build trust with sophisticated, high-ticket decision-makers, making E-E-A-T signals, programmatic SEO validation, and editorial quality non-negotiable.
Core Architectural Principles
- Humans set strategy and approve output. AI executes everything in between. Three human gates govern strategic decisions, editorial review, and final publication.
- No vectors. No embeddings. Structured extractions, hierarchical summaries, and an explicit content relationship graph replace RAG. Stanford's 2025 research shows embedding precision collapses 87% beyond 50K documents. This approach scales without dimensional decay.
- Programmatic SEO validation is non-negotiable. Every published piece must pass all 10 blocking checks in a deterministic SEO validation suite, which also tracks 6 warning checks and 6 informational metrics.
- Context is navigated, not stuffed. Agents traverse a four-level hierarchy (Domain → Cluster → Page → Entity) loading only what they need. Typical planning context: ~14,000 tokens instead of millions.
- Everything is traceable. Every extraction maps to its source location. Every graph edge has provenance. Every agent decision is recorded in a structured audit trail with token usage, tool call sequences, and cost tracking.
2. Technology Stack
2.1 Core Framework
| Component | Technology | Role |
|---|---|---|
| Mastra | @mastra/core (TypeScript) | Agent definitions, workflow orchestration, tool system, suspend/resume for human gates, Hono server generation |
| Vercel AI SDK | Foundation layer under Mastra | Unified model routing, streaming, structured output, tool calling protocol |
| Zod | Schema validation | Input/output schemas for every agent, tool, and workflow step with compile-time type safety |
2.2 Extraction & Intelligence
| Component | Technology | Role |
|---|---|---|
| LangExtract | Python library (FastAPI sidecar) | Structured extraction from unstructured text. Source-grounded entities. Multi-pass extraction for high recall. |
| Firecrawl | Web crawling API/SDK | Competitor sitemap discovery, page crawling, content extraction. Handles JS-rendered pages and rate limiting. |
| Gemini 2.5 Flash | LLM ($0.15/1M tokens) | Extraction model. Fast, cheap, high quality for structured extraction tasks. |
2.3 LLM Providers
Claude Sonnet 4 (anthropic/claude-sonnet-4-20250514) powers all nine Mastra agents across strategy, content, and production layers. Temperature is configured per agent: 0.8 for the Writer (creative generation), 0.4 for the Editor (precision), 0.1–0.3 for production agents (mechanical execution). Gemini 2.5 Flash handles all LangExtract extraction pipelines and hierarchical summary generation.
2.4 Data & Hosting
PostgreSQL (no pgvector) serves as the primary database for all structured data, extraction entities, graph adjacency tables, summaries, and content plans using JSONB for flexible extraction attributes. Drizzle ORM provides type-safe database access. The application layer uses Next.js 15+ for the UI (calendar, editor, dashboards) hosted on Vercel, with agent workers and the LangExtract sidecar running on Railway. Trigger.dev handles durable job scheduling for crawls, extractions, summary regeneration, and post-publish monitoring.
2.5 System Architecture
2.6 External APIs
| API | Purpose |
|---|---|
| Semrush / Ahrefs | Keyword data, search volume, difficulty, SERP features, competitor rankings |
| Google Search Console | Impressions, clicks, CTR, average position per query (OAuth2) |
| Google Indexing / IndexNow | Fast crawl requests for newly published content |
| CMS Adapter | WordPress REST / Sanity / Contentful via adapter pattern |
3. Data Architecture
ContentEngine replaces conventional RAG/vector embeddings with a three-layer memory model. All data is stored in PostgreSQL as structured, queryable records.
3.1 Three-Layer Memory Model
3.2 Layer 1 — Structured Extraction (LangExtract)
Every document entering the system — competitor pages, our content, SERPs, AI Overviews, brand voice samples — is processed through LangExtract extraction pipelines. Raw text becomes structured, source-grounded entities stored in typed Postgres tables. Agents query structured data, not fuzzy similarity scores.
Six extraction classes are defined, each with a fixed schema, dedicated prompt description, and a minimum of three few-shot examples:
| Class | Entity Types | Trigger |
|---|---|---|
| competitor_page | topic, claim, keyword_signal, content_structure, cta, entity_reference | On discovery or change (weekly scan) |
| our_page | Same as competitor + internal_link | On publish or bootstrap |
| serp | serp_result, serp_feature, paa_question | Weekly per tracked keyword |
| ai_overview | aio_claim, aio_structure | When AIO detected in SERP |
| brand_voice | tone_marker, vocabulary_preference, sentence_pattern | On strategy create/update |
| keyword_data | Deferred for MVP | Direct from Semrush API |
Integration decision: LangExtract runs as a Python FastAPI sidecar (not the unofficial Node SDK) because critical features — multi-pass extraction, cross-chunk coreference resolution, and controlled generation via Gemini schema constraints — are Python-only. A circuit breaker pattern (3 consecutive failures opens the circuit) mitigates the Python–TypeScript bridge as a single point of failure.
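The breaker thresholds above (3 consecutive failures; a 60-second recovery window per §11.3) could be sketched in TypeScript as follows. The class shape and names are illustrative, not the production implementation:

```typescript
// Minimal circuit-breaker sketch for calls to the LangExtract sidecar.
// Thresholds come from the spec; everything else is illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold = 3,
    private readonly recoveryMs = 60_000,
  ) {}

  private isOpen(now = Date.now()): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.recoveryMs) {
      // Recovery window elapsed: half-open, allow one trial call.
      this.openedAt = null;
      this.failures = this.threshold - 1;
      return false;
    }
    return true;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error("circuit open: sidecar unavailable");
    try {
      const result = await fn();
      this.failures = 0; // any success resets the failure count
      return result;
    } catch (err) {
      if (++this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Callers that hit an open circuit fail fast instead of piling requests onto a dead sidecar, and the half-open trial call lets the bridge recover without manual intervention.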
3.3 Layer 2 — Hierarchical Document Summaries
A four-level summary tree enables agents to navigate from broad domain context to specific entity-level detail, loading only relevant branches:
Agents start at Level 0 and drill down only into relevant branches. Summaries are regenerated from fresh extractions and timestamped for versioning.
3.4 Layer 3 — Content Relationship Graph
An adjacency table in Postgres with typed edges connects content entities. This replaces vector similarity for all "find related content" operations. Each edge carries a confidence score, provenance (which agent or system created it), and a last-validated timestamp.
| Edge Type | Source → Target | Meaning |
|---|---|---|
| covers_topic | page → topic | Page covers this topic (with depth) |
| targets_keyword | page → keyword | Page targets this keyword (with rank) |
| competes_with | our_page → competitor_page | Pages compete for same keyword |
| outperforms | competitor_page → our_page | Competitor ranks higher for shared keyword |
| gap | topic → (null) | Topic with competitor coverage but zero ours |
| cannibalizes | our_page → our_page | Both target same primary keyword |
| links_to | our_page → our_page | Actual internal link exists |
| should_link_to | our_page → our_page | Agent-recommended linking opportunity |
| child_of | topic → topic_cluster | Hierarchical topic relationship |
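A minimal sketch of how an agent-facing helper might scan these typed edges in place of a vector search. The row shape and field names are assumptions for illustration, not the real table schema:

```typescript
// Illustrative shape of a graph edge row plus a "find gaps" helper.
// Column names are assumptions for this sketch, not the real schema.
type EdgeType =
  | "covers_topic" | "targets_keyword" | "competes_with" | "outperforms"
  | "gap" | "cannibalizes" | "links_to" | "should_link_to" | "child_of";

interface ContentEdge {
  sourceId: string;
  targetId: string | null;  // null target is valid for "gap" edges
  type: EdgeType;
  confidence: number;       // 0–1, decays until re-validated
  provenance: string;       // agent or system that created the edge
  lastValidatedAt: string;  // ISO timestamp
}

// "Find related content" becomes an explicit, auditable edge scan
// rather than an opaque similarity score.
function findGapTopics(edges: ContentEdge[], minConfidence = 0.7): string[] {
  return edges
    .filter((e) => e.type === "gap" && e.confidence >= minConfidence)
    .map((e) => e.sourceId);
}
```

In production this would be a SQL query against the adjacency table; the in-memory version above just shows that the operation is a deterministic filter with an explicit confidence threshold.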
4. Master Workflow
ContentEngine operates across three workflow layers with human control at each transition. Workflows own the sequence ("what happens next"); agents own the execution ("how do I do this step"). This is deterministic orchestration with agentic execution — no supervisor/sub-agent patterns.
4.1 End-to-End Flow
4.2 Human Gates
Each human touchpoint is a Mastra workflow suspension. The workflow suspends and provides a structured payload describing the review task. The Next.js app renders the appropriate UI and calls the resume endpoint when the human completes their action.
| Gate | Human Actions | Est. Time |
|---|---|---|
| Gate 1: Calendar Review | Review AI-generated content plan, approve/reject/edit items, add manual items, set schedule | 15–30 min per cycle |
| Gate 2: Brief Approval | Review outline, confirm direction, adjust scope, approve or request changes | 5–10 min per brief |
| Gate 3: Draft Review | Deep edit, add personal experience and insights, place images, final voice check, approve or reject | 15–30 min per piece |
Design principle: Image placement is manual at Gate 3. The Writer Agent outputs `[IMAGE: description]` markers as placement suggestions. Image selection requires brand aesthetic judgment, rights verification, and contextual sensitivity that current AI image generation does not handle reliably at production quality.
5. Agent System
5.1 Agent Configuration
Every agent's model, temperature, and step budget are configurable at runtime via a database settings table. No model IDs are hardcoded. This allows model swaps without redeployment. Changes take effect on the next agent invocation.
| Agent | Layer | Model | Max Steps | Temp |
|---|---|---|---|---|
| competitive-intelligence | Strategy | Claude Sonnet 4 | 12 | 0.5 |
| search-landscape | Strategy | Claude Sonnet 4 | 10 | 0.5 |
| content-strategy | Strategy | Claude Sonnet 4 | 20 | 0.7 |
| content-brief | Content | Claude Sonnet 4 | 15 | 0.6 |
| writer | Content | Claude Sonnet 4 | 12 | 0.8 |
| editor | Content | Claude Sonnet 4 | 10 | 0.4 |
| final-cleanup | Production | Claude Sonnet 4 | 6 | 0.2 |
| publishing | Production | Claude Sonnet 4 | 15 | 0.1 |
| seo-autofix | Production | Claude Sonnet 4 | 8 | 0.3 |
Temperature rationale: Strategy agents are moderate (creative planning grounded in data). The Writer is highest (creative generation). The Editor is low (precision). Production agents are near-zero (mechanical execution).
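The runtime-resolution behavior described in 5.1 can be sketched as follows. The settings-store shape and field names are assumptions; the point is that model IDs and temperatures are looked up per invocation, never hardcoded:

```typescript
// Sketch of runtime agent configuration resolution. The store is a
// stand-in for the database settings table; names are illustrative.
interface AgentSettings {
  agentId: string;
  modelId: string;  // e.g. "anthropic/claude-sonnet-4-20250514"
  temperature: number;
  maxSteps: number;
}

type SettingsStore = Map<string, Partial<AgentSettings>>;

function resolveSettings(
  store: SettingsStore,
  agentId: string,
  defaults: Omit<AgentSettings, "agentId">,
): AgentSettings {
  // Resolved on every invocation, so a settings update in the DB
  // takes effect on the next agent run without a redeploy.
  const override = store.get(agentId) ?? {};
  return { agentId, ...defaults, ...override };
}
```

Swapping a model or tuning a temperature is then a single row update, picked up by the next invocation exactly as the spec requires.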
5.2 Tool Inventory (26 Tools)
Tools are grouped into four categories. Each agent receives only the tools it needs — a smaller tool surface means fewer irrelevant calls, lower token usage, and easier debugging.
| Category | Count | Tools |
|---|---|---|
| Shared | 8 | readDomainSummary, readClusterSummaries, readPageSummaries, queryExtractions, traverseContentGraph, webSearch, queryContentPlan, readContentStrategy |
| Strategy | 8 | queryCompetitorChanges, queryKeywordPerformance, querySerpExtractions, queryAiOverviewTracking, semrushKeywordResearch, gscPerformanceQuery, addContentPlanItem, updateContentPlanItem |
| Content | 3 | loadApprovedBrief, queryOurPages, verifyInternalLinks |
| Production | 7 | formatForCms, uploadToCms, setMetadata, pingIndexingApi, triggerPostPublishPipeline, schedulePostPublishMonitoring, logToAuditTrail |
5.3 Tool-to-Agent Assignment
5.4 Context Navigation Pattern
The Content Strategy Agent — the most complex decision-maker with 10 tools — demonstrates how agents navigate the hierarchy to build precisely relevant context:
- Read active strategy directives (~500 tokens)
- Read combined domain summary for the big picture (~500 tokens)
- Ingest Competitive Intelligence report from workflow state (~2,000 tokens)
- Ingest Search Landscape report from workflow state (~2,000 tokens)
- Read per-pillar cluster summaries (~3,000 tokens)
- Check current calendar to avoid duplication (~1,000 tokens)
- Traverse graph for gaps, cannibalization, and outperformance (~1,500 tokens)
- Drill into specific page summaries for top candidates (~2,000 tokens)
Total: ~14,000 tokens of precisely relevant context, versus the impossibility of stuffing 847+ full pages into a context window.
6. Strategy Layer
6.1 Competitive Intelligence Agent
Continuously analyzes the competitor database — structured LangExtract data, not raw HTML — and produces actionable competitive insights. Runs weekly for full analysis and daily for a lightweight change digest.
Outputs: Competitor moves (with relevance scoring and source extraction IDs), content gaps (with estimated impact), positioning insights. Every finding includes provenance for traceability.
6.2 Search Landscape Agent
Monitors keyword performance, SERP composition, AI Overview appearances, and search trends using structured SERP data. Runs daily for ranking changes and AI Overview monitoring, weekly for full landscape analysis.
Outputs: Ranking changes with trend classification, AI Overview citation alerts, emerging keyword opportunities, declining content flags, SERP feature opportunities.
6.3 Content Strategy Agent
The brain of the system. Synthesizes both prior agent outputs with content inventory, strategy directives, and graph relationships to produce a prioritized, scheduled content plan.
Scoring model: strategic_alignment × search_opportunity × competitive_urgency × gap_severity. The agent checks the graph for cannibalization before recommending new content, respects human-added calendar items as fixed constraints, and suggests scheduling based on capacity.
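The multiplicative scoring model can be sketched directly. The 0–1 factor ranges and the helper name are our assumptions for illustration:

```typescript
// Sketch of the multiplicative priority score from the spec.
// Factor ranges (0–1) are an assumption for this illustration.
interface OpportunityFactors {
  strategicAlignment: number;  // 0–1
  searchOpportunity: number;   // 0–1
  competitiveUrgency: number;  // 0–1
  gapSeverity: number;         // 0–1
}

function priorityScore(f: OpportunityFactors): number {
  // Multiplicative: a near-zero factor sinks the whole opportunity,
  // unlike an additive model where strong factors could mask it.
  return (
    f.strategicAlignment *
    f.searchOpportunity *
    f.competitiveUrgency *
    f.gapSeverity
  );
}
```

The multiplicative form encodes a veto: a topic with zero strategic alignment scores zero no matter how large the search opportunity.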
Outputs: Prioritized plan items with title, target keyword, content type, rationale, competitive context, schedule date, priority (1–3), estimated impact, internal link targets, and graph evidence. Plan items are written to the database with source: "ai_generated" and the workflow suspends for human calendar review.
7. Content Layer
7.1 Content Brief Agent
For each approved plan item, generates a detailed brief including: complete H2/H3 outline with keyword mapping per section, competitor differentiation strategy (using structured extraction data), internal linking targets from the graph, external resource recommendations, and brand voice requirements. The brief is saved and the workflow suspends for human approval (Gate 2).
7.2 Writer Agent
Receives the approved brief, brand voice extractions, and strategy context as dynamic instructions. Produces a complete first draft that follows the outline exactly, hits word count targets (±10%), integrates keywords naturally, includes all specified links, and marks image placement opportunities as [IMAGE: description] for human insertion.
Deliberate constraint: The Writer has only 4 tools (webSearch, queryExtractions, traverseContentGraph, and loadApprovedBrief for the brief context). Most of its context comes from the pre-assembled brief, not from live queries.
7.3 Editor Agent
Reviews the draft against seven dimensions: language correctness, verbal consistency (terminology, voice), brand voice adherence (compared against extracted patterns), factual grounding (claims verified against brief sources and web), structural quality, link integrity (all internal links verified against published pages), and keyword optimization.
Outputs: Overall pass/needs_revision assessment, specific edits with location, type, severity (critical/suggested), and fix suggestions. Voice consistency score (0–100), readability score, and optionally a revised draft.
7.4 Revision Loop
If the Editor returns "needs_revision" with critical edits, the draft returns to the Writer with the edit list as additional context. Maximum 2 revision cycles. After 2 cycles, the draft proceeds to human review regardless — humans catch what agents miss. This prevents infinite loops while maintaining quality.
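The bounded revision loop reduces to straightforward control flow. The agent calls are stubbed here as function parameters; the shapes are illustrative:

```typescript
// Control-flow sketch of the capped Writer↔Editor revision loop.
// Agent invocations are stubbed as async function parameters.
interface Review {
  verdict: "pass" | "needs_revision";
  edits: string[];
}

async function reviseWithCap(
  draft: string,
  write: (draft: string, edits: string[]) => Promise<string>,
  review: (draft: string) => Promise<Review>,
  maxCycles = 2,
): Promise<{ draft: string; cycles: number }> {
  let current = draft;
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const result = await review(current);
    if (result.verdict === "pass") return { draft: current, cycles: cycle };
    current = await write(current, result.edits);
  }
  // Cap reached: proceed to human review regardless. Gate 3 catches
  // what the agents missed, and infinite loops are impossible.
  return { draft: current, cycles: maxCycles };
}
```

The hard cap is the important part: the loop always terminates with a draft in hand, never with a stuck workflow.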
8. Production Layer
8.1 Human Draft Review (Gate 3)
The most important step in the entire system. The workflow suspends and presents the draft alongside the brief in a side-by-side editor interface. The human reviewer:
- Reads and assesses the full draft against the brief
- Adds personal experience and original insights (the irreplaceable 20%)
- Places and curates images at `[IMAGE:]` markers
- Edits for voice and brand consistency
- Fact-checks statistics and claims
- Approves or rejects (rejection sends back to Content Layer with notes)
8.2 Final Cleanup Agent
A lightweight technical-only pass after human edits. Checks markdown formatting validity, image alt text and dimensions, internal link resolution, heading hierarchy integrity, and consistent list formatting. Does not change tone, wording, or content.
8.3 SEO Validation Engine
Deterministic code, not an LLM. Operates as a CI/CD deployment gate with three severity tiers:
| Tier | Behavior | Checks |
|---|---|---|
| BLOCKING (10) | Publication halted. Auto-fix attempted (max 3 cycles). If still failing, escalate to human. | Meta title, meta description, heading hierarchy, keyword presence, keyword density, internal linking, schema validity, URL/slug, date integrity, structural compliance |
| WARNING (6) | Logged and tracked. Does not block. Contributes to content health score. | Image density, external linking, readability score, content depth coverage, FAQ quality, mobile/performance |
| INFO (6) | Logged for analytics and trending only. | Word count delta, keyword density exact, schema richness, link density ratio, reading time, citation-ready block count |
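Because the engine is deterministic code rather than an LLM, the gate decision is a plain filter over check results. The runner shape below is an assumption for illustration; the tier semantics come from the table:

```typescript
// Sketch of the tiered gate decision. Only BLOCKING failures halt
// publication; the runner shape is illustrative, not the real engine.
type Tier = "blocking" | "warning" | "info";

interface CheckResult {
  name: string;
  tier: Tier;
  passed: boolean;
}

function gateDecision(results: CheckResult[]): {
  publish: boolean;
  blockingFailures: string[];
} {
  const blockingFailures = results
    .filter((r) => r.tier === "blocking" && !r.passed)
    .map((r) => r.name);
  // Warnings and info never block; they feed the health score and
  // analytics instead.
  return { publish: blockingFailures.length === 0, blockingFailures };
}
```

The named blocking failures are what the auto-fix loop (max 3 cycles) and, failing that, the human escalation receive.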
8.4 Publishing & Post-Publish Pipeline
The post-publish pipeline closes the data loop: extraction → summary regeneration (L2 → L1 → L0) → graph edge construction → bidirectional linking → monitoring. Content is not rolled back on pipeline failure — the content is live, and internal data syncs on the next scheduled job.
9. Content Type Registry
ContentEngine produces eight content types, each with a distinct structural template, schema mapping, internal linking profile, image density rule, and refresh cadence.
| Type | Length | Primary Schema | Key Requirements |
|---|---|---|---|
| Blog Post | 1,500–2,500 words | BlogPosting + FAQPage | Min 4 internal links, 3 images, FAQ required |
| Listicle | 1,500–3,000 words | Article + ItemList | 1 image per list item, numbered H2s required |
| Guide | 3,000–5,000 words | Article + FAQPage | Min 8 internal links, 5 images, citation-ready definition block |
| How-To | 2,000–4,000 words | HowTo + FAQPage | 1 image per step, troubleshooting section, HowToStep schema |
| Comparison | 2,000–4,000 words | Article + Product/ItemList | Head-to-head or roundup variants, comparison table required |
| Case Study | 1,500–2,500 words | Article + Review | Customer quote required, min 2 quantified results |
| Pillar Page | 2,500–4,000 words | CollectionPage | Min 15 internal links, updated when child content publishes |
| Glossary | 500–1,200 words | DefinedTerm + FAQPage | Under 50-word definition block, related term cross-links |
Each type carries a structural scaffold that the Writer Agent must follow exactly and the SEO Validation Engine checks at the deployment gate. Every page carries a @graph array of JSON-LD entities including Organization, WebSite, WebPage, the content-type-specific schema, author Person entity, and BreadcrumbList.
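A sketch of assembling the per-page `@graph` array described above. URLs, IDs, and entity details are placeholders, and real entities would carry far more properties:

```typescript
// Sketch of building the JSON-LD @graph for a page. All URLs and
// entity details are placeholder values, not real site data.
function buildGraph(page: {
  url: string;
  title: string;
  authorName: string;
  typeSchema: Record<string, unknown>; // content-type entity, e.g. BlogPosting
}) {
  return {
    "@context": "https://schema.org",
    "@graph": [
      { "@type": "Organization", "@id": `${page.url}#org`, name: "Consul" },
      { "@type": "WebSite", "@id": `${page.url}#website` },
      { "@type": "WebPage", "@id": page.url, name: page.title },
      page.typeSchema, // BlogPosting, HowTo, CollectionPage, etc.
      { "@type": "Person", "@id": `${page.url}#author`, name: page.authorName },
      { "@type": "BreadcrumbList", "@id": `${page.url}#breadcrumbs` },
    ],
  };
}
```

Keeping every entity in one `@graph` with stable `@id` anchors lets the entities reference each other (e.g. author → Person) without duplicating markup across the page.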
9.1 Schema Architecture
9.2 Author Entities & E-E-A-T Strategy
Three author entities are defined, each with a dedicated profile page at /authors/[slug]/:
- Alton Wells — Primary human author with full E-E-A-T credentials
- Stan — Secondary human author
- Auctor — AI editorial assistant, presented transparently. Profile describes the human-AI editorial process: articles are researched and drafted using AI, then reviewed, enriched with original insights, and approved by a human author.
Author profile pages include headshot/avatar, bio, expertise tags linked to pillar pages, social links, recent article feed, and ProfilePage + Person schema markup.
10. Refresh Queue System
Every published page is re-evaluated on a maximum 90-day cycle. Re-evaluation does not mean automatic refresh — it means the system scores refresh urgency and only pages crossing a threshold enter the active queue.
10.1 Urgency Score (0–100)
| Signal | Weight | Data Source | Scoring Logic |
|---|---|---|---|
| Position Decay | 30% | GSC performance | (position drop / 10) × 100, capped at 100 |
| Traffic Decline | 25% | GSC performance | (% impression decline, 30d vs prior 30d) × 100 |
| Content Age | 20% | our_pages.last_updated | (days since update / 90) × 100 |
| Competitive Displacement | 15% | Graph outperforms edges | New/worsened outperforms edges × 25 |
| Factual Staleness | 10% | Extraction claim entities | (dated claims older than current year / total) × 100 |
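The weighted score follows directly from the table. A sketch, assuming each signal has already been normalized to 0–100 per its scoring logic:

```typescript
// Weighted urgency score using the weights from the table above.
// Assumes each signal is pre-normalized to 0–100 by its scoring logic.
interface RefreshSignals {
  positionDecay: number;           // 0–100
  trafficDecline: number;          // 0–100
  contentAge: number;              // 0–100
  competitiveDisplacement: number; // 0–100
  factualStaleness: number;        // 0–100
}

function urgencyScore(s: RefreshSignals): number {
  return (
    s.positionDecay * 0.30 +
    s.trafficDecline * 0.25 +
    s.contentAge * 0.20 +
    s.competitiveDisplacement * 0.15 +
    s.factualStaleness * 0.10
  );
}
```

Pages whose score crosses the queue threshold enter the active refresh queue; the rest are simply re-evaluated on the next cycle.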
10.2 Refresh Tiers
Refreshes are mixed into the content calendar at a 70/30 ratio (new content / refreshes). Light refreshes consume 0.25 capacity units, moderate 0.5, and heavy 1.0. As the content library grows, the ratio naturally shifts. If the refresh queue is empty, 100% of capacity goes to new content.
11. Observability & Audit Trail
Every agent invocation produces exactly one audit log row capturing workflow context, timing, token usage, cost estimates, the ordered sequence of tool invocations, and output summaries.
11.1 What Gets Logged
- Workflow ID, run ID, step ID, and agent ID for full traceability
- Model ID used (resolved from config at invocation time)
- Start time, completion time, and duration in milliseconds
- Steps used vs. max steps configured
- Input tokens, output tokens, total tokens, and estimated cost (USD)
- Ordered tool call sequence with input summaries, output summaries, per-call duration, and token usage
- Status: `success`, `failed`, `failed_max_steps`, or `timeout`
- Error message and type if applicable
11.2 Dashboard Views
- Cost by period: daily/weekly/monthly spend breakdown by agent
- Agent efficiency: average steps used, max-steps failures, step utilization ratio
- Workflow run detail: full step-by-step timeline for any specific run
- Slowest agents: average and max duration for performance optimization
11.3 Error Handling
- maxSteps policy: When an agent exhausts its step budget, the workflow captures the partial output, marks the step as `failed_max_steps`, logs the full tool call history, sends a Slack notification, and halts the workflow run. This is not a silent failure.
- Malformed output: Zod validation catches invalid structured output. One retry with a corrective note. If the retry fails, the step fails.
- Tool failures: Agents see error messages and can retry or use alternative approaches. The LangExtract sidecar uses a circuit breaker (3 consecutive failures opens the circuit, 60-second recovery window).
- Post-publish pipeline: Each job retries 3× with exponential backoff (10s, 30s, 90s). Failed jobs do not roll back the publication — the content is live, but internal data is temporarily out of sync until the next scheduled job catches up.
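The retry schedule above can be sketched as a small helper. The job signature and `sleep` helper are illustrative, not the Trigger.dev API:

```typescript
// Retry sketch with the spec's backoff schedule (10 s, 30 s, 90 s).
// The job signature and sleep helper are illustrative.
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function runWithRetries<T>(
  job: () => Promise<T>,
  delaysMs: number[] = [10_000, 30_000, 90_000],
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus one retry per backoff delay (3 retries).
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await job();
    } catch (err) {
      lastError = err;
      // No rollback on failure: the published page stays live either
      // way; only the internal data sync is deferred.
      if (attempt < delaysMs.length) await sleep(delaysMs[attempt]);
    }
  }
  throw lastError;
}
```

A job that still fails after the final retry surfaces its error to the audit trail, and the next scheduled run brings the internal data back in sync.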
11.4 SEO Validation Tracking
Every validation run (pre-publish, refresh evaluation, scheduled weekly audit, manual) is stored with full check results. A weekly Trigger.dev job runs the complete validation suite against all published pages to catch degradation from external changes. Validation score decay feeds into the refresh urgency score.
12. Cost Model
Monthly estimates based on 50 published pieces, 10 competitors, and 100 tracked keywords:
| Category | Monthly Estimate | Notes |
|---|---|---|
| Claude Sonnet 4 (all agents) | $200–400 | ~$4–8 per piece across strategy, writing, editing, briefs, cleanup |
| Gemini 2.5 Flash (extraction + summaries) | $50–100 | Extraction itself is ~$1.62/mo; bulk is summary generation |
| Firecrawl | $40–80 | Competitor sitemap crawling + page scraping |
| Semrush API | $119–229 | Business plan for keyword/SERP API access |
| Hosting (Vercel + Railway) | $50–100 | App + workers + LangExtract sidecar |
| PostgreSQL (managed) | $25–50 | Neon, Supabase, or Railway |
| GSC API / Image generation | $0 | Free API; images are human-placed |
| Total | $485–960 | |
13. Risk Matrix
| Risk | Prob. | Impact | Mitigation |
|---|---|---|---|
| LangExtract extraction quality inconsistent | Med | Med | Multi-pass extraction, high-quality few-shot examples (versioned in git, tested via regression suite), validation checks |
| Hierarchical summaries drift from source | Med | Med | Summaries regenerated daily from fresh extractions; timestamped and versioned |
| Graph relationship staleness | Med | Low | Weekly re-validation; confidence scores decay over time; stale edges flagged |
| Python–TypeScript bridge failure | Low | High | Health check endpoint, circuit breaker pattern, auto-restart, fallback to queued retry |
| Gemini model version drift | Med | Med | Pin model version, 10-document regression suite, 15% extraction change threshold blocks deployment |
| LLM output quality variance | High | Med | Multi-agent review pipeline + human gate + programmatic SEO checks |
| Google targeting AI content | Med | High | 80/20 human-AI method ensures genuine Experience + Expertise in every piece |
| Hallucination in published content | Med | High | Fact-check via extracted claims + human review + LangExtract source grounding |
| Content cannibalization at scale | Med | Med | Graph cannibalizes edges + Strategy Agent checks before planning |
14. Future: Filesystem-as-Context
The current hierarchical summary approach works well but has a ceiling: summaries are pre-generated snapshots. As the content library scales to thousands of pages, keeping summaries fresh becomes continuous compute cost, and pre-computing what context agents need is inherently wasteful.
The filesystem-as-context pattern (inspired by Andrej Karpathy's context engineering framework and Anthropic's Skills system) offers a superior approach at scale: instead of pre-loading context, structure all system knowledge as a navigable filesystem. Agents use `ls`, `grep`, `glob`, and file reading to pull exactly the context they need for the current task.
Context efficiency gain: An agent navigating the filesystem builds ~3,500 tokens of precisely relevant context for a planning task, compared to ~14,000 tokens with the hierarchical summary approach — because the agent decides what to load based on the actual task.
Implementation effort: 5–7 weeks on top of the base system: filesystem generation pipeline (2–3 weeks), sandbox environment per agent session (1–2 weeks), filesystem-aware agent prompts (1 week), and a hybrid SQL + filesystem approach for real-time data.
Recommended path: Build the base system using hierarchical summaries + graph first. Validate at current scale. Implement the filesystem layer when agent context quality becomes a bottleneck (likely at 500+ pages, 10+ competitors, 50+ pieces/month).
This is a living specification. Addendums #1–3 provide full implementation detail for each layer.