ContentEngine — Executive System Overview
Author: Alton Wells
Date: March 2026
Status: Final Architecture Specification
Consolidates: Technical Specification v3 + Addendums #1, #2, #3
Table of Contents
- Executive Summary
- Technology Stack
- Data Architecture
- Master Workflow
- Agent System
- Strategy Layer
- Content Layer
- Production Layer
- Content Type Registry
- Refresh Queue System
- Observability & Audit Trail
- Cost Model
- Risk Matrix
- Future: Filesystem-as-Context
1. Executive Summary
ContentEngine is an autonomous AI content production system that replaces manual content marketing workflows with a three-layer agentic pipeline. The system is built on Mastra (TypeScript agent framework), LangExtract (structured document extraction), and Firecrawl (web crawling and sitemap intelligence). It manages the full content lifecycle from competitive intelligence through publication and performance monitoring.
The system is designed for Consul, a B2B SaaS AI executive assistant targeting CEOs and founders at a $200/month price point. Content must build trust with sophisticated, high-ticket decision-makers, making E-E-A-T signals, programmatic SEO validation, and editorial quality non-negotiable.
Core Architectural Principles
- Humans set strategy and approve output. AI executes everything in between. Three human gates govern strategic decisions, editorial review, and final publication.
- No vectors. No embeddings. Structured extractions, hierarchical summaries, and an explicit content relationship graph replace RAG. Stanford's 2025 research shows embedding precision collapses 87% beyond 50K documents. This approach scales without dimensional decay.
- Programmatic SEO validation is non-negotiable. Every published piece must pass all 10 blocking checks in a deterministic SEO validation suite, which also tracks 6 warning checks and 6 informational metrics.
- Context is navigated, not stuffed. Agents traverse a four-level hierarchy (Domain → Cluster → Page → Entity) loading only what they need. Typical planning context: ~14,000 tokens instead of millions.
- Everything is traceable. Every extraction maps to its source location. Every graph edge has provenance. Every agent decision is recorded in a structured audit trail with token usage, tool call sequences, and cost tracking.
2. Technology Stack
2.1 Core Framework
| Component | Technology | Role |
|---|---|---|
| Mastra | @mastra/core (TypeScript) | Agent definitions, workflow orchestration, tool system, suspend/resume for human gates, Hono server generation |
| Vercel AI SDK | Foundation layer under Mastra | Unified model routing, streaming, structured output, tool calling protocol |
| Zod | Schema validation | Input/output schemas for every agent, tool, and workflow step with compile-time type safety |
2.2 Extraction & Intelligence
| Component | Technology | Role |
|---|---|---|
| LangExtract | Python library (FastAPI sidecar) | Structured extraction from unstructured text. Source-grounded entities. Multi-pass extraction for high recall. |
| Firecrawl | Web crawling API/SDK | Competitor sitemap discovery, page crawling, content extraction. Handles JS-rendered pages and rate limiting. |
| Gemini 2.5 Flash | LLM ($0.15/1M tokens) | Extraction model. Fast, cheap, high quality for structured extraction tasks. |
2.3 LLM Providers
Claude Sonnet 4 (anthropic/claude-sonnet-4-20250514) powers all nine Mastra agents across strategy, content, and production layers. Temperature is configured per agent: 0.8 for the Writer (creative generation), 0.4 for the Editor (precision), 0.1–0.3 for production agents (mechanical execution). Gemini 2.5 Flash handles all LangExtract extraction pipelines and hierarchical summary generation.
2.4 Data & Hosting
PostgreSQL (no pgvector) serves as the primary database for all structured data, extraction entities, graph adjacency tables, summaries, and content plans using JSONB for flexible extraction attributes. Drizzle ORM provides type-safe database access. The application layer uses Next.js 15+ for the UI (calendar, editor, dashboards) hosted on Vercel, with agent workers and the LangExtract sidecar running on Railway. Trigger.dev handles durable job scheduling for crawls, extractions, summary regeneration, and post-publish monitoring.
2.5 System Architecture
2.6 External APIs
| API | Purpose |
|---|---|
| Semrush / Ahrefs | Keyword data, search volume, difficulty, SERP features, competitor rankings |
| Google Search Console | Impressions, clicks, CTR, average position per query (OAuth2) |
| Google Indexing / IndexNow | Fast crawl requests for newly published content |
| CMS Adapter | WordPress REST / Sanity / Contentful via adapter pattern |
3. Data Architecture
ContentEngine replaces conventional RAG/vector embeddings with a three-layer memory model. All data is stored in PostgreSQL as structured, queryable records.
3.1 Three-Layer Memory Model
3.2 Layer 1 — Structured Extraction (LangExtract)
Every document entering the system — competitor pages, our content, SERPs, AI Overviews, brand voice samples — is processed through LangExtract extraction pipelines. Raw text becomes structured, source-grounded entities stored in typed Postgres tables. Agents query structured data, not fuzzy similarity scores.
Six extraction classes are defined, each with a fixed schema, dedicated prompt description, and a minimum of three few-shot examples:
| Class | Entity Types | Trigger |
|---|---|---|
| competitor_page | topic, claim, keyword_signal, content_structure, cta, entity_reference | On discovery or change (weekly scan) |
| our_page | Same as competitor + internal_link | On publish or bootstrap |
| serp | serp_result, serp_feature, paa_question | Weekly per tracked keyword |
| ai_overview | aio_claim, aio_structure | When AIO detected in SERP |
| brand_voice | tone_marker, vocabulary_preference, sentence_pattern | On strategy create/update |
| keyword_data | Deferred for MVP | Direct from Semrush API |
Integration decision: LangExtract runs as a Python FastAPI sidecar (not the unofficial Node SDK) because critical features — multi-pass extraction, cross-chunk coreference resolution, and controlled generation via Gemini schema constraints — are Python-only. A circuit breaker pattern (3 consecutive failures opens the circuit) mitigates the Python–TypeScript bridge as a single point of failure.
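The breaker thresholds above (3 consecutive failures; a 60-second recovery window per §11.3) could be sketched in TypeScript as follows. The class shape and names are illustrative, not the production implementation:

```typescript
// Minimal circuit-breaker sketch for calls to the LangExtract sidecar.
// Thresholds come from the spec; everything else is illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold = 3,
    private readonly recoveryMs = 60_000,
  ) {}

  private isOpen(now = Date.now()): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.recoveryMs) {
      // Recovery window elapsed: half-open, allow one trial call.
      this.openedAt = null;
      this.failures = this.threshold - 1;
      return false;
    }
    return true;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error("circuit open: sidecar unavailable");
    try {
      const result = await fn();
      this.failures = 0; // any success resets the failure count
      return result;
    } catch (err) {
      if (++this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Callers that hit an open circuit fail fast instead of piling requests onto a dead sidecar, and the half-open trial call lets the bridge recover without manual intervention.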
3.3 Layer 2 — Hierarchical Document Summaries
A four-level summary tree enables agents to navigate from broad domain context to specific entity-level detail, loading only relevant branches:
Agents start at Level 0 and drill down only into relevant branches. Summaries are regenerated from fresh extractions and timestamped for versioning.
3.4 Layer 3 — Content Relationship Graph
An adjacency table in Postgres with typed edges connects content entities. This replaces vector similarity for all "find related content" operations. Each edge carries a confidence score, provenance (which agent or system created it), and a last-validated timestamp.
| Edge Type | Source → Target | Meaning |
|---|---|---|
| covers_topic | page → topic | Page covers this topic (with depth) |
| targets_keyword | page → keyword | Page targets this keyword (with rank) |
| competes_with | our_page → competitor_page | Pages compete for same keyword |
| outperforms | competitor_page → our_page | Competitor ranks higher for shared keyword |
| gap | topic → (null) | Topic with competitor coverage but zero ours |
| cannibalizes | our_page → our_page | Both target same primary keyword |
| links_to | our_page → our_page | Actual internal link exists |
| should_link_to | our_page → our_page | Agent-recommended linking opportunity |
| child_of | topic → topic_cluster | Hierarchical topic relationship |
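A minimal sketch of how an agent-facing helper might scan these typed edges in place of a vector search. The row shape and field names are assumptions for illustration, not the real table schema:

```typescript
// Illustrative shape of a graph edge row plus a "find gaps" helper.
// Column names are assumptions for this sketch, not the real schema.
type EdgeType =
  | "covers_topic" | "targets_keyword" | "competes_with" | "outperforms"
  | "gap" | "cannibalizes" | "links_to" | "should_link_to" | "child_of";

interface ContentEdge {
  sourceId: string;
  targetId: string | null;  // null target is valid for "gap" edges
  type: EdgeType;
  confidence: number;       // 0–1, decays until re-validated
  provenance: string;       // agent or system that created the edge
  lastValidatedAt: string;  // ISO timestamp
}

// "Find related content" becomes an explicit, auditable edge scan
// rather than an opaque similarity score.
function findGapTopics(edges: ContentEdge[], minConfidence = 0.7): string[] {
  return edges
    .filter((e) => e.type === "gap" && e.confidence >= minConfidence)
    .map((e) => e.sourceId);
}
```

In production this would be a SQL query against the adjacency table; the in-memory version above just shows that the operation is a deterministic filter with an explicit confidence threshold.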
4. Master Workflow
ContentEngine operates across three workflow layers with human control at each transition. Workflows own the sequence ("what happens next"); agents own the execution ("how do I do this step"). This is deterministic orchestration with agentic execution — no supervisor/sub-agent patterns.
4.1 End-to-End Flow
4.2 Human Gates
Each human touchpoint is a Mastra workflow suspension. The workflow suspends and provides a structured payload describing the review task. The Next.js app renders the appropriate UI and calls the resume endpoint when the human completes their action.
| Gate | Human Actions | Est. Time |
|---|---|---|
| Gate 1: Calendar Review | Review AI-generated content plan, approve/reject/edit items, add manual items, set schedule | 15–30 min per cycle |
| Gate 2: Brief Approval | Review outline, confirm direction, adjust scope, approve or request changes | 5–10 min per brief |
| Gate 3: Draft Review | Deep edit, add personal experience and insights, place images, final voice check, approve or reject | 15–30 min per piece |
Design principle: Image placement is manual at Gate 3. The Writer Agent outputs `[IMAGE: description]` markers as placement suggestions. Image selection requires brand aesthetic judgment, rights verification, and contextual sensitivity that current AI image generation does not handle reliably at production quality.
5. Agent System
5.1 Agent Configuration
Every agent's model, temperature, and step budget are configurable at runtime via a database settings table. No model IDs are hardcoded. This allows model swaps without redeployment. Changes take effect on the next agent invocation.
| Agent | Layer | Model | Max Steps | Temp |
|---|---|---|---|---|
| competitive-intelligence | Strategy | Claude Sonnet 4 | 12 | 0.5 |
| search-landscape | Strategy | Claude Sonnet 4 | 10 | 0.5 |
| content-strategy | Strategy | Claude Sonnet 4 | 20 | 0.7 |
| content-brief | Content | Claude Sonnet 4 | 15 | 0.6 |
| writer | Content | Claude Sonnet 4 | 12 | 0.8 |
| editor | Content | Claude Sonnet 4 | 10 | 0.4 |
| final-cleanup | Production | Claude Sonnet 4 | 6 | 0.2 |
| publishing | Production | Claude Sonnet 4 | 15 | 0.1 |
| seo-autofix | Production | Claude Sonnet 4 | 8 | 0.3 |
Temperature rationale: Strategy agents are moderate (creative planning grounded in data). The Writer is highest (creative generation). The Editor is low (precision). Production agents are near-zero (mechanical execution).
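The runtime-resolution behavior described in 5.1 can be sketched as follows. The settings-store shape and field names are assumptions; the point is that model IDs and temperatures are looked up per invocation, never hardcoded:

```typescript
// Sketch of runtime agent configuration resolution. The store is a
// stand-in for the database settings table; names are illustrative.
interface AgentSettings {
  agentId: string;
  modelId: string;  // e.g. "anthropic/claude-sonnet-4-20250514"
  temperature: number;
  maxSteps: number;
}

type SettingsStore = Map<string, Partial<AgentSettings>>;

function resolveSettings(
  store: SettingsStore,
  agentId: string,
  defaults: Omit<AgentSettings, "agentId">,
): AgentSettings {
  // Resolved on every invocation, so a settings update in the DB
  // takes effect on the next agent run without a redeploy.
  const override = store.get(agentId) ?? {};
  return { agentId, ...defaults, ...override };
}
```

Swapping a model or tuning a temperature is then a single row update, picked up by the next invocation exactly as the spec requires.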
5.2 Tool Inventory (26 Tools)
Tools are grouped into four categories. Each agent receives only the tools it needs — a smaller tool surface means fewer irrelevant calls, lower token usage, and easier debugging.
| Category | Count | Tools |
|---|---|---|
| Shared | 8 | readDomainSummary, readClusterSummaries, readPageSummaries, queryExtractions, traverseContentGraph, webSearch, queryContentPlan, readContentStrategy |
| Strategy | 8 | queryCompetitorChanges, queryKeywordPerformance, querySerpExtractions, queryAiOverviewTracking, semrushKeywordResearch, gscPerformanceQuery, addContentPlanItem, updateContentPlanItem |
| Content | 3 | loadApprovedBrief, queryOurPages, verifyInternalLinks |
| Production | 7 | formatForCms, uploadToCms, setMetadata, pingIndexingApi, triggerPostPublishPipeline, schedulePostPublishMonitoring, logToAuditTrail |
5.3 Tool-to-Agent Assignment
5.4 Context Navigation Pattern
The Content Strategy Agent — the most complex decision-maker with 10 tools — demonstrates how agents navigate the hierarchy to build precisely relevant context:
- Read active strategy directives (~500 tokens)
- Read combined domain summary for the big picture (~500 tokens)
- Ingest Competitive Intelligence report from workflow state (~2,000 tokens)
- Ingest Search Landscape report from workflow state (~2,000 tokens)
- Read per-pillar cluster summaries (~3,000 tokens)
- Check current calendar to avoid duplication (~1,000 tokens)
- Traverse graph for gaps, cannibalization, and outperformance (~1,500 tokens)
- Drill into specific page summaries for top candidates (~2,000 tokens)
Total: ~14,000 tokens of precisely relevant context, versus the impossibility of stuffing 847+ full pages into a context window.
6. Strategy Layer
6.1 Competitive Intelligence Agent
Continuously analyzes the competitor database — structured LangExtract data, not raw HTML — and produces actionable competitive insights. Runs weekly for full analysis and daily for a lightweight change digest.
Outputs: Competitor moves (with relevance scoring and source extraction IDs), content gaps (with estimated impact), positioning insights. Every finding includes provenance for traceability.
6.2 Search Landscape Agent
Monitors keyword performance, SERP composition, AI Overview appearances, and search trends using structured SERP data. Runs daily for ranking changes and AI Overview monitoring, weekly for full landscape analysis.
Outputs: Ranking changes with trend classification, AI Overview citation alerts, emerging keyword opportunities, declining content flags, SERP feature opportunities.
6.3 Content Strategy Agent
The brain of the system. Synthesizes both prior agent outputs with content inventory, strategy directives, and graph relationships to produce a prioritized, scheduled content plan.
Scoring model: strategic_alignment × search_opportunity × competitive_urgency × gap_severity. The agent checks the graph for cannibalization before recommending new content, respects human-added calendar items as fixed constraints, and suggests scheduling based on capacity.
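The multiplicative scoring model can be sketched directly. The 0–1 factor ranges and the helper name are our assumptions for illustration:

```typescript
// Sketch of the multiplicative priority score from the spec.
// Factor ranges (0–1) are an assumption for this illustration.
interface OpportunityFactors {
  strategicAlignment: number;  // 0–1
  searchOpportunity: number;   // 0–1
  competitiveUrgency: number;  // 0–1
  gapSeverity: number;         // 0–1
}

function priorityScore(f: OpportunityFactors): number {
  // Multiplicative: a near-zero factor sinks the whole opportunity,
  // unlike an additive model where strong factors could mask it.
  return (
    f.strategicAlignment *
    f.searchOpportunity *
    f.competitiveUrgency *
    f.gapSeverity
  );
}
```

The multiplicative form encodes a veto: a topic with zero strategic alignment scores zero no matter how large the search opportunity.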
Outputs: Prioritized plan items with title, target keyword, content type, rationale, competitive context, schedule date, priority (1–3), estimated impact, internal link targets, and graph evidence. Plan items are written to the database with source: "ai_generated" and the workflow suspends for human calendar review.
7. Content Layer
7.1 Content Brief Agent
For each approved plan item, generates a detailed brief including: complete H2/H3 outline with keyword mapping per section, competitor differentiation strategy (using structured extraction data), internal linking targets from the graph, external resource recommendations, and brand voice requirements. The brief is saved and the workflow suspends for human approval (Gate 2).
7.2 Writer Agent
Receives the approved brief, brand voice extractions, and strategy context as dynamic instructions. Produces a complete first draft that follows the outline exactly, hits word count targets (±10%), integrates keywords naturally, includes all specified links, and marks image placement opportunities as [IMAGE: description] for human insertion.
Deliberate constraint: The Writer has only 4 tools (webSearch, queryExtractions, traverseContentGraph, and loadApprovedBrief for the brief context). Most of its context comes from the pre-assembled brief, not from live queries.
7.3 Editor Agent
Reviews the draft against seven dimensions: language correctness, verbal consistency (terminology, voice), brand voice adherence (compared against extracted patterns), factual grounding (claims verified against brief sources and web), structural quality, link integrity (all internal links verified against published pages), and keyword optimization.
Outputs: Overall pass/needs_revision assessment, specific edits with location, type, severity (critical/suggested), and fix suggestions. Voice consistency score (0–100), readability score, and optionally a revised draft.
7.4 Revision Loop
If the Editor returns "needs_revision" with critical edits, the draft returns to the Writer with the edit list as additional context. Maximum 2 revision cycles. After 2 cycles, the draft proceeds to human review regardless — humans catch what agents miss. This prevents infinite loops while maintaining quality.
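The bounded revision loop reduces to straightforward control flow. The agent calls are stubbed here as function parameters; the shapes are illustrative:

```typescript
// Control-flow sketch of the capped Writer↔Editor revision loop.
// Agent invocations are stubbed as async function parameters.
interface Review {
  verdict: "pass" | "needs_revision";
  edits: string[];
}

async function reviseWithCap(
  draft: string,
  write: (draft: string, edits: string[]) => Promise<string>,
  review: (draft: string) => Promise<Review>,
  maxCycles = 2,
): Promise<{ draft: string; cycles: number }> {
  let current = draft;
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const result = await review(current);
    if (result.verdict === "pass") return { draft: current, cycles: cycle };
    current = await write(current, result.edits);
  }
  // Cap reached: proceed to human review regardless. Gate 3 catches
  // what the agents missed, and infinite loops are impossible.
  return { draft: current, cycles: maxCycles };
}
```

The hard cap is the important part: the loop always terminates with a draft in hand, never with a stuck workflow.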
8. Production Layer
8.1 Human Draft Review (Gate 3)
The most important step in the entire system. The workflow suspends and presents the draft alongside the brief in a side-by-side editor interface. The human reviewer:
- Reads and assesses the full draft against the brief
- Adds personal experience and original insights (the irreplaceable 20%)
- Places and curates images at `[IMAGE:]` markers
- Edits for voice and brand consistency
- Fact-checks statistics and claims
- Approves or rejects (rejection sends back to Content Layer with notes)
8.2 Final Cleanup Agent
A lightweight technical-only pass after human edits. Checks markdown formatting validity, image alt text and dimensions, internal link resolution, heading hierarchy integrity, and consistent list formatting. Does not change tone, wording, or content.
8.3 SEO Validation Engine
Deterministic code, not an LLM. Operates as a CI/CD deployment gate with three severity tiers:
| Tier | Behavior | Checks |
|---|---|---|
| BLOCKING (10) | Publication halted. Auto-fix attempted (max 3 cycles). If still failing, escalate to human. | Meta title, meta description, heading hierarchy, keyword presence, keyword density, internal linking, schema validity, URL/slug, date integrity, structural compliance |
| WARNING (6) | Logged and tracked. Does not block. Contributes to content health score. | Image density, external linking, readability score, content depth coverage, FAQ quality, mobile/performance |
| INFO (6) | Logged for analytics and trending only. | Word count delta, keyword density exact, schema richness, link density ratio, reading time, citation-ready block count |
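Because the engine is deterministic code rather than an LLM, the gate decision is a plain filter over check results. The runner shape below is an assumption for illustration; the tier semantics come from the table:

```typescript
// Sketch of the tiered gate decision. Only BLOCKING failures halt
// publication; the runner shape is illustrative, not the real engine.
type Tier = "blocking" | "warning" | "info";

interface CheckResult {
  name: string;
  tier: Tier;
  passed: boolean;
}

function gateDecision(results: CheckResult[]): {
  publish: boolean;
  blockingFailures: string[];
} {
  const blockingFailures = results
    .filter((r) => r.tier === "blocking" && !r.passed)
    .map((r) => r.name);
  // Warnings and info never block; they feed the health score and
  // analytics instead.
  return { publish: blockingFailures.length === 0, blockingFailures };
}
```

The named blocking failures are what the auto-fix loop (max 3 cycles) and, failing that, the human escalation receive.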
8.4 Publishing & Post-Publish Pipeline
The post-publish pipeline closes the data loop: extraction → summary regeneration (L2 → L1 → L0) → graph edge construction → bidirectional linking → monitoring. Content is not rolled back on pipeline failure — the content is live, and internal data syncs on the next scheduled job.
9. Content Type Registry
ContentEngine produces eight content types, each with a distinct structural template, schema mapping, internal linking profile, image density rule, and refresh cadence.
| Type | Length | Primary Schema | Key Requirements |
|---|---|---|---|
| Blog Post | 1,500–2,500 words | BlogPosting + FAQPage | Min 4 internal links, 3 images, FAQ required |
| Listicle | 1,500–3,000 words | Article + ItemList | 1 image per list item, numbered H2s required |
| Guide | 3,000–5,000 words | Article + FAQPage | Min 8 internal links, 5 images, citation-ready definition block |
| How-To | 2,000–4,000 words | HowTo + FAQPage | 1 image per step, troubleshooting section, HowToStep schema |
| Comparison | 2,000–4,000 words | Article + Product/ItemList | Head-to-head or roundup variants, comparison table required |
| Case Study | 1,500–2,500 words | Article + Review | Customer quote required, min 2 quantified results |
| Pillar Page | 2,500–4,000 words | CollectionPage | Min 15 internal links, updated when child content publishes |
| Glossary | 500–1,200 words | DefinedTerm + FAQPage | Under 50-word definition block, related term cross-links |
Each type carries a structural scaffold that the Writer Agent must follow exactly and the SEO Validation Engine checks at the deployment gate. Every page carries a @graph array of JSON-LD entities including Organization, WebSite, WebPage, the content-type-specific schema, author Person entity, and BreadcrumbList.
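A sketch of assembling the per-page `@graph` array described above. URLs, IDs, and entity details are placeholders, and real entities would carry far more properties:

```typescript
// Sketch of building the JSON-LD @graph for a page. All URLs and
// entity details are placeholder values, not real site data.
function buildGraph(page: {
  url: string;
  title: string;
  authorName: string;
  typeSchema: Record<string, unknown>; // content-type entity, e.g. BlogPosting
}) {
  return {
    "@context": "https://schema.org",
    "@graph": [
      { "@type": "Organization", "@id": `${page.url}#org`, name: "Consul" },
      { "@type": "WebSite", "@id": `${page.url}#website` },
      { "@type": "WebPage", "@id": page.url, name: page.title },
      page.typeSchema, // BlogPosting, HowTo, CollectionPage, etc.
      { "@type": "Person", "@id": `${page.url}#author`, name: page.authorName },
      { "@type": "BreadcrumbList", "@id": `${page.url}#breadcrumbs` },
    ],
  };
}
```

Keeping every entity in one `@graph` with stable `@id` anchors lets the entities reference each other (e.g. author → Person) without duplicating markup across the page.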
9.1 Schema Architecture
9.2 Author Entities & E-E-A-T Strategy
Three author entities are defined, each with a dedicated profile page at /authors/[slug]/:
- Alton Wells — Primary human author with full E-E-A-T credentials
- Stan — Secondary human author
- Auctor — AI editorial assistant, presented transparently. Profile describes the human-AI editorial process: articles are researched and drafted using AI, then reviewed, enriched with original insights, and approved by a human author.
Author profile pages include headshot/avatar, bio, expertise tags linked to pillar pages, social links, recent article feed, and ProfilePage + Person schema markup.
10. Refresh Queue System
Every published page is re-evaluated on a maximum 90-day cycle. Re-evaluation does not mean automatic refresh — it means the system scores refresh urgency and only pages crossing a threshold enter the active queue.
10.1 Urgency Score (0–100)
| Signal | Weight | Data Source | Scoring Logic |
|---|---|---|---|
| Position Decay | 30% | GSC performance | (position drop / 10) × 100, capped at 100 |
| Traffic Decline | 25% | GSC performance | (% impression decline, 30d vs prior 30d) × 100 |
| Content Age | 20% | our_pages.last_updated | (days since update / 90) × 100 |
| Competitive Displacement | 15% | Graph outperforms edges | New/worsened outperforms edges × 25 |
| Factual Staleness | 10% | Extraction claim entities | (dated claims older than current year / total) × 100 |
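The weighted score follows directly from the table. A sketch, assuming each signal has already been normalized to 0–100 per its scoring logic:

```typescript
// Weighted urgency score using the weights from the table above.
// Assumes each signal is pre-normalized to 0–100 by its scoring logic.
interface RefreshSignals {
  positionDecay: number;           // 0–100
  trafficDecline: number;          // 0–100
  contentAge: number;              // 0–100
  competitiveDisplacement: number; // 0–100
  factualStaleness: number;        // 0–100
}

function urgencyScore(s: RefreshSignals): number {
  return (
    s.positionDecay * 0.30 +
    s.trafficDecline * 0.25 +
    s.contentAge * 0.20 +
    s.competitiveDisplacement * 0.15 +
    s.factualStaleness * 0.10
  );
}
```

Pages whose score crosses the queue threshold enter the active refresh queue; the rest are simply re-evaluated on the next cycle.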
10.2 Refresh Tiers
Refreshes are mixed into the content calendar at a 70/30 ratio (new content / refreshes). Light refreshes consume 0.25 capacity units, moderate 0.5, and heavy 1.0. As the content library grows, the ratio naturally shifts. If the refresh queue is empty, 100% of capacity goes to new content.
11. Observability & Audit Trail
Every agent invocation produces exactly one audit log row capturing workflow context, timing, token usage, cost estimates, the ordered sequence of tool invocations, and output summaries.
11.1 What Gets Logged
- Workflow ID, run ID, step ID, and agent ID for full traceability
- Model ID used (resolved from config at invocation time)
- Start time, completion time, and duration in milliseconds
- Steps used vs. max steps configured
- Input tokens, output tokens, total tokens, and estimated cost (USD)
- Ordered tool call sequence with input summaries, output summaries, per-call duration, and token usage
- Status: `success`, `failed`, `failed_max_steps`, or `timeout`
- Error message and type if applicable
11.2 Dashboard Views
- Cost by period: daily/weekly/monthly spend breakdown by agent
- Agent efficiency: average steps used, max-steps failures, step utilization ratio
- Workflow run detail: full step-by-step timeline for any specific run
- Slowest agents: average and max duration for performance optimization
11.3 Error Handling
- maxSteps policy: When an agent exhausts its step budget, the workflow captures the partial output, marks the step as `failed_max_steps`, logs the full tool call history, sends a Slack notification, and halts the workflow run. This is not a silent failure.
- Malformed output: Zod validation catches invalid structured output. One retry with a corrective note. If the retry fails, the step fails.
- Tool failures: Agents see error messages and can retry or use alternative approaches. The LangExtract sidecar uses a circuit breaker (3 consecutive failures opens the circuit, 60-second recovery window).
- Post-publish pipeline: Each job retries 3× with exponential backoff (10s, 30s, 90s). Failed jobs do not roll back the publication — the content is live, but internal data is temporarily out of sync until the next scheduled job catches up.
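The retry schedule above can be sketched as a small helper. The job signature and `sleep` helper are illustrative, not the Trigger.dev API:

```typescript
// Retry sketch with the spec's backoff schedule (10 s, 30 s, 90 s).
// The job signature and sleep helper are illustrative.
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function runWithRetries<T>(
  job: () => Promise<T>,
  delaysMs: number[] = [10_000, 30_000, 90_000],
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus one retry per backoff delay (3 retries).
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await job();
    } catch (err) {
      lastError = err;
      // No rollback on failure: the published page stays live either
      // way; only the internal data sync is deferred.
      if (attempt < delaysMs.length) await sleep(delaysMs[attempt]);
    }
  }
  throw lastError;
}
```

A job that still fails after the final retry surfaces its error to the audit trail, and the next scheduled run brings the internal data back in sync.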
11.4 SEO Validation Tracking
Every validation run (pre-publish, refresh evaluation, scheduled weekly audit, manual) is stored with full check results. A weekly Trigger.dev job runs the complete validation suite against all published pages to catch degradation from external changes. Validation score decay feeds into the refresh urgency score.
12. Cost Model
Monthly estimates based on 50 published pieces, 10 competitors, and 100 tracked keywords:
| Category | Monthly Estimate | Notes |
|---|---|---|
| Claude Sonnet 4 (all agents) | $200–400 | ~$4–8 per piece across strategy, writing, editing, briefs, cleanup |
| Gemini 2.5 Flash (extraction + summaries) | $50–100 | Extraction itself is ~$1.62/mo; bulk is summary generation |
| Firecrawl | $40–80 | Competitor sitemap crawling + page scraping |
| Semrush API | $119–229 | Business plan for keyword/SERP API access |
| Hosting (Vercel + Railway) | $50–100 | App + workers + LangExtract sidecar |
| PostgreSQL (managed) | $25–50 | Neon, Supabase, or Railway |
| GSC API / Image generation | $0 | Free API; images are human-placed |
| Total | $485–960 | |
13. Risk Matrix
| Risk | Prob. | Impact | Mitigation |
|---|---|---|---|
| LangExtract extraction quality inconsistent | Med | Med | Multi-pass extraction, high-quality few-shot examples (versioned in git, tested via regression suite), validation checks |
| Hierarchical summaries drift from source | Med | Med | Summaries regenerated daily from fresh extractions; timestamped and versioned |
| Graph relationship staleness | Med | Low | Weekly re-validation; confidence scores decay over time; stale edges flagged |
| Python–TypeScript bridge failure | Low | High | Health check endpoint, circuit breaker pattern, auto-restart, fallback to queued retry |
| Gemini model version drift | Med | Med | Pin model version, 10-document regression suite, 15% extraction change threshold blocks deployment |
| LLM output quality variance | High | Med | Multi-agent review pipeline + human gate + programmatic SEO checks |
| Google targeting AI content | Med | High | 80/20 human-AI method ensures genuine Experience + Expertise in every piece |
| Hallucination in published content | Med | High | Fact-check via extracted claims + human review + LangExtract source grounding |
| Content cannibalization at scale | Med | Med | Graph cannibalizes edges + Strategy Agent checks before planning |
14. Future: Filesystem-as-Context
The current hierarchical summary approach works well but has a ceiling: summaries are pre-generated snapshots. As the content library scales to thousands of pages, keeping summaries fresh becomes continuous compute cost, and pre-computing what context agents need is inherently wasteful.
The filesystem-as-context pattern (inspired by Andrej Karpathy's context engineering framework and Anthropic's Skills system) offers a superior approach at scale: instead of pre-loading context, structure all system knowledge as a navigable filesystem. Agents use `ls`, `grep`, `glob`, and file reading to pull exactly the context they need for the current task.
Context efficiency gain: An agent navigating the filesystem builds ~3,500 tokens of precisely relevant context for a planning task, compared to ~14,000 tokens with the hierarchical summary approach — because the agent decides what to load based on the actual task.
Implementation effort: 5–7 weeks on top of the base system: filesystem generation pipeline (2–3 weeks), sandbox environment per agent session (1–2 weeks), filesystem-aware agent prompts (1 week), and a hybrid SQL + filesystem approach for real-time data.
Recommended path: Build the base system using hierarchical summaries + graph first. Validate at current scale. Implement the filesystem layer when agent context quality becomes a bottleneck (likely at 500+ pages, 10+ competitors, 50+ pieces/month).
This is a living specification. Addendums #1–3 provide full implementation detail for each layer.