ContentEngine — Technical Specification v3
Author: Alton Wells
Date: March 2026
Status: Final Architecture Specification
Executive Summary
ContentEngine is an autonomous AI content production system built on Mastra (TypeScript agent framework), LangExtract (structured document extraction), and Firecrawl (web crawling/sitemap intelligence). It replaces the conventional RAG/vector embedding approach with a three-layer memory architecture: structured extraction via LangExtract, hierarchical document summaries, and an explicit content relationship graph — all stored in PostgreSQL.
The system operates across three workflow layers — Strategy, Content, and Production — with human control points at strategic decision-making (content calendar approval), editorial review (draft quality gate), and final publication (image placement, last-mile polish). AI handles everything between those gates: competitive intelligence gathering, search landscape analysis, content planning, brief generation, writing, editing, SEO optimization, and publishing.
Core architectural principles:
- Humans set strategy and approve output. AI executes everything in between.
- No vectors. No embeddings. Structured extractions + hierarchical summaries + graph relationships replace RAG. Stanford research from 2025 reports that embedding retrieval precision collapses by 87% beyond 50K documents. Our approach scales without that dimensional decay.
- Programmatic SEO validation is non-negotiable. Every published piece must pass a 10/10 deterministic SEO check. No exceptions.
- Context is navigated, not stuffed. Agents traverse a hierarchy (Domain → Cluster → Page → Entity) to load only what they need. Total context per planning session: ~14,000 tokens instead of millions.
- Everything is traceable. Every extraction maps to its source location. Every graph edge has provenance. Every agent decision can be audited.
Table of Contents
- Frameworks, Libraries & Integrations
- Data Architecture
- Master Workflow
- Strategy Layer — Detailed Specification
- Content Layer — Detailed Specification
- Production Layer — Detailed Specification
- Content Calendar & Application Interface
- Programmatic SEO Validation Engine
- Cost Model
- Risk Matrix
- Future: Filesystem-as-Context Architecture
1. Frameworks, Libraries & Integrations
Core Framework
| Component | Technology | Role |
|---|---|---|
| Mastra | @mastra/core (TypeScript) | Agent definitions, workflow orchestration, tool system, suspend/resume, Hono server generation, Mastra Studio debugging |
| Vercel AI SDK | Foundation layer under Mastra | Unified model routing (anthropic/claude-sonnet-4-20250514), streaming, structured output, tool calling protocol |
| Zod | Schema validation throughout | Input/output schemas for every agent, tool, and workflow step. Compile-time type safety. |
Extraction & Intelligence
| Component | Technology | Role |
|---|---|---|
| LangExtract | Python library (Google, Apache 2.0) | Structured extraction from unstructured text. Maps every entity to exact source location. Multi-pass extraction for high recall on long documents. Runs as FastAPI sidecar. |
| Firecrawl | Web crawling API/SDK | Competitor sitemap discovery, page crawling, content extraction. Handles JavaScript-rendered pages, rate limiting, and anti-bot bypassing. Replaces custom sitemap crawlers. |
| Gemini 2.5 Flash | LLM (via LangExtract) | Extraction model. Fast, cheap ($0.15/1M tokens), high quality for structured extraction tasks. |
LLM Providers
| Model | Use Case | Why |
|---|---|---|
| Claude Sonnet 4 (anthropic/claude-sonnet-4-20250514) | All Mastra agents (strategy, writing, editing, briefs) | Best quality-to-cost ratio for complex reasoning, long-form writing, and multi-step planning |
| Gemini 2.5 Flash | LangExtract extraction pipelines, hierarchical summary generation | Fast + cheap for structured extraction and summarization. LangExtract's recommended default |
Data & Storage
| Component | Technology | Role |
|---|---|---|
| PostgreSQL | Primary database (no pgvector) | All structured data, extraction entities, graph adjacency table, summaries, content plans. JSONB for flexible extraction attributes. |
| Drizzle ORM | TypeScript ORM | Type-safe database access from Mastra tools and API routes |
Application & Deployment
| Component | Technology | Role |
|---|---|---|
| Next.js 15+ | Web framework | App UI (calendar, editor, dashboards), API routes, SSR |
| Trigger.dev | Durable job scheduling | Scheduled crawls, extraction jobs, summary regeneration, post-publish monitoring. Retry on failure. |
| Slack API + Email | Notifications | Human review alerts, competitor change digests, ranking alerts |
| Vercel | App hosting | Next.js app, serverless API routes |
| Railway | Worker hosting | Mastra agent workers, LangExtract FastAPI service, Trigger.dev jobs |
External APIs
| API | Purpose | Integration Method |
|---|---|---|
| Semrush or Ahrefs | Keyword data, search volume, difficulty, SERP features, competitor rankings | REST API via Mastra tool |
| Google Search Console | Our impressions, clicks, CTR, average position per query | OAuth2 via Mastra tool |
| Firecrawl | Competitor sitemap crawling, page content extraction | SDK/API via Mastra tool + scheduled jobs |
| Google Indexing API / IndexNow | Fast crawl requests for newly published content | REST API via Publishing Agent |
| CMS (WordPress REST / Sanity / Contentful) | Content publication endpoint | Adapter pattern — Mastra tool per CMS |
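The CMS row above calls for one Mastra tool per CMS behind a shared contract. A minimal sketch of what that adapter contract could look like — the interface, names, and URL handling here are illustrative assumptions, not the shipped API:

```typescript
// Hypothetical sketch of the per-CMS adapter pattern. Each concrete CMS
// (WordPress REST, Sanity, Contentful) would implement the same `publish`
// signature, so the Publishing Agent stays CMS-agnostic.
interface CmsAdapter {
  name: string;
  // Resolves to the published URL of the post.
  publish(post: { title: string; slug: string; body: string }): Promise<string>;
}

// Example adapter for a generic REST CMS. In production this would POST to
// the CMS endpoint; here the URL is built deterministically so the contract
// is visible without a network call.
function restCmsAdapter(baseUrl: string): CmsAdapter {
  return {
    name: "rest-cms",
    async publish(post) {
      return `${baseUrl}/${post.slug}`;
    },
  };
}
```

Swapping CMS then means registering a different adapter, with no change to the agent or workflow code.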
Integration Architecture
```
┌─────────────────────────────────────────────────────────┐
│                  Next.js Application                    │
│            (Calendar, Editor, Dashboards)               │
└────────────────────────┬────────────────────────────────┘
                         │ API Routes
                         ▼
┌─────────────────────────────────────────────────────────┐
│                  Mastra Agent Server                    │
│          (Hono HTTP, auto-generated endpoints)          │
│                                                         │
│   Agents ←→ Tools ←→ PostgreSQL                         │
│                   ←→ LangExtract Service (HTTP)         │
│                   ←→ Firecrawl API                      │
│                   ←→ Semrush/Ahrefs API                 │
│                   ←→ Google Search Console API          │
│                   ←→ CMS API                            │
└────────────────────────┬────────────────────────────────┘
                         │
          ┌──────────────┼──────────────┐
          ▼              ▼              ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  PostgreSQL  │ │  LangExtract │ │  Trigger.dev │
│  (all data)  │ │   (FastAPI)  │ │  (scheduled  │
│              │ │  Python 3.11 │ │    jobs)     │
└──────────────┘ └──────────────┘ └──────────────┘
```
2. Data Architecture
2.1 Memory Model: Three Layers Replacing RAG
Layer 1 — LangExtract Structured Extraction: Every document entering the system (competitor pages, our content, SERPs, brand voice samples) is processed through LangExtract extraction pipelines. Raw text becomes structured, source-grounded entities in typed Postgres tables. Agents query structured data, not fuzzy similarity scores.
Layer 2 — Hierarchical Document Summaries: A four-level summary tree where agents navigate from broad (domain-level) to specific (entity-level) context. Each level is an LLM-generated summary of the level below it. Agents start at Level 0 and drill down only into relevant branches.
```
Level 0: Domain Summary (~500 tokens)
 └── Level 1: Cluster Summaries (~300 tokens each, one per topic pillar)
      └── Level 2: Page Summaries (~150 tokens each, one per page)
           └── Level 3: LangExtract Entities (structured rows per page)
```
Layer 3 — Content Relationship Graph: An adjacency table in Postgres with typed edges connecting content entities. Replaces vector similarity for all "find related content" operations.
```sql
CREATE TABLE content_relationships (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  source_type VARCHAR(50) NOT NULL,   -- 'our_page', 'competitor_page', 'keyword', 'topic'
  source_id UUID NOT NULL,
  relationship_type VARCHAR(50) NOT NULL,
  target_type VARCHAR(50) NOT NULL,
  target_id UUID NOT NULL,
  metadata JSONB DEFAULT '{}',
  confidence FLOAT DEFAULT 1.0,
  created_by VARCHAR(50) NOT NULL,    -- 'system', 'agent:competitive-intel', 'human'
  created_at TIMESTAMP DEFAULT now(),
  last_validated TIMESTAMP DEFAULT now()
);

CREATE INDEX idx_rel_source ON content_relationships(source_type, source_id);
CREATE INDEX idx_rel_target ON content_relationships(target_type, target_id);
CREATE INDEX idx_rel_type ON content_relationships(relationship_type);
```
Relationship types:
| Edge Type | Source → Target | What It Means |
|---|---|---|
| covers_topic | page → topic | This page covers this topic (with depth: shallow/deep) |
| targets_keyword | page → keyword | This page targets this keyword (with current_rank) |
| competes_with | our_page → competitor_page | These pages compete for the same keyword/topic |
| same_topic_as | competitor_page → (null or our_page) | Competitor covers topic we may or may not have |
| links_to | our_page → our_page | Actual internal link exists |
| should_link_to | our_page → our_page | Agent-recommended linking opportunity |
| cannibalizes | our_page → our_page | Both target same primary keyword |
| outperforms | competitor_page → our_page | Competitor ranks higher for shared keyword |
| gap | topic → (null) | Topic with competitor coverage but zero ours |
| child_of | topic → topic_cluster | Hierarchical topic relationship |
| refreshes | our_page → our_page | Newer version should replace older |
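To make traversal over these typed edges concrete, here is a minimal in-memory sketch of what a tool like traverseContentGraph does: breadth-first expansion restricted to the requested edge types, up to maxDepth. In production this would be a recursive query against content_relationships via Drizzle; the function and sample data below are illustrative only.

```typescript
// Row shape mirrors the content_relationships adjacency table (simplified).
type Edge = { sourceId: string; relationshipType: string; targetId: string };

// BFS from startId, following only the allowed edge types, at most maxDepth hops.
function traverseGraph(
  edges: Edge[],
  startId: string,
  edgeTypes: string[],
  maxDepth: number
): Set<string> {
  const visited = new Set<string>([startId]);
  let frontier = [startId];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const e of edges) {
        if (
          e.sourceId === node &&
          edgeTypes.includes(e.relationshipType) &&
          !visited.has(e.targetId)
        ) {
          visited.add(e.targetId);
          next.push(e.targetId);
        }
      }
    }
    frontier = next;
  }
  visited.delete(startId);
  return visited; // nodes reachable via the allowed edge types
}
```

Because edges are typed, "find related content" becomes an explicit, auditable walk (e.g. only links_to and should_link_to) rather than an opaque similarity score.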
2.2 Database Schema
Competitor Intelligence:
| Table | Key Fields |
|---|---|
| competitors | id, name, domain, industry_vertical, notes, created_at |
| competitor_pages | id, competitor_id, url, title, meta_description, h1, word_count, published_at, last_modified, content_hash, raw_text, last_crawled_at |
| competitor_extractions | id, page_id, extraction_class, extraction_text, attributes (JSONB), source_location, extraction_run_id, created_at |
| competitor_changes | id, page_id, change_type (new/updated/removed), detected_at, diff_summary |
Search & Keywords:
| Table | Key Fields |
|---|---|
| keywords | id, keyword, search_volume, difficulty, cpc, intent, cluster_id, last_refreshed |
| keyword_clusters | id, name, primary_keyword_id, topic, priority |
| serp_snapshots | id, keyword_id, snapshot_date, raw_data |
| serp_extractions | id, snapshot_id, extraction_class, extraction_text, attributes (JSONB) |
| ai_overview_tracking | id, keyword_id, detected_at, our_site_cited, cited_sources (JSONB), summary_text |
Our Content:
| Table | Key Fields |
|---|---|
| our_pages | id, url, title, slug, content_type, status, published_at, last_updated, word_count, content_body, raw_text |
| our_page_extractions | id, page_id, extraction_class, extraction_text, attributes (JSONB), source_location |
| our_page_seo | id, page_id, meta_title, meta_description, h1, h2s (JSONB), canonical_url, schema_markup, internal_links_out (JSONB), internal_links_in (JSONB), seo_score |
| our_page_performance | id, page_id, date, impressions, clicks, ctr, avg_position, sessions, bounce_rate |
Content Calendar & Strategy:
| Table | Key Fields |
|---|---|
| content_strategy | id, name, description, target_audience, brand_voice_guidelines, content_pillars (JSONB), priorities (JSONB), active |
| content_plan_items | id, strategy_id, title, target_keyword_id, content_type (enum: blog_post, guide, landing_page, comparison, case_study, product_page), status (enum: planned, brief_pending, brief_approved, writing, editing, review, revision, published), scheduled_date, priority (1-3), notes, source (enum: ai_generated, human_added), created_at, updated_at |
| content_briefs | id, plan_item_id, outline (JSONB), target_word_count, target_keywords (JSONB), competitor_references (JSONB), internal_link_targets (JSONB), research_notes, approved, approved_by, approved_at |
| content_drafts | id, plan_item_id, version, content_body, seo_score, editor_notes, status (enum: draft, edited, review, approved, rejected, published) |
| brand_voice_extractions | id, strategy_id, extraction_class, extraction_text, attributes (JSONB) |
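The status enum on content_plan_items implies an ordered pipeline. A small guard like the sketch below can stop agents or UI code from making illegal jumps; note that the allowed-transition map is an assumption inferred from the workflow order in this spec, not a normative state machine defined anywhere in the schema.

```typescript
// Status values come straight from the content_plan_items enum above.
type PlanStatus =
  | "planned" | "brief_pending" | "brief_approved"
  | "writing" | "editing" | "review" | "revision" | "published";

// Assumed legal moves, one hop per pipeline stage; "review" can branch to
// revision (back to editing) or straight to published.
const allowed: Record<PlanStatus, PlanStatus[]> = {
  planned: ["brief_pending"],
  brief_pending: ["brief_approved"],
  brief_approved: ["writing"],
  writing: ["editing"],
  editing: ["review"],
  review: ["revision", "published"],
  revision: ["editing"],
  published: [],
};

function canTransition(from: PlanStatus, to: PlanStatus): boolean {
  return allowed[from].includes(to);
}
```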
Hierarchical Summaries:
| Table | Key Fields |
|---|---|
| domain_summaries | id, scope (ours/competitor/combined), summary_text, metrics_snapshot (JSONB), generated_at |
| cluster_summaries | id, cluster_id, scope, summary_text, page_count, performance (JSONB), gap_analysis, competitor_comparison, generated_at |
| page_summaries | id, page_id, page_type (ours/competitor), summary_text, topics (JSONB), keywords (JSONB), quality_score, generated_at |
3. Master Workflow
Human touchpoints (red nodes):
| Gate | What Human Does | Est. Time |
|---|---|---|
| Calendar Review | Review AI-generated plan, approve/reject/edit items, add own items, set schedule | 15–30 min per cycle |
| Brief Approval | Review outline, confirm direction, adjust scope | 5–10 min per brief |
| Draft Review | Deep edit, add personal experience/insights, place images, final voice check | 15–30 min per piece |
4. Strategy Layer
4.1 Competitive Intelligence Agent
Purpose: Continuously analyzes the competitor database (structured LangExtract data, not raw HTML) and produces actionable competitive insights.
Agent Definition:
```typescript
const competitiveIntelAgent = new Agent({
  id: "competitive-intelligence",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: `You are a competitive intelligence analyst for a content marketing team.
  You analyze structured competitor data to identify strategic threats, opportunities,
  and content gaps. Be specific — cite competitor names, URLs, and extracted data points.
  Prioritize findings by business impact.`,
  tools: {
    readDomainSummary,
    readClusterSummaries,
    queryCompetitorChanges,
    queryCompetitorExtractions,
    traverseCompetitionGraph,
    webSearch,
  },
  maxSteps: 12,
});
```
Tools:
| Tool | Input Schema | What It Does |
|---|---|---|
| readDomainSummary | { scope: "ours" \| "competitor" \| "combined" } | Returns the Level 0 domain summary (~500 tokens of high-level competitive landscape) |
| readClusterSummaries | { clusterId?: string, scope?: string } | Returns Level 1 cluster summaries, optionally filtered. Each ~300 tokens with competitor comparison data |
| queryCompetitorChanges | { competitorId?: string, since: Date, changeType?: string } | Queries competitor_changes table for recent content moves. Filterable by competitor and change type |
| queryCompetitorExtractions | { extractionClass: string, keyword?: string, competitorId?: string, since?: Date } | Searches competitor_extractions table by entity class, keyword match, competitor, and date range |
| traverseCompetitionGraph | { startNodeType: string, startNodeId: string, edgeTypes: string[], maxDepth: number } | Walks the content_relationships graph following specified edge types. Returns connected nodes with relationship metadata |
| webSearch | { query: string } | Live web search for validation and fresh intelligence |
Output Schema:
```typescript
const CompetitiveIntelOutput = z.object({
  competitorMoves: z.array(z.object({
    competitor: z.string(),
    action: z.enum(["new_content", "content_update", "new_topic", "new_feature"]),
    details: z.string(),
    relevanceToUs: z.enum(["high", "medium", "low"]),
    suggestedResponse: z.string(),
    sourceExtractionIds: z.array(z.string()),
  })),
  contentGaps: z.array(z.object({
    topic: z.string(),
    competitorsCovering: z.array(z.string()),
    ourCoverage: z.enum(["none", "weak", "adequate"]),
    opportunity: z.string(),
    estimatedImpact: z.enum(["high", "medium", "low"]),
    graphEdgeId: z.string(),
  })),
  positioningInsights: z.string(),
  generatedAt: z.date(),
});
```
Schedule: Weekly full analysis. Daily change digest (lightweight — only queryCompetitorChanges + readDomainSummary).
4.2 Search Landscape Agent
Purpose: Monitors keyword performance, SERP composition, AI Overview appearances, and search trends using structured SERP data (LangExtract-processed, not raw HTML parsing).
Agent Definition:
```typescript
const searchLandscapeAgent = new Agent({
  id: "search-landscape",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: `You are a search landscape analyst. You monitor keyword rankings,
  SERP features, AI Overview appearances, and search trends for our content.
  Identify ranking wins, losses, emerging opportunities, and threats from AI search.
  Always include specific keywords, positions, and URLs in your analysis.`,
  tools: {
    queryKeywordPerformance,
    querySerpExtractions,
    queryAiOverviewTracking,
    semrushKeywordResearch,
    gscPerformanceQuery,
  },
  maxSteps: 10,
});
```
Tools:
| Tool | Input Schema | What It Does |
|---|---|---|
| queryKeywordPerformance | { keywordId?: string, minPositionChange?: number, dateRange: DateRange } | Our ranking data from our_page_keywords with change deltas |
| querySerpExtractions | { keywordId: string, extractionClass: string } | Structured SERP features from serp_extractions (featured snippets, PAA, AI Overviews, etc.) |
| queryAiOverviewTracking | { keywordId?: string, ourSiteCited?: boolean, since?: Date } | AI Overview presence and citation status from ai_overview_tracking |
| semrushKeywordResearch | { keywords: string[], market: string } | Fresh keyword data from Semrush API (volume, difficulty, CPC, trends) |
| gscPerformanceQuery | { urls?: string[], queries?: string[], dateRange: DateRange } | Real-time data from Google Search Console |
Output Schema:
```typescript
const SearchLandscapeOutput = z.object({
  rankingChanges: z.array(z.object({
    keyword: z.string(),
    url: z.string(),
    previousPosition: z.number(),
    currentPosition: z.number(),
    trend: z.enum(["rising", "falling", "stable"]),
  })),
  aiOverviewAlerts: z.array(z.object({
    keyword: z.string(),
    ourSiteCited: z.boolean(),
    topCitedSources: z.array(z.string()),
    recommendation: z.string(),
  })),
  emergingKeywords: z.array(z.object({
    keyword: z.string(),
    volume: z.number(),
    difficulty: z.number(),
    relevance: z.string(),
    opportunity: z.string(),
  })),
  decliningContent: z.array(z.object({
    url: z.string(),
    keyword: z.string(),
    positionDrop: z.number(),
    suggestedAction: z.string(),
  })),
  serpFeatureOpportunities: z.array(z.object({
    keyword: z.string(),
    feature: z.string(),
    currentHolder: z.string(),
    ourEligibility: z.string(),
  })),
  generatedAt: z.date(),
});
```
Schedule: Daily for ranking changes and AI Overview monitoring. Weekly for full landscape analysis.
4.3 Content Strategy Agent (The Planner)
Purpose: The brain. Synthesizes Competitive Intelligence output, Search Landscape output, our content inventory (via hierarchical summaries), our strategy directives (human-set), and graph relationships to produce a prioritized, scheduled content plan.
Agent Definition:
```typescript
const contentStrategyAgent = new Agent({
  id: "content-strategy",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: async ({ threadId }) => {
    const strategy = await db.getActiveStrategy();
    return `You are the content strategy director. Your job is to create a prioritized
    content plan that maximizes organic search impact.

    CURRENT STRATEGY:
    Pillars: ${strategy.contentPillars.join(", ")}
    Priorities: ${strategy.priorities}
    Target Audience: ${strategy.targetAudience}

    RULES:
    - Every plan item MUST have a content_type (blog_post, guide, comparison, etc.)
    - Every plan item MUST target a specific keyword with volume + difficulty data
    - Score candidates by: strategic_alignment × search_opportunity × competitive_urgency × gap_severity
    - Check graph for cannibalization before recommending new content
    - Suggest scheduling dates based on priority and current calendar capacity
    - Human-added calendar items are FIXED CONSTRAINTS — plan around them`;
  },
  tools: {
    readContentStrategy,
    readDomainSummary,
    readClusterSummaries,
    readPageSummaries,
    queryExtractions,
    queryContentPlan,
    traverseContentGraph,
    webSearch,
    addContentPlanItem,
    updateContentPlanItem,
  },
  maxSteps: 20,
});
```
How the agent navigates context (the hierarchy in action):
```
Step 1: readContentStrategy()
  → Human-set pillars, priorities, brand voice (~500 tokens)

Step 2: readDomainSummary({ scope: "combined" })
  → "We have 847 pages, competitors have X. Strongest/weakest areas." (~500 tokens)

Step 3: [Ingest Competitive Intelligence Agent output]
  → Competitor moves, content gaps, positioning (~2,000 tokens)

Step 4: [Ingest Search Landscape Agent output]
  → Ranking changes, AI Overview alerts, emerging keywords (~2,000 tokens)

Step 5: readClusterSummaries({ scope: "ours" })
  → Per-pillar coverage depth, performance, gaps (~3,000 tokens for 10 clusters)

Step 6: queryContentPlan({ status: ["planned", "in_progress"] })
  → What's already scheduled (avoid duplication) (~1,000 tokens)

Step 7: traverseContentGraph({ edgeTypes: ["gap", "cannibalizes", "outperforms"] })
  → Structural opportunities and conflicts (~1,500 tokens)

Step 8: For top candidates → readPageSummaries() for specific clusters
  → Drill into relevant pages only (~2,000 tokens)

Step 9: For specific competitive comparisons → queryExtractions()
  → Entity-level detail only when needed (~1,500 tokens)

TOTAL: ~14,000 tokens of precisely relevant context
(vs. impossible: stuffing 847 full pages into context)
```
Tools:
| Tool | Input Schema | What It Does |
|---|---|---|
| readContentStrategy | {} | Returns active strategy directives |
| readDomainSummary | { scope } | Level 0 summary |
| readClusterSummaries | { clusterId?, scope? } | Level 1 summaries |
| readPageSummaries | { clusterId?, pageType?, limit? } | Level 2 summaries, filtered |
| queryExtractions | { extractionClass, keyword?, pageType?, since? } | Level 3 entity queries against any extraction table |
| queryContentPlan | { status?, contentType?, dateRange? } | Current calendar items |
| traverseContentGraph | { startNodeType, startNodeId?, edgeTypes, maxDepth } | Graph traversal |
| webSearch | { query } | Topic viability research |
| addContentPlanItem | ContentPlanItem schema | Writes new item to calendar |
| updateContentPlanItem | { id, updates } | Modifies existing item |
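The per-step token estimates in the navigation walkthrough are plain arithmetic, which can be kept honest in code. The labels below mirror the walkthrough steps; the numbers are the spec's own estimates:

```typescript
// Per-step context estimates for one planning session, from the
// hierarchy-navigation walkthrough in this section.
const planningContext: Record<string, number> = {
  strategyDirectives: 500,   // Step 1: readContentStrategy
  domainSummary: 500,        // Step 2: readDomainSummary
  competitiveIntel: 2000,    // Step 3: Competitive Intelligence output
  searchLandscape: 2000,     // Step 4: Search Landscape output
  clusterSummaries: 3000,    // Step 5: readClusterSummaries
  currentPlan: 1000,         // Step 6: queryContentPlan
  graphTraversal: 1500,      // Step 7: traverseContentGraph
  pageSummaries: 2000,       // Step 8: readPageSummaries drill-down
  entityExtractions: 1500,   // Step 9: queryExtractions
};

const totalTokens = Object.values(planningContext).reduce((a, b) => a + b, 0);
// totalTokens === 14000: the entire planning session fits in ~14K tokens.
```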
Output Schema:
```typescript
const ContentPlanOutput = z.object({
  planItems: z.array(z.object({
    title: z.string(),
    targetKeyword: z.string(),
    contentType: z.enum([
      "blog_post", "guide", "landing_page",
      "comparison", "case_study", "product_page"
    ]),
    rationale: z.string(),
    competitiveContext: z.string(),
    suggestedScheduleDate: z.date(),
    priority: z.enum(["1", "2", "3"]),
    estimatedImpact: z.string(),
    researchLinks: z.array(z.string()),
    internalLinkTargets: z.array(z.string()),
    graphEvidence: z.array(z.string()),
  })),
  strategyNotes: z.string(),
  calendarSummary: z.string(),
});
```
→ Workflow SUSPENDS here. Plan items are written to content_plan_items with source: "ai_generated". Human reviews in the Calendar UI, approves/rejects/edits items, adds their own items. On approval, workflow resumes and approved items are queued for brief generation.
4.4 Content Brief Agent
Purpose: For each approved content_plan_item, generates a detailed content brief that the Writer Agent executes against.
Agent Definition:
```typescript
const contentBriefAgent = new Agent({
  id: "content-brief",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: `You generate detailed content briefs for approved content plan items.
  Each brief must include a complete outline with H2/H3 structure, keyword mapping
  per section, competitor differentiation strategy, internal linking targets (from graph),
  external resource recommendations, and specific SEO requirements.

  Use competitor extraction data to identify what top-ranking pages cover and where
  they fall short. Your brief should give the Writer a clear path to creating content
  that outperforms the current top results.`,
  tools: {
    queryCompetitorExtractions,
    traverseCompetitionGraph,
    readPageSummaries,
    queryOurExtractions,
    queryBrandVoiceExtractions,
    webSearch,
  },
  maxSteps: 15,
});
```
Tools:
| Tool | Input Schema | What It Does |
|---|---|---|
| queryCompetitorExtractions | { keyword, extractionClass, limit } | Gets structured entities from top-ranking competitor pages for the target keyword |
| traverseCompetitionGraph | { startNodeType: "keyword", startNodeId, edgeTypes: ["competes_with", "same_topic_as"] } | Finds direct competitor pages and coverage gaps |
| readPageSummaries | { pageType: "competitor", keyword } | Level 2 summaries of relevant competitor pages |
| queryOurExtractions | { extractionClass: "topic", keyword } | What we already cover (avoid repetition) |
| queryBrandVoiceExtractions | { strategyId } | Extracted tone markers, vocabulary preferences, sentence patterns |
| webSearch | { query } | Find resources, data sources, expert references |
Output Schema:
```typescript
const ContentBriefOutput = z.object({
  title: z.string(),
  targetKeyword: z.string(),
  secondaryKeywords: z.array(z.string()),
  searchIntent: z.enum(["informational", "navigational", "transactional", "commercial"]),
  targetWordCount: z.number(),
  contentFormat: z.string(),
  outline: z.array(z.object({
    heading: z.string(),
    level: z.enum(["h2", "h3"]),
    keyPoints: z.array(z.string()),
    targetKeywords: z.array(z.string()),
    suggestedWordCount: z.number(),
    competitorGap: z.string(),
  })),
  competitorAnalysis: z.object({
    topPages: z.array(z.object({
      url: z.string(),
      strengths: z.array(z.string()),
      weaknesses: z.array(z.string()),
    })),
    differentiators: z.array(z.string()),
  }),
  internalLinkTargets: z.array(z.object({
    url: z.string(),
    anchorTextSuggestion: z.string(),
    contextNote: z.string(),
  })),
  externalResources: z.array(z.object({
    url: z.string(),
    description: z.string(),
    useCase: z.enum(["cite_as_source", "link_for_reader", "reference_for_accuracy"]),
  })),
  toneAndStyle: z.string(),
  seoRequirements: z.object({
    metaTitleGuideline: z.string(),
    metaDescriptionGuideline: z.string(),
    schemaType: z.string(),
    featuredSnippetTarget: z.boolean(),
  }),
});
```
→ Workflow SUSPENDS here. Brief written to content_briefs with approved: false. Human reviews in the app, approves or requests changes. On approval, workflow resumes and brief is passed to the Content Layer.
5. Content Layer
5.1 Writer Agent
Purpose: Receives an approved content brief and produces a complete first draft that follows the outline, hits word count targets, incorporates keywords naturally, includes internal and external links, and matches brand voice.
Agent Definition:
```typescript
const writerAgent = new Agent({
  id: "writer",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: async ({ briefId }) => {
    const brief = await db.getBrief(briefId);
    const brandVoice = await db.getBrandVoiceExtractions(brief.strategyId);

    return `You are an expert content writer. Produce a complete, publication-ready
    draft following the brief below.

    WRITING RULES:
    - Follow the outline exactly. Hit the word count targets per section (±10%).
    - Integrate target keywords naturally — never stuff.
    - Primary keyword MUST appear in: first paragraph, at least one H2, and naturally throughout.
    - Include all specified internal links with contextual, varied anchor text.
    - Include external resource links where specified in the brief.
    - Mark image placement opportunities as [IMAGE: description of what should go here]
      — a human will place actual images later.
    - Write in markdown with proper heading hierarchy (H1 → H2 → H3, no skips).
    - Short paragraphs (2-4 sentences). Mix sentence lengths.

    BRAND VOICE:
    Tone markers: ${brandVoice.toneMarkers.join(", ")}
    Vocabulary preferences: ${brandVoice.vocabularyPreferences.join(", ")}
    Avoid: ${brandVoice.wordsToAvoid.join(", ")}

    CONTENT BRIEF:
    ${JSON.stringify(brief, null, 2)}`;
  },
  tools: {
    webSearch,
    queryOurExtractions,
    traverseInternalLinks,
  },
  maxSteps: 8,
});
```
Tools:
| Tool | What It Does |
|---|---|
| webSearch | Real-time fact verification during writing |
| queryOurExtractions | Check consistency with existing content (structured queries) |
| traverseInternalLinks | Find additional linking opportunities via graph |
Output: Full markdown content body with frontmatter, internal links, external links, and [IMAGE: ...] placement markers for human image insertion.
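Since the [IMAGE: ...] markers drive the human placement step in the Production Layer, the review UI has to locate them in the draft. A minimal sketch of that extraction — the marker format follows the Writer's rules above; the function name is illustrative:

```typescript
// Pull the Writer's [IMAGE: description] placement markers out of a markdown
// draft so the editor interface can list them for human image insertion.
function extractImageMarkers(markdown: string): string[] {
  const out: string[] = [];
  const re = /\[IMAGE:\s*([^\]]+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    out.push(m[1].trim());
  }
  return out;
}
```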
5.2 Editor Agent
Purpose: Reviews the Writer's draft for language correctness, verbal consistency, brand voice adherence, factual grounding, structural quality, link integrity, and keyword optimization. Produces specific edits and a revised draft.
Agent Definition:
```typescript
const editorAgent = new Agent({
  id: "editor",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: async ({ briefId }) => {
    const brief = await db.getBrief(briefId);
    const brandVoice = await db.getBrandVoiceExtractions(brief.strategyId);

    return `You are a senior content editor. Review the draft against the content brief
    and brand voice standards. Your job is precision, not rewriting.

    CHECK EACH OF THESE:
    1. LANGUAGE: Grammar, spelling, punctuation, sentence structure errors
    2. VERBAL CONSISTENCY: Same term used throughout (don't switch "users"/"customers" randomly),
       consistent formatting, consistent active/passive voice
    3. BRAND VOICE: Compare against these extracted patterns:
       Tone: ${brandVoice.toneMarkers.join(", ")}
       Vocabulary: ${brandVoice.vocabularyPreferences.join(", ")}
       Flag any sections that drift from established voice.
    4. FACTUAL GROUNDING: Flag any claims, statistics, or attributions that aren't
       supported by the brief's source material or web-verifiable
    5. STRUCTURE: Heading hierarchy compliance, section length balance, transition quality
    6. LINKS: All internal links point to real pages? Anchor text natural and diversified?
    7. KEYWORDS: Primary keyword in title/H1/first paragraph/H2s? Density 0.5-2.5%?

    For each issue: specify location, type, severity (critical/suggested), and fix.
    If overall assessment is "needs_revision" with critical issues, provide revised content.`;
  },
  tools: {
    queryOurPages,
    queryBrandVoiceExtractions,
    webSearch,
  },
  maxSteps: 8,
});
```
Output Schema:
```typescript
const EditorOutput = z.object({
  overallAssessment: z.enum(["pass", "needs_revision"]),
  revisionCount: z.number(),
  edits: z.array(z.object({
    location: z.string(),
    type: z.enum(["grammar", "voice", "factual", "structural", "keyword", "link"]),
    severity: z.enum(["critical", "suggested"]),
    original: z.string(),
    suggested: z.string(),
    rationale: z.string(),
  })),
  voiceConsistencyScore: z.number().min(0).max(100),
  readabilityScore: z.number(),
  revisedContent: z.string().optional(),
});
```
5.3 Content Layer Workflow
```typescript
const contentLayerWorkflow = createWorkflow({
  id: "content-layer",
  inputSchema: z.object({ briefId: z.string() }),
  outputSchema: z.object({ draftId: z.string() }),
})
  .then(loadApprovedBriefStep)    // Load brief from DB
  .then(writerAgentStep)          // Writer produces draft
  .then(editorAgentStep)          // Editor reviews
  .branch({
    condition: ({ editorOutput }) =>
      editorOutput.overallAssessment === "needs_revision"
      && editorOutput.revisionCount < 2,
    trueStep: writerRevisionStep, // Back to writer with edit context
    falseStep: finalizeDraftStep,
  })
  .then(saveDraftToDbStep)        // Persist to content_drafts
  .commit();
```
Revision loop: If the Editor returns needs_revision with critical edits, the draft goes back to the Writer with the edit list as additional context. Maximum 2 revision cycles. After 2 cycles, the draft proceeds to human review regardless (humans catch what agents miss).
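The revision-loop policy reduces to a plain function: at most two cycles, then the draft proceeds to human review regardless of assessment. In the sketch below, the assess/revise callbacks stand in for the Editor and Writer steps; all names are illustrative, not the workflow's actual step API.

```typescript
// "pass" / "needs_revision" mirror the Editor output schema.
type Assessment = "pass" | "needs_revision";

// Run at most maxCycles writer revisions, then return the draft either way.
function runRevisionLoop(
  assess: (draft: string) => Assessment,
  revise: (draft: string) => string,
  draft: string,
  maxCycles = 2
): { draft: string; cycles: number } {
  let cycles = 0;
  while (cycles < maxCycles && assess(draft) === "needs_revision") {
    draft = revise(draft);
    cycles++;
  }
  return { draft, cycles }; // proceeds to human review regardless of assessment
}
```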
6. Production Layer
6.1 Human Review & Editing
This is the most important step in the entire system. The workflow suspends via .waitForEvent("human-review-complete") and the human reviewer performs all of the following in the app's editor interface:
What the human does:
| Action | Detail |
|---|---|
| Read & assess | Full draft review against the brief (shown side-by-side) |
| Add personal experience | Original insights, firsthand accounts, expert commentary — the irreplaceable 20% |
| Place images | Select/create images, position them in content, write or refine alt text. Images are human-curated, not AI-generated. |
| Edit for voice | Adjust tone, phrasing, personality to match brand |
| Fact-check | Verify statistics, claims, attributions against source material |
| Approve or reject | Approve sends to SEO validation. Reject sends back to Content Layer with notes. |
Why image placement is manual: Image selection requires brand aesthetic judgment, rights verification, and contextual sensitivity that current AI image generation doesn't handle reliably at production quality. The [IMAGE: ...] markers from the Writer Agent serve as placement suggestions — the human decides what actually goes there.
6.2 Final Cleanup Agent
Purpose: A lightweight pass after human edits to ensure formatting consistency, link integrity, and proper markdown structure. Not a creative agent — strictly a technical cleanup.
```typescript
const finalCleanupAgent = new Agent({
  id: "final-cleanup",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: `You are a technical proofreader. The content has been human-edited.
  Check ONLY for:
  - Markdown formatting validity (no broken syntax)
  - Image tags have alt text and dimensions
  - All internal links still resolve (you'll verify via tool)
  - No orphaned heading hierarchy (H3 without parent H2)
  - Consistent list formatting

  Do NOT change tone, wording, or content. Only fix technical issues.`,
  tools: { verifyInternalLinks },
  maxSteps: 4,
});
```

6.3 SEO Validation → Publishing Flow
After cleanup, the content enters the programmatic SEO validation engine (defined in detail in Section 8). If it scores 10/10, it proceeds to the Publishing Agent. If it fails any check, the Final Cleanup Agent attempts auto-fixes for the specific failures, then revalidates. Maximum 3 fix-revalidate cycles. If still failing, escalate to human with a specific failure report.
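The fix-revalidate cycle described above reduces to plain control flow. A sketch with illustrative names: `validate` stands in for the deterministic SEO engine (Section 8) and `autoFix` for the Final Cleanup Agent targeting only the reported failures.

```typescript
interface SeoResult {
  passed: boolean;
  failures: string[]; // names of the failed checks
}

// Runs validation, and on failure lets the cleanup step attempt targeted
// fixes before revalidating. Caps at maxCycles (3 per the spec); if the
// result still fails after that, the caller escalates to a human with
// result.failures as the specific failure report.
async function validateWithAutoFix(
  draft: string,
  validate: (d: string) => Promise<SeoResult>,
  autoFix: (d: string, failures: string[]) => Promise<string>,
  maxCycles = 3,
): Promise<{ draft: string; result: SeoResult }> {
  let current = draft;
  let result = await validate(current);
  for (let cycle = 0; cycle < maxCycles && !result.passed; cycle++) {
    current = await autoFix(current, result.failures);
    result = await validate(current);
  }
  return { draft: current, result };
}
```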
Publishing Agent (tool-driven, minimal LLM reasoning):
```typescript
const publishingAgent = new Agent({
  id: "publishing",
  model: "anthropic/claude-sonnet-4-20250514",
  instructions: `Execute the publication pipeline. Each tool must succeed before proceeding
  to the next. Log every action to the audit trail.`,
  tools: {
    formatForCms,
    uploadToCms,
    setMetadata,
    updateXmlSitemap,
    pingIndexingApi,
    updateOurPagesDb,
    triggerLangExtractPipeline,
    triggerSummaryRegeneration,
    triggerGraphRelationshipBuilder,
    triggerBidirectionalLinking,
    schedulePostPublishMonitoring,
    logToAuditTrail,
  },
  maxSteps: 15,
});
```

Post-publish pipeline (critical — this closes the data loop):
```
Content published to CMS
  ↓
LangExtract processes the new page → our_page_extractions
  ↓
Page summary generated → page_summaries (Level 2)
  ↓
Cluster summary regenerated → cluster_summaries (Level 1)
  ↓
Domain summary regenerated → domain_summaries (Level 0)
  ↓
Graph edges built:
  - covers_topic edges (from extracted topics)
  - targets_keyword edges (from keyword data)
  - links_to edges (from actual links in content)
  - should_link_to analysis: find existing pages that should link TO the new page
  - Execute bidirectional linking: update existing pages with new internal links
  ↓
Post-publish monitoring scheduled:
  - 24h: Verify page is indexed (GSC)
  - 7d: Initial rankings and impressions
  - 30d: Full performance review against projected targets
  - Auto-flag underperformers for refresh queue
```

7. Content Calendar & Application Interface
7.1 Content Calendar View
The calendar is the primary human control surface. It shows planned, in-progress, and published content on a timeline.
Calendar item data model:
```typescript
interface ContentPlanItem {
  id: string;
  title: string;
  targetKeyword: string;
  contentType: "blog_post" | "guide" | "landing_page" | "comparison" | "case_study" | "product_page";
  status: "planned" | "brief_pending" | "brief_approved" | "writing" | "editing" | "review" | "revision" | "published";
  scheduledDate: Date;
  priority: 1 | 2 | 3;
  notes: string;
  source: "ai_generated" | "human_added";
  assignedWorkflowRunId?: string;
  createdAt: Date;
  updatedAt: Date;
}
```

Calendar capabilities:
| Feature | Detail |
|---|---|
| Monthly/weekly/list views | Standard calendar views with color-coded content types and status indicators |
| Drag-and-drop rescheduling | Move items between dates. Constraint: unpublished items can't be scheduled on dates in the past |
| Add item manually | Human creates a new content_plan_item with source: "human_added". These are treated as fixed constraints by the Strategy Agent |
| AI-generated vs. human-added | Visually distinguished (e.g., AI items have a subtle indicator). Both are equal in the system once approved |
| Content type badges | Each item shows its type (Blog, Guide, Comparison, etc.) as a color-coded badge |
| Status pipeline | Visual indicator showing where each item is in the pipeline (planned → brief → writing → editing → review → published) |
| Click-through | Click any item to see its brief, current draft, SEO score, and workflow status |
| "Generate Plan" button | Triggers the Strategy Agent to analyze current data and propose new items |
| Bulk approve/reject | Multi-select AI-generated items for batch approval |
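The rescheduling constraint from the table above is simple date arithmetic. A sketch, assuming the stated rule means unpublished items may only be dropped on today or a future date while published items never move:

```typescript
type ItemStatus = "planned" | "brief_pending" | "brief_approved" | "writing"
  | "editing" | "review" | "revision" | "published";

// Published items are historical record and stay fixed; anything else
// can be rescheduled as long as the target date is not in the past.
function canReschedule(status: ItemStatus, newDate: Date, now = new Date()): boolean {
  if (status === "published") return false;
  const startOfToday = new Date(now.getFullYear(), now.getMonth(), now.getDate());
  return newDate.getTime() >= startOfToday.getTime();
}
```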
7.2 Content Editor / Review Interface
| Feature | Detail |
|---|---|
| Side-by-side view | Brief on left, draft on right |
| Inline editing | Full rich text editor with change tracking |
| Image placement | Drag-and-drop image upload at [IMAGE: ...] marker positions |
| SEO score panel | Live-updating 10-point SEO check as human edits |
| Comment/annotation | Leave notes for future reference or AI revision context |
| Approve / Request Changes / Reject | Action buttons that resume or restart the workflow |
7.3 Additional Views
| View | Purpose |
|---|---|
| Dashboard | Pipeline status, today's publications, competitor alerts, ranking movers, AI Overview tracking |
| Competitor Monitor | Competitor list, new/changed page feed, per-competitor content analysis, side-by-side comparison |
| Keyword & Search | Keyword tracker with ranking history, SERP feature tracking, AI Overview monitoring, GSC integration |
| SEO Audit | 10-point check results per piece, historical scores, site-wide health, internal link map |
| Strategy Settings | Brand voice config, content pillars, competitor list management, target keywords, workflow config |
8. Programmatic SEO Validation Engine
This is deterministic code, not an LLM. Every check has binary pass/fail logic. All 10 must pass for publication.
Check 1: Meta Title
```typescript
{
  name: "Meta Title",
  // validate must be async because the uniqueness check awaits a DB lookup
  validate: async (content) => {
    const title = content.metaTitle;
    const checks = [
      { pass: title.length >= 50 && title.length <= 60, reason: `Length ${title.length}, need 50-60` },
      { pass: containsKeyword(title, content.primaryKeyword), reason: "Missing primary keyword" },
      { pass: await isUnique("meta_title", title), reason: "Duplicate meta title exists" },
      { pass: !willTruncate(title), reason: "Will truncate in SERPs" },
    ];
    return { passed: checks.every(c => c.pass), failures: checks.filter(c => !c.pass) };
  }
}
```

Check 2: Meta Description
- Length: 150–160 characters
- Contains primary keyword
- Includes call-to-action or value proposition
- Unique across site
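Check 2 mirrors Check 1's shape. A sketch: the CTA pattern list is illustrative, and the site-wide uniqueness lookup is omitted here because, as in Check 1, it would be an awaited DB query.

```typescript
function validateMetaDescription(
  desc: string,
  primaryKeyword: string,
  // Illustrative call-to-action / value-proposition markers, not a spec'd list
  ctaPatterns: RegExp[] = [/learn/i, /discover/i, /get started/i, /find out/i],
): { passed: boolean; failures: string[] } {
  const checks = [
    { pass: desc.length >= 150 && desc.length <= 160, reason: `Length ${desc.length}, need 150-160` },
    { pass: desc.toLowerCase().includes(primaryKeyword.toLowerCase()), reason: "Missing primary keyword" },
    { pass: ctaPatterns.some(p => p.test(desc)), reason: "No call-to-action or value proposition" },
  ];
  return {
    passed: checks.every(c => c.pass),
    failures: checks.filter(c => !c.pass).map(c => c.reason),
  };
}
```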
Check 3: Heading Hierarchy
- Exactly one H1 containing primary keyword
- H2s use secondary keywords
- No skipped levels (H1 → H3 without H2)
- Logical nesting throughout
Check 4: Keyword Optimization
- Primary keyword in: title, H1, first 100 words, at least one H2, meta description
- Keyword density: 0.5%–2.5%
- Secondary keywords present naturally
- No keyword stuffing patterns (3+ identical phrases in sequence)
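The density bound is deterministic arithmetic. A sketch assuming the common definition (keyword occurrences divided by total words), with `includes` used to tolerate trailing punctuation on words:

```typescript
// Counts occurrences of the keyword phrase over the word stream and returns
// density as a percentage. Pass condition per Check 4: 0.5 <= density <= 2.5.
function keywordDensity(body: string, keyword: string): number {
  const words = body.toLowerCase().split(/\s+/).filter(Boolean);
  const phrase = keyword.toLowerCase().split(/\s+/);
  let hits = 0;
  for (let i = 0; i + phrase.length <= words.length; i++) {
    if (phrase.every((w, j) => words[i + j].includes(w))) hits++;
  }
  return words.length === 0 ? 0 : (hits / words.length) * 100;
}
```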
Check 5: Internal Linking
- Minimum 3 internal links
- All resolve to real published pages (verified against the `our_pages` table)
- Anchor text is descriptive (no "click here", "read more")
- Anchor text is diversified (not all exact-match keyword)
- Links are contextually placed (not dumped in a footer list)
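The anchor-text rules above reduce to a few list operations. A sketch; the generic-phrase blocklist is illustrative, not exhaustive:

```typescript
const GENERIC_ANCHORS = new Set(["click here", "read more", "learn more", "here", "this"]);

function anchorTextFailures(anchors: string[]): string[] {
  const failures: string[] = [];
  if (anchors.length < 3) {
    failures.push(`Only ${anchors.length} internal links, need at least 3`);
  }
  for (const a of anchors) {
    if (GENERIC_ANCHORS.has(a.trim().toLowerCase())) {
      failures.push(`Non-descriptive anchor text: "${a}"`);
    }
  }
  // Diversification: flag when every anchor is the same exact-match phrase.
  const distinct = new Set(anchors.map(a => a.trim().toLowerCase()));
  if (anchors.length >= 3 && distinct.size === 1) {
    failures.push("All anchors are identical exact-match text");
  }
  return failures;
}
```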
Check 6: External Linking
- At least 1 external link to authoritative source
- No links to competitor domains (checked against the `competitors` table blocklist)
- External links are contextually relevant
- Proper `rel` attributes on new-tab links
Check 7: Content Quality Metrics
- Word count within ±10% of brief target
- Readability score within configured range
- No duplicate content (checked against `our_page_extractions` for the same primary keyword via the `cannibalizes` graph edge — not vector similarity)
- No paragraph exceeds 300 words
- Sentence length variety present
Check 8: Technical SEO
- Valid JSON-LD schema markup present and parseable
- Canonical URL set correctly
- Open Graph tags: og:title, og:description, og:image
- Twitter Card tags present
- All images have alt text
- All images have width/height dimensions
Check 9: URL & Slug
- URL-friendly (lowercase, hyphens, no special characters)
- Contains primary keyword or close variant
- Under 60 characters
- No duplicate slug in the `our_pages` table
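A minimal sketch of the slug rules. The exact-match keyword test stands in for "close variant" detection, which would need a variant list, and the duplicate-slug check is an async DB lookup omitted here:

```typescript
function slugFailures(slug: string, primaryKeyword: string): string[] {
  const failures: string[] = [];
  // Lowercase alphanumerics separated by single hyphens, nothing else
  if (!/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(slug)) {
    failures.push("Not URL-friendly (lowercase alphanumerics and hyphens only)");
  }
  const keywordSlug = primaryKeyword.toLowerCase().trim().replace(/\s+/g, "-");
  if (!slug.includes(keywordSlug)) failures.push("Missing primary keyword");
  if (slug.length >= 60) failures.push(`Slug is ${slug.length} chars, must be under 60`);
  return failures;
}
```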
Check 10: Mobile & Performance
- All images have explicit width/height (prevents Cumulative Layout Shift)
- Images use `loading="lazy"` (except the above-the-fold hero)
- No inline styles that break the mobile viewport
- Tables have responsive handling
- No excessively large embedded content
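The image rules shared by Checks 8 and 10 can be scanned with a simple regex pass. A sketch; a production version would use a real HTML parser rather than regexes:

```typescript
function imageIssues(html: string): string[] {
  const issues: string[] = [];
  const tags = html.match(/<img\b[^>]*>/gi) ?? [];
  tags.forEach((tag, i) => {
    const label = `img #${i + 1}`;
    if (!/\balt\s*=\s*"[^"]+"/i.test(tag)) issues.push(`${label}: missing alt text`);
    if (!/\bwidth\s*=\s*"\d+"/i.test(tag) || !/\bheight\s*=\s*"\d+"/i.test(tag)) {
      issues.push(`${label}: missing explicit width/height (CLS risk)`);
    }
    // The hero image is exempt from lazy loading; a real check would know
    // which image is above the fold. Here we only report the absence.
    if (!/\bloading\s*=\s*"lazy"/i.test(tag)) issues.push(`${label}: no loading="lazy"`);
  });
  return issues;
}
```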
Validation output:
```typescript
const SeoValidationResult = z.object({
  score: z.string(), // "10/10", "8/10", etc.
  passed: z.boolean(),
  checks: z.array(z.object({
    id: z.number(),
    name: z.string(),
    passed: z.boolean(),
    details: z.string(),
    failureReason: z.string().optional(),
    autoFixable: z.boolean(),
  })),
});
```

9. Cost Model
| Category | Monthly Estimate (50 pieces) | Notes |
|---|---|---|
| Claude Sonnet 4 (all agents) | $200–400 | ~$4–8 per piece across strategy, writing, editing, briefs, cleanup |
| Gemini 2.5 Flash (LangExtract + summaries) | $50–100 | Continuous extraction of competitor + our pages + SERPs + summary generation |
| Firecrawl | $40–80 | Competitor sitemap crawling + page scraping (depends on competitor count) |
| Semrush API | $119–229 | Business plan for keyword/SERP API access |
| Image generation | $0 | Human-placed — no API cost |
| Hosting (Vercel + Railway) | $50–100 | App + workers + LangExtract service |
| PostgreSQL (managed) | $25–50 | Neon, Supabase, or Railway |
| GSC API | Free | |
| Total | $484–959/month | Sum of the per-category ranges above |
10. Risk Matrix
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LangExtract extraction quality inconsistent | Medium | Medium | Multi-pass extraction (3 passes), high-quality few-shot examples, validation checks, prompt iteration |
| Hierarchical summaries drift from source data | Medium | Medium | Summaries regenerated daily from fresh extractions; timestamped and versioned |
| Graph relationship staleness | Medium | Low | Weekly re-validation; confidence scores decay over time; stale edges flagged in agent context |
| LangExtract Python ↔ Mastra TypeScript bridge failure | Low | High | Health check endpoint, auto-restart, fallback to direct Mastra LLM extraction |
| Firecrawl rate limiting / anti-bot blocks | Medium | Medium | Respectful crawl scheduling, Firecrawl's built-in evasion, fallback to cached content |
| LLM output quality variance | High | Medium | Multi-agent review pipeline + human gate + programmatic SEO checks |
| Google algorithm targeting AI content | Medium | High | 80/20 human-AI method ensures genuine Experience + Expertise in every piece |
| Hallucination in published content | Medium | High | Fact-check via extracted claims + human review + LangExtract source grounding |
| Content cannibalization at scale | Medium | Medium | Graph `cannibalizes` edges + Strategy Agent explicitly checks before planning |
11. Future: Filesystem-as-Context Architecture
The Problem This Solves
Even with the hierarchical summary approach, there's an architectural ceiling: summaries are pre-generated snapshots. As the content library grows to thousands of pages and the competitive landscape shifts daily, keeping summaries fresh and relevant becomes a continuous compute cost. More fundamentally, pre-computing what context an agent might need is inherently wasteful — you're guessing ahead of time which summaries will matter for which tasks.
The filesystem-as-context pattern, articulated by Andrej Karpathy's "context engineering" framework and demonstrated by Anthropic's Skills system, offers a potentially superior approach: don't pre-load context. Let agents navigate to it on demand.
The Core Idea
Instead of generating hierarchical summaries that agents read passively, you structure all system knowledge as a navigable filesystem. Agents use `ls`, `grep`, `glob`, and file reading to pull exactly the context they need for the current task, building their context window incrementally with only signal, never noise.
```
/contentengine/
├── strategy/
│   ├── STRATEGY.md              ← Current pillars, priorities, audience
│   ├── brand-voice/
│   │   ├── VOICE_GUIDE.md       ← Extracted tone markers, vocabulary rules
│   │   └── samples/
│   │       ├── best-blog-post.md
│   │       └── best-guide.md
│   └── calendar/
│       ├── 2026-03.md           ← March calendar in structured markdown
│       └── 2026-04.md
│
├── competitors/
│   ├── INDEX.md                 ← Competitor list with domains, last crawled
│   ├── competitor-a/
│   │   ├── OVERVIEW.md          ← LangExtract summary of their content strategy
│   │   ├── recent-changes.md    ← Last 30 days of content changes
│   │   └── pages/
│   │       ├── their-fine-tuning-guide.md  ← Extracted entities as structured MD
│   │       └── their-deployment-guide.md
│   └── competitor-b/
│       └── ...
│
├── our-content/
│   ├── INDEX.md                 ← Page inventory with URLs, types, performance
│   ├── by-cluster/
│   │   ├── ai-ml/
│   │   │   ├── CLUSTER_OVERVIEW.md   ← Performance, gaps, competitor comparison
│   │   │   ├── fine-tuning-guide.md  ← Extracted entities + performance data
│   │   │   └── lora-explained.md
│   │   └── devops/
│   │       └── ...
│   └── by-status/
│       ├── needs-refresh/       ← Pages flagged for updating
│       └── underperforming/     ← Pages below performance threshold
│
├── keywords/
│   ├── INDEX.md                 ← Keyword clusters with priority
│   ├── cluster-ai-ml.md         ← Keywords, volumes, our ranks, competitor ranks
│   └── cluster-devops.md
│
├── search/
│   ├── ai-overviews.md          ← AI Overview tracking for priority keywords
│   ├── serp-features.md         ← Featured snippet, PAA tracking
│   └── trends.md                ← Emerging/declining search trends
│
└── graph/
    ├── gaps.md                  ← Topics competitors cover that we don't
    ├── cannibalization.md       ← Pages targeting same keywords
    ├── linking-opportunities.md ← should_link_to edges as structured list
    └── competitive-overlaps.md  ← competes_with edges with rank comparison
```

How an Agent Would Navigate
When the Content Strategy Agent needs to plan next month's content:
```
1. Agent reads /strategy/STRATEGY.md (~500 tokens)
   → Understands pillars, priorities, audience

2. Agent runs: ls /competitors/ (~100 tokens)
   → Sees competitor directories

3. Agent reads /competitors/INDEX.md (~300 tokens)
   → Gets competitor overview and recent activity summary

4. Agent reads /graph/gaps.md (~800 tokens)
   → Sees all content gaps as structured list

5. Agent reads /keywords/cluster-ai-ml.md (~600 tokens)
   → Sees keyword opportunities in the priority cluster

6. Agent runs: grep -l "edge deployment" /competitors/*/pages/ (~50 tokens)
   → Finds which competitors have edge deployment content

7. Agent reads /competitors/competitor-a/pages/edge-deploy.md (~400 tokens)
   → Gets structured extraction of their specific page

8. Agent reads /strategy/calendar/2026-04.md (~300 tokens)
   → Sees what's already scheduled for April

9. Agent reads /our-content/by-status/needs-refresh/ (~400 tokens)
   → Sees which existing content needs updating

Total context built: ~3,500 tokens of precisely relevant data
```

Compare this to the hierarchical summary approach (~14,000 tokens, some of which may be irrelevant to this specific planning task). The filesystem approach lets the agent decide what to load based on the actual task, not pre-generated summaries that try to anticipate what might be needed.
What It Would Take to Implement
This is a significant architectural change but builds cleanly on top of the LangExtract + Graph foundation already specified in this document. The core work:
1. Filesystem generation pipeline. A scheduled job that reads from PostgreSQL (extractions, summaries, graph edges, performance data) and writes structured markdown files to a mounted filesystem. Each file follows a consistent schema: frontmatter with metadata, then structured content. This is the bridge between the database and the agent's navigable context. Estimated effort: 2–3 weeks for the generation logic, templates, and scheduling.

2. Sandbox environment per agent session. Each agent invocation gets a read-only mounted view of the filesystem. Mastra's tool system exposes `ls`, `cat`, `grep`, and `glob` as tools the agent can call. The agent navigates the filesystem using bash-like commands it already knows from training data. This is simpler than building custom SQL-backed query tools — the filesystem IS the query interface. Estimated effort: 1–2 weeks for the sandbox tooling and Mastra integration.

3. Filesystem-aware agent prompts. Agent instructions are updated to describe the filesystem structure and navigation patterns. Instead of "use the readClusterSummaries tool," the prompt says "the competitor data is in /competitors/. Start by reading the INDEX.md, then drill into specific competitor directories as needed." This leverages the model's existing training on filesystem navigation. Estimated effort: 1 week of prompt engineering and testing.

4. Hybrid SQL + filesystem approach. Not everything moves to the filesystem. High-frequency queries (keyword rankings, performance metrics, real-time GSC data) stay in PostgreSQL with dedicated tools. The filesystem handles the slower-changing strategic context: content inventories, competitive analysis, brand voice, editorial plans. The agent has both filesystem tools and database tools available and chooses the right one for the task. Estimated effort: included in items 1–3 above.

5. Write-back pattern. When agents need to create outputs (content plan items, briefs), they write to specific directories (e.g., `/strategy/calendar/drafts/`) which a sync job picks up and persists to PostgreSQL. This keeps the database as the authoritative source while giving agents a natural write interface. Estimated effort: 1 week.
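The write-back sync in item 5 could be as simple as a frontmatter parse followed by an upsert. A sketch; the file layout and field names here are assumptions for illustration, not part of the spec:

```typescript
interface DraftPlanItem {
  title: string;
  targetKeyword: string;
  scheduledDate: string; // ISO date string taken from the frontmatter
  body: string;
}

// Parses a draft file an agent wrote into the drafts directory. The sync job
// would call this, then upsert a content_plan_item with source "ai_generated"
// pending human approval on the calendar.
function parseDraftFile(markdown: string): DraftPlanItem {
  const match = markdown.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) throw new Error("Draft file is missing frontmatter");
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const sep = line.indexOf(":");
    if (sep > 0) meta[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return {
    title: meta["title"] ?? "",
    targetKeyword: meta["targetKeyword"] ?? "",
    scheduledDate: meta["scheduledDate"] ?? "",
    body: match[2].trim(),
  };
}
```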
Total estimated implementation effort: 5–7 weeks on top of the base system specified in this document.
The tradeoff is clear: The filesystem approach produces tighter, more relevant context windows (3,500 tokens vs. 14,000) because agents load only what they actually need for the current task. The cost is an additional generation pipeline and the operational complexity of keeping the filesystem in sync with the database. For a system operating at scale (500+ pages, 10+ competitors, 50+ pieces per month), the context efficiency gains likely justify the investment. For smaller operations, the hierarchical summary approach specified in the main architecture is sufficient.
The recommended path: build the base system using the hierarchical summary + graph approach first (Sections 2–8 of this spec), validate it works at your current scale, then implement the filesystem layer as a context optimization when agent context quality becomes a bottleneck.
This is a living specification. Update as architectural decisions are made during implementation.