Alton Wells — AI Content Engine: Architecture & Scope
Project Codename: ContentEngine
Framework: Mastra (TypeScript)
Status: Architecture Review Draft
Date: March 2026
1. System Overview
This document defines the architecture, data model, agent design, workflow orchestration, and scope for an autonomous AI-powered content production system. The system operates across three layers — Strategy, Content, and Production — mirroring the flow diagram, with a fourth Data Infrastructure layer underpinning everything.
The core loop is:
```
Competitors + Market Data + Our Performance + Our Strategy
        ↓
AI generates a Content Plan (calendar)
        ↓
Human reviews/adjusts the plan
        ↓
AI autonomously executes each piece through the pipeline
        ↓
Human reviews final output
        ↓
Programmatic SEO validation (must pass 10/10)
        ↓
Publish
```

The system is designed so that humans set strategy and approve output, while AI handles everything in between — research, planning, writing, editing, optimization, image generation, and SEO validation.
2. Data Infrastructure Layer
This is the foundation. Every agent in the system reads from and writes to these data stores. Without clean, continuously updated data, the agents cannot make good decisions.
2.1 Competitor Intelligence Database
Purpose: A living, continuously updated database of every competitor — their domains, sitemaps, page inventory, content topics, publishing frequency, and content changes over time.
Schema (core tables):
| Table | Key Fields | Update Frequency |
|---|---|---|
| competitors | id, name, domain, industry_vertical, notes | Manual + agent-suggested |
| competitor_sitemaps | id, competitor_id, sitemap_url, last_crawled_at, page_count | Daily crawl |
| competitor_pages | id, competitor_id, url, title, meta_description, h1, h2s[], word_count, published_at, last_modified, content_hash | Daily crawl |
| competitor_content_analysis | id, page_id, topics[], keywords[], content_type, estimated_traffic, serp_position, quality_score | Weekly analysis |
| competitor_changes | id, page_id, change_type (new/updated/removed), detected_at, diff_summary | Daily diff |
Data Collection Agents/Jobs:
| Job | Schedule | What It Does |
|---|---|---|
| Sitemap Crawler | Daily, 2 AM | Fetches and parses all competitor sitemaps. Detects new pages, removed pages, and last-modified changes. Stores raw sitemap XML and parsed entries. |
| Page Scraper | Daily, 3 AM | For new/changed pages, fetches the page, extracts title, meta, headings, word count, content body. Computes content hash for change detection. |
| Content Analyzer | Weekly, Sunday | Runs an LLM analysis pass over new/changed competitor pages. Extracts topics, identifies content type (blog, guide, landing page, comparison), estimates quality. |
| Competitor Diff Reporter | Daily, 6 AM | Generates a digest of all competitor content changes in the last 24 hours. Posts to Slack/dashboard. Flags high-priority changes (new pages targeting our keywords). |
Key Design Decision: Store the full text content of competitor pages (not just metadata). This feeds the Strategy Agent's ability to analyze positioning, messaging gaps, and content depth. Store as plain text extracted from HTML, not raw HTML.
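The change detection used by the Sitemap Crawler and Page Scraper can be sketched in a few lines; this is a minimal illustration (SHA-256 over normalized plain text, plus a set diff over sitemap URLs), not the full job implementation:

```typescript
import { createHash } from "node:crypto";

// Hash the extracted plain text so cosmetic HTML changes don't trigger diffs.
export function contentHash(extractedText: string): string {
  // Normalize whitespace before hashing to avoid spurious change events.
  const normalized = extractedText.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// Classify sitemap changes by diffing the previous and current URL sets.
export function diffSitemap(previous: Set<string>, current: Set<string>) {
  return {
    added: [...current].filter((url) => !previous.has(url)),
    removed: [...previous].filter((url) => !current.has(url)),
  };
}
```

A page's stored `content_hash` is compared against the freshly computed hash to decide whether the Content Analyzer needs to re-process it.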
2.2 Search & Keyword Performance Database
Purpose: Centralized store of keyword research data, SERP landscape, search trends, and generative search (AI Overview) appearances.
Schema (core tables):
| Table | Key Fields | Update Frequency |
|---|---|---|
| keywords | id, keyword, search_volume, difficulty, cpc, intent (informational/transactional/etc), cluster_id | Weekly refresh |
| keyword_clusters | id, name, primary_keyword_id, topic, priority | Manual + AI-suggested |
| serp_snapshots | id, keyword_id, snapshot_date, organic_results[], featured_snippet, people_also_ask[], ai_overview_present, ai_overview_sources[] | Weekly |
| ai_overview_tracking | id, keyword_id, detected_at, our_site_cited (bool), cited_sources[], summary_text | Daily for priority keywords |
| search_trends | id, keyword_id, date, volume_index, yoy_change | Monthly |
Data Sources & Integrations:
| Source | What It Provides | Integration Method |
|---|---|---|
| Semrush / Ahrefs API | Search volume, difficulty, CPC, SERP features, competitor rankings | REST API via Mastra tool |
| Google Search Console API | Our impressions, clicks, CTR, average position per query | OAuth2 API via Mastra tool |
| Google Trends API (unofficial) | Relative search interest over time | Scraping or unofficial API |
| SERP Scraper | Full SERP snapshots including AI Overview detection | Custom tool (SerpAPI or Browserbase) |
| Generative search monitors | AI Overview presence and citation tracking | Custom OpenClaw cron or Mastra scheduled workflow |
2.3 Our Content & Performance Database
Purpose: Complete inventory of all our published content, its structure, performance metrics, and metadata. This is what the agents use to understand "what we have" and "how it's performing."
Schema (core tables):
| Table | Key Fields | Update Frequency |
|---|---|---|
| our_pages | id, url, title, slug, content_type, status (published/draft/planned), published_at, last_updated, word_count, content_body | On publish/update |
| our_page_seo | id, page_id, meta_title, meta_description, h1, h2s[], canonical_url, schema_markup, internal_links_out[], internal_links_in[], seo_score | On publish + weekly audit |
| our_page_performance | id, page_id, date, impressions, clicks, ctr, avg_position, sessions, bounce_rate, avg_time_on_page | Daily from GSC + Analytics |
| our_page_keywords | id, page_id, keyword_id, target (bool), current_position, position_change_7d, position_change_30d | Daily |
| content_embeddings | id, page_id, chunk_index, embedding_vector, chunk_text | On publish/update |
Key Design Decision: Maintain a vector embedding index of all our content (using Mastra's RAG pipeline). This powers internal linking automation, content gap detection, and duplicate/cannibalization detection.
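Duplicate and cannibalization detection over that index reduces to a similarity threshold on embedding vectors. A minimal sketch (cosine similarity over precomputed vectors; the 0.9 threshold is an assumption to tune, and in production pgvector would do this query in SQL):

```typescript
// Cosine similarity between two embedding vectors of equal length.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Flag existing pages whose embedding is too close to a proposed topic's embedding.
export function findCannibalizationRisks(
  proposed: number[],
  pages: { pageId: string; embedding: number[] }[],
  threshold = 0.9, // assumption: tune against real duplicate/cannibalization cases
): string[] {
  return pages
    .filter((p) => cosineSimilarity(proposed, p.embedding) >= threshold)
    .map((p) => p.pageId);
}
```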
2.4 Content Strategy & Planning Database
Purpose: The content calendar, strategy directives, brand voice guidelines, and editorial configuration that humans set and the AI operates against.
Schema (core tables):
| Table | Key Fields | Update Frequency |
|---|---|---|
| content_strategy | id, name, description, target_audience, brand_voice_guidelines, content_pillars[], priorities, active (bool) | Manual (human-set) |
| content_plan_items | id, strategy_id, title, target_keyword_id, content_type, status (planned/in_progress/review/published), scheduled_date, assigned_agent_run_id, priority, notes, source (ai_generated/human_added) | AI-generated + human-edited |
| content_briefs | id, plan_item_id, outline, target_word_count, target_keywords[], competitor_references[], internal_link_targets[], research_notes, approved (bool) | AI-generated, human-approved |
| content_drafts | id, plan_item_id, version, content_body, seo_score, editor_notes, status (draft/edited/approved/rejected) | AI-generated per pipeline run |
| brand_voice_samples | id, strategy_id, sample_text, tone_tags[], notes | Manual upload |
Key Design Decision: The content_plan_items table has a source field — either ai_generated or human_added. Both the AI planning agent and human users can add items to the calendar. The AI respects human-added items as fixed constraints when generating its plans.
3. Strategy Layer — Agents & Workflows
This layer corresponds to the top section of your flow diagram. The agents here consume all the data infrastructure, synthesize it, and produce actionable content plans.
3.1 Competitive Intelligence Agent
```
ID: competitive-intelligence-agent
Model: anthropic/claude-sonnet-4-20250514
```

Role: Continuously analyzes the competitor database and produces strategic insights about competitor positioning, content gaps, and opportunities.
Inputs:
- Competitor pages database (new and changed content)
- Our content database (for comparison)
- Keyword performance data
Tools:
- `query_competitor_pages` — Searches competitor content by topic, keyword, or content type
- `query_our_pages` — Searches our content for comparison
- `competitor_diff_summary` — Gets recent competitor content changes
- `web_search` — Validates findings against live web data
Outputs (structured, Zod-validated):
```
{
  competitorMoves: [{
    competitor: string,
    action: "new_content" | "content_update" | "new_topic",
    details: string,
    relevanceToUs: "high" | "medium" | "low",
    suggestedResponse: string,
  }],
  contentGaps: [{
    topic: string,
    competitorsCovering: string[],
    ourCoverage: "none" | "weak" | "adequate",
    opportunity: string,
    estimatedImpact: "high" | "medium" | "low",
  }],
  positioningInsights: string,
}
```

Schedule: Weekly (full analysis), daily (change digest only).
3.2 Search Landscape Agent
```
ID: search-landscape-agent
Model: anthropic/claude-sonnet-4-20250514
```

Role: Monitors keyword performance, SERP changes, AI Overview appearances, and search trends. Identifies where we're winning, where we're losing, and where new opportunities are emerging.
Inputs:
- Keyword + SERP snapshot database
- Our page performance data (GSC)
- AI Overview tracking data
- Search trend data
Tools:
- `query_keyword_performance` — Pulls our ranking data for specific keywords
- `query_serp_snapshots` — Gets the SERP landscape for keywords
- `query_ai_overview_tracking` — Checks AI Overview presence and our citation status
- `semrush_keyword_research` — Fetches fresh keyword data from the Semrush API
- `gsc_performance_query` — Pulls real-time data from Google Search Console
Outputs (structured):
```
{
  rankingChanges: [{ keyword, previousPosition, currentPosition, trend, url }],
  aiOverviewAlerts: [{ keyword, ourSiteCited, topCitedSources, recommendation }],
  emergingKeywords: [{ keyword, volume, difficulty, relevance, opportunity }],
  decliningContent: [{ url, keyword, positionDrop, suggestedAction }],
  searchTrendShifts: [{ topic, direction, magnitude, implication }],
}
```

Schedule: Daily for ranking changes, weekly for full landscape analysis.
3.3 Content Strategy Agent (The Planner)
```
ID: content-strategy-agent
Model: anthropic/claude-sonnet-4-20250514
```

Role: The brain of the Strategy Layer. Synthesizes outputs from the Competitive Intelligence Agent, Search Landscape Agent, our content inventory, our strategy directives, and brand guidelines to produce a prioritized content plan.
This is the agent that "thinks through all of these and comes up with a plan" from your diagram.
Inputs:
- Competitive Intelligence Agent output
- Search Landscape Agent output
- Current content strategy (human-set pillars, priorities, brand voice)
- Existing content plan items (both AI-generated and human-added)
- Our content inventory + performance data
- Content embeddings (to avoid duplication)
Tools:
- `query_content_strategy` — Gets current strategy directives
- `query_content_plan` — Gets existing planned/scheduled items
- `query_our_content_inventory` — Searches our published content
- `vector_similarity_search` — Checks if proposed topics overlap with existing content
- `web_search` — Researches topics for viability and resource discovery
- `add_content_plan_item` — Writes new items to the content calendar
- `update_content_plan_item` — Modifies existing plan items
Workflow:

1. Load current strategy directives and priorities
2. Ingest the Competitive Intelligence report
3. Ingest the Search Landscape report
4. Review the existing content plan (what's already scheduled)
5. Review our content inventory (what we've already published)
6. Identify gaps between strategy goals and current coverage
7. Generate candidate content ideas with rationale
8. Score and prioritize candidates by:
   - Strategic alignment (does it serve our pillars?)
   - Search opportunity (volume × 1/difficulty)
   - Competitive urgency (are competitors gaining ground?)
   - Content gap severity (do we have zero coverage?)
   - GEO potential (can this be cited by AI search?)
9. Check for duplication/cannibalization via vector similarity
10. Research top candidates (web search for viability, resource links)
11. Produce a prioritized content plan with scheduling recommendations
12. Write plan items to the database → SUSPEND for human review

Outputs:
```
{
  planItems: [{
    title: string,
    targetKeyword: string,
    contentType: "blog_post" | "guide" | "landing_page" | "comparison" | "case_study",
    rationale: string, // Why this piece, why now
    competitiveContext: string, // What competitors are doing
    suggestedScheduleDate: Date,
    priority: 1 | 2 | 3,
    estimatedImpact: string,
    researchLinks: string[], // Pre-found resources to link/reference
    internalLinkTargets: string[], // Our existing pages to link to/from
  }],
  strategyNotes: string, // Agent's high-level strategic thinking
  calendarSummary: string, // Overview of the proposed schedule
}
```

Human Interaction Point: After the agent generates the plan, the workflow suspends (Mastra `.waitForEvent()`). The human reviews the proposed calendar in the app UI and can approve, reject, or edit items, add their own items, and reorder priorities. When they click "Approve Plan," the workflow resumes and the approved items are queued for execution.
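The scoring step in the planner's workflow can be made concrete as a weighted formula. This is a sketch: the weights, the normalization cap, and the volume × 1/difficulty opportunity term are illustrative assumptions to tune, not values fixed by this design:

```typescript
interface Candidate {
  strategicAlignment: number;  // 0-1: does it serve our pillars?
  searchVolume: number;        // monthly searches
  difficulty: number;          // 1-100 keyword difficulty
  competitiveUrgency: number;  // 0-1: are competitors gaining ground?
  gapSeverity: number;         // 0-1: 1 = zero coverage on our side
  geoPotential: number;        // 0-1: likelihood of AI-search citation
}

// Weighted priority score; higher means schedule sooner.
export function priorityScore(c: Candidate): number {
  const searchOpportunity = c.searchVolume / Math.max(c.difficulty, 1);
  return (
    0.30 * c.strategicAlignment +
    0.25 * Math.min(searchOpportunity / 100, 1) + // normalize and cap at 1
    0.20 * c.competitiveUrgency +
    0.15 * c.gapSeverity +
    0.10 * c.geoPotential
  );
}
```

Making the score deterministic (rather than asking the LLM to rank freehand) keeps prioritization auditable; the agent supplies the component estimates, the code computes the ordering.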
3.4 Content Brief Agent
```
ID: content-brief-agent
Model: anthropic/claude-sonnet-4-20250514
```

Role: For each approved content plan item, generates a detailed content brief that the Writer Agent will execute against.
Inputs:
- Approved content plan item (from calendar)
- Competitor content analysis (what top-ranking pages cover)
- Keyword data (primary, secondary, related terms)
- Our existing content (for internal linking opportunities)
- Brand voice guidelines
Tools:
- `query_serp_snapshots` — Gets the current SERP landscape for the target keyword
- `scrape_competitor_page` — Extracts structure and content from top-ranking competitor pages
- `query_our_content_inventory` — Finds internal linking opportunities
- `web_search` — Finds additional resources, data sources, expert quotes
- `vector_similarity_search` — Ensures the brief doesn't duplicate existing content
Outputs (structured):
```
{
  title: string,
  targetKeyword: string,
  secondaryKeywords: string[],
  searchIntent: string,
  targetWordCount: number,
  contentFormat: string, // "How-to guide", "Listicle", "Deep dive", etc.
  outline: [{
    heading: string,
    level: "h2" | "h3",
    keyPoints: string[],
    targetKeywords: string[], // Keywords to naturally include in this section
    suggestedWordCount: number,
  }],
  competitorAnalysis: string, // What the top 5 do well, where they fall short
  differentiators: string[], // Our unique angles
  internalLinkTargets: [{
    url: string,
    anchorTextSuggestion: string,
    contextNote: string,
  }],
  externalResources: [{
    url: string,
    description: string,
    useCase: string, // "Cite as source", "Link for reader", "Reference for accuracy"
  }],
  toneAndStyle: string,
  audienceNotes: string,
  seoRequirements: {
    metaTitleGuideline: string,
    metaDescriptionGuideline: string,
    schemaType: string,
    featuredSnippetTarget: boolean,
  },
}
```

4. Content Layer — Agents & Workflows
This layer corresponds to the middle section of your flow diagram. It takes an approved content brief and produces a polished draft.
4.1 Writer Agent
```
ID: writer-agent
Model: anthropic/claude-sonnet-4-20250514
Instructions: Dynamic — loads brand voice guidelines + brief-specific tone directives
```

Role: Receives a content brief and produces a complete first draft that follows the outline, hits the word-count targets, incorporates keywords naturally, includes internal and external links, and matches the brand voice.
Key Prompting Strategy:
The Writer Agent's system prompt is assembled dynamically for each piece:
```
Base Instructions (static):
  - Writing quality standards
  - Formatting rules (heading hierarchy, paragraph length, etc.)
  - Link insertion patterns
  - Keyword integration rules (natural, not stuffed)

+ Brand Voice Guidelines (from content_strategy table):
  - Tone descriptors
  - Sample passages
  - Vocabulary preferences / words to avoid

+ Brief-Specific Directives (from content_briefs table):
  - The full outline with section-level instructions
  - Target keywords per section
  - Differentiators to emphasize
  - Internal/external links to include
```

Tools:
- `web_search` — For real-time fact verification during writing
- `query_our_content` — To check consistency with existing published content
- `vector_similarity_search` — To find additional internal linking opportunities during writing
Output: Full markdown content body with frontmatter metadata, internal links, external links, and image placement markers ([IMAGE: description of needed image]).
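The dynamic prompt assembly described above amounts to ordered string concatenation over the three sources. A minimal sketch (the interface fields mirror the Section 2.4 tables; the section headers inside the prompt are illustrative):

```typescript
interface PromptParts {
  baseInstructions: string;      // static writing quality and formatting rules
  brandVoiceGuidelines: string;  // from the content_strategy table
  voiceSamples: string[];        // from the brand_voice_samples table
  briefDirectives: string;       // outline + per-section keywords from content_briefs
}

// Assemble the Writer Agent's system prompt for one piece of content.
export function assembleWriterPrompt(p: PromptParts): string {
  return [
    p.baseInstructions,
    "## Brand Voice Guidelines",
    p.brandVoiceGuidelines,
    ...p.voiceSamples.map((s, i) => `Sample ${i + 1}:\n${s}`),
    "## Brief-Specific Directives",
    p.briefDirectives,
  ].join("\n\n");
}
```

Keeping assembly in code (rather than asking the model to fetch its own instructions) makes each run reproducible from the database rows it was built from.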
4.2 Editor Agent
```
ID: editor-agent
Model: anthropic/claude-sonnet-4-20250514
Instructions: Focused on language correctness, verbal consistency, and brand voice adherence
```

Role: Reviews the Writer Agent's draft for language correctness, verbal consistency, factual accuracy, brand voice adherence, and structural quality. Does NOT rewrite — provides specific edits and corrections.
Checks performed:
| Check Category | What It Evaluates |
|---|---|
| Language Correctness | Grammar, spelling, punctuation, sentence structure |
| Verbal Consistency | Consistent terminology (don't switch between "users" and "customers" randomly), consistent formatting of terms, consistent voice (active vs. passive) |
| Brand Voice Adherence | Compares against brand voice samples. Flags sections that drift from established tone |
| Factual Consistency | Cross-references claims with the brief's source material. Flags unsubstantiated statistics |
| Structural Quality | Heading hierarchy compliance, section length balance, transition quality, intro/conclusion effectiveness |
| Link Quality | Verifies all internal links point to real pages, checks anchor text naturalness, validates external link relevance |
| Keyword Integration | Confirms target keywords appear in required positions (title, H1, first paragraph, H2s) without over-optimization |
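The keyword-integration check lends itself to a deterministic helper the Editor Agent can call rather than estimate. A sketch, using the 0.5-2.5% density window from the SEO validation rules in Section 5.4 (the density formula here is one common convention, an assumption rather than a standard):

```typescript
// Keyword density as a percentage of total words (phrase matches count each word).
export function keywordDensity(body: string, keyword: string): number {
  const words = body.toLowerCase().split(/\s+/).filter(Boolean);
  const kw = keyword.toLowerCase().split(/\s+/);
  let hits = 0;
  for (let i = 0; i + kw.length <= words.length; i++) {
    if (kw.every((w, j) => words[i + j] === w)) hits++;
  }
  return words.length === 0 ? 0 : (hits * kw.length / words.length) * 100;
}

// Pass/fail against the over-optimization window.
export function keywordIntegrationOk(body: string, keyword: string): boolean {
  const density = keywordDensity(body, keyword);
  return density >= 0.5 && density <= 2.5;
}
```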
Output:
```
{
  overallAssessment: "pass" | "needs_revision",
  editsList: [{
    location: string, // Section/paragraph identifier
    type: "grammar" | "voice" | "factual" | "structural" | "keyword" | "link",
    severity: "critical" | "suggested",
    original: string,
    suggested: string,
    rationale: string,
  }],
  voiceConsistencyScore: number, // 0-100
  readabilityScore: number, // Flesch-Kincaid or similar
  revisedContent: string, // Full draft with all edits applied
}
```

Workflow Logic: If overallAssessment === "needs_revision" and critical edits exist, the revised content loops back through the Writer Agent with the edit list as context. Maximum 2 revision cycles, then force-proceed to human review.
4.3 Content Layer Workflow (Mastra)
```typescript
const contentLayerWorkflow = createWorkflow({
  id: "content-layer-pipeline",
  inputSchema: z.object({ briefId: z.string() }),
  outputSchema: z.object({ draftId: z.string(), seoScore: z.number() }),
})
  .then(loadBriefStep)      // Load the approved content brief
  .then(writerAgentStep)    // Writer produces first draft
  .then(editorAgentStep)    // Editor reviews and revises
  .branch({                 // Conditional: needs another pass?
    condition: ({ editorOutput }) =>
      editorOutput.overallAssessment === "needs_revision"
      && editorOutput.revisionCount < 2,
    trueStep: writerRevisionStep,  // Loop back to writer with edits
    falseStep: finalizeDraftStep,  // Proceed to final draft
  })
  .then(saveDraftStep)      // Persist final draft to database
  .commit();
```

5. Production Layer — Agents, Checks & Publishing
This layer corresponds to the bottom section of your flow diagram. It takes the polished draft through human review, image generation, final edits, programmatic SEO validation, and publishing.
5.1 Human Review (Suspend/Resume)
This is the critical quality gate.
When the Content Layer produces a final draft:
- Draft is saved to `content_drafts` with status `awaiting_review`
- Notification sent (Slack, email, or in-app)
- Workflow suspends via `.waitForEvent("human-review-complete")`
- Human opens the draft in the app UI
- Human can:
  - Approve — proceed as-is
  - Edit — make changes directly, then approve
  - Reject — send back to the Content Layer with notes (triggers a new Writer Agent run with human feedback as context)
  - Add personal experience/insights — the critical 20% from the 80/20 method
- On approval, the workflow resumes with the human-edited content
UI Requirements:
- Side-by-side view: AI draft vs. content brief
- Inline editing with change tracking
- Comment/annotation capability
- SEO score preview (live-updating as human edits)
- One-click approve/reject buttons
5.2 Image Generation Agent
```
ID: image-generation-agent
Model: Image generation API (DALL-E 3, Midjourney API, or Flux)
```

Role: Processes the [IMAGE: description] markers in the content and generates appropriate images.
Workflow:
- Parse all image markers from the approved content
- For each marker, generate a detailed image prompt based on the description and content context
- Generate the image via API
- Optimize the image (compress, resize to target dimensions)
- Generate SEO-optimized alt text
- Upload to CDN / media library
- Replace markers in content with proper `<img>` tags including alt text, width/height, and `loading="lazy"`
Output: Updated content body with all image markers replaced by actual image references + alt text.
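The marker parsing and replacement steps can be sketched with a single regex over the draft. The `[IMAGE: description]` syntax comes from the Writer Agent's output format; the asset map standing in for the generated-image lookup is a simplification of the CDN upload step:

```typescript
const IMAGE_MARKER = /\[IMAGE:\s*([^\]]+)\]/g;

// Extract every image marker description from a draft.
export function extractImageMarkers(content: string): string[] {
  return [...content.matchAll(IMAGE_MARKER)].map((m) => m[1].trim());
}

// Replace each marker with an <img> tag, given generated assets keyed by description.
export function replaceImageMarkers(
  content: string,
  assets: Map<string, { url: string; alt: string; width: number; height: number }>,
): string {
  return content.replace(IMAGE_MARKER, (match, desc: string) => {
    const asset = assets.get(desc.trim());
    if (!asset) return match; // leave unresolved markers for the Final Edits Agent to flag
    return `<img src="${asset.url}" alt="${asset.alt}" width="${asset.width}" height="${asset.height}" loading="lazy" />`;
  });
}
```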
5.3 Final Edits Agent
```
ID: final-edits-agent
Model: anthropic/claude-sonnet-4-20250514
```

Role: One last pass after human edits and image insertion. Ensures human edits didn't break formatting, images are properly placed, links still work, and the content is publication-ready.
Checks:
- Markdown/HTML formatting validity
- Image placement and alt text quality
- Link integrity (no broken internal links)
- Consistent formatting after human edits
- Final readability pass
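The link-integrity check above can be done deterministically against the `our_pages` inventory; a sketch assuming markdown-style links and a published-path set loaded from the database (the `siteOrigin` default is a placeholder, not a real domain):

```typescript
const MD_LINK = /\[[^\]]*\]\(([^)]+)\)/g;

// Return internal link targets in the draft that don't resolve to a published page.
export function findBrokenInternalLinks(
  content: string,
  publishedPaths: Set<string>,
  siteOrigin = "https://example.com", // assumption: configured per site
): string[] {
  const broken: string[] = [];
  for (const match of content.matchAll(MD_LINK)) {
    const href = match[1];
    const isInternal = href.startsWith("/") || href.startsWith(siteOrigin);
    if (!isInternal) continue; // external links are checked separately
    const path = href.startsWith("/") ? href : new URL(href).pathname;
    if (!publishedPaths.has(path)) broken.push(href);
  }
  return broken;
}
```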
5.4 Programmatic SEO Validation (Must Score 10/10)
This is not an LLM agent — it's a deterministic, code-based validation engine. Every check has a binary pass/fail result. All 10 must pass.
```typescript
const seoChecks = {
  1: {
    name: "Meta Title",
    check: (content) => {
      // Length: 50-60 characters
      // Contains primary keyword
      // Unique (not used by any other page)
      // No truncation risk
    }
  },
  2: {
    name: "Meta Description",
    check: (content) => {
      // Length: 150-160 characters
      // Contains primary keyword
      // Includes call-to-action or value proposition
      // Unique across site
    }
  },
  3: {
    name: "Heading Hierarchy",
    check: (content) => {
      // Exactly one H1
      // H1 contains primary keyword
      // H2s use secondary keywords
      // No skipped levels (H1 → H3 without H2)
      // Logical nesting
    }
  },
  4: {
    name: "Keyword Optimization",
    check: (content) => {
      // Primary keyword in: title, H1, first 100 words, at least one H2, meta description
      // Keyword density: 0.5% - 2.5% (not over-optimized)
      // Secondary keywords present naturally
      // No keyword stuffing patterns detected
    }
  },
  5: {
    name: "Internal Linking",
    check: (content) => {
      // Minimum 3 internal links
      // All internal links point to valid, published pages
      // Anchor text is descriptive (no "click here")
      // Anchor text is diversified (not all exact-match keyword)
      // Links are contextually relevant
    }
  },
  6: {
    name: "External Linking",
    check: (content) => {
      // At least 1 external link to authoritative source
      // External links use rel="noopener" on new-tab links
      // No links to competitor domains (configurable blocklist)
      // External links are contextually relevant
    }
  },
  7: {
    name: "Content Quality Metrics",
    check: (content) => {
      // Word count meets target (within 10% of brief target)
      // Readability score within acceptable range (configurable)
      // No duplicate content detected (cosine similarity < threshold vs. existing pages)
      // Paragraph length: no paragraphs over 300 words
      // Sentence variety: mix of short and long sentences
    }
  },
  8: {
    name: "Technical SEO",
    check: (content) => {
      // Valid schema markup (JSON-LD) present and parseable
      // Canonical URL set correctly
      // Open Graph tags present (og:title, og:description, og:image)
      // Twitter Card tags present
      // Image alt text on all images
      // Image dimensions specified
    }
  },
  9: {
    name: "URL & Slug",
    check: (content) => {
      // Slug is URL-friendly (lowercase, hyphens, no special chars)
      // Slug contains primary keyword or close variant
      // Slug length: under 60 characters
      // No duplicate slug in our pages database
    }
  },
  10: {
    name: "Mobile & Performance",
    check: (content) => {
      // All images have width/height (prevents CLS)
      // Images use lazy loading
      // No inline styles that break mobile
      // Table responsiveness handled
      // No excessively large embedded content
    }
  }
};
```

Validation Flow:
- Run all 10 checks against the content
- If all pass → proceed to publish
- If any fail → generate a fix report, send back to Final Edits Agent with specific failure details, re-validate after fixes
- Maximum 3 fix cycles, then escalate to human with the failure report
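As a concrete instance, check 1 (Meta Title) plus the score aggregation might look like this. It is a sketch: the uniqueness test against other pages is stubbed as a caller-supplied set rather than a database query:

```typescript
interface CheckResult {
  id: number;
  name: string;
  passed: boolean;
  failureReason?: string;
}

// Check 1: Meta Title — deterministic, binary pass/fail.
export function checkMetaTitle(
  metaTitle: string,
  primaryKeyword: string,
  existingTitles: Set<string>, // meta titles already used on other pages
): CheckResult {
  const reasons: string[] = [];
  if (metaTitle.length < 50 || metaTitle.length > 60) reasons.push("length outside 50-60 chars");
  if (!metaTitle.toLowerCase().includes(primaryKeyword.toLowerCase())) reasons.push("missing primary keyword");
  if (existingTitles.has(metaTitle)) reasons.push("duplicate of another page's title");
  return { id: 1, name: "Meta Title", passed: reasons.length === 0, failureReason: reasons.join("; ") || undefined };
}

// Aggregate results into the "N/10" score; publishing requires every check to pass.
export function scoreChecks(results: CheckResult[]): { score: string; passed: boolean } {
  const passing = results.filter((r) => r.passed).length;
  return { score: `${passing}/${results.length}`, passed: passing === results.length };
}
```

Because each check returns a `failureReason`, the fix report sent back to the Final Edits Agent falls out of the results for free.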
Output:
```
{
  score: "10/10" | "9/10" | etc,
  passed: boolean,
  checks: [{
    id: number,
    name: string,
    passed: boolean,
    details: string, // What was checked
    failureReason?: string, // Why it failed (if applicable)
    autoFixable: boolean, // Can the agent fix this automatically?
  }],
}
```

5.5 Publishing Agent
```
ID: publishing-agent
Model: N/A (primarily tool-driven, minimal LLM reasoning)
```

Role: Takes the validated, 10/10 content and publishes it to the CMS.
Actions:
- Format content for CMS (markdown → CMS content blocks, or HTML)
- Upload images to CMS media library (if not already CDN-hosted)
- Set all metadata (title, description, slug, canonical, schema, OG tags)
- Set publication date (from content plan schedule)
- Create/update XML sitemap entry
- Ping Google Indexing API / IndexNow for fast crawling
- Update our content database (`our_pages`, `our_page_seo`)
- Generate content embeddings and add them to the vector index
- Run bidirectional internal linking (find places in existing content to link to the new page)
- Schedule social media distribution (if configured)
- Log publish event to audit trail
Post-Publish Monitoring:
- 24-hour check: Verify page is indexed (Google Search Console)
- 7-day check: Initial ranking data and impressions
- 30-day check: Performance review against projected targets
- Auto-flag underperforming content for refresh consideration
6. Master Workflow Orchestration
The entire system is orchestrated as a Mastra workflow that chains the three layers:
```
┌─────────────────────────────────────────────────────────────────┐
│                      SCHEDULED TRIGGERS                         │
│                                                                 │
│  Daily: Competitor crawl, SERP monitoring, rank tracking        │
│  Weekly: Full competitive analysis, search landscape report     │
│  On-demand: Human triggers plan generation                      │
│  On-schedule: Content calendar items trigger execution          │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                   STRATEGY LAYER WORKFLOW                       │
│                                                                 │
│  1. Competitive Intelligence Agent (async, data collection)     │
│  2. Search Landscape Agent (async, data collection)             │
│  3. Content Strategy Agent (synthesis + plan generation)        │
│  4. ── SUSPEND ── Human reviews/edits content plan ── RESUME ── │
│  5. Content Brief Agent (generates brief per approved item)     │
│  6. ── SUSPEND ── Human approves brief ── RESUME ──             │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                    CONTENT LAYER WORKFLOW                       │
│                                                                 │
│  7. Writer Agent (produces first draft from brief)              │
│  8. Editor Agent (reviews, revises)                             │
│  9. ── LOOP ── if needs_revision && cycles < 2 → back to 7 ──   │
│  10. Final draft saved to database                              │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                  PRODUCTION LAYER WORKFLOW                      │
│                                                                 │
│  11. ── SUSPEND ── Human reviews draft (edits, adds             │
│        experience, approves) ── RESUME ──                       │
│  12. Image Generation Agent                                     │
│  13. Final Edits Agent (post-human cleanup)                     │
│  14. Programmatic SEO Validation (10/10 required)               │
│  15. ── LOOP ── if score < 10/10 → auto-fix → revalidate ──     │
│  16. Publishing Agent (push to CMS, index, distribute)          │
│  17. Post-publish monitoring scheduled                          │
└─────────────────────────────────────────────────────────────────┘
```

Suspension Points (Human Touchpoints)
| # | Where | What Human Does | Estimated Time |
|---|---|---|---|
| 1 | After content plan generation | Review calendar, approve/reject/edit items, add own items | 15-30 min per planning cycle |
| 2 | After content brief generation | Review brief, confirm outline and direction | 5-10 min per brief |
| 3 | After final draft produced | Deep review, add personal insights, edit for voice, approve | 10-20 min per piece |
Total human time per content piece: ~30-60 minutes (compared to 4-8 hours for fully manual content creation).
7. Application UI Requirements
The system needs a web application (likely Next.js given Mastra's TypeScript ecosystem) with these core views:
7.1 Dashboard
- Content pipeline status (items in each stage)
- Today's scheduled publications
- Competitor change alerts (last 24h)
- Ranking changes (significant movers)
- AI Overview citation tracking
- Content performance summary
7.2 Competitor Monitor
- Competitor list with domain, page count, last crawled
- New/changed page feed (chronological)
- Per-competitor content analysis (topics, volume, quality)
- Side-by-side content comparison (their page vs. ours on same topic)
7.3 Keyword & Search Performance
- Keyword tracker with rankings over time
- SERP feature tracking (featured snippets, AI Overviews)
- Search volume trends
- Keyword cluster view
- GSC data integration (impressions, clicks, CTR)
7.4 Content Calendar
- Calendar view of planned/scheduled content
- Drag-and-drop rescheduling
- Status indicators (planned → in_progress → review → published)
- AI-generated items vs. human-added items (visually distinguished)
- One-click to view brief, draft, or published piece
- "Generate Plan" button that triggers the Strategy Agent
7.5 Content Editor / Review Interface
- Full content preview
- Side-by-side: brief vs. draft
- Inline editing with change tracking
- SEO score panel (live-updating)
- Comment/annotation system
- Approve / Request Changes / Reject buttons
- Image preview and alt text editing
7.6 SEO Audit View
- 10-point SEO check results for each piece
- Historical SEO scores across all content
- Site-wide SEO health metrics
- Internal linking map visualization
- Technical SEO issue tracker
7.7 Strategy Settings
- Brand voice configuration (samples, tone descriptors)
- Content pillars and priorities
- Competitor list management
- Target keyword management
- Publishing workflow configuration (which checks are required, auto-publish vs. manual)
8. Technology Stack
| Component | Technology | Rationale |
|---|---|---|
| Agent Framework | Mastra (@mastra/core) | TypeScript-native, workflow orchestration, RAG, tools, suspend/resume |
| Runtime | Node.js 20+ / Bun | Mastra's supported runtimes |
| Web Framework | Next.js 15+ | Mastra ecosystem alignment, SSR, API routes |
| Database | PostgreSQL + pgvector | Relational data + vector embeddings in one database |
| ORM | Drizzle ORM | TypeScript-native, great Postgres support |
| Vector Search | pgvector (via Mastra RAG) | Unified with primary database, no separate vector DB needed |
| LLM Provider | Anthropic Claude Sonnet 4 (primary) | Quality + cost balance for content generation |
| Image Generation | DALL-E 3 or Flux API | High quality, API-accessible |
| SEO Data API | Semrush or Ahrefs API | Keyword data, SERP snapshots, competitor data |
| Search Console | Google Search Console API | Our ranking/performance data |
| CMS Integration | WordPress REST API, Sanity, or Contentful | Depends on existing CMS — tool adapter pattern |
| Job Scheduling | Trigger.dev or Mastra cron | Durable execution for scheduled jobs |
| Notifications | Slack API + email | Human review notifications |
| Hosting | Vercel (app) + Railway/Render (workers) | Serverless for app, persistent processes for agents |
| Monitoring | Mastra Studio + custom observability | Agent tracing, workflow debugging |
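The pgvector choice means similarity search runs inside Postgres (its `<=>` operator is cosine distance), so no separate vector store is needed. As a conceptual sketch only, the ranking it performs looks like this in plain TypeScript; in the real system this is a `SELECT ... ORDER BY embedding <=> query LIMIT k`, not application code:

```typescript
// Cosine distance, as pgvector's <=> operator computes it.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored embeddings against a query embedding and return the
// k nearest row ids, mirroring the SQL nearest-neighbour query.
function nearest(
  query: number[],
  rows: { id: number; embedding: number[] }[],
  k: number,
): number[] {
  return [...rows]
    .sort((x, y) => cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding))
    .slice(0, k)
    .map((r) => r.id);
}
```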
9. Build Phases
Phase 1: Foundation (Weeks 1-3)
- PostgreSQL schema setup (all tables defined in Section 2)
- Mastra project scaffolding with agent/tool/workflow structure
- Basic Next.js app shell with authentication
- Google Search Console API integration (tool)
- Semrush/Ahrefs API integration (tool)
- Competitor sitemap crawler (scheduled job)
- Competitor page scraper (scheduled job)
- Our content inventory ingestion + embedding pipeline
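The page scraper in Phase 1 should only queue pages for re-analysis when their content actually changed, which is what the `content_hash` field in `competitor_pages` (Section 2.1) supports. A minimal sketch, assuming SHA-256 over whitespace-normalized content (the normalization rules are an assumption, not a spec decision):

```typescript
import { createHash } from "node:crypto";

// Normalize before hashing so trivial whitespace or casing changes
// don't register as content updates.
function contentHash(html: string): string {
  const normalized = html.replace(/\s+/g, " ").trim().toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}

// Compare a freshly scraped page against the stored hash; only pages
// that changed (or are new, storedHash === null) get queued for the
// weekly content analysis pass.
function hasChanged(storedHash: string | null, freshHtml: string): boolean {
  return storedHash !== contentHash(freshHtml);
}
```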
Phase 2: Strategy Layer (Weeks 4-6)
- Competitive Intelligence Agent
- Search Landscape Agent
- Content Strategy Agent (plan generation)
- Content Brief Agent
- Content Calendar UI (view, edit, approve)
- Suspend/resume workflow for plan approval
- Dashboard v1 (basic metrics)
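The plan-approval step in Phase 2 hinges on Mastra's suspend/resume: the workflow pauses at the human gate and resumes when the calendar UI submits a decision. As a framework-free stand-in (Mastra's own `suspend()`/`resume()` primitives would replace this in practice), the gate reduces to a small state machine:

```typescript
type PlanStatus = "draft" | "awaiting_review" | "approved" | "rejected";

interface PlanState {
  status: PlanStatus;
  feedback?: string;
}

// "Suspend": persist the plan as awaiting_review and stop executing.
function submitForReview(state: PlanState): PlanState {
  if (state.status !== "draft") throw new Error("only drafts can be submitted");
  return { status: "awaiting_review" };
}

// "Resume": a human decision arrives from the Content Calendar UI.
// Rejections carry feedback back to the Content Strategy Agent.
function resolveReview(state: PlanState, approve: boolean, feedback?: string): PlanState {
  if (state.status !== "awaiting_review") throw new Error("plan is not awaiting review");
  return approve ? { status: "approved" } : { status: "rejected", feedback };
}
```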
Phase 3: Content Layer (Weeks 7-9)
- Writer Agent with dynamic prompt assembly
- Editor Agent with revision loop
- Content Layer workflow with branching
- Content Editor / Review UI
- Brand voice configuration UI
- Suspend/resume workflow for human review
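The Writer/Editor revision loop in Phase 3 needs a cap so a piece that never satisfies the Editor escalates to human review instead of looping forever. A sketch of that control flow (both callbacks would be async agent invocations in the real pipeline; they are synchronous here to keep the example self-contained):

```typescript
interface Review {
  approved: boolean;
  notes: string; // Editor feedback fed back into the Writer prompt
}

// Bounded revision loop: the Editor reviews each draft and the Writer
// revises with the Editor's notes until approval or the cap is hit,
// after which the piece escalates to human review.
function reviseUntilApproved(
  write: (notes?: string) => string,
  review: (draft: string) => Review,
  maxRevisions = 3,
): { draft: string; escalated: boolean } {
  let draft = write();
  for (let i = 0; i < maxRevisions; i++) {
    const verdict = review(draft);
    if (verdict.approved) return { draft, escalated: false };
    draft = write(verdict.notes);
  }
  return { draft, escalated: true };
}
```

The default cap of 3 is an assumption; the right value is a cost/quality trade-off to tune in Phase 6.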
Phase 4: Production Layer (Weeks 10-12)
- Image Generation Agent
- Final Edits Agent
- Programmatic SEO Validation engine (10 checks)
- Publishing Agent (CMS integration)
- Post-publish monitoring jobs
- SEO Audit View UI
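The Programmatic SEO Validation engine is a hard gate: a piece publishes only when every check passes (10/10 in the full system). The structure is a list of named predicates evaluated against the piece; the five checks and thresholds below are illustrative assumptions, not the finalized rule set:

```typescript
interface Piece {
  title: string;
  metaDescription: string;
  wordCount: number;
  h1Count: number;
  internalLinks: number;
}

interface SeoCheck {
  name: string;
  passes: (p: Piece) => boolean;
}

// Illustrative subset of the 10-point gate.
const checks: SeoCheck[] = [
  { name: "title length", passes: (p) => p.title.length >= 30 && p.title.length <= 60 },
  { name: "meta description length", passes: (p) => p.metaDescription.length >= 120 && p.metaDescription.length <= 160 },
  { name: "single H1", passes: (p) => p.h1Count === 1 },
  { name: "minimum depth", passes: (p) => p.wordCount >= 800 },
  { name: "internal links", passes: (p) => p.internalLinks >= 3 },
];

// All-or-nothing: failures are returned by name so the SEO Audit View
// can show exactly which checks blocked publishing.
function validateSeo(piece: Piece): { passed: boolean; failures: string[] } {
  const failures = checks.filter((c) => !c.passes(piece)).map((c) => c.name);
  return { passed: failures.length === 0, failures };
}
```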
Phase 5: Polish & Automation (Weeks 13-15)
- Full end-to-end workflow testing
- Internal linking automation (bidirectional)
- AI Overview monitoring
- Competitor change alert system
- Performance dashboards
- Prompt optimization based on output quality data
- Documentation and runbooks
Phase 6: Optimization (Ongoing)
- Mastra eval framework integration (measure content quality over time)
- A/B test different agent prompts and models
- Cost optimization (model routing based on task complexity)
- Scale testing (50+ pieces/month throughput)
- Content refresh pipeline (automated identification and updating of stale content)
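The cost-optimization item above (model routing by task complexity) amounts to a lookup from task class to model tier. A sketch, with placeholder model identifiers and an assumed three-tier classification:

```typescript
type Complexity = "simple" | "standard" | "complex";

// Placeholder model names; the real mapping would use exact API model
// identifiers and be tuned from the Phase 6 eval data.
const routes: Record<Complexity, string> = {
  simple: "claude-haiku",    // e.g. alt text, slugs, metadata
  standard: "claude-sonnet", // drafting, editing
  complex: "claude-opus",    // strategy synthesis, plan generation
};

function routeModel(task: { complexity: Complexity }): string {
  return routes[task.complexity];
}
```

Routing mechanical subtasks to a cheaper tier is what keeps the ~$4-8 per-piece LLM figure in Section 10 plausible as throughput scales.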
10. Cost Estimation (Monthly, at Scale)
| Cost Category | Estimate (50 pieces/month) | Notes |
|---|---|---|
| LLM API (Claude Sonnet) | $200-400 | ~$4-8 per piece across all agents |
| SEO Data API (Semrush) | $119-229 | Business plan for API access |
| Image Generation | $50-100 | ~$1-2 per piece for 2-3 images each |
| Hosting (Vercel + Railway) | $50-100 | App + background workers |
| PostgreSQL (managed) | $25-50 | Neon, Supabase, or Railway Postgres |
| Google Search Console | Free | API access included |
| Total | ~$450-880/month | Sum of the line items above |
At 50 pieces/month, this compares favorably to manual content production at $200-500+ per piece ($10,000-25,000+/month) at a comparable quality level.
11. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LLM output quality inconsistency | High | Medium | Multi-agent review pipeline + human gate + programmatic checks |
| Google algorithm update targeting AI content | Medium | High | 80/20 human-AI method ensures genuine expertise in every piece |
| API cost overruns at scale | Medium | Medium | Token budget per piece, model routing (cheaper models for simple tasks), caching |
| Competitor data accuracy (scraping failures) | Medium | Low | Fallback to API data, alerting on crawl failures, manual override |
| Hallucination in published content | Medium | High | Fact-check agent + human review + RAG grounding + source citation requirements |
| Over-optimization (content feels robotic) | Medium | Medium | Brand voice samples, editor agent voice consistency checks, human tone review |
| Workflow complexity / debugging difficulty | Medium | Medium | Mastra Studio tracing, comprehensive logging, workflow versioning |
| CMS integration fragility | Low | Medium | Adapter pattern — swap CMS connectors without changing pipeline logic |
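The CMS-fragility mitigation is the adapter pattern: the Publishing Agent depends only on a small interface, so swapping WordPress for Sanity or Contentful means writing a new adapter, not touching pipeline logic. A minimal sketch (the stub below fakes the publish call; a real adapter would call the WordPress REST API asynchronously):

```typescript
interface CmsAdapter {
  name: string;
  // Kept synchronous here so the sketch is self-contained; real
  // adapters would return a Promise from an HTTP call.
  publish(piece: { title: string; body: string }): { url: string };
}

// Illustrative stub standing in for a WordPress REST API client.
const wordpressAdapter: CmsAdapter = {
  name: "wordpress",
  publish: (piece) => ({
    url: `/blog/${piece.title.toLowerCase().replace(/\s+/g, "-")}`,
  }),
};

// The pipeline only ever sees CmsAdapter, never a concrete CMS client.
function publishPiece(cms: CmsAdapter, piece: { title: string; body: string }): string {
  return cms.publish(piece).url;
}
```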
This document is a living specification. It should be updated as architectural decisions are made during implementation and as the system evolves through testing and production use.