SimpleNews Agent System

A multi-agent pipeline for autonomous AI news research, writing, and publication.

Overview
System Architecture
Agent Pipeline
Research Agent
Writer Agent
Enrichment Agent
Publishing Pipeline
Data Models
External Integrations
Cost Optimization
Quality Assurance

Overview

SimpleNews employs a multi-agent orchestration system that autonomously discovers newsworthy AI topics, generates original articles, and publishes them with full editorial quality controls. The system leverages Claude's Agent SDK for intelligent research, structured output generation for consistent article quality, and semantic deduplication to prevent content overlap.

Key Capabilities

Capability	Implementation
Autonomous Research	Claude Agent SDK with 14 MCP tools
Multi-Platform Discovery	X, HN, Reddit, GitHub, arXiv
Article Generation	Claude Sonnet with structured output
Link Enrichment	Server-side web search integration
Semantic Deduplication	pgvector with 90% similarity threshold
ISR Publishing	Next.js incremental static regeneration

System Architecture

The system follows a sequential pipeline architecture with autonomous decision-making at the research phase.

Directory Structure

agent/
├── src/
│   ├── run.ts              # Main orchestration pipeline
│   ├── research-agent.ts   # Claude Agent SDK integration
│   ├── writer.ts           # Article generation
│   ├── enrichment-agent.ts # Web search enrichment
│   ├── publisher.ts        # Supabase publishing
│   ├── deduplication.ts    # Semantic duplicate check
│   ├── embeddings.ts       # OpenAI embedding calls
│   ├── scheduler.ts        # Interval-based scheduling
│   └── tools/              # MCP tool implementations
│       ├── x-search.ts     # X/Twitter API
│       ├── hackernews.ts   # HN search
│       ├── reddit.ts       # Reddit API
│       ├── arxiv.ts        # arXiv papers
│       └── github-trending.ts

Agent Pipeline

The complete pipeline executes in 6 distinct phases, with comprehensive error handling and cost tracking at each stage.
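
As a rough illustration of sequential phases with per-stage error handling and cost tracking, the sketch below runs a list of phases in order, adds up the cost each one reports, and stops at the first failure. The `Phase` shape and `runPipeline` are hypothetical names, not taken from `run.ts`, and the sketch is synchronous for brevity where a real pipeline would most likely be async.

```typescript
// Hypothetical phase shape: each phase receives the previous output
// and reports how much it spent.
interface PhaseResult {
  output: unknown
  costUsd: number
}

interface Phase {
  name: string
  run: (input: unknown) => PhaseResult
}

interface PipelineReport {
  completed: string[]
  failedAt?: string
  error?: string
  totalCostUsd: number
  output: unknown
}

function runPipeline(phases: Phase[], initial: unknown): PipelineReport {
  const report: PipelineReport = { completed: [], totalCostUsd: 0, output: initial }
  for (const phase of phases) {
    try {
      const result = phase.run(report.output)
      report.output = result.output
      report.totalCostUsd += result.costUsd
      report.completed.push(phase.name)
    } catch (err) {
      // Record where the run stopped and keep the cost spent so far.
      report.failedAt = phase.name
      report.error = err instanceof Error ? err.message : String(err)
      break
    }
  }
  return report
}
```

Returning a report instead of throwing keeps the accumulated cost visible even when a phase fails partway through a run.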

Research Agent

The Research Agent is the system's autonomous discovery engine, powered by Claude's Agent SDK with Model Context Protocol (MCP) integration.

Architecture

Property	Value
Model	`claude-sonnet-4-5-20250929`
Max Turns	40
Budget	$1.50 per run
Output	Structured JSON with validation
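
The turn and budget limits in the table can be read as two stop conditions checked after every agent turn. The sketch below shows one way to track them. `RunState`, `recordTurn`, and `stopReason` are illustrative names, and checking the budget before the turn count is an assumption; the Agent SDK may enforce these limits differently.

```typescript
interface RunLimits {
  maxTurns: number
  budgetUsd: number
}

interface RunState {
  turns: number
  spentUsd: number
}

// Values from the table above.
const RESEARCH_LIMITS: RunLimits = { maxTurns: 40, budgetUsd: 1.5 }

// Returns a new state with one more turn and its cost added.
function recordTurn(state: RunState, costUsd: number): RunState {
  return { turns: state.turns + 1, spentUsd: state.spentUsd + costUsd }
}

// Returns why the run must stop, or null if it may continue.
function stopReason(state: RunState, limits: RunLimits): "budget" | "max_turns" | null {
  if (state.spentUsd >= limits.budgetUsd) return "budget"
  if (state.turns >= limits.maxTurns) return "max_turns"
  return null
}
```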

Research Strategy

The agent follows a multi-phase discovery strategy that prioritizes grassroots sources before mainstream outlets.

MCP Tools

The research agent has access to 14 specialized tools.

Diversity Requirements

The agent enforces strict diversity rules to ensure balanced coverage:

Requirement	Limit
Indie/Community findings	At least 2
Non-X platform sources	At least 1
Categories represented	At least 2
Findings from the same company	At most 3
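
The diversity rules above could be checked over a batch of findings roughly as follows. This is a sketch under assumptions: the `company` field is hypothetical (the output schema has no such field), and "non-X platform sources" is read here as the number of findings that cite at least one platform other than `"x"`.

```typescript
interface FindingLike {
  category: string
  source_type: string
  source_platforms: string[]
  company?: string // hypothetical field, not part of the documented schema
}

// Returns one message per violated rule; an empty array means the batch passes.
function checkDiversity(findings: FindingLike[]): string[] {
  const violations: string[] = []

  const grassroots = findings.filter(
    (f) => f.source_type === "indie" || f.source_type === "community",
  ).length
  if (grassroots < 2) violations.push("need at least 2 indie/community findings")

  const nonX = findings.filter((f) => f.source_platforms.some((p) => p !== "x")).length
  if (nonX < 1) violations.push("need at least 1 finding from a non-X platform")

  const categories = new Set(findings.map((f) => f.category))
  if (categories.size < 2) violations.push("need at least 2 categories")

  // Count findings per company, skipping findings with no company attached.
  const perCompany = new Map<string, number>()
  for (const f of findings) {
    if (!f.company) continue
    perCompany.set(f.company, (perCompany.get(f.company) ?? 0) + 1)
  }
  perCompany.forEach((count, company) => {
    if (count > 3) violations.push(`more than 3 findings about ${company}`)
  })

  return violations
}
```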

Output Schema

interface ResearchFinding {
  topic: string
  summary: string
  detailed_context: string      // Full briefing for writer
  why_newsworthy: string
  category: "models" | "companies" | "research" | "policy" | "tools" | "funding"
  engagement_level: "viral" | "high" | "moderate"
  source_type: "mainstream" | "indie" | "community" | "research" | "open_source"
  source_platforms: string[]    // ["x", "hackernews", "reddit", ...]
  source_posts: {
    url: string
    author_handle: string
    text_snippet: string
    likes: number
    retweets: number
  }[]
  suggested_tags: string[]
}
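
Because the agent returns JSON, the interface above only helps at compile time; a runtime check is needed before the findings reach the writer. The function below is one possible hand-written validator, not the project's actual validation code. `findingErrors` and its helpers are illustrative names.

```typescript
const CATEGORIES = ["models", "companies", "research", "policy", "tools", "funding"]
const ENGAGEMENT_LEVELS = ["viral", "high", "moderate"]
const SOURCE_TYPES = ["mainstream", "indie", "community", "research", "open_source"]

function isOneOf(allowed: string[], value: unknown): boolean {
  return typeof value === "string" && allowed.indexOf(value) !== -1
}

function isStringArray(value: unknown): boolean {
  return Array.isArray(value) && value.every((item) => typeof item === "string")
}

// Returns a list of problems; an empty list means the value matches ResearchFinding.
function findingErrors(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["finding must be an object"]
  const f = value as Record<string, unknown>
  const errors: string[] = []

  for (const key of ["topic", "summary", "detailed_context", "why_newsworthy"]) {
    if (typeof f[key] !== "string") errors.push(`${key} must be a string`)
  }
  if (!isOneOf(CATEGORIES, f["category"])) errors.push("category is not an allowed value")
  if (!isOneOf(ENGAGEMENT_LEVELS, f["engagement_level"])) {
    errors.push("engagement_level is not an allowed value")
  }
  if (!isOneOf(SOURCE_TYPES, f["source_type"])) errors.push("source_type is not an allowed value")
  if (!isStringArray(f["source_platforms"])) errors.push("source_platforms must be string[]")
  if (!isStringArray(f["suggested_tags"])) errors.push("suggested_tags must be string[]")

  const posts = f["source_posts"]
  if (!Array.isArray(posts)) {
    errors.push("source_posts must be an array")
  } else {
    posts.forEach((post: unknown, i: number) => {
      const p = (typeof post === "object" && post !== null ? post : {}) as Record<string, unknown>
      for (const key of ["url", "author_handle", "text_snippet"]) {
        if (typeof p[key] !== "string") errors.push(`source_posts[${i}].${key} must be a string`)
      }
      for (const key of ["likes", "retweets"]) {
        if (typeof p[key] !== "number") errors.push(`source_posts[${i}].${key} must be a number`)
      }
    })
  }
  return errors
}
```

A schema library would do the same job with less code; the hand-written version is used here only to keep the example dependency-free.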

Writer Agent

The Writer Agent transforms research findings into polished news articles with consistent structure and quality.

Processing Flow

Article Structure

Each generated article follows a consistent format:

Opening Paragraph - News lead with key facts
H2 Sections - Organized by topic
Bullet Points - For lists and features
Key Takeaways - 3-5 summarized points
Citations - Structured source attribution

Output Schema

interface ArticleDraft {
  title: string
  content: string              // Markdown format
  excerpt: string              // 1-2 sentences
  category: string
  tags: string[]
  key_takeaways: string[]      // 3-5 items required
  source_urls: string[]
  citations: {
    title: string
    url: string
    author?: string
    publication?: string
    publication_date?: string
  }[]
}
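
The schema comments state constraints that the type system cannot express, such as the 3-5 key takeaways. A minimal sketch of checking them after generation is shown below. `draftProblems` is an illustrative name, and the URL and non-empty-title checks are assumptions that go beyond what the schema states.

```typescript
// Only the fields this check needs; the full draft has more.
interface DraftLike {
  key_takeaways: string[]
  source_urls: string[]
  citations: { title: string; url: string }[]
}

function draftProblems(draft: DraftLike): string[] {
  const problems: string[] = []

  // Documented constraint: 3-5 key takeaways.
  const count = draft.key_takeaways.length
  if (count < 3 || count > 5) problems.push(`expected 3-5 key takeaways, got ${count}`)

  // Assumed constraint: every URL is an absolute http(s) URL.
  const isHttpUrl = (url: string): boolean => /^https?:\/\/\S+$/.test(url)
  draft.source_urls.forEach((url) => {
    if (!isHttpUrl(url)) problems.push(`invalid source url: ${url}`)
  })

  // Assumed constraint: citations have a non-empty title and a valid URL.
  draft.citations.forEach((citation, i) => {
    if (citation.title.trim() === "") problems.push(`citation ${i} has an empty title`)
    if (!isHttpUrl(citation.url)) problems.push(`citation ${i} has an invalid url`)
  })

  return problems
}
```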

Enrichment Agent

The Enrichment Agent enhances articles with authoritative inline links and improved citations through web search.

Enrichment Process

Validation Rules

Check	Requirement
Word Count Change	At most 25% longer or 10% shorter than the original
Link Protocol	HTTPS only
Anchor Text	Descriptive (no "click here")
Content Integrity	Title and key paragraphs preserved
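
Three of these rules are mechanical enough to sketch in code: the word-count bounds, the HTTPS-only rule, and the anchor-text rule. The example assumes links are inline Markdown links of the form `[text](url)` and uses a small, hypothetical list of banned anchor phrases. The content-integrity check is not covered. `enrichOrOriginal` returns the original article when validation fails.

```typescript
function wordCount(text: string): number {
  const words = text.trim().split(/\s+/)
  return words[0] === "" ? 0 : words.length
}

// Hypothetical list; the source only names "click here" as an example.
const VAGUE_ANCHORS = ["click here", "here", "this link", "link"]

function validateEnrichment(original: string, enriched: string): string[] {
  const problems: string[] = []

  const before = wordCount(original)
  const change = before === 0 ? 0 : (wordCount(enriched) - before) / before
  if (change > 0.25) problems.push("word count grew by more than 25%")
  if (change < -0.1) problems.push("word count shrank by more than 10%")

  // Match inline Markdown links: [anchor](url)
  const linkPattern = /\[([^\]]*)\]\(([^)\s]+)\)/g
  let match: RegExpExecArray | null
  while ((match = linkPattern.exec(enriched)) !== null) {
    const anchor = (match[1] ?? "").trim().toLowerCase()
    const url = match[2] ?? ""
    if (url.indexOf("https://") !== 0) problems.push(`non-HTTPS link: ${url}`)
    if (anchor === "" || VAGUE_ANCHORS.indexOf(anchor) !== -1) {
      problems.push(`non-descriptive anchor text for ${url}`)
    }
  }
  return problems
}

// Graceful degradation: keep the original article if the enriched one fails validation.
function enrichOrOriginal(original: string, enriched: string): string {
  return validateEnrichment(original, enriched).length === 0 ? enriched : original
}
```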

Publishing Pipeline

The publishing pipeline ensures only unique, high-quality articles reach the database.
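
The uniqueness check relies on the semantic deduplication listed under Key Capabilities: pgvector with a 90% similarity threshold. The in-memory sketch below illustrates the comparison only. It assumes the threshold refers to cosine similarity between embeddings, which the source does not state, and it is a stand-in for the database query, not the code in `deduplication.ts`.

```typescript
// Cosine similarity of two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ")
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// True when the candidate is at least `threshold` similar to any existing embedding.
function isDuplicate(candidate: number[], existing: number[][], threshold = 0.9): boolean {
  return existing.some((embedding) => cosineSimilarity(candidate, embedding) >= threshold)
}
```

In pgvector the `<=>` operator returns cosine distance, which is 1 minus cosine similarity, so under the same assumption a similarity of 0.90 or higher corresponds to a distance of 0.10 or lower.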

ISR Revalidation

After publishing, the system triggers Next.js Incremental Static Regeneration:

Home Page (/) - Updated article list
Article Page (/news/{slug}) - New article accessible
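
The two routes above can be derived from the slug of the newly published article. The helper below is a hypothetical sketch; how SimpleNews actually triggers revalidation is not shown in this document. In a Next.js App Router project, each path could be passed to `revalidatePath` from `next/cache`.

```typescript
// Hypothetical helper: lists the routes to regenerate after publishing one article.
function pathsToRevalidate(slug: string): string[] {
  return ["/", `/news/${encodeURIComponent(slug)}`]
}
```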

Data Models

Database Schema

E-E-A-T Fields

Fields that support Google's E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) guidelines are embedded in the article schema:

Field	Value
`author_name`	"SimpleNews AI"
`author_credentials`	"AI Research Assistant"
`methodology`	"AI-generated from verified sources"
`citations`	Structured JSON array
`fact_check_status`	pending | verified | flagged | failed

External Integrations

API Costs

Service	Cost
Claude API	$3/M input, $15/M output
X API	$0.005/post, $0.01/user lookup
Web Search	$0.01/search
OpenAI Embeddings	$0.00002/1K tokens
Hacker News	Free
Reddit	Free
arXiv	Free
GitHub	Free (rate limited)
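
The rates in the table translate directly into a per-run estimate. The sketch below applies them to a usage record. The `RunUsage` shape and the usage figures in the note are made up for illustration, and real pricing may change.

```typescript
// Rates copied from the table above (USD).
const RATES = {
  claudeInputPerMillionTokens: 3,
  claudeOutputPerMillionTokens: 15,
  xPerPost: 0.005,
  xPerUserLookup: 0.01,
  webSearchPerSearch: 0.01,
  embeddingPerThousandTokens: 0.00002,
}

// Hypothetical usage record for one pipeline run.
interface RunUsage {
  claudeInputTokens: number
  claudeOutputTokens: number
  xPosts: number
  xUserLookups: number
  webSearches: number
  embeddingTokens: number
}

function estimateCostUsd(usage: RunUsage): number {
  return (
    (usage.claudeInputTokens / 1_000_000) * RATES.claudeInputPerMillionTokens +
    (usage.claudeOutputTokens / 1_000_000) * RATES.claudeOutputPerMillionTokens +
    usage.xPosts * RATES.xPerPost +
    usage.xUserLookups * RATES.xPerUserLookup +
    usage.webSearches * RATES.webSearchPerSearch +
    (usage.embeddingTokens / 1000) * RATES.embeddingPerThousandTokens
  )
}
```

For example, 200,000 input tokens, 40,000 output tokens, 100 X posts, 10 user lookups, 20 web searches, and 50,000 embedding tokens come to about $0.60 + $0.60 + $0.50 + $0.10 + $0.20 + $0.001 = $2.001.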

Cost Optimization

The system employs multiple strategies to minimize API costs:

Optimization Techniques

Caching - X API responses cached for 30 minutes
Free Tool Priority - hn_front_page, reddit_rising, github_trending called first
Broad Queries - One x_indie_discovery call with an OR query instead of multiple narrower calls
Batch Processing - 3 articles per Claude API call
Search Limits - Max 5 web searches per article enrichment
Budget Enforcement - $1.50 hard limit per research run
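
The 30-minute caching of X API responses can be pictured as a small time-to-live cache keyed by query. The class below is a minimal in-memory sketch. `TtlCache` is an illustrative name, and the actual cache may be persisted or keyed differently.

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()
  private ttlMs: number
  private now: () => number

  // `now` is injectable so the expiry logic can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs
    this.now = now
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }
}

// 30 minutes, as stated in the list above.
const X_CACHE_TTL_MS = 30 * 60 * 1000
```

A caller would check `get(query)` before calling the paid X API and call `set(query, response)` after a successful request.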

Quality Assurance

Multi-Layer Quality Controls

Error Handling

Component	Strategy
Research Agent	Retry on credit balance errors; abort after 3 failures
Writer Agent	3 retries with exponential backoff
Enrichment Agent	Graceful degradation (return original)
Publisher	Continue with remaining articles on failure
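
The Writer Agent strategy, 3 retries with exponential backoff, might look like the sketch below. The 1-second base delay and the doubling factor are assumptions, since the source gives no timings, and "3 retries" is read as one initial attempt plus up to three more. `withRetry` and `backoffDelayMs` are illustrative names.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * Math.pow(2, attempt)
}

function defaultSleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms))
}

// Runs `task`, retrying up to `maxRetries` times after failures.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  baseMs = 1000,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task()
    } catch (err) {
      lastError = err
      if (attempt < maxRetries) await sleep(backoffDelayMs(attempt, baseMs))
    }
  }
  throw lastError
}
```

The `sleep` parameter exists so that tests can replace the real timer with a stub.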

Summary

The SimpleNews Agent System demonstrates a production-grade approach to AI-powered content generation:

Autonomous Discovery - Claude Agent SDK enables intelligent tool selection
Multi-Platform Coverage - 7+ discovery sources with diversity enforcement
Quality Assurance - Structured outputs, validation layers, semantic dedup
Cost Efficiency - Caching, free tool priority, budget caps
Full Transparency - Cost tracking, audit trails, source attribution

The system produces 5-10 original news articles per run, each with proper citations, enriched links, and E-E-A-T compliance for SEO best practices.
