SimpleNews Agent System
A sophisticated multi-agent pipeline for autonomous AI news research, writing, and publication.
Table of Contents
- Overview
- System Architecture
- Agent Pipeline
- Research Agent
- Writer Agent
- Enrichment Agent
- Publishing Pipeline
- Data Models
- External Integrations
- Cost Optimization
- Quality Assurance
Overview
SimpleNews employs a multi-agent orchestration system that autonomously discovers newsworthy AI topics, generates original articles, and publishes them with full editorial quality controls. The system leverages Claude's Agent SDK for intelligent research, structured output generation for consistent article quality, and semantic deduplication to prevent content overlap.
Key Capabilities
| Capability | Implementation |
|---|---|
| Autonomous Research | Claude Agent SDK with 14 MCP tools |
| Multi-Platform Discovery | X, HN, Reddit, GitHub, arXiv |
| Article Generation | Claude Sonnet with structured output |
| Link Enrichment | Server-side web search integration |
| Semantic Deduplication | pgvector with 90% similarity threshold |
| ISR Publishing | Next.js incremental static regeneration |
System Architecture
The system follows a sequential pipeline architecture with autonomous decision-making at the research phase.
Directory Structure
```text
agent/
├── src/
│   ├── run.ts                # Main orchestration pipeline
│   ├── research-agent.ts     # Claude Agent SDK integration
│   ├── writer.ts             # Article generation
│   ├── enrichment-agent.ts   # Web search enrichment
│   ├── publisher.ts          # Supabase publishing
│   ├── deduplication.ts      # Semantic duplicate check
│   ├── embeddings.ts         # OpenAI embedding calls
│   ├── scheduler.ts          # Interval-based scheduling
│   └── tools/                # MCP tool implementations
│       ├── x-search.ts       # X/Twitter API
│       ├── hackernews.ts     # HN search
│       ├── reddit.ts         # Reddit API
│       ├── arxiv.ts          # arXiv papers
│       └── github-trending.ts
```
Agent Pipeline
The complete pipeline executes in 6 distinct phases, with comprehensive error handling and cost tracking at each stage.
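A minimal sketch of a sequential orchestrator with per-phase cost tracking follows. The six phase names (research, write, enrich, dedup, publish, revalidate) are an assumption inferred from the modules in the directory structure, and `Phase` and `runPipeline` are hypothetical names, not the actual contents of `run.ts`. For simplicity this sketch stops at the first failing phase.

```typescript
// Hypothetical phase names, inferred from the agent/src/ modules.
type PhaseName = "research" | "write" | "enrich" | "dedup" | "publish" | "revalidate";

interface PhaseResult<T> {
  output: T;
  costUsd: number; // API spend attributed to this phase
}

interface Phase {
  name: PhaseName;
  // Receives the previous phase's output and returns its own output plus cost.
  run: (input: unknown) => Promise<PhaseResult<unknown>>;
}

interface PipelineReport {
  completed: PhaseName[];
  costByPhase: Partial<Record<PhaseName, number>>;
  totalCostUsd: number;
  error?: { phase: PhaseName; message: string };
}

// Runs phases in order. A failure stops the pipeline, but the report keeps
// the costs already incurred so every run remains auditable.
async function runPipeline(phases: Phase[], initialInput: unknown): Promise<PipelineReport> {
  const report: PipelineReport = { completed: [], costByPhase: {}, totalCostUsd: 0 };
  let data = initialInput;
  for (const phase of phases) {
    try {
      const result = await phase.run(data);
      report.costByPhase[phase.name] = result.costUsd;
      report.totalCostUsd += result.costUsd;
      report.completed.push(phase.name);
      data = result.output;
    } catch (err) {
      report.error = { phase: phase.name, message: err instanceof Error ? err.message : String(err) };
      break;
    }
  }
  return report;
}
```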
Research Agent
The Research Agent is the system's autonomous discovery engine, powered by Claude's Agent SDK with Model Context Protocol (MCP) integration.
Architecture
| Property | Value |
|---|---|
| Model | claude-sonnet-4-5-20250929 |
| Max Turns | 40 |
| Budget | $1.50 per run |
| Output | Structured JSON with validation |
Research Strategy
The agent follows a multi-phase discovery strategy that prioritizes grassroots sources before mainstream outlets.
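Grassroots-first ordering can be sketched as a stable sort over source tiers. The tier assignments and the `planToolOrder` helper are assumptions. The tool names `hn_front_page`, `reddit_rising`, `github_trending`, and `x_indie_discovery` come from the Cost Optimization section, while `mainstream_news_search` is a hypothetical placeholder.

```typescript
// Lower tier = queried earlier. Tier assignments are illustrative assumptions.
type SourceTier = 0 | 1 | 2;

interface ToolSpec {
  name: string;
  tier: SourceTier; // 0 = community, 1 = indie creators, 2 = mainstream outlets
  free: boolean;    // free tools are preferred within a tier
}

// Orders tools grassroots-first, then free-before-paid within each tier.
// Array.prototype.sort is stable (ES2019+), so ties keep their declared order.
function planToolOrder(tools: ToolSpec[]): string[] {
  return [...tools]
    .sort((a, b) => a.tier - b.tier || Number(b.free) - Number(a.free))
    .map((t) => t.name);
}

const tools: ToolSpec[] = [
  { name: "mainstream_news_search", tier: 2, free: false }, // hypothetical tool
  { name: "x_indie_discovery", tier: 1, free: false },
  { name: "github_trending", tier: 0, free: true },
  { name: "hn_front_page", tier: 0, free: true },
  { name: "reddit_rising", tier: 0, free: true },
];

// Logs the order: github_trending, hn_front_page, reddit_rising,
// x_indie_discovery, mainstream_news_search
console.log(planToolOrder(tools));
```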
MCP Tools
The research agent has access to 14 specialized tools, including `hn_front_page`, `reddit_rising`, `github_trending`, and `x_indie_discovery`.
Diversity Requirements
The agent enforces strict diversity rules to ensure balanced coverage:
| Requirement | Minimum |
|---|---|
| Indie/Community findings | 2 |
| Non-X platform sources | 1 |
| Categories represented | 2+ |
| Max from same company | 3 |
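The diversity table translates naturally into a post-hoc check over the agent's findings. This sketch assumes a hypothetical `company` field (the `ResearchFinding` schema has none) and treats the `indie` and `community` source types as one bucket. `checkDiversity` is illustrative, not the system's actual validator.

```typescript
type SourceType = "mainstream" | "indie" | "community" | "research" | "open_source";

interface FindingSummary {
  category: string;
  source_type: SourceType;
  source_platforms: string[];
  company?: string; // hypothetical: not part of the documented ResearchFinding schema
}

// Returns the violated rules; an empty list means the batch passes.
function checkDiversity(findings: FindingSummary[]): string[] {
  const problems: string[] = [];

  const indieOrCommunity = findings.filter(
    (f) => f.source_type === "indie" || f.source_type === "community",
  ).length;
  if (indieOrCommunity < 2) problems.push(`only ${indieOrCommunity} indie/community findings (min 2)`);

  // At least one finding must come from a platform other than X.
  const nonX = findings.filter((f) => f.source_platforms.some((p) => p !== "x")).length;
  if (nonX < 1) problems.push("no finding sourced from a non-X platform (min 1)");

  const categories = new Set(findings.map((f) => f.category));
  if (categories.size < 2) problems.push(`only ${categories.size} categories (min 2)`);

  // Count findings per company and flag any company above the cap of 3.
  const perCompany = new Map<string, number>();
  for (const f of findings) {
    if (f.company) perCompany.set(f.company, (perCompany.get(f.company) ?? 0) + 1);
  }
  perCompany.forEach((n, company) => {
    if (n > 3) problems.push(`${n} findings about ${company} (max 3)`);
  });

  return problems;
}
```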
Output Schema
```typescript
interface ResearchFinding {
  topic: string
  summary: string
  detailed_context: string // Full briefing for writer
  why_newsworthy: string
  category: "models" | "companies" | "research" | "policy" | "tools" | "funding"
  engagement_level: "viral" | "high" | "moderate"
  source_type: "mainstream" | "indie" | "community" | "research" | "open_source"
  source_platforms: string[] // ["x", "hackernews", "reddit", ...]
  source_posts: {
    url: string
    author_handle: string
    text_snippet: string
    likes: number
    retweets: number
  }[]
  suggested_tags: string[]
}
```
Writer Agent
The Writer Agent transforms research findings into polished news articles with consistent structure and quality.
Processing Flow
Article Structure
Each generated article follows a consistent format:
- Opening Paragraph - News lead with key facts
- H2 Sections - Organized by topic
- Bullet Points - For lists and features
- Key Takeaways - 3-5 summarized points
- Citations - Structured source attribution
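A lightweight structural check on the generated Markdown might look like the following. The heuristics (a lead paragraph before the first `## ` heading, at least one H2, 3 to 5 takeaways, at least one citation) are assumptions derived from the list above, and `checkArticleStructure` is a hypothetical helper.

```typescript
interface DraftShape {
  content: string; // Markdown body
  key_takeaways: string[];
  citations: { title: string; url: string }[];
}

// Returns a list of structural problems; empty means the draft looks well-formed.
function checkArticleStructure(draft: DraftShape): string[] {
  const issues: string[] = [];
  const lines = draft.content.split("\n");
  const firstH2 = lines.findIndex((l) => l.startsWith("## "));

  // The news lead must come before the first H2 section.
  const lead = (firstH2 === -1 ? lines : lines.slice(0, firstH2)).join("\n").trim();
  if (lead.length === 0) issues.push("missing opening paragraph");
  if (firstH2 === -1) issues.push("no H2 sections");

  const n = draft.key_takeaways.length;
  if (n < 3 || n > 5) issues.push(`expected 3-5 key takeaways, got ${n}`);

  if (draft.citations.length === 0) issues.push("no citations");
  return issues;
}
```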
Output Schema
```typescript
interface ArticleDraft {
  title: string
  content: string // Markdown format
  excerpt: string // 1-2 sentences
  category: string
  tags: string[]
  key_takeaways: string[] // 3-5 items required
  source_urls: string[]
  citations: {
    title: string
    url: string
    author?: string
    publication?: string
    publication_date?: string
  }[]
}
```
Enrichment Agent
The Enrichment Agent enhances articles with authoritative inline links and improved citations through web search.
Enrichment Process
Validation Rules
| Check | Requirement |
|---|---|
| Word Count Change | Between -10% and +25% of the original |
| Link Protocol | HTTPS only |
| Anchor Text | Descriptive (no "click here") |
| Content Integrity | Title and key paragraphs preserved |
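The validation rules above can be checked mechanically by comparing the original and enriched Markdown. This is a sketch under assumptions: links are extracted with a simple `[text](url)` regex, the banned anchor phrases are an illustrative list, and content integrity is reduced to checking that the title is unchanged. When any rule fails, `acceptOrRevert` keeps the original article, in the spirit of the graceful-degradation strategy listed under Error Handling.

```typescript
// Illustrative list of non-descriptive anchor texts.
const GENERIC_ANCHORS = new Set(["click here", "here", "this link", "read more", "link"]);

interface Article { title: string; content: string }

function wordCount(markdown: string): number {
  return markdown.split(/\s+/).filter((w) => w.length > 0).length;
}

function validateEnrichment(original: Article, enriched: Article): string[] {
  const errors: string[] = [];

  // Relative word-count change must stay within [-10%, +25%].
  const before = wordCount(original.content);
  const change = (wordCount(enriched.content) - before) / Math.max(before, 1);
  if (change > 0.25) errors.push(`word count grew ${(change * 100).toFixed(1)}% (max +25%)`);
  if (change < -0.1) errors.push(`word count shrank ${(-change * 100).toFixed(1)}% (max -10%)`);

  // Every inline Markdown link must use HTTPS and descriptive anchor text.
  const linkPattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = linkPattern.exec(enriched.content)) !== null) {
    const [, text, url] = m;
    if (!url.startsWith("https://")) errors.push(`non-HTTPS link: ${url}`);
    if (GENERIC_ANCHORS.has(text.trim().toLowerCase())) errors.push(`generic anchor text: "${text}"`);
  }

  if (enriched.title !== original.title) errors.push("title changed");
  return errors;
}

// Keep the enriched version only if it passes every rule.
function acceptOrRevert(original: Article, enriched: Article): Article {
  return validateEnrichment(original, enriched).length === 0 ? enriched : original;
}
```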
Publishing Pipeline
The publishing pipeline ensures only unique, high-quality articles reach the database.
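The 90% similarity threshold from the Key Capabilities table can be expressed as a cosine-similarity check against embeddings of already-published articles. In production this comparison would run inside Postgres through pgvector, whose `<=>` operator returns cosine distance (so similarity is `1 - distance`). The in-memory version below only illustrates the arithmetic; `isDuplicate` is a hypothetical name, and treating exactly 0.90 as a duplicate is an assumption.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const SIMILARITY_THRESHOLD = 0.9;

// A candidate counts as a duplicate if it is at least 90% similar to any published article.
function isDuplicate(candidate: number[], published: number[][]): boolean {
  return published.some((p) => cosineSimilarity(candidate, p) >= SIMILARITY_THRESHOLD);
}
```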
ISR Revalidation
After publishing, the system triggers Next.js Incremental Static Regeneration:
- Home Page (`/`) - Updated article list
- Article Page (`/news/{slug}`) - New article accessible
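Because the agent runs outside the Next.js app, one common pattern is to call a protected revalidation route in the web app after each publish. This sketch assumes a hypothetical `/api/revalidate` endpoint that accepts a JSON list of paths plus a shared-secret header (`x-revalidate-secret`, also hypothetical). The HTTP client is injected so the logic stays testable; inside such a route, Next.js's `revalidatePath` from `next/cache` would perform the actual invalidation.

```typescript
// Minimal HTTP client shape; the global fetch in Node 18+ should satisfy it.
type PostJson = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Paths affected by publishing one article: the home page and the article page.
function pathsToRevalidate(slug: string): string[] {
  return ["/", `/news/${encodeURIComponent(slug)}`];
}

async function triggerRevalidation(
  post: PostJson,
  siteUrl: string,
  secret: string,
  slug: string,
): Promise<void> {
  const res = await post(`${siteUrl}/api/revalidate`, { // hypothetical route
    method: "POST",
    headers: { "content-type": "application/json", "x-revalidate-secret": secret },
    body: JSON.stringify({ paths: pathsToRevalidate(slug) }),
  });
  if (!res.ok) throw new Error(`revalidation failed with HTTP ${res.status}`);
}
```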
Data Models
Database Schema
E-E-A-T Fields
Fields supporting Google's E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) quality guidelines are embedded in the article schema:
| Field | Value |
|---|---|
| `author_name` | "SimpleNews AI" |
| `author_credentials` | "AI Research Assistant" |
| `methodology` | "AI-generated from verified sources" |
| `citations` | Structured JSON array |
| `fact_check_status` | `pending`, `verified`, `flagged`, or `failed` |
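In TypeScript terms, these fields could be modeled as follows. The `EeatFields` interface name, the `defaultEeatFields` helper, and the choice of `"pending"` as the initial status are assumptions; the field values come from the table, and the citation shape mirrors the `citations` array in `ArticleDraft`.

```typescript
type FactCheckStatus = "pending" | "verified" | "flagged" | "failed";

interface Citation {
  title: string;
  url: string;
  author?: string;
  publication?: string;
  publication_date?: string;
}

// Hypothetical shape of the E-E-A-T columns on an article row.
interface EeatFields {
  author_name: string;
  author_credentials: string;
  methodology: string;
  citations: Citation[]; // stored as a structured JSON array
  fact_check_status: FactCheckStatus;
}

// Fills the fixed values from the table; new articles start as "pending" (assumed).
function defaultEeatFields(citations: Citation[]): EeatFields {
  return {
    author_name: "SimpleNews AI",
    author_credentials: "AI Research Assistant",
    methodology: "AI-generated from verified sources",
    citations,
    fact_check_status: "pending",
  };
}
```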
External Integrations
API Costs
| Service | Cost |
|---|---|
| Claude API | $15/M output tokens |
| X API | $0.01/user lookup |
| Web Search | $0.01/search |
| OpenAI Embeddings | $0.00002/1K tokens |
| Hacker News | Free |
| Reddit | Free |
| arXiv | Free |
| GitHub | Free (rate limited) |
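To see how these rates combine, the sketch below estimates a run's variable cost from usage counts. It prices only the components whose rates appear in the table (Claude output tokens, X user lookups, web searches, OpenAI embedding tokens). Claude input-token pricing is left out because the table does not list it, so real totals will be higher. The usage numbers in the example are made up for illustration.

```typescript
// Rates from the API Costs table, in USD.
const RATES = {
  claudeOutputPerMTok: 15,    // $15 per million output tokens
  xUserLookup: 0.01,          // $0.01 per user lookup
  webSearch: 0.01,            // $0.01 per search
  embeddingPer1KTok: 0.00002, // $0.00002 per 1K tokens
};

interface Usage {
  claudeOutputTokens: number;
  xUserLookups: number;
  webSearches: number;
  embeddingTokens: number;
}

function estimateCostUsd(u: Usage): number {
  return (
    (u.claudeOutputTokens / 1_000_000) * RATES.claudeOutputPerMTok +
    u.xUserLookups * RATES.xUserLookup +
    u.webSearches * RATES.webSearch +
    (u.embeddingTokens / 1_000) * RATES.embeddingPer1KTok
  );
}

// Illustrative usage: 60K output tokens, 20 lookups, 25 searches, 50K embedding tokens.
const estimate = estimateCostUsd({
  claudeOutputTokens: 60_000,
  xUserLookups: 20,
  webSearches: 25,
  embeddingTokens: 50_000,
});
console.log(estimate.toFixed(4)); // prints 1.3510
```

Here $0.90 of output tokens dominates, while the embeddings add only $0.001.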
Cost Optimization
The system employs multiple strategies to minimize API costs:
Optimization Techniques
- Caching - X API responses cached for 30 minutes
- Free Tool Priority - `hn_front_page`, `reddit_rising`, `github_trending` called first
- Broad Queries - Single `x_indie_discovery` call with an OR query instead of multiple calls
- Batch Processing - 3 articles per Claude API call
- Search Limits - Max 5 web searches per article enrichment
- Budget Enforcement - $1.50 hard limit per research run
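Two of these techniques, the 30-minute response cache and the hard budget cap, are small enough to sketch directly. `TtlCache` and `BudgetGuard` are hypothetical names, the clock is injected so expiry can be tested, and the guard assumes each call's cost is known or estimated before the call is made.

```typescript
// Time-based cache: an entry expires ttlMs after it was stored.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // stale: the caller should hit the API again
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hard spending cap: refuses any charge that would push the total past the limit.
class BudgetGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  charge(costUsd: number): void {
    const next = this.spentUsd + costUsd;
    if (next > this.limitUsd) {
      throw new Error(`budget exceeded: $${next.toFixed(2)} > $${this.limitUsd.toFixed(2)}`);
    }
    this.spentUsd = next;
  }

  get spent(): number {
    return this.spentUsd;
  }
}

const xCache = new TtlCache<string>(30 * 60 * 1000); // 30-minute TTL for X API responses
const researchBudget = new BudgetGuard(1.5);         // $1.50 per research run
```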
Quality Assurance
Multi-Layer Quality Controls
Error Handling
| Component | Strategy |
|---|---|
| Research Agent | Credit balance retry (3 failures = abort) |
| Writer Agent | 3 retries with exponential backoff |
| Enrichment Agent | Graceful degradation (return original) |
| Publisher | Continue with remaining articles on failure |
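The writer's retry policy and the enrichment agent's fallback can both be expressed as small wrappers. The base delay, the doubling factor, and the helper names `withRetries` and `orOriginal` are assumptions; the table only specifies three retries with exponential backoff and returning the original on failure. The sleep function is injectable so the example can run without real delays.

```typescript
type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry n (1-based): base, 2x base, 4x base, ...
function backoffDelayMs(retry: number, baseMs: number): number {
  return baseMs * 2 ** (retry - 1);
}

// One initial attempt plus up to `retries` retries with exponential backoff.
async function withRetries<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseMs = 1000,
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // retries exhausted: surface the last error
      await sleep(backoffDelayMs(attempt + 1, baseMs));
    }
  }
}

// Graceful degradation: if enrichment fails for any reason, keep the original.
async function orOriginal<T>(original: T, enrich: (input: T) => Promise<T>): Promise<T> {
  try {
    return await enrich(original);
  } catch {
    return original;
  }
}
```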
Summary
The SimpleNews Agent System demonstrates a production-grade approach to AI-powered content generation:
- Autonomous Discovery - Claude Agent SDK enables intelligent tool selection
- Multi-Platform Coverage - 7+ discovery sources with diversity enforcement
- Quality Assurance - Structured outputs, validation layers, semantic dedup
- Cost Efficiency - Caching, free tool priority, budget caps
- Full Transparency - Cost tracking, audit trails, source attribution
The system produces 5-10 original news articles per run, each with proper citations, enriched links, and E-E-A-T metadata in line with SEO best practices.