Agentic-First CRM: A First-Principles Design

1. What is a CRM, actually?

Strip away decades of accumulated features and a CRM is three things:

  1. A memory of who you've talked to and what was said
  2. A state machine tracking where each relationship stands
  3. A work queue telling someone (or something) what to do next

Everything else — pipelines, custom fields, dashboards, reports — is UI sugar on top of those three primitives. A traditional CRM exists because humans need structured prompts to remember to follow up. An agentic CRM exists because agents need structured state to reason over and act on.

This reframing changes what matters. The question isn't "what fields does a salesperson want to see?" It's "what does an agent need to know to decide the next action, and what does it need to write back after taking it?"

2. The first-principles cut

If agents are the primary actors and humans are the reviewers, three things become true that aren't true in a traditional CRM:

Schema rigidity is a liability, not an asset. Traditional CRMs encode workflow as columns (lead_status, mql_score, sql_date) because humans need consistent forms. Agents read JSON. Every column you add is a column an agent has to be taught about. Lean hard on JSONB and let agents structure their own observations.

The timeline is the source of truth, not derived state. In a traditional CRM, "stage = qualified" is the truth and the activity log is supporting evidence. In an agentic CRM, the stream of events is the truth and stage is just a cached projection an agent maintains. If you lost every status field tomorrow, an agent should be able to reconstruct them from the event log.

Every write needs provenance. When humans are the only writers, "updated_by = user_id" suffices. When agents are writing constantly, you need to know which agent, on which run, triggered by which event, with what confidence, and whether a human approved it. Provenance is not optional metadata — it's load-bearing.

3. The minimum viable schema

Eight tables. That's it. Everything else is a projection, a cache, or premature.

3.1 The identity layer (2 tables)

    entity(
      id, kind,             -- 'person' | 'organization'
      display_name,
      attrs_jsonb,          -- everything else: name parts, title, industry, etc.
      created_at, updated_at, deleted_at
    )

    identity(
      id, entity_id,
      channel,              -- 'email' | 'phone' | 'linkedin' | 'domain'
      value, normalized_value,
      is_primary, verified,
      created_at
    )

Why one entity table instead of separate person and organization: Agents don't care about the distinction often enough to justify two tables, two sets of joins, two polymorphic patterns. A kind discriminator plus JSONB attrs covers 95% of cases. Relationships between entities (person works at org) become rows in link (below).

Why no separate "lead" or "prospect" table: Lifecycle is a state, not a record type. It lives in attrs_jsonb or as a projection an agent maintains. The classic Lead/Contact split exists because legacy CRMs needed pre-qualification and post-qualification forms to look different. Agents don't need different forms.
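The normalized_value column on identity implies a canonicalization step at write time, so that two spellings of the same address match the same entity. A minimal sketch of that step, assuming simple rules that are not part of the design above (lowercased emails, digits-only phones, scheme and "www." stripped from domains); the function name normalize_identity is hypothetical:

```python
import re


def normalize_identity(channel: str, value: str) -> str:
    """Return a canonical form for matching. The rules here are illustrative assumptions."""
    v = value.strip()
    if channel == "email":
        return v.lower()
    if channel == "phone":
        # Keep digits only, preserving a leading '+' if one was given.
        digits = re.sub(r"\D", "", v)
        return "+" + digits if v.startswith("+") else digits
    if channel == "domain":
        v = re.sub(r"^https?://", "", v.lower())
        v = v.split("/", 1)[0]
        return v[4:] if v.startswith("www.") else v
    if channel == "linkedin":
        return v.lower().rstrip("/")
    # Unknown channels pass through trimmed but otherwise untouched.
    return v


email = normalize_identity("email", "  Ada@Example.COM ")
phone = normalize_identity("phone", "+1 (415) 555-0100")
domain = normalize_identity("domain", "https://www.Example.com/about")
```

Matching on normalized_value while keeping the raw value lets a parser or agent look up an entity by whatever form a tool sends, without losing what was originally recorded.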

3.2 The relationship layer (1 table)

    link(
      id,
      from_entity_id, to_entity_id,
      kind,                 -- 'works_at' | 'champion_of' | 'reports_to' | 'parent_of'
      attrs_jsonb,          -- title, role, started_at, ended_at, strength
      created_at, deleted_at
    )

One generic edge table replaces person_org_role, opportunity_party, deal_contacts, and the entire web of M:N junction tables in traditional CRMs. Direction runs from from_entity_id to to_entity_id, and the kind defines what that direction means (a person works_at an organization, not the reverse). JSONB carries edge metadata.

This is the hardest pill to swallow if you've designed traditional schemas, but it's correct: a CRM is fundamentally a property graph, and forcing it into rigid relational tables is what makes them painful to extend. One edge table, well-indexed, handles every relationship pattern you'll need.
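To make the property-graph claim concrete, here is a small sketch of how relationship questions reduce to filtering one edge list by kind and direction. It uses in-memory rows rather than a database, and the Link class and neighbors helper are hypothetical names for illustration only:

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    from_entity_id: str
    to_entity_id: str
    kind: str
    attrs: dict = field(default_factory=dict)


def neighbors(links, entity_id, kind, direction="out"):
    """Follow edges of one kind away from ('out') or into ('in') an entity."""
    if direction == "out":
        return [l.to_entity_id for l in links
                if l.kind == kind and l.from_entity_id == entity_id]
    return [l.from_entity_id for l in links
            if l.kind == kind and l.to_entity_id == entity_id]


links = [
    Link("p1", "org1", "works_at", {"title": "CTO"}),
    Link("p2", "org1", "works_at", {"title": "AE"}),
    Link("p2", "p1", "reports_to"),
    Link("org1", "org2", "parent_of"),
]

employees = neighbors(links, "org1", "works_at", direction="in")
manager = neighbors(links, "p2", "reports_to")
```

In SQL the same two lookups are index scans on (to_entity_id, kind) and (from_entity_id, kind), which is what "well-indexed" would mean in practice for this table.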

3.3 The event layer (1 table — the heart of the system)

    event(
      id,
      occurred_at,
      kind,                 -- 'email_sent' | 'email_replied' | 'meeting_held' |
                            -- 'note_added' | 'stage_changed' | 'agent_observation' | ...
      subject_entity_id,    -- who/what this is about
      related_entity_ids,   -- array; other entities involved
      actor_kind,           -- 'human' | 'agent' | 'system' | 'external'
      actor_id,             -- user_id or agent_run_id
      source,               -- 'smartlead' | 'heyreach' | 'gmail' | 'ui' | 'agent:reply_classifier'
      payload_jsonb,        -- the full event content
      ingested_at
    )

This is the only table that matters. Everything else is convenience.

Every email sent, every reply received, every meeting booked, every stage change, every agent inference, every human note — all one table. Insert-only. Append-only. Immutable.

Why this works:

  • Webhooks from SmartLead/HeyReach become event rows with source='smartlead' and kind='email_replied'.
  • Agent observations ("I think this lead is hot based on their reply") become event rows with actor_kind='agent' and kind='agent_observation'.
  • Status changes become event rows with kind='stage_changed' and payload={old, new}.
  • The unified timeline view is just SELECT * FROM event WHERE subject_entity_id = ? ORDER BY occurred_at DESC.

The payload_jsonb is the escape hatch. New event types require zero schema changes — agents can invent event kinds as they need them, and downstream consumers either understand the kind or ignore it.
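The append-and-query pattern above can be sketched end to end in a few lines. This example is an assumption-laden stand-in: it uses Python's built-in sqlite3 instead of Postgres, stores the payload as JSON text rather than JSONB, and keeps only a subset of the event columns. The helper names append_event and timeline are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        occurred_at TEXT NOT NULL,
        kind TEXT NOT NULL,
        subject_entity_id TEXT NOT NULL,
        actor_kind TEXT NOT NULL,
        source TEXT NOT NULL,
        payload_jsonb TEXT NOT NULL
    )
""")


def append_event(occurred_at, kind, subject, actor_kind, source, payload):
    """The only write path: insert a new row, never update or delete."""
    conn.execute(
        "INSERT INTO event (occurred_at, kind, subject_entity_id,"
        " actor_kind, source, payload_jsonb) VALUES (?, ?, ?, ?, ?, ?)",
        (occurred_at, kind, subject, actor_kind, source, json.dumps(payload)),
    )


def timeline(subject):
    """Unified timeline for one entity, newest first."""
    rows = conn.execute(
        "SELECT occurred_at, kind, payload_jsonb FROM event"
        " WHERE subject_entity_id = ? ORDER BY occurred_at DESC",
        (subject,),
    )
    return [(ts, kind, json.loads(p)) for ts, kind, p in rows]


append_event("2024-05-01T09:00:00Z", "email_sent", "abc-123",
             "system", "smartlead", {"campaign": "c1"})
append_event("2024-05-02T14:30:00Z", "email_replied", "abc-123",
             "external", "smartlead", {"text": "Tell me more"})
append_event("2024-05-02T14:31:00Z", "agent_observation", "abc-123",
             "agent", "agent:reply_classifier", {"intent": "interested"})

kinds = [kind for _, kind, _ in timeline("abc-123")]
```

A webhook reply and an agent's inference about that reply sit side by side in the same timeline, distinguished only by actor_kind and kind.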

3.4 The state projection layer (1 table)

    projection(
      entity_id,
      key,                  -- 'lifecycle_stage' | 'deal_amount' | 'next_action_at' |
                            -- 'reply_sentiment' | 'engagement_score' | ...
      value_jsonb,
      computed_at,
      computed_by,          -- agent or system that wrote this
      source_event_ids,     -- which events this projection is derived from
      confidence            -- 0.0–1.0
    )

This is what replaces every status column, every score field, every "stage" enum in a traditional CRM. Projections are cached interpretations of the event log. They're written by agents (or simple deterministic rules) and they're always traceable back to the events that produced them.

Why this is better than columns:

  • Multiple agents can hold opinions. A "lifecycle_stage" projection can have one row from a deterministic rule and another from a Claude-based classifier, with different confidences. The UI picks which to show.
  • Projections are debuggable. "Why is this lead marked qualified?" → look at source_event_ids. Done. No more guessing what automation flipped a flag three weeks ago.
  • Projections are rebuildable. If you change your scoring logic, you re-run the projection from the event log. The truth (events) is preserved.
  • Projections are cheap to invalidate. Delete and recompute. No migrations.

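The only thing you give up: SQL queries get a little more verbose. WHERE stage = 'qualified' becomes WHERE EXISTS (SELECT 1 FROM projection p WHERE p.entity_id = entity.id AND p.key = 'lifecycle_stage' AND p.value_jsonb->>'stage' = 'qualified'). Note the correlation on entity_id: without it, the subquery matches every entity as soon as any one of them is qualified. Build a view, move on.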
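"Rebuildable" and "debuggable" both follow from a projection being a pure function of the event log. A minimal sketch of a deterministic projector, assuming events are plain dicts with id, occurred_at, kind, and payload keys and that stage_changed payloads carry {old, new} as described earlier; the function name and the computed_by label are hypothetical:

```python
def project_lifecycle_stage(events):
    """Fold stage_changed events (oldest first) into one projection row."""
    stage, sources = None, []
    for ev in sorted(events, key=lambda e: e["occurred_at"]):
        if ev["kind"] == "stage_changed":
            stage = ev["payload"]["new"]
            sources.append(ev["id"])
    if stage is None:
        return None  # nothing to project for this entity yet
    return {
        "key": "lifecycle_stage",
        "value_jsonb": {"stage": stage},
        "computed_by": "rule:latest_stage_change",  # hypothetical label
        "source_event_ids": sources,
        "confidence": 1.0,  # deterministic rule, so full confidence
    }


events = [
    {"id": 3, "occurred_at": "2024-05-03T10:00:00Z", "kind": "stage_changed",
     "payload": {"old": "contacted", "new": "qualified"}},
    {"id": 1, "occurred_at": "2024-05-01T09:00:00Z", "kind": "stage_changed",
     "payload": {"old": None, "new": "contacted"}},
    {"id": 2, "occurred_at": "2024-05-02T14:30:00Z", "kind": "email_replied",
     "payload": {"text": "Tell me more"}},
]

row = project_lifecycle_stage(events)
```

Deleting every lifecycle_stage row and re-running this function over the log yields the same rows again, and source_event_ids answers "why is this lead qualified?" without any guesswork.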

3.5 The work layer (1 table)

    task(
      id,
      subject_entity_id,
      kind,                 -- 'send_email' | 'review_reply' | 'enrich_contact' | 'human_review' | ...
      status,               -- 'pending' | 'claimed' | 'done' | 'failed' | 'cancelled'
      priority,             -- 0–100
      assignee_kind,        -- 'human' | 'agent'
      assignee_id,          -- user_id or agent name
      scheduled_for,        -- when this should run
      input_jsonb,          -- everything the executor needs
      output_jsonb,         -- result, populated on completion
      parent_task_id,       -- for sub-tasks
      triggered_by_event_id, -- what created this task
      requires_approval,
      approved_by, approved_at,
      created_at, started_at, completed_at
    )

This is the work queue. Humans and agents pull from the same table. A task can produce events (which can trigger more tasks). A task can spawn child tasks. A task can require human approval before its output gets applied.

Critically: tasks are not the same as events. Events are things that happened. Tasks are things that should happen. An agent doesn't write to event directly when it decides "we should send a follow-up" — it creates a task. When the task runs and the email actually sends, that creates an event.

This separation is what makes the system auditable and reversible.
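Pulling from the shared queue comes down to one claim operation. The sketch below shows one plausible policy, which is an assumption rather than something the design prescribes: only pending tasks whose scheduled_for has passed are eligible, highest priority wins, and ties go to the earliest scheduled task. Tasks are in-memory dicts here, and claim_next is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def claim_next(tasks, assignee_kind, assignee_id, now):
    """Claim the best due task for this worker, or return None if none is ready."""
    ready = [
        t for t in tasks
        if t["status"] == "pending"
        and t["assignee_kind"] == assignee_kind
        and t["scheduled_for"] <= now
    ]
    if not ready:
        return None
    # Highest priority first; ties go to the earliest scheduled task.
    task = min(ready, key=lambda t: (-t["priority"], t["scheduled_for"]))
    task["status"] = "claimed"
    task["assignee_id"] = assignee_id
    task["started_at"] = now
    return task


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
tasks = [
    {"id": 1, "kind": "send_email", "status": "pending", "priority": 40,
     "assignee_kind": "agent", "assignee_id": None,
     "scheduled_for": now - timedelta(hours=1)},
    {"id": 2, "kind": "review_reply", "status": "pending", "priority": 90,
     "assignee_kind": "agent", "assignee_id": None,
     "scheduled_for": now - timedelta(minutes=5)},
    {"id": 3, "kind": "enrich_contact", "status": "pending", "priority": 99,
     "assignee_kind": "agent", "assignee_id": None,
     "scheduled_for": now + timedelta(hours=2)},  # not due yet
]

first = claim_next(tasks, "agent", "reply_classifier", now)
```

With several workers against a real Postgres table, the claim would need to be atomic; SELECT ... FOR UPDATE SKIP LOCKED is the usual way to get that.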

3.6 The integration layer (2 tables)

    external_ref(
      entity_id, kind_local, -- 'entity' | 'event' | 'task'
      local_id,
      provider,             -- 'smartlead' | 'heyreach' | 'gmail'
      provider_kind,        -- 'lead' | 'campaign' | 'message'
      external_id,
      synced_at,
      raw_jsonb
    )

    webhook_inbox(
      id,
      source,
      received_at,
      payload_jsonb,
      status,               -- 'pending' | 'parsed' | 'failed'
      parsed_event_ids,     -- events created from this payload
      error
    )

webhook_inbox is the buffer between the outside world and your event log. Every webhook lands here raw, then a parser converts it into one or more event rows. This gives you idempotency (dedupe on payload hash), replay (reparse after a parser bug), and debugging (the raw payload is preserved forever).

external_ref is the bidirectional sync map. When SmartLead tells you "campaign 12345 sent email to lead 67890," external_ref tells you that lead 67890 is your entity abc-123. When your CRM wants to add someone to a SmartLead campaign, external_ref tells you their SmartLead ID.
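The idempotency property of webhook_inbox can be sketched as a hash check at the door. This is an illustration under assumptions: the hash is SHA-256 over the source plus a key-sorted JSON serialization, and the seen-hash set stands in for what would be a unique index on a hash column (a column the schema above does not list). WebhookInbox is a hypothetical class name.

```python
import hashlib
import json


class WebhookInbox:
    """In-memory stand-in for the webhook_inbox table."""

    def __init__(self):
        self.rows = []
        self._seen_hashes = set()

    def receive(self, source, payload):
        """Store the raw payload once; return None for an exact duplicate."""
        # Sorting keys makes the hash independent of JSON key order.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(f"{source}:{canonical}".encode()).hexdigest()
        if digest in self._seen_hashes:
            return None
        self._seen_hashes.add(digest)
        row = {
            "id": len(self.rows) + 1,
            "source": source,
            "payload_jsonb": payload,  # kept raw so it can be reparsed later
            "status": "pending",
            "parsed_event_ids": [],
            "error": None,
        }
        self.rows.append(row)
        return row


inbox = WebhookInbox()
stored = inbox.receive("smartlead", {"lead_id": "67890", "type": "email_replied"})
retry = inbox.receive("smartlead", {"type": "email_replied", "lead_id": "67890"})
```

Because the raw payload is never modified, replay after a parser fix is just resetting status to 'pending' and running the parser again.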

4. The agent layer

Agents aren't a table — they're processes that read from event and projection, and write to task, event (with actor_kind='agent'), and projection. But you do need a place to store their definitions and runs:

    agent(
      id, name,
      kind,                 -- 'classifier' | 'enricher' | 'router' | 'drafter' | ...
      trigger_event_kinds,  -- which event kinds wake this agent up
      config_jsonb,         -- prompt template, model, tools, thresholds
      is_active
    )

    agent_run(
      id, agent_id,
      trigger_event_id,
      subject_entity_id,
      status,               -- 'running' | 'completed' | 'failed' | 'awaiting_approval'
      input_jsonb, output_jsonb,
      proposed_actions_jsonb, -- what the agent wants to do (create tasks, update projections)
      applied_at,           -- null until proposals are applied
      tokens_used, cost_usd,
      started_at, finished_at,
      error
    )

The pattern: agents propose, the system disposes. An agent run produces proposed_actions_jsonb describing what it wants to change. A separate apply step (automatic for low-stakes, human-gated for high-stakes) actually performs the writes. This is what makes the system safe for autonomous operation — every agent action is reviewable before it takes effect, and every applied action is traceable back to the run that proposed it.
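A sketch of what the apply step might look like, assuming each proposed action is a dict with a kind and that "low-stakes" is defined by a simple allowlist. The AUTO_APPLY set, the action kinds, and the apply_run name are all hypothetical; a real policy could just as well gate on confidence or cost.

```python
from datetime import datetime, timezone

# Hypothetical allowlist of action kinds that are safe to apply without review.
AUTO_APPLY = {"update_projection"}


def apply_run(run, approved_by=None):
    """Apply a run's proposals, or park the run until a human approves."""
    actions = run["proposed_actions_jsonb"]
    high_stakes = [a for a in actions if a["kind"] not in AUTO_APPLY]
    if high_stakes and approved_by is None:
        run["status"] = "awaiting_approval"
        return []
    run["status"] = "completed"
    run["applied_at"] = datetime.now(timezone.utc)
    # Every resulting write carries the run that proposed it.
    return [{**a, "agent_run_id": run["id"], "approved_by": approved_by}
            for a in actions]


run = {
    "id": "run-1",
    "status": "running",
    "applied_at": None,
    "proposed_actions_jsonb": [
        {"kind": "update_projection", "key": "reply_intent",
         "value": {"intent": "interested"}},
        {"kind": "create_task", "task_kind": "send_email"},
    ],
}

pending = apply_run(run)                      # blocked: contains a create_task
status_before = run["status"]
writes = apply_run(run, approved_by="user-7")  # a human signs off
```

The agent never touches projection or task itself; it only fills in proposed_actions_jsonb, and the agent_run_id on each write is the provenance link back to the proposal.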

5. How it all flows: the canonical loop

This is the entire system in motion. Memorize this diagram:

    External tool (SmartLead) emits webhook
        ↓
    webhook_inbox row (raw payload preserved)
        ↓
    Parser creates event row(s)
        ↓
    Event triggers subscribed agents
        ↓
    agent_run executes, produces proposed_actions
        ↓
    Apply step (auto or human-approved):
      ├→ writes new projection rows
      ├→ creates task rows
      └→ inserts new event rows (the agent's observations)
        ↓
    Tasks get executed by humans or other agents
        ↓
    Task completion creates more event rows
        ↓
    [loop]

Every loop iteration produces events. Events are the only ground truth. Projections drift, tasks get cancelled, agents change their minds — but the event log is immutable and authoritative.

6. What you get for free with this design

  • Unified timeline: one query against event, filtered by entity.
  • Every CRM "feature" as a projection: lead score, lifecycle stage, deal stage, next action date, engagement level — all derived.
  • Multi-channel suppression: a suppression projection keyed by entity, fed by unsubscribe events from any source.
  • A/B test analysis: group events by payload_jsonb->>'variant', count outcomes. No special schema needed.
  • Agent memory: an agent reading an entity's history is SELECT * FROM event WHERE subject_entity_id = ?. Add pgvector to embed events for semantic recall later.
  • Replay and backtest: want to test a new lead-scoring agent? Run it against the historical event log. The events haven't changed; only the projection logic has.
  • Audit trail: built in. Every change is an event with an actor.
  • Cost tracking: agent_run.cost_usd aggregated by agent, day, entity. Free dashboards.
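As one illustration of how little machinery these "free" features need, here is the A/B analysis as a single pass over events. It assumes, hypothetically, that both email_sent and email_replied events carry a "variant" key in their payload; nothing in the design above requires that, so treat it as a convention an outbound agent would have to follow.

```python
from collections import Counter, defaultdict


def reply_rate_by_variant(events):
    """Replies divided by sends, per payload variant."""
    counts = defaultdict(Counter)
    for ev in events:
        variant = ev["payload"].get("variant")
        if variant is not None:
            counts[variant][ev["kind"]] += 1
    return {
        variant: c["email_replied"] / c["email_sent"]
        for variant, c in counts.items()
        if c["email_sent"]  # skip variants with no sends to avoid dividing by zero
    }


events = (
    [{"kind": "email_sent", "payload": {"variant": "A"}}] * 4
    + [{"kind": "email_replied", "payload": {"variant": "A"}}] * 1
    + [{"kind": "email_sent", "payload": {"variant": "B"}}] * 2
    + [{"kind": "email_replied", "payload": {"variant": "B"}}] * 1
    + [{"kind": "note_added", "payload": {"text": "no variant here"}}]
)

rates = reply_rate_by_variant(events)
```

The cost dashboard is the same shape of query: group agent_run rows by agent, day, or entity and sum cost_usd.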

7. What you deliberately do not build

These are the things every CRM accumulates that an agentic-first design rejects, at least initially:

  • No custom field UI. Agents structure JSONB; humans review what agents produced. If a field becomes load-bearing, promote it to a projection.
  • No pipeline editor. Pipelines are projections. A "pipeline view" is SELECT entity_id, value_jsonb FROM projection WHERE key='deal_stage'. New pipeline = new projection key.
  • No separate Leads / Contacts / Accounts / Opportunities tables. All entity rows with different kind and attrs.
  • No reports module. Projections + a SQL console + an LLM that writes queries covers 90% of reporting needs better than a drag-and-drop builder.
  • No workflow builder UI on day one. Workflows are agents. You write agents in code. When you have ten agents and the patterns are obvious, then build the visual editor — not before.
  • No role-based field-level permissions. Multi-tenant isolation via tenant_id + Postgres RLS is enough until you have an actual second tenant with actual conflicting needs.
  • No notes table, no tasks-as-activities, no email-templates table. Notes are events of kind='note_added'. Templates are JSONB blobs in agent configs. Resist the urge to model them separately.

The discipline here matters. Every table you add is a table agents need to understand and humans need to maintain. The eight core tables above, plus agent and agent_run once agents come online, can run a real outbound operation. Add another only when the absence is causing concrete pain.

8. The integration plan for SmartLead, HeyReach, and friends

Concretely, for each tool:

Inbound (tool → CRM):

  1. Tool's webhook hits an endpoint that writes to webhook_inbox.
  2. A parser worker (one per tool) reads pending inbox rows, extracts the relevant fields, and creates event rows. Dedup on (source, external_event_id) to handle retries.
  3. The parser uses external_ref to resolve "SmartLead lead 12345" → "entity abc-123", creating a new entity if no match exists.
  4. The new event triggers any agents subscribed to that event kind.
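Steps 2 and 3 of the inbound flow can be sketched as one parser function. The payload field names used here (event_id, lead_id, type, email) are hypothetical placeholders and not SmartLead's or HeyReach's actual webhook format; a real parser would map each provider's fields onto them. State is held in plain dicts and sets standing in for the tables.

```python
import uuid


def parse_inbox_row(row, state):
    """Turn one pending webhook_inbox row into at most one event."""
    payload = row["payload_jsonb"]
    dedup_key = (row["source"], payload["event_id"])
    if dedup_key in state["seen_events"]:
        row["status"] = "parsed"  # a retry of something already handled
        return None
    # Resolve the provider's lead ID to a local entity, creating one if needed.
    ref_key = (row["source"], "lead", payload["lead_id"])
    entity_id = state["external_ref"].get(ref_key)
    if entity_id is None:
        entity_id = str(uuid.uuid4())
        state["entity"][entity_id] = {
            "kind": "person",
            "display_name": payload.get("email", ""),
        }
        state["external_ref"][ref_key] = entity_id
    event = {
        "id": str(uuid.uuid4()),
        "kind": payload["type"],
        "subject_entity_id": entity_id,
        "actor_kind": "external",
        "source": row["source"],
        "payload_jsonb": payload,
    }
    state["seen_events"].add(dedup_key)
    state["event"].append(event)
    row["status"] = "parsed"
    row["parsed_event_ids"] = [event["id"]]
    return event


def inbox_row(payload):
    return {"source": "smartlead", "status": "pending",
            "parsed_event_ids": [], "payload_jsonb": payload}


state = {"seen_events": set(), "external_ref": {}, "entity": {}, "event": []}
reply = {"event_id": "e1", "lead_id": "12345", "type": "email_replied",
         "email": "ada@example.com"}

first = parse_inbox_row(inbox_row(reply), state)
retry = parse_inbox_row(inbox_row(reply), state)  # provider resent the webhook
later = parse_inbox_row(
    inbox_row({"event_id": "e2", "lead_id": "12345", "type": "email_sent"}), state)
```

The retry produces no second event, and the later webhook about the same lead lands on the entity created by the first one.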

Outbound (CRM → tool):

  1. An agent or human creates a task with kind='enroll_in_smartlead_campaign' and the entity ID and campaign ID.
  2. A task executor for SmartLead picks it up, calls the SmartLead API, gets back a lead ID.
  3. The executor writes to external_ref mapping your entity to SmartLead's lead, and creates an event of kind='enrolled_in_campaign'.
  4. From here, future webhook events from SmartLead about that lead will resolve back to your entity automatically.

The unified suppression flow:

  1. Any unsubscribe event from any tool creates an event of kind='unsubscribed'.
  2. A suppression_projector agent watches for these and writes/updates a suppression projection on the entity, keyed by channel.
  3. Before any outbound task executes, it checks the suppression projection. If the entity is suppressed on that channel, the task fails with a recorded reason and creates a suppression_blocked event.
  4. Optionally, a sync agent watches the suppression projection and pushes updates back to all connected tools so other channels honor it too.
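Steps 1 through 3 of the suppression flow fit in two small functions. This sketch assumes unsubscribed events name the channel in their payload and that outbound tasks name it in input_jsonb; both are conventions invented for the example, as are the function names and the outbound_sent event kind.

```python
def project_suppression(events):
    """Map entity_id -> set of suppressed channels, from 'unsubscribed' events."""
    suppressed = {}
    for ev in events:
        if ev["kind"] == "unsubscribed":
            channels = suppressed.setdefault(ev["subject_entity_id"], set())
            channels.add(ev["payload"]["channel"])
    return suppressed


def execute_outbound(task, suppressed, send):
    """Run an outbound task unless its entity is suppressed on that channel."""
    entity_id = task["subject_entity_id"]
    channel = task["input_jsonb"]["channel"]
    if channel in suppressed.get(entity_id, set()):
        task["status"] = "failed"
        task["output_jsonb"] = {"reason": "suppressed", "channel": channel}
        return {"kind": "suppression_blocked", "subject_entity_id": entity_id,
                "payload": {"task_id": task["id"], "channel": channel}}
    send(task)
    task["status"] = "done"
    return {"kind": "outbound_sent", "subject_entity_id": entity_id,
            "payload": {"task_id": task["id"], "channel": channel}}


events = [
    {"kind": "unsubscribed", "subject_entity_id": "abc-123",
     "source": "heyreach", "payload": {"channel": "email"}},
]
suppressed = project_suppression(events)

sent = []
email_task = {"id": "t1", "subject_entity_id": "abc-123", "status": "claimed",
              "input_jsonb": {"channel": "email"}, "output_jsonb": None}
linkedin_task = {"id": "t2", "subject_entity_id": "abc-123", "status": "claimed",
                 "input_jsonb": {"channel": "linkedin"}, "output_jsonb": None}

blocked = execute_outbound(email_task, suppressed, sent.append)
allowed = execute_outbound(linkedin_task, suppressed, sent.append)
```

An unsubscribe that arrived through HeyReach blocks the email task regardless of which tool would have sent it, while the LinkedIn task on the same entity still goes out.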

This is the entire integration story. Eight tables, one flow pattern, applied per provider.

9. The minimum viable build order

If you're starting from zero and want the leanest path to "this is actually running our outbound":

  1. Week 1: entity, identity, event, webhook_inbox. Wire up SmartLead and HeyReach webhooks. You now have a unified timeline. Already more useful than most CRMs.
  2. Week 2: external_ref, task, basic outbound to SmartLead/HeyReach. You can now enroll people in campaigns from your own UI.
  3. Week 3: projection and your first projector (lifecycle stage from events). Build one view that lists entities by stage. This is your "CRM screen."
  4. Week 4: agent and agent_run, plus one real agent — probably a reply classifier that updates a reply_intent projection on incoming reply events. This is the moment the system becomes "agentic."
  5. Week 5+: Add agents one at a time, each solving a specific pain. Lead scorer. Enricher. Next-action recommender. Draft-writer. Each one is ~200 lines of code plus a config row.

Six weeks to a system that is structurally simpler than HubSpot, integrated with your automation stack, and designed for agents instead of retrofitted for them.

10. The single sentence

An agentic-first CRM is an append-only event log over a generic entity graph, with cached projections written by agents that propose actions, a unified task queue that humans and agents both pull from, and webhook buffers at the edges for safe integration with the outside world — and everything else a traditional CRM contains is either an emergent property of those primitives or a feature you don't actually need.
