Research Platform

SwarmOS

A collective intelligence research platform that coordinates large populations of specialized AI agents to perform continuous, audited scientific research — treating research as a closed-loop control system rather than a sequence of isolated papers.

The thesis

Research quality scales with the quality of the substrate, not the quantity of agents.

If the substrate is correct — if every artifact has a hash, every claim has evidence, every result has a reproduction bundle, and every verification has a grade — then adding more agents, more compute, and more data strictly improves outcomes. If the substrate is wrong, more agents just means more garbage, faster.

Modern research is bottlenecked by fragmented evidence spanning papers, datasets, code, and instruments; combinatorial search spaces that defeat brute-force exploration; reproducibility breakdowns that erode trust; human bandwidth limits that prevent synthesis; and non-stationarity that makes static models decay. AI agents can help — but only within a system that enforces provenance, verification, and structured reasoning.

Core principles

These are architectural constraints, not aspirations. They are enforced by the system, not by convention.

Artifact-First

Everything is an artifact with a content hash, lineage, and permissions. Re-ingesting unchanged source produces identical hashes — deduplication is automatic and provenance is auditable.
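A minimal sketch of content-addressed deduplication. It assumes SHA-256 over the raw bytes (the page does not name the hash function) and uses a hypothetical in-memory `ArtifactStore`; `parents` stands in for lineage and `ownerId` for permissions:

```typescript
import { createHash } from "node:crypto";

// Hypothetical artifact record: the content hash is the identity.
interface Artifact {
  hash: string; // hex SHA-256 of the raw bytes (assumed hash function)
  parents: string[]; // lineage: hashes of the artifacts this one was derived from
  ownerId: string; // stand-in for the real permissions model
}

class ArtifactStore {
  private readonly byHash = new Map<string, Artifact>();

  // Stores content once; re-ingesting identical bytes returns the existing record.
  ingest(
    content: string | Uint8Array,
    ownerId: string,
    parents: string[] = [],
  ): { artifact: Artifact; created: boolean } {
    const hash = createHash("sha256").update(content).digest("hex");
    const existing = this.byHash.get(hash);
    if (existing) return { artifact: existing, created: false };
    const artifact: Artifact = { hash, parents: [...parents], ownerId };
    this.byHash.set(hash, artifact);
    return { artifact, created: true };
  }

  get size(): number {
    return this.byHash.size;
  }
}
```

Ingesting the same bytes twice returns the first record with `created: false`, so every downstream reference to that hash stays stable.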

Evidence Graphs Over Chat Logs

The knowledge layer is a structured claim graph where each claim has a formal statement, supporting and counter-evidence, a calibrated confidence distribution, and a verification grade.
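The claim record can be sketched as a TypeScript type. The grade values come from this page; representing the calibrated confidence distribution as a Beta(alpha, beta) is an assumption made for illustration, as are all field and function names:

```typescript
// Verification grades as listed on this page.
type VerificationGrade = "A" | "B" | "C" | "contested" | "unknown";

// A pointer from a claim to a content-addressed artifact.
interface EvidenceLink {
  artifactHash: string;
  summary: string;
}

// Assumption: confidence is stored as Beta(alpha, beta) over "the claim holds".
// The page says "calibrated confidence distribution" without naming a family.
interface BetaConfidence {
  alpha: number;
  beta: number;
}

interface Claim {
  id: string;
  statement: string; // formal statement
  supporting: readonly EvidenceLink[];
  counter: readonly EvidenceLink[];
  confidence: BetaConfidence;
  grade: VerificationGrade;
}

function confidenceMean({ alpha, beta }: BetaConfidence): number {
  return alpha / (alpha + beta);
}

function confidenceVariance({ alpha, beta }: BetaConfidence): number {
  const n = alpha + beta;
  return (alpha * beta) / (n * n * (n + 1));
}

// Claims are treated as immutable here: attaching evidence returns a new version.
function attachEvidence(claim: Claim, link: EvidenceLink, supports: boolean): Claim {
  return supports
    ? { ...claim, supporting: [...claim.supporting, link] }
    : { ...claim, counter: [...claim.counter, link] };
}
```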

Separation of Duties

The proposer of a hypothesis cannot verify it. Canonical promotion requires at least one critic pass, one replicator pass, and arbiter approval — with optional human gating by risk tier.
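A sketch of the promotion check, with hypothetical names throughout. The page does not say which risk tiers require a human gate, so the code assumes only `high` does:

```typescript
type ReviewRole = "critic" | "replicator" | "arbiter" | "human";
type RiskTier = "low" | "medium" | "high";

interface Review {
  reviewerId: string;
  role: ReviewRole;
  passed: boolean;
}

interface PromotionRequest {
  claimId: string;
  proposerId: string;
  riskTier: RiskTier;
  reviews: Review[];
}

// Assumption: only the "high" tier requires a human gate; the real mapping is policy.
const HUMAN_GATED_TIERS: ReadonlySet<RiskTier> = new Set<RiskTier>(["high"]);

function checkPromotion(req: PromotionRequest): { ok: boolean; missing: ReviewRole[] } {
  // Separation of duties: reviews by the proposer never count toward promotion.
  const independentPasses = req.reviews.filter(
    (r) => r.passed && r.reviewerId !== req.proposerId,
  );
  const required: ReviewRole[] = ["critic", "replicator", "arbiter"];
  if (HUMAN_GATED_TIERS.has(req.riskTier)) required.push("human");
  const missing = required.filter(
    (role) => !independentPasses.some((r) => r.role === role),
  );
  return { ok: missing.length === 0, missing };
}
```

Reviews by the proposer are ignored rather than rejected outright, so a self-review simply shows up as a missing role.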

Reproducibility by Default

Every computational run produces an environment spec, command, parameters, seeds, input artifacts, output artifacts, and replay instructions. Runs can be replayed from their bundle.
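A sketch of a reproduction bundle and a replay check, assuming runs are deterministic given their parameters and seed. The field names, the `Runner` signature, and the use of the mulberry32 PRNG are illustrative assumptions; a real replay would also rebuild the recorded environment and command, which this sketch only stores:

```typescript
import { createHash } from "node:crypto";

// Fields mirror the list above; names and shapes are illustrative assumptions.
interface ReproBundle {
  environment: string; // e.g. a container image digest
  command: string;
  parameters: Record<string, number>;
  seed: number;
  inputHashes: string[];
  outputHashes: string[];
}

// Small seeded PRNG (mulberry32) so that a run is a pure function of its seed.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Hypothetical runner contract: outputs depend only on parameters and the seeded RNG.
type Runner = (params: Record<string, number>, rng: () => number) => string[];

// Executes a run and returns the hashes of its outputs.
function execute(spec: Omit<ReproBundle, "outputHashes">, runner: Runner): string[] {
  return runner(spec.parameters, mulberry32(spec.seed)).map(sha256);
}

// Replays a bundle and checks every output hash against the recorded ones.
function replayMatches(bundle: ReproBundle, runner: Runner): boolean {
  const hashes = execute(bundle, runner);
  return (
    hashes.length === bundle.outputHashes.length &&
    hashes.every((h, i) => h === bundle.outputHashes[i])
  );
}
```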

Budgeted Autonomy

Agents operate within explicit compute, cost, and risk budgets. High-risk actions require human approval gates. The system tracks consumption at program, task, and agent level.
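A sketch of budget accounting at the three levels named above. Names are hypothetical, and a single cost unit is assumed for all budgets. A charge is applied only if it fits the program, task, and agent budgets at once:

```typescript
type Scope = "program" | "task" | "agent";

interface Budget {
  limit: number;
  spent: number;
}

// Assumption: all budgets share one unit (e.g. USD or GPU-hours); the page does not say.
class BudgetLedger {
  private readonly budgets = new Map<string, Budget>();

  private static key(scope: Scope, id: string): string {
    return `${scope}:${id}`;
  }

  setLimit(scope: Scope, id: string, limit: number): void {
    this.budgets.set(BudgetLedger.key(scope, id), { limit, spent: 0 });
  }

  remaining(scope: Scope, id: string): number {
    const b = this.budgets.get(BudgetLedger.key(scope, id));
    if (!b) throw new Error(`no ${scope} budget for ${id}`);
    return b.limit - b.spent;
  }

  // All-or-nothing: the charge lands only if it fits every enclosing budget.
  charge(cost: number, ids: Record<Scope, string>): boolean {
    const scopes: Scope[] = ["program", "task", "agent"];
    const affected = scopes.map((scope) => {
      const b = this.budgets.get(BudgetLedger.key(scope, ids[scope]));
      if (!b) throw new Error(`no ${scope} budget for ${ids[scope]}`);
      return b;
    });
    if (affected.some((b) => b.spent + cost > b.limit)) return false;
    for (const b of affected) b.spent += cost;
    return true;
  }
}
```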

Continuous Evaluation

Agents must continuously outperform baselines on agreed metrics. The evaluation service detects regressions and can demote underperforming agents — evolutionary pressure without manual curation.
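A sketch of regression detection. It assumes each agent produces a chronological stream of scores in [0, 1] on an agreed metric; the window size, the margin, and the demotion rule are hypothetical parameters, not the evaluation service's actual policy:

```typescript
type AgentStatus = "active" | "demoted";

interface EvalResult {
  agentId: string;
  score: number; // metric in [0, 1], higher is better (assumed)
}

// Hypothetical policy: demote an agent when the mean of its last `window`
// scores falls more than `margin` below the agreed baseline. Agents with fewer
// than `window` results stay active until there is enough evidence.
function evaluateAgents(
  results: EvalResult[], // chronological order
  baseline: number,
  window = 20,
  margin = 0.05,
): Map<string, AgentStatus> {
  const byAgent = new Map<string, number[]>();
  for (const r of results) {
    const scores = byAgent.get(r.agentId) ?? [];
    scores.push(r.score);
    byAgent.set(r.agentId, scores);
  }
  const status = new Map<string, AgentStatus>();
  byAgent.forEach((scores, agentId) => {
    if (scores.length < window) {
      status.set(agentId, "active");
      return;
    }
    const recent = scores.slice(-window);
    const mean = recent.reduce((sum, x) => sum + x, 0) / recent.length;
    status.set(agentId, mean < baseline - margin ? "demoted" : "active");
  });
  return status;
}
```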

The research loop

SwarmOS implements a continuous cycle that runs per program. Each step is decomposed into typed tasks, assigned to specialized agents, executed within budgets, and verified by independent agents.

Observe: Ingest new data, papers, and experimental results
Model: Update the knowledge graph and claim graph
Propose: Generate hypotheses and design experiments
Test: Execute simulations, computations, and analyses
Verify: Replicate results and run adversarial critiques
Update: Promote verified claims and deprecate refuted ones

The cycle runs continuously, not on a paper-publishing cadence. Publishing, monitoring, and priority adjustment happen alongside every iteration.
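A sketch of one loop iteration as data. The step order comes from the list above; the mapping from steps to agent types and the flat per-task budget are assumptions for illustration:

```typescript
// The six steps in order; after "update" the cycle returns to "observe".
const LOOP = ["observe", "model", "propose", "test", "verify", "update"] as const;
type Step = (typeof LOOP)[number];

function nextStep(step: Step): Step {
  return LOOP[(LOOP.indexOf(step) + 1) % LOOP.length];
}

// Assumed mapping from loop steps to the agent types in the agent model.
const STEP_AGENTS: Record<Step, readonly string[]> = {
  observe: ["ingestion"],
  model: ["mapper"],
  propose: ["hypothesis", "experiment-designer"],
  test: ["runner"],
  verify: ["critic", "replicator"],
  update: ["arbiter"],
};

interface LoopTask {
  programId: string;
  step: Step;
  agentType: string;
  budget: number;
}

// One iteration: every step becomes typed tasks, each carrying its own budget.
function planIteration(programId: string, budgetPerTask: number): LoopTask[] {
  return LOOP.flatMap((step) =>
    STEP_AGENTS[step].map((agentType) => ({ programId, step, agentType, budget: budgetPerTask })),
  );
}
```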

Agent model

SwarmOS defines a minimum viable set of agent types, each with a clear role. Every agent must consume typed task inputs, produce typed outputs with metadata, emit structured logs and citations, declare uncertainties and failure modes, and operate within its assigned budget and tool permissions.

Ingestion: Parse, clean, extract, and normalize incoming data
Mapper: Build concept maps and knowledge graph edges
Hypothesis: Propose claims with assumptions and test plans
Experiment Designer: Design computational and experimental tests
Runner: Execute simulations, training, and theorem proving
Critic: Adversarial review to find confounders and missing citations
Replicator: Reproduce results independently
Synthesizer: Write periodic syntheses and "what changed" diffs
Arbiter: Resolve conflicts and choose next actions by value of information (VOI)
Safety Sentinel: Monitor for policy violations and dual-use patterns
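A sketch of the agent contract as TypeScript types, with a hypothetical harness that enforces tool permissions and the budget. Type and field names are illustrative; only the list of obligations comes from the text above:

```typescript
type AgentType =
  | "ingestion" | "mapper" | "hypothesis" | "experiment-designer" | "runner"
  | "critic" | "replicator" | "synthesizer" | "arbiter" | "safety-sentinel";

interface LogEntry {
  level: "info" | "warn" | "error";
  message: string;
}

// Output envelope every agent returns: typed result plus the required metadata.
interface AgentOutput<T> {
  result: T;
  citations: string[]; // e.g. artifact hashes
  uncertainties: string[];
  failureModes: string[];
  logs: LogEntry[];
  costSpent: number;
}

interface Agent<I, O> {
  type: AgentType;
  allowedTools: ReadonlySet<string>;
  run(input: I, useTool: (tool: string) => void): AgentOutput<O>;
}

// Hypothetical harness enforcing tool permissions and the budget.
// A production runtime would meter spend during execution, not only afterwards.
function runAgent<I, O>(agent: Agent<I, O>, input: I, budget: number): AgentOutput<O> {
  const useTool = (tool: string): void => {
    if (!agent.allowedTools.has(tool)) {
      throw new Error(`${agent.type} is not permitted to use "${tool}"`);
    }
  };
  const output = agent.run(input, useTool);
  if (output.costSpent > budget) {
    throw new Error(`${agent.type} spent ${output.costSpent}, budget was ${budget}`);
  }
  return output;
}
```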

Debate Protocol (Propose-Critique-Judge)

  1. Proposer outputs a claim with assumptions, predicted signatures, and a test plan
  2. Critics output strongest counterarguments, alternative explanations, and missing evidence
  3. Arbiter decides: accept, revise, or reject — and prioritizes next experiments by expected information gain
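A sketch of the arbiter's prioritization step. It assumes each candidate experiment has a binary outcome with known likelihoods under "claim true" and "claim false". Expected information gain is the prior entropy of the claim minus the expected posterior entropy; all names are hypothetical:

```typescript
// Binary entropy in bits.
function entropy(p: number): number {
  if (p <= 0 || p >= 1) return 0;
  return -(p * Math.log2(p) + (1 - p) * Math.log2(1 - p));
}

// Assumed experiment model: the probability of a positive outcome depends on
// whether the claim is true. The likelihoods would come from the test plan.
interface Experiment {
  id: string;
  pPositiveIfTrue: number;
  pPositiveIfFalse: number;
}

// Expected reduction in the claim's entropy after observing the outcome.
function expectedInfoGain(prior: number, e: Experiment): number {
  const pPos = prior * e.pPositiveIfTrue + (1 - prior) * e.pPositiveIfFalse;
  const pNeg = 1 - pPos;
  const postIfPos = pPos > 0 ? (prior * e.pPositiveIfTrue) / pPos : prior;
  const postIfNeg = pNeg > 0 ? (prior * (1 - e.pPositiveIfTrue)) / pNeg : prior;
  return entropy(prior) - (pPos * entropy(postIfPos) + pNeg * entropy(postIfNeg));
}

// Most informative experiment first.
function prioritize(prior: number, experiments: Experiment[]): Experiment[] {
  return [...experiments].sort(
    (a, b) => expectedInfoGain(prior, b) - expectedInfoGain(prior, a),
  );
}
```

A perfectly diagnostic experiment (likelihoods 1 and 0) at prior 0.5 yields 1 bit; one whose outcome is independent of the claim yields 0 bits and sinks to the bottom of the queue.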

What is live now

SwarmOS MVP is operational. The following infrastructure is built and functional.

13 Packages: Shared, database, events, auth, 8 domain services, API gateway
30+ API Endpoints: Auth, programs, artifacts, claims, tasks, runs, uploads
10 Agent Types: Ingestion through safety sentinel, each with typed contracts
5 Verification Grades: A (replicated), B (critiqued + replicated), C (critiqued), contested, unknown
Redis Streams Event Bus: CloudEvents 1.0 envelopes, consumer groups, at-least-once delivery
Content-addressed Storage: S3/MinIO with multipart upload, presigned URLs, automatic deduplication

Architecture

SwarmOS is a TypeScript monorepo using pnpm workspaces and Turborepo. It follows a layered, service-oriented architecture where each service owns its domain logic and data, connected by a typed event bus. Every service defines a repository interface — concrete implementations are injected at the container level, making services testable in isolation and storage-agnostic.
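A minimal sketch of the repository pattern described above, using a hypothetical `ProgramRepository`. The in-memory class mirrors the roadmap's in-memory repository layer; a Drizzle-backed class could implement the same interface without changes to the service:

```typescript
interface Program {
  id: string;
  name: string;
}

// Storage-agnostic contract the service depends on (hypothetical names).
interface ProgramRepository {
  findById(id: string): Promise<Program | undefined>;
  save(program: Program): Promise<void>;
}

// In-memory implementation for tests; copies on read and write so callers
// cannot mutate stored state by accident.
class InMemoryProgramRepository implements ProgramRepository {
  private readonly rows = new Map<string, Program>();

  async findById(id: string): Promise<Program | undefined> {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  async save(program: Program): Promise<void> {
    this.rows.set(program.id, { ...program });
  }
}

// The service receives its repository from outside (constructor injection).
class ProgramService {
  private readonly repo: ProgramRepository;

  constructor(repo: ProgramRepository) {
    this.repo = repo;
  }

  async rename(id: string, name: string): Promise<Program> {
    const program = await this.repo.findById(id);
    if (!program) throw new Error(`program ${id} not found`);
    const updated = { ...program, name };
    await this.repo.save(updated);
    return updated;
  }
}
```

At the container level, `new ProgramService(new InMemoryProgramRepository())` wires the test configuration; production wiring would pass a database-backed repository instead.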

Program Service

Research program lifecycle, membership management, objectives, milestones, budgets, and risk posture.

Artifact Service

Content-addressed deduplication, object storage (S3/MinIO), multipart uploads with presigned URLs, lineage tracking.

Knowledge Service

Structured claim graph with evidence, counterevidence, verification grades (A/B/C/contested/unknown), and canonical promotion rules.

Task Service

DAG-based task lifecycle with validated state machine: queued → running → done/blocked/failed, retry logic, and approval gates.
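A sketch of the validated state machine for a single task. The states come from the text; the exact edges (for example, `blocked` returning to `queued` once an approval is granted) and the retry rule are assumptions, and DAG dependency scheduling is left out:

```typescript
type TaskState = "queued" | "running" | "done" | "blocked" | "failed";

// Assumed transition table: the page names the states but not every edge.
// "blocked" covers tasks waiting on an approval gate; "failed" may be retried.
const TRANSITIONS: Record<TaskState, readonly TaskState[]> = {
  queued: ["running"],
  running: ["done", "blocked", "failed"],
  blocked: ["queued"],
  failed: ["queued"],
  done: [],
};

interface TaskRecord {
  id: string;
  state: TaskState;
  attempts: number; // how many times the task has entered "running"
  maxRetries: number;
}

// Returns the next version of the task, or throws on an illegal move.
function transition(task: TaskRecord, to: TaskState): TaskRecord {
  if (!TRANSITIONS[task.state].includes(to)) {
    throw new Error(`illegal transition ${task.state} -> ${to} for ${task.id}`);
  }
  if (task.state === "failed" && to === "queued" && task.attempts > task.maxRetries) {
    throw new Error(`${task.id} exhausted its ${task.maxRetries} retries`);
  }
  const attempts = to === "running" ? task.attempts + 1 : task.attempts;
  return { ...task, state: to, attempts };
}
```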

Compute Service

Run execution tracking with environment specs, parameters, seeds, output artifacts, cost metrics, and deterministic replay.

Agent Runtime

Agent registry, lease-based task assignment with heartbeating, reputation scoring, and stale lease expiration.
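A sketch of lease-based assignment with heartbeats and stale-lease expiry. Class and method names are hypothetical; the clock is passed in explicitly so the behaviour is deterministic:

```typescript
interface Lease {
  taskId: string;
  agentId: string;
  expiresAt: number; // epoch ms
}

// Hypothetical lease manager keyed by task id.
class LeaseManager {
  private readonly leases = new Map<string, Lease>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Grants the task to an agent unless a live lease already exists.
  acquire(taskId: string, agentId: string, now: number): Lease | undefined {
    this.expireStale(now);
    if (this.leases.has(taskId)) return undefined;
    const lease: Lease = { taskId, agentId, expiresAt: now + this.ttlMs };
    this.leases.set(taskId, lease);
    return lease;
  }

  // Extends a lease; fails if the caller no longer holds it.
  heartbeat(taskId: string, agentId: string, now: number): boolean {
    const lease = this.leases.get(taskId);
    if (!lease || lease.agentId !== agentId || lease.expiresAt <= now) return false;
    lease.expiresAt = now + this.ttlMs;
    return true;
  }

  // Drops leases whose holder stopped heartbeating; returns the freed task ids.
  expireStale(now: number): string[] {
    const freed: string[] = [];
    this.leases.forEach((lease, taskId) => {
      if (lease.expiresAt <= now) {
        this.leases.delete(taskId);
        freed.push(taskId);
      }
    });
    return freed;
  }
}
```

An agent that stops heartbeating loses the task at the next `acquire` or `expireStale` call, and any later heartbeat from it is refused.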

Observability

Immutable audit logging with actor identity, timestamps, and resource references. Queryable by program, resource, action, and time range.
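A sketch of an append-only audit log with the query dimensions listed above. Chaining each entry to the previous entry's SHA-256 hash is an assumption, shown as one way to make tampering detectable; the entry fields are illustrative:

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  readonly seq: number;
  readonly actorId: string;
  readonly action: string;
  readonly programId: string;
  readonly resource: string;
  readonly at: number; // epoch ms
  readonly prevHash: string;
  readonly hash: string;
}

type AuditInput = Pick<AuditEntry, "actorId" | "action" | "programId" | "resource" | "at">;

interface AuditQuery {
  programId?: string;
  resource?: string;
  action?: string;
  from?: number; // inclusive
  to?: number; // exclusive
}

// Assumed hash-chain scheme: each entry commits to its content and its predecessor.
function entryHash(seq: number, e: AuditInput, prevHash: string): string {
  const canonical = JSON.stringify([seq, e.actorId, e.action, e.programId, e.resource, e.at, prevHash]);
  return createHash("sha256").update(canonical).digest("hex");
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Appends only; there is deliberately no update or delete method.
  append(input: AuditInput): AuditEntry {
    const seq = this.entries.length;
    const prevHash = seq === 0 ? "genesis" : this.entries[seq - 1].hash;
    const entry: AuditEntry = Object.freeze({ ...input, seq, prevHash, hash: entryHash(seq, input, prevHash) });
    this.entries.push(entry);
    return entry;
  }

  query(q: AuditQuery): AuditEntry[] {
    return this.entries.filter(
      (e) =>
        (q.programId === undefined || e.programId === q.programId) &&
        (q.resource === undefined || e.resource === q.resource) &&
        (q.action === undefined || e.action === q.action) &&
        (q.from === undefined || e.at >= q.from) &&
        (q.to === undefined || e.at < q.to),
    );
  }

  // Recomputes the chain; false means an entry was altered or reordered.
  verify(): boolean {
    return this.entries.every((e, i) => {
      const prev = i === 0 ? "genesis" : this.entries[i - 1].hash;
      return e.seq === i && e.prevHash === prev && e.hash === entryHash(i, e, prev);
    });
  }
}
```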

Eval Service

Agent benchmarking and performance scoring: success rate, cost efficiency, duration metrics, and regression detection.

Infrastructure

PostgreSQL 16 + pgvector

Primary database with vector search capability. Drizzle ORM schema with 13 tables, including users, programs, artifacts, claims, tasks, runs, agents, leases, audit log, uploads, and idempotency keys.

Redis 7 (Streams)

Event bus using Redis Streams with consumer groups for at-least-once delivery. CloudEvents 1.0 envelopes provide decoupled coordination and a natural audit trail.
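A sketch of a CloudEvents 1.0 envelope (required attributes `specversion`, `id`, `source`, `type`) and an idempotent consumer. With consumer groups, an entry read with XREADGROUP but never acknowledged with XACK stays pending and can be delivered again, so handlers should deduplicate on `source` plus `id`, which the CloudEvents spec requires producers to keep unique per event. The event names and the in-memory `seen` set are assumptions; a durable store such as the idempotency-keys table would replace the set in practice:

```typescript
import { randomUUID } from "node:crypto";

// CloudEvents 1.0 context attributes: the first four are required by the spec.
interface CloudEvent<T> {
  specversion: "1.0";
  id: string;
  source: string;
  type: string;
  time?: string; // RFC 3339 timestamp
  datacontenttype?: string;
  subject?: string;
  data?: T;
}

function makeEvent<T>(source: string, type: string, data: T, subject?: string): CloudEvent<T> {
  return {
    specversion: "1.0",
    id: randomUUID(),
    source,
    type,
    time: new Date().toISOString(),
    datacontenttype: "application/json",
    subject,
    data,
  };
}

// At-least-once delivery means duplicates are possible, so the consumer
// deduplicates on (source, id) before running its handler.
class IdempotentConsumer<T> {
  private readonly seen = new Set<string>(); // assumption: in-memory only
  private readonly handle: (event: CloudEvent<T>) => void;

  constructor(handle: (event: CloudEvent<T>) => void) {
    this.handle = handle;
  }

  // Returns true if the handler ran, false if the event was a duplicate.
  deliver(event: CloudEvent<T>): boolean {
    const key = JSON.stringify([event.source, event.id]);
    if (this.seen.has(key)) return false;
    this.handle(event); // if this throws, the key is not recorded and a redelivery retries it
    this.seen.add(key);
    return true;
  }
}
```

A producer would serialize the envelope with `JSON.stringify` and write it as a field of a stream entry with XADD; the consumer would parse it, call `deliver`, and XACK the entry afterwards.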

MinIO / S3

Content-addressed object storage with multipart upload support. Presigned URLs for direct client uploads. Automatic deduplication via content hashing.
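A sketch of multipart part sizing using the documented S3 limits: at most 10,000 parts, and parts between 5 MiB and 5 GiB, except that the last part may be smaller. The 8 MiB default and the function names are assumptions; each part would then be uploaded to its own presigned URL:

```typescript
const MiB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MiB; // S3 minimum for every part except the last
const MAX_PART_SIZE = 5 * 1024 * MiB; // 5 GiB
const MAX_PARTS = 10_000;

interface PartPlan {
  partSize: number;
  partCount: number;
}

// Picks a part size that respects S3 limits and keeps the part count <= 10,000.
function planMultipart(totalBytes: number, preferredPartSize = 8 * MiB): PartPlan {
  if (!Number.isSafeInteger(totalBytes) || totalBytes <= 0) {
    throw new Error("totalBytes must be a positive integer");
  }
  const partSize = Math.max(preferredPartSize, MIN_PART_SIZE, Math.ceil(totalBytes / MAX_PARTS));
  if (partSize > MAX_PART_SIZE) throw new Error("object exceeds multipart limits");
  return { partSize, partCount: Math.ceil(totalBytes / partSize) };
}

// Byte range [start, end) for a 1-based part number, matching S3 part numbering.
function partRange(plan: PartPlan, totalBytes: number, partNumber: number): { start: number; end: number } {
  if (partNumber < 1 || partNumber > plan.partCount) throw new Error("part number out of range");
  const start = (partNumber - 1) * plan.partSize;
  return { start, end: Math.min(start + plan.partSize, totalBytes) };
}
```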

Fastify API Gateway

30+ endpoints with CORS, rate limiting, JWT authentication, RBAC guards on admin endpoints, and program membership guards on resources.

JWT / RBAC / ABAC

HS256 tokens with role hierarchy (admin through operator). Attribute-based access control for fine-grained permissions. Data classification tiers with enforcement.
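A sketch of HS256 signing and verification with Node's built-in `crypto` module: the signature is HMAC-SHA256 over `base64url(header) + "." + base64url(payload)`. Function names are hypothetical, and production code would normally rely on a maintained JWT library and also check claims such as issuer and audience:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (buf: Buffer): string => buf.toString("base64url");
const fromB64url = (s: string): Buffer => Buffer.from(s, "base64url");

// Signs a JWT with HS256: HMAC-SHA256 over "<header>.<payload>".
function signHs256(payload: Record<string, unknown>, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const signature = b64url(createHmac("sha256", secret).update(`${header}.${body}`).digest());
  return `${header}.${body}.${signature}`;
}

// Returns the payload if the signature and expiry check out, otherwise null.
function verifyHs256(
  token: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000),
): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = fromB64url(signature);
  // Constant-time comparison; lengths must match first or timingSafeEqual throws.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  // Pin the algorithm so a token cannot select a different one.
  if (JSON.parse(fromB64url(header).toString("utf8")).alg !== "HS256") return null;
  const claims = JSON.parse(fromB64url(body).toString("utf8"));
  if (typeof claims.exp === "number" && claims.exp <= nowSec) return null;
  return claims;
}
```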

Sandbox Execution

No outbound network by default, resource quotas, filesystem isolation. Prompt injection mitigation — extracted content treated as untrusted input, separated from control instructions.

Target domains

SwarmOS provides domain-specific workflow templates for the hardest class of scientific and societal challenges.

Biology (causal graph + intervention loop): Build a multi-omics knowledge graph, propose causal subgraphs, prioritize interventions, simulate, and update the causal model.
Civilization Dynamics (policy stress test): Formalize objectives, build incentive models, run adversarial scenarios, and compute robustness.
Materials Discovery (candidate funnel): Generate candidates, run HPC simulations, rank with uncertainty, replicate, and produce a lab handoff.
Deep History (cross-modal consilience): Ingest archaeology, aDNA, texts, and climate proxies; propose competing narratives; test them with cultural evolution models.
Paradigm Science (conjecture-proof-refute engine): Mine patterns, generate conjectures, attempt proofs or counterexamples, and track theory compression.

Safety & trust

Defense in depth, not defense by hope.

  • Data classification tiers (public/internal/restricted/secret) with ABAC enforcement
  • Sandbox execution with no outbound network by default, resource quotas, and filesystem isolation
  • Prompt injection mitigation — extracted content treated as untrusted input, separated from control instructions
  • Dual-use monitoring — safety sentinel agents and policy gates for sensitive domains
  • Kill switches — rate limits, anomaly detection, and the ability to quarantine programs or agents
  • Immutable audit ledger — every action logged with actor identity, timestamp, and resource references
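A sketch of how the classification tiers above could combine with other attributes in an ABAC check. The attribute names, the quarantine flag standing in for a kill switch, and the rule that `public` data is readable across programs are all assumptions:

```typescript
// Tiers from lowest to highest sensitivity, as listed above.
const TIERS = ["public", "internal", "restricted", "secret"] as const;
type Tier = (typeof TIERS)[number];

interface Subject {
  id: string;
  clearance: Tier;
  programs: ReadonlySet<string>; // program memberships
  quarantined: boolean; // hypothetical flag set by a kill switch
}

interface Resource {
  programId: string;
  classification: Tier;
}

// Hypothetical ABAC rule combining three attributes: quarantine status,
// program membership, and clearance versus classification.
function canRead(subject: Subject, resource: Resource): boolean {
  if (subject.quarantined) return false;
  // Assumption: public data is readable across programs; everything else needs membership.
  if (resource.classification !== "public" && !subject.programs.has(resource.programId)) {
    return false;
  }
  return TIERS.indexOf(subject.clearance) >= TIERS.indexOf(resource.classification);
}
```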

Roadmap

2026 Q1
Platform MVP

Full TypeScript monorepo with 13 packages. API gateway with 30+ endpoints. JWT/RBAC/ABAC authentication. In-memory repository layer for all services. Artifact upload pipeline with S3 multipart support. Task state machine with validated transitions. Claim graph with evidence/counterevidence/verification. Agent registry with lease management. Audit logging.

2026 Q2
Database & Event Integration

Connect PostgreSQL via Drizzle repositories. Run database migrations. Redis Streams event bus integration in services. Semantic search via pgvector embeddings. Agent orchestration engine (task routing, scheduling).

2026 Q3
Agent Orchestration & UI

WebSocket real-time updates. UI dashboard for program directors and scientists. Comprehensive test suite. OpenAPI/Swagger documentation. CI/CD pipeline.

2026 Q4
Domain Templates & Scale

Biology, civilization dynamics, materials discovery, and deep history workflow templates. Multi-program orchestration. Community deployment documentation.

If you only get one thing "perfect," make it this chain:

Artifact → Claim → Evidence → Reproduction Bundle → Verification Grade

Everything else — agents, orchestration, prioritization — becomes dramatically easier once the substrate is correct.

Get involved

SwarmOS is being built in public. The platform is functional and actively developing toward full database integration and agent orchestration.

More agents is not the answer.

A correct substrate is.
