SwarmOS
A collective intelligence research platform that coordinates large populations of specialized AI agents to perform continuous, audited scientific research — treating research as a closed-loop control system rather than a sequence of isolated papers.
The thesis
Research quality scales with the quality of the substrate, not the quantity of agents.
If the substrate is correct — if every artifact has a hash, every claim has evidence, every result has a reproduction bundle, and every verification has a grade — then adding more agents, more compute, and more data strictly improves outcomes. If the substrate is wrong, more agents just means more garbage, faster.
Modern research is bottlenecked by fragmented evidence spanning papers, datasets, code, and instruments; combinatorial search spaces that defeat brute-force exploration; reproducibility breakdowns that erode trust; human bandwidth limits that prevent synthesis; and non-stationarity that makes static models decay. AI agents can help — but only within a system that enforces provenance, verification, and structured reasoning.
Core principles
These are architectural constraints, not aspirations. They are enforced by the system, not by convention.
Everything is an artifact with a content hash, lineage, and permissions. Re-ingesting unchanged source produces identical hashes — deduplication is automatic and provenance is auditable.
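The deduplication property follows directly from content addressing. The sketch below assumes SHA-256 as the hash and an in-memory map as the store; `Artifact` and `ContentAddressedStore` are hypothetical names, not SwarmOS's actual API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical artifact record: the content hash doubles as the artifact ID.
interface Artifact {
  hash: string;          // hex SHA-256 of the raw bytes
  bytes: Buffer;
  parents: string[];     // lineage: hashes of the artifacts this one was derived from
  permissions: string[]; // principals allowed to read (illustrative only)
}

class ContentAddressedStore {
  private readonly objects = new Map<string, Artifact>();

  // Identical bytes always hash to the same ID, so re-ingesting unchanged
  // content returns the existing record instead of storing a second copy.
  // (This sketch keeps the first record's lineage and permissions.)
  ingest(bytes: Buffer, parents: string[] = [], permissions: string[] = []): Artifact {
    const hash = createHash("sha256").update(bytes).digest("hex");
    const existing = this.objects.get(hash);
    if (existing) return existing;
    const artifact: Artifact = { hash, bytes, parents, permissions };
    this.objects.set(hash, artifact);
    return artifact;
  }

  get size(): number {
    return this.objects.size;
  }
}
```

Ingesting the same bytes twice yields one stored object and one hash. Lineage is carried as parent hashes, so a derived artifact points at exactly the inputs it came from.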
The knowledge layer is a structured claim graph where each claim has a formal statement, supporting and counter-evidence, a calibrated confidence distribution, and a verification grade.
The proposer of a hypothesis cannot verify it. Canonical promotion requires at least one critic pass, one replicator pass, and arbiter approval — with optional human gating by risk tier.
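The claim shape and the separation-of-duties rule can be sketched as a type plus a single predicate. This assumes the confidence distribution is summarized as a mean with an interval, and that reviews record the reviewing agent's ID. Human gating by risk tier is not modeled, and all names are hypothetical.

```typescript
type Grade = "A" | "B" | "C" | "contested" | "unknown";
type ReviewRole = "critic" | "replicator" | "arbiter";

interface Review {
  agentId: string;
  role: ReviewRole;
  passed: boolean;
}

interface Claim {
  id: string;
  statement: string;         // formal statement of the claim
  proposerId: string;
  evidence: string[];        // artifact hashes supporting the claim
  counterEvidence: string[]; // artifact hashes against it
  confidence: { mean: number; lower: number; upper: number }; // assumed summary of the distribution
  grade: Grade;
  reviews: Review[];
}

// Promotion needs a passing critic, replicator, and arbiter review, none of
// which may come from the proposer ("the proposer cannot verify").
function canPromote(claim: Claim): boolean {
  const independent = claim.reviews.filter(
    (r) => r.passed && r.agentId !== claim.proposerId,
  );
  const has = (role: ReviewRole) => independent.some((r) => r.role === role);
  return has("critic") && has("replicator") && has("arbiter");
}
```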
Every computational run produces an environment spec, command, parameters, seeds, input artifacts, output artifacts, and replay instructions. Runs can be replayed from their bundle.
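For a bundle to be a stable reference, identical bundles should get identical IDs. A minimal sketch assumes the bundle ID is the SHA-256 of a canonical JSON serialization with object keys sorted. The field names mirror the list above, but the exact shape is hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical bundle shape mirroring the fields listed above.
interface ReproductionBundle {
  environment: { image: string; lockfileHash: string };
  command: string;
  parameters: Record<string, string | number | boolean>;
  seeds: number[];
  inputArtifacts: string[];  // content hashes
  outputArtifacts: string[]; // content hashes
  replay: string;            // replay instructions
}

// Serialize with object keys sorted at every level so equal content always
// produces identical bytes. Array order is preserved because it is meaningful.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function bundleId(bundle: ReproductionBundle): string {
  return createHash("sha256").update(canonicalJson(bundle)).digest("hex");
}
```

Key order in `parameters` does not affect the ID, but changing a seed or an input hash does. This keeps "same run" and "same bundle" equivalent.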
Agents operate within explicit compute, cost, and risk budgets. High-risk actions require human approval gates. The system tracks consumption at program, task, and agent level.
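Tracking consumption at three levels means every charge must be checked against the program, the task, and the agent at once. The sketch below assumes a single scalar cost, treats unset limits as unlimited, and makes charges all-or-nothing. `BudgetLedger` is a hypothetical name.

```typescript
type Scope = "program" | "task" | "agent";

// Hypothetical ledger: each (scope, id) pair has a limit and a running total.
class BudgetLedger {
  private readonly limits = new Map<string, number>();
  private readonly spent = new Map<string, number>();

  private key(scope: Scope, id: string): string {
    return `${scope}:${id}`;
  }

  setLimit(scope: Scope, id: string, limit: number): void {
    this.limits.set(this.key(scope, id), limit);
  }

  // Unset limits are treated as unlimited (an assumption of this sketch).
  remaining(scope: Scope, id: string): number {
    const k = this.key(scope, id);
    return (this.limits.get(k) ?? Infinity) - (this.spent.get(k) ?? 0);
  }

  // Charge program, task, and agent together. If any level would exceed
  // its limit, nothing is recorded and the charge is refused.
  charge(programId: string, taskId: string, agentId: string, cost: number): boolean {
    const targets: [Scope, string][] = [
      ["program", programId],
      ["task", taskId],
      ["agent", agentId],
    ];
    if (targets.some(([s, id]) => this.remaining(s, id) < cost)) return false;
    for (const [s, id] of targets) {
      const k = this.key(s, id);
      this.spent.set(k, (this.spent.get(k) ?? 0) + cost);
    }
    return true;
  }
}
```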
Agents must continuously outperform baselines on agreed metrics. The evaluation service detects regressions and can demote underperforming agents — evolutionary pressure without manual curation.
The research loop
SwarmOS implements a continuous cycle that runs per program. Each step is decomposed into typed tasks, assigned to specialized agents, executed within budgets, and verified by independent agents.
The cycle runs continuously, not on a paper-publishing cadence. Publishing, monitoring, and priority adjustment happen alongside every iteration.
Agent model
SwarmOS defines a minimum viable set of agent types, each with a clear role. Every agent must consume typed task inputs, produce typed outputs with metadata, emit structured logs and citations, declare uncertainties and failure modes, and operate within its assigned budget and tool permissions.
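The agent contract maps naturally onto generic TypeScript types. The sketch below is one hypothetical way to express it, with a synchronous `run` for brevity (real agents would almost certainly be asynchronous), plus a budget check and a toy agent that exists only to exercise the types.

```typescript
interface Budget { maxCostUsd: number; maxSeconds: number }

interface TaskInput<T> {
  taskId: string;
  type: string;           // typed task kind (illustrative)
  payload: T;
  budget: Budget;
  allowedTools: string[]; // tool permissions granted for this task
}

interface TaskOutput<U> {
  taskId: string;
  result: U;
  metadata: { agentId: string; costUsd: number; durationMs: number };
  citations: string[];     // artifact hashes or claim IDs relied on
  uncertainties: string[];
  failureModes: string[];
  logs: { level: "info" | "warn" | "error"; message: string }[];
}

interface Agent<T, U> {
  id: string;
  accepts: string[]; // task types this agent can consume
  run(input: TaskInput<T>): TaskOutput<U>;
}

// True if the output belongs to the input's task and stayed within its budget.
function withinBudget<T, U>(input: TaskInput<T>, output: TaskOutput<U>): boolean {
  return (
    output.taskId === input.taskId &&
    output.metadata.costUsd <= input.budget.maxCostUsd &&
    output.metadata.durationMs <= input.budget.maxSeconds * 1000
  );
}

// Toy agent: counts whitespace-separated words in a text payload.
const wordCounter: Agent<string, number> = {
  id: "word-counter",
  accepts: ["count-words"],
  run(input) {
    return {
      taskId: input.taskId,
      result: input.payload.split(/\s+/).filter(Boolean).length,
      metadata: { agentId: "word-counter", costUsd: 0, durationMs: 1 },
      citations: [],
      uncertainties: ["tokenization is whitespace-based"],
      failureModes: ["empty input returns 0"],
      logs: [{ level: "info", message: "counted words" }],
    };
  },
};
```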
Debate Protocol (Propose-Critique-Judge)
- Proposer outputs a claim with assumptions, predicted signatures, and a test plan
- Critics output strongest counterarguments, alternative explanations, and missing evidence
- Arbiter decides: accept, revise, or reject — and prioritizes next experiments by expected information gain
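Prioritizing by expected information gain can be made concrete for the simplest case. The sketch assumes a binary claim with prior probability `prior` of being true and experiments with a yes/no outcome described by two likelihoods. Gain is the prior entropy minus the expected posterior entropy, in bits. The names are hypothetical, and SwarmOS's arbiter may score experiments differently.

```typescript
// Binary entropy in bits.
function entropy(p: number): number {
  if (p <= 0 || p >= 1) return 0;
  return -(p * Math.log2(p) + (1 - p) * Math.log2(1 - p));
}

interface Experiment {
  name: string;
  pPosIfTrue: number;  // P(positive result | claim true)
  pPosIfFalse: number; // P(positive result | claim false)
}

// Expected reduction in uncertainty about the claim from running e (Bayes update per outcome).
function expectedInformationGain(prior: number, e: Experiment): number {
  const pPos = prior * e.pPosIfTrue + (1 - prior) * e.pPosIfFalse;
  const pNeg = 1 - pPos;
  const postPos = pPos > 0 ? (prior * e.pPosIfTrue) / pPos : prior;
  const postNeg = pNeg > 0 ? (prior * (1 - e.pPosIfTrue)) / pNeg : prior;
  return entropy(prior) - (pPos * entropy(postPos) + pNeg * entropy(postNeg));
}

// Order candidate experiments from most to least informative.
function prioritize(prior: number, experiments: Experiment[]): Experiment[] {
  return [...experiments].sort(
    (a, b) => expectedInformationGain(prior, b) - expectedInformationGain(prior, a),
  );
}
```

At a prior of 0.5, a perfectly discriminating experiment yields 1 bit, and one whose outcome is independent of the claim yields 0.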
What is live now
The SwarmOS MVP is operational. The following infrastructure is built and functional.
Architecture
SwarmOS is a TypeScript monorepo using pnpm workspaces and Turborepo. It follows a layered, service-oriented architecture where each service owns its domain logic and data, connected by a typed event bus. Every service defines a repository interface — concrete implementations are injected at the container level, making services testable in isolation and storage-agnostic.
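The repository-interface pattern can be sketched for a single service. `ProgramRepository`, `ProgramService`, and `createContainer` are hypothetical names, and the interface is synchronous for brevity, whereas a database-backed version would likely return promises. The service depends only on the interface, and the container is the one place that chooses a concrete store.

```typescript
interface Program {
  id: string;
  name: string;
}

interface ProgramRepository {
  get(id: string): Program | undefined;
  save(program: Program): void;
}

// In-memory implementation for tests; a Drizzle/PostgreSQL class implementing
// the same interface could replace it without changing the service.
class InMemoryProgramRepository implements ProgramRepository {
  private readonly rows = new Map<string, Program>();

  get(id: string): Program | undefined {
    return this.rows.get(id);
  }

  save(program: Program): void {
    this.rows.set(program.id, { ...program });
  }
}

class ProgramService {
  private readonly repo: ProgramRepository;

  constructor(repo: ProgramRepository) {
    this.repo = repo;
  }

  rename(id: string, name: string): Program {
    const program = this.repo.get(id);
    if (!program) throw new Error(`program ${id} not found`);
    const updated = { ...program, name };
    this.repo.save(updated);
    return updated;
  }
}

// The container wires concrete implementations to services.
function createContainer(repo: ProgramRepository = new InMemoryProgramRepository()) {
  return { repo, programs: new ProgramService(repo) };
}
```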
Programs: research program lifecycle, membership management, objectives, milestones, budgets, and risk posture.
Artifacts: content-addressed deduplication, object storage (S3/MinIO), multipart uploads with presigned URLs, and lineage tracking.
Claims: structured claim graph with evidence, counterevidence, verification grades (A/B/C/contested/unknown), and canonical promotion rules.
Tasks: DAG-based lifecycle governed by a validated state machine (queued → running → done/blocked/failed), with retry logic and approval gates.
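A validated state machine can be reduced to an explicit transition table. The sketch takes queued → running → done/blocked/failed from the lifecycle above. It *assumes* that blocked → queued (for example, once an approval gate is granted) and failed → queued (retry, capped by `maxAttempts`) are the recovery paths. Names are hypothetical.

```typescript
type TaskState = "queued" | "running" | "done" | "blocked" | "failed";

// blocked -> queued and failed -> queued are assumptions of this sketch.
const TRANSITIONS: Record<TaskState, readonly TaskState[]> = {
  queued: ["running"],
  running: ["done", "blocked", "failed"],
  blocked: ["queued"],
  failed: ["queued"],
  done: [],
};

interface Task {
  id: string;
  state: TaskState;
  attempts: number;    // incremented each time the task starts running
  maxAttempts: number;
  dependsOn: string[]; // DAG edges: tasks that must be done first
}

function transition(task: Task, to: TaskState): Task {
  if (!TRANSITIONS[task.state].includes(to)) {
    throw new Error(`invalid transition ${task.state} -> ${to}`);
  }
  if (task.state === "failed" && to === "queued" && task.attempts >= task.maxAttempts) {
    throw new Error(`task ${task.id} exhausted ${task.maxAttempts} attempts`);
  }
  const attempts = to === "running" ? task.attempts + 1 : task.attempts;
  return { ...task, state: to, attempts };
}

// A queued task is ready to run once every upstream task is done.
function isReady(task: Task, all: Map<string, Task>): boolean {
  return task.state === "queued" && task.dependsOn.every((id) => all.get(id)?.state === "done");
}
```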
Runs: execution tracking with environment specs, parameters, seeds, output artifacts, cost metrics, and deterministic replay.
Agents: registry, lease-based task assignment with heartbeating, reputation scoring, and stale-lease expiration.
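Lease-based assignment with heartbeats can be sketched with a time-to-live per lease. The sketch assumes a fixed TTL and passes the current time explicitly, which keeps it deterministic and testable. `LeaseManager` and its methods are hypothetical names.

```typescript
interface Lease {
  taskId: string;
  agentId: string;
  expiresAt: number; // epoch ms
}

class LeaseManager {
  private readonly leases = new Map<string, Lease>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Grant a lease only if the task is unleased or its current lease has expired.
  acquire(taskId: string, agentId: string, now: number): Lease | undefined {
    const current = this.leases.get(taskId);
    if (current && current.expiresAt > now) return undefined;
    const lease: Lease = { taskId, agentId, expiresAt: now + this.ttlMs };
    this.leases.set(taskId, lease);
    return lease;
  }

  // A heartbeat from the current holder extends the lease; others are rejected.
  heartbeat(taskId: string, agentId: string, now: number): boolean {
    const lease = this.leases.get(taskId);
    if (!lease || lease.agentId !== agentId || lease.expiresAt <= now) return false;
    lease.expiresAt = now + this.ttlMs;
    return true;
  }

  // Remove stale leases so their tasks can be reassigned; returns freed task IDs.
  expireStale(now: number): string[] {
    const freed: string[] = [];
    for (const [taskId, lease] of this.leases) {
      if (lease.expiresAt <= now) {
        this.leases.delete(taskId);
        freed.push(taskId);
      }
    }
    return freed;
  }
}
```

An agent that stops heartbeating loses its task after one TTL, so a crashed worker cannot hold work indefinitely.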
Audit: immutable logging with actor identity, timestamps, and resource references, queryable by program, resource, action, and time range.
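The audit query surface can be sketched as an append-only list with optional filters. This enforces immutability only inside the process, by freezing entries and exposing no update or delete. Durable immutability would need storage-level guarantees. `AuditLog` and its field names are hypothetical.

```typescript
interface AuditEntry {
  readonly actor: string;
  readonly action: string;
  readonly programId: string;
  readonly resource: string; // e.g. "claim:1" (illustrative)
  readonly at: number;       // epoch ms
}

interface AuditQuery {
  programId?: string;
  resource?: string;
  action?: string;
  from?: number; // inclusive
  to?: number;   // exclusive
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Append-only: entries are copied and frozen on write.
  append(entry: AuditEntry): void {
    this.entries.push(Object.freeze({ ...entry }));
  }

  // Every provided filter must match; omitted filters match everything.
  query(q: AuditQuery): AuditEntry[] {
    return this.entries.filter(
      (e) =>
        (q.programId === undefined || e.programId === q.programId) &&
        (q.resource === undefined || e.resource === q.resource) &&
        (q.action === undefined || e.action === q.action) &&
        (q.from === undefined || e.at >= q.from) &&
        (q.to === undefined || e.at < q.to),
    );
  }
}
```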
Evaluation: agent benchmarking and performance scoring (success rate, cost efficiency, duration) with regression detection.
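Regression detection needs a scorecard and a comparison rule. The sketch assumes cost efficiency is measured as cost per successful run, and that a regression is a relative drop in success rate, or a relative rise in cost per success, beyond a tolerance (10% by default). Both the metric definitions and the threshold are illustrative choices, not SwarmOS's documented ones.

```typescript
interface RunRecord {
  succeeded: boolean;
  costUsd: number;
  durationMs: number;
}

interface Scorecard {
  successRate: number;
  costPerSuccess: number; // assumed cost-efficiency metric
  meanDurationMs: number;
}

function score(runs: RunRecord[]): Scorecard {
  const n = runs.length;
  const successes = runs.filter((r) => r.succeeded).length;
  const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
  const totalDuration = runs.reduce((sum, r) => sum + r.durationMs, 0);
  return {
    successRate: n === 0 ? 0 : successes / n,
    costPerSuccess: successes === 0 ? Infinity : totalCost / successes,
    meanDurationMs: n === 0 ? 0 : totalDuration / n,
  };
}

// Flags a regression when success rate falls, or cost per success rises,
// by more than `tolerance` relative to the baseline scorecard.
function isRegression(baseline: Scorecard, current: Scorecard, tolerance = 0.1): boolean {
  return (
    current.successRate < baseline.successRate * (1 - tolerance) ||
    current.costPerSuccess > baseline.costPerSuccess * (1 + tolerance)
  );
}
```

An agent whose recent window trips `isRegression` against its baseline would be a candidate for demotion.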
Infrastructure
PostgreSQL: primary database with vector search capability (pgvector). The Drizzle ORM schema defines 13 tables, including users, programs, artifacts, claims, tasks, runs, agents, leases, audit log, uploads, and idempotency keys.
Event bus using Redis Streams with consumer groups for at-least-once delivery. CloudEvents 1.0 envelopes provide decoupled coordination and a natural audit trail.
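A CloudEvents 1.0 envelope needs four required context attributes (`specversion`, `id`, `source`, `type`). Attributes such as `time`, `subject`, and `datacontenttype` are optional. The sketch builds such an envelope. The Redis encoding shown, one stream field holding the serialized envelope (as in `XADD <stream> * event <json>`), is an assumption, and `makeEvent` and `toStreamFields` are hypothetical helpers.

```typescript
import { randomUUID } from "node:crypto";

interface CloudEvent<T> {
  specversion: "1.0";
  id: string;
  source: string; // URI-reference identifying the producer
  type: string;   // e.g. reverse-DNS event type
  time?: string;  // RFC 3339 timestamp
  subject?: string;
  datacontenttype?: string;
  data?: T;
}

function makeEvent<T>(source: string, type: string, data: T, subject?: string): CloudEvent<T> {
  return {
    specversion: "1.0",
    id: randomUUID(),
    source,
    type,
    time: new Date().toISOString(),
    subject, // JSON.stringify drops this key when undefined
    datacontenttype: "application/json",
    data,
  };
}

// Redis Streams entries are flat field/value string pairs; this sketch
// stores the whole envelope under a single "event" field.
function toStreamFields(event: CloudEvent<unknown>): [string, string] {
  return ["event", JSON.stringify(event)];
}
```

Because each envelope carries a unique `id`, consumers in a group can deduplicate redeliveries, which matters under at-least-once semantics.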
Content-addressed object storage with multipart upload support. Presigned URLs for direct client uploads. Automatic deduplication via content hashing.
API gateway: 30+ endpoints with CORS, rate limiting, JWT authentication, RBAC guards on admin endpoints, and program-membership guards on resources.
Authentication: HS256-signed JWTs with a role hierarchy (admin through operator), attribute-based access control for fine-grained permissions, and enforced data classification tiers.
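The HS256 mechanics are small enough to show with `node:crypto` alone. The token is base64url(header).base64url(payload), signed with HMAC-SHA256. This is an illustrative sketch, not SwarmOS's auth code; a production gateway would normally use a maintained JWT library. The `role` claim in the usage is a hypothetical example.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (s: string): string => Buffer.from(s, "utf8").toString("base64url");

function sign(payload: Record<string, unknown>, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${sig}`;
}

// Returns the claims if the signature is valid, alg is HS256, and the token
// has not expired; otherwise null.
function verify(
  token: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000),
): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, sig] = parts;
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig, "base64url");
  // Constant-time comparison; timingSafeEqual requires equal lengths.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const head = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (head.alg !== "HS256") return null; // reject algorithm confusion
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as Record<string, unknown>;
  if (typeof claims.exp === "number" && claims.exp <= nowSec) return null;
  return claims;
}
```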
Sandboxed execution: no outbound network by default, resource quotas, and filesystem isolation. Prompt injection mitigation treats extracted content as untrusted input, kept separate from control instructions.
Target domains
SwarmOS provides domain-specific workflow templates for the hardest class of scientific and societal challenges.
Safety & trust
Defense in depth, not defense by hope.
- Data classification tiers (public/internal/restricted/secret) with ABAC enforcement
- Sandbox execution with no outbound network by default, resource quotas, and filesystem isolation
- Prompt injection mitigation — extracted content treated as untrusted input, separated from control instructions
- Dual-use monitoring — safety sentinel agents and policy gates for sensitive domains
- Kill switches — rate limits, anomaly detection, and the ability to quarantine programs or agents
- Immutable audit ledger — every action logged with actor identity, timestamp, and resource references
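Classification tiers combine naturally with attribute-based checks. The sketch uses the four tiers listed above and assumes a simple policy: the subject's clearance must be at least the resource's tier, and any non-public resource also requires membership in the owning program. The policy and all names are hypothetical.

```typescript
// Tiers ordered from least to most sensitive.
const TIERS = ["public", "internal", "restricted", "secret"] as const;
type Tier = (typeof TIERS)[number];

interface Subject {
  id: string;
  clearance: Tier;
  programs: string[]; // programs the subject is a member of
}

interface Resource {
  programId: string;
  classification: Tier;
}

// Both attributes must pass: clearance dominates classification, and
// non-public resources require program membership.
function canRead(subject: Subject, resource: Resource): boolean {
  const cleared = TIERS.indexOf(subject.clearance) >= TIERS.indexOf(resource.classification);
  const member =
    resource.classification === "public" || subject.programs.includes(resource.programId);
  return cleared && member;
}
```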
Roadmap
Full TypeScript monorepo with 13 packages. API gateway with 30+ endpoints. JWT/RBAC/ABAC authentication. In-memory repository layer for all services. Artifact upload pipeline with S3 multipart support. Task state machine with validated transitions. Claim graph with evidence/counterevidence/verification. Agent registry with lease management. Audit logging.
Connect PostgreSQL via Drizzle repositories. Run database migrations. Redis Streams event bus integration in services. Semantic search via pgvector embeddings. Agent orchestration engine (task routing, scheduling).
WebSocket real-time updates. UI dashboard for program directors and scientists. Comprehensive test suite. OpenAPI/Swagger documentation. CI/CD pipeline.
Biology, civilization dynamics, materials discovery, and deep history workflow templates. Multi-program orchestration. Community deployment documentation.
If you only get one thing "perfect," make it this chain:
Artifact → Claim → Evidence → Reproduction Bundle → Verification Grade
Everything else — agents, orchestration, prioritization — becomes dramatically easier once the substrate is correct.
Get involved
SwarmOS is being built in public. The platform is functional and under active development toward full database integration and agent orchestration.
More agents is not the answer.
A correct substrate is.