Private beta · Q3 · Built with research desks, not for them.

Agentic research,
with discipline.

Coordinate autonomous research agents that generate falsifiable hypotheses, validate them against history, and produce auditable research — without losing statistical discipline.

v0.4 private beta · ⌘K command palette · SOC-2 in progress

Swarm · Live · run / mr-vol-q3

4 agents · 1 validated

A-01 Feature search
412 features · IC > 0.04
passed
A-02 Regime detect
HMM(4) · BIC stable
running
A-03 Lead/lag scout
Lag ∈ [1, 21] · cross-asset
running
A-04 Anomaly scout
17 outlier days · review
rejected

Hypothesis queue · 3 of 47

#214 · validated

Mean reversion · vol_q1

#213 · validating

HY OAS → R2K drawdowns

#212 · rejected

PEAD decay · post-2018

01 Every claim falsifiable

Hypotheses ship with explicit reject criteria — not vibes.

02 Validation by construction

Walk-forward folds, leakage scans, and FDR control are mandatory.

03 Auditable by default

Every decision an agent makes is recorded and reproducible.

The product

A research console for swarms, not single agents.

Coordinate dozens of research agents across hypotheses, datasets, and experiments — with the same rigor your senior researchers apply by hand.

app.alphaswarm.io / runs / mr-vol-q3

Workspace · Stoa Capital

Research run

Mean reversion in low-volatility regimes

Live · 4 agents · 2003–2024

Research pipeline

RUN · 03h 14m elapsed

01 Done
Ingest
12 datasets · 4.1B rows
02 Done
Hypothesize
3 candidates
03 Running
Validate
splits + leakage
04 Queued
Backtest
walk-forward
05 Queued
Memo
auto-draft

Agents

4 active · 8 standby

A-01 running

Feature search

Scanning 412 features for IC > 0.04 in vol_q1

62%

A-02 running

Regime detect

Hidden Markov fit · 4 states · BIC stable

88%

A-03 needs review

Lead/lag scout

Cross-asset CC matrix, lag ∈ [1,21]

100%

A-04 needs review

Anomaly scout

Outlier days flagged: 17 · awaiting human

100%

Hypothesis queue

View all 47 →

#214
Mean reversion in low-vol regimes (SPX intraday)
0.71
validating
#213
Lead/lag: HY OAS → Russell 2000 drawdowns
0.64
validating
#212
Earnings PEAD decay accelerates post-2018
rejected · Coverage bias in dataset
0.58
rejected
#211
Carry-momentum cross in 10Y rates
0.49
promoted

Backtest · B-0098

In-sample · Out-of-sample · 2003–2024

Sharpe

1.42 OOS

Hit rate

54.1% OOS

Max DD

−7.3% OOS

Turnover

3.1× annual

Discipline checks

1 warning

Train/test split
Walk-forward · 6 folds
passed
Leakage scan
1 candidate · rolling_std(target)
review
Multiple comparisons
BH-FDR @ 0.10
passed
Regime stability
Sharpe stable across 4 states
passed
Human review
Awaiting sign-off · @priya
pending

Connections

5 sources · 2 environments

Polygon · US equities
MCP
live
FRED macro
MCP
live
Internal · features.parquet
Dataset
linked
S3 · earnings_v3
Dataset
linked
Bloomberg B-PIPE
MCP
re-auth

Research memo · auto-draft

v0.4 · draft

Memo · MR-VOL-Q3

Mean reversion in low-volatility regimes

Across 2003–2018, daily SPX returns exhibit a statistically robust short-horizon mean reversion when realized volatility sits in the lowest quartile. Out-of-sample (2019–2024) the effect persists, with Sharpe of 1.42 net of estimated costs and a max drawdown of −7.3%.

Walk-forward folds remained directionally consistent. The leakage scan flagged one candidate feature, rolling_std(target), which has since been removed. Recommended for paper promotion pending desk review.

Why Alpha Swarm

Agentic exploration meets quant rigor.

Most agent stacks optimize for plausibility. Quantitative research demands the opposite — falsifiability. Alpha Swarm is built around that constraint.

Hypotheses, not prompts.

Agents don't 'find alpha'. They draft falsifiable hypotheses with explicit assumptions, prior probability estimates, and a planned test.

// hypothesis

P(reversion | vol_q1) > P(reversion)

// test

walk_forward(SPX, k=6, oos=0.25)

// reject if

Sharpe_OOS < 0.5 ∨ FDR > 0.10
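As a rough illustration of what a hypothesis with explicit reject criteria could look like in code, here is a minimal Python sketch. The `Hypothesis` class, its field names, and the `is_rejected` method are hypothetical and are not Alpha Swarm's API; the thresholds mirror the example above, and `fdr` is assumed to be the adjusted q-value reported by the multiple-comparisons step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical record pairing a claim with its planned test and reject rule."""
    claim: str
    test: str
    min_oos_sharpe: float = 0.5   # reject if Sharpe_OOS falls below this
    max_fdr: float = 0.10         # reject if the adjusted q-value exceeds this

    def is_rejected(self, oos_sharpe: float, fdr: float) -> bool:
        # Mirrors "Sharpe_OOS < 0.5 OR FDR > 0.10" from the example above.
        return oos_sharpe < self.min_oos_sharpe or fdr > self.max_fdr


h = Hypothesis(
    claim="P(reversion | vol_q1) > P(reversion)",
    test="walk_forward(SPX, k=6, oos=0.25)",
)
```

Because the reject rule is fixed before the test runs, a result cannot be rescued afterwards by moving the threshold.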

Tests against the real world.

Every claim is run against historical data through a validation layer that enforces train/test separation, leakage scans, and walk-forward folds.
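One common way to build walk-forward folds is an expanding training window followed by the next contiguous block as the test set. The sketch below assumes that scheme; the function name `walk_forward_folds` is hypothetical, and the product's actual splitting rules (including how its `oos` parameter is interpreted) are not specified here.

```python
def walk_forward_folds(n_obs: int, k: int = 6):
    """Return k (train, test) index ranges over time-ordered observations.

    The series is cut into k + 1 contiguous blocks. Fold i trains on every
    block before block i and tests on block i, so test data always comes
    strictly after the data used for training.
    """
    if k < 1 or n_obs < k + 1:
        raise ValueError("need at least k + 1 observations and k >= 1")
    block = n_obs // (k + 1)
    folds = []
    for i in range(1, k + 1):
        train_end = i * block
        # The final fold absorbs any remainder left by integer division.
        test_end = n_obs if i == k else (i + 1) * block
        folds.append((range(0, train_end), range(train_end, test_end)))
    return folds


folds = walk_forward_folds(70, k=6)
```

With 70 observations and k=6, each block holds 10 observations, and the last fold trains on indices 0 to 59 and tests on 60 to 69.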

Discipline at runtime.

Anti-overfitting rules — multiple-comparisons correction, regime stability, version-locked features — are part of the runtime, not a checklist.

✓ FDR ≤ 0.10

✓ min splits = 6

✓ feature lock

✓ seed pinned
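For readers unfamiliar with the FDR check, the standard Benjamini-Hochberg step-up procedure can be sketched in a few lines of Python. This is a textbook version, not Alpha Swarm's implementation; the function name and the sample p-values are illustrative only.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Return a list of booleans: True where the null is rejected at FDR level q.

    Step-up rule: sort p-values ascending, find the largest rank r with
    p_(r) <= r * q / m, and reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    discoveries = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            discoveries[idx] = True
    return discoveries


# Illustrative p-values for five hypotheses proposed across the swarm.
flags = benjamini_hochberg([0.9, 0.001, 0.3, 0.04, 0.02], q=0.10)
```

Pooling all proposals into one call, rather than correcting per agent, is what makes the volume of search count against every hypothesis.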

Auditable traces.

Every decision an agent makes is recorded. Reviewers see which features, splits, and seeds produced a result — never a black box.

› A-02 · fit · HMM(4)

› validator · split · k=6

› A-04 · flag · leakage:1

› human · sign · @priya

Architecture

From raw data to falsifiable claims.

A single, opinionated pipeline. Each stage produces an artifact the next stage can verify — so research is composable and auditable end-to-end.

Data sources

Market, fundamentals, alt-data and internal datasets — connected via MCP and immutable dataset versions.

MCP · Parquet · Lakehouse

Research agents

Specialized agents propose features, regimes, lead/lag relationships, and anomalies under shared tooling.

Feature · Regime · Anomaly

Hypothesis engine

Free-form ideas are compiled into falsifiable, parameterized hypotheses with explicit reject criteria.

Falsifiable · Versioned

Validation layer

Train/test separation, leakage detection, and multiple-comparisons control are enforced before a single backtest runs.

Splits · Leakage · FDR
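A very simple static leakage check might scan feature definitions for direct references to the target column before any backtest is scheduled. The sketch below assumes features are described by expression strings; the `find_target_leaks` helper and the feature names are hypothetical, and a real scan would also need runtime checks for look-ahead and survivorship.

```python
import re


def find_target_leaks(feature_exprs: dict, target: str = "target") -> list:
    """Return the names of features whose expression mentions the target.

    This only produces candidates for review: a feature built from the
    target may or may not leak, depending on how it is lagged.
    """
    pattern = re.compile(rf"\b{re.escape(target)}\b")
    return sorted(
        name for name, expr in feature_exprs.items() if pattern.search(expr)
    )


# Illustrative feature definitions; only the second one touches the target.
features = {
    "vol_20d": "rolling_std(ret, 20)",
    "vol_of_target": "rolling_std(target)",
    "mom_1d": "shift(ret, 1)",
}
candidates = find_target_leaks(features)
```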

Backtest engine

Walk-forward simulation with realistic costs, capacity assumptions, and regime-stratified performance.

Walk-fwd · Costs · Capacity
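The headline backtest metrics can be computed from a series of per-period returns. The sketch below uses the conventional definitions (annualized Sharpe with a zero risk-free rate and 252 periods per year, and maximum drawdown on the compounded equity curve); both conventions are assumptions here, and costs and capacity are left out for brevity.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    sd = statistics.stdev(returns)  # requires at least two observations
    if sd == 0:
        raise ValueError("returns have zero variance; Sharpe is undefined")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Illustrative returns: up 10%, down 10%, up 5%.
dd = max_drawdown([0.10, -0.10, 0.05])
```

In this example the equity curve peaks at 1.10 and falls to 0.99, so the maximum drawdown is about −10%.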

Research memo

Auto-drafted, fully traced memo: assumptions, splits, leakage notes, OOS results, and reviewer sign-off.

Trace · Sign-off

Discipline

Built to survive out-of-sample.

The hardest part of agentic research isn't generation — it's not fooling yourself. Alpha Swarm encodes the safeguards quant teams already trust, and applies them every time.

Illustrative · OOS pressure test

Naive vs disciplined research

Naive · OOS Sharpe

0.31

Disciplined · OOS Sharpe

1.12

01
Walk-forward validation
Sequential train/test folds across time. No information from the future leaks into the past — by construction.
02
Out-of-sample by default
Every claim ships with a held-out OOS window. The system refuses to publish a memo without it.
03
Experiment registry
Every backtest, seed, and feature set is logged. Duplicate runs are detected and joined to prevent silent re-fitting.
04
Feature & version tracking
Datasets, code, and feature definitions are content-addressed. Memos cite exact hashes — reruns are reproducible to the byte.
05
Multiple-hypothesis correction
Benjamini–Hochberg FDR is applied across the swarm's proposals, not per agent. Volume of search is treated as a cost.
06
Leakage detection
Static and runtime checks flag look-ahead features, target encoding, and survivorship — before a backtest is allowed to run.
07
Human review checkpoints
Promotion to paper or live trading requires a human reviewer to sign the trace. The runtime won't bypass it.
08
Regime stratification
Performance is reported per regime (vol, rates, dispersion). Strategies that work only in one regime are surfaced, not buried.
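Two of the safeguards above, content-addressed tracking and duplicate-run detection, can be illustrated together. The sketch below assumes a run is identified by the SHA-256 hash of its canonical JSON description; `run_fingerprint`, `ExperimentRegistry`, and the argument names are hypothetical and do not describe Alpha Swarm's actual registry.

```python
import hashlib
import json


def run_fingerprint(dataset_hash, feature_set, seed, code_version):
    """Hash a canonical description of a run so identical runs collide."""
    payload = json.dumps(
        {
            "dataset": dataset_hash,
            "features": sorted(feature_set),  # feature order must not matter
            "seed": seed,
            "code": code_version,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExperimentRegistry:
    """In-memory registry that joins duplicate runs to the first one seen."""

    def __init__(self):
        self._runs = {}

    def register(self, fingerprint, run_id):
        # Returns the run that owns this fingerprint; a re-run with the same
        # inputs maps back to the original instead of counting as new evidence.
        return self._runs.setdefault(fingerprint, run_id)


registry = ExperimentRegistry()
fp_a = run_fingerprint("abc123", ["vol_q1", "ret_1d"], seed=7, code_version="v1")
fp_b = run_fingerprint("abc123", ["ret_1d", "vol_q1"], seed=7, code_version="v1")
first = registry.register(fp_a, "B-0098")
second = registry.register(fp_b, "B-0099")
```

Here the second registration resolves to the first run, because the two fingerprints differ only in feature order.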

Get in

Build research systems that survive contact with reality.

Alpha Swarm is currently onboarding a small number of design partners. If you run a research team and care about discipline as much as discovery, we’d like to talk.
