Historical tick-level orderbook data for Kalshi and Polymarket
Every orderbook delta and trade on Kalshi and Polymarket, millisecond-stamped and normalized into one schema. Paired with prediction market reference data that resolves the same outcome to a single canonical ID across venues.
Federal Reserve Research · FEDS 2026-010
“Kalshi provides a statistically significant improvement over the Bloomberg consensus forecast”
on headline CPI
“A perfect forecast record on the day before the FOMC meeting, which represents a statistically significant improvement over the fed funds futures forecast”
Diercks (Federal Reserve Board), Katz (Northwestern), Wright (Johns Hopkins & NBER). February 2026.
The data is informative. The problem is getting it into a research workflow. Two venues, incompatible schemas, no unified history, no cross-venue mapping. We solve that.
The data is fragmented across venues
The same real-world outcome has different IDs, different labels, and different price formats on Kalshi versus Polymarket. A presidential candidate, a Super Bowl winner, an FOMC rate decision: each is a different ticker on Kalshi, a different conditionId and token on Polymarket, and a different label in each venue’s UI. Tickers rotate per event. Token IDs change per market.
To get a unified view across venues, you need a verified mapping between every venue-specific identifier for every outcome. And you need to have been maintaining it from the start. There is no backfill API. The historical record only exists if someone was capturing and curating it in real time.
Single-venue feeds exist. Nobody maintains the cross-venue mapping. That’s what we do.
One symbology across venues
Our prediction market reference data is a canonical registry of events, entities, and settlement rules. The same outcome on Kalshi and Polymarket resolves to a single Oddpool ID, with full provenance back to each venue’s raw data. It is the reference data layer that ties the orderbook archive together.
Two venues, one event
The same outcome is listed under different titles, tickers, and IDs on each venue. We pair them deterministically with a verifier-confirmed match and a recorded confidence score.
One person, every spelling
Surface-form drift across venues, languages, and case collapses into one canonical Oddpool ID. Aliases are append-only so historical references still resolve.
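As a rough illustration of how append-only aliases can collapse spelling variants into one ID, here is a minimal sketch. The class, the normalization rules, the example name, and the `person:0001` ID format are all assumptions made for illustration, not the actual Oddpool implementation.

```python
import unicodedata
from typing import Optional


def normalize(surface: str) -> str:
    """Fold case, accents, and extra whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", surface)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


class AliasRegistry:
    """Append-only map from surface forms to a canonical ID (hypothetical design)."""

    def __init__(self) -> None:
        self._aliases: dict[str, str] = {}

    def add(self, surface: str, canonical_id: str) -> None:
        key = normalize(surface)
        existing = self._aliases.get(key)
        if existing is not None and existing != canonical_id:
            # Append-only: an existing alias is never repointed or deleted.
            raise ValueError(f"{surface!r} already resolves to {existing}")
        self._aliases[key] = canonical_id

    def resolve(self, surface: str) -> Optional[str]:
        return self._aliases.get(normalize(surface))


registry = AliasRegistry()
registry.add("Zoë Müller", "person:0001")   # made-up person and ID format
registry.add("MULLER, Zoe", "person:0001")  # a second venue's spelling
print(registry.resolve("zoe  muller"))
```

Because aliases are only ever added, a reference recorded under an old spelling keeps resolving to the same ID after new spellings appear.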
Most pairs aren’t truly fungible
On a sample of 250 matched events, only about a third are settlement-fungible across both venues. The rest carry a real settlement difference, recorded explicitly so it isn’t mistaken for a clean spread.
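One way to picture an explicit settlement record is a matched-pair row that carries its confidence score and a fungibility flag, so downstream code can refuse to treat a non-fungible pair as a clean spread. The field names, identifiers, and prices below are hypothetical placeholders, not the real schema or real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchedPair:
    canonical_id: str             # hypothetical canonical ID
    kalshi_ticker: str            # placeholder, not a real ticker
    polymarket_condition_id: str  # placeholder, not a real conditionId
    confidence: float             # recorded match confidence in [0, 1]
    settlement_fungible: bool     # True only if both venues settle identically
    settlement_note: str = ""     # what differs when they do not


def clean_spread(pair: MatchedPair, kalshi_yes: float, polymarket_yes: float) -> Optional[float]:
    """Return the YES price gap only when the pair settles identically on both venues."""
    if not pair.settlement_fungible:
        return None  # the gap may reflect settlement terms, not mispricing
    return round(kalshi_yes - polymarket_yes, 4)


fungible = MatchedPair("event:0001", "EXAMPLE-A", "0xaaa", 0.99, True)
divergent = MatchedPair(
    "event:0002", "EXAMPLE-B", "0xbbb", 0.97, False,
    settlement_note="venues cite different resolution sources",
)

print(clean_spread(fungible, 0.62, 0.59))
print(clean_spread(divergent, 0.62, 0.59))
```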
Historical tick-level orderbook data
Raw per-venue microstructure for every tracked Kalshi and Polymarket market, normalized into a single schema. Not scraped headlines or hourly snapshots. The full book, every tick, with millisecond timestamps.
Full orderbook + trades
Every orderbook delta (signed price-level changes), every trade execution (price, quantity, taker side), every snapshot. Per-venue bid/ask with depth in USD so you can see the liquidity behind every probability and know whether a price is real or a thin-book artifact.
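To make "signed price-level changes" concrete, the sketch below replays deltas on top of a snapshot and sums the resting depth in USD. It assumes each delta is a `(price, signed_quantity)` pair for one side of the book and that depth at a level is price times quantity. Both are illustrative assumptions, not the published schema.

```python
from collections import defaultdict
from decimal import Decimal


def apply_deltas(snapshot, deltas):
    """Replay signed quantity changes onto a {price: quantity} snapshot."""
    book = defaultdict(Decimal, snapshot)  # missing levels start at 0
    for price, signed_qty in deltas:
        book[price] += signed_qty
        if book[price] <= 0:
            del book[price]  # an emptied level leaves the book
    return dict(book)


def depth_usd(levels):
    """Dollar depth resting on one side: sum of price * quantity (assumed definition)."""
    return sum((price * qty for price, qty in levels.items()), Decimal(0))


# Made-up YES bids: price in decimal dollars -> contracts resting.
bids = {Decimal("0.61"): Decimal(100), Decimal("0.60"): Decimal(250)}
deltas = [
    (Decimal("0.61"), Decimal(-40)),   # 40 contracts pulled or filled
    (Decimal("0.62"), Decimal(75)),    # new best bid appears
    (Decimal("0.60"), Decimal(-250)),  # level fully removed
]

rebuilt = apply_deltas(bids, deltas)
print(rebuilt)
print(depth_usd(rebuilt))
```

Using `Decimal` rather than floats keeps price levels exact, which matters when prices are dictionary keys.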
Compounding historical archive
Tick-level Kalshi and Polymarket orderbook history for every tracked market since we started subscribing. Neither exchange offers historical backfill. This data only exists because someone was capturing it in real time.
Cross-venue normalization
Unified market_id, yes/no sides, decimal dollar prices, and millisecond timestamps across Kalshi and Polymarket. Query the same market across venues in one statement. The cross-venue mapping that powers this is the symbology layer described above.
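A minimal sketch of what this kind of normalization can look like. It assumes one raw feed quotes integer cents with millisecond timestamps and another quotes decimal strings with second timestamps. Those raw formats, the `Quote` fields, and the `mkt:0001` ID are assumptions for illustration, not the documented venue formats or the actual schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Quote:
    market_id: str   # unified ID (made-up format)
    venue: str
    side: str        # "yes" or "no"
    price: Decimal   # decimal dollars
    ts_ms: int       # milliseconds since the Unix epoch


def from_cents(market_id, venue, side, price_cents, ts_ms):
    """Normalize a quote whose raw price is integer cents (assumed raw format)."""
    return Quote(market_id, venue, side, Decimal(price_cents) / Decimal(100), int(ts_ms))


def from_decimal_string(market_id, venue, side, price, ts_s):
    """Normalize a quote whose raw price is a decimal string with a
    timestamp in whole seconds (assumed raw format)."""
    return Quote(market_id, venue, side, Decimal(price), int(ts_s) * 1000)


def yes_equivalent(quote):
    """Express a NO price as the complementary YES price of a binary contract."""
    return quote.price if quote.side == "yes" else Decimal(1) - quote.price


a = from_cents("mkt:0001", "kalshi", "yes", 62, 1777464000123)
b = from_decimal_string("mkt:0001", "polymarket", "no", "0.41", 1777464000)

print(a.price, b.price, yes_equivalent(b))
```

Once both rows share `market_id`, price units, and timestamp units, comparing them is a plain equality join rather than venue-specific parsing.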
Who uses this data
Trading desks
- A structurally different signal alongside futures, options, and polls.
- Read full book depth before sizing in so you know the edge survives the quote.
- Compare the same outcome on Kalshi and Polymarket in one query.
Market makers
- Calibrate spreads and depth-based skew against full tick history.
- Backtest quoting on the same data structure you would run live.
- Cross-venue hedge ratios are derivable from the reference layer, not guessed.
Research
- Run event studies on real microstructure, not aggregates.
- Versioned reference data means a conditionId you cite today still resolves next quarter.
- Reproduce results months later from the same partitioned archive.
Technical specifications
Coverage
Every market and event across Kalshi and Polymarket, excluding parlays.
Data types
Orderbook snapshots, orderbook deltas (every price-level change, signed quantity), trade executions (price, quantity, taker side).
Normalization
Unified market_id, yes/no sides, decimal dollar prices, millisecond timestamps.
Storage format
Hive-partitioned Parquet. No proprietary format.
Historical archive
Compounding daily. No exchange backfill exists.
Query engines
DuckDB, Polars, Spark, and Snowflake read the Parquet directly. No proprietary client.
REST + WebSocket
Real-time and historical queries. Dedicated API keys.
S3 Parquet
Daily Hive-partitioned files to your bucket.
DuckDB / Polars / Spark
Query the Parquet directly. No ingest pipeline.
Snowflake & custom
Snowflake share, SFTP, or other formats.
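Hive partitioning encodes column values in directory names as `key=value`, which lets a query engine skip files it does not need. The sketch below shows the idea using only the standard library. The partition keys `venue` and `date` and the file paths are assumptions for illustration, not the actual archive layout.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Extract key=value directory segments from a Hive-style path."""
    values = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = value
    return values


def prune(paths, **wanted):
    """Keep only files whose partition values match every requested key."""
    return [
        p for p in paths
        if all(partition_values(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical layout.
paths = [
    "orderbook_deltas/venue=kalshi/date=2026-04-29/part-0.parquet",
    "orderbook_deltas/venue=kalshi/date=2026-04-30/part-0.parquet",
    "orderbook_deltas/venue=polymarket/date=2026-04-29/part-0.parquet",
]

print(partition_values(paths[0]))
print(prune(paths, date="2026-04-29"))
```

Engines that understand Hive partitioning apply this pruning themselves when a query filters on a partition column, so only the matching files are read.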
Book a call
See sample prediction market data and how it fits into your research workflow.
Prefer email? Reach us at founders@oddpool.com