Backed by Y Combinator

Historical tick-level orderbook data for Kalshi and Polymarket

Every orderbook delta and trade on Kalshi and Polymarket, millisecond-stamped and normalized into one schema. Paired with prediction market reference data that resolves the same outcome to a single canonical ID across venues.

Federal Reserve Research · FEDS 2026-010

On headline CPI: “Kalshi provides a statistically significant improvement over the Bloomberg consensus forecast”

“A perfect forecast record on the day before the FOMC meeting, which represents a statistically significant improvement over the fed funds futures forecast”

Diercks (Federal Reserve Board), Katz (Northwestern), and Wright (Johns Hopkins & NBER), February 2026.

The data is informative. The problem is getting it into a research workflow. Two venues, incompatible schemas, no unified history, no cross-venue mapping. We solve that.

The data is fragmented across venues

The same real-world outcome has different IDs, different labels, and different price formats on Kalshi versus Polymarket. A presidential candidate, a Super Bowl winner, an FOMC rate decision: each is a different ticker on Kalshi, a different conditionId and token on Polymarket, and a different label in each venue’s UI. Tickers rotate per event. Token IDs change per market.

To get a unified view across venues, you need a verified mapping between every venue-specific identifier for every outcome. And you need to have been maintaining it from the start. There is no backfill API. The historical record only exists if someone was capturing and curating it in real time.

Single-venue feeds exist. Nobody maintains the cross-venue mapping. That's what we do.

One symbology across venues

Our prediction market reference data is a canonical registry of events, entities, and settlement rules. The same outcome on Kalshi and Polymarket resolves to a single Oddpool ID, with full provenance back to each venue’s raw data. It is the reference data layer that ties the orderbook archive together.

Match

Two venues, one event

The same outcome is listed under different titles, tickers, and IDs on each venue. We pair them deterministically with a verifier-confirmed match and a recorded confidence score.
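The point of the match is that any venue-specific identifier resolves to one canonical ID. Below is a minimal in-memory sketch of such a lookup. The class and field names (`VenueRef`, `CanonicalEvent`, `Registry`, `confidence`) are hypothetical and are not Oddpool's actual schema. The confidence value in the example is a placeholder, not a recorded score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VenueRef:
    """One venue-specific identifier (hypothetical shape)."""
    venue: str       # e.g. "kalshi" or "polymarket"
    identifier: str  # Kalshi event ticker or Polymarket event slug


@dataclass(frozen=True)
class CanonicalEvent:
    """A canonical event that several venue identifiers resolve to."""
    oddpool_id: str
    venue_refs: tuple[VenueRef, ...]
    confidence: float  # verifier-recorded match confidence (0..1 scale assumed)


class Registry:
    """In-memory lookup from any venue identifier to its canonical ID."""

    def __init__(self) -> None:
        self._by_ref: dict[VenueRef, CanonicalEvent] = {}

    def add(self, event: CanonicalEvent) -> None:
        for ref in event.venue_refs:
            existing = self._by_ref.get(ref)
            # Refuse to silently re-point an identifier to a different event.
            if existing is not None and existing.oddpool_id != event.oddpool_id:
                raise ValueError(f"{ref} is already mapped to {existing.oddpool_id}")
            self._by_ref[ref] = event

    def resolve(self, venue: str, identifier: str) -> str:
        return self._by_ref[VenueRef(venue, identifier)].oddpool_id


registry = Registry()
registry.add(CanonicalEvent(
    oddpool_id="OPI:KXANIMEBD-26",
    venue_refs=(
        VenueRef("kalshi", "KXANIMEBD-26"),
        VenueRef("polymarket", "anime-awards-best-drama-anime-winner"),
    ),
    confidence=1.0,  # placeholder, not a real recorded score
))
```

Both venue identifiers now return `OPI:KXANIMEBD-26` from `resolve`. Rejecting a conflicting re-mapping, rather than overwriting it, is one way to keep a pairing deterministic once it has been verified.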

Kalshi: KXANIMEBD-26
2026 Crunchyroll Anime Award for Best Drama Anime?
Polymarket: anime-awards-best-drama-anime-winner
Anime Awards: Best Drama Anime Winner
Oddpool ID (institutional_event): OPI:KXANIMEBD-26
5 outcomes paired across both venues
Entities

One person, every spelling

Surface-form drift across venues, languages, and case collapses into one canonical Oddpool ID. Aliases are append-only so historical references still resolve.
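The next sketch shows how an append-only alias table might behave, assuming surface forms are normalized by case-folding, stripping diacritics, and collapsing whitespace before lookup. The normalization rule and the class name `EntityAliases` are illustrative assumptions. Normalization alone cannot relate "Lula da Silva" to the full name, so that link is recorded as an explicit alias.

```python
import unicodedata
from typing import Optional


def normalize_surface_form(name: str) -> str:
    """Case-fold, strip diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.casefold().split())


class EntityAliases:
    """Append-only alias table: aliases may be added, never removed or re-pointed."""

    def __init__(self) -> None:
        self._alias_to_id: dict[str, str] = {}

    def add_alias(self, surface_form: str, entity_id: str) -> None:
        key = normalize_surface_form(surface_form)
        current = self._alias_to_id.get(key)
        if current is not None and current != entity_id:
            # Re-pointing an alias would silently change what old references mean.
            raise ValueError(f"{surface_form!r} already resolves to {current}")
        self._alias_to_id[key] = entity_id

    def resolve(self, surface_form: str) -> Optional[str]:
        return self._alias_to_id.get(normalize_surface_form(surface_form))


LULA = "OPE:PERSON:LUIZ-INACIO-LULA-DA-SILVA"
aliases = EntityAliases()
aliases.add_alias("Luiz Inácio Lula da Silva", LULA)
aliases.add_alias("Lula da Silva", LULA)
```

With this table, `aliases.resolve("lula da silva")` and `aliases.resolve("LUIZ INACIO LULA DA SILVA")` both return the same canonical ID.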

Luiz Inácio Lula da Silva (alias)
Lula da Silva (alias)
lula da silva (alias)
Oddpool ID (symbology_entity): OPE:PERSON:LUIZ-INACIO-LULA-DA-SILVA
8 outcomes · 3 events · 5 domain tags
Settlement fidelity

Most pairs aren’t truly fungible

On a sample of 250 matched events, only about a third are settlement-fungible across both venues. The rest carry a real settlement difference, recorded explicitly so it isn’t mistaken for a clean spread.

Identical: 34%
Settlement window: 33%
Settlement universe: 33%
Window: different deadline. Universe: different bucket boundaries or candidate set.
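The three categories can be written as a classification rule over each venue's settlement terms. This is a minimal sketch that assumes each side's terms reduce to a resolution deadline and a set of canonical outcomes. Because the shares above sum to 100%, each pair gets exactly one label. The rule that a universe difference takes precedence over a window difference is an assumption, and `SettlementTerms` and `classify_pair` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SettlementTerms:
    """Hypothetical per-venue settlement terms for one matched event."""
    deadline: datetime   # resolution deadline
    outcomes: frozenset  # canonical outcome labels or bucket boundaries


def classify_pair(a: SettlementTerms, b: SettlementTerms) -> str:
    """Assign one label per pair; a universe difference wins (assumed precedence)."""
    if a.outcomes != b.outcomes:
        return "settlement_universe"
    if a.deadline != b.deadline:
        return "settlement_window"
    return "identical"


dec31 = datetime(2026, 12, 31, tzinfo=timezone.utc)
jan1 = datetime(2027, 1, 1, tzinfo=timezone.utc)
kalshi = SettlementTerms(dec31, frozenset({"a", "b"}))
poly = SettlementTerms(jan1, frozenset({"a", "b"}))
print(classify_pair(kalshi, poly))  # settlement_window
```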

Historical tick-level orderbook data

Raw per-venue microstructure for every tracked Kalshi and Polymarket market, normalized into a single schema. Not scraped headlines or hourly snapshots. The full book, every tick, with millisecond timestamps.

Full orderbook + trades

Every orderbook delta (signed price-level changes), every trade execution (price, quantity, taker side), every snapshot. Per-venue bid/ask with depth in USD so you can see the liquidity behind every probability and know whether a price is real or a thin-book artifact.
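Signed price-level deltas rebuild the book when they are applied on top of a snapshot. The sketch below shows how that replay might look, with prices in decimal dollars and quantities in contracts. It assumes USD depth is the sum of price × quantity over one side; the page does not define the metric, so treat that as an assumption. `PriceLevelBook` is a hypothetical name.

```python
from decimal import Decimal
from typing import Optional


class PriceLevelBook:
    """Price-level book rebuilt from a snapshot plus signed deltas (sketch)."""

    def __init__(self) -> None:
        # price in decimal dollars -> resting quantity in contracts
        self.bids: dict[Decimal, Decimal] = {}
        self.asks: dict[Decimal, Decimal] = {}

    def apply_delta(self, side: str, price: Decimal, signed_qty: Decimal) -> None:
        levels = self.bids if side == "bid" else self.asks
        new_qty = levels.get(price, Decimal(0)) + signed_qty
        if new_qty < 0:
            # A negative level means a missed message: resync from a snapshot.
            raise ValueError(f"negative size at {price}; book is out of sync")
        if new_qty == 0:
            levels.pop(price, None)  # level fully cancelled or filled
        else:
            levels[price] = new_qty

    def best_bid(self) -> Optional[Decimal]:
        return max(self.bids) if self.bids else None

    def best_ask(self) -> Optional[Decimal]:
        return min(self.asks) if self.asks else None

    def depth_usd(self, side: str) -> Decimal:
        """Sum of price x quantity across one side (assumed USD-depth definition)."""
        levels = self.bids if side == "bid" else self.asks
        return sum((p * q for p, q in levels.items()), Decimal(0))


D = Decimal
book = PriceLevelBook()
book.apply_delta("bid", D("0.42"), D("100"))
book.apply_delta("bid", D("0.41"), D("50"))
book.apply_delta("ask", D("0.45"), D("80"))
book.apply_delta("bid", D("0.42"), D("-100"))  # the 0.42 level is pulled
print(book.best_bid(), book.best_ask(), book.depth_usd("bid"))  # 0.41 0.45 20.50
```

Raising on negative size, instead of clamping it to zero, is deliberate. A gap in the delta stream then surfaces immediately rather than corrupting the reconstructed book.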

Compounding historical archive

Tick-level Kalshi and Polymarket orderbook history for every tracked market since we started subscribing. Neither exchange offers historical backfill. This data only exists because someone was capturing it in real time.

Cross-venue normalization

Unified market_id, yes/no sides, decimal dollar prices, and millisecond timestamps across Kalshi and Polymarket. Query the same market across venues in one statement. The cross-venue mapping that powers this is the symbology layer described above.
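Much of the normalization work is price conversion. The sketch below assumes Kalshi-style books that list only YES bids and NO bids as `[price_in_cents, quantity]` pairs, and Polymarket-style levels given as decimal price and size strings. Real payload field names vary by API version and are not shown. Since a YES and a NO contract together pay $1.00, a NO bid at p cents is economically a YES ask at (100 − p) cents. The function names are hypothetical.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def kalshi_to_yes_book(yes_bids_cents, no_bids_cents):
    """Convert bid-only [price_cents, qty] levels to YES bids/asks in dollars.

    YES + NO settle to $1.00 combined, so a NO bid at p cents is a YES ask
    at (100 - p) cents.
    """
    bids = {Decimal(p) * CENT: Decimal(q) for p, q in yes_bids_cents}
    asks = {(100 - Decimal(p)) * CENT: Decimal(q) for p, q in no_bids_cents}
    return bids, asks


def polymarket_to_yes_book(bids, asks):
    """Polymarket-style levels already quote decimal prices; parse the strings."""
    return ({Decimal(p): Decimal(s) for p, s in bids},
            {Decimal(p): Decimal(s) for p, s in asks})


# YES bid at 42 cents for 100 contracts; NO bid at 55 cents for 30 contracts.
k_bids, k_asks = kalshi_to_yes_book([[42, 100]], [[55, 30]])
# k_bids: YES bid 0.42 x 100; k_asks: YES ask 0.45 x 30
```

Once both venues are expressed as YES bids and asks in dollars, the same book logic and the same queries apply to either venue.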

Sample orderbook data

FOMC June 2026, Hold outcome: same event on both venues (Apr 29, 5am to 9am PT)

Who uses this data

Trading desks

  • A structurally different signal alongside futures, options, and polls.
  • Read full book depth before sizing in, so you know whether the edge survives the quote.
  • Compare the same outcome on Kalshi and Polymarket in one query.
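Checking whether an edge survives the quote comes down to walking the book. The following sketch fills a buy order against resting asks, best price first, and reports the average fill price, which can then be compared with your fair value. The function name `walk_asks` is hypothetical, and the sketch ignores fees.

```python
from decimal import Decimal
from typing import Optional, Tuple


def walk_asks(asks: dict, size: Decimal) -> Tuple[Decimal, Optional[Decimal]]:
    """Buy `size` contracts against resting asks, lowest price first.

    Returns (filled quantity, average fill price); the average is None
    if nothing fills.
    """
    remaining = size
    cost = Decimal(0)
    for price in sorted(asks):  # lowest ask first
        if remaining == 0:
            break
        take = min(remaining, asks[price])
        cost += take * price
        remaining -= take
    filled = size - remaining
    return filled, (cost / filled if filled else None)


asks = {Decimal("0.45"): Decimal("80"), Decimal("0.47"): Decimal("200")}
filled, avg = walk_asks(asks, Decimal("150"))
# 80 @ 0.45 + 70 @ 0.47 = 68.90 for 150 contracts, an average of about 0.4593
```

A 0.45 top-of-book quote looks cheaper than what 150 contracts actually cost. If your fair value sits between the two prices, the edge disappears once size is included.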

Market makers

  • Calibrate spreads and depth-based skew against full tick history.
  • Backtest quoting on the same data structure you would run live.
  • Cross-venue hedge ratios are derivable from the reference layer, not guessed.

Research

  • Run event studies on real microstructure, not aggregates.
  • Versioned reference data means a condition_id you cite today still resolves next quarter.
  • Reproduce results months later from the same partitioned archive.

Technical specifications

Infrastructure

Coverage

Every market and event across Kalshi and Polymarket, excluding parlays.

Data types

Orderbook snapshots, orderbook deltas (every price-level change, signed quantity), trade executions (price, quantity, taker side).
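These record types map naturally onto typed rows. Below is a minimal sketch of delta and trade records carrying the normalized fields this page lists. `market_id` comes from the page, while `venue`, `ts_ms`, `delta_qty`, and `taker_side` are hypothetical column names. The strict 0 < price < 1 check is an assumed validation rule for binary contracts.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Literal

Side = Literal["yes", "no"]


def _check_price(price: Decimal) -> None:
    # Binary contracts trade strictly between $0 and $1 (assumed validation rule).
    if not Decimal(0) < price < Decimal(1):
        raise ValueError(f"price {price} outside (0, 1)")


@dataclass(frozen=True)
class OrderbookDelta:
    market_id: str
    venue: str
    ts_ms: int          # milliseconds since the Unix epoch
    side: Side
    price: Decimal      # decimal dollars
    delta_qty: Decimal  # signed: positive adds resting size, negative removes it

    def __post_init__(self) -> None:
        _check_price(self.price)


@dataclass(frozen=True)
class Trade:
    market_id: str
    venue: str
    ts_ms: int
    price: Decimal
    qty: Decimal
    taker_side: Side    # side of the order that crossed the spread

    def __post_init__(self) -> None:
        _check_price(self.price)
        if self.qty <= 0:
            raise ValueError("trade quantity must be positive")
```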

Normalization

Unified market_id, yes/no sides, decimal dollar prices, millisecond timestamps.

Storage format

Hive-partitioned Parquet. No proprietary format.
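In Hive partitioning, each directory level is a `key=value` pair, which query engines can read as columns and use to skip irrelevant files. The layout below is an illustrative assumption. The partition keys (`data_type`, `venue`, `date`) and the root path are hypothetical, not the archive's documented layout.

```python
from datetime import date, timedelta
from typing import Dict, Iterator


def partition_dir(root: str, data_type: str, venue: str, day: date) -> str:
    """Hive-style layout: each directory level is a key=value pair."""
    return "/".join([root.rstrip("/"), f"data_type={data_type}",
                     f"venue={venue}", f"date={day.isoformat()}"])


def parse_partition(path: str) -> Dict[str, str]:
    """Recover partition columns from any file path under the layout."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


def dirs_for_range(root: str, data_type: str, venue: str,
                   start: date, end: date) -> Iterator[str]:
    """Yield only the daily partitions a date-bounded query needs (pruning)."""
    day = start
    while day <= end:
        yield partition_dir(root, data_type, venue, day)
        day += timedelta(days=1)


p = partition_dir("archive", "trades", "kalshi", date(2026, 4, 29))
# 'archive/data_type=trades/venue=kalshi/date=2026-04-29'
```

Engines do this pruning for you. DuckDB's `read_parquet(..., hive_partitioning = true)` and Polars' `scan_parquet(..., hive_partitioning=True)` expose such directories as columns, so a filter on `venue` or `date` avoids reading unrelated files.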

Historical archive

Compounding daily. No exchange backfill exists.

Query engines

DuckDB, Polars, Spark, and Snowflake read the Parquet directly. No proprietary client.

Delivery

REST + WebSocket

Real-time and historical queries. Dedicated API keys.

S3 Parquet

Daily Hive-partitioned files to your bucket.

DuckDB / Polars / Spark

Query the Parquet directly. No ingest pipeline.

Snowflake & custom

Snowflake share, SFTP, or other formats.

Book a call

See sample prediction market data and how it fits into your research workflow.

Prefer email? Reach us at founders@oddpool.com