AI lab leaderboard
Anthropic, OpenAI, Google DeepMind, Meta, xAI, Mistral, the Chinese labs — predict who ships what, when, and where it lands on the benchmarks. All in fictional GutCall coins.

How GutCall models the AI race
AI-lab challenges on GutCall resolve against public, citable signals: a model card published by the lab, a benchmark score on a major public eval (MMLU, GPQA, SWE-Bench, ARC-AGI), an official pricing-page update, or a court-docketed deal. The challenge spells out which signal settles it before you stake.
We don't try to score "which lab is best" with a single number, because that kind of claim ages badly. Instead, every prediction is concrete: "Will Claude beat GPT on SWE-Bench Verified by Q3?" or "Will the next major model from lab X cost less per million tokens than the current model on launch day?" Each one is falsifiable and resolved by a public source.
The community's stake distribution becomes the live odds. If you think the room is wrong about a lab's release cadence, you stake on the underdog side and explain your reasoning in the comments. Winners are paid out of the losing side's stakes after the standard platform fee, in fictional coins and never in cash.
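To make the odds mechanic concrete, here is a minimal sketch of how a stake distribution could translate into live odds and payouts. It assumes a pool-style split in which winners get their stake back plus a pro-rata share of the losing pool. The 5% fee, the function names, and the pro-rata rule are illustrative assumptions, not GutCall's published formula.

```python
# Hypothetical fee rate: the page only mentions a "standard platform fee".
PLATFORM_FEE = 0.05


def implied_odds(stakes: dict[str, float]) -> dict[str, float]:
    """Share of all staked coins on each side, read as the room's implied probability."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("no coins staked yet")
    return {side: amount / total for side, amount in stakes.items()}


def payout(stakes: dict[str, float], winner: str, my_stake: float,
           fee: float = PLATFORM_FEE) -> float:
    """Coins returned for a stake on the winning side.

    Assumed rule: the stake comes back, plus a pro-rata share of the
    losing pool once the fee has been taken out of that pool.
    """
    winning_pool = stakes[winner]
    losing_pool = sum(amount for side, amount in stakes.items() if side != winner)
    distributable = losing_pool * (1 - fee)
    return my_stake + distributable * (my_stake / winning_pool)


if __name__ == "__main__":
    pools = {"yes": 750.0, "no": 250.0}  # fictional coins staked per side
    print(implied_odds(pools))
    print(payout(pools, winner="no", my_stake=100.0))
```

Under these assumptions, a 100-coin stake on the underdog side of a 750/250 split returns roughly 385 coins if the underdog wins, which is why disagreeing with the room pays more when you turn out to be right.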
What you can predict in the AI category
Release dates
Will lab X ship a new flagship model in a given quarter? Will a teased preview become generally available before a specified date?
Benchmark scores
Will model Y beat model Z on a named public benchmark? Each challenge points at a single eval suite and a single named version of each model.
Capability claims
When a lab publishes a new capability claim (agent autonomy, multimodality, context length), GutCall opens a challenge on whether independent reproductions confirm it within N weeks.
Pricing moves
Token pricing for production models changes in discrete steps, usually downward and occasionally upward. Challenges resolve on the published pricing-page update.
Market leadership
Which lab's model is cited most often in third-party developer surveys at the end of a season? Resolution sources are major dev-tool dashboards.
Closed-loop, in-game coins
Coins are fictional. They cannot be cashed out, transferred, or exchanged for prizes. The game rewards good calls with cosmetics and badges, not money.
AI leaderboard FAQ
Which labs are covered?
All major frontier labs that publish public model cards or benchmark results — Anthropic, OpenAI, Google DeepMind, Meta, xAI, Mistral, Alibaba, DeepSeek, plus any other lab releasing a model in the relevant period. Challenges name the specific lab and model up front.
How are benchmark-score challenges resolved?
The challenge specifies the benchmark suite, the version, and the public scoreboard or paper that settles it. GutCall reads from that named source after the resolution deadline. Disagreements between labs' self-reported scores and independent reproductions are handled by the dispute process.
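As a rough illustration of that flow, the sketch below resolves a "will model Y beat model Z" challenge from a single named scoreboard. The `Outcome` values, the function name, and the rules that a tie counts as "no" and a missing entry goes to dispute are assumptions made for the example, not documented GutCall behavior. The model names and scores are placeholders.

```python
from enum import Enum


class Outcome(Enum):
    YES = "yes"
    NO = "no"
    DISPUTE = "dispute"


def resolve_benchmark(scoreboard: dict[str, float], model_y: str, model_z: str) -> Outcome:
    """Resolve 'will model_y beat model_z' from one named public source.

    Only exact version strings count. If either named version is absent
    from the source, the sketch hands the case to dispute (an assumption).
    """
    if model_y not in scoreboard or model_z not in scoreboard:
        return Outcome.DISPUTE
    # "Beat" is read strictly here, so a tie resolves as NO (an assumption).
    return Outcome.YES if scoreboard[model_y] > scoreboard[model_z] else Outcome.NO


if __name__ == "__main__":
    # Placeholder scores keyed by hypothetical model version names.
    board = {"model-y-v1": 61.5, "model-z-v2": 58.0}
    print(resolve_benchmark(board, "model-y-v1", "model-z-v2"))
    print(resolve_benchmark(board, "model-y-v1", "model-z-v3"))
```

Keying the lookup on exact version strings mirrors the one-suite, one-version rule: a score reported for a different version of the same model never settles the challenge.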
What if a lab silently changes a model behind an API?
Challenges name a model version (e.g. "Claude Opus 4.7"). If a lab rebrands or quietly swaps the underlying model, the challenge resolves on the named version — verified through release notes or model cards. Ambiguous cases enter dispute and may void.
Can I create my own AI-lab challenge?
Paid Creator and Pro plans unlock the authoring suite. The AI template asks you to specify the lab, the model, the benchmark or claim, and the public resolution source — keeping every authored challenge auditable.
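One way to picture the template is as a small record in which every field must be filled in before a challenge can be published. The class name, field names, and validation rule below are hypothetical and only mirror the four items the template asks for, plus a resolution deadline. This is a sketch, not the actual authoring suite.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AIChallenge:
    lab: str
    model_version: str
    claim: str               # the benchmark comparison or capability claim
    resolution_source: str   # the public source that settles the challenge
    deadline: date           # resolution deadline

    def missing_fields(self) -> list[str]:
        """Names of required text fields that were left blank."""
        required = ("lab", "model_version", "claim", "resolution_source")
        return [name for name in required if not getattr(self, name).strip()]


if __name__ == "__main__":
    draft = AIChallenge(
        lab="Lab X",
        model_version="",  # left blank on purpose
        claim="Scores higher than model-z-v2 on a named public eval",
        resolution_source="Public scoreboard named in the challenge",
        deadline=date(2030, 1, 1),
    )
    print(draft.missing_fields())
```

A draft with an empty model version or resolution source would be flagged before publishing, which is the property that keeps authored challenges auditable.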
Is this a real betting market on AI outcomes?
No. GutCall coins are fictional: they have no cash value, can't be cashed out, and can't be redeemed for prizes. The AI leaderboard is a prediction game for entertainment, not a betting market or an investment product.
Think you can read the lab race better than the room?
Free signup. Starter coins on us. No card, no wallet, no real money in play.