Reka turns Counter-Strike 2 demos into a world-model training dataset

CS2-10k promises 10,000-plus hours of first-person video with synchronized controls, but the full Hugging Face upload is not live yet.

Why it matters

World models need action-linked video data, not just internet-scale clips. Reka's CS2-10k shows how AI labs are mining game replays for embodied training signals while the full corpus is still rolling out.

Reka (@RekaAILabs) has released CS2-10k, a Counter-Strike 2 dataset and rendering pipeline aimed at one of the harder bottlenecks in world-model research: pairing egocentric video with the exact actions that produced each frame.

https://x.com/RekaAILabs/status/2070245465937822007

The release is a concrete signal of where Dani Yogatama's team is steering Reka. The Sunnyvale AI lab emerged from stealth in 2023, founded by researchers from DeepMind, Google, Baidu and Meta. Yogatama, Cyprien de Masson d'Autume, Qi Liu and Yi Tay built the company around multimodal models for enterprise use cases, according to TechCrunch's 2023 profile. Reka's own homepage now frames its work more broadly as models and infrastructure for the physical AI era, including data infrastructure for egocentric video, robotics trajectories, world-model footage and expert judgment through Claru.

In a two-post thread on X, Reka said training world models requires synchronized egocentric video and dense action signals, and that such data is hard to find. CS2-10k is Reka's answer: a dataset built from Counter-Strike 2 professional match demos, with the company claiming more than 600,000 player-round videos, more than 10,000 hours of first-person footage and per-frame annotations covering keyboard state, mouse delta and 3D position.

The important qualifier is availability. The Hugging Face dataset card says the full dataset is still being uploaded and will appear over the coming days. As of June 25, 2026, the public Hugging Face page showed a browsable sample subset of three full matches, 748 rows and a 25.9 GB total file size. That makes CS2-10k an announced large-scale dataset with a live sample and open tooling, not yet a fully present 10,000-hour corpus on Hugging Face.

What Reka has made available matters beyond the headline size. The CS2-10k blog post says each clip is rendered at 720p and 48 fps from a single player's first-person perspective, with a matching parquet file aligned to the video timeline. The annotations include map, round number, team, frame count, field of view, active movement and action keys, mouse movement proxies, world position and camera yaw and pitch.
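The headline figures can be sanity-checked with simple arithmetic. The sketch below uses only the numbers Reka has stated (48 fps, more than 10,000 hours, more than 600,000 clips) and treats the "more than" claims as lower bounds, so the outputs are rough estimates rather than published statistics. The assumption that metadata row i describes video frame i is ours, made for illustration; the dataset card is the authority on the actual row layout.

```python
# Back-of-envelope scale check using only the figures Reka has stated.
# "More than" claims are treated as lower bounds, so results are approximate.

FPS = 48                 # frame rate given in the blog post
TOTAL_HOURS = 10_000     # claimed lower bound on footage
TOTAL_CLIPS = 600_000    # claimed lower bound on player-round videos


def average_clip_seconds(total_hours: float, total_clips: int) -> float:
    """Mean clip length implied by the two headline numbers."""
    return total_hours * 3600 / total_clips


def frame_to_seconds(frame_index: int, fps: int = FPS) -> float:
    """Timestamp of a metadata row, ASSUMING row i describes frame i (zero-based)."""
    return frame_index / fps


avg_seconds = average_clip_seconds(TOTAL_HOURS, TOTAL_CLIPS)
frames_per_clip = avg_seconds * FPS
total_frames = TOTAL_HOURS * 3600 * FPS

print(avg_seconds)       # 60.0
print(frames_per_clip)   # 2880.0
print(total_frames)      # 1728000000
```

On those assumptions, the average player-round clip runs about a minute and the full corpus would carry at least 1.7 billion annotated frames.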

That alignment is the point. A model trained only on video can learn visual dynamics. A model trained on video plus action signals can be asked a more useful question: given what the agent sees and the action it takes next, what should happen visually and spatially? Reka positions CS2-10k for action-conditioned video generation, egocentric navigation, long-horizon planning and multi-agent world modeling, where the same round can be viewed from multiple players' perspectives with shared map and round identifiers.
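A minimal sketch of what that pairing could look like in practice follows. It turns per-frame records into (observation, action, outcome) transitions, the basic unit an action-conditioned model trains on. The record fields and class names are hypothetical stand-ins for the annotations Reka describes (keys, mouse movement, world position); they are not the dataset's actual column names.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class FrameRecord:
    """One metadata row. Field names are hypothetical, not Reka's schema."""
    frame: int
    keys: Tuple[str, ...]                   # keys held during this frame
    mouse_dx: float
    mouse_dy: float
    position: Tuple[float, float, float]    # world-space x, y, z


@dataclass(frozen=True)
class Transition:
    """Action taken at a frame, plus the spatial change seen one frame later."""
    frame: int
    keys: Tuple[str, ...]
    mouse_delta: Tuple[float, float]
    displacement: Tuple[float, float, float]


def build_transitions(records: List[FrameRecord]) -> Iterator[Transition]:
    """Pair each frame with its successor, skipping non-consecutive frames."""
    for cur, nxt in zip(records, records[1:]):
        if nxt.frame != cur.frame + 1:
            continue  # a gap in the timeline would mislabel the outcome
        displacement = tuple(b - a for a, b in zip(cur.position, nxt.position))
        yield Transition(cur.frame, cur.keys, (cur.mouse_dx, cur.mouse_dy), displacement)


sample = [
    FrameRecord(0, ("W",), 0.0, 0.0, (0.0, 0.0, 0.0)),
    FrameRecord(1, ("W",), 2.0, -1.0, (4.0, 0.0, 0.0)),
    FrameRecord(2, ("W", "A"), 0.0, 0.0, (8.0, 1.0, 0.0)),
    FrameRecord(4, (), 0.0, 0.0, (9.0, 1.0, 0.0)),  # frame 3 is missing
]
pairs = list(build_transitions(sample))
print(len(pairs))  # 2
```

The gap check matters because a dropped frame would otherwise attribute two frames of movement to a single action.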

The Counter-Strike choice is pragmatic. Real-world embodied data is expensive to collect, especially when the researcher needs synchronized camera, controls and state. Pure synthetic data is easier to label but can lack behavioral variety. Counter-Strike 2 demos sit between those options: public professional match replays preserve human behavior, while the game's deterministic replay tooling lets Reka reconstruct clean first-person footage and recover the controls and player state behind it.

Reka is also releasing the cs2-dem-renderer GitHub repository, the pipeline it says it used to create the dataset. The repo describes a Linux-oriented renderer that converts Counter-Strike 2 .dem demo files into per-player-round videos with synchronized frame-level metadata. It parses player spawn and death intervals plus per-frame button inputs, launches Counter-Strike 2 through Steam with the demo loaded, streams raw frames to ffmpeg and writes .mp4 clips alongside parquet metadata.
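To illustrate the final encoding step, here is a hedged sketch of how raw frames are commonly piped into ffmpeg. It is not taken from the cs2-dem-renderer repository: the 1280x720 resolution is inferred from the stated 720p, and the rgb24 pixel format, libx264 codec and function names are assumptions chosen for the example. The ffmpeg options themselves are standard rawvideo input flags.

```python
from typing import List


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame (3 bytes per pixel for rgb24)."""
    return width * height * bytes_per_pixel


def ffmpeg_encode_command(
    output_path: str,
    width: int = 1280,           # ASSUMED: 720p taken to mean 1280x720
    height: int = 720,
    fps: int = 48,               # frame rate stated in the blog post
    pixel_format: str = "rgb24", # ASSUMED input pixel format
) -> List[str]:
    """Build an ffmpeg command that reads raw frames from stdin and writes .mp4."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",                     # input is headerless raw frames
        "-pixel_format", pixel_format,
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", "-",                            # read from standard input
        "-c:v", "libx264",                    # ASSUMED codec
        "-pix_fmt", "yuv420p",                # widely compatible output format
        output_path,
    ]


cmd = ffmpeg_encode_command("round_01_player_03.mp4")
frame_bytes = raw_frame_bytes(1280, 720)
bytes_per_second = frame_bytes * 48

print(frame_bytes)        # 2764800
print(bytes_per_second)   # 132710400
```

A caller would pass the command to subprocess.Popen with stdin set to subprocess.PIPE and write exactly frame_bytes per frame. At roughly 133 MB of raw pixels per second under these assumptions, streaming directly into the encoder avoids writing uncompressed frames to disk.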

That open pipeline is strategically useful for Reka. If researchers accept Counter-Strike as a useful substrate for embodied AI, Reka does not need to own every match, clip or annotation variant itself. By releasing the renderer, Reka gives labs a way to expand beyond the sample and the planned full corpus, while still anchoring the workflow around Reka's schema and tooling.

The licensing also narrows the use case. The Hugging Face card lists CS2-10k under CC BY-NC 4.0, with attribution and non-commercial restrictions, and notes that the underlying match demos remain the property of their respective rights holders. That is appropriate for research distribution, but it limits straightforward commercial use by companies training production systems.

CS2-10k also lands at a moment when AI labs are looking for cheaper routes into embodied and interactive data. Large language model training rewarded internet-scale text and code. World models need time, action and consequence. Reka's bet is that a tactical shooter replay can supply enough visual richness, human behavior and recoverable control signals to become useful infrastructure for that next training regime.

For a company that began by selling enterprise-grade multimodal assistants and custom models, the release reads less like a one-off dataset drop than a product breadcrumb. Reka is no longer presenting itself only as a model builder. It is trying to own part of the data and tooling layer beneath physical AI, where the scarce asset is not another benchmark score but synchronized experience at scale.
