Replay-grounded egocentric gameplay data

EgoCS-400K: An Egocentric Gameplay Dataset for World Models

Rongjin Guo*, Dong Liang*, Yuhao Liu+, Fang Liu, Tianyu Huang, Gerhard P. Hancke, and Rynson W. H. Lau

City University of Hong Kong

EgoCS-400K pairs clean first-person Counter-Strike gameplay videos with temporally aligned actions, keyboard and mouse signals, camera motion, player states, game events, captions, and prompts. The dataset is built from public professional CS:GO and CS2 demos, making each video segment traceable back to the replay timeline that generated it.

Dataset at a glance

Large-scale video-action-language trajectories for world models

The release organizes professional match replays into first-person player-view videos, protected action chains, action-safe segments, and per-tick state traces.

400K+ round-player videos
10K+ hours of gameplay
40K+ rounds
1K+ matches
13 CS:GO and CS2 maps
10 player viewpoints per round
6 annotation types
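
The headline counts are consistent with one another: rounds times viewpoints per round gives the number of round-player videos, and dividing total hours by that count gives a rough mean clip length. The check below treats the rounded "+" figures as exact, which they are not, so the result is only an order-of-magnitude estimate.

```python
# Rounded lower bounds copied from the figures above (assumed exact here).
rounds = 40_000
viewpoints_per_round = 10
total_hours = 10_000

# One video per player viewpoint per round.
videos = rounds * viewpoints_per_round

# Mean length of one round-player video, in seconds.
mean_seconds_per_video = total_hours * 3600 / videos

print(videos)                  # 400000
print(mean_seconds_per_video)  # 90.0
```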

Annotation Viewer

Inspect synchronized frames, actions, prompts, keyboard and mouse traces, and temporal segments in the interactive Hugging Face Space.

Overview

Replay-grounded supervision beyond video captions

Counter-Strike demos preserve executable human gameplay trajectories. EgoCS-400K uses this replay structure to align rendered first-person observations with controls, camera movement, states, events, and language supervision.

First-person observations

Clean rendered videos from 10 player viewpoints per round.

Dense action traces

Keyboard, mouse, weapon, movement, utility, and event signals.

Hierarchical segments

Player sequences, DP-selected segments, protected chains, and atomic actions.

Prior-guided captions

Segment and protected-chain captions constrained by replay-derived facts.

Overview of EgoCS-400K dataset hierarchy and annotation types

Construction pipeline

From public demos to synchronized multimodal annotations

The pipeline collects public professional match demos, renders first-person videos, filters invalid captures, parses per-tick signals, builds protected action segments, and generates prior-guided captions.

Construction pipeline of EgoCS-400K
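
This page does not spell out how clip boundaries are planned around protected action chains. One plausible reading is a dynamic program that covers a player sequence with clips of bounded length while never cutting inside a protected interval. The sketch below illustrates that idea only; it is not the authors' implementation. The function name `plan_segments`, the squared-deviation cost, and every parameter are assumptions made for the example.

```python
import math

def plan_segments(n_frames, protected, min_len, max_len, target_len):
    """Split [0, n_frames) into clips without cutting inside a protected interval.

    protected: list of (start, end) frame intervals; a cut at frame c is
    forbidden when start < c < end. Returns a list of (start, end) clips,
    or None when no valid segmentation exists.
    """
    forbidden = set()
    for start, end in protected:
        forbidden.update(range(start + 1, end))

    best = [math.inf] * (n_frames + 1)   # best[i]: min cost to cover [0, i)
    prev = [-1] * (n_frames + 1)         # prev[i]: start of the last clip
    best[0] = 0.0
    for i in range(1, n_frames + 1):
        if i in forbidden:
            continue                     # a boundary here would split a chain
        for length in range(min_len, max_len + 1):
            j = i - length
            if j < 0:
                break
            # Assumed cost: squared deviation from a preferred clip length.
            cost = best[j] + (length - target_len) ** 2
            if cost < best[i]:
                best[i] = cost
                prev[i] = j

    if math.isinf(best[n_frames]):
        return None
    clips, i = [], n_frames
    while i > 0:                         # walk back through the chosen cuts
        clips.append((prev[i], i))
        i = prev[i]
    return clips[::-1]

# Toy sequence: 12 frames with one protected chain spanning frames 3 to 6.
clips = plan_segments(12, [(3, 6)], min_len=3, max_len=5, target_len=4)
print(clips)
```

In this toy case the evenly spaced cut at frame 4 is rejected because it falls inside the protected chain, so the planner moves the first boundary to frame 3 instead.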

Annotation schema

Multi-level annotations share one replay timeline

Level             | Artifact                                     | Supervision
Tick state        | ticks.csv                                    | Controls, view angles, position, velocity, states
Atomic actions    | events.csv                                   | Fire, reload, switch, inspect, scope, grenade, crouch
Action timeline   | action.json, protected_action.json           | Frame-level actions and protected chains
Training segments | dp_segments.json                             | DP-planned clip boundaries and included actions
Captions          | segment_caption.json, protected_caption.json | Structured scene drafts and long prompts
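
Because every artifact is keyed to the same replay timeline, annotations from different levels can be joined on the tick number. The sketch below joins tick states with atomic actions using only the standard library. The column names (`tick`, `yaw`, `pitch`, `forward`, `event`) and the sample rows are hypothetical; the released `ticks.csv` and `events.csv` may use a different layout.

```python
import csv
import io

# Hypothetical file contents; the real column names may differ.
TICKS_CSV = """tick,yaw,pitch,forward
100,90.0,-2.0,1
101,90.5,-2.0,1
102,91.0,-1.5,0
"""
EVENTS_CSV = """tick,event
101,fire
102,reload
"""

def index_by_tick(text):
    """Map each tick number to the list of rows recorded at that tick."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows.setdefault(int(row["tick"]), []).append(row)
    return rows

ticks = index_by_tick(TICKS_CSV)
events = index_by_tick(EVENTS_CSV)

# Attach to each tick state the atomic actions that share its tick number.
timeline = [
    {
        "tick": t,
        "yaw": float(state[0]["yaw"]),
        "events": [e["event"] for e in events.get(t, [])],
    }
    for t, state in sorted(ticks.items())
]

for entry in timeline:
    print(entry)
```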

Qualitative example

Frames, controls, actions, and prompts in one temporal window

A four-second segment can expose sampled first-person frames, keyboard and mouse traces, action intervals, environment descriptions, and a video-generation prompt aligned to the same replay-derived timeline.

Qualitative EgoCS-400K segment with frames, input traces, actions, prompt, and environment description
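
Aligning frames with per-tick traces comes down to mapping each video frame index to a replay tick. The sketch below does this for a four-second window. The 30 fps video rate, the 64 ticks-per-second replay rate, the start tick, and the helper name `frame_to_tick` are all assumptions for illustration; the actual rates and alignment rule used by the dataset are not stated on this page.

```python
def frame_to_tick(frame_idx, start_tick, fps=30.0, tick_rate=64.0):
    """Return the replay tick nearest to a given video frame of the segment."""
    return start_tick + round(frame_idx * tick_rate / fps)

fps, tick_rate, seconds = 30.0, 64.0, 4.0   # assumed values
start_tick = 5000                            # hypothetical segment start

n_frames = int(seconds * fps)
ticks = [frame_to_tick(i, start_tick, fps, tick_rate) for i in range(n_frames)]

print(n_frames)             # 120
print(ticks[0], ticks[-1])  # 5000 5254
```

With more ticks than frames per second, some ticks are never the nearest match to any frame, so dense signals such as mouse deltas may need to be aggregated over the ticks between consecutive frames rather than sampled.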

Citation

BibTeX

Please cite the project if EgoCS-400K is useful for your research.

@misc{guo2026egocs400k,
  title={EgoCS-400K: An Egocentric Gameplay Dataset for World Models},
  author={Guo, Rongjin and Liang, Dong and Liu, Yuhao and Liu, Fang and Huang, Tianyu and Hancke, Gerhard P. and Lau, Rynson W. H.},
  year={2026},
  note={Project page: https://EgoCS-400K.github.io}
}

Contact

Questions about the dataset?

Contact Yuhao LIU at yuhaoliu7456@gmail.com.