A Very Big Video Reasoning Suite

We bet on a future in which video reasoning becomes the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

clock
Knowledge (in-domain test set)
The clock shows 10:20. Show what the clock will look like after 6 hours.
[Figure: first frame and last frame]
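The expected last frame for the clock prompt follows from modular arithmetic on a 12-hour dial. The sketch below is only an illustration of that reasoning, not part of the benchmark; the function names `advance_clock` and `hand_angles` are hypothetical.

```python
def advance_clock(hour: int, minute: int, delta_hours: int) -> tuple[int, int]:
    """Return the 12-hour dial reading after adding whole hours."""
    new_hour = (hour % 12 + delta_hours) % 12
    # A dial shows 12 where the arithmetic gives 0.
    return (12 if new_hour == 0 else new_hour, minute)


def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Angles in degrees, clockwise from 12, for the hour and minute hands."""
    minute_angle = minute * 6.0                      # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 30 degrees per hour, plus drift
    return hour_angle, minute_angle


print(advance_clock(10, 20, 6))  # (4, 20)
print(hand_angles(4, 20))        # (130.0, 120.0)
```

Under these assumptions, the last frame should show 4:20, with the minute hand unchanged and the hour hand moved half a turn.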
select_next_figure_increasing_size_sequence
Abstraction (in-domain test set)
The scene has two separated areas: a top SEQUENCE area and a bottom CHOICES area. In the SEQUENCE area, the shapes are the same shape and the same color, and their sizes strictly increase from left to right. First identify the constant size step between consecutive sequence shapes, then select the one correct option (out of 4) in the CHOICES area that continues the same shape, color, and size-increase pattern. Circle the correct option and show the full process step by step.
[Figure: first frame and last frame]
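The selection rule in this prompt can be sketched as follows: recover the constant step from the sequence, extend it by one term, and look that size up among the choices. The sizes below are made-up illustrative values, and `next_size_choice` is a hypothetical helper rather than anything from the benchmark code.

```python
def next_size_choice(sequence: list[int], choices: list[int]) -> int:
    """Return the index of the choice that continues a constant-step size sequence."""
    steps = {b - a for a, b in zip(sequence, sequence[1:])}
    if len(steps) != 1:
        raise ValueError("sequence does not have a constant step")
    target = sequence[-1] + steps.pop()
    # list.index raises ValueError if no choice matches the target size.
    return choices.index(target)


# Hypothetical sizes: the step is 10, so the next size is 50.
print(next_size_choice([20, 30, 40], [45, 50, 60, 35]))  # 1
```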
grid_number_sequence
Spatiality (in-domain test set)
The scene shows a 10x10 grid with a green start point, a red end point, and yellow cells marked with numbers 1, 2, and 3. An orange circular agent is positioned at the green start point. The agent can move to adjacent cells (up, down, left, right). Starting from the green start point, the agent must visit the numbered yellow cells in numerical order (1, then 2, then 3), taking the shortest path between each consecutive pair of numbered cells. The agent is allowed to pass through the red end point when visiting the numbered cells if needed. After visiting all numbered cells in sequence, the agent must reach the red end point, also following the shortest path.
[Figure: first frame and last frame]
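A reference route for this prompt can be sketched by chaining shortest legs between consecutive stops. The sketch assumes an obstacle-free grid, which the prompt implies but does not state outright; in that case any path that only moves toward the goal is shortest. The coordinates and the helpers `shortest_leg` and `full_route` are hypothetical.

```python
Cell = tuple[int, int]  # (row, column)


def shortest_leg(start: Cell, goal: Cell) -> list[Cell]:
    """One shortest 4-connected path on an obstacle-free grid (columns first, then rows)."""
    (r, c), (gr, gc) = start, goal
    path = [(r, c)]
    while c != gc:
        c += 1 if gc > c else -1
        path.append((r, c))
    while r != gr:
        r += 1 if gr > r else -1
        path.append((r, c))
    return path


def full_route(start: Cell, waypoints: list[Cell], end: Cell) -> list[Cell]:
    """Visit the waypoints in order, then finish at the end cell."""
    stops = [start, *waypoints, end]
    route = [start]
    for a, b in zip(stops, stops[1:]):
        route.extend(shortest_leg(a, b)[1:])  # skip the repeated joint cell
    return route


# Hypothetical layout on a 10x10 grid: start, cells 1-3, end.
route = full_route((0, 0), [(2, 3), (5, 1), (7, 7)], (9, 9))
print(len(route) - 1)  # 22 moves
```

The total move count is the sum of the Manhattan distances between consecutive stops, here 5 + 5 + 8 + 4.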
add_borders_to_unbordered_shapes
Transformation (out-of-domain test set)
Several shapes are shown; some have black borders and some do not. Add a thin black border to every shape that does not already have one. Do not change anything else.
[Figure: first frame and last frame]
mark_wave_peaks
Perception (out-of-domain test set)
The scene shows a continuous wave on a white background. Find all peaks (local maxima: each point where the wave value is greater than both immediate neighbors). Circle each peak with a red hollow outline and a solid red dot at its center, from left to right one by one, and show the solution step by step.
[Figure: first frame and last frame]
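The peak definition in this prompt (a value strictly greater than both immediate neighbors) translates directly into a scan over sampled wave values. The following is a minimal sketch on made-up samples; `find_peaks` here is a hypothetical helper, not the SciPy function of the same name.

```python
def find_peaks(values: list[float]) -> list[int]:
    """Indices of strict local maxima, in left-to-right order."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Hypothetical samples; the flat pair (4, 4) is not a peak under the strict rule.
samples = [0, 2, 1, 3, 5, 4, 4, 6, 2]
print(find_peaks(samples))  # [1, 4, 7]
```

Endpoints are never reported because they have only one neighbor, which matches the "both immediate neighbors" wording.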

Inference Results

Task Domains
Circle Central Dot: Knowledge (out-of-domain test set)
Shape Outline Fill: Abstraction (in-domain test set)
Grid Avoid Obstacles: Spatiality (in-domain test set)
Separate Objects (Spinning): Transformation (in-domain test set)
Find Incorrect Arrow: Perception (out-of-domain test set)
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Categories: Reference, Strong Baseline, Proprietary, Open-source

Rank  Family        Model                 Score
#1    Human         Human                 97.4%
#2    VBVR          VBVR-Wan2.2           68.5%
#3    Sora 2        Sora 2                54.6%
#4    Veo 3.1       Veo 3.1               48.0%
#5    Runway        Runway Gen-4 Turbo    40.3%
#6    Wan2.2        Wan2.2-I2V-A14B       37.1%
#7    Kling         Kling 2.6             36.9%
#8    LTX-2         LTX-2                 31.3%
#9    CogVideoX     CogVideoX1.5-5B-I2V   27.3%
#9    HunyuanVideo  HunyuanVideo-I2V      27.3%
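The ranks on the leaderboard, including the shared #9 for the two entries at 27.3%, are consistent with standard competition ranking, in which tied scores share the best position. That scheme is an assumption; the page does not say how ranks are assigned. A minimal sketch using the scores listed above, with the hypothetical helper `competition_rank`:

```python
def competition_rank(scores: dict[str, float]) -> dict[str, int]:
    """Rank by descending score; entries with equal scores share a rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks: dict[str, int] = {}
    for position, (name, score) in enumerate(ordered, start=1):
        if position > 1 and score == ordered[position - 2][1]:
            # Same score as the previous entry: reuse its rank.
            ranks[name] = ranks[ordered[position - 2][0]]
        else:
            ranks[name] = position
    return ranks


# Scores (percent) as listed on the leaderboard.
scores = {
    "Human": 97.4,
    "VBVR-Wan2.2": 68.5,
    "Sora 2": 54.6,
    "Veo 3.1": 48.0,
    "Runway Gen-4 Turbo": 40.3,
    "Wan2.2-I2V-A14B": 37.1,
    "Kling 2.6": 36.9,
    "LTX-2": 31.3,
    "CogVideoX1.5-5B-I2V": 27.3,
    "HunyuanVideo-I2V": 27.3,
}
ranks = competition_rank(scores)
print(ranks["CogVideoX1.5-5B-I2V"], ranks["HunyuanVideo-I2V"])  # 9 9
```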