A Very Big Video Reasoning Suite

We bet on a future in which video reasoning becomes the next fundamental intelligence paradigm after language reasoning, one where spatiotemporal, embodied experience of the world could be captured more naturally.

Data Engines

glass_refraction
Knowledge in-domain test set
A light ray enters glass from air. The glass refractive index is 1.62, and the incident angle is 64.5 degrees. Using Snell's law, predict how the light ray refracts as it enters the glass. Draw the refracted ray from the point where the incident ray hits the glass surface, extending to the image edge. Show the complete refracted ray path inside the glass.
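The expected answer for this sample follows from Snell's law, n1 sin(theta1) = n2 sin(theta2). The sketch below is an illustration, not the engine's own code. It assumes the refractive index of air is 1.0, which the prompt does not state, and measures angles from the surface normal. The helper name refraction_angle_deg is hypothetical.

```python
import math

def refraction_angle_deg(incident_deg: float, n1: float, n2: float) -> float:
    """Refraction angle (degrees from the normal) via Snell's law n1*sin(t1) = n2*sin(t2).

    Raises ValueError on total internal reflection, which can only occur when n1 > n2.
    """
    s = n1 * math.sin(math.radians(incident_deg)) / n2
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return math.degrees(math.asin(s))

# Values from the sample prompt; n = 1.0 for air is an assumption.
theta_t = refraction_angle_deg(64.5, 1.0, 1.62)
print(f"{theta_t:.1f}")  # 33.9: the ray bends toward the normal inside the glass
```

Because glass is optically denser than air, the refracted ray is always closer to the normal than the incident ray. The drawn ray only has to start at the hit point and continue at this angle to the image edge.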
select_next_figure_increasing_size_sequence
Abstraction in-domain test set
The scene has two separated areas: a top SEQUENCE area and a bottom CHOICES area. In the SEQUENCE area, the shapes are the same shape and the same color, and their sizes strictly increase from left to right. First identify the constant size step between consecutive sequence shapes, then select the one correct option (out of 4) in the CHOICES area that continues the same shape, color, and size-increase pattern. Circle the correct option and show the full process step by step.
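The selection rule in this prompt can be written as a short check: confirm the sizes grow by one constant step, extrapolate one more step, and pick the option that matches shape, color, and the extrapolated size. This is a minimal sketch under assumed inputs. The Figure record, integer sizes, and the name pick_next are hypothetical. Rendered frames would need a size tolerance rather than exact equality.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Figure:
    shape: str  # e.g. "circle" (hypothetical representation)
    color: str  # e.g. "red"
    size: int   # e.g. diameter in pixels

def pick_next(sequence: list[Figure], choices: list[Figure]) -> int:
    """Index of the choice that continues the sequence, or -1 if none does."""
    sizes = [f.size for f in sequence]
    steps = {b - a for a, b in zip(sizes, sizes[1:])}
    if len(steps) != 1:
        raise ValueError("sizes do not change by a single constant step")
    step = steps.pop()
    if step <= 0:
        raise ValueError("sizes do not strictly increase")
    last = sequence[-1]
    target = Figure(last.shape, last.color, last.size + step)
    return next((i for i, opt in enumerate(choices) if opt == target), -1)

seq = [Figure("circle", "red", 20), Figure("circle", "red", 30), Figure("circle", "red", 40)]
choices = [
    Figure("circle", "red", 45),   # wrong size
    Figure("square", "red", 50),   # wrong shape
    Figure("circle", "red", 50),   # continues the pattern
    Figure("circle", "blue", 50),  # wrong color
]
print(pick_next(seq, choices))  # 2
```

Each distractor differs from the target in exactly one attribute, so the check has to compare all three.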
grid_color_sequence
Spatiality training set
The scene shows a 10x10 grid with a green start point, a red end point, and colored cells (orange, yellow, and blue). A purple circular agent is positioned at the green start point. The agent can move to adjacent cells (up, down, left, right). Starting from the green start point, the agent must visit the colored cells in order (orange, then yellow, then blue), taking the shortest path between each consecutive pair of colored cells. The agent is allowed to pass through the red end point when visiting the colored cells if needed. After visiting all colored cells in sequence, the agent must reach the red end point, also following the shortest path.
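The route in this prompt is a chain of shortest-path legs (start to orange, orange to yellow, yellow to blue, blue to end) on a 4-connected 10x10 grid. The sketch below uses breadth-first search for each leg. The waypoint coordinates and helper names are hypothetical. The prompt mentions no obstacles, so every leg's length equals the Manhattan distance, and the red end point is treated as passable, as the prompt allows.

```python
from collections import deque

Cell = tuple[int, int]  # (row, col)

def bfs_path(start: Cell, goal: Cell, size: int = 10, blocked: frozenset = frozenset()) -> list[Cell]:
    """Shortest 4-connected path from start to goal, including both endpoints."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            in_grid = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_grid and nxt not in blocked and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    if goal not in prev:
        raise ValueError("goal unreachable")
    path = [goal]
    while prev[path[-1]] is not None:  # walk back to start
        path.append(prev[path[-1]])
    return path[::-1]

def route(waypoints: list[Cell], size: int = 10) -> list[Cell]:
    """Concatenate shortest legs between consecutive waypoints."""
    full = [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        full.extend(bfs_path(a, b, size)[1:])  # skip the shared endpoint
    return full

# Hypothetical layout: start, orange, yellow, blue, end.
waypoints = [(0, 0), (2, 3), (5, 5), (9, 1), (9, 9)]
path = route(waypoints)
print(len(path) - 1)  # 26 moves = 5 + 5 + 8 + 8
```

Several shortest paths usually exist for each leg. BFS returns one of them based on neighbor order, and the prompt does not specify a tie-break.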
multiple_occlusions_horizontal
Transformation training set
The scene shows 3 objects arranged horizontally on the right side of the frame, with a dark rectangular mask initially positioned on the left side. Move the mask horizontally to the right in a continuous motion until it leaves the frame. As it moves, the mask passes in front of the objects, temporarily blocking them from view.
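The occlusion behavior in this prompt reduces to a 1-D interval test on each frame. An object is hidden while its horizontal extent overlaps the mask's extent. The sketch below illustrates that test. The frame width, mask width, object extents, per-frame speed, and function names are all hypothetical. It also assumes the mask starts at the left edge and stops once its left edge passes the frame width.

```python
def occluded(mask_left, mask_width, objects):
    """Names of objects whose extent [x0, x1] overlaps the mask [mask_left, mask_left + mask_width]."""
    mask_right = mask_left + mask_width
    return {name for name, (x0, x1) in objects.items() if x0 < mask_right and mask_left < x1}

def sweep(frame_width, mask_width, objects, speed):
    """Yield (mask_left, hidden objects) per frame until the mask has left the frame."""
    x = 0.0
    while x < frame_width:
        yield x, occluded(x, mask_width, objects)
        x += speed

# Hypothetical layout: three objects on the right half of a 640-px-wide frame.
objects = {"A": (360, 400), "B": (450, 490), "C": (540, 580)}
frames = list(sweep(640, 80, objects, 40))
for x, hidden in frames:
    print(int(x), sorted(hidden))
```

Strict inequalities mean an object that only touches the mask's edge counts as visible. Depending on the mask width, neighboring objects can also be hidden at the same time.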
pigment_color_mixing_subtractive
Perception out-of-domain test set
The scene has two pigment colors positioned on the left and right sides, and a mixing zone marked by a black rectangular border in the center. In subtractive color mixing (pigment/paint mixing), when two pigments combine, convert RGB to CMY, add CMY components, then convert back: convert RGB to CMY (CMY = 255 - RGB), mix in CMY space (result_CMY = min(CMY1 + CMY2, 255)), convert back to RGB (RGB = 255 - CMY_result). First identify the RGB values of the left pigment (an RGB(69, 238, 140) colored pigment) and the right pigment (an RGB(47, 80, 187) colored pigment), then calculate the mixed color using the CMY conversion process. Fill the black-bordered mixing zone in the center with the resulting mixed color and show the full calculation process step by step.
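The three formulas in this prompt (CMY = 255 - RGB, clamped CMY addition, conversion back to RGB) translate directly into code. This sketch transcribes them. The function name mix_subtractive is hypothetical. The comments show the intermediate values for the sample pigments.

```python
def mix_subtractive(rgb1, rgb2):
    """Mix two pigments via CMY: CMY = 255 - RGB, sum clamped at 255, then back to RGB."""
    cmy1 = [255 - v for v in rgb1]  # (69, 238, 140) -> (186, 17, 115)
    cmy2 = [255 - v for v in rgb2]  # (47, 80, 187)  -> (208, 175, 68)
    mixed = [min(a + b, 255) for a, b in zip(cmy1, cmy2)]  # (394, 192, 183) -> (255, 192, 183)
    return tuple(255 - v for v in mixed)

print(mix_subtractive((69, 238, 140), (47, 80, 187)))  # (0, 63, 72)
```

The clamp matters for the cyan channel here (186 + 208 = 394), which saturates at 255 and drives red to 0 in the mixed color.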

Inference Results

Task Domains
Dot to Dot (Knowledge, in-domain test set)
Symmetry Completion (Abstraction, out-of-domain test set)
Grid Shortest Path (Spatiality, in-domain test set)
Rolling Ball (Transformation, in-domain test set)
Mark Second Largest Shape (Perception, out-of-domain test set)
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V
Seedance 2.0

Leaderboard

2026-04-28