A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied world experience can be captured more naturally.

Data Engines

glass_refraction

Knowledge in-domain test set

Prompt

A light ray enters glass from air. The glass refractive index is 1.62, and the incident angle is 64.5 degrees. Using Snell's law, predict how the light ray refracts as it enters the glass. Draw the refracted ray from the point where the incident ray hits the glass surface, extending to the image edge. Show the complete refracted ray path inside the glass.
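
The refraction angle this prompt asks for follows from Snell's law, n1 * sin(theta1) = n2 * sin(theta2). The sketch below is only an illustration of that calculation, not the data engine's own code: the function name `refraction_angle_deg` is hypothetical, the refractive index of air is assumed to be 1.0, and angles are assumed to be measured from the surface normal.

```python
import math

def refraction_angle_deg(incident_deg, n_in=1.0, n_out=1.62):
    """Return the refraction angle in degrees, measured from the surface normal.

    Assumes n_in = 1.0 for air; n_out = 1.62 is the glass index from the prompt.
    """
    s = n_in * math.sin(math.radians(incident_deg)) / n_out
    if abs(s) > 1.0:
        # No refracted ray exists (total internal reflection).
        raise ValueError("no refracted ray: total internal reflection")
    return math.degrees(math.asin(s))

theta_t = refraction_angle_deg(64.5)
print(round(theta_t, 1))  # roughly 33.9 degrees
```

Because the ray passes into the denser medium, it bends toward the normal, so the refracted angle is smaller than the 64.5 degree incident angle.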

First Frame

Last Frame

Video

select_next_figure_increasing_size_sequence

Abstraction in-domain test set

Prompt

The scene has two separated areas: a top SEQUENCE area and a bottom CHOICES area. In the SEQUENCE area, the shapes are the same shape and the same color, and their sizes strictly increase from left to right. First identify the constant size step between consecutive sequence shapes, then select the one correct option (out of 4) in the CHOICES area that continues the same shape, color, and size-increase pattern. Circle the correct option and show the full process step by step.
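
The reasoning this prompt describes (find the constant size step, then pick the option that continues it) can be sketched as follows. The function names and all size values are hypothetical illustrations; the prompt does not state the actual sizes used in the scene.

```python
def next_size(sequence):
    """Return the next size, assuming a constant step between consecutive sizes."""
    steps = {b - a for a, b in zip(sequence, sequence[1:])}
    if len(steps) != 1:
        raise ValueError("sequence does not have a constant step")
    return sequence[-1] + steps.pop()

def pick_choice(sequence, choices):
    """Return the index of the choice whose size is closest to the expected next size."""
    target = next_size(sequence)
    return min(range(len(choices)), key=lambda i: abs(choices[i] - target))

# Hypothetical sizes (in pixels) for the SEQUENCE and CHOICES areas.
sequence_sizes = [20, 30, 40]
choice_sizes = [35, 50, 45, 60]
print(next_size(sequence_sizes))                  # 50
print(pick_choice(sequence_sizes, choice_sizes))  # 1
```

Matching on the closest size rather than exact equality is a design choice made here so the sketch tolerates small measurement error when sizes are read from pixels.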

First Frame

Last Frame

Video

grid_color_sequence

Spatiality training set

Prompt

The scene shows a 10x10 grid with a green start point, a red end point, and colored cells (orange, yellow, and blue). A purple circular agent is positioned at the green start point. The agent can move to adjacent cells (up, down, left, right). Starting from the green start point, the agent must visit the colored cells in order (orange, then yellow, then blue), taking the shortest path between each consecutive pair of colored cells. The agent is allowed to pass through the red end point when visiting the colored cells if needed. After visiting all colored cells in sequence, the agent must reach the red end point, also following the shortest path.
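
A minimal sketch of the route this prompt describes, assuming an open grid with no blocked cells (the prompt mentions none) and using breadth-first search for each leg. The cell coordinates and the function names `bfs_path` and `route` are hypothetical; they are not taken from the data engine.

```python
from collections import deque

def bfs_path(start, goal, size=10):
    """Shortest 4-neighbour path between two cells on an open size x size grid."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    # Walk back from the goal to the start to recover the path.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def route(waypoints, size=10):
    """Concatenate shortest legs through the waypoints in the given order."""
    full = [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        full.extend(bfs_path(a, b, size)[1:])  # skip the shared joint cell
    return full

# Hypothetical (row, col) positions: start, orange, yellow, blue, end.
waypoints = [(0, 0), (2, 3), (5, 1), (7, 7), (9, 9)]
path = route(waypoints)
print(len(path) - 1)  # 22 moves in total
```

On an open grid each leg's length equals the Manhattan distance between its endpoints, so the total here is 5 + 5 + 8 + 4 = 22 moves. Breadth-first search is used instead of the closed form so the same sketch would still work if blocked cells were added.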

First Frame

Last Frame

Video

multiple_occlusions_horizontal

Transformation training set

Prompt

The scene shows 3 objects arranged horizontally on the right side of the frame, with a dark rectangular mask initially positioned on the left side. Move the mask horizontally to the right in a continuous motion until it leaves the frame. As it moves, the mask passes in front of the objects, temporarily blocking them from view.

First Frame

Last Frame

Video

pigment_color_mixing_subtractive

Perception out-of-domain test set

Prompt

The scene has two pigment colors positioned on the left and right sides, and a mixing zone marked by a black rectangular border in the center. In subtractive color mixing (pigment/paint mixing), when two pigments combine, convert RGB to CMY, add CMY components, then convert back: convert RGB to CMY (CMY = 255 - RGB), mix in CMY space (result_CMY = min(CMY1 + CMY2, 255)), convert back to RGB (RGB = 255 - CMY_result). First identify the RGB values of the left pigment (an RGB(69, 238, 140) colored pigment) and the right pigment (an RGB(47, 80, 187) colored pigment), then calculate the mixed color using the CMY conversion process. Fill the black-bordered mixing zone in the center with the resulting mixed color and show the full calculation process step by step.
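
The calculation stated in this prompt can be worked through directly. The sketch below follows the three steps exactly as the prompt gives them (RGB to CMY, clamped addition, CMY back to RGB); the function name `mix_subtractive` is a hypothetical label, not part of the data engine.

```python
def mix_subtractive(rgb_a, rgb_b):
    """Mix two pigments using the CMY rule given in the prompt."""
    cmy_a = [255 - v for v in rgb_a]  # RGB -> CMY
    cmy_b = [255 - v for v in rgb_b]
    mixed_cmy = [min(a + b, 255) for a, b in zip(cmy_a, cmy_b)]  # clamp at 255
    return tuple(255 - v for v in mixed_cmy)  # CMY -> RGB

mixed = mix_subtractive((69, 238, 140), (47, 80, 187))
print(mixed)  # (0, 63, 72)
```

Step by step: the left pigment becomes CMY (186, 17, 115) and the right pigment CMY (208, 175, 68). Their sum (394, 192, 183) is clamped to (255, 192, 183), which converts back to RGB (0, 63, 72).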

First Frame

Last Frame

Video

Inference Results

Dot to Dot

Knowledge in-domain test set

Symmetry Completion

Abstraction out-of-domain test set

Grid Shortest Path

Spatiality in-domain test set

Rolling Ball

Transformation in-domain test set

Mark Second Largest Shape

Perception out-of-domain test set

Prompt

Ground Truth

First

Final

Model Outputs

VBVR-Wan2.2

CogVideoX 1.5

Kling 2.6

LTX-2

Runway Gen-4

Sora 2

Veo 3

Wan 2.2 I2V

Hunyuan I2V

Seedance 2.0

Prompt

Ground Truth

Model Outputs

VBVR-BAGEL

BAGEL

SenseNova-U1

VBVR-ThinkMorph

ThinkMorph

GPT Image 2

Nano Banana

Leaderboard
