Head to head: Bernini-R Edit Video vs Marey Realism V1.5
One model understood the assignment; the other mostly delivered good-looking detours. Across both tests, Bernini-R Edit Video was the clearer, more disciplined editor, winning on prompt fidelity, occlusion logic, and shot continuity.
Bernini-R Edit Video wins this matchup cleanly, and the score gap, **17.0 to 11.3**, feels earned rather than inflated. This wasn’t a case of one model being a little sharper or a little prettier. It was a case of one model repeatedly preserving the actual shot design, while the other drifted into adjacent imagery that looked nice but stopped obeying the brief.

The strongest evidence is the **teal violinist behind timetable** test. Bernini-R Edit Video nails the essentials: the **tram concourse** reads correctly, the **teal coat and mustard scarf** are there, the **arrivals timetable** functions as a real occluding object, and the subject’s **re-emergence stays continuous** instead of feeling like a reset. Marey Realism V1.5 produces an attractive image, but it misses the point of the shot: the scene skews toward a more static **outdoor street view**, and the core occlusion-and-tracking behavior never really locks in.

The same pattern shows up in **arcade awning cloud shift**. Bernini-R Edit Video better understands the requested setup: a **courier on a silver folding bike**, moving through a **food-court sidewalk with steaming stalls**, in a **close backward tracking shot** that remains spatially coherent. Crowd motion and lighting variation also stay plausible. Marey Realism V1.5 gets some surface cues right, especially the awnings and moody cloud cover, but the camera logic wanders, the rider **drops out of frame**, and the whole thing feels less like the specified **dumpling-stall arcade** and more like a stylish approximation.

What separates these two models is discipline. Bernini-R Edit Video is better at maintaining **identity, object permanence, and shot continuity** under prompt constraints. Marey Realism V1.5 can generate appealing frames, but too often it substitutes vibe for execution. In an edit-video context, that’s not a minor flaw; it’s the whole game.

**Final call: Bernini-R Edit Video is the decisive winner. It follows the brief, preserves motion logic, and delivers the actual shot instead of a good-looking near miss.**
Teal violinist behind timetable
Object permanence & occlusion — In a busy tram concourse at dusk, a young busker in a teal raincoat and mustard scarf plays a scratched violin while walking sideways through the crowd; the camera makes a smooth shoulder-height lateral track to follow her as she passes fully behind a freestanding amber-lit arrivals timetable for about two seconds, then re-emerges still playing with the same coat, scarf, violin, bow hand, stride, and position relative to nearby commuters, with no warping or identity change through the occlusion; wet stone floor reflections, cool station fluorescents mixed with warm kiosk light, lively but slightly wistful mood, one continuous shot, 16:9
Model A (Bernini-R Edit Video) matches the tram concourse setting, teal coat/mustard scarf, arrivals timetable occlusion, and continuous re-emergence much better, with strong identity and object consistency. Model B (Marey Realism V1.5) is visually attractive but misses the key occlusion/tracking setup and feels more like a static outdoor street scene than a busy tram concourse shot.
Arcade awning cloud shift
Lighting transition — Outside the Marlin Arcade food court at noon, a courier on a silver folding bike glides along a crowded sidewalk lined with steaming dumpling stalls and newspaper boxes as the camera performs a slow close tracking move backward just ahead of him under a striped green awning; over the course of the shot a dense cloud slides across the sun, causing the bright hard light to soften into muted cool shade and then partly return, with the change rippling smoothly across faces, chrome handlebars, shop windows, and pavement reflections while the crowd keeps moving naturally; bustling urban mood with a brief hush in the light, one continuous shot, 16:9
Model A (Bernini-R Edit Video) better matches the prompt with a courier on a silver folding bike moving through a food-court sidewalk lined with steaming stalls, and it preserves a coherent continuous tracking setup with plausible crowd motion and lighting variation. Model B (Marey Realism V1.5) has strong awnings and dramatic cloud cover, but the camera angle and action diverge from the specified close backward tracking shot, the rider exits frame, and the scene feels less like the described dumpling-stall arcade.
Matchup powered by OpenRouter.