Category | Description |
---|---|
Video Quality Specs | Characteristics including video resolution, frame rate, aspect ratio and duration (the length of video produced from a single prompt, or the ability to extend and append clips into longer videos) |
Prompt Adherence | Accuracy of a model's understanding of language (what the user has asked for in a text prompt) and its ability to interpret and closely follow specific instructions to produce a coherent, detailed scene in the video output; more complex prompts (e.g., exact descriptions of multiple events occurring in sequence) can confuse some models |
Realism & 3D/Temporal Consistency | Ability to accurately simulate and render the complex dynamics of the real world, such as specific types of motion, light and texture, and the states, behavior and interaction of objects in 3D space and over time; requires temporal consistency (coherent, smooth transitions between frames), 3D spatial consistency (realistic size, shape and orientation across frames relative to background and foreground) and continuity (persistent appearance of objects, characters and environments, even when they are occluded or leave the frame); some models struggle here, with characters, objects or scenes morphing unexpectedly between frames |
Controllability | Editing features that give users more precise control over video outputs, including masked editing to animate or modify selected (masked) areas of a video (e.g., inpainting to remove or replace objects or characters in frame, outpainting to extend the frame with context-relevant content, or adding motion to objects or characters in frame) and advanced settings such as custom seed numbers, upscaling, prompt weights and frame interpolation |
Multimodality | Integrated features that enable videos with sound, including speech or sound effects |
Safety | Safeguards to mitigate content harms, including adding visible or invisible watermarks to outputs and restricting models' ability to depict violence, sex, hate, celebrity/individual likenesses or IP |