
Google unveils Veo, a high-definition AI video generator that can rival Sora

Still images taken from videos generated by Google Veo.

Google / Benj Edwards

On Tuesday at Google I/O 2024, Google announced Veo, a new AI video synthesis model that can create HD videos from text, image or video prompts, similar to OpenAI’s Sora. It can generate 1080p videos over a minute long and edit videos from written instructions, but it has not yet been released for widespread use.

Veo reportedly includes the ability to edit existing videos using text commands, maintain visual consistency between frames, and generate video sequences of a minute or longer from a single prompt or from a series of prompts that together form a narrative. The company says it can generate detailed scenes and apply cinematic effects such as time lapses, aerial shots, and various visual styles.

Since the launch of DALL-E 2 in April 2022, we’ve seen a parade of new image and video synthesis models that aim to enable anyone who can type a written description to create a detailed image or video. Although neither technology is fully mature, both AI image and video generators are steadily becoming more capable.

Back in February, we previewed OpenAI’s Sora video generator, which many at the time believed represented the best AI video synthesis the industry had to offer. It impressed Tyler Perry enough that he put off expanding his movie studio. So far, however, OpenAI hasn’t made the tool public; instead, it has restricted access to a select group of testers.

Now, at first glance, Google’s Veo appears capable of Sora-like video generation. We haven’t tried it ourselves, so we can only go by the select demo videos the company has provided on its website, which means anyone viewing them should take Google’s claims with a huge grain of salt, because the results shown may not be typical.

Sample Veo videos include a cowboy riding a horse, a fast tracking shot down a suburban street, kebabs being grilled, a time-lapse of a sunflower opening, and more. Conspicuously absent are any detailed images of humans, which have historically been difficult for AI image and video models to generate without obvious deformities.

Google says Veo builds on the company’s previous video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere. To improve quality and efficiency, Google trained Veo on videos paired with more detailed captions, which helps the model interpret prompts more accurately, and the model works with compressed “latent” video representations.
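To make the “compressed latent” idea concrete, here is a minimal, generic sketch of how latent diffusion video models typically operate. Google has not published Veo’s architecture, so the encoder, tensor shapes, and text-conditioning scheme below are illustrative assumptions (written in PyTorch), not Veo’s actual design.

```python
# Generic illustration of latent-space video diffusion -- NOT Veo's real architecture.
import torch
import torch.nn as nn

frames, height, width = 16, 128, 128            # a short toy RGB clip
video = torch.randn(1, 3, frames, height, width)

# 1) Compress raw pixels into a much smaller "latent" tensor (8x spatial downsampling here).
encoder = nn.Conv3d(3, 8, kernel_size=(1, 8, 8), stride=(1, 8, 8))
latents = encoder(video)                        # shape: (1, 8, 16, 16, 16)

# 2) Diffusion happens in this latent space: blend in noise at some toy noise level...
noise = torch.randn_like(latents)
t = 0.5
noisy_latents = (1 - t) * latents + t * noise

# 3) ...and a denoiser, conditioned on a text-prompt embedding, predicts the noise to remove.
text_embedding = torch.randn(1, 8, 1, 1, 1)     # stand-in for a real text encoder's output
denoiser = nn.Conv3d(8, 8, kernel_size=3, padding=1)
predicted_noise = denoiser(noisy_latents + text_embedding)

print(video.numel(), "raw video values vs", latents.numel(), "latent values")
```

In this toy example the latent tensor is roughly 24 times smaller than the raw video, which is the kind of saving that makes diffusion over long, high-resolution clips computationally tractable.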

Veo also looks notable for supporting movie-making commands: “When given a video input and an editing command at the same time, such as adding kayaks to an aerial shot of a coastline, Veo can apply that command to the original video and create new, edited video,” the company says.

While the demos look impressive at first glance (especially compared to Will Smith eating spaghetti), Google admits that generating AI video is difficult. “Maintaining visual consistency can be challenging for video generation models,” the company wrote. “Characters, objects, or even entire scenes can blink, jump, or change unexpectedly between frames, disrupting the viewing experience.”

Google tried to mitigate these shortcomings with “advanced latent diffusion transformers”, which is basically meaningless marketing talk without specifics. But the company is confident enough in the model that it’s working with actor Donald Glover and his Gilga studio to create an AI-generated demo that will debut soon.

Veo will initially be available to select creators through VideoFX, a new experimental tool available on Google’s AI Test Kitchen website, labs.google. Creators can join a VideoFX waiting list to potentially gain access to Veo features in the coming weeks. Google plans to integrate some of Veo’s capabilities into YouTube Shorts and other products in the future.

No word yet on where Google got the training data for Veo (if we had to guess, YouTube is probably involved). But Google says it’s taking a “responsible” approach with Veo. According to the company, “Videos created by Veo are watermarked using SynthID, our cutting-edge AI-generated content watermarking and identification tool, and passed through safety filters and memorization checking processes that help mitigate privacy, copyright and bias risks.”
