Google Veo, a sea change in AI-generated video, debuts at Google I/O 2024

Google is taking aim at OpenAI’s Sora with Veo, an AI model that can create 1080p videos roughly a minute long from a text prompt.

Unveiled Tuesday at Google’s I/O 2024 developer conference, Veo can capture a variety of visual and cinematic styles, including landscape and time-lapse shots, and make edits and corrections to already generated footage.

“We’re exploring features like storyboarding and generating longer scenes to see what Veo can do,” Demis Hassabis, head of Google’s AI R&D lab DeepMind, told reporters during a virtual roundtable. “We’ve made incredible progress in video.”

Image Credits: Google

Veo builds on Google’s preliminary commercial work in video generation, previewed in April, which used the company’s Imagen 2 family of image generation models to create looping video clips.

But unlike the Imagen 2-based tool, which could only create low-resolution, several-second clips, Veo appears to be competitive with today’s flagship video generation models — not just Sora, but also models from startups like Pika, Runway and Irreverent Labs.

At the briefing, Douglas Eck, who leads research efforts at DeepMind in the area of generative media, showed me some select examples of what Veo can do. One in particular – an aerial view of a busy beach – demonstrated Veo’s strengths over rival video models, he said.

“Detailing all the swimmers on the beach proved difficult for both image and video generation models – having so many moving characters,” he said. “If you look closely, the surf looks pretty good. And the meaning of the prompt word ‘vibrant’ is, I’d say, captured with all the people – the lively beach, full of sunbathers.”

Image Credits: Google

Veo was trained on a great deal of footage. This is generally how generative AI models work: fed example after example of some form of data, the models pick up patterns in that data that allow them to generate new data – videos, in Veo’s case.
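To make that concrete, here’s a toy sketch of that pattern-learning loop in PyTorch. It is emphatically not Veo’s architecture – Google hasn’t disclosed those details – just the generic recipe every generative model follows: guess, measure the error, adjust, repeat.

```python
# A toy illustration of how generative models "pick up patterns":
# show the model (current frame, next frame) pairs and nudge its
# weights so its predictions drift closer to the real next frame.
# This is NOT Veo's architecture -- just the generic training loop.
import torch
import torch.nn as nn

# Stand-in "video": 1,000 random 16x16 grayscale frames.
frames = torch.rand(1000, 1, 16, 16)

model = nn.Sequential(  # a deliberately tiny next-frame predictor
    nn.Flatten(),
    nn.Linear(16 * 16, 256),
    nn.ReLU(),
    nn.Linear(256, 16 * 16),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    # Sample random (frame, next frame) pairs from the data.
    idx = torch.randint(0, len(frames) - 1, (32,))
    current, target = frames[idx], frames[idx + 1].flatten(1)

    prediction = model(current)
    loss = loss_fn(prediction, target)  # how wrong was the guess?

    optimizer.zero_grad()
    loss.backward()  # learn from the mistake
    optimizer.step()
```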

Where does Veo’s training footage come from? Eck wouldn’t say exactly, but he admitted that some of it may have been sourced from Google’s own YouTube.

“Google models may be trained on certain YouTube content, but always in accordance with our agreement with YouTube creators,” he said.

The “agreement” part may technically be true. But it’s also true that, given YouTube’s network effects, creators don’t have much choice but to play by Google’s rules if they hope to reach the widest possible audience.

Image Credits: Google

A report by The New York Times in April revealed that Google expanded its terms of service last year in part to allow the company to use more data to train its AI models. Under the old ToS, it was unclear whether Google could use YouTube data to build products outside of the video platform. Not so under the new terms, which loosen the reins considerably.

Google is far from the only tech giant using massive amounts of user data to train internal models. (See: Meta.) But what’s sure to upset some creators is Eck’s insistence that Google sets the “gold standard” here in terms of ethics.

“The solution to this [training data] challenge will be found by bringing all the stakeholders together to figure out what the next steps are,” he said. “Until we take those steps with the stakeholders — we’re talking about the film industry, the music industry, the artists themselves — we’re not going to move quickly.”

Yet Google has already made Veo available to select creatives, including Donald Glover (aka Childish Gambino) and his creative agency Gilga. (Like OpenAI with Sora, Google is positioning Veo as a tool for advertising.)

Eck noted that Google provides tools that allow webmasters to prevent the company’s bots from scraping training data from their websites. But those settings don’t apply to YouTube. And Google, unlike some of its rivals, doesn’t offer a mechanism to let creators remove their work from its training data sets after the fact.

I also asked Eck about regurgitation, which in the generative AI context refers to when a model generates a mirror copy of a training example. Tools like Midjourney have been found to regurgitate exact stills from movies including Dune, The Avengers and Star Wars when given a timestamp – posing a potential legal minefield for users. OpenAI has reportedly gone so far as to block trademarks and creator names from Sora prompts to try to deflect copyright challenges.

So what steps has Google taken to reduce the risk of regurgitation with Veo? Eck had no answer, other than to say that the research team implemented filters for violent and explicit content (so no pornography) and used DeepMind’s SynthID technology to watermark videos from Veo as AI-generated.

Image Credits: Google

“We’ll aim – for something as large as the Veo model – to gradually roll it out to a small set of stakeholders we can work very closely with to understand the implications of the model, and only then expand it to a larger group,” he said.

Eck had more to share about the model’s technical details.

Eck described Veo as “quite controllable,” in the sense that the model understands camera movements and VFX reasonably well from prompts (think descriptors like “pan,” “zoom” and “explosion”). And like Sora, Veo has some understanding of physics – things like fluid dynamics and gravity – which adds to the realism of the videos it generates.

Veo also supports masked editing for changes to specific areas of a video, and it can generate videos from a still image, à la generative models like Stability AI’s Stable Video Diffusion. Perhaps most intriguingly, given a series of prompts that together tell a story, Veo can generate longer videos – videos over a minute long.
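Google hasn’t published an API for Veo, so the snippet below is purely hypothetical – generate_clip and its parameters are invented stand-ins. But it sketches the prompt-chaining idea: seed each scene with the final frame of the previous one, so a storyboard of prompts becomes a single longer, coherent video.

```python
# Hypothetical sketch of prompt chaining for longer videos.
# `generate_clip` is an invented stand-in (Google has not published
# a Veo API); the idea is that each clip is conditioned on its prompt
# AND on the last frame of the previous clip, so scenes flow together.

def generate_clip(prompt: str, seed_frame: str | None = None) -> list[str]:
    """Stub: pretend this returns a list of frames for one scene."""
    start = [seed_frame] if seed_frame is not None else []
    return start + [f"frame<{prompt} #{i}>" for i in range(3)]

storyboard = [
    "aerial shot of a vibrant beach at dawn",
    "camera pans to a surfer paddling out",
    "time-lapse of the tide rolling in at dusk",
]

video: list[str] = []
last_frame = None
for prompt in storyboard:
    clip = generate_clip(prompt, seed_frame=last_frame)
    video.extend(clip)
    last_frame = clip[-1]  # hand the final frame to the next scene

print(f"{len(video)} frames across {len(storyboard)} scenes")
```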

Image Credits: Google

That’s not to say Veo is perfect. Reflecting the limitations of today’s generative AI, objects in Veo’s videos disappear and reappear without much explanation or consistency. And Veo often gets its physics wrong – for example, cars will inexplicably, impossibly reverse on a dime.

That’s why Veo will remain behind a waitlist at Google Labs, the company’s experimental technology portal, for the foreseeable future, inside a new generative AI video creation and editing front end called VideoFX. As the model improves, Google aims to bring some of its capabilities to YouTube Shorts and other products.

“It’s very much a work in progress, very experimental … there’s a lot more unfinished business here than done,” Eck said. “But I think it’s kind of the raw material for doing something really great in the film space.”
