Google Veo, a sea change in AI-generated video, debuts at Google I/O 2024

Google is taking aim at OpenAI’s Sora with Veo, an AI model that can create 1080p videos roughly a minute long from a text prompt.

Unveiled Tuesday at Google’s I/O 2024 developer conference, Veo can capture a variety of visual and cinematic styles, including landscape and time-lapse shots, and make edits and corrections to already generated footage.

“We’re exploring features like storyboarding and generating longer scenes to see what Veo can do,” Demis Hassabis, head of Google’s AI R&D lab DeepMind, told reporters during a virtual roundtable. “We’ve made incredible progress in video.”

Image Credits: Google

Veo builds on Google’s preliminary commercial work in video generation, previewed in April, which used the company’s Imagen 2 family of image generation models to create looping video clips.

But unlike the Imagen 2-based tool, which could only create low-resolution, several-second clips, Veo appears to be competitive with today’s flagship video generation models — not just Sora, but also models from startups like Pika, Runway and Irreverent Labs.

At the briefing, Douglas Eck, who leads research efforts at DeepMind in the area of generative media, showed me some select examples of what Veo can do. One in particular – an aerial view of a busy beach – demonstrated Veo’s strengths over rival video models, he said.

“Detailing all the swimmers on the beach proved difficult for both image and video generation models – having so many moving characters,” he said. “If you look closely, the surf looks pretty good. And the meaning of the prompt word ‘vibrant’ is, I’d say, captured with all the people – the lively beach, full of sunbathers.”

Image Credits: Google

Veo was trained on a great deal of footage. This is generally how generative AI models work: fed example after example of some form of data, the models pick up patterns in that data that allow them to generate new data – videos, in Veo’s case.
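To make that concrete, here’s a toy sketch of that pattern-learning loop in PyTorch. It is emphatically not Veo’s architecture – Google hasn’t disclosed those details – just the generic recipe every generative model follows: guess, measure the error, adjust, repeat.

```python
# A toy illustration of how generative models "pick up patterns":
# show the model (current frame, next frame) pairs and nudge its
# weights so its predictions drift closer to the real next frame.
# This is NOT Veo's architecture -- just the generic training loop.
import torch
import torch.nn as nn

# Stand-in "video": 1,000 random 16x16 grayscale frames.
frames = torch.rand(1000, 1, 16, 16)

model = nn.Sequential(  # a deliberately tiny next-frame predictor
    nn.Flatten(),
    nn.Linear(16 * 16, 256),
    nn.ReLU(),
    nn.Linear(256, 16 * 16),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    # Sample random (frame, next frame) pairs from the data.
    idx = torch.randint(0, len(frames) - 1, (32,))
    current, target = frames[idx], frames[idx + 1].flatten(1)

    prediction = model(current)
    loss = loss_fn(prediction, target)  # how wrong was the guess?

    optimizer.zero_grad()
    loss.backward()  # learn from the mistake
    optimizer.step()
```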

Where does Veo’s training footage come from? Eck wouldn’t say exactly, but he admitted that some of it may have been sourced from Google’s own YouTube.

“Google models may be trained on certain YouTube content, but always in accordance with our agreement with YouTube creators,” he said.

The “agreement” part may technically be true. But it’s also true that, given YouTube’s network effects, creators don’t have much choice but to play by Google’s rules if they hope to reach the widest possible audience.

Image Credits: Google

A report by The New York Times in April revealed that Google expanded its terms of service last year in part to allow the company to use more data to train its AI models. Under the old ToS, it was unclear whether Google could use YouTube data to build products outside of the video platform. Not so under the new terms, which loosen the reins considerably.

Google is far from the only tech giant using massive amounts of user data to train internal models. (See: Meta.) But what’s sure to upset some creators is Eck’s insistence that Google sets the “gold standard” here in terms of ethics.

“The solution to this [training data] challenge will be found by bringing all the stakeholders together to figure out what the next steps are,” he said. “Until we take those steps with the stakeholders — we’re talking about the film industry, the music industry, the artists themselves — we’re not going to move quickly.”

Yet Google has already made Veo available to select creatives, including Donald Glover (aka Childish Gambino) and his creative agency Gilga. (Like OpenAI with Sora, Google is positioning Veo as a tool for advertising.)

Eck noted that Google provides tools that allow webmasters to prevent the company’s bots from scraping training data from their websites. But those settings don’t apply to YouTube. And Google, unlike some of its rivals, doesn’t offer a mechanism to let creators remove their work from its training data sets after the fact.

I also asked Eck about regurgitation, which in the generative AI context refers to when a model generates a mirror copy of a training example. Tools like Midjourney have been found to regurgitate exact stills from movies including Dune, The Avengers and Star Wars when given a timestamp – posing a potential legal minefield for users. OpenAI has reportedly gone so far as to block trademarks and creator names from Sora prompts to try to deflect copyright challenges.

So what steps has Google taken to reduce the risk of regurgitation with Veo? Eck had no answer, other than to say that the research team implemented filters for violent and explicit content (so no pornography) and used DeepMind’s SynthID technology to watermark videos from Veo as AI-generated.

Image Credits: Google

“We’ll aim – for something as large as the Veo model – to gradually roll it out to a small set of stakeholders we can work very closely with to understand the implications of the model, and only then expand it to a larger group,” he said.

Eck had more to share about the model’s technical details.

Eck described Veo as “quite controllable,” in the sense that the model understands camera movements and VFX reasonably well from prompts (think descriptors like “pan,” “zoom” and “explosion”). And like Sora, Veo has some understanding of physics – things like fluid dynamics and gravity – which adds to the realism of the videos it generates.

Veo also supports masked editing for changes to specific areas of a video, and it can generate videos from a still image, à la generative models like Stability AI’s Stable Video Diffusion. Perhaps most intriguingly, given a series of prompts that together tell a story, Veo can generate longer videos – videos over a minute long.
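Google hasn’t published an API for Veo, so the snippet below is purely hypothetical – generate_clip and its parameters are invented stand-ins. But it sketches the prompt-chaining idea: seed each scene with the final frame of the previous one, so a storyboard of prompts becomes a single longer, coherent video.

```python
# Hypothetical sketch of prompt chaining for longer videos.
# `generate_clip` is an invented stand-in (Google has not published
# a Veo API); the idea is that each clip is conditioned on its prompt
# AND on the last frame of the previous clip, so scenes flow together.

def generate_clip(prompt: str, seed_frame: str | None = None) -> list[str]:
    """Stub: pretend this returns a list of frames for one scene."""
    start = [seed_frame] if seed_frame is not None else []
    return start + [f"frame<{prompt} #{i}>" for i in range(3)]

storyboard = [
    "aerial shot of a vibrant beach at dawn",
    "camera pans to a surfer paddling out",
    "time-lapse of the tide rolling in at dusk",
]

video: list[str] = []
last_frame = None
for prompt in storyboard:
    clip = generate_clip(prompt, seed_frame=last_frame)
    video.extend(clip)
    last_frame = clip[-1]  # hand the final frame to the next scene

print(f"{len(video)} frames across {len(storyboard)} scenes")
```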

Image Credits: Google

That’s not to say Veo is perfect. Reflecting the limitations of today’s generative AI, objects in Veo’s videos disappear and reappear without much explanation or consistency. And Veo often gets its physics wrong – for example, cars will inexplicably, impossibly reverse on a dime.

That’s why Veo will remain behind a waitlist at Google Labs, the company’s experimental technology portal, for the foreseeable future, inside a new generative AI video creation and editing front end called VideoFX. As the model improves, Google aims to bring some of its capabilities to YouTube Shorts and other products.

“It’s very much a work in progress, very experimental … there’s a lot more unfinished business here than done,” Eck said. “But I think it’s kind of the raw material for doing something really great in the film space.”
