Runway’s latest AI video generator brings giant cotton candy monsters to life

A screenshot of a Runway Gen-3 Alpha video generated with the prompt "Giant humanoid made of fluffy blue cotton candy stomping on the ground and roaring to the sky, clear blue sky behind them."

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha, which is still in development but appears to produce video of similar quality to OpenAI's Sora, which debuted earlier this year (and has also not yet been released). It can generate new high-definition video from text prompts, depicting everything from realistic humans to surreal monsters stomping across the countryside.

Unlike Runway's previous best model from June 2023, which could only create two-second-long clips, Gen-3 Alpha can reportedly create 10-second video segments of people, places, and things with a consistency and temporal coherence that easily surpasses Gen-2. If 10 seconds sounds short compared to Sora's full minute of video, consider that the company is working with a limited compute budget compared to the more lavishly funded OpenAI, and in fact has a history of shipping video generation capability to commercial users.

Gen-3 Alpha does not generate audio to accompany its videos, and it is very likely that temporally coherent generations (those that keep a character consistent over time) depend on high-quality training material. But Runway's improvement in visual fidelity over the past year is hard to ignore.

AI video is heating up

It’s been a busy few weeks for AI video synthesis in the AI research community, including the launch of the Chinese model Kling, created by Beijing-based Kuaishou Technology (sometimes called “Kwai”). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherence that reportedly matches Sora.

Gen-3 Alpha prompt: “Subtle reflections of a woman on the window of a train moving at high speed in a Japanese city.”

Shortly after Kling debuted, people on social media started creating surreal AI videos using Luma AI’s Luma Dream Machine. These videos were novel and strange but generally lacked coherence; we tested Dream Machine and weren’t impressed with anything we saw.

Meanwhile, one of the original text-to-video pioneers, New York-based Runway (founded in 2018), recently became the subject of memes showing its Gen-2 technology falling out of favor compared to newer video synthesis models. That may have spurred the Gen-3 Alpha announcement.

Gen-3 Alpha prompt: “Astronaut running across an alley in Rio de Janeiro.”

Generating realistic humans has always been difficult for video synthesis models, so Runway specifically showcases Gen-3 Alpha’s ability to create what the developers call “expressive” human characters with a range of actions, gestures, and emotions. The examples the company provided weren’t particularly expressive (mostly people just slowly staring and blinking), but they do look realistic.

Human examples provided include generated videos of a woman on a train, an astronaut running down a street, a man with his face illuminated by the glow of a TV, a woman driving a car, and a woman running, among others.

Gen-3 Alpha prompt: “Close-up shot of a young woman driving a car, looking pensive, a misty green forest seen through the car’s rainy window.”

The demo videos also include more surreal examples of video synthesis, including a giant creature walking through an abandoned city, a man made of rocks walking through a forest, and the giant cotton candy monster seen below, which is probably the best video on the entire page.

Gen-3 Alpha prompt: “A giant humanoid made of fluffy blue cotton candy stomping on the ground and roaring to the sky, a clear blue sky behind them.”

Gen-3 will power a variety of Runway AI editing tools (one of the company’s most notable claims to fame), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. It can create videos from text or image prompts.
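
For readers curious what driving a hosted text-to-video model programmatically might look like, here is a minimal Python sketch. Runway had not published a public Gen-3 Alpha API at the time of the announcement, so the endpoint URL, job fields, and response shape below are invented for illustration; only the general submit-then-poll pattern is typical of hosted generation services.

    # Illustrative sketch only: the API endpoint, request fields, and response
    # shape here are hypothetical, not Runway's actual interface.
    import time

    import requests

    API_BASE = "https://api.example-videogen.test/v1"  # hypothetical endpoint
    API_KEY = "YOUR_API_KEY"  # placeholder credential

    def generate_video(prompt: str, duration_seconds: int = 10) -> str:
        """Submit a text prompt and poll until the (hypothetical) job finishes."""
        headers = {"Authorization": f"Bearer {API_KEY}"}

        # Kick off a text-to-video job; Gen-3 Alpha clips reportedly top out
        # at 10 seconds, so that is used as the default duration here.
        job = requests.post(
            f"{API_BASE}/generations",
            json={
                "model": "gen-3-alpha",
                "prompt": prompt,
                "duration": duration_seconds,
            },
            headers=headers,
            timeout=30,
        ).json()

        # Poll the job until it reports a finished video URL or an error.
        while True:
            status = requests.get(
                f"{API_BASE}/generations/{job['id']}", headers=headers, timeout=30
            ).json()
            if status["state"] == "succeeded":
                return status["video_url"]
            if status["state"] == "failed":
                raise RuntimeError(status.get("error", "generation failed"))
            time.sleep(5)

    url = generate_video(
        "A giant humanoid made of fluffy blue cotton candy stomping on the "
        "ground and roaring to the sky, a clear blue sky behind them."
    )
    print(url)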

Runway says Gen-3 Alpha is the first in a series of models trained on new infrastructure designed for large-scale multimodal training, a step toward developing what it calls “General World Models”: hypothetical AI systems that build internal representations of environments and use them to simulate future events within those environments.