
Project Astra is the future of AI at Google

“I’ve had this vision in my mind for quite some time,” says Demis Hassabis, the head of Google DeepMind and the leader of Google’s AI efforts. Hassabis has been thinking about and working on AI for decades, but four or five years ago something really crystallized. One day soon, he realized, “we’re going to have this universal assistant. It’s multimodal, it’s with you all the time.” Call it the Star Trek Communicator; call it the voice from Her; call it what you want. “It’s this assistant,” Hassabis continues, “that’s just helpful. You get used to it being there when you need it.”

At Google I/O, the company’s annual developer conference, Hassabis showed off a very early version of what he hopes will become this universal assistant. Google calls it Project Astra, and it’s a real-time, multimodal AI assistant that can see the world, knows what things are and where you left them, and can answer questions or help you do just about anything. In an incredibly impressive demonstration video, which Hassabis swears is not faked or doctored in any way, an Astra user at Google’s London office asks the system to identify a part of a speaker, find their missing glasses, review code, and more. It all works practically in real time and in a very conversational way.

Astra is just one of many Gemini announcements at this year’s I/O. There’s a new model called Gemini 1.5 Flash, designed to be faster at common tasks like summarization and captioning. Another new model, called Veo, can generate video from a text prompt. Gemini Nano, the model meant to run locally on devices like your phone, is also supposed to be faster than ever. The context window for Gemini Pro, which refers to how much information the model can consider in a given query, has doubled to 2 million tokens, and Google says the model is better at following instructions than ever. Google is making rapid progress both on the models themselves and on getting them in front of users.

Astra is multimodal by design: you can talk, type, draw, take photos, and video chat with it.
Image: Google

Going forward, Hassabis says, the story of AI will be less about the models themselves and more about what they can do for you. And that story is all about agents: bots that don’t just talk to you but actually do things on your behalf. “Our history in agents is longer than our generalized modeling work,” he says, pointing to the game-playing AlphaGo system from nearly a decade ago. Some of these agents, he says, will be extremely simple tools for getting things done, while others will be more like collaborators and companions. “I think it might even come down to personal preference at some point,” he says, “and understanding your context.”

Astra, Hassabis says, is much closer than previous products to the way a true real-time AI assistant should work. When Gemini 1.5 Pro, the latest version of Google’s mainline large language model, was ready, Hassabis says he knew the underlying technology was good enough for something like Astra to start working well. But the model is only part of the product. “We had components of this six months ago,” he says, “but one of the problems was just the speed and the lag. Without that, the usability isn’t quite there.” So for six months, getting the system up to speed has been one of the team’s most important jobs. That meant improving the model, but also optimizing the rest of the infrastructure to work well and at scale. Fortunately, Hassabis says with a laugh, “this is something Google does very well!”

Many of Google’s AI announcements at I/O were about providing more and easier ways to use Gemini. A new product called Gemini Live is a voice-only assistant that lets you have easy back-and-forth conversations with the model, interrupting it when it goes on too long or calling back to earlier parts of the conversation. A new feature in Google Lens lets you search the web by shooting and narrating a video. Much of this is enabled by Gemini’s large context window, which means it can take in a huge amount of information at once, and Hassabis says that’s crucial to making interactions with your assistant feel normal and natural.

Gemini 1.5 Flash exists first and foremost to make AI assistants faster.
Image: Google

By the way, do you know who agrees with this assessment? OpenAI, which has been talking about AI agents for some time. In fact, the company demonstrated a product strikingly similar to Gemini Live just an hour after my call with Hassabis. The two companies are increasingly fighting for the same turf and seem to share a vision of how AI might change your life and how you’ll use it over time.

How exactly will these agents work, and how will you use them? No one knows for sure, not even Hassabis. One thing Google is focused on right now is trip planning: it has built a new tool for using Gemini to create an itinerary for your vacation, which you can then edit alongside the assistant. Eventually there will be many more features like that. Hassabis says he’s bullish on phones and glasses as key devices for these agents but also says there is “probably room for some exciting form factors.” Astra is still in an early prototype phase and represents just one way you might want to interact with a system like Gemini. The DeepMind team is still exploring how best to bring multimodal models together and how to balance very large general models with smaller, more focused ones.

We’re still very much in the “speeds and feeds” era of AI, where every incremental model improvement matters and everyone is obsessed with parameter sizes. But pretty soon, at least according to Hassabis, we’ll start asking different questions about AI. Better questions. Questions about what these assistants can do, how they do it, and how they can make our lives better. Because the technology is a long way from perfect, but it’s getting better very fast.
