
ChatGPT-4o’s major update enables audio-video conversations with an “emotional” AI chatbot

On Monday, OpenAI debuted GPT-4o (the "o" stands for "omni"), a major new AI model that can seemingly converse using speech in real time, read emotional cues, and respond to visual input. It runs faster than OpenAI's previous best model, GPT-4 Turbo, will be free for ChatGPT users, and will be available as a service via an API that OpenAI says will be released over the next few weeks.
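For developers, the announcement suggests GPT-4o will slot into the existing chat-style API. As a rough illustration only (exact availability was still pending at publication time), a request using OpenAI's Python SDK might look like this minimal sketch; the prompt text is our own placeholder, and the "gpt-4o" model name comes from OpenAI's announcement.

```python
# Minimal sketch of a text request to GPT-4o via OpenAI's Python SDK,
# assuming the same chat-completions interface used by GPT-4 Turbo.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize today's announcement in one sentence."},
    ],
)

print(response.choices[0].message.content)
```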

OpenAI unveiled the new audio conversation and vision comprehension capabilities in a YouTube live stream titled "OpenAI Spring Update," presented by OpenAI CTO Mira Murati and employees Mark Chen and Barret Zoph, which included live demonstrations of GPT-4o in action.

OpenAI claims that GPT-4o responds to audio inputs in about 320 milliseconds on average, which is similar to human response times in conversation, according to a 2009 study, and much shorter than the typical 2-3 second delay seen in previous models. With GPT-4o, OpenAI says it trained a brand-new end-to-end AI model on text, vision, and audio in such a way that all inputs and outputs are "processed by the same neural network."


“Since GPT-4o is our first model combining all these modalities, we are still only scratching the surface of exploring what the model can do and its limitations,” says OpenAI.

During the live stream, OpenAI demonstrated GPT-4o's real-time audio conversation capabilities, showing its ability to engage in natural, responsive dialogue. The AI assistant appeared to pick up on emotions easily, adapted its tone and style to match the user's requests, and even incorporated sound effects, laughter, and singing into its responses.

OpenAI CTO Mira Murati debuts GPT-4o during the May 13, 2024, OpenAI Spring Update live stream.

OpenAI

Presenters also highlighted GPT-4o's improved visual comprehension. By uploading screenshots, documents containing text and images, or charts, users can apparently hold conversations about the visual content and receive data analysis from GPT-4o. In the live demo, the AI assistant analyzed selfies, detected emotions, and engaged in lighthearted banter about the images.

In addition, GPT-4o exhibited improved speed and quality in more than 50 languages, which OpenAI says cover 97 percent of the world's population. The model also demonstrated real-time translation capabilities, facilitating conversations between speakers of different languages with near-instant translation.

OpenAI first added conversational voice features to ChatGPT in September 2023; that system used Whisper, an AI speech-recognition model, for input and custom voice-synthesis technology for output. Previously, OpenAI's multimodal ChatGPT interface chained three processes: transcription (speech-to-text), intelligence (processing the text as tokens), and text-to-speech, with latency accumulating at each step. With GPT-4o, all of those steps happen together in a single model. It "reasons through voice, text and vision," according to Murati, and a slide shown on the screen behind her during the broadcast called it an "omnimodel."
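To make the latency point concrete, that earlier three-step pipeline can be sketched as three separate API calls with OpenAI's Python SDK. The file names, prompt, and model choices below are illustrative assumptions rather than OpenAI's exact internal implementation, but the structure mirrors the transcription, text processing, and text-to-speech flow described above.

```python
# Minimal sketch of the earlier three-step voice pipeline, assuming OpenAI's
# standard Python SDK endpoints; each step is a separate network round trip.
from openai import OpenAI

client = OpenAI()

# 1) Speech-to-text with Whisper
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2) "Intelligence": process the transcribed text as tokens
chat = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = chat.choices[0].message.content

# 3) Text-to-speech for the spoken reply
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
with open("reply.mp3", "wb") as out:
    out.write(speech.content)
```

Each of those calls involves a separate model and a separate round trip, which is where the 2-3 second delay came from; GPT-4o is designed to replace all three with a single end-to-end model.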

OpenAI announced that GPT-4o will be available to all ChatGPT users, with paid subscribers getting five times the rate limits of free users. GPT-4o in API form will also reportedly feature twice the speed, 50 percent lower cost, and five times higher rate limits than GPT-4 Turbo.

In Her, the main character talks to an AI personality through wireless earbuds similar to AirPods.

Warner Bros.

The capabilities demonstrated during the live stream, and in numerous videos on OpenAI's website, are reminiscent of the conversational AI agent in the 2013 sci-fi film Her, in which the main character develops a personal attachment to an AI personality. Given GPT-4o's simulated emotional expressiveness (artificial emotional intelligence, you might call it), it's not inconceivable that similar emotional attachments to the OpenAI assistant could develop on the human side, as we have already seen in the past.

Murati acknowledged the new safety challenges posed by GPT-4o's real-time audio and video capabilities and said the company will continue researching safety and soliciting feedback from test users during its iterative rollout over the coming weeks.

“GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities,” OpenAI says. “We used these learnings [sic] to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.”

ChatGPT Updates

Also on Monday, OpenAI announced several updates to ChatGPT, including a ChatGPT desktop app for macOS, which will be available to ChatGPT Plus users today and will become “more widely available” in the coming weeks, according to OpenAI. OpenAI also streamlined the ChatGPT interface with a new home screen and message layout.

And as we briefly mentioned above, when using the GPT-4o model (once it becomes widely available), ChatGPT Free users will have access to web browsing, data analysis, the GPT Store, and memory features that were previously restricted to ChatGPT Plus, Team, and Enterprise subscribers.
