What's new in GPT-4o: free for everyone, with simultaneous translation and the ability to read emotions on users' faces


Wednesday, May 15, 2024, 9:28 AM

OpenAI launched GPT-4o ("o" for "omni") on May 13, 2024, the latest version of ChatGPT, with new features, free access for all users, and capabilities that push human-computer interaction to unexpected limits.

According to the company, the new model will be rolled out across OpenAI products in the coming weeks. As input, it accepts any combination of text, audio, and image, and it generates any combination of text, audio, and image outputs.

The OpenAI assistant, which users can easily interrupt, can read the emotions on their faces through a smartphone camera, guide them through breathing exercises, tell them a story, or help them solve a math problem.

With GPT-4o, users can perform simultaneous translation into other languages, as the company showed in a live demonstration during the presentation, and carry out complex mathematical operations that are now within reach of every user.

"We are very excited to bring GPT-4o to all our free users," said Mira Murati, chief technology officer of the California-based start-up, at a virtual press conference. "With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still only scratching the surface of exploring what the model can do and its limitations," the company points out.

When presenting the new version, OpenAI demonstrated a voice assistant capable of conversations nearly as fluid as those between people. "It brings together transcription, intelligence, and the ability to speak to deliver voice mode," summarized Murati, who showed with two colleagues how users can interact with ChatGPT.

It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is comparable to human response times in a conversation.

Free rollout for all users

As OpenAI announced, GPT-4o's capabilities will be rolled out iteratively (with extended red-team access starting today). GPT-4o's text and image capabilities are in fact beginning to roll out in ChatGPT today. "We are making GPT-4o available in the free tier, and to Plus users with message limits up to 5 times higher. We will launch a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks," the company announces.

Developers also now have access to GPT-4o in the API as a text and vision model. GPT-4o is twice as fast, half the price, and has rate limits 5 times higher than GPT-4 Turbo. The company plans to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.
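As a rough illustration of that API access, the sketch below calls GPT-4o as a text and vision model through the OpenAI Python SDK. The prompt, image URL, and environment setup are illustrative assumptions, not details from the announcement.

```python
# Minimal sketch: asking GPT-4o about an image via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; the prompt and image
# URL below are placeholders for illustration only.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```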

Before GPT-4o, Voice Mode could be used to talk to ChatGPT with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). To achieve this, Voice Mode was a pipeline of three separate models: a simple model transcribes the audio to text, GPT-3.5 or GPT-4 takes that text and generates a text reply, and a third simple model converts the reply back into audio. This process means that the main source of intelligence, GPT-4, loses a lot of information: it cannot directly observe tone, multiple speakers, or background noise, and it cannot laugh, sing, or express emotion.
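To make that older three-model pipeline concrete, here is a minimal sketch using the OpenAI Python SDK. The specific model names ("whisper-1", "gpt-4", "tts-1"), the voice, and the file paths are assumptions chosen for illustration, not part of OpenAI's announcement.

```python
# Sketch of the pre-GPT-4o Voice Mode pipeline: transcribe -> reason -> speak.
# Model names, voice, and file paths are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

# 1) A simple model transcribes the user's audio to text.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2) GPT-3.5 or GPT-4 takes the text and generates a text reply.
#    Tone, multiple speakers, and background noise are already lost here.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = reply.choices[0].message.content

# 3) A third simple model converts the reply text back into audio.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
with open("reply.mp3", "wb") as out:
    out.write(speech.content)
```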

Competition from Google Gemini

This new version of the OpenAI program arrives one day before an expected Google presentation on its Gemini AI tool, which competes with ChatGPT.


