On May 14th, OpenAI officially announced the release of GPT-4o, its latest AI model.
Following closely, Google held its I/O 2024 Developer Conference.
At the event, Google not only upgraded its existing large models but also introduced several new ones, including a new version of Gemini, which extends the context window to 1 million tokens.
Debates comparing the two companies' releases quickly ensued.
Let's first look at how GPT-4o improves on its previous versions.
The 'o' in GPT-4o stands for 'omni', meaning 'all'. True to its name, GPT-4o is an all-around AI model that can accept and generate any combination of text, audio, and images.
GPT-4o can perceive human emotions and generate audio outputs in any desired style, be it singing, storytelling, or robotic voices.
Additionally, it boasts powerful visual capabilities and can analyze tabular data.
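As a rough illustration of how this kind of multimodal input can be exercised, here is a minimal sketch using OpenAI's official Python SDK; the prompt and image URL are placeholders, and the request shown is only one of the input modes the article mentions (text plus image), not a definitive recipe.

```python
from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# Send a text instruction together with an image (e.g. a photo of a table)
# and ask the model to analyze it. The URL below is a placeholder.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key figures in this table."},
                {"type": "image_url", "image_url": {"url": "https://example.com/table.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```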
One of GPT-4o's standout advantages over other large models is its response speed, nearly eliminating latency issues. The shortest response time for audio input is 232 milliseconds, with an average response time of 320 milliseconds, close to human reaction speeds.
Compared with the widespread AI frenzy that GPT-3.5 set off last year, this update has drawn far less attention and has clearly fallen short of the audience's high expectations.
Experts point out that some of GPT-4o's features are more flash than substance, lacking true functional breakthroughs. Innovations like adding tone to conversations or impromptu singing are only marginally impressive.
Earlier this year, OpenAI's launch of the video production model Sora caused a sensation, leading many to believe that the futuristic AI world depicted in science fiction was imminent. However, subsequent practical applications and commercial integrations were lacking, cooling public enthusiasm for AI.
Currently, the market's response to large models has calmed, and investments have become more rational. In China, after a period of intense competition with various large models and a plethora of functions, the audience's expectations have been tempered, and general feature updates no longer excite much curiosity.
In China, the focus has shifted to vertical applications of AI large models, such as specialized models for healthcare, finance, and small to medium-sized enterprises.
Interestingly, while international efforts are still pushing the functional boundaries of large models, China's focus is on affordability, helping enterprise users accelerate business innovation at lower costs.
On May 6th, DeepSeek, dubbed the "Pinduoduo of the large-model world," ignited a price war by pricing its DeepSeek-V2 API (32K context) at 1 yuan per million input tokens and 2 yuan per million output tokens. ByteDance responded with its Doubao large model, priced at 0.0008 yuan per thousand tokens (roughly 1,500 Chinese characters), marking a shift from pricing in fen (cents) to pricing in li (mills).
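To put these price points in perspective, here is a small back-of-the-envelope calculation. It is only a sketch based on the figures quoted above; the article does not give Doubao's output price, so only its input cost is computed, and the workload sizes are made-up examples.

```python
def deepseek_v2_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan at 1 yuan per million input tokens and 2 yuan per million output tokens."""
    return input_tokens / 1_000_000 * 1.0 + output_tokens / 1_000_000 * 2.0


def doubao_input_cost(input_tokens: int) -> float:
    """Input cost in yuan at 0.0008 yuan per thousand tokens (output price not quoted)."""
    return input_tokens / 1_000 * 0.0008


# Hypothetical workload: 10 million input tokens and 2 million output tokens.
print(deepseek_v2_cost(10_000_000, 2_000_000))  # 10 + 4 = 14 yuan
print(doubao_input_cost(10_000_000))            # 8 yuan for the input side alone
```

Even at this crude level, both schemes price a sizable workload at a few yuan, which is the "mill-level" economics the article is pointing to.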
Numerous events have shown that both business models and technological innovations are essential wheels driving the world forward. The practical application of AI requires both advanced technological innovation as a foundation and low-cost inference services for widespread use.
If GPT-4o is merely the appetizer and the real feast is GPT-5, we can look forward to the dazzling new features it will bring.