Major Large Model Vendors Release New Products in Quick Succession
In the early morning of August 6, Beijing time, several large model vendors moved in quick succession to release their latest products, once again setting off a wave of activity in the field of artificial intelligence.
AI startup Anthropic launched Claude Opus 4.1. According to Anthropic, the model upgrades Claude Opus 4 on agentic tasks, real-world coding, and reasoning. The company said it had previously focused on major version releases, and that Claude Opus 4.1 marks a shift toward incremental improvements to its coding models, with more updates to follow in the coming weeks. On the SWE-bench Verified benchmark, Claude Opus 4.1 scored 74.5%, up from Opus 4's 72.5%, indicating improved coding ability. Its scores on Terminal-Bench (agentic terminal coding), GPQA Diamond (graduate-level reasoning), and MMMLU (multilingual question answering) also exceeded those of Opus 4.
Google launched its new-generation world model, Genie 3. Google said Genie 3 is the company's first world model to support real-time interaction, and that world models are a key stepping stone toward AGI (Artificial General Intelligence): they let AI agents train without restriction in rich simulated environments and also provide a training ground for robots. Genie 3 can generate diverse interactive environments. Its modeling of the world's physical properties lets it simulate natural phenomena such as water and light, generate ecosystems with animals and plants, create animated characters, and simulate how the many elements of a complex environment interact. Google also noted, however, that while Genie 3 pushes the capability boundary of world models, it still has limitations, including a limited space of executable actions and difficulty simulating multi-agent interactions in a shared environment.
OpenAI open-sourced two reasoning models, gpt-oss-120b and gpt-oss-20b, its first open-source model release in six years. In April this year, OpenAI CEO Sam Altman said it was "very important" to launch a powerful new open-source model with reasoning capabilities, and the two models released now make good on that promise. Sam Altman said they represent research that cost OpenAI billions of dollars. gpt-oss-120b has 117 billion parameters and uses a MoE (Mixture of Experts) architecture with 5.1 billion parameters activated per token; gpt-oss-20b has 21 billion parameters, also with a MoE architecture, and 3.6 billion activated parameters. According to benchmark results released by OpenAI, both models' reasoning performance sits in the top tier of open-source models. Notably, the focus of this release is not only open-sourcing but also local deployment on end-side devices such as computers and phones: gpt-oss-120b can run on a single 80GB GPU, and gpt-oss-20b can run on consumer devices with 16GB of memory.
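To make the contrast between total and activated parameters concrete, the toy sketch below shows a top-k Mixture-of-Experts feed-forward layer in PyTorch: each token is routed to only a few experts, so the parameters touched per token are a fraction of the layer's total. The layer sizes, expert count, and routing scheme here are illustrative assumptions and do not reflect the actual gpt-oss configuration.

```python
# Toy top-k MoE feed-forward layer. Sizes and routing are illustrative only,
# not the gpt-oss configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Each token only passes through its top_k experts, weighted by router score.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
total = sum(p.numel() for p in layer.parameters())
# Per token: the router plus top_k (here 2) experts are evaluated.
active = sum(p.numel() for p in layer.router.parameters()) + \
         2 * sum(p.numel() for p in layer.experts[0].parameters())
print(f"total params: {total}, params touched per token: ~{active}")
```

This routing pattern is why gpt-oss-120b's 117 billion total parameters correspond to only about 5.1 billion activated parameters per token, which in turn keeps the memory and compute needed per inference step low enough for local deployment.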
The three vendors' new products target different usage scenarios, but the releases make it clear that both OpenAI and Anthropic have adjusted their product strategies. Although GPT-5, which would mark the next iteration of the underlying foundation models, has yet to be released, this rapid cadence of large model updates shows that AI capabilities keep improving and that large models are becoming more practical to use.