OpenAI Unveils Two New Open-Weight AI Reasoning Models
OpenAI has launched two new open-weight AI reasoning models, gpt-oss-120b and gpt-oss-20b, available for free on Hugging Face. This release is a significant shift for OpenAI, which has historically favored proprietary AI models.
The models are OpenAI’s first open-weight language models since GPT-2, released more than five years ago, and mark a return to a strategy the company had since abandoned. OpenAI says it aims to make its models accessible to a wider range of developers, democratizing AI development and fostering innovation.
Different Sizes and Capabilities – gpt-oss-120b vs. gpt-oss-20b
The two models released differ in size and capability:
- gpt-oss-120b: The larger model has about 117 billion parameters and can run on a single NVIDIA GPU with 80GB of memory.
- gpt-oss-20b: A more lightweight model with 20 billion parameters, capable of running on consumer-grade laptops with 16GB of memory.
This tiered release allows a broader range of developers, from those using high-performance GPUs to those with standard laptops, to leverage the models in their projects. While the larger model offers superior performance, the smaller model is more accessible for users with limited resources.
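A back-of-the-envelope calculation shows why the smaller model can fit on a 16GB laptop: memory for the weights alone is roughly the parameter count times the bits per parameter, divided by 8 bytes. The Python sketch below tabulates this for a few precisions. The 117-billion and 20-billion parameter counts come from this article; the precisions are illustrative assumptions, not OpenAI's published storage format, and the estimate ignores activations and the KV cache, which add to real memory use.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and runtime overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


# Approximate total parameter counts as given in the article.
MODELS = {"gpt-oss-120b": 117e9, "gpt-oss-20b": 20e9}

# Illustrative precisions; which one a given deployment uses is an assumption here.
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}

if __name__ == "__main__":
    for name, params in MODELS.items():
        for label, bits in PRECISIONS.items():
            print(f"{name} @ {label}: {weight_memory_gb(params, bits):.1f} GB")
```

At 16 bits per parameter, gpt-oss-20b's weights alone would need about 40 GB; only at around 4 bits do they shrink to roughly 10 GB, which is consistent with the 16GB laptop figure. The same arithmetic puts gpt-oss-120b at about 58.5 GB at 4 bits, within the 80GB of a single data-center GPU.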
Licensing: Open Access for Developers
Both models are released under the Apache 2.0 license, a highly permissive open-source license. This allows developers and enterprises to use, modify, and even monetize the models without seeking permission from OpenAI.
The release comes amid growing competition from open models, especially those from Chinese labs such as DeepSeek, Alibaba’s Qwen, and Moonshot AI. By opening up its models, OpenAI aims to regain a leading position among open AI developers while promoting AI development that it describes as aligned with democratic values.
Performance Benchmarks: Strong, Yet Imperfect
OpenAI has shared the performance of its models across various benchmarks:
- Codeforces (with tools): On this competitive programming benchmark, gpt-oss-120b and gpt-oss-20b scored 2622 and 2516, respectively. Both outperformed DeepSeek’s R1 but underperformed OpenAI’s proprietary o3 and o4-mini models.
- Humanity’s Last Exam (HLE, with tools): On this challenging test of crowdsourced questions across many subjects, gpt-oss-120b scored 19% and gpt-oss-20b scored 17.3%. Both trailed OpenAI’s o3 but beat leading open models from DeepSeek and Qwen.
However, a significant drawback of both models is their hallucination rate: how often they generate incorrect or fabricated information.
On PersonQA, OpenAI’s in-house benchmark for measuring the accuracy of a model’s knowledge about people, gpt-oss-120b and gpt-oss-20b hallucinated in response to 49% and 53% of questions, respectively.
That is more than triple the rate of OpenAI’s o1 model (16%) and also higher than o4-mini (36%). OpenAI says this is expected, since smaller models have less world knowledge than large frontier models and therefore hallucinate more.
Training Process: Efficiency and Reinforcement Learning
Both models use a mixture-of-experts (MoE) architecture, in which a router activates only a small subset of the model’s parameters for each token rather than for each task. gpt-oss-120b, for example, has 117 billion parameters in total but activates only about 5.1 billion per token, which lets it run far more efficiently than a dense model of the same size.
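The routing idea can be illustrated with a toy top-k mixture-of-experts layer in plain Python. A small router scores each token against every expert, only the k highest-scoring experts run, and their outputs are mixed with softmax weights. The expert count, vector size, scalar-multiplier "experts", and `top_k` value are hypothetical choices for illustration; this is a generic sketch of MoE routing, not OpenAI's gpt-oss implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, experts, router_weights, top_k=2):
    """Route one token vector to its top_k experts and mix their outputs.

    experts: list of callables mapping a vector to a vector (toy stand-ins).
    router_weights: one weight vector per expert (hypothetical linear router).
    Returns (combined output, indices of the experts that actually ran).
    """
    # Router score for each expert: dot product of its weights with the token.
    scores = [sum(w * x for w, x in zip(wv, token)) for wv in router_weights]
    # Keep only the top_k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Gate weights: softmax renormalized over the selected experts only.
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)  # unselected experts are never evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 8
    # Hypothetical toy experts: expert s simply scales its input by s.
    experts = [(lambda v, s=s: [s * x for x in v]) for s in range(1, num_experts + 1)]
    router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    token = [random.gauss(0, 1) for _ in range(dim)]
    out, used = moe_forward(token, experts, router, top_k=2)
    print(f"experts run for this token: {used} (of {num_experts})")
    print(f"active fraction quoted for gpt-oss-120b: {5.1 / 117:.1%}")
```

Because unselected experts are skipped entirely, per-token compute scales with the active parameters rather than the total, which is how gpt-oss-120b can use roughly 4.4% of its parameters (5.1 billion of 117 billion) for each token.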
The models were also post-trained with reinforcement learning (RL), in which a model learns in simulated environments by being rewarded for correct outputs and penalized for incorrect ones. OpenAI used this kind of process to improve the reasoning of its proprietary o-series models, and the open models share a similar chain-of-thought reasoning process.
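The core idea of learning from rewards can be shown with a toy REINFORCE policy-gradient loop. Here the "policy" is just a softmax over four candidate answers to a hypothetical question, and a checker returns a reward of 1 for the correct answer and 0 otherwise. This is a deliberately tiny sketch assuming a simple verifiable reward; it is not OpenAI's training recipe, which operates on full language models at vastly larger scale.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reinforce(candidates, correct, steps=500, lr=0.5, seed=0):
    """Toy REINFORCE: learn a policy over candidate answers from a 0/1 reward.

    The policy is one logit per candidate answer (hypothetical setup).
    Reward is 1.0 when the sampled answer equals `correct`, else 0.0.
    Returns the final probability of each candidate.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = 1.0 if candidates[action] == correct else 0.0
        # d log pi(action) / d logit_j = 1[j == action] - probs[j]
        for j in range(len(logits)):
            grad = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * reward * grad
    return softmax(logits)


if __name__ == "__main__":
    answers = ["3", "4", "5", "22"]  # hypothetical replies to "What is 2 + 2?"
    final = train_reinforce(answers, correct="4")
    for answer, p in zip(answers, final):
        print(f"{answer!r}: {p:.3f}")
```

Since the reward is nonzero only when the sampled answer is correct, every update shifts probability toward that answer. Practical RL setups usually add a baseline or other variance-reduction terms, which this sketch omits for clarity.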
Text-Only Limitation: No Multimodal Capabilities
Despite their capabilities, both models are text-only: they cannot process or generate images, audio, or other non-text data. That limits them relative to OpenAI models built for other modalities, such as DALL·E (image generation) and Whisper (speech recognition).
Even so, the models can handle a wide range of text tasks and can drive agentic workflows by calling tools such as web search or a Python code interpreter for data analysis. Their core strength is generating and understanding text, which remains central to many AI applications.
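Tool use generally works by having the model emit a structured request that the host application executes, with the result fed back to the model. The sketch below shows only that host-side dispatch step. The JSON shape, the tool names `web_search` and `run_python`, and their stub bodies are all hypothetical; gpt-oss uses its own response format, which this example does not reproduce.

```python
import json


# Hypothetical tools; a real deployment would call a search API or a sandboxed interpreter.
def web_search(query: str) -> str:
    return f"(stub) top result for {query!r}"


def run_python(code: str) -> str:
    # Stub only: never execute untrusted model output outside a sandbox.
    return "(stub) code not executed"


TOOLS = {"web_search": web_search, "run_python": run_python}


def dispatch(model_message: str) -> str:
    """Execute a tool call encoded as JSON, e.g. {"tool": "web_search", "args": {...}}.

    The message format is an assumption for illustration only.
    Returns the tool's result string, or an error string for unknown tools.
    """
    call = json.loads(model_message)
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return f"error: unknown tool {call.get('tool')!r}"
    return tool(**call.get("args", {}))


if __name__ == "__main__":
    print(dispatch('{"tool": "web_search", "args": {"query": "gpt-oss"}}'))
```

Keeping the dispatch table explicit means the host decides which tools the model may invoke, rather than executing arbitrary requests.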
Safety and Ethical Considerations: Preventing Misuse
OpenAI says it evaluated whether the models could be misused, for example by bad actors fine-tuning them to assist with cyberattacks or the creation of biological or chemical weapons.
OpenAI found that the models might marginally increase biological capabilities, but it did not find evidence that they could reach its “high capability” threshold for danger in these areas, even after fine-tuning. The company presents this testing as part of its effort to balance openness with safety.
Once the weights are public, however, OpenAI cannot control how they are used. The company has acknowledged that powerful open models will always carry some potential for misuse and says it remains committed to minimizing those risks.
Global Impact and Future Developments
The release intensifies the global competition over open AI models. Because the weights are freely available, developers worldwide can build on OpenAI’s technology and contribute to AI research, which could speed progress in industries such as healthcare, finance, and entertainment.
Looking ahead, key questions include whether OpenAI will release further open models, how it will address their high hallucination rates, and whether future versions will handle images or audio. As competition intensifies, the reception of these models could shape the trajectory of open AI development in the coming years.