OpenAI’s New Open Models Now Run Faster on NVIDIA GeForce RTX & RTX PRO GPUs

NVIDIA RTX AI Garage

In a pioneering collaboration, NVIDIA has partnered with OpenAI to optimize the newly released open-weight gpt-oss models for NVIDIA GPUs, significantly enhancing AI reasoning applications from cloud servers to personal computers. This initiative enables faster, smarter inference and unlocks new potential for agentic AI use cases such as web search, deep research, and more.

Revolutionary Models Now Accessible to Millions

OpenAI’s launch of the gpt-oss-20b and gpt-oss-120b models marks a milestone in democratizing cutting-edge AI technology. These flexible, open-weight reasoning models are available to millions of users and have been optimized for NVIDIA’s powerful RTX AI PCs and workstations. Users can access them through popular frameworks and tools, including Ollama, llama.cpp, and Microsoft AI Foundry Local. NVIDIA’s flagship GeForce RTX 5090 GPU delivers inference speeds of up to 256 tokens per second, demonstrating substantial performance gains.

OpenAI showed the world what could be built on NVIDIA AI — and now they’re advancing innovation in open-source software. The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI — all on the world’s largest AI compute infrastructure.

Jensen Huang, founder and CEO of NVIDIA

Core Features and Technical Innovations

Both gpt-oss variants use a mixture-of-experts (MoE) architecture and offer chain-of-thought reasoning with adjustable reasoning-effort levels. Designed for flexible instruction-following and tool use, the models were trained on NVIDIA H100 GPUs and support context lengths of up to 131,072 tokens, among the longest available for local AI inference. This capability is ideal for context-heavy tasks such as programming assistance, document analysis, and deep research.
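To make the mixture-of-experts idea concrete, here is a minimal, schematic sketch of top-k expert routing. The dimensions, expert count, and top-k value are made-up illustration values, not gpt-oss's actual configuration, and this is not NVIDIA's or OpenAI's implementation:

```python
# Schematic top-k mixture-of-experts routing (illustration only; sizes
# below are hypothetical, not gpt-oss's real configuration).
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2                  # made-up sizes
W_router = rng.normal(size=(D, N_EXPERTS))     # router projection
W_experts = rng.normal(size=(N_EXPERTS, D, D)) # one weight matrix per expert

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ W_router                 # router score per expert
    top = np.argsort(logits)[-TOP_K:]     # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts
    # Only the selected experts run, which is why MoE models activate just a
    # fraction of their total parameters for each token.
    return sum(w * (x @ W_experts[i]) for i, w in zip(top, weights))

token = rng.normal(size=D)
out = moe_forward(token)
print(out.shape)  # (8,)
```

Because only `TOP_K` of the `N_EXPERTS` weight matrices are touched per token, inference cost scales with the active experts rather than the full parameter count.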

A notable technical advancement is their support for MXFP4 precision on NVIDIA RTX GPUs. MXFP4 strikes a balance between model quality and resource efficiency: weights are stored as 4-bit floating-point values with shared per-block scales, enabling fast performance with far lower memory requirements than higher-precision formats.
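The mechanics can be sketched in a few lines. Following the OCP Microscaling idea behind MXFP4 (32-element blocks sharing one power-of-two scale, with each element snapped to a 4-bit E2M1 value), here is a simplified reference-style quantizer; it is an illustration of the format's idea, not NVIDIA's kernel code:

```python
# Simplified MXFP4-style block quantization: 32-element blocks share one
# power-of-two scale; each element is snapped to a 4-bit FP4 (E2M1) value.
# Illustrative sketch only, not a production or NVIDIA implementation.
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes
BLOCK = 32

def quantize_mxfp4(x):
    """Quantize a 1-D array whose length is a multiple of 32."""
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, len(x), BLOCK):
        blk = x[start:start + BLOCK]
        amax = np.abs(blk).max() or 1.0
        # Choose a power-of-two scale so the block's max fits FP4's range.
        scale = 2.0 ** np.ceil(np.log2(amax / FP4_GRID[-1]))
        scaled = blk / scale
        # Snap each magnitude to the nearest representable FP4 value.
        idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)
        out[start:start + BLOCK] = np.sign(scaled) * FP4_GRID[idx] * scale
    return out

x = np.random.default_rng(1).normal(size=64)
xq = quantize_mxfp4(x)
print("max abs quantization error:", np.abs(x - xq).max())
```

At 4 bits per weight plus one 8-bit scale per 32-element block, storage works out to about 4.25 bits per weight, roughly a quarter of FP16, which is what makes large models fit in consumer GPU memory.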

Seamless Integration via Ollama Application

For an accessible user experience on RTX AI PCs with at least 24 GB of VRAM, the new Ollama app offers immediate, out-of-the-box support for the gpt-oss models. Popular among AI developers and enthusiasts, Ollama’s revamped interface requires no additional configuration: users simply select their preferred gpt-oss model, send a message, and engage in fast, natural conversation.


Ollama also features enhanced functionalities, including support for uploading PDFs and text files within chat sessions, multimodal prompts incorporating images, and adjustable context lengths to navigate extensive documents efficiently. For developers, Ollama extends capabilities through command-line access and an SDK, facilitating integration into custom workflows and applications.
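For the developer path, a minimal sketch of driving a local gpt-oss model through Ollama's REST API is shown below. It assumes an Ollama server is running on its default port (11434) and that the model has been pulled under Ollama's published `gpt-oss:20b` tag; the prompt text is just an example:

```python
# Minimal sketch: querying a locally served gpt-oss model via Ollama's
# /api/chat REST endpoint (assumes `ollama serve` is running and the
# gpt-oss:20b model has been pulled).
import json
import urllib.request

payload = {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Summarize the key points of this document."}],
    "stream": False,  # return one complete response instead of a token stream
}

def chat(payload, host="http://localhost:11434"):
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

try:
    print(chat(payload))  # the model's reply, when a server is running
except OSError:
    print("Ollama server not reachable; start it with `ollama serve`.")
```

The same endpoint accepts multi-turn `messages` histories, which is how a custom application can maintain conversation state on top of the local model.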

Expanding Access Through Additional Frameworks and Tools

Beyond Ollama, the gpt-oss models can run on RTX AI PCs with 16 GB or more of VRAM through various other platforms optimized for NVIDIA hardware. NVIDIA continues to contribute to the open-source community with improvements to frameworks such as llama.cpp and the GGML tensor library, including CUDA Graphs support that reduces kernel-launch overhead. Developers eager to engage can explore these resources in the llama.cpp GitHub repository.
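As one hedged example of this route, llama.cpp's bundled `llama-server` exposes a native `/completion` endpoint that can be queried the same way. The GGUF filename below is a placeholder, and the snippet assumes a server was started locally on port 8080:

```python
# Sketch: querying a local llama-server instance (from llama.cpp) that is
# serving a gpt-oss GGUF build, e.g. started with:
#   llama-server -m gpt-oss-20b.gguf --port 8080
# (the model filename is a placeholder). /completion is llama-server's
# native endpoint.
import json
import urllib.request

def complete(prompt, n_predict=64, host="http://localhost:8080"):
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(
        f"{host}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

try:
    print(complete("Explain CUDA Graphs in one sentence."))
except OSError:
    print("llama-server not running locally; see the llama.cpp repo for setup.")
```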

Performance of gpt-oss-20b on RTX AI PCs

Windows developers gain further access via Microsoft AI Foundry Local, currently available in public preview. This on-device AI inference solution integrates into existing workflows through the command line, APIs, or SDKs. With ONNX Runtime and CUDA optimizations, and with NVIDIA TensorRT support coming soon, users can quickly deploy the gpt-oss models locally with minimal setup.

Driving the Next Wave of AI Progress

The release of GPT-OSS represents a key step in advancing AI innovation on NVIDIA’s broad compute infrastructure, from training to inference, and from cloud datacenters to personal AI-enabled desktops. NVIDIA’s ongoing RTX AI Garage blog series spotlights community-driven AI creations, focusing on building AI agents, enhancing creative workflows, and developing productivity software powered by these new open-source reasoning models.

By optimizing these models for widespread use on NVIDIA GPUs, NVIDIA and OpenAI are accelerating the development of intelligent applications and cementing leadership in AI technology for developers, researchers, and enterprises globally.