NVIDIA TensorRT for RTX: Optimized AI Inference Now on Windows 11

NVIDIA AI experiences are rapidly expanding on Windows in creativity, gaming, and productivity apps. Developers now have more tools than ever to run powerful AI models directly on desktops, laptops, and workstations. However, choosing between hardware-specific libraries and cross-vendor frameworks like DirectML has often forced developers to trade off performance for compatibility.

Introducing TensorRT for RTX

To resolve this trade-off, NVIDIA has announced TensorRT for RTX, a high-performance AI inference library designed specifically for Windows PCs. It’s available as part of Microsoft’s new Windows ML framework, giving developers a standardized way to tap into the full performance of NVIDIA RTX GPUs.

Massive Performance Gains for Local AI Inference

TensorRT for RTX builds on NVIDIA’s data center inference technology but is optimized for consumer RTX GPUs. It delivers over 50% faster performance than DirectML for popular AI workloads such as image generation and video creation. It also supports cutting-edge quantization formats like FP4 and FP8, helping massive generative models run efficiently on everyday hardware.

Fast and Flexible Deployment

One major highlight is that TensorRT for RTX doesn’t need pre-generated inference engines. Instead, it uses just-in-time (JIT) compilation, generating optimized code on the user’s GPU in just seconds. This can lead to an additional 20% speed boost over generic engines without bloating your app. The library is also lightweight (under 200 MB) and works silently in the background via Windows ML.
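
To make the JIT flow concrete, here’s a minimal sketch of building an engine directly on the user’s machine from an ONNX file. It uses the standard TensorRT Python API as a stand-in, since TensorRT for RTX’s standalone SDK follows a similar builder/runtime pattern; the `tensorrt` import and the model path are assumptions for illustration, not the TensorRT for RTX API itself.

```python
import tensorrt as trt  # standard TensorRT bindings, used here as a stand-in

logger = trt.Logger(trt.Logger.WARNING)

# Parse the ONNX model shipped with the app: no pre-built engine required.
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # hypothetical model file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

# Compile an engine specialized for whatever RTX GPU is in this machine.
config = builder.create_builder_config()
engine_bytes = builder.build_serialized_network(network, config)

# Deserialize and run immediately; the compile step above is what
# TensorRT for RTX's JIT reduces to seconds.
engine = trt.Runtime(logger).deserialize_cuda_engine(engine_bytes)
```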

Strong Developer Feedback

Early testers have seen great results:

  • Topaz Labs eliminated weeks of engine pre-processing thanks to TensorRT for RTX.
  • Lightricks achieved 70% better performance than PyTorch FP16, with another 30% gain using FP8, all with on-device compilation times under 5 seconds.

Versatile Model Support

TensorRT for RTX supports a wide range of model types:

  • CNNs (Convolutional Neural Networks)
  • Diffusion Models (for image generation)
  • Audio Processing Models
  • Transformers (used in large language and vision models)

It’s also compatible with ONNX models and offers a build-once, deploy-anywhere workflow thanks to its AOT (ahead-of-time) + JIT architecture.
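
Under the AOT + JIT split, the expensive graph optimization happens once at build time, and only the final GPU-specific specialization happens on the end user’s device. The sketch below shows the shape of that split using standard TensorRT serialization as an approximation; note that classic TensorRT engines are locked to one GPU, whereas the whole point of TensorRT for RTX’s format is that the AOT artifact stays portable across RTX GPUs. File names and paths are assumptions.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# AOT half: run once on the developer's build machine.
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:      # hypothetical ONNX model
    parser.parse(f.read())
with open("model.engine", "wb") as f:    # ship this artifact with the app
    f.write(builder.build_serialized_network(
        network, builder.create_builder_config()))

# JIT half: runs on the end user's machine at install or first launch.
runtime = trt.Runtime(logger)
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
```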

Smarter Resource Usage & Real-Time Optimization

For tasks like text-to-image generation, TensorRT can adapt to dynamic resolutions without pre-defined size limits. Over time, as users interact with the app, the engine gets faster with each generation. The JIT compiler caches optimized kernels and can share them across models and app sessions for even quicker launches.
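
As a mental model, two standard-TensorRT mechanisms approximate what’s described here: optimization profiles for dynamic image sizes, and a serialized timing cache that persists tuned kernels across builds and sessions. The tensor name, shapes, and cache file below are hypothetical, and TensorRT for RTX’s own runtime cache may be managed for you by Windows ML.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Tiny stand-in graph with a fully dynamic spatial size.
latent = network.add_input("latent", trt.float32, (-1, 4, -1, -1))
network.mark_output(network.add_identity(latent).get_output(0))

config = builder.create_builder_config()

# Dynamic resolutions: one engine covers a whole range of image sizes.
profile = builder.create_optimization_profile()
profile.set_shape("latent", min=(1, 4, 32, 32),
                  opt=(1, 4, 64, 64), max=(1, 4, 128, 128))
config.add_optimization_profile(profile)

# Kernel cache: reuse tuning results across builds and app sessions.
try:
    with open("kernels.cache", "rb") as f:
        cache = config.create_timing_cache(f.read())
except FileNotFoundError:
    cache = config.create_timing_cache(b"")
config.set_timing_cache(cache, ignore_mismatch=False)

engine_bytes = builder.build_serialized_network(network, config)
with open("kernels.cache", "wb") as f:   # later runs start from this cache
    f.write(cache.serialize())
```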

Broad Precision Support

TensorRT for RTX supports a wide spectrum of precisions:

  • FP32, FP16, BF16, FP8, FP4
  • INT8, INT4 (weight-only quantization)

This makes it ideal for balancing speed, size, and accuracy in different AI workloads.
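
In builder terms, precision selection usually means opting into flags on the build configuration. The sketch below uses standard TensorRT flags that are known to exist (FP16, BF16); how TensorRT for RTX exposes FP8, FP4, and INT4 weight-only quantization is not shown, since those typically arrive via a pre-quantized ONNX model rather than a single flag. Treat that part as an assumption.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Allow reduced-precision kernels wherever they beat FP32.
config.set_flag(trt.BuilderFlag.FP16)  # half precision
config.set_flag(trt.BuilderFlag.BF16)  # bfloat16, Ampere GPUs and newer

# FP8/FP4 and INT4 weight-only quantization generally come in through a
# pre-quantized ONNX model (e.g. produced with NVIDIA's quantization
# tooling) rather than a builder flag; assumed here, not shown.
```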

Available Now in Windows ML Preview

TensorRT for RTX is part of the Windows ML public preview, and a standalone SDK will be available in June via developer.nvidia.com. Developers can use it through the Windows ML API for seamless integration, or link it directly into their apps for more control.

Conclusion: A New Era for AI on Windows

NVIDIA’s TensorRT for RTX is a game-changer for AI development on Windows PCs. With its compact size, fast build times, powerful performance, and broad compatibility, it’s the perfect tool for developers looking to deploy cutting-edge AI features right on consumer hardware.

Whether you’re optimizing apps for content creation, gaming, or productivity, TensorRT for RTX brings the power of AI to Windows in a whole new way.
