Google’s Gemma 3n Gets NVIDIA Jetson and RTX GPU Support


NVIDIA has announced general availability for the highly anticipated Gemma 3n model, now optimized for both NVIDIA RTX GPUs and Jetson edge devices. This marks a significant step forward in on-device artificial intelligence, with Gemma 3n delivering powerful multimodal capabilities and innovative memory efficiency for developers and AI enthusiasts.

Gemma 3n: Expanding Multimodal AI with Audio, Vision, and Text

First previewed by Google DeepMind at Google I/O 2025, Gemma 3n builds on its predecessor’s foundation by introducing robust audio capabilities alongside existing vision and text processing. The model integrates leading research components:

  • Universal Speech Model for advanced audio understanding
  • MobileNet v4 for efficient vision tasks
  • MatFormer for sophisticated text handling

This holistic approach enables developers to build applications that seamlessly interpret and generate content across multiple data types, all on-device.

Per-Layer Embeddings: A Breakthrough in Memory Efficiency

A standout feature in Gemma 3n is the introduction of Per-Layer Embeddings. This technique dramatically reduces RAM requirements, allowing a model with 8 billion raw parameters to operate within the memory footprint of a typical 4 billion-parameter model. This innovation empowers developers to deploy higher-quality AI models even in resource-constrained environments, such as edge devices and robotics.
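As a rough illustration of the memory arithmetic, the sketch below assumes a hypothetical even split between core transformer weights and per-layer embedding parameters (the split and byte counts are illustrative assumptions, not Gemma 3n's published layout): if the per-layer embeddings can live in cheap host storage and be streamed in as needed, only the core weights need to stay resident in accelerator memory.

```python
# Illustrative Per-Layer Embeddings (PLE) memory accounting.
# The 4B/4B split below is a made-up example for demonstration only.

BYTES_PER_PARAM = 2  # assuming fp16/bf16 weights

def accelerator_footprint_gb(core_params: float, ple_params: float,
                             ple_offloaded: bool) -> float:
    """Approximate accelerator memory (GB) occupied by model weights."""
    resident = core_params if ple_offloaded else core_params + ple_params
    return resident * BYTES_PER_PARAM / 1e9

# Hypothetical split for an 8-billion-raw-parameter model:
core, ple = 4e9, 4e9

print(accelerator_footprint_gb(core, ple, ple_offloaded=False))  # 16.0 GB
print(accelerator_footprint_gb(core, ple, ple_offloaded=True))   # 8.0 GB
```

Under these assumed numbers, offloading halves the resident footprint, which is the effect that lets an 8B-parameter model behave like a 4B-class model in accelerator memory.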

Model Specifications

Model name   Raw Parameters   Input Context Length   Output Context Length     Size on Disk
E2B          5B               32K                    32K minus request input   1.55 GB
E4B          8B               32K                    32K minus request input   2.82 GB
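The output budget in the table is whatever remains of the shared 32K window after the request's input is counted. A one-line sketch of that bookkeeping (the 32,768-token figure is the conventional value of "32K" and is an assumption here):

```python
CONTEXT_WINDOW = 32_768  # "32K" tokens, shared between input and output

def max_output_tokens(input_tokens: int) -> int:
    """Output budget left after subtracting the request's input tokens."""
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the context window")
    return CONTEXT_WINDOW - input_tokens

print(max_output_tokens(2_000))  # 30768
```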

Gemma 3n’s lightweight and dynamic architecture is a perfect match for NVIDIA Jetson devices, which are widely used in robotics, smart cameras, and other edge AI applications. Developers can now leverage Gemma 3n to build smarter, more responsive systems that operate efficiently at the edge.

Join the Gemma 3n Impact Challenge

NVIDIA is inviting developers to participate in the Gemma 3n Impact Challenge on Kaggle. The competition encourages the use of Gemma 3n to drive positive change in fields such as accessibility, education, healthcare, environmental sustainability, and crisis response. Cash prizes start at $10,000, with special recognition for solutions optimized for Jetson and other on-device deployments.

Seamless Deployment for Windows Developers and AI Enthusiasts

With NVIDIA RTX AI PCs, deploying Gemma 3n is easier than ever. Developers and enthusiasts can use the Ollama platform to run Gemma 3n models locally, benefiting from RTX acceleration in popular applications like AnythingLLM and LM Studio.

Quick Start Guide:

ollama pull gemma3n:e4b
ollama run gemma3n:e4b "Summarize Shakespeare’s Hamlet"

NVIDIA’s collaboration with Ollama ensures that RTX GPUs deliver peak performance for Gemma 3n, thanks to backend optimizations built on the GGML library.

Customizing Gemma 3n with NVIDIA NeMo Framework

For organizations seeking tailored AI solutions, Gemma 3n models are now available on Hugging Face and fully compatible with the open-source NVIDIA NeMo Framework. NeMo enables end-to-end workflows for fine-tuning large language and multimodal models, including:

  • Data Curation: High-quality dataset preparation with NeMo Curator
  • Fine-Tuning: Efficient adaptation using LoRA, PEFT, or full parameter tuning
  • Evaluation: Rigorous model assessment with NeMo Evaluator
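To make the fine-tuning step concrete, here is a minimal NumPy sketch of the idea behind LoRA. This is a conceptual illustration only, not NeMo's API: instead of updating a full weight matrix W (d x k), LoRA trains a low-rank pair B (d x r) and A (r x k) and applies W' = W + (alpha / r) * B @ A, leaving the pretrained weights frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions are arbitrary example values.
d, k, r, alpha = 512, 512, 8, 16
W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # zero-initialized: adapter starts as a no-op

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha/r) * B @ A).T, without materializing the merged weight."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, k))
# With B zeroed, the adapter contributes nothing, so outputs match the base model:
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

# Trainable parameter count is tiny compared to the frozen matrix:
print(W.size, A.size + B.size)  # 262144 vs 8192
```

The design point is the parameter count: here the adapter trains about 3% as many weights as full fine-tuning would, which is what makes LoRA-style adaptation practical on modest hardware.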

This streamlined process allows enterprises to achieve higher accuracy and relevance with their AI deployments.


Commitment to Open Source and AI Transparency

NVIDIA continues to champion open-source AI, contributing hundreds of projects and supporting open models like Gemma. This commitment fosters transparency, safety, and resilience in the AI community, empowering developers worldwide to collaborate and innovate.