![Qwen3]()
We’re excited to introduce Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in coding, math, and general tasks against other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Even our smaller MoE model, Qwen3-30B-A3B, outperforms QwQ-32B despite having far fewer active parameters, and the tiny Qwen3-4B rivals the performance of the much larger Qwen2.5-72B-Instruct.
![Model]()
![Model 2]()
Open-Source Models and Availability
We are open-sourcing two MoE models.
- Qwen3-235B-A22B (235B total parameters, 22B activated)
- Qwen3-30B-A3B (30B total parameters, 3B activated)
![MoE]()
Along with that, we’re releasing six dense models (Qwen3-32B, 14B, 8B, 4B, 1.7B, and 0.6B) under the Apache 2.0 license. The models span a range of sizes, with context lengths of either 32K or 128K tokens, giving users flexibility based on their needs.
![Layers]()
Where to Access the Models?
The pre-trained and post-trained models, such as Qwen3-30B-A3B and Qwen3-30B-A3B-Base, are available on Hugging Face, ModelScope, and Kaggle. For deploying these models, we recommend using SGLang or vLLM, and for local use, tools like Ollama, LMStudio, MLX, llama.cpp, and KTransformers are supported.
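For example, once a model has been served with SGLang or vLLM, it can be queried through the OpenAI-compatible API that both servers expose. The sketch below assumes a server is already running locally on port 8000 and serving Qwen3-30B-A3B; the endpoint, model name, and prompt are illustrative placeholders.

```python
from openai import OpenAI

# Point the client at the local OpenAI-compatible endpoint exposed by
# vLLM or SGLang (URL, port, and model name below are placeholders).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[
        {"role": "user", "content": "Explain mixture-of-experts in two sentences."}
    ],
)
print(response.choices[0].message.content)
```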
Driving Innovation and Easy Access
Qwen3 aims to drive innovation in AI by offering powerful models openly to researchers, developers, and organizations around the world. You can also try Qwen3 directly on the Qwen Chat website or through the mobile app.
Flexible Thinking Modes
Qwen3 introduces flexible "thinking modes." It can either work step-by-step for complex tasks (thinking mode) or respond quickly for simple tasks (non-thinking mode). Users can even set a thinking "budget" to balance between deep reasoning and faster replies.
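As a minimal sketch of how this looks in practice (assuming a Hugging Face transformers tokenizer for a Qwen3 checkpoint; the checkpoint name and prompt are examples only), the mode can be selected with the `enable_thinking` flag of the chat template:

```python
# Minimal sketch: choosing thinking vs. non-thinking mode when formatting
# a prompt with the Qwen3 chat template in Hugging Face transformers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")  # example checkpoint

messages = [{"role": "user", "content": "How many prime numbers are there below 50?"}]

# Thinking mode: the model reasons step by step inside <think>...</think> blocks.
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: the model answers directly, which is faster for simple tasks.
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```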
Multilingual Support
The models support 119 languages and dialects from many language families like Indo-European, Sino-Tibetan, Afro-Asiatic, and more. This multilingual capability allows Qwen3 to work effectively in many regions and use cases.
Advanced Agent Capabilities
We also enhanced Qwen3’s agent capabilities, making it better at reasoning, coding, and interacting with environments for completing complex tasks.
Improved Training Process
Compared to Qwen2.5, Qwen3 is trained on a much larger dataset of roughly 36 trillion tokens covering 119 languages and dialects and a wide range of domains, including web data, text extracted from PDF-style documents, and synthetic math and code data. The pre-training process involved three stages:
- Initial training with over 30 trillion tokens at 4K context length.
- Knowledge-focused pre-training with 5 trillion tokens.
- Long-context training up to 32K tokens.
Efficiency and Performance
Thanks to improved training methods, Qwen3 models match or surpass Qwen2.5 models with fewer active parameters, especially in STEM and coding tasks. MoE models are highly efficient, offering high performance at a lower computational cost.
![MCP]()
Post-Training Enhancements
Qwen3’s post-training involved four stages:
- Long chain-of-thought fine-tuning
- Rule-based reinforcement learning
- Merging of thinking & non-thinking modes
- General RL across 20+ domains
The result: a powerful, adaptable model ready for diverse tasks.
![Qwen]()
Easy Deployment for Developers
Developers can easily work with Qwen3 using Hugging Face transformers, which supports thinking control: you can enable or disable thinking mode when generating responses. Deployment is simple with SGLang and vLLM, and local usage is supported through Ollama, LMStudio, llama.cpp, and KTransformers. Additionally, developers can dynamically switch between thinking and non-thinking modes during a conversation using special prompt commands such as /think and /no_think, as shown in the sketch below.
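Below is a minimal end-to-end sketch using Hugging Face transformers. It assumes the Qwen3-30B-A3B checkpoint and illustrative generation settings; the /no_think suffix on the user turn is the soft switch described above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"  # example checkpoint; other Qwen3 sizes work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Appending /no_think to a user turn asks the model to skip its reasoning trace
# for that turn; /think re-enables it on a later turn.
messages = [{"role": "user", "content": "What is 17 * 23? /no_think"}]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens; in thinking mode the reasoning
# appears wrapped in <think>...</think> tags before the final answer.
new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```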