Microsoft Just Dropped 3 New AI Models to Take on OpenAI and Google

Redmond, WA — Microsoft has unveiled three new in-house AI models under its Microsoft AI (MAI) division, signaling a major push to compete directly with rivals like OpenAI and Google. The new models—MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2—are now available through Microsoft Foundry and the MAI Playground. 

The launch marks a shift in Microsoft’s strategy—from relying heavily on partner models to building its own full-stack AI ecosystem.

Three Models, Three Core AI Capabilities

Microsoft’s latest release targets three of the most commercially valuable AI domains:

🎙️ MAI-Transcribe-1 (Speech-to-Text)

  • Supports 25 major languages

  • Delivers state-of-the-art accuracy in noisy, real-world environments

  • Up to 2.5× faster transcription speeds than previous Microsoft offerings 

👉 Built for use cases like meetings, call centers, subtitles, and voice agents.

🔊 MAI-Voice-1 (Text-to-Speech)

  • Generates natural, expressive speech with emotional nuance

  • Can create custom voices from just seconds of audio

  • Produces 60 seconds of audio in ~1 second 

👉 Designed for voice assistants, audiobooks, and conversational AI.
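
To put the quoted speed in perspective, here is a rough back-of-the-envelope sketch. It assumes the "60 seconds of audio in ~1 second" figure holds as a constant rate for long inputs and ignores network latency and queuing, none of which the announcement specifies. The function name and the 10-hour audiobook are illustrative, not taken from Microsoft.

```python
# Quoted in the article: roughly 60 seconds of audio generated per second.
# Assumption: the rate stays constant regardless of input length.
AUDIO_SECONDS_PER_WALL_SECOND = 60.0


def generation_seconds(audio_seconds: float) -> float:
    """Estimated wall-clock seconds needed to synthesize the given audio length."""
    return audio_seconds / AUDIO_SECONDS_PER_WALL_SECOND


# Hypothetical workload: a 10-hour audiobook.
audiobook_seconds = 10 * 3600
minutes_needed = generation_seconds(audiobook_seconds) / 60
print(minutes_needed)  # 10.0
```

Under those assumptions, a 10-hour audiobook would take about 10 minutes of generation time. Real-world throughput may differ.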

🖼️ MAI-Image-2 (Image Generation)

  • Delivers 2× faster image generation with high visual quality

  • Optimized for realistic lighting, textures, and text rendering

  • Already ranking among top image models on benchmarks 

👉 Targeted at designers, marketers, and enterprise creative teams.

Built for Developers, Priced for Scale

Microsoft is emphasizing price-performance leadership with aggressive pricing:

  • Transcription starting at $0.36/hour

  • Voice generation at $22 per 1M characters

  • Image generation starting at $5 per 1M tokens 

👉 The goal: make high-end AI models accessible at scale for real-world production.
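
As a rough illustration of what these prices could mean in practice, the sketch below estimates a monthly bill. It treats the "starting at" figures as flat rates, which is an assumption, since actual pricing may vary by tier or region. The function names and the workload numbers are hypothetical.

```python
# Prices quoted in the article. "Starting at" figures are treated as flat
# rates here, which is an assumption rather than confirmed pricing.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_TOKENS = 5.0


def transcription_cost(audio_hours: float) -> float:
    """Cost in USD of transcribing the given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Cost in USD of synthesizing speech from the given number of characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(tokens: int) -> float:
    """Cost in USD of image generation for the given number of tokens."""
    return tokens / 1_000_000 * IMAGE_USD_PER_MILLION_TOKENS


def monthly_estimate(audio_hours: float, characters: int, tokens: int) -> float:
    """Combined estimate in USD, rounded to cents."""
    total = (
        transcription_cost(audio_hours)
        + voice_cost(characters)
        + image_cost(tokens)
    )
    return round(total, 2)


# Hypothetical workload: 1,000 hours of calls, 5M characters of speech,
# and 2M image tokens in a month.
print(monthly_estimate(1_000, 5_000_000, 2_000_000))  # 480.0
```

In this made-up scenario, transcription accounts for $360, voice for $110, and images for $10.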

Available Now in Microsoft Foundry

All three models are integrated into Microsoft Foundry, the company’s AI platform for building and deploying applications.

Developers can:

  • Access models via APIs

  • Build multimodal applications (voice + text + image)

  • Deploy AI agents with built-in governance and security

Microsoft is also rolling these models into its own products like Copilot, Teams, Bing, and PowerPoint, signaling rapid real-world adoption. 

A Direct Challenge to AI Rivals

This launch clearly positions Microsoft against:

  • OpenAI (GPT, Codex)

  • Google (Gemini, Veo, Gemma)

  • Anthropic (Claude)

Industry reports indicate this is part of Microsoft’s broader effort to reduce reliance on external AI providers and build proprietary models.

👉 In short: Microsoft is no longer just a platform for AI—it’s becoming a model creator at scale.

Microsoft’s MAI models reflect a growing trend:

👉 Tech giants are building end-to-end AI stacks: models, infrastructure, and applications.

We’re seeing:

  • Google → Gemini + Gemma + Veo

  • OpenAI → GPT + Agents + Codex

  • Microsoft → MAI + Copilot + Foundry

The competition is shifting from who has the best model to who owns the full AI ecosystem.

Source: Microsoft