Foundry Local is now GA

News

Microsoft has officially launched Foundry Local, a cross-platform solution that allows developers to run high-performance AI models entirely on a user's local machine. This milestone brings production-grade AI—including chat completions and audio transcription—directly to edge devices with zero cloud dependency, zero network latency, and no per-token costs.

The Vision of "Ship-Once" AI

Foundry Local is designed to be bundled directly into application installers. Its small footprint and self-contained nature mean developers can ship AI-powered apps without requiring end-users to install complex CLI tools or third-party dependencies. Once installed, the SDK intelligently identifies and downloads hardware-optimized models from the Foundry Model Catalog.

Technical Highlights for Developers

Multi-Platform & Hardware Acceleration: Foundry Local runs on Windows (WinML), macOS (Metal/Apple Silicon), and Linux. It automatically leverages GPUs and NPUs for acceleration, with a seamless fallback to CPU if no specialized hardware is detected.
Unified SDK: A single SDK handles multiple modalities including speech-to-text, tool calling, and chat. Support is available for C#, Python, JavaScript, and Rust.
OpenAI Compatibility: The inference APIs support the standard OpenAI request/response format, making it easy to switch between cloud-based and on-device models.
Offline-First Privacy: Data never leaves the device, making it ideal for high-security environments like healthcare or private personal assistants.

Getting Started with C# and .NET

Microsoft has provided a first-class experience for .NET developers. You can add the package via NuGet:

dotnet add package Microsoft.AI.Foundry.Local

The SDK abstracts the model lifecycle management, including download, hardware-matched loading, and inference, allowing you to focus on building features rather than managing local runtimes.

Curated Model Catalog

At launch, Foundry Local supports several optimized families of models, including:

Phi & Qwen: For high-efficiency local chat and reasoning.
Mistral & DeepSeek: For advanced coding and logic tasks.
Whisper: For local audio transcription.

This announcement marks a major shift toward decentralized AI, where the power of frontier models is balanced with the privacy and cost-efficiency of local execution. For more technical details and samples, visit the official Foundry Local GitHub repository.