AI integration in .NET has matured into a core engineering capability rather than an experimental add-on. C# developers can now bring LLMs, transformer models, embeddings, and multimodal reasoning directly into enterprise systems using three primary approaches: cloud-hosted APIs, local ONNX inference, and C#-native modeling with TorchSharp. Each method fits different architectural, security, and performance needs, and the modern .NET ecosystem supports all of them cleanly.
• Cloud-hosted LLM services such as OpenAI, Azure OpenAI, and Hugging Face Inference offer the fastest way to add intelligence without infrastructure overhead. They deliver state-of-the-art models with strong reliability and minimal code, making them ideal for enterprise workloads, RAG systems, support automation, and intelligent assistants.
using OpenAI.Chat;

public class AiService
{
    // The official OpenAI .NET SDK binds a ChatClient to one model.
    private readonly ChatClient _client = new ChatClient(
        model: "gpt-4o-mini",
        apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

    public async Task<string> AskAsync(string prompt)
    {
        ChatCompletion completion = await _client.CompleteChatAsync(
            new UserChatMessage(prompt));
        return completion.Content[0].Text;
    }
}
• Local ONNX Runtime inference is optimal when data privacy, offline capabilities, or cost predictability are critical. Models exported from PyTorch or TensorFlow can run natively in C#, with CPU/GPU acceleration and no network dependency. This approach is heavily used in regulated industries and edge environments.
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

public class OnnxRunner : IDisposable
{
    private readonly InferenceSession _session;

    public OnnxRunner(string modelPath)
    {
        // Load the exported model once; sessions are thread-safe and reusable.
        _session = new InferenceSession(modelPath);
    }

    public float[] Run(long[] tokens)
    {
        // Shape [1, seq_len]: a single batch of token ids.
        var tensor = new DenseTensor<long>(tokens, new[] { 1, tokens.Length });
        using var results = _session.Run(new[] {
            NamedOnnxValue.CreateFromTensor("input_ids", tensor)
        });
        return results.First().AsEnumerable<float>().ToArray();
    }

    public void Dispose() => _session.Dispose();
}
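Using the runner is a one-liner per inference; the model path and token ids below are placeholders for whatever tokenizer and exported model you pair it with:

```csharp
// Sketch: the file name and token values are illustrative assumptions.
using var runner = new OnnxRunner("model.onnx");
long[] tokenIds = { 101, 2023, 2003, 102 };   // output of your tokenizer
float[] logits = runner.Run(tokenIds);
Console.WriteLine($"Received {logits.Length} output values");
```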
• TorchSharp enables building or experimenting with transformer blocks entirely in C#, using PyTorch-style APIs for tensor operations, automatic differentiation, and GPU support. It is best suited for innovation-focused teams or products requiring custom neural architectures within a unified .NET stack.
using TorchSharp;
using static TorchSharp.torch;
using TorchSharp.Modules;

public class MiniTransformer : nn.Module<Tensor, Tensor>
{
    private readonly MultiheadAttention _attn;
    private readonly Linear _ff;

    public MiniTransformer(int embed, int heads) : base("MiniTransformer")
    {
        _attn = nn.MultiheadAttention(embed, heads);
        _ff = nn.Linear(embed, embed);
        RegisterComponents();
    }

    public override Tensor forward(Tensor src)
    {
        // Self-attention: query, key, and value are all the input sequence.
        var (attnOut, _) = _attn.forward(src, src, src);
        var out1 = attnOut + src;            // residual connection
        return _ff.forward(out1) + out1;     // feed-forward + residual
    }
}
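A quick smoke test of the block above; shapes follow PyTorch's default (seq_len, batch, embed) layout for multi-head attention, and the sizes are arbitrary:

```csharp
// Sketch: run a random sequence through the block and confirm the
// output shape matches the input (attention and the square Linear
// layer both preserve dimensions).
var block = new MiniTransformer(embed: 64, heads: 8);
var src = torch.randn(10, 1, 64);   // (seq_len, batch, embed)
var output = block.forward(src);
Console.WriteLine(string.Join(", ", output.shape));  // 10, 1, 64
```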
In real-world systems, AI workloads blend naturally with .NET architectures. Developers commonly expose inference through Minimal APIs or ASP.NET Core services, run batch jobs in background workers, and integrate embeddings with vector databases such as PostgreSQL+pgvector or Redis to build retrieval-augmented generation workflows. Performance improves significantly through caching tokenized prompts, batching inputs, streaming responses, and using AOT publishing for fast startup. GPU-enabled containers or cloud LLM endpoints support scalable inference, while ONNX models handle offline or private scenarios.
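The Minimal API pattern described above can be sketched as follows; the `/ask` route, `PromptRequest` record, and registration details are illustrative assumptions, not a fixed convention:

```csharp
// Sketch of exposing inference through a Minimal API endpoint,
// assuming the official OpenAI .NET SDK's ChatClient.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton(new ChatClient(
    model: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")));
var app = builder.Build();

app.MapPost("/ask", async (ChatClient client, PromptRequest req) =>
{
    ChatCompletion completion = await client.CompleteChatAsync(
        new UserChatMessage(req.Prompt));
    return Results.Ok(new { answer = completion.Content[0].Text });
});

app.Run();

public record PromptRequest(string Prompt);
```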
Senior engineers choose the approach based on constraints: cloud LLMs for maximum capability and minimum effort, ONNX for strict privacy and cost control, and TorchSharp for experimentation or custom architectures. The strength of .NET is that all three paths coexist cleanly and integrate with enterprise-grade engineering standards.
public interface IAiProvider
{
    Task<string> GenerateAsync(string prompt);
}

public class OpenAiProvider : IAiProvider
{
    private readonly ChatClient _client;

    public OpenAiProvider(ChatClient client)
    {
        _client = client;
    }

    public async Task<string> GenerateAsync(string prompt)
    {
        ChatCompletion completion = await _client.CompleteChatAsync(
            new UserChatMessage(prompt));
        return completion.Content[0].Text;
    }
}
public class AiEngine
{
    private readonly IAiProvider _provider;

    public AiEngine(IAiProvider provider)
    {
        _provider = provider;
    }

    public Task<string> ProcessAsync(string input)
    {
        return _provider.GenerateAsync(input);
    }
}
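Wiring the abstraction into ASP.NET Core's dependency-injection container might look like this sketch (the registrations assume the official OpenAI client; lifetimes are a reasonable default, not a prescription):

```csharp
// Sketch: register the concrete client, the provider behind the
// interface, and the engine that depends only on the interface.
builder.Services.AddSingleton(new ChatClient(
    "gpt-4o-mini", Environment.GetEnvironmentVariable("OPENAI_API_KEY")));
builder.Services.AddSingleton<IAiProvider, OpenAiProvider>();
builder.Services.AddSingleton<AiEngine>();
// Swapping in a local ONNX-backed IAiProvider later requires no
// changes to AiEngine or its consumers.
```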
Summary
.NET has become a fully capable AI platform where developers can integrate transformers using cloud APIs, local ONNX models, or pure C# libraries. Cloud LLMs provide speed and model quality, ONNX ensures privacy and offline operation, and TorchSharp supports innovation and custom architectures. With strong tooling, performance, and architectural flexibility, .NET allows enterprises to build robust, intelligent systems that combine classical engineering discipline with modern AI logic.