Integrating Machine Learning Models into ASP.NET Core Applications

A Practical Guide for Enterprise-Grade AI Integration

Machine Learning (ML) is now core functionality in many enterprise applications. Whether you are building recommendation systems, fraud detection pipelines, forecasting modules, image recognition, or text classification features, integrating ML models into your existing ASP.NET Core applications gives you the advantage of real-time decision-making close to the application layer.

However, the way you integrate ML models into ASP.NET Core depends on multiple factors:

  • Type of model (ONNX, TensorFlow, PyTorch, ML.NET-trained).

  • Performance requirements (latency, concurrency).

  • Deployment strategy (in-process, out-of-process, microservices).

  • Hardware acceleration (CPU, GPU).

  • Scalability and maintainability.

This article presents a practical, production-focused guide covering integration methods for ML models in ASP.NET Core applications using ML.NET, ONNX Runtime, Python microservices, and cloud-hosted models.

1. Architecture Approaches for ML in ASP.NET Core

There are four main patterns:

1. In-Process Model Hosting

The ML model executes within the ASP.NET Core process.

Pros

  • Lowest latency

  • Easy to deploy

  • Good for small ML models

Cons

  • Not suitable for GPU models

  • If the model crashes, the entire app may crash

  • Heavy models slow down request processing

Suitable for

  • Small ONNX models

  • ML.NET models

  • Simple classical ML use cases

2. Out-of-Process Hosting (Sidecar or Worker Process)

Your ASP.NET Core app communicates with another process on the same machine.

Pros

  • Better isolation

  • Can run GPU-accelerated Python models

  • Keeps the ASP.NET Core app running if the model process fails

Cons

  • Higher latency

  • Extra deployment complexity

Suitable for

  • Python-based models (TensorFlow, PyTorch)

  • NVIDIA GPU workloads

3. Microservice-Based Model APIs

The ML model runs as an independent service (Docker, Kubernetes, Azure Container Apps).

Pros

  • Horizontal scaling for inference

  • Versioned models

  • CI/CD for model updates

  • Best for large applications

Cons

  • Highest infra overhead

  • Requires service discovery and load balancing

Suitable for

  • Enterprise AI

  • Multi-team ownership

  • Multi-model hosting

4. Cloud-Based External Model APIs

Models are hosted in cloud services (Azure ML, AWS SageMaker, OpenAI, Hugging Face Inference).

Pros

  • Zero infrastructure

  • Auto-scaling

  • High availability

Cons

  • High latency for large payloads

  • Recurring cost

  • Requires network connectivity

Suitable for

  • NLP, image analysis, embeddings

  • Prototypes and production AI at scale

2. Integrating ML.NET Models

ML.NET enables training and running .NET-native models.

2.1 Loading the Model

using Microsoft.ML;

public class PredictionEngineService
{
    private readonly MLContext _mlContext = new MLContext();
    private readonly ITransformer _model;

    public PredictionEngineService()
    {
        // Load the trained model once at startup; the schema out parameter is discarded here.
        _model = _mlContext.Model.Load("models/sentiment.zip", out _);
    }

    public PredictionEngine<InputData, PredictionResult> CreateEngine()
    {
        // PredictionEngine is not thread-safe, so create one per request
        // (or use PredictionEnginePool; see the best practices below).
        return _mlContext.Model.CreatePredictionEngine<InputData, PredictionResult>(_model);
    }
}
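The InputData and PredictionResult types referenced above are not part of the snippet. For a sentiment model they might look like the following sketch; the property and column names are assumptions and must match your trained pipeline's schema:

using Microsoft.ML.Data;

public class InputData
{
    public string Text { get; set; } = string.Empty;
}

public class PredictionResult
{
    // "PredictedLabel" is the conventional ML.NET output column; adjust to your pipeline.
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    public float Probability { get; set; }
}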

2.2 Using Dependency Injection

builder.Services.AddSingleton<PredictionEngineService>();

2.3 Prediction Controller

[ApiController]
[Route("api/predict")]
public class PredictionController : ControllerBase
{
    private readonly PredictionEngineService _service;

    public PredictionController(PredictionEngineService service)
    {
        _service = service;
    }

    [HttpPost]
    public ActionResult Predict(InputData input)
    {
        var engine = _service.CreateEngine();
        var result = engine.Predict(input);
        return Ok(result);
    }
}

Best Practices

  1. Use PredictionEnginePool for concurrent predictions (see the sketch after this list).

  2. Avoid reloading the model on every request.

  3. Retrain and replace models with a hot-reload pattern (section 6.2).
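A minimal sketch of the pooled approach, assuming the Microsoft.Extensions.ML package and the same model file as above:

using Microsoft.Extensions.ML;

var builder = WebApplication.CreateBuilder(args);

// A pool of PredictionEngine instances is rented per prediction, avoiding
// both per-request model loads and PredictionEngine's thread-safety issues.
builder.Services.AddPredictionEnginePool<InputData, PredictionResult>()
    .FromFile(modelName: "sentiment",
              filePath: "models/sentiment.zip",
              watchForChanges: true); // hot-reloads when the file changes

var app = builder.Build();

app.MapPost("/api/predict",
    (PredictionEnginePool<InputData, PredictionResult> pool, InputData input) =>
        Results.Ok(pool.Predict(modelName: "sentiment", example: input)));

app.Run();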

3. Integrating ONNX Models with ONNX Runtime

ONNX Runtime runs models exported from many frameworks and supports CPU and GPU execution providers, including CUDA and TensorRT.

3.1 Installing ONNX Runtime

dotnet add package Microsoft.ML.OnnxRuntime

3.2 Loading ONNX Model

using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

public class OnnxModelService
{
    private readonly InferenceSession _session;

    public OnnxModelService()
    {
        // Create the session once; it is thread-safe and expensive to construct.
        _session = new InferenceSession("models/model.onnx");
    }

    public float[] Predict(float[] input)
    {
        // Shape [1, n]: a single example with n features.
        var tensor = new DenseTensor<float>(input, new[] { 1, input.Length });
        var inputs = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor("input", tensor)
        };

        // Dispose the results to release native memory promptly.
        using var results = _session.Run(inputs);
        return results.First().AsEnumerable<float>().ToArray();
    }
}

3.3 Exposing ONNX Predictions in ASP.NET Core

[ApiController]
[Route("api/onnx")]
public class OnnxController : ControllerBase
{
    private readonly OnnxModelService _service;

    public OnnxController(OnnxModelService service)
    {
        _service = service;
    }

    [HttpPost]
    public ActionResult Predict(InputVector model)
    {
        var result = _service.Predict(model.Values);
        return Ok(result);
    }
}

// Request DTO: a flat feature vector.
public record InputVector(float[] Values);

Best Practices

  1. Register the InferenceSession as a singleton; it is thread-safe and expensive to create.

  2. Use a GPU execution provider if available (see the sketch after this list).

  3. Batch predictions where possible to amortize per-call overhead.
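A sketch of both points, assuming the Microsoft.ML.OnnxRuntime.Gpu package and a CUDA-capable machine; GetPendingExamples is a hypothetical helper that flattens queued inputs:

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Route execution to CUDA device 0; unsupported operators fall back to CPU.
var options = SessionOptions.MakeSessionOptionWithCudaProvider(0);
var session = new InferenceSession("models/model.onnx", options);

// Batched input: shape [batchSize, featureCount] instead of [1, featureCount],
// so a single Run() call scores many examples at once.
float[] flat = GetPendingExamples(out int batchSize, out int featureCount);
var batch = new DenseTensor<float>(flat, new[] { batchSize, featureCount });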

4. Integrating Python Models with ASP.NET Core

Most enterprise AI models are still built in Python (TensorFlow, PyTorch, scikit-learn). Integrating Python-based inference into ASP.NET Core requires careful design.

There are three reliable ways.

Method 1: Python Process Execution (Per-Request Process)

Launch a Python interpreter installed alongside ASP.NET Core for each prediction.

4.1 Starting Python Process

using System.Diagnostics;

public class PythonPredictionService
{
    public string Predict(string input)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "python",
            Arguments = $"predict.py \"{input}\"",
            RedirectStandardOutput = true,
            UseShellExecute = false // required when redirecting streams
        };

        // A new interpreter starts on every call: simple, but slow under load.
        using var process = Process.Start(psi)!;
        var output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return output;
    }
}

This is simple but not scalable.

Pros

  • Easy to implement

  • Good for internal tools

Cons

  • Not scalable

  • Expensive per-process startup

Method 2: Persistent Python Worker Process

A Python script stays alive, and ASP.NET Core sends input via pipes or sockets (a minimal sketch follows the trade-offs below).

Pros

  • Lower latency

  • Can load GPU models once

Cons

  • More code

  • Maintenance overhead
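A minimal sketch over standard input/output, assuming a hypothetical long-running worker.py that loads the model once and then answers one JSON line per request:

using System.Diagnostics;

public class PersistentPythonWorker : IDisposable
{
    private readonly Process _process;

    public PersistentPythonWorker()
    {
        // Start the worker once; it loads the model and then loops over stdin.
        _process = Process.Start(new ProcessStartInfo
        {
            FileName = "python",
            Arguments = "worker.py",
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            UseShellExecute = false
        })!;
    }

    public string Predict(string jsonInput)
    {
        // One JSON line in, one JSON line out. Callers must be serialized
        // (e.g. behind a lock or channel) because the streams are shared.
        _process.StandardInput.WriteLine(jsonInput);
        return _process.StandardOutput.ReadLine() ?? string.Empty;
    }

    public void Dispose() => _process.Dispose();
}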

Method 3: Python Microservice (Recommended)

The Python model runs as a separate FastAPI/Flask service:

fastapi_app.py

from fastapi import FastAPI
import torch

app = FastAPI()

# Load the model once at startup so every request reuses it.
model = torch.load("model.pt")
model.eval()

@app.post("/predict")
def predict(data: dict):
    tensor = torch.tensor(data["input"])
    with torch.no_grad():  # inference only; skip gradient tracking
        output = model(tensor).tolist()
    return {"result": output}

ASP.NET Core calls it

public class PythonModelClient
{
    private readonly HttpClient _http;

    public PythonModelClient(HttpClient http)
    {
        _http = http;
    }

    public async Task<float[]> Predict(float[] input)
    {
        // Relative URI, resolved against the BaseAddress configured at registration.
        var response = await _http.PostAsJsonAsync("python-api/predict", new { input });
        response.EnsureSuccessStatusCode();
        var result = await response.Content.ReadFromJsonAsync<ResponseModel>();
        return result!.Result;
    }
}

// Mirrors the FastAPI response shape: { "result": [...] }.
public record ResponseModel(float[] Result);
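Registering it as a typed HttpClient wires the base address up in one place; the URL below is a placeholder for wherever the FastAPI service is reachable:

// Typed client: PythonModelClient receives this configured HttpClient via DI.
builder.Services.AddHttpClient<PythonModelClient>(client =>
{
    client.BaseAddress = new Uri("http://localhost:8000/"); // placeholder address
    client.Timeout = TimeSpan.FromSeconds(5);               // fail fast if inference hangs
});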

Why this is best

  • Scalable

  • Testable

  • Maintains clear boundaries

  • Supports GPU nodes

  • Independent lifecycle

5. Integrating Cloud-Hosted AI Models

ASP.NET Core works well with cloud-based inference such as:

  • Azure ML

  • AWS SageMaker

  • Hugging Face Inference API

  • OpenAI / Azure OpenAI GPT models

Example: Calling Azure OpenAI for Text Classification

public class AzureOpenAIService
{
    private readonly HttpClient _client;

    public AzureOpenAIService(HttpClient client)
    {
        _client = client;
    }

    public async Task<string> ClassifyText(string text)
    {
        // Simplified endpoint for illustration; real Azure OpenAI calls include
        // the full deployment path and an api-version query parameter.
        var res = await _client.PostAsJsonAsync("/openai/deployments/classifier", new
        {
            input = text
        });

        res.EnsureSuccessStatusCode();
        var json = await res.Content.ReadFromJsonAsync<Dictionary<string, object>>();
        return json!["label"].ToString()!;
    }
}

Cloud advantages

  • No infra

  • Trained models ready

  • Scales automatically

Cloud concerns

  • Cost

  • Latency

  • Data residency

6. Managing Models in Production

Enterprise ML integration must account for versioning, A/B testing, monitoring, and security.

6.1 Model Versioning

Store models with version identifiers:

models/
    sentiment/
        v1/
        v2/
    fraud/
        v1/

Expose version via API:

GET /api/predict?model=sentiment&version=v2
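A minimal sketch of resolving the requested version, assuming every version is preloaded into a dictionary at startup (the file names are hypothetical):

using Microsoft.ML;

public class ModelRegistry
{
    private readonly MLContext _mlContext = new MLContext();
    private readonly Dictionary<string, ITransformer> _models = new();

    public ModelRegistry()
    {
        // Keyed as "name:version"; paths follow the layout shown above.
        _models["sentiment:v1"] = _mlContext.Model.Load("models/sentiment/v1/model.zip", out _);
        _models["sentiment:v2"] = _mlContext.Model.Load("models/sentiment/v2/model.zip", out _);
    }

    public ITransformer Resolve(string model, string version) =>
        _models.TryGetValue($"{model}:{version}", out var m)
            ? m
            : throw new KeyNotFoundException($"Unknown model {model}/{version}");
}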

6.2 Model Hot Reload

Load new models without restarting the app.

private volatile ITransformer _model;

public void ReloadModel(string path)
{
    // Load first, then swap the reference so in-flight requests keep the old model.
    _model = _mlContext.Model.Load(path, out _);
}

6.3 A/B Testing

Route a percentage of requests to newer models.

// Send roughly 20% of traffic to the newer model.
if (Random.Shared.Next(100) < 20)
{
    return PredictWithModelV2();
}
return PredictWithModelV1();

6.4 Monitoring

Track:

  • Prediction latency

  • Error rate

  • Drift detection

  • Input distribution changes

Log prediction input/output with appropriate masking.
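A small sketch of the latency metric, wrapping a prediction call with Stopwatch and ILogger; the _service and _logger fields are assumed to be injected:

using System.Diagnostics;

public float[] PredictWithTelemetry(float[] input)
{
    var sw = Stopwatch.StartNew();
    try
    {
        return _service.Predict(input);
    }
    finally
    {
        sw.Stop();
        // Structured latency log; feed this into your metrics/drift pipeline.
        _logger.LogInformation("Prediction completed in {ElapsedMs} ms", sw.ElapsedMilliseconds);
    }
}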

7. Performance Optimizations

Integrating ML can add significant load to your API. Use these strategies:

1. Keep model sessions as singletons

Avoid reloading the model.

2. Use batching

Batch predictions to reduce overhead.

3. Use GPU

Use ONNX Runtime's GPU packages (CUDA, TensorRT) for heavy workloads.

4. Preprocessing optimizations

Move preprocessing to the client where possible.

5. Avoid synchronous calls

Use async everywhere to avoid thread pool starvation.

6. Cache results

Cache for deterministic tasks such as classification (a sketch follows).
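A sketch using the built-in IMemoryCache, keyed by the input values; it assumes services.AddMemoryCache() is registered and the model is deterministic:

using Microsoft.Extensions.Caching.Memory;

public class CachedPredictionService
{
    private readonly OnnxModelService _inner;
    private readonly IMemoryCache _cache;

    public CachedPredictionService(OnnxModelService inner, IMemoryCache cache)
    {
        _inner = inner;
        _cache = cache;
    }

    public float[] Predict(float[] input)
    {
        // Joining values is fine for small vectors; hash the bytes for large ones.
        var key = "pred:" + string.Join(',', input);

        return _cache.GetOrCreate(key, entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            return _inner.Predict(input);
        })!;
    }
}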

8. Security Considerations

1. Validate all input

AI inputs can be attack vectors.

2. Mask sensitive data in logs

Never log full payloads.

3. Use HTTPS always

Never expose ML inference endpoints over plain HTTP.

4. Control access with API Keys / JWT

Especially for external model endpoints (see the sketch after this list).

5. Prevent model extraction

Avoid exposing model weights.
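A minimal sketch of API-key enforcement with an endpoint filter; the header name, config key, and Predict handler are assumptions, and JWT bearer authentication is the fuller option:

// Reject requests without the expected key before the prediction handler runs.
app.MapPost("/api/predict", Predict)
   .AddEndpointFilter(async (context, next) =>
   {
       var http = context.HttpContext;
       var configuredKey = http.RequestServices
           .GetRequiredService<IConfiguration>()["Inference:ApiKey"];

       if (!http.Request.Headers.TryGetValue("X-Api-Key", out var provided) ||
           provided != configuredKey)
       {
           return Results.Unauthorized();
       }

       return await next(context);
   });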

9. Deployment Approaches

Single Docker Container

Good for ML.NET/ONNX CPU workloads.

Multi-Container

ASP.NET Core + Python ML model.

Kubernetes

Production-grade scaling and rollout.

Container GPU Nodes

For TensorFlow/PyTorch.

Azure Container Apps

Managed container infrastructure.

10. Full Example: ONNX Model Integration into ASP.NET Core

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

builder.Services.AddSingleton(new InferenceSession("models/sales_forecast.onnx"));

app.MapPost("/forecast", (InferenceSession session, SalesInput input) =>
{
    var tensor = new DenseTensor<float>(input.Values, new[] { 1, input.Values.Length });

    // Dispose the native results after materializing the output array.
    using var results = session.Run(new[]
    {
        NamedOnnxValue.CreateFromTensor("input", tensor)
    });

    return Results.Ok(results.First().AsEnumerable<float>().ToArray());
});

// Request DTO: a flat feature vector.
public record SalesInput(float[] Values);

11. When Not to Host ML Inside ASP.NET Core

Avoid in-process hosting when:

  • You need GPU acceleration

  • The model is larger than roughly 500 MB

  • You require high availability

  • You expect more than roughly 200 predictions per second

  • Multiple models are required

  • Models are retrained and redeployed frequently

Use microservices instead.

Conclusion

Integrating machine learning models into ASP.NET Core applications requires careful architectural decisions. Your approach must consider:

  • Model type

  • Latency requirements

  • Scalability

  • Infrastructure maturity

  • Security and monitoring

Summary of best-fit scenarios:

Approach                    Best For
ML.NET in-process           Small models, CPU, low latency
ONNX Runtime                Edge AI, cross-framework inference
Python microservice         GPU models, PyTorch/TensorFlow, heavy ML
Cloud model APIs            NLP, embeddings, large language models
Microservice architecture   Enterprise AI, multi-model environments

By choosing the right integration pattern and applying best practices in deployment, monitoring, versioning, and performance tuning, you can build enterprise-grade intelligent applications that are both efficient and maintainable.