Integrating Machine Learning Models into ASP.NET Core Applications

A Practical Guide for Enterprise-Grade AI Integration

Machine Learning (ML) is now core functionality in many enterprise applications. Whether you are building recommendation systems, fraud detection pipelines, forecasting modules, image recognition, or text classification features, integrating ML models into your existing ASP.NET Core applications gives you the advantage of real-time decision-making close to the application layer.

However, the way you integrate ML models into ASP.NET Core depends on multiple factors:

  • Type of model (ONNX, TensorFlow, PyTorch, ML.NET-trained).

  • Performance requirements (latency, concurrency).

  • Deployment strategy (in-process, out-of-process, microservices).

  • Hardware acceleration (CPU, GPU).

  • Scalability and maintainability.

This article presents a practical, production-focused guide covering integration methods for ML models in ASP.NET Core applications using ML.NET, ONNX Runtime, Python microservices, and cloud-hosted models.

1. Architecture Approaches for ML in ASP.NET Core

There are four main patterns:

1. In-Process Model Hosting

The ML model executes within the ASP.NET Core process.

Pros

  • Lowest latency

  • Easy to deploy

  • Good for small ML models

Cons

  • Not suitable for GPU models

  • If the model crashes, the entire app may crash

  • Heavy models slow down request processing

Suitable for

  • Small ONNX models

  • ML.NET models

  • Simple classical ML use cases

2. Out-of-Process Hosting (Sidecar or Worker Process)

Your ASP.NET Core app communicates with another process on the same machine.

Pros

  • Better isolation

  • Can run GPU-accelerated Python models

  • Keeps the ASP.NET Core app running if the model process fails

Cons

  • Higher latency

  • Extra deployment complexity

Suitable for

  • Python-based models (TensorFlow, PyTorch)

  • NVIDIA GPU workloads

3. Microservice-Based Model APIs

The ML model runs as an independent service (Docker, Kubernetes, Azure Container Apps).

Pros

  • Horizontal scaling for inference

  • Versioned models

  • CI/CD for model updates

  • Best for large applications

Cons

  • Highest infra overhead

  • Requires service discovery and load balancing

Suitable for

  • Enterprise AI

  • Multi-team ownership

  • Multi-model hosting

4. Cloud-Based External Model APIs

Models are hosted in cloud services (Azure ML, AWS SageMaker, OpenAI, Hugging Face Inference).

Pros

  • Zero infrastructure

  • Auto-scaling

  • High availability

Cons

  • High latency for large payloads

  • Recurring cost

  • Requires network connectivity

Suitable for

  • NLP, image analysis, embeddings

  • Prototypes and production AI at scale

2. Integrating ML.NET Models

ML.NET enables training and running .NET-native models.

2.1 Loading the Model

using Microsoft.ML;

public class PredictionEngineService
{
    private readonly MLContext _mlContext = new MLContext();
    private readonly ITransformer _model;

    public PredictionEngineService()
    {
        // Load the trained model once at startup; the schema out parameter is discarded here.
        _model = _mlContext.Model.Load("models/sentiment.zip", out _);
    }

    public PredictionEngine<InputData, PredictionResult> CreateEngine()
    {
        // PredictionEngine is not thread-safe, so create one per request
        // (or use PredictionEnginePool; see the best practices below).
        return _mlContext.Model.CreatePredictionEngine<InputData, PredictionResult>(_model);
    }
}
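The InputData and PredictionResult types referenced above are not part of the snippet. For a sentiment model they might look like the following sketch; the property and column names are assumptions and must match your trained pipeline's schema:

using Microsoft.ML.Data;

public class InputData
{
    public string Text { get; set; } = string.Empty;
}

public class PredictionResult
{
    // "PredictedLabel" is the conventional ML.NET output column; adjust to your pipeline.
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    public float Probability { get; set; }
}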

2.2 Using Dependency Injection

builder.Services.AddSingleton<PredictionEngineService>();

2.3 Prediction Controller

[ApiController]
[Route("api/predict")]
public class PredictionController : ControllerBase
{
    private readonly PredictionEngineService _service;

    public PredictionController(PredictionEngineService service)
    {
        _service = service;
    }

    [HttpPost]
    public ActionResult Predict(InputData input)
    {
        var engine = _service.CreateEngine();
        var result = engine.Predict(input);
        return Ok(result);
    }
}

Best Practices

  1. Use PredictionEnginePool for concurrent predictions (see the sketch after this list).

  2. Avoid reloading the model on every request.

  3. Retrain and replace models with a hot-reload pattern (section 6.2).
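A minimal sketch of the pooled approach, assuming the Microsoft.Extensions.ML package and the same model file as above:

using Microsoft.Extensions.ML;

var builder = WebApplication.CreateBuilder(args);

// A pool of PredictionEngine instances is rented per prediction, avoiding
// both per-request model loads and PredictionEngine's thread-safety issues.
builder.Services.AddPredictionEnginePool<InputData, PredictionResult>()
    .FromFile(modelName: "sentiment",
              filePath: "models/sentiment.zip",
              watchForChanges: true); // hot-reloads when the file changes

var app = builder.Build();

app.MapPost("/api/predict",
    (PredictionEnginePool<InputData, PredictionResult> pool, InputData input) =>
        Results.Ok(pool.Predict(modelName: "sentiment", example: input)));

app.Run();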

3. Integrating ONNX Models with ONNX Runtime

ONNX Runtime runs models exported from many frameworks and supports CPU and GPU execution providers, including CUDA and TensorRT.

3.1 Installing ONNX Runtime

dotnet add package Microsoft.ML.OnnxRuntime

3.2 Loading ONNX Model

using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

public class OnnxModelService
{
    private readonly InferenceSession _session;

    public OnnxModelService()
    {
        // Create the session once; it is thread-safe and expensive to construct.
        _session = new InferenceSession("models/model.onnx");
    }

    public float[] Predict(float[] input)
    {
        // Shape [1, n]: a single example with n features.
        var tensor = new DenseTensor<float>(input, new[] { 1, input.Length });
        var inputs = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor("input", tensor)
        };

        // Dispose the results to release native memory promptly.
        using var results = _session.Run(inputs);
        return results.First().AsEnumerable<float>().ToArray();
    }
}

3.3 Exposing ONNX Predictions in ASP.NET Core

[ApiController]
[Route("api/onnx")]
public class OnnxController : ControllerBase
{
    private readonly OnnxModelService _service;

    public OnnxController(OnnxModelService service)
    {
        _service = service;
    }

    [HttpPost]
    public ActionResult Predict(InputVector model)
    {
        var result = _service.Predict(model.Values);
        return Ok(result);
    }
}

// Request DTO: a flat feature vector.
public record InputVector(float[] Values);

Best Practices

  1. Register the InferenceSession as a singleton; it is thread-safe and expensive to create.

  2. Use a GPU execution provider if available (see the sketch after this list).

  3. Batch predictions where possible to amortize per-call overhead.
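A sketch of both points, assuming the Microsoft.ML.OnnxRuntime.Gpu package and a CUDA-capable machine; GetPendingExamples is a hypothetical helper that flattens queued inputs:

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Route execution to CUDA device 0; unsupported operators fall back to CPU.
var options = SessionOptions.MakeSessionOptionWithCudaProvider(0);
var session = new InferenceSession("models/model.onnx", options);

// Batched input: shape [batchSize, featureCount] instead of [1, featureCount],
// so a single Run() call scores many examples at once.
float[] flat = GetPendingExamples(out int batchSize, out int featureCount);
var batch = new DenseTensor<float>(flat, new[] { batchSize, featureCount });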

4. Integrating Python Models with ASP.NET Core

Most enterprise AI models are still built in Python (TensorFlow, PyTorch, scikit-learn). Integrating Python-based inference into ASP.NET Core requires careful design.

There are three reliable ways.

Method 1: Python Process Execution (Per-Request Process)

Launch a Python interpreter installed alongside ASP.NET Core for each prediction.

4.1 Starting Python Process

using System.Diagnostics;

public class PythonPredictionService
{
    public string Predict(string input)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "python",
            Arguments = $"predict.py \"{input}\"",
            RedirectStandardOutput = true,
            UseShellExecute = false // required when redirecting streams
        };

        // A new interpreter starts on every call: simple, but slow under load.
        using var process = Process.Start(psi)!;
        var output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return output;
    }
}

This is simple but not scalable.

Pros

  • Easy to implement

  • Good for internal tools

Cons

  • Not scalable

  • Expensive per-process startup

Method 2: Persistent Python Worker Process

A Python script stays alive, and ASP.NET Core sends input via pipes or sockets (a minimal sketch follows the trade-offs below).

Pros

  • Lower latency

  • Can load GPU models once

Cons

  • More code

  • Maintenance overhead
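A minimal sketch over standard input/output, assuming a hypothetical long-running worker.py that loads the model once and then answers one JSON line per request:

using System.Diagnostics;

public class PersistentPythonWorker : IDisposable
{
    private readonly Process _process;

    public PersistentPythonWorker()
    {
        // Start the worker once; it loads the model and then loops over stdin.
        _process = Process.Start(new ProcessStartInfo
        {
            FileName = "python",
            Arguments = "worker.py",
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            UseShellExecute = false
        })!;
    }

    public string Predict(string jsonInput)
    {
        // One JSON line in, one JSON line out. Callers must be serialized
        // (e.g. behind a lock or channel) because the streams are shared.
        _process.StandardInput.WriteLine(jsonInput);
        return _process.StandardOutput.ReadLine() ?? string.Empty;
    }

    public void Dispose() => _process.Dispose();
}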

Method 3: Python Microservice (Recommended)

The Python model runs as a separate FastAPI/Flask service:

fastapi_app.py

from fastapi import FastAPI
import torch

app = FastAPI()

# Load the model once at startup so every request reuses it.
model = torch.load("model.pt")
model.eval()

@app.post("/predict")
def predict(data: dict):
    tensor = torch.tensor(data["input"])
    with torch.no_grad():  # inference only; skip gradient tracking
        output = model(tensor).tolist()
    return {"result": output}

ASP.NET Core calls it

public class PythonModelClient
{
    private readonly HttpClient _http;

    public PythonModelClient(HttpClient http)
    {
        _http = http;
    }

    public async Task<float[]> Predict(float[] input)
    {
        // Relative URI, resolved against the BaseAddress configured at registration.
        var response = await _http.PostAsJsonAsync("python-api/predict", new { input });
        response.EnsureSuccessStatusCode();
        var result = await response.Content.ReadFromJsonAsync<ResponseModel>();
        return result!.Result;
    }
}

// Mirrors the FastAPI response shape: { "result": [...] }.
public record ResponseModel(float[] Result);
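Registering it as a typed HttpClient wires the base address up in one place; the URL below is a placeholder for wherever the FastAPI service is reachable:

// Typed client: PythonModelClient receives this configured HttpClient via DI.
builder.Services.AddHttpClient<PythonModelClient>(client =>
{
    client.BaseAddress = new Uri("http://localhost:8000/"); // placeholder address
    client.Timeout = TimeSpan.FromSeconds(5);               // fail fast if inference hangs
});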

Why this is best

  • Scalable

  • Testable

  • Maintains clear boundaries

  • Supports GPU nodes

  • Independent lifecycle

5. Integrating Cloud-Hosted AI Models

ASP.NET Core works well with cloud-based inference such as:

  • Azure ML

  • AWS SageMaker

  • Hugging Face Inference API

  • OpenAI / Azure OpenAI GPT models

Example: Calling Azure OpenAI for Text Classification

public class AzureOpenAIService
{
    private readonly HttpClient _client;

    public AzureOpenAIService(HttpClient client)
    {
        _client = client;
    }

    public async Task<string> ClassifyText(string text)
    {
        // Simplified endpoint for illustration; real Azure OpenAI calls include
        // the full deployment path and an api-version query parameter.
        var res = await _client.PostAsJsonAsync("/openai/deployments/classifier", new
        {
            input = text
        });

        res.EnsureSuccessStatusCode();
        var json = await res.Content.ReadFromJsonAsync<Dictionary<string, object>>();
        return json!["label"].ToString()!;
    }
}

Cloud advantages

  • No infra

  • Trained models ready

  • Scales automatically

Cloud concerns

  • Cost

  • Latency

  • Data residency

6. Managing Models in Production

Enterprise ML integration must account for versioning, A/B testing, monitoring, and security.

6.1 Model Versioning

Store models with version identifiers:

models/
    sentiment/
        v1/
        v2/
    fraud/
        v1/

Expose version via API:

GET /api/predict?model=sentiment&version=v2
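A minimal sketch of resolving the requested version, assuming every version is preloaded into a dictionary at startup (the file names are hypothetical):

using Microsoft.ML;

public class ModelRegistry
{
    private readonly MLContext _mlContext = new MLContext();
    private readonly Dictionary<string, ITransformer> _models = new();

    public ModelRegistry()
    {
        // Keyed as "name:version"; paths follow the layout shown above.
        _models["sentiment:v1"] = _mlContext.Model.Load("models/sentiment/v1/model.zip", out _);
        _models["sentiment:v2"] = _mlContext.Model.Load("models/sentiment/v2/model.zip", out _);
    }

    public ITransformer Resolve(string model, string version) =>
        _models.TryGetValue($"{model}:{version}", out var m)
            ? m
            : throw new KeyNotFoundException($"Unknown model {model}/{version}");
}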

6.2 Model Hot Reload

Load new models without restarting the app.

private volatile ITransformer _model;

public void ReloadModel(string path)
{
    // Load first, then swap the reference so in-flight requests keep the old model.
    _model = _mlContext.Model.Load(path, out _);
}

6.3 A/B Testing

Route a percentage of requests to newer models.

// Send roughly 20% of traffic to the newer model.
if (Random.Shared.Next(100) < 20)
{
    return PredictWithModelV2();
}
return PredictWithModelV1();

6.4 Monitoring

Track:

  • Prediction latency

  • Error rate

  • Drift detection

  • Input distribution changes

Log prediction input/output with appropriate masking.
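A small sketch of the latency metric, wrapping a prediction call with Stopwatch and ILogger; the _service and _logger fields are assumed to be injected:

using System.Diagnostics;

public float[] PredictWithTelemetry(float[] input)
{
    var sw = Stopwatch.StartNew();
    try
    {
        return _service.Predict(input);
    }
    finally
    {
        sw.Stop();
        // Structured latency log; feed this into your metrics/drift pipeline.
        _logger.LogInformation("Prediction completed in {ElapsedMs} ms", sw.ElapsedMilliseconds);
    }
}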

7. Performance Optimizations

Integrating ML can add significant load to your API. Use these strategies:

1. Keep model sessions as singletons

Avoid reloading the model.

2. Use batching

Batch predictions to reduce overhead.

3. Use GPU

Use ONNX Runtime's GPU packages (CUDA, TensorRT) for heavy workloads.

4. Preprocessing optimizations

Move preprocessing to the client where possible.

5. Avoid synchronous calls

Use async everywhere to avoid thread pool starvation.

6. Cache results

Cache for deterministic tasks such as classification (a sketch follows).
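A sketch using the built-in IMemoryCache, keyed by the input values; it assumes services.AddMemoryCache() is registered and the model is deterministic:

using Microsoft.Extensions.Caching.Memory;

public class CachedPredictionService
{
    private readonly OnnxModelService _inner;
    private readonly IMemoryCache _cache;

    public CachedPredictionService(OnnxModelService inner, IMemoryCache cache)
    {
        _inner = inner;
        _cache = cache;
    }

    public float[] Predict(float[] input)
    {
        // Joining values is fine for small vectors; hash the bytes for large ones.
        var key = "pred:" + string.Join(',', input);

        return _cache.GetOrCreate(key, entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            return _inner.Predict(input);
        })!;
    }
}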

8. Security Considerations

1. Validate all input

AI inputs can be attack vectors.

2. Mask sensitive data in logs

Never log full payloads.

3. Use HTTPS always

Never expose ML inference endpoints over plain HTTP.

4. Control access with API Keys / JWT

Especially for external model endpoints (see the sketch after this list).

5. Prevent model extraction

Avoid exposing model weights.
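A minimal sketch of API-key enforcement with an endpoint filter; the header name, config key, and Predict handler are assumptions, and JWT bearer authentication is the fuller option:

// Reject requests without the expected key before the prediction handler runs.
app.MapPost("/api/predict", Predict)
   .AddEndpointFilter(async (context, next) =>
   {
       var http = context.HttpContext;
       var configuredKey = http.RequestServices
           .GetRequiredService<IConfiguration>()["Inference:ApiKey"];

       if (!http.Request.Headers.TryGetValue("X-Api-Key", out var provided) ||
           provided != configuredKey)
       {
           return Results.Unauthorized();
       }

       return await next(context);
   });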

9. Deployment Approaches

Single Docker Container

Good for ML.NET/ONNX CPU workloads.

Multi-Container

ASP.NET Core + Python ML model.

Kubernetes

Production-grade scaling and rollout.

Container GPU Nodes

For TensorFlow/PyTorch.

Azure Container Apps

Managed container infrastructure.

10. Full Example: ONNX Model Integration into ASP.NET Core

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

builder.Services.AddSingleton(new InferenceSession("models/sales_forecast.onnx"));

app.MapPost("/forecast", (InferenceSession session, SalesInput input) =>
{
    var tensor = new DenseTensor<float>(input.Values, new[] { 1, input.Values.Length });

    // Dispose the native results after materializing the output array.
    using var results = session.Run(new[]
    {
        NamedOnnxValue.CreateFromTensor("input", tensor)
    });

    return Results.Ok(results.First().AsEnumerable<float>().ToArray());
});

// Request DTO: a flat feature vector.
public record SalesInput(float[] Values);

11. When Not to Host ML Inside ASP.NET Core

Avoid in-process hosting when:

  • You need GPU acceleration

  • The model is larger than roughly 500 MB

  • You require high availability

  • You expect more than roughly 200 predictions per second

  • Multiple models are required

  • Models are retrained and redeployed frequently

Use microservices instead.

Conclusion

Integrating machine learning models into ASP.NET Core applications requires careful architectural decisions. Your approach must consider:

  • Model type

  • Latency requirements

  • Scalability

  • Infrastructure maturity

  • Security and monitoring

Summary of best-fit scenarios:

Approach                    Best For
ML.NET in-process           Small models, CPU, low latency
ONNX Runtime                Edge AI, cross-framework inference
Python microservice         GPU models, PyTorch/TensorFlow, heavy ML
Cloud model APIs            NLP, embeddings, large language models
Microservice architecture   Enterprise AI, multi-model environments

By choosing the right integration pattern and applying best practices in deployment, monitoring, versioning, and performance tuning, you can build enterprise-grade intelligent applications that are both efficient and maintainable.