Building an Enterprise-Ready Angular Application with Local Machine Learning, Web Workers, and Signals

Introduction

Modern web applications increasingly depend on AI features. With recent progress in browser-based machine learning and GPU acceleration, it has become possible to run complex ML models entirely inside the browser, without relying on backend inference servers. Angular, combined with Web Workers and Signals, provides an ideal foundation to build such applications in a clean, scalable, and enterprise-friendly way.

This article explains a practical architecture for building AI-assisted Angular applications that perform local inference using ONNX Runtime Web or WebDNN. The focus is on real-world requirements: smooth UI, offline capability, strong modularity, and support for multiple ML pipelines.

Local ML in Angular

Running ML models in the browser offers several benefits:

  • No server-side GPU needed

  • No API latency

  • User data never leaves the device

  • Works offline

  • Scales without backend infrastructure

The challenge, however, is performance. Running inference directly in Angular components blocks the main thread and causes UI stutter, which is why Web Workers are essential.

Architectural Overview

The architecture separates responsibilities clearly between the Angular UI, a central ML Orchestrator service, and background workers that handle inference.

Angular Components (Signals UI)
        │
Signal-based State Store
        │
ML Orchestrator Service
        │
Web Worker Messages
        │
ML Worker Thread (ONNX Runtime Web / WebDNN)
        │
Browser GPU / WASM Execution

Angular handles the UI and state reactivity.

The ML Orchestrator routes requests, queues tasks, and exposes results through Signals.

Workers run ML models away from the main thread to keep the UI responsive.
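Before looking at the implementation, it helps to pin down the message contract between the orchestrator and the worker. The interfaces below are a sketch of that contract; the message names match the code in the following sections, but the exact shape is an assumption of this article rather than anything ONNX Runtime Web prescribes.

// ml-messages.ts (hypothetical file): the message types exchanged over postMessage.
export type MlRequest =
  | { type: 'LOAD_MODEL'; payload: { modelUrl: string } }
  | { type: 'INFER'; payload: { input: Float32Array; shape: number[] } };

export type MlResponse =
  | { type: 'MODEL_LOADED' }
  | { type: 'RESULT'; output: Float32Array }
  | { type: 'ERROR'; message: string };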

Implementing the ML Worker

A dedicated worker handles model loading and inference. Below is an example using ONNX Runtime Web.

/// <reference lib="webworker" />
import * as ort from 'onnxruntime-web';

let session: ort.InferenceSession | null = null;

addEventListener('message', async ({ data }) => {
  const { type, payload } = data;

  switch (type) {
    case 'LOAD_MODEL':
      // Prefer WebGPU and fall back to WASM when it is unavailable.
      session = await ort.InferenceSession.create(payload.modelUrl, {
        executionProviders: ['webgpu', 'wasm'],
      });
      postMessage({ type: 'MODEL_LOADED' });
      break;

    case 'INFER': {
      if (!session) return;
      // Assumes the model exposes a single input named 'input' and a single output named 'output'.
      const input = new ort.Tensor('float32', payload.input, payload.shape);
      const results = await session.run({ input });
      // Post only the raw typed array so the main thread receives plain data rather than a cloned Tensor instance.
      postMessage({ type: 'RESULT', output: results['output'].data });
      break;
    }
  }
});

The worker:

  1. Loads an ONNX model.

  2. Performs inference using WebGPU or WASM.

  3. Returns the raw output data to the main thread; error handling is sketched below.
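The handler above covers only the happy path. Model loading and inference can both fail (a bad model URL, an unsupported operator, running out of memory), so a production worker would typically wrap the switch statement in a try/catch and report failures back to the main thread. The ERROR message below is an assumption of this article's message contract, not something ONNX Runtime Web defines.

// Sketch: a failure reporter the worker could call from a catch block around the switch statement.
function reportError(err: unknown): void {
  postMessage({
    type: 'ERROR',
    message: err instanceof Error ? err.message : String(err),
  });
}

// Usage inside the message handler:
// try { /* LOAD_MODEL / INFER handling shown above */ } catch (err) { reportError(err); }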

Creating the ML Orchestrator Service

This service acts as the central controller for all ML-related operations. It sends requests to the worker and stores ML states using Angular Signals.

import { Injectable, signal } from '@angular/core';

@Injectable({ providedIn: 'root' })
export class MlOrchestratorService {
  // The `new URL(..., import.meta.url)` pattern lets the Angular CLI bundle the worker.
  private worker = new Worker(
    new URL('../workers/ml.worker', import.meta.url),
    { type: 'module' }
  );

  modelLoaded = signal(false);
  result = signal<Float32Array | null>(null);
  loading = signal(false);

  constructor() {
    this.worker.onmessage = ({ data }) => {
      switch (data.type) {
        case 'MODEL_LOADED':
          this.modelLoaded.set(true);
          break;

        case 'RESULT':
          this.loading.set(false);
          // The worker posts the raw Float32Array produced by the model.
          this.result.set(data.output as Float32Array);
          break;
      }
    };
  }

  loadModel(modelUrl: string) {
    this.worker.postMessage({ type: 'LOAD_MODEL', payload: { modelUrl } });
  }

  infer(input: Float32Array, shape: number[]) {
    this.loading.set(true);
    this.worker.postMessage({ type: 'INFER', payload: { input, shape } });
  }
}

Because all outputs are exposed as Signals, any component reading them updates automatically, with no manual subscriptions or change-detection boilerplate.
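The architecture section describes the orchestrator as queueing tasks, which the code above does not yet do: each infer() call posts straight to the worker. A small FIFO helper is one way to serialize requests; the class below is a sketch (its name and API are assumptions, not part of the service shown above).

// Sketch: a minimal FIFO queue the orchestrator could use so only one inference runs at a time.
export class InferenceQueue<T> {
  private pending: T[] = [];
  private busy = false;

  constructor(private readonly run: (task: T) => void) {}

  enqueue(task: T): void {
    this.pending.push(task);
    this.drain();
  }

  // Call this from the 'RESULT' branch of onmessage to free the queue for the next task.
  complete(): void {
    this.busy = false;
    this.drain();
  }

  private drain(): void {
    if (this.busy) return;
    const next = this.pending.shift();
    if (next === undefined) return;
    this.busy = true;
    this.run(next);
  }
}

The orchestrator would construct the queue with a callback that posts the INFER message and call complete() whenever a RESULT arrives.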

Integrating with Angular Components

A component can simply inject the service and trigger inference.

import { Component, signal } from '@angular/core';
import { MlOrchestratorService } from '../../core/services/ml-orchestrator.service'; // adjust to your layout

@Component({
  selector: 'app-ai-widget',
  templateUrl: './ai-widget.component.html',
})
export class AiWidgetComponent {
  text = signal('');

  constructor(public ml: MlOrchestratorService) {}

  analyze() {
    const vec = this.textToVector(this.text());
    this.ml.infer(vec, [1, vec.length]);
  }

  // Naive placeholder encoding: normalized char codes, padded or truncated to a fixed length of 128.
  textToVector(text: string): Float32Array {
    const codes = text.split('').map(c => c.charCodeAt(0) / 255);
    const vec = new Float32Array(128);
    vec.set(codes.slice(0, 128));
    return vec;
  }
}

Signals automatically update the UI whenever the orchestrator produces results.
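How the result reaches the template is up to the feature. As one possible sketch (the inline template, the standalone component, and the score formatting below are illustrative assumptions, not part of the article's component), a computed signal can translate the raw output into something displayable:

import { Component, computed, inject, signal } from '@angular/core';
import { MlOrchestratorService } from '../../core/services/ml-orchestrator.service'; // adjust to your layout

@Component({
  selector: 'app-ai-widget-view',
  standalone: true,
  template: `
    <textarea [value]="text()" (input)="text.set($any($event.target).value)"></textarea>
    <button (click)="analyze()" [disabled]="ml.loading()">Analyze</button>
    <p>Score: {{ verdict() ?? 'no result yet' }}</p>
  `,
})
export class AiWidgetViewComponent {
  readonly ml = inject(MlOrchestratorService);

  text = signal('');

  // Derive a displayable value from the orchestrator's raw Float32Array result.
  verdict = computed(() => {
    const r = this.ml.result();
    return r && r.length ? r[0].toFixed(3) : null;
  });

  analyze() {
    // Placeholder input; a real feature would encode this.text() here, as in textToVector above.
    const vec = new Float32Array(128);
    this.ml.infer(vec, [1, vec.length]);
  }
}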

Recommended Project Structure

A modular structure helps scale the codebase:

src/app/
  core/
    services/
      ml-orchestrator.service.ts
      cache.service.ts
    interceptors/
    guards/

  features/
    ai-text/
      ai-text.component.ts
      ai-text.store.ts
      ai-text.worker.ts

    ai-image/
      ai-image.component.ts
      ai-image.worker.ts

  workers/
    ml.worker.ts

  models/
    ml-result.model.ts

This separation allows each feature module to have its own ML logic or even its own worker if necessary.

Using SharedArrayBuffer for Large Data Transfers

For large inputs such as images or audio buffers, copying the data into the worker on every postMessage call (via structured cloning) becomes expensive. A SharedArrayBuffer lets Angular and the worker read and write the same memory, avoiding the copy entirely; note that it is only available when the page is cross-origin isolated (served with the appropriate COOP/COEP headers).

// Allocate shared memory once; the main thread and the worker see the same bytes.
const buffer = new SharedArrayBuffer(1024 * 1024);
const arr = new Float32Array(buffer);
// ... fill `arr` with the data to analyze ...

// SharedArrayBuffer is shared, not transferred, so no transfer list is passed.
worker.postMessage({ type: 'INFER', payload: { buffer } });

This significantly improves performance for real-time or batch ML tasks.
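On the worker side, the same buffer can be wrapped in a typed array without any copy. The snippet below is a sketch of how the INFER branch shown earlier could accept shared memory; the size handling and branching are assumptions.

// Inside the worker's message handler (sketch): view the shared memory directly, no copy involved.
addEventListener('message', ({ data }) => {
  if (data.type === 'INFER' && data.payload.buffer instanceof SharedArrayBuffer) {
    const input = new Float32Array(data.payload.buffer);
    // Build the ort.Tensor from `input` and run the session exactly as in the INFER case above.
  }
});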

Multi-Model Pipelines

More complex applications may require running multiple models in sequence, such as:

  1. Embedding generation

  2. Classification

  3. Ranking or summarization

The Orchestrator service can manage these pipelines by routing tasks to different workers and combining results before exposing them to the UI.
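As a rough illustration (the separate worker instances and the promise-based request helper are assumptions layered on top of the orchestrator shown earlier), a two-stage pipeline could look like this:

// Sketch: chaining two hypothetical workers — embedding generation first, then classification.
async function runPipeline(
  embedWorker: Worker,
  classifyWorker: Worker,
  input: Float32Array,
  shape: number[],
): Promise<Float32Array> {
  const embedding = await request(embedWorker, { type: 'INFER', payload: { input, shape } });
  return request(classifyWorker, {
    type: 'INFER',
    payload: { input: embedding, shape: [1, embedding.length] },
  });
}

// Minimal one-shot request/response helper over postMessage (assumes the worker replies with { type, output }).
function request(worker: Worker, message: unknown): Promise<Float32Array> {
  return new Promise(resolve => {
    const onMessage = ({ data }: MessageEvent) => {
      if (data.type === 'RESULT') {
        worker.removeEventListener('message', onMessage);
        resolve(data.output as Float32Array);
      }
    };
    worker.addEventListener('message', onMessage);
    worker.postMessage(message);
  });
}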

Offline Capability

Because models and inference run locally, adding PWA support enables full offline functionality. Models can be cached using the Cache API:



caches.open('ml-model-cache').then(cache => {
  cache.add('/assets/models/model.onnx');
});

This is especially useful for enterprise field applications that operate in unreliable network environments.
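Building on that snippet, the worker itself could try the cache before the network when loading the model. The helper below is a sketch; the cache name and fallback behavior are assumptions, and it relies on ort.InferenceSession.create accepting a raw model buffer (a Uint8Array) as well as a URL.

// Sketch: cache-first model loading inside the worker.
async function loadModelBytes(url: string): Promise<Uint8Array> {
  const cache = await caches.open('ml-model-cache');
  let response = await cache.match(url);
  if (!response) {
    response = await fetch(url);
    // Store a copy for the next offline session before consuming the body.
    await cache.put(url, response.clone());
  }
  return new Uint8Array(await response.arrayBuffer());
}

// Usage in the LOAD_MODEL branch:
// session = await ort.InferenceSession.create(await loadModelBytes(payload.modelUrl), {
//   executionProviders: ['webgpu', 'wasm'],
// });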

Security Considerations

Running ML locally avoids sending sensitive information to a server. This reduces compliance risks in healthcare, finance, legal, and other regulated industries. No external inference API means no data exposure and no dependency on backend GPU resources.

Practical Use Cases

This architecture fits a wide range of real-world applications:

  • AI-assisted text editors and document analysis tools

  • Manufacturing dashboards with on-device image detection

  • Medical NLP applications where data privacy is critical

  • Financial document scoring and extraction

  • Offline voice or audio command systems

Conclusion

By combining Angular Signals, Web Workers, and browser-based ML runtimes like ONNX Runtime Web or WebDNN, it is now possible to build fast, responsive, and offline-capable enterprise applications that run AI models directly in the browser. This architecture provides clean separation of responsibilities, excellent performance, and a solid foundation for scalable, production-ready AI features.