Real-Time Observability for AI Agents with .NET Aspire, Application Insights & OpenTelemetry


Introduction

Building AI agents is exciting, but running them in production is where the real challenge begins. How do you know if your agent is performing well? Are function calls succeeding? What happens when something goes wrong? How long do responses take?

In our previous articles, we built a Task Management AI Agent and secured it with Azure AI Content Safety. Now, we're adding the observability layer that every production AI system needs.

What you'll learn:

How to use .NET Aspire for local observability with a real-time dashboard
Configuring OpenTelemetry for traces, metrics, and logs
Tracking AI agent-specific metrics (function calls, response times)
Deploying the same observability configuration to Azure Application Insights
Best practices for monitoring AI agents in production

Why this matters: Without observability, you're flying blind. With proper monitoring, you can:

Detect issues before users complain
Track agent performance and accuracy
Debug complex multi-step agent interactions
Meet compliance and audit requirements

Source code: GitHub - TaskAgent

What is .NET Aspire?

.NET Aspire is Microsoft's opinionated cloud-native stack for building observable, production-ready distributed applications. It includes:

Aspire Dashboard: Real-time observability UI for local development
Service Defaults: Pre-configured OpenTelemetry setup
Azure Integration: Seamless deployment to Azure with the same telemetry
OTLP (OpenTelemetry Protocol): Standard telemetry export that works with any compatible backend

Key benefit: Configure observability once and use it everywhere, from local development to Azure production.

Understanding the Three Pillars of Observability

1. Traces (Distributed Tracing)

What: Shows the complete journey of a request through your system.

For AI Agents:

User sends message → Agent receives → LLM call → Function tool execution → Response

Example trace

    ChatController.SendMessage [200ms]
    └─ TaskAgentService.SendMessageAsync [180ms]
       ├─ Azure OpenAI Call [120ms]
       ├─ TaskFunctions.CreateTask [40ms]
       │  └─ Database Insert [15ms]
       └─ Thread Serialization [10ms]

Why it matters: See exactly where time is spent and where failures occur.

2. Metrics (Time-Series Data)

What: Numerical measurements over time (counters, gauges, histograms).

For AI Agents:

Requests per second
Function call success rate
Response latency percentiles (p50, p95, p99)
Active threads/conversations
Error rates and types

Example metrics

    agent.requests = 1,234
    agent.function_calls = 456
    agent.errors = 3
    agent.response.duration.p95 = 850ms

Why it matters: Understand system health and performance trends at a glance.

3. Logs (Structured Events)

What: Timestamped records of discrete events.

For AI Agents:

Agent started/stopped
Function tool called with parameters
Errors and exceptions
User intent extracted
Content safety violations

Why it matters: Provides detailed context for debugging and audit trails.

Architecture: Local Development vs Production

The project is configured with .NET Aspire to provide seamless observability from development to production. Here's how telemetry flows in different environments:

Local Development with .NET Aspire

When you run the TaskAgent.AppHost project, .NET Aspire automatically:

Launches the Aspire Dashboard at https://localhost:17198
Configures the OTLP exporter to send telemetry to the dashboard
Displays real-time logs, traces, and metrics as you interact with the application

No Docker required: the .NET Aspire Dashboard runs as a standalone process, so you don't need Docker installed.

Production Deployment to Azure

In production, the same OpenTelemetry instrumentation sends telemetry to Azure Monitor (Application Insights) instead, based on configuration.

Key Insight: Same OpenTelemetry instrumentation, different exporters. The ServiceDefaults project selects the exporter based on environment variables:

OTEL_EXPORTER_OTLP_ENDPOINT → Aspire Dashboard (local)
APPLICATIONINSIGHTS_CONNECTION_STRING → Application Insights (production)
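The selection rule can be sketched as a small, dependency-free function. It is modeled on the default Aspire ServiceDefaults template, where a non-empty OTEL_EXPORTER_OTLP_ENDPOINT turns on UseOtlpExporter() and a non-empty APPLICATIONINSIGHTS_CONNECTION_STRING turns on UseAzureMonitor() (that second branch ships commented out in the template and has to be enabled). The SelectExporters helper and the strings it returns are hypothetical stand-ins for those calls; TaskAgent's actual code may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring the exporter checks in an Aspire ServiceDefaults project.
// The real code calls UseOtlpExporter() / UseAzureMonitor() instead of returning names.
List<string> SelectExporters(Func<string, string?> getSetting)
{
    var exporters = new List<string>();

    // Injected by the AppHost when it launches the WebApp locally.
    if (!string.IsNullOrWhiteSpace(getSetting("OTEL_EXPORTER_OTLP_ENDPOINT")))
        exporters.Add("otlp");

    // Present on the App Service once Application Insights is configured.
    if (!string.IsNullOrWhiteSpace(getSetting("APPLICATIONINSIGHTS_CONNECTION_STRING")))
        exporters.Add("azuremonitor");

    return exporters; // both can be active; an empty list means nothing is exported
}

// Usage: check the current process environment.
var active = SelectExporters(Environment.GetEnvironmentVariable);
Console.WriteLine(active.Count == 0 ? "no exporter configured" : string.Join(", ", active));
```

Because the two checks are independent, both exporters can be active at once if both variables are set.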

How the Project is Configured

The project includes .NET Aspire support out of the box. Here's what's already set up:

1. ServiceDefaults Project

The TaskAgent.ServiceDefaults project contains shared OpenTelemetry configuration that works in all environments. Key configuration includes:

ServiceDefaultsExtensions.cs

    public static TBuilder AddServiceDefaults<TBuilder>(this TBuilder builder)
        where TBuilder : IHostApplicationBuilder
    {
        builder.ConfigureOpenTelemetry();   // Sets up traces, metrics, logs
        builder.AddDefaultHealthChecks();   // Registers health checks (mapped by MapDefaultEndpoints)
        builder.Services.AddServiceDiscovery();
        // ... HTTP client configuration
        return builder;
    }

The OpenTelemetry configuration automatically instruments:

ASP.NET Core (HTTP requests, responses)
HttpClient (outbound HTTP calls)
Entity Framework Core (database queries)
Custom meters: TaskAgent.Agent, TaskAgent.Functions
Custom activity sources: TaskAgent.Agent, TaskAgent.Functions
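Custom meters and activity sources emit nothing until a listener opts into them by name; in the OpenTelemetry SDK, that opt-in is AddMeter(...) on the metrics builder and AddSource(...) on the tracing builder, so both TaskAgent names have to be registered there. The sketch below shows the same rule using only the .NET base class library: a MeterListener enables instruments from the two listed meters and ignores everything else. TaskAgent.Unregistered is a hypothetical meter name used to show a measurement being dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Meter names that are "registered", mirroring AddMeter("TaskAgent.Agent", "TaskAgent.Functions").
var enabledMeters = new HashSet<string> { "TaskAgent.Agent", "TaskAgent.Functions" };
var received = new List<string>();

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, meterListener) =>
{
    // Only instruments from enabled meters are subscribed; everything else is ignored.
    if (enabledMeters.Contains(instrument.Meter.Name))
        meterListener.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, measurement, tags, state) =>
    received.Add($"{instrument.Meter.Name}/{instrument.Name}={measurement}"));
listener.Start();

using var agentMeter = new Meter("TaskAgent.Agent");
using var otherMeter = new Meter("TaskAgent.Unregistered"); // hypothetical, never enabled

agentMeter.CreateCounter<long>("agent.requests").Add(1);
otherMeter.CreateCounter<long>("agent.requests").Add(1); // dropped: meter not enabled

Console.WriteLine(string.Join(Environment.NewLine, received)); // TaskAgent.Agent/agent.requests=1
```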

2. AppHost Project

The TaskAgent.AppHost project is the orchestrator for local development:

AppHost.cs

    IDistributedApplicationBuilder builder = DistributedApplication.CreateBuilder(args);
    builder.AddProject<Projects.TaskAgent_WebApp>("taskagent-webapp");
    await builder.Build().RunAsync();

When you run this project (F5 in Visual Studio or dotnet run), it:

Launches the Aspire Dashboard
Starts the WebApp project
Streams all telemetry to the dashboard in real-time

3. WebApp Integration

The TaskAgent.WebApp project is configured to use Service Defaults:

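Program.cs

    // Add Service Defaults (OpenTelemetry + health checks)
    builder.AddServiceDefaults();

    // ... other configuration

    var app = builder.Build();

    // Map health check endpoints
    app.MapDefaultEndpoints();

The single call to builder.AddServiceDefaults() enables all of the observability features; app.MapDefaultEndpoints() then exposes the health check endpoints (in the default Aspire template, only when running in the Development environment).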

Custom Agent Telemetry

Beyond the built-in instrumentation, the project includes custom telemetry specifically designed for AI agent operations.

Agent Metrics

The AgentMetrics class tracks AI-specific metrics using OpenTelemetry's Metrics API:

Key metrics tracked

    // Counter: Total requests to the agent
    _requestsCounter = _meter.CreateCounter<long>("agent.requests");

    // Counter: Function calls made by the agent
    _functionCallsCounter = _meter.CreateCounter<long>("agent.function_calls");

    // Counter: Errors encountered
    _errorsCounter = _meter.CreateCounter<long>("agent.errors");

    // Histogram: Response duration (captures p50, p95, p99)
    _responseDurationHistogram = _meter.CreateHistogram<double>("agent.response.duration");

How it's used in TaskAgentService:

    _metrics.RecordRequest(threadId ?? "new", "processing");
    // ... process message
    _metrics.RecordResponseDuration(stopwatch.ElapsedMilliseconds, threadId, success);

Tags provide context

thread_id: Which conversation thread
status: success/error/processing
function_name: Which function tool was called
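Putting the instruments and tags together, an AgentMetrics-style recorder could look like the sketch below. It relies only on System.Diagnostics.Metrics (built into .NET 6 and later); the helper names follow the calls shown above, but their parameter lists, the units and descriptions, and the RecordFunctionCall helper are assumptions rather than the project's actual signatures. A MeterListener stands in for the OpenTelemetry SDK so the recorded measurements can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// AgentMetrics-style recorder (sketch). Instrument names follow the article;
// parameter lists, units, and descriptions are assumptions.
using var meter = new Meter("TaskAgent.Agent");
var requests = meter.CreateCounter<long>("agent.requests", "{request}", "Messages received by the agent");
var functionCalls = meter.CreateCounter<long>("agent.function_calls", "{call}", "Function tool invocations");
var errors = meter.CreateCounter<long>("agent.errors", "{error}", "Failed operations");
var responseDuration = meter.CreateHistogram<double>("agent.response.duration", "ms", "End-to-end response time");

void RecordRequest(string threadId, string status) =>
    requests.Add(1,
        new KeyValuePair<string, object?>("thread_id", threadId),
        new KeyValuePair<string, object?>("status", status));

void RecordFunctionCall(string functionName, bool success)
{
    var tag = new KeyValuePair<string, object?>("function_name", functionName);
    functionCalls.Add(1, tag);
    if (!success) errors.Add(1, tag); // failures also count toward agent.errors
}

void RecordResponseDuration(double milliseconds, string threadId, bool success) =>
    responseDuration.Record(milliseconds,
        new KeyValuePair<string, object?>("thread_id", threadId),
        new KeyValuePair<string, object?>("status", success ? "success" : "error"));

// Stand-in for the OpenTelemetry SDK: capture every measurement from this meter.
var captured = new List<(string Name, double Value)>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter == meter) l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => captured.Add((instrument.Name, value)));
listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) => captured.Add((instrument.Name, value)));
listener.Start(); // also publishes the instruments created above

// Simulated agent turn.
var stopwatch = Stopwatch.StartNew();
RecordRequest("thread-123", "processing");
RecordFunctionCall("CreateTask", success: true);
stopwatch.Stop();
RecordResponseDuration(stopwatch.Elapsed.TotalMilliseconds, "thread-123", success: true);

foreach (var (instrumentName, measured) in captured)
    Console.WriteLine($"{instrumentName} = {measured}");
```

One design caution: every distinct thread_id value creates a separate metric time series, so in high-traffic systems it is common to keep per-conversation IDs on spans and logs and use only low-cardinality tags (status, function_name) on metrics.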

Agent Traces

The AgentActivitySource class creates custom spans for distributed tracing:

Creating spans

    // Start a span for message processing
    using var messageActivity = AgentActivitySource.StartMessageActivity(threadId, message);

    // Start a span for a function call (distinct variable name; both spans live in the same scope)
    using var functionActivity = AgentActivitySource.StartFunctionActivity("CreateTask", tags);

Tags on activities

    // StartActivity returns null when no listener is enabled for the source, so use ?.
    activity?.SetTag("thread.id", threadId);
    activity?.SetTag("function.name", functionName);
    activity?.SetTag("agent.operation", "process_message");

These spans appear in the trace waterfall, showing exactly how long each operation takes and how they relate to each other.
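A dependency-free sketch of the same pattern follows. It assumes a single ActivitySource named TaskAgent.Agent (the article also lists TaskAgent.Functions) and hypothetical StartMessageActivity/StartFunctionActivity helpers whose signatures differ from the project's; the ActivityListener plays the role of the OpenTelemetry SDK's AddSource(...). It also shows why null-conditional calls matter: StartActivity returns null when nothing is listening to the source.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical AgentActivitySource-style helpers (sketch; names and span names are assumptions).
var source = new ActivitySource("TaskAgent.Agent");

Activity? StartMessageActivity(string? threadId) =>
    source.StartActivity("ProcessMessage")              // null when nothing is listening
        ?.SetTag("thread.id", threadId ?? "new")
        .SetTag("agent.operation", "process_message");

Activity? StartFunctionActivity(string functionName) =>
    source.StartActivity($"Function {functionName}")
        ?.SetTag("function.name", functionName);

// No listener yet, so no span is created.
Console.WriteLine(StartMessageActivity("thread-0") is null); // True

// Stand-in for the OpenTelemetry SDK's AddSource("TaskAgent.Agent").
var completed = new List<Activity>();
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "TaskAgent.Agent",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = completed.Add,
};
ActivitySource.AddActivityListener(listener);

using (var message = StartMessageActivity("thread-123"))
{
    // Activity.Current is the message span here, so the function span becomes its child.
    using var function = StartFunctionActivity("CreateTask");
}

foreach (var a in completed)
    Console.WriteLine($"{a.DisplayName}: span {a.SpanId}, parent {a.ParentSpanId}");
```

The inner span stops first, and its ParentSpanId equals the message span's SpanId, which is exactly the parent-child link the trace waterfall draws.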

Running Locally with Aspire Dashboard

To see the observability features in action:

Option 1: Visual Studio

Set TaskAgent.AppHost as the startup project
Press F5

Option 2: Command Line

    cd TaskAgent.AppHost
    dotnet run

The Aspire Dashboard launches automatically at https://localhost:17198 (check console for the exact URL and authentication token).

Dashboard Features

The Aspire Dashboard provides real-time observability with several key views:

1. Structured Logs

Real-time log streaming from all services
Filter by severity, source, message content
View full context (request ID, trace ID, custom properties)
Search capabilities for quick troubleshooting

2. Traces (Distributed Tracing)

See complete request flow through your system
Click on a trace to see span details and timing
Identify slow operations and bottlenecks
View function call parameters and results
Waterfall view showing parent-child relationships

3. Metrics

Real-time charts for custom agent metrics
Track agent requests, function calls, errors
Response duration histograms showing p50, p95, p99
ASP.NET Core metrics (request rate, duration)
Runtime metrics (GC, memory, thread pool)

4. Resources

View all running services and their status
Health check status for each service
Environment variables and configuration
Container/process information

Pro Tip: The dashboard auto-refreshes, making it ideal for development and debugging. You can pause streaming, adjust time windows, and export data for analysis.

Deploying to Azure with Application Insights

Now that you understand how observability works locally, let's deploy to Azure for production monitoring.

Prerequisites for Azure Deployment

Before deploying, ensure you have:

An active Azure subscription
Azure CLI installed and configured
Visual Studio 2022 or VS Code with Azure extensions
The compiled application ready to deploy

⚠️ Important Note on Azure Regions: Due to App Service quota limitations with certain subscriptions, the resources in this deployment span more than one region. Ideally all resources would be in West US 2, but the App Service in this example was created in Central US, where quota was available. With a Pay-As-You-Go subscription, all resources can typically be deployed in any US region without quota issues.

💡 Security Note: For simplicity, this guide stores secrets in App Service configuration. In a real production environment, Azure Key Vault is highly recommended for secure credential management. It lets you centralize secrets, enable automatic rotation, and integrate with managed identities for passwordless authentication.

Part A: Create Azure Resources via Azure Portal

1. Create Resource Group

Sign in to the Azure Portal
Click Resource groups from the left menu
Click + Create
Configure:
- Subscription: Select your subscription
- Resource group: rg-taskagent-prod
- Region: Choose a region with available quota (e.g., East US, West Europe, Central US)
Click Review + create → Create

2. Create Log Analytics Workspace

Application Insights requires a Log Analytics workspace for data storage.

In the Azure Portal, search for Log Analytics workspaces
Click + Create
Configure:
- Subscription: Your subscription
- Resource group: rg-taskagent-prod
- Name: log-taskagent-prod
- Region: Same as resource group
Click Review + create → Create

3. Create Application Insights

Search for Application Insights in the Azure Portal
Click + Create
Configure:
- Subscription: Your subscription
- Resource group: rg-taskagent-prod
- Name: appi-taskagent-prod
- Region: Same as resource group
- Log Analytics Workspace: Select log-taskagent-prod (created in step 2)
Click Review + create → Create
After deployment completes, navigate to the resource
Copy the Connection String:
- Go to the Overview blade
- Click on the Connection String value to copy it
- Save this for later (format: InstrumentationKey=...;IngestionEndpoint=...)

4. Create SQL Server

Search for SQL servers in the Azure Portal
Click + Create
Configure:
- Subscription: Your subscription
- Resource group: rg-taskagent-prod
- Server name: (must be globally unique)
- Location: Same as resource group
- Authentication method: Use SQL authentication
  - Server admin login: Create a strong username
  - Password: Create a strong password (min 8 characters, mix of upper/lower/numbers/symbols)
  - Confirm password: Re-enter password
Click Next: Networking
Configure networking:
- Connectivity method: Public endpoint
- Allow Azure services and resources to access this server: Yes (required for App Service)
Click Review + create → Create
Save credentials securely: Username and Password

5. Create SQL Database

After SQL Server deployment completes, go to the SQL server resource
Click + Create database in the toolbar
Configure:
- Database name: sqldb-taskagent
- Want to use SQL elastic pool? No
- Workload environment: Development
- Compute + storage: Click Configure database
  - Service tier: Basic (for a minimal workload with a single table)
    - DTUs: 5 (default)
    - Max data size: 2 GB (sufficient for task data)
    - Cost: ~$5/month
- Click Apply
Click Review + create → Create

Database Configuration Notes:

Basic tier is suitable for development/testing with low transaction volume
Standard (S0) recommended for production with moderate traffic
You can scale up/down the SKU anytime based on performance needs

6. Configure SQL Database Connection String

Navigate to the SQL Database ( sqldb-taskagent )
Go to Settings → Connection strings
Copy the ADO.NET connection string
Replace {your_password} with your SQL admin password
Save this connection string for deployment configuration

7. Create App Service Plan

Search for App Service plans in the Azure Portal
Click + Create
Configure:
- Subscription: Your subscription
- Resource group: rg-taskagent-prod
- Name: asp-taskagent-prod
- Operating System: Linux
- Region: Same as resource group
- Pricing tier: Click Explore pricing plans
  - Choose B1 (Basic)
    - Features: 1.75 GB RAM, 1 vCore, 10 GB storage
    - Cost: ~$12.50/month
    - Always On: Available (keeps the app loaded to avoid cold starts)
    - Custom domains/SSL: Supported
  - Click Select
Click Review + create → Create

8. Create App Service (Web App)

Search for App Services in the Azure Portal
Click + Create → Web App
Configure:
- Subscription: Your subscription
- Resource group: rg-taskagent-prod
- Name: (must be globally unique)
- Publish: Code
- Runtime stack: .NET 9 (STS)
- Operating System: Linux
- Region: Same as the App Service plan
- App Service Plan: Select the plan created in step 7 (asp-taskagent-prod)

Go to the Monitoring tab
Configure monitoring:
- Enable Application Insights: Yes
- Application Insights: Select appi-taskagent-prod
- Application Insights Region: Auto-selected
Click Review + create → Create

9. Configure App Service Settings

After the App Service is created, configure the application settings:

Navigate to your App Service
Go to Settings → Environment variables
In the Connection strings tab, add:

Name: DefaultConnection
Value: [Connection string from step 6]
Type: SQLAzure
Click Apply

Azure OpenAI Settings (add these in the App settings tab if not already configured):

Name	Value
AzureOpenAI__Endpoint	https://[your-openai-resource].openai.azure.com/
AzureOpenAI__ApiKey	[Your Azure OpenAI API key]
AzureOpenAI__DeploymentName	gpt-4o-mini

Azure Content Safety Settings (also in the App settings tab):

Name	Value
ContentSafety__Endpoint	https://[your-contentsafety-resource].cognitiveservices.azure.com/
ContentSafety__ApiKey	[Your Content Safety API key]

Click Apply and confirm the change (the app will restart)

10. Configure IP Restrictions (Security)

Restrict access to your App Service by IP address for enhanced security:

In your App Service, go to Settings → Networking
Click Public Network Access under Inbound Traffic Configuration
Select Enabled from select virtual networks and IP addresses
Click + Add rule
Configure:
- Name: AllowCorporateNetwork
- Action: Allow
- Priority: 100
- Type: IPv4
- IP Address Block: [Your IP address]/32 (e.g., 203.0.113.45/32)
- Description: Allow access from corporate network
Click Add rule
Click Save

IP Restriction Notes:

/32 = Single IP address (most restrictive)
/24 = Range of 256 addresses (e.g., office network)
You can add multiple rules with different priorities
Lower priority numbers are evaluated first
Azure services (if needed) can be allowed separately
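The evaluation order described in these notes (lowest priority number first, first matching rule wins, CIDR prefix matching) can be modeled in a few lines. This is an illustrative sketch, not App Service's implementation: it assumes traffic that matches no rule is denied (an unmatched-rule action of Deny), and the DenyOneHost and AllowOfficeRange rules are hypothetical additions. The addresses come from the 203.0.113.0/24 documentation range used in the example above.

```csharp
using System;
using System.Linq;
using System.Net;

// True when `address` falls inside the CIDR block, e.g. "203.0.113.45/32".
bool IsInCidr(IPAddress address, string cidr)
{
    var parts = cidr.Split('/');
    var network = IPAddress.Parse(parts[0]).GetAddressBytes();
    var candidate = address.GetAddressBytes();
    int prefix = int.Parse(parts[1]);
    if (network.Length != candidate.Length) return false; // IPv4 vs IPv6 mismatch

    for (int i = 0; i < network.Length; i++)
    {
        int bits = Math.Clamp(prefix - i * 8, 0, 8);   // prefix bits that fall in this byte
        byte mask = (byte)(0xFF << (8 - bits));
        if ((network[i] & mask) != (candidate[i] & mask)) return false;
    }
    return true;
}

// Lowest priority number is evaluated first; the first matching rule decides.
string Evaluate(string ip, (string Name, int Priority, string Cidr, bool Allow)[] rules)
{
    var address = IPAddress.Parse(ip);
    var match = rules.OrderBy(r => r.Priority).FirstOrDefault(r => IsInCidr(address, r.Cidr));
    return match.Name is null
        ? "Deny (no rule matched)"   // assumed unmatched-rule action
        : $"{(match.Allow ? "Allow" : "Deny")} ({match.Name})";
}

var rules = new[]
{
    (Name: "AllowCorporateNetwork", Priority: 100, Cidr: "203.0.113.45/32", Allow: true),
    (Name: "DenyOneHost", Priority: 150, Cidr: "203.0.113.7/32", Allow: false),      // hypothetical
    (Name: "AllowOfficeRange", Priority: 200, Cidr: "203.0.113.0/24", Allow: true),  // hypothetical
};

Console.WriteLine(Evaluate("203.0.113.45", rules)); // prints: Allow (AllowCorporateNetwork)
Console.WriteLine(Evaluate("203.0.113.7", rules));  // prints: Deny (DenyOneHost)
Console.WriteLine(Evaluate("198.51.100.1", rules)); // prints: Deny (no rule matched)
```

Because DenyOneHost has priority 150, it is evaluated before the broader AllowOfficeRange rule at 200, so 203.0.113.7 is blocked even though it sits inside the allowed /24.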

Part B: Deploy Application to Azure

Now we'll set up continuous deployment from GitHub to Azure App Service using GitHub Actions.

Create Service Principal for GitHub Actions

We'll use Azure CLI to create a Service Principal that GitHub Actions will use to authenticate and deploy to Azure. The command below uses PowerShell syntax (backtick line continuations).

    # Create service principal with contributor role scoped to resource group
    az ad sp create-for-rbac `
      --name "github-actions-app-taskagent-prod" `
      --role contributor `
      --scopes /subscriptions/$(az account show --query id -o tsv)/resourceGroups/rg-taskagent-prod `
      --sdk-auth

Output: Copy the entire JSON output; you'll need it for GitHub Secrets:

    {
      "clientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "clientSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
      "resourceManagerEndpointUrl": "https://management.azure.com/",
      "activeDirectoryGraphResourceId": "https://graph.windows.net/",
      "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
      "galleryEndpointUrl": "https://gallery.azure.com/",
      "managementEndpointUrl": "https://management.core.windows.net/"
    }

⚠️ Security: Save this JSON securely; the clientSecret is displayed only once.

Set Up GitHub Actions Workflow from Visual Studio

Visual Studio 2022 provides a built-in feature to generate and configure GitHub Actions workflows for Azure deployments.

Steps

Open your solution in Visual Studio 2022
Right-click the TaskAgent.WebApp project → Publish
Select Target:
- Choose Azure → Next
- Choose Azure App Service (Linux) → Next
Select App Service:
- Sign in to your Azure account
- Select Subscription: Your subscription
- Select Resource Group: rg-taskagent-prod
- Select your App Service
- Click Finish
Configure CI/CD:
- In the Publish profile screen, click the More actions (...) dropdown
- Select Configure Continuous Deployment
- Check "Enable continuous deployment"
- Source control: Select GitHub
- Authenticate with GitHub if prompted
- Organization: Select your GitHub organization/username
- Repository: Select your repository (e.g., TaskAgent-AgenticAI)
- Branch: Select main
Generate Workflow:
- Visual Studio will automatically generate the GitHub Actions workflow file

Generated Workflow (.github/workflows/main_app-taskagent-prod.yml):

Visual Studio creates a workflow similar to this:

    name: Build and deploy .NET application to Web App
    on:
      push:
        branches:
          - main
    env:
      AZURE_WEBAPP_NAME: xxxxxxxxxx
      AZURE_WEBAPP_PACKAGE_PATH: src/services/TaskAgent/src/TaskAgent.WebApp/published
      CONFIGURATION: Release
      DOTNET_CORE_VERSION: 9.0.x
      WORKING_DIRECTORY: src/services/TaskAgent/src/TaskAgent.WebApp
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Setup .NET SDK
            uses: actions/setup-dotnet@v4
            with:
              dotnet-version: ${{ env.DOTNET_CORE_VERSION }}
          - name: Restore
            run: dotnet restore "${{ env.WORKING_DIRECTORY }}"
          - name: Build
            run: dotnet build "${{ env.WORKING_DIRECTORY }}" --configuration ${{ env.CONFIGURATION }} --no-restore
          - name: Test
            # Match the build configuration; with --no-build, dotnet test otherwise looks for Debug output
            run: dotnet test "${{ env.WORKING_DIRECTORY }}" --configuration ${{ env.CONFIGURATION }} --no-build
          - name: Publish
            run: dotnet publish "${{ env.WORKING_DIRECTORY }}" --configuration ${{ env.CONFIGURATION }} --no-build --output "${{ env.AZURE_WEBAPP_PACKAGE_PATH }}"
          - name: Publish Artifacts
            uses: actions/upload-artifact@v4
            with:
              name: webapp
              path: ${{ env.AZURE_WEBAPP_PACKAGE_PATH }}
      deploy:
        runs-on: ubuntu-latest
        needs: build
        steps:
          - name: Download artifact from build job
            uses: actions/download-artifact@v4
            with:
              name: webapp
              path: ${{ env.AZURE_WEBAPP_PACKAGE_PATH }}
          - name: Azure Login
            uses: azure/login@v2
            with:
              creds: ${{ secrets.APP_TASKAGENT_PROD_SPN }}
          - name: Deploy to Azure WebApp
            uses: azure/webapps-deploy@v2
            with:
              app-name: ${{ env.AZURE_WEBAPP_NAME }}
              package: ${{ env.AZURE_WEBAPP_PACKAGE_PATH }}

GitHub Secrets Setup

Navigate to GitHub Repository Settings:
- Go to your repository on GitHub
- Click Settings → Secrets and variables → Actions
Add Azure Credentials Secret:
- Click New repository secret
- Name: APP_TASKAGENT_PROD_SPN
- Value: Paste the entire JSON output from the service principal creation
- Click Add secret

Verify Deployment and Application Insights Integration

Access your deployed application:
- URL: https://xxxxxxx.azurewebsites.net
Test the AI agent:
- Send a test message from the chat UI (it calls the /api/chat endpoint)
- Create a task to generate telemetry
View telemetry in Application Insights:
- Go to Azure Portal → Application Insights
- Wait 2-5 minutes for telemetry to appear (initial ingestion delay)
- Navigate through the available monitoring features

Monitoring Your Agent in Application Insights

Application Insights provides the same observability features you saw in the Aspire Dashboard, but it is designed for production environments. Here's where to find key information:

Distributed Traces

Go to Investigate → Transaction search to find specific requests
Click on any transaction to see the complete trace with all spans
View timing breakdowns: controller → service → database → Azure OpenAI
Identify slow operations and bottlenecks

Custom Metrics

Go to Monitoring → Metrics to create charts
Select metrics like agent.requests, agent.response.duration, agent.function_calls
Add filters and dimensions (e.g., by function name, model, success status)
Create dashboards to visualize trends over time

Structured Logs

Go to Monitoring → Logs to query with KQL (Kusto Query Language)
Filter logs by severity level, time range, or custom properties
View detailed context for each log entry including thread IDs and parameters
Correlate logs with traces using operation IDs

Real-Time Monitoring

Go to Investigate → Live Metrics for real-time telemetry
See incoming requests per second, response times, and errors as they happen
Monitor server resources (CPU, memory) in real-time
View live traces and exceptions

Performance Analysis

Go to Investigate → Performance to analyze slow operations
See operation duration percentiles (p50, p95, p99)
Identify slowest dependencies and operations
Drill down into specific slow requests

Failure Investigation

Go to Investigate → Failures to analyze errors
View exception types and counts
See failed operations and dependencies
Access full stack traces and context

Application Map

Go to Investigate → Application Map for a visual architecture view
See relationships between your App Service, Azure OpenAI, and SQL Database
Identify dependency health and response times
Quickly spot failed dependency calls (highlighted in red)

Example KQL Queries for AI Agent Monitoring

    // Average response time by endpoint
    requests
    | where timestamp > ago(1h)
    | summarize avg(duration), percentiles(duration, 50, 95, 99) by name
    | order by avg_duration desc

    // Agent warnings and errors in the last 24 hours
    traces
    | where timestamp > ago(24h)
    | where severityLevel >= 2 // 2 = Warning, 3 = Error, 4 = Critical
    | where message contains "Agent" or message contains "Function"
    | project timestamp, severityLevel, message, operation_Id
    | order by timestamp desc

    // Trace a specific conversation thread. Span tags such as thread.id are stored on
    // requests and dependencies; traces only match if logs carry the same property.
    union requests, dependencies, traces
    | where tostring(customDimensions["thread.id"]) == "your-thread-id-here"
    | order by timestamp asc
    | project timestamp, itemType, name, message, duration, success, severityLevel

These queries help you understand agent behavior, optimize performance, and troubleshoot issues in production. You can save frequently used queries and create custom dashboards combining multiple visualizations.

Cost Management for Azure Resources

Estimated Monthly Costs (based on SKUs selected):

Resource	SKU	Estimated Cost
App Service Plan	B1 (Basic)	~$13/month
SQL Database	Basic (5 DTUs)	~$5/month
Application Insights	Pay-as-you-go	~$2-10/month (depends on data ingestion)
Log Analytics	Pay-as-you-go	First 5 GB/month free, then $2.30/GB
Total		~$20-30/month
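The total is straightforward arithmetic over the table. A minimal estimator follows, assuming the approximate prices quoted above and assuming that, for a workspace-based Application Insights resource, ingestion is billed through the Log Analytics workspace, so the usage-based part is modeled once as ingestion above the free allowance. Actual prices vary by region and change over time.

```csharp
using System;

// Approximate monthly prices from the table above (assumptions; they vary by region and over time).
const decimal AppServiceB1 = 13.00m;
const decimal SqlBasic = 5.00m;
const decimal FreeGbPerMonth = 5m;   // Log Analytics free allowance
const decimal PricePerGb = 2.30m;    // charged beyond the free allowance

// Usage-based part: only ingestion above the free allowance is billed.
decimal IngestionCost(decimal gbPerMonth) =>
    Math.Max(0m, gbPerMonth - FreeGbPerMonth) * PricePerGb;

decimal MonthlyTotal(decimal gbPerMonth) =>
    AppServiceB1 + SqlBasic + IngestionCost(gbPerMonth);

foreach (var gb in new[] { 3m, 8m, 10m })
    Console.WriteLine($"{gb} GB/month: ${MonthlyTotal(gb):0.00}");
```

For example, at 8 GB/month, 3 GB is billable: 3 × $2.30 = $6.90, for a total of $24.90, which falls inside the table's ~$20-30 range.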

Cost Optimization Tips

Stop App Service when not in use (dev/test environments)

    az webapp stop --name xxxxx --resource-group rg-taskagent-prod
    az webapp start --name xxxxx --resource-group rg-taskagent-prod

Scale down SQL Database during off-peak hours (if traffic is low)

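Set a sampling ratio in the Azure Monitor OpenTelemetry configuration to reduce ingestion volume:

    // Assumes the Azure Monitor OpenTelemetry distro (Azure.Monitor.OpenTelemetry.AspNetCore).
    // Pass these options to the existing UseAzureMonitor() call in ServiceDefaults.
    builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
    {
        options.SamplingRatio = 0.5F; // Keep about 50% of traces
    });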

Configure data retention in Application Insights:
- Default: 90 days
- Reduce to 30 days for cost savings
- Go to Configure → Usage and estimated costs → Data retention
Monitor Azure costs regularly:
- Azure Portal → Cost Management + Billing
- Set up budget alerts
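Ratio-based sampling keeps or drops whole traces, so the decision has to be a deterministic function of the trace ID: every span and service that sees the same trace reaches the same verdict. The sketch below illustrates that idea in the spirit of OpenTelemetry's TraceIdRatioBasedSampler; it is not the Azure Monitor exporter's algorithm, which applies SamplingRatio using its own trace-ID hashing. ShouldSample is a hypothetical helper.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

// Hypothetical helper: deterministic keep/drop decision derived from the trace ID.
// Illustrative only; Azure Monitor applies SamplingRatio with its own hashing.
bool ShouldSample(ActivityTraceId traceId, double ratio)
{
    if (ratio >= 1.0) return true;
    if (ratio <= 0.0) return false;

    Span<byte> bytes = stackalloc byte[16];
    traceId.CopyTo(bytes);

    // Interpret the first 8 bytes as an unsigned number and map it onto [0, 1].
    ulong value = BinaryPrimitives.ReadUInt64BigEndian(bytes);
    double position = value / (double)ulong.MaxValue;
    return position < ratio;
}

// Roughly half of random traces are kept at ratio 0.5...
int kept = 0;
for (int i = 0; i < 10_000; i++)
    if (ShouldSample(ActivityTraceId.CreateRandom(), 0.5)) kept++;
Console.WriteLine($"kept {kept} of 10000 traces");

// ...and the same trace always gets the same decision.
var id = ActivityTraceId.CreateRandom();
Console.WriteLine(ShouldSample(id, 0.5) == ShouldSample(id, 0.5)); // True
```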

Security Best Practices

1. Use HTTPS Only

Ensure HTTPS is enforced for all connections:

Go to App Service → Settings → Configuration → General settings
Verify HTTPS Only: On

2. Enable Managed Identity

Use a system-assigned managed identity to access Azure resources without storing credentials:

It is not enabled by default; turn it on under App Service → Settings → Identity → System assigned
Use it to authenticate with Azure Key Vault, Azure OpenAI, and other services that support Microsoft Entra ID authentication
Removes the need to keep API keys and passwords in configuration

3. Regular Security Updates

Keep .NET runtime updated to the latest version
Monitor for security advisories: https://github.com/dotnet/announcements
Update NuGet packages regularly to patch vulnerabilities
Review Microsoft Defender for Cloud (formerly Azure Security Center) recommendations

4. Review Access Controls

Use Role-Based Access Control (RBAC) for Azure resources
Follow principle of least privilege
Regularly audit who has access to production resources
Enable Microsoft Entra ID (formerly Azure AD) authentication for database connections when possible

Best Practices for AI Agent Observability

1. Use Semantic Tags

Add meaningful tags to activities and metrics for better filtering and analysis:

    activity?.SetTag("agent.operation", "process_message");
    activity?.SetTag("function.name", "CreateTask");
    activity?.SetTag("thread.id", threadId);
    activity?.SetTag("task.priority", "High");

2. Use Structured Logging

Always use structured logging with named parameters instead of string interpolation:

    // ❌ Bad
    _logger.LogInformation($"Task {taskId} created");

    // ✅ Good
    _logger.LogInformation("Task created: {TaskId}, Priority: {Priority}", taskId, priority);

3. Set Activity Status on Errors

Mark activities as failed when exceptions occur for better trace analysis:

    catch (Exception ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        _logger.LogError(ex, "Function call failed");
    }
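Setting the status is often paired with recording the exception itself as a span event, so trace viewers can show the exception type and message next to the failed span. The sketch below wraps a function tool call in a span; RunFunction is a hypothetical helper, the event follows the OpenTelemetry semantic-convention names (exception.type, exception.message), and the exception is rethrown so callers still observe the failure. On .NET 9 and later, Activity.AddException(ex) can record an equivalent event.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var source = new ActivitySource("TaskAgent.Functions");
var stopped = new List<Activity>();
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == "TaskAgent.Functions",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = stopped.Add,
});

// Hypothetical wrapper: runs a function tool inside a span and marks the span on failure.
T RunFunction<T>(string functionName, Func<T> body)
{
    using var activity = source.StartActivity($"Function {functionName}");
    activity?.SetTag("function.name", functionName);
    try
    {
        var result = body();
        activity?.SetStatus(ActivityStatusCode.Ok);
        return result;
    }
    catch (Exception ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        activity?.AddEvent(new ActivityEvent("exception", tags: new ActivityTagsCollection
        {
            ["exception.type"] = ex.GetType().FullName,
            ["exception.message"] = ex.Message,
        }));
        throw; // rethrow so the caller (and the agent) still sees the failure
    }
}

Console.WriteLine(RunFunction("ListTasks", () => 3));
try { RunFunction<int>("CreateTask", () => throw new InvalidOperationException("Title is required")); }
catch (InvalidOperationException) { /* already recorded on the span */ }

foreach (var a in stopped)
    Console.WriteLine($"{a.DisplayName}: {a.Status} {a.StatusDescription}");
```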

4. Use Histograms for Duration

Use histograms instead of counters for duration metrics to capture percentiles:

    // ❌ Bad - Counter
    _durationCounter.Add(durationMs);

    // ✅ Good - Histogram (shows p50, p95, p99)
    _responseDurationHistogram.Record(durationMs);
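A counter only accumulates a total, while a histogram keeps the shape of the distribution, which is what percentiles are read from. The sketch below computes nearest-rank percentiles over a handful of hypothetical response times to make the difference concrete. Real OpenTelemetry histograms aggregate values into buckets and backends estimate percentiles from those buckets, so reported values approximate this exact calculation.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
double Percentile(double[] samples, double p)
{
    if (samples.Length == 0) throw new ArgumentException("No samples", nameof(samples));
    var sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p * sorted.Length / 100.0); // 1-based rank
    return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
}

// Hypothetical response times (ms): mostly fast, with a slow tail.
double[] durations = { 120, 135, 150, 160, 180, 210, 240, 300, 850, 2400 };

Console.WriteLine($"sum (all a counter keeps): {durations.Sum()} ms"); // sum (all a counter keeps): 4745 ms
Console.WriteLine($"p50={Percentile(durations, 50)} p95={Percentile(durations, 95)} p99={Percentile(durations, 99)}"); // p50=180 p95=2400 p99=2400
```

With only ten samples, p95 and p99 both land on the single slow 2,400 ms request, while the 4,745 ms sum on its own says nothing about that tail.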

Analyzing Agent Performance

Key Metrics to Monitor

Metric	Target	Alert Threshold
Request Rate	Depends on load	-
Response Time (p95)	< 1000ms	> 2000ms
Error Rate	< 1%	> 5%
Function Call Success Rate	> 99%	< 95%
Active Threads	Monitor for leaks	-
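As a worked example, the sample values earlier in this article (agent.errors = 3 and agent.requests = 1,234) give an error rate of 3 / 1,234 ≈ 0.24%, well under the 1% target, and the sample p95 of 850 ms is under the 1,000 ms target. The sketch below encodes those two rows of the table; the Classify helper and its OK/WARN/ALERT states are hypothetical, and in production these checks would normally be Azure Monitor alert rules rather than application code.

```csharp
using System;

// Thresholds from the "Key Metrics to Monitor" table (error rate as a fraction, p95 in ms).
const double ErrorRateTarget = 0.01, ErrorRateAlert = 0.05;
const double P95TargetMs = 1000, P95AlertMs = 2000;

// OK below the target, ALERT above the alert threshold, WARN in between.
string Classify(double value, double target, double alert) =>
    value > alert ? "ALERT" : value < target ? "OK" : "WARN";

// Sample values from earlier in this article: agent.errors = 3, agent.requests = 1,234, p95 = 850 ms.
double errorRate = 3.0 / 1234;
Console.WriteLine($"error rate {errorRate:P2}: {Classify(errorRate, ErrorRateTarget, ErrorRateAlert)}");
Console.WriteLine($"p95 850 ms: {Classify(850, P95TargetMs, P95AlertMs)}");
```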

Example Kusto Query (Application Insights)

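Query agent response times by function (this assumes function_name is recorded as a tag on the histogram). Histograms reach Application Insights pre-aggregated per interval (count, sum, min, max), so the average is computed from the sums and counts; percentiles cannot be recovered from customMetrics.

    customMetrics
    | where name == "agent.response.duration"
    | extend functionName = tostring(customDimensions["function_name"])
    | summarize
        avgMs = sum(valueSum) / sum(valueCount),
        maxMs = max(valueMax),
        samples = sum(valueCount)
      by functionName
    | order by avgMs desc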

Troubleshooting

Telemetry Not Reaching Application Insights

    // Verify the connection string is set outside Development
    // (locally, telemetry goes to the Aspire Dashboard over OTLP instead)
    var connectionString = builder.Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"];
    if (!builder.Environment.IsDevelopment() && string.IsNullOrEmpty(connectionString))
    {
        throw new InvalidOperationException("Application Insights connection string not configured");
    }
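A non-empty value can still be malformed, for example when only the instrumentation key was pasted. A slightly stricter check, sketched below with hypothetical helper names, splits the semicolon-separated Key=Value pairs and requires both parts of the format shown on the Application Insights Overview blade, InstrumentationKey and IngestionEndpoint. It only validates the shape of the string and does not contact Azure.

```csharp
using System;
using System.Collections.Generic;

// Parses "Key1=Value1;Key2=Value2" into a case-insensitive dictionary.
Dictionary<string, string> ParseConnectionString(string connectionString)
{
    var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var part in connectionString.Split(';', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
    {
        int eq = part.IndexOf('=');
        if (eq > 0) result[part[..eq].Trim()] = part[(eq + 1)..].Trim();
    }
    return result;
}

// Returns a list of problems; an empty list means the value looks usable.
List<string> ValidateAppInsightsConnectionString(string? value)
{
    var problems = new List<string>();
    if (string.IsNullOrWhiteSpace(value)) { problems.Add("connection string is empty"); return problems; }

    var parts = ParseConnectionString(value);
    foreach (var required in new[] { "InstrumentationKey", "IngestionEndpoint" })
        if (!parts.TryGetValue(required, out var v) || v.Length == 0)
            problems.Add($"missing {required}");

    if (parts.TryGetValue("IngestionEndpoint", out var endpoint) && !Uri.TryCreate(endpoint, UriKind.Absolute, out _))
        problems.Add("IngestionEndpoint is not an absolute URL");
    return problems;
}

// Usage: validate the value the app will actually see.
var issues = ValidateAppInsightsConnectionString(
    Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING"));
Console.WriteLine(issues.Count == 0 ? "looks valid" : string.Join("; ", issues));
```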

Common Azure Deployment Issues

Issue: App Service shows "Application Error" after deployment

Solutions:

Check Application Insights → Failures for exception details

View App Service logs:

    az webapp log tail --name xxxxxxxx --resource-group rg-taskagent-prod

Verify all app settings are configured correctly
Ensure database migrations were applied successfully

Issue: Cannot connect to SQL Database

Solutions:

Verify firewall rules allow App Service IP:
- SQL Server → Security → Networking
- Ensure "Allow Azure services" is enabled

Test the connection string locally (your client IP must also be allowed in the server firewall):

    sqlcmd -S sql-taskagent-prod.database.windows.net -d xxxxxxx -U xxx -P [password]

Check if database was created and migrations applied

Issue: Application Insights not showing telemetry

Solutions:

Wait 5-10 minutes for initial ingestion (first deployment)
Verify connection string in App Service configuration
Check if ApplicationInsightsAgent_EXTENSION_VERSION is set to ~3

Restart App Service:

    az webapp restart --name xxxxxxxx --resource-group rg-taskagent-prod

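Issue: High costs on Application Insights

Solutions:

Enable sampling (for example, a SamplingRatio between 0.5 and 0.1, which drops 50-90% of traces):

    // Azure Monitor OpenTelemetry distro; set on the existing UseAzureMonitor() call
    builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
    {
        options.SamplingRatio = 0.5F; // Keep about 50% of traces
    });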

Reduce log verbosity (only Warning and above in production)
Set data retention to 30 days instead of 90
Review Usage and estimated costs blade to identify high-volume sources

Issue: IP restrictions preventing access

Solutions:

Verify your current IP: https://www.whatismyip.com/
Add your IP to access restriction rules
Temporarily open access for testing:
- App Service → Settings → Networking → Public network access
- Set it to Enabled from all networks (not recommended for production)

Getting Help

If you encounter issues not covered here:

Check Application Insights Logs: Most errors are logged with detailed context
Azure Support: Submit a support ticket through the Azure Portal
GitHub Issues: Report issues or ask questions
Stack Overflow: Tag questions with azure-app-service, azure-application-insights, .net-9

What's Next?

In the next articles of this series, we'll continue building on this foundation by adding more advanced capabilities to our Task Agent.

Coming up:

Advanced AI features and capabilities
Enhanced user experience
Performance optimization
And much more!

Stay tuned for the next article where we'll explore exciting new features for our AI agent.

Resources

Documentation

Source Code

Conclusion

Observability is not optional for production AI agents; it's essential. With .NET Aspire, OpenTelemetry, and Application Insights, you now have:

Real-time visibility into agent behavior during development
Production monitoring with Azure Application Insights
Custom metrics for AI-specific concerns (function calls, response times, errors)
Distributed tracing for complex multi-step agent interactions
Structured logging for debugging and compliance

Start with the Aspire Dashboard for local development, then deploy the same instrumentation to Azure. Your future self (and your ops team) will thank you.
