Real-Time Observability for AI Agents with .NET Aspire, Application Insights & OpenTelemetry

Introduction

Building AI agents is exciting, but running them in production is where the real challenge begins. How do you know if your agent is performing well? Are function calls succeeding? What happens when something goes wrong? How long do responses take?

In our previous articles, we built a Task Management AI Agent and secured it with Azure AI Content Safety. Now, we're adding the observability layer that every production AI system needs.

What you'll learn:

  • How to use .NET Aspire for local observability with a real-time dashboard

  • Configuring OpenTelemetry for traces, metrics, and logs

  • Tracking AI agent-specific metrics (function calls, response times)

  • Deploying the same observability configuration to Azure Application Insights

  • Best practices for monitoring AI agents in production

Why this matters: Without observability, you're flying blind. With proper monitoring, you can:

  • Detect issues before users complain

  • Track agent performance and accuracy

  • Debug complex multi-step agent interactions

  • Meet compliance and audit requirements

Source code: GitHub - TaskAgent

What is .NET Aspire?

.NET Aspire is Microsoft's opinionated cloud-native stack for building observable, production-ready distributed applications. It includes:

  • Aspire Dashboard: Real-time observability UI for local development

  • Service Defaults: Pre-configured OpenTelemetry setup

  • Azure Integration: Seamless deployment to Azure with the same telemetry

  • OTLP: Standard OpenTelemetry export protocol (works with any backend)

Key benefit: Configure observability once, use it everywhere—from local development to Azure production.

Understanding the Three Pillars of Observability

1. Traces (Distributed Tracing)

What: Shows the complete journey of a request through your system.

For AI Agents

  • User sends message → Agent receives → LLM call → Function tool execution → Response

Example trace

ChatController.SendMessage [200ms]
  ├─ TaskAgentService.SendMessageAsync [180ms]
  │   ├─ Azure OpenAI Call [120ms]
  │   ├─ TaskFunctions.CreateTask [40ms]
  │   │   └─ Database Insert [15ms]
  │   └─ Thread Serialization [10ms]

Why it matters: See exactly where time is spent and where failures occur.

2. Metrics (Time-Series Data)

What: Numerical measurements over time (counters, gauges, histograms).

For AI Agents

  • Requests per second

  • Function call success rate

  • Response latency percentiles (p50, p95, p99)

  • Active threads/conversations

  • Error rates and types

Example metrics

agent.requests = 1,234
agent.function_calls = 456
agent.errors = 3
agent.response.duration.p95 = 850ms

Why it matters: Understand system health and performance trends at a glance.

3. Logs (Structured Events)

What: Timestamped records of discrete events.

For AI Agents:

  • Agent started/stopped

  • Function tool called with parameters

  • Errors and exceptions

  • User intent extracted

  • Content safety violations

Why it matters: Provides detailed context for debugging and audit trails.

Architecture: Local Development vs Production

The project is configured with .NET Aspire to provide seamless observability from development to production. Here's how telemetry flows in different environments:

Local Development with .NET Aspire

When you run the TaskAgent.AppHost project, .NET Aspire automatically:

  1. Launches the Aspire Dashboard at https://localhost:17198

  2. Configures the OTLP exporter to send telemetry to the dashboard

  3. Displays real-time logs, traces, and metrics as you interact with the application

No Docker required: .NET Aspire Dashboard runs as a standalone process—you don't need Docker installed.

Production Deployment to Azure

In production, the same OpenTelemetry instrumentation automatically switches to Azure Monitor (Application Insights) based on configuration.

Key Insight: Same OpenTelemetry instrumentation, different exporters. The ServiceDefaults project detects which exporter to use based on environment variables:

  • OTEL_EXPORTER_OTLP_ENDPOINT → Aspire Dashboard (local)

  • APPLICATIONINSIGHTS_CONNECTION_STRING → Application Insights (production)
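
The selection described above can be pictured as a small decision over two settings. The sketch below models only that decision using the base class library. `SelectExporters` is a hypothetical helper, not the project's code, and the endpoint value is a placeholder. In a real ServiceDefaults project each branch would register the matching OpenTelemetry exporter rather than return a name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decides which exporters to enable from configuration values.
// A real ServiceDefaults project would register the exporter in each branch instead.
static List<string> SelectExporters(Func<string, string?> getSetting)
{
    var exporters = new List<string>();

    // Typically present during local runs under the Aspire AppHost.
    if (!string.IsNullOrWhiteSpace(getSetting("OTEL_EXPORTER_OTLP_ENDPOINT")))
        exporters.Add("otlp");

    // Typically present in App Service configuration in production.
    if (!string.IsNullOrWhiteSpace(getSetting("APPLICATIONINSIGHTS_CONNECTION_STRING")))
        exporters.Add("azure-monitor");

    return exporters;
}

// Placeholder values for illustration only.
var local = new Dictionary<string, string?> { ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317" };
var production = new Dictionary<string, string?> { ["APPLICATIONINSIGHTS_CONNECTION_STRING"] = "InstrumentationKey=...;IngestionEndpoint=..." };

Console.WriteLine(string.Join(",", SelectExporters(k => local.TryGetValue(k, out var v) ? v : null)));      // otlp
Console.WriteLine(string.Join(",", SelectExporters(k => production.TryGetValue(k, out var v) ? v : null))); // azure-monitor
```

Because the two checks are independent, setting both variables would enable both exporters, which can be handy when comparing the dashboard with Application Insights.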

How the Project is Configured

The project includes .NET Aspire support out of the box. Here's what's already set up:

1. ServiceDefaults Project

The TaskAgent.ServiceDefaults project contains shared OpenTelemetry configuration that works in all environments. Key configuration includes:

ServiceDefaultsExtensions.cs

public static TBuilder AddServiceDefaults<TBuilder>(this TBuilder builder)
    where TBuilder : IHostApplicationBuilder  // Constraint required to access builder.Services
{
    builder.ConfigureOpenTelemetry();  // Sets up traces, metrics, logs
    builder.AddDefaultHealthChecks();   // Registers health checks (exposed later by MapDefaultEndpoints)
    builder.Services.AddServiceDiscovery();
    // ... HTTP client configuration
    return builder;
}

The OpenTelemetry configuration automatically instruments:

  • ASP.NET Core (HTTP requests, responses)

  • HttpClient (outbound HTTP calls)

  • Entity Framework Core (database queries)

  • Custom meters: TaskAgent.Agent, TaskAgent.Functions

  • Custom activity sources: TaskAgent.Agent, TaskAgent.Functions

2. AppHost Project

The TaskAgent.AppHost project is the orchestrator for local development:

AppHost.cs

IDistributedApplicationBuilder builder = DistributedApplication.CreateBuilder(args);
builder.AddProject<Projects.TaskAgent_WebApp>("taskagent-webapp");
await builder.Build().RunAsync();

When you run this project (F5 in Visual Studio or dotnet run), it:

  1. Launches the Aspire Dashboard

  2. Starts the WebApp project

  3. Streams all telemetry to the dashboard in real-time

3. WebApp Integration

The TaskAgent.WebApp project is configured to use Service Defaults:

Program.cs

// Add Service Defaults (OpenTelemetry + Health Checks)
builder.AddServiceDefaults();

// ... other configuration

// Map health check endpoints
app.MapDefaultEndpoints();

This single line builder.AddServiceDefaults() enables all observability features.

Custom Agent Telemetry

Beyond the built-in instrumentation, the project includes custom telemetry specifically designed for AI agent operations.

Agent Metrics

The AgentMetrics class tracks AI-specific metrics using OpenTelemetry's Metrics API:

Key metrics tracked

// Counter: Total requests to the agent
_requestsCounter = _meter.CreateCounter<long>("agent.requests");

// Counter: Function calls made by the agent
_functionCallsCounter = _meter.CreateCounter<long>("agent.function_calls");

// Counter: Errors encountered
_errorsCounter = _meter.CreateCounter<long>("agent.errors");

// Histogram: Response duration (captures p50, p95, p99)
_responseDurationHistogram = _meter.CreateHistogram<double>("agent.response.duration");

How it's used in TaskAgentService:

_metrics.RecordRequest(threadId ?? "new", "processing");
// ... process message
_metrics.RecordResponseDuration(stopwatch.ElapsedMilliseconds, threadId, success);

Tags provide context

  • thread_id: Which conversation thread

  • status: success/error/processing

  • function_name: Which function tool was called
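
To make the counter, histogram and tag pieces concrete, here is a minimal sketch that uses only `System.Diagnostics.Metrics`. The meter and instrument names come from the snippets above. The helper functions are illustrative stand-ins for the project's `AgentMetrics` methods, whose real implementation is not shown in this article. A `MeterListener` plays the role of the exporter so the sketch runs without extra packages.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Meter and instrument names follow the article; everything else is illustrative.
using var meter = new Meter("TaskAgent.Agent");
var requests = meter.CreateCounter<long>("agent.requests");
var duration = meter.CreateHistogram<double>("agent.response.duration", unit: "ms");

// A MeterListener stands in for the OpenTelemetry exporter.
var seen = new List<string>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "TaskAgent.Agent") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((inst, value, tags, _) => seen.Add($"{inst.Name}={value} [{Format(tags)}]"));
listener.SetMeasurementEventCallback<double>((inst, value, tags, _) => seen.Add($"{inst.Name}={value} [{Format(tags)}]"));
listener.Start();

static KeyValuePair<string, object?> Tag(string key, object? value) => new(key, value);

static string Format(ReadOnlySpan<KeyValuePair<string, object?>> tags)
{
    var parts = new List<string>();
    foreach (var tag in tags) parts.Add($"{tag.Key}={tag.Value}");
    return string.Join(",", parts);
}

// Hypothetical equivalents of the RecordRequest / RecordResponseDuration calls shown above.
void RecordRequest(string threadId, string status) =>
    requests.Add(1, Tag("thread_id", threadId), Tag("status", status));

void RecordResponseDuration(double milliseconds, string threadId, bool success) =>
    duration.Record(milliseconds, Tag("thread_id", threadId), Tag("status", success ? "success" : "error"));

RecordRequest("thread-42", "processing");
RecordResponseDuration(850, "thread-42", success: true);

foreach (var line in seen) Console.WriteLine(line);
```

One design point worth weighing: a per-conversation tag such as `thread_id` creates a new time series for every conversation, so some teams keep it on traces and logs and restrict metric tags to low-cardinality values like `status`.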

Agent Traces

The AgentActivitySource class creates custom spans for distributed tracing:

Creating spans

// Start a span for message processing
using var activity = AgentActivitySource.StartMessageActivity(threadId, message);

// Start a span for function calls (named differently so both can live in one scope)
using var functionActivity = AgentActivitySource.StartFunctionActivity("CreateTask", tags);

Tags on activities

// The activity is null when no listener is registered, so use the null-conditional operator
activity?.SetTag("thread.id", threadId);
activity?.SetTag("function.name", functionName);
activity?.SetTag("agent.operation", "process_message");

These spans appear in the trace waterfall, showing exactly how long each operation takes and how they relate to each other.
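
The parent-child relationship in the waterfall comes from starting one activity while another is current. The following sketch shows that with the base class library only. The source name matches the article, while the span names and the `ActivityListener` (standing in for the OpenTelemetry SDK) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Source name follows the article; span names below are illustrative.
using var source = new ActivitySource("TaskAgent.Agent");

// Minimal listener standing in for the OpenTelemetry SDK. Without a listener,
// StartActivity returns null, hence the null-conditional calls below.
var finished = new List<Activity>();
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "TaskAgent.Agent",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
    ActivityStopped = activity => finished.Add(activity)
};
ActivitySource.AddActivityListener(listener);

using (var messageSpan = source.StartActivity("ProcessMessage"))
{
    messageSpan?.SetTag("thread.id", "thread-42");
    messageSpan?.SetTag("agent.operation", "process_message");

    // Started while ProcessMessage is Activity.Current, so it becomes its child.
    using (var functionSpan = source.StartActivity("CreateTask"))
    {
        functionSpan?.SetTag("function.name", "CreateTask");
    }
}

// Children stop first, so they are listed before their parent.
foreach (var a in finished)
    Console.WriteLine($"{a.OperationName} parent={a.Parent?.OperationName ?? "none"}");
```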

Running Locally with Aspire Dashboard

To see the observability features in action:

Option 1: Visual Studio

  • Set TaskAgent.AppHost as the startup project

  • Press F5

Option 2: Command Line

cd TaskAgent.AppHost
dotnet run

The Aspire Dashboard launches automatically at https://localhost:17198 (check console for the exact URL and authentication token).

Dashboard Features

The Aspire Dashboard provides real-time observability with several key views:

1. Structured Logs

  • Real-time log streaming from all services

  • Filter by severity, source, message content

  • View full context (request ID, trace ID, custom properties)

  • Search capabilities for quick troubleshooting

2. Traces (Distributed Tracing)

  • See complete request flow through your system

  • Click on a trace to see span details and timing

  • Identify slow operations and bottlenecks

  • View function call parameters and results

  • Waterfall view showing parent-child relationships

3. Metrics

  • Real-time charts for custom agent metrics

  • Track agent requests, function calls, errors

  • Response duration histograms showing p50, p95, p99

  • ASP.NET Core metrics (request rate, duration)

  • Runtime metrics (GC, memory, thread pool)

4. Resources

  • View all running services and their status

  • Health check status for each service

  • Environment variables and configuration

  • Container/process information

Pro Tip: The dashboard auto-refreshes, making it perfect for development and debugging. You can pause streaming, adjust time windows, and export data for analysis.

Deploying to Azure with Application Insights

Now that you understand how observability works locally, let's deploy to Azure for production monitoring.

Prerequisites for Azure Deployment

Before deploying, ensure you have:

  • An active Azure subscription

  • Azure CLI installed and configured

  • Visual Studio 2022 or VS Code with Azure extensions

  • The compiled application ready to deploy

⚠️ Important Note on Azure Regions: Due to App Service quota limitations with certain subscriptions, this deployment targets multiple regions. While the ideal setup would have all resources in West US 2, the App Service in this example was successfully created in Central US due to quota availability. With a Pay-As-You-Go subscription, all resources can typically be deployed in any US region without quota issues.

💡 Security Note: For simplicity, this guide stores secrets in App Service configuration. In a real production environment, Azure Key Vault is highly recommended for secure credential management. This allows you to centralize secrets, enable automatic rotation, and integrate with managed identities for passwordless authentication.

Part A: Create Azure Resources via Azure Portal

1. Create Resource Group

  1. Sign in to the Azure Portal

  2. Click Resource groups from the left menu

  3. Click + Create

  4. Configure:

    • Subscription: Select your subscription

    • Resource group: rg-taskagent-prod

    • Region: Choose a region with available quota (e.g., East US, West Europe, Central US)

  5. Click Review + create → Create

2. Create Log Analytics Workspace

Application Insights requires a Log Analytics workspace for data storage.

  1. In the Azure Portal, search for Log Analytics workspaces

  2. Click + Create

  3. Configure:

    • Subscription: Your subscription

    • Resource group: rg-taskagent-prod

    • Name: log-taskagent-prod

    • Region: Same as resource group

  4. Click Review + create → Create

3. Create Application Insights

  1. Search for Application Insights in the Azure Portal

  2. Click + Create

  3. Configure:

    • Subscription: Your subscription

    • Resource group: rg-taskagent-prod

    • Name: appi-taskagent-prod

    • Region: Same as resource group

    • Log Analytics Workspace: Select log-taskagent-prod (created in step 2)

  4. Click Review + create → Create

  5. After deployment completes, navigate to the resource

  6. Copy the Connection String:

    • Go to Overview blade

    • Click on the Connection String value to copy it

    • Save this for later (format: InstrumentationKey=...;IngestionEndpoint=...)

4. Create SQL Server

  1. Search for SQL servers in the Azure Portal

  2. Click + Create

  3. Configure:

    • Subscription: Your subscription

    • Resource group: rg-taskagent-prod

    • Server name: (must be globally unique)

    • Location: Same as resource group

    • Authentication method: Use SQL authentication

      • Server admin login: Create a strong username

      • Password: Create a strong password (min 8 characters, mix of upper/lower/numbers/symbols)

      • Confirm password: Re-enter password

  4. Click Next: Networking

  5. Configure networking:

    • Connectivity method: Public endpoint

    • Allow Azure services and resources to access this server: Yes (required for App Service)

  6. Click Review + create → Create

  7. Save credentials securely: Username and Password

5. Create SQL Database

  1. After SQL Server deployment completes, go to the SQL server resource

  2. Click + Create database in the toolbar

  3. Configure:

    • Database name: sqldb-taskagent

    • Want to use SQL elastic pool?: No

    • Workload environment: Development

    • Compute + storage: Click Configure database

      • Service tier: Basic (for minimal workload, 1 table)

        • DTUs: 5 (default)

        • Max data size: 2 GB (sufficient for task data)

        • Cost: ~$5/month

    • Click Apply

  4. Click Review + create → Create

Database Configuration Notes:

  • Basic tier is suitable for development/testing with low transaction volume

  • Standard (S0) recommended for production with moderate traffic

  • You can scale up/down the SKU anytime based on performance needs

6. Configure SQL Database Connection String

  1. Navigate to the SQL Database (sqldb-taskagent)

  2. Go to Settings → Connection strings

  3. Copy the ADO.NET connection string

  4. Replace {your_password} with your SQL admin password

  5. Save this connection string for deployment configuration

7. Create App Service Plan

  1. Search for App Service plans in the Azure Portal

  2. Click + Create

  3. Configure:

    • Subscription: Your subscription

    • Resource group: rg-taskagent-prod

    • Name: asp-taskagent-prod

    • Operating System: Linux

    • Region: Same as Resource Group

    • Pricing tier: Click Explore pricing plans

      • Choose B1 (Basic)

        • Features: 1.75 GB RAM, 1 vCore, 10 GB storage

        • Cost: ~$13/month

        • Always On: Included (prevents cold starts)

        • Custom domains/SSL: Supported

      • Click Select

  4. Click Review + create → Create

8. Create App Service (Web App)

  1. Search for App Services in the Azure Portal

  2. Click + Create → Web App

  3. Configure:

    • Subscription: Your subscription

    • Resource group: rg-taskagent-prod

    • Name: (must be globally unique)

    • Publish: Code

    • Runtime stack: .NET 9 (STS)

    • Operating System: Linux

    • Region: Same as Service Plan

    • App Service Plan: Select the plan created in step 7 (asp-taskagent-prod)

  4. Go to the Monitoring tab

  5. Configure monitoring:

    • Enable Application Insights: Yes

    • Application Insights: Select appi-taskagent-prod

    • Application Insights Region: Auto-selected

  6. Click Review + create → Create

9. Configure App Service Settings

After the App Service is created, configure the application settings:

  1. Navigate to your App Service

  2. Go to Settings → Environment variables

  3. In the Connection strings tab, add:

    • Name: DefaultConnection

    • Value: [Connection string from step 6]

    • Type: SQLAzure

    • Click Apply

Azure OpenAI Settings (add these if not already configured):

Name | Value
AzureOpenAI__Endpoint | https://[your-openai-resource].openai.azure.com/
AzureOpenAI__ApiKey | [Your Azure OpenAI API key]
AzureOpenAI__DeploymentName | gpt-4o-mini

Azure Content Safety Settings:

Name | Value
ContentSafety__Endpoint | https://[your-contentsafety-resource].cognitiveservices.azure.com/
ContentSafety__ApiKey | [Your Content Safety API key]

  4. Click Save at the top (the app will restart)

10. Configure IP Restrictions (Security)

Restrict access to your App Service by IP address for enhanced security:

  1. In your App Service, go to Settings → Networking

  2. Click Public Network Access under Inbound Traffic Configuration

  3. Select Enabled from select virtual networks and IP addresses

  4. Click + Add rule

  5. Configure:

    • Name: AllowCorporateNetwork

    • Action: Allow

    • Priority: 100

    • Type: IPv4

    • IP Address Block: [Your IP address]/32 (e.g., 203.0.113.45/32)

    • Description: Allow access from corporate network

  6. Click Add rule

  7. Click Save

IP Restriction Notes:

  • /32 = Single IP address (most restrictive)

  • /24 = Range of 256 addresses (e.g., office network)

  • You can add multiple rules with different priorities

  • Lower priority numbers are evaluated first

  • Azure services (if needed) can be allowed separately
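
To check which addresses a rule would admit before saving it, the CIDR arithmetic from the notes above can be reproduced in a few lines. This is a sketch for IPv4 only. `IsInCidr`, `ToUInt32` and `AddressCount` are hypothetical helper names, and the addresses come from the documentation range used in the example above.

```csharp
using System;
using System.Net;

// Converts a dotted IPv4 address to a 32-bit integer (bytes arrive in network order).
static uint ToUInt32(string ip)
{
    byte[] b = IPAddress.Parse(ip).GetAddressBytes();
    return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | b[3];
}

// True when `address` falls inside the IPv4 CIDR block, e.g. "203.0.113.0/24".
static bool IsInCidr(string address, string cidr)
{
    string[] parts = cidr.Split('/');
    int prefix = int.Parse(parts[1]);
    // C# masks shift counts to 5 bits, so a /0 prefix needs its own case.
    uint mask = prefix == 0 ? 0u : uint.MaxValue << (32 - prefix);
    return (ToUInt32(address) & mask) == (ToUInt32(parts[0]) & mask);
}

// Number of addresses covered by a prefix length.
static long AddressCount(int prefix) => 1L << (32 - prefix);

Console.WriteLine(IsInCidr("203.0.113.45", "203.0.113.45/32"));  // True
Console.WriteLine(IsInCidr("203.0.113.46", "203.0.113.45/32"));  // False
Console.WriteLine(IsInCidr("203.0.113.200", "203.0.113.0/24"));  // True
Console.WriteLine($"{AddressCount(32)} vs {AddressCount(24)}");  // 1 vs 256
```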

Part B: Deploy Application to Azure

Now we'll set up continuous deployment from GitHub to Azure App Service using GitHub Actions.

Create Service Principal for GitHub Actions

We'll use Azure CLI to create a Service Principal that GitHub Actions will use to authenticate and deploy to Azure.

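# PowerShell syntax (backtick line continuation); in bash, replace each ` with \
# Note: --sdk-auth is marked deprecated in recent Azure CLI versions but still emits the JSON used below
# Create service principal with contributor role scoped to resource group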
az ad sp create-for-rbac `
  --name "github-actions-app-taskagent-prod" `
  --role contributor `
  --scopes /subscriptions/$(az account show --query id -o tsv)/resourceGroups/rg-taskagent-prod `
  --sdk-auth

Output: Copy the entire JSON output—you'll need it for GitHub Secrets:

{
  "clientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "clientSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

⚠️ Security: Save this JSON securely—the clientSecret is displayed only once.

Set Up GitHub Actions Workflow from Visual Studio

Visual Studio 2022 provides a built-in feature to generate and configure GitHub Actions workflows for Azure deployments.

Steps

  1. Open your solution in Visual Studio 2022

  2. Right-click the TaskAgent.WebApp project → Publish

  3. Select Target:

    • Choose Azure → Next

    • Choose Azure App Service (Linux) → Next

  4. Select App Service

    • Sign in to your Azure account

    • Select Subscription: Your subscription

    • Select Resource Group: rg-taskagent-prod

    • Select App Service

    • Click Finish

  5. Configure CI/CD:

    • In the Publish profile screen, click on More actions (...) dropdown

    • Select Configure Continuous Deployment

    • Check "Enable continuous deployment"

    • Source control: Select GitHub

    • Authenticate with GitHub if prompted

    • Organization: Select your GitHub organization/username

    • Repository: Select your repository (e.g., TaskAgent-AgenticAI)

    • Branch: Select main

  6. Generate Workflow:

    • Visual Studio will automatically generate the GitHub Actions workflow file

Generated Workflow (.github/workflows/main_app-taskagent-prod.yml):

Visual Studio creates a workflow similar to this:

name: Build and deploy .NET application to Web App
on:
  push:
    branches:
      - main
env:
  AZURE_WEBAPP_NAME: xxxxxxxxxx
  AZURE_WEBAPP_PACKAGE_PATH: src/services/TaskAgent/src/TaskAgent.WebApp/published
  CONFIGURATION: Release
  DOTNET_CORE_VERSION: 9.0.x
  WORKING_DIRECTORY: src/services/TaskAgent/src/TaskAgent.WebApp
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup .NET SDK
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: ${{ env.DOTNET_CORE_VERSION }}
      - name: Restore
        run: dotnet restore "${{ env.WORKING_DIRECTORY }}"
      - name: Build
        run: dotnet build "${{ env.WORKING_DIRECTORY }}" --configuration ${{ env.CONFIGURATION }} --no-restore
      - name: Test
        run: dotnet test "${{ env.WORKING_DIRECTORY }}" --no-build
      - name: Publish
        run: dotnet publish "${{ env.WORKING_DIRECTORY }}" --configuration ${{ env.CONFIGURATION }} --no-build --output "${{ env.AZURE_WEBAPP_PACKAGE_PATH }}"
      - name: Publish Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: webapp
          path: ${{ env.AZURE_WEBAPP_PACKAGE_PATH }}
  deploy:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Download artifact from build job
        uses: actions/download-artifact@v4
        with:
          name: webapp
          path: ${{ env.AZURE_WEBAPP_PACKAGE_PATH }}
      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.app_taskagent_prod_SPN }}
      - name: Deploy to Azure WebApp
        uses: azure/webapps-deploy@v2
        with:
          app-name: ${{ env.AZURE_WEBAPP_NAME }}
          package: ${{ env.AZURE_WEBAPP_PACKAGE_PATH }}

GitHub Secrets Setup

  1. Navigate to GitHub Repository Settings:

    • Go to your repository on GitHub

    • Click Settings → Secrets and variables → Actions

  2. Add Azure Credentials Secret:

    • Click New repository secret

    • Name: APP_TASKAGENT_PROD_SPN

    • Value: Paste the entire JSON output from the service principal creation

    • Click Add secret

Verify Deployment and Application Insights Integration

  1. Access your deployed application:

    • URL: https://xxxxxxx.azurewebsites.net

  2. Test the AI agent:

    • Send a test message to the /api/chat endpoint from the chat UI

    • Create a task to generate telemetry

  3. View telemetry in Application Insights:

    • Go to Azure Portal → Application Insights

    • Wait 2-5 minutes for telemetry to appear (initial ingestion delay)

    • Navigate through the available monitoring features

Monitoring Your Agent in Application Insights

Application Insights provides the same observability features you saw in Aspire Dashboard, but is designed for production environments. Here's where to find key information:

Distributed Traces

  • Go to Investigate → Transaction search to find specific requests

  • Click on any transaction to see the complete trace with all spans

  • View timing breakdowns: controller → service → database → Azure OpenAI

  • Identify slow operations and bottlenecks

Custom Metrics

  • Go to Monitoring → Metrics to create charts

  • Select metrics like agent.requests, agent.response.duration, agent.function_calls

  • Add filters and dimensions (e.g., by function name, model, success status)

  • Create dashboards to visualize trends over time

Structured Logs

  • Go to Monitoring → Logs to query with KQL (Kusto Query Language)

  • Filter logs by severity level, time range, or custom properties

  • View detailed context for each log entry including thread IDs and parameters

  • Correlate logs with traces using operation IDs

Real-Time Monitoring

  • Go to Investigate → Live Metrics for real-time telemetry

  • See incoming requests per second, response times, and errors as they happen

  • Monitor server resources (CPU, memory) in real-time

  • View live traces and exceptions

Performance Analysis

  • Go to Investigate → Performance to analyze slow operations

  • See operation duration percentiles (p50, p95, p99)

  • Identify slowest dependencies and operations

  • Drill down into specific slow requests

Failure Investigation

  • Go to Investigate → Failures to analyze errors

  • View exception types and counts

  • See failed operations and dependencies

  • Access full stack traces and context

Application Map

  • Go to Investigate → Application Map for a visual architecture view

  • See relationships between your App Service, Azure OpenAI, and SQL Database

  • Identify dependency health and response times

  • Quickly spot failed dependency calls (highlighted in red)

Example KQL Queries for AI Agent Monitoring

// Average response time by endpoint
requests
| where timestamp > ago(1h)
| summarize avg(duration), percentiles(duration, 50, 95, 99) by name
| order by avg_duration desc

// Agent errors in the last 24 hours
traces
| where timestamp > ago(24h)
| where severityLevel >= 3 // Error and above (0 = Verbose, 1 = Information, 2 = Warning, 3 = Error, 4 = Critical)
| where message contains "Agent" or message contains "Function"
| project timestamp, severityLevel, message, operation_Id
| order by timestamp desc

// Trace a specific conversation thread
traces
| where customDimensions.["thread.id"] == "your-thread-id-here"
| order by timestamp asc
| project timestamp, message, severityLevel

These queries help you understand agent behavior, optimize performance, and troubleshoot issues in production. You can save frequently used queries and create custom dashboards combining multiple visualizations.

Cost Management for Azure Resources

Estimated Monthly Costs (based on SKUs selected):

Resource | SKU | Estimated Cost
App Service Plan | B1 (Basic) | ~$13/month
SQL Database | Basic (5 DTUs) | ~$5/month
Application Insights | Pay-as-you-go | ~$2-10/month (depends on data ingestion)
Log Analytics | Pay-as-you-go | First 5 GB/month free, then $2.30/GB
Total | | ~$20-30/month
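
The usage-based rows are easier to reason about with a quick calculation. The sketch below uses only the figures quoted in the table above (treat them as the article's estimates, not current list prices) and a hypothetical `LogAnalyticsCost` helper. It leaves out the Application Insights line, which the table gives only as a range.

```csharp
using System;
using System.Globalization;

// Ingestion cost with a free allowance; defaults are the figures quoted in the table above.
static decimal LogAnalyticsCost(decimal ingestedGb, decimal freeGb = 5m, decimal pricePerGb = 2.30m) =>
    Math.Max(0m, ingestedGb - freeGb) * pricePerGb;

// Fixed monthly costs from the table: App Service Plan (B1) + SQL Database (Basic).
decimal fixedCosts = 13m + 5m;

foreach (decimal gb in new[] { 3m, 8m })
{
    decimal total = fixedCosts + LogAnalyticsCost(gb);
    Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
        "{0} GB ingested -> ${1:0.00}/month before Application Insights charges", gb, total));
}
```

Staying under the free allowance keeps the total at the fixed $18, while 8 GB of ingestion adds $6.90.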

Cost Optimization Tips

  1. Stop App Service when not in use (dev/test environments)

    az webapp stop --name xxxxx --resource-group rg-taskagent-prod
    az webapp start --name xxxxx --resource-group rg-taskagent-prod
  2. Scale down SQL Database during off-peak hours (if traffic is low)

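  3. Set a sampling ratio to reduce ingestion volume. Assuming telemetry is exported through the Azure Monitor OpenTelemetry distro (Azure.Monitor.OpenTelemetry.AspNetCore), this is the SamplingRatio option:

    builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
    {
        options.SamplingRatio = 0.5F; // Keep roughly 50% of traces
    });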
  4. Configure data retention in Application Insights:

    • Default: 90 days

    • Reduce to 30 days for cost savings

    • Go to Configure → Usage and estimated costs → Data retention

  5. Monitor Azure costs regularly:

    • Azure Portal → Cost Management + Billing

    • Set up budget alerts

Security Best Practices

1. Use HTTPS Only

Ensure HTTPS is enforced for all connections:

  • Go to App Service → Settings → Configuration → General settings

  • Verify HTTPS Only: On

2. Enable Managed Identity

Use System-Assigned Managed Identity to access Azure resources without storing credentials:

  • It is off by default; enable it under App Service → Settings → Identity → System assigned

  • Use it to authenticate with Azure Key Vault, Azure OpenAI, and other services

  • Eliminates the need to manage connection strings and API keys in configuration

3. Regular Security Updates

  • Keep .NET runtime updated to the latest version

  • Monitor for security advisories: https://github.com/dotnet/announcements

  • Update NuGet packages regularly to patch vulnerabilities

  • Review Microsoft Defender for Cloud (formerly Azure Security Center) recommendations

4. Review Access Controls

  • Use Role-Based Access Control (RBAC) for Azure resources

  • Follow principle of least privilege

  • Regularly audit who has access to production resources

  • Enable Microsoft Entra ID (formerly Azure AD) authentication for database connections when possible

Best Practices for AI Agent Observability

1. Use Semantic Tags

Add meaningful tags to activities and metrics for better filtering and analysis:

activity?.SetTag("agent.operation", "process_message");
activity?.SetTag("function.name", "CreateTask");
activity?.SetTag("thread.id", threadId);
activity?.SetTag("task.priority", "High");

2. Use Structured Logging

Always use structured logging with named parameters instead of string interpolation:

// ❌ Bad
_logger.LogInformation($"Task {taskId} created");

// ✅ Good
_logger.LogInformation("Task created: {TaskId}, Priority: {Priority}", taskId, priority);

3. Set Activity Status on Errors

Mark activities as failed when exceptions occur for better trace analysis:

catch (Exception ex)
{
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    _logger.LogError(ex, "Function call failed");
}
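
One way to apply this consistently is a small wrapper that sets the status and then rethrows, so the failure is visible both in the trace and to the caller. This is a sketch under assumptions: `Traced` is a hypothetical helper, not part of the project, and the listener only stands in for the OpenTelemetry SDK so the example runs on its own.

```csharp
using System;
using System.Diagnostics;

using var source = new ActivitySource("TaskAgent.Functions");

// Listener standing in for the OpenTelemetry SDK; keeps the last finished span.
Activity? last = null;
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "TaskAgent.Functions",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
    ActivityStopped = activity => last = activity
};
ActivitySource.AddActivityListener(listener);

// Hypothetical wrapper: runs an operation inside a span and records the outcome.
T Traced<T>(string name, Func<T> operation)
{
    using var activity = source.StartActivity(name);
    try
    {
        T result = operation();
        activity?.SetStatus(ActivityStatusCode.Ok);
        return result;
    }
    catch (Exception ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        throw; // Rethrow so callers still see the failure.
    }
}

try
{
    Traced<int>("CreateTask", () => throw new InvalidOperationException("database unavailable"));
}
catch (InvalidOperationException)
{
    // Handled here only to keep the demo running.
}

Console.WriteLine($"{last?.Status}: {last?.StatusDescription}"); // Error: database unavailable
```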

4. Use Histograms for Duration

Use histograms instead of counters for duration metrics to capture percentiles:

// ❌ Bad - Counter
_durationCounter.Add(durationMs);

// ✅ Good - Histogram (shows p50, p95, p99)
_responseDurationHistogram.Record(durationMs);
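
The difference shows up as soon as one request is slow. The sketch below uses made-up durations and a nearest-rank percentile over raw samples. Monitoring backends approximate percentiles from histogram buckets rather than storing every value, so treat this as an illustration of the idea, not of how Application Insights computes them.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over sorted raw samples.
static double Percentile(double[] sorted, double p)
{
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Max(rank, 1) - 1];
}

// Illustrative response times in milliseconds: nine fast requests and one slow one.
double[] durationsMs = { 120, 130, 140, 150, 160, 170, 180, 190, 200, 2500 };
Array.Sort(durationsMs);

Console.WriteLine($"sum (all a counter keeps) = {durationsMs.Sum()}"); // 3940
Console.WriteLine($"mean = {durationsMs.Average()}");                  // 394
Console.WriteLine($"p50  = {Percentile(durationsMs, 50)}");            // 160
Console.WriteLine($"p95  = {Percentile(durationsMs, 95)}");            // 2500
```

A running total, or the mean derived from it, suggests a healthy 394 ms, while the p95 exposes the 2.5 second outlier that a user actually experienced.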

Analyzing Agent Performance

Key Metrics to Monitor

Metric | Target | Alert Threshold
Request Rate | Depends on load | -
Response Time (p95) | < 1000ms | > 2000ms
Error Rate | < 1% | > 5%
Function Call Success Rate | > 99% | < 95%
Active Threads | Monitor for leaks | -
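
These thresholds translate directly into a health check. The sketch below is illustrative: `Evaluate` is a hypothetical function, the counts would come from the `agent.requests`, `agent.errors` and `agent.function_calls` counters, and the number of failed function calls is an assumed input because the article does not show a dedicated counter for it.

```csharp
using System;

// Thresholds copied from the table above; the function itself is illustrative.
static string Evaluate(long requests, long errors, long functionCalls, long failedFunctionCalls, double p95Ms)
{
    double errorRate = requests == 0 ? 0 : 100.0 * errors / requests;
    double successRate = functionCalls == 0 ? 100 : 100.0 * (functionCalls - failedFunctionCalls) / functionCalls;

    // Any alert threshold crossed.
    if (p95Ms > 2000 || errorRate > 5 || successRate < 95) return "alert";

    // Some target missed, but no alert threshold crossed yet.
    if (p95Ms >= 1000 || errorRate >= 1 || successRate <= 99) return "warning";

    return "ok";
}

// Counts from the example metrics earlier in the article, assuming no failed function calls.
Console.WriteLine(Evaluate(requests: 1234, errors: 3, functionCalls: 456, failedFunctionCalls: 0, p95Ms: 850));  // ok
Console.WriteLine(Evaluate(requests: 1000, errors: 20, functionCalls: 400, failedFunctionCalls: 2, p95Ms: 1500)); // warning
Console.WriteLine(Evaluate(requests: 1000, errors: 60, functionCalls: 400, failedFunctionCalls: 10, p95Ms: 1500)); // alert
```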

Example Kusto Query (Application Insights)

Query agent response times by function:

customMetrics
| where name == "agent.response.duration"
| extend functionName = tostring(customDimensions.["function_name"])
| summarize
    avg(value),
    percentile(value, 50),
    percentile(value, 95),
    percentile(value, 99)
  by functionName
| order by avg_value desc

Troubleshooting

Telemetry Not Reaching Application Insights

// Verify connection string is set
var connectionString = builder.Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"];
if(string.IsNullOrEmpty(connectionString))
{
    throw new InvalidOperationException("Application Insights connection string not configured");
}

Common Azure Deployment Issues

Issue: App Service shows "Application Error" after deployment

Solutions

  1. Check Application Insights → Failures for exception details

  2. View App Service logs:

    az webapp log tail --name xxxxxxxx --resource-group rg-taskagent-prod
  3. Verify all app settings are configured correctly

  4. Ensure database migrations were applied successfully

Issue: Cannot connect to SQL Database

Solutions:

  1. Verify firewall rules allow App Service IP:

    • SQL Server → Security → Networking

    • Ensure "Allow Azure services" is enabled

  2. Test connection string locally:

    sqlcmd -S sql-taskagent-prod.database.windows.net -d xxxxxxx -U xxx -P [password]
  3. Check if database was created and migrations applied

Issue: Application Insights not showing telemetry

Solutions

  1. Wait 5-10 minutes for initial ingestion (first deployment)

  2. Verify connection string in App Service configuration

  3. Check if ApplicationInsightsAgent_EXTENSION_VERSION is set to ~3

  4. Restart App Service:

    az webapp restart --name xxxxxxxx --resource-group rg-taskagent-prod

Issue: High costs on Application Insights

Solutions

  1. Enable sampling (50-90%). Assuming telemetry is exported through the Azure Monitor OpenTelemetry distro, set SamplingRatio:

    builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
    {
        options.SamplingRatio = 0.5F;
    });
  2. Reduce log verbosity (only Warning and above in production)

  3. Set data retention to 30 days instead of 90

  4. Review Usage and estimated costs blade to identify high-volume sources

Issue: IP restrictions preventing access

Solutions:

  1. Verify your current IP: https://www.whatismyip.com/

  2. Add your IP to access restriction rules

  3. Temporarily disable restrictions for testing:

    • App Service → Networking → Access restriction

    • Remove all rules to allow all IPs (not recommended for production)

Getting Help

If you encounter issues not covered here:

  1. Check Application Insights Logs: Most errors are logged with detailed context

  2. Azure Support: Submit a support ticket through the Azure Portal

  3. GitHub Issues: Report issues or ask questions

  4. Stack Overflow: Tag questions with azure-app-service, azure-application-insights, .net-9.0

What's Next?

In the next articles of this series, we'll continue building on this foundation by adding more advanced capabilities to our Task Agent.

Coming up:

  • Advanced AI features and capabilities

  • Enhanced user experience

  • Performance optimization

  • And much more!

Stay tuned for the next article where we'll explore exciting new features for our AI agent.

Conclusion

Observability is not optional for production AI agents; it's essential. With .NET Aspire, OpenTelemetry, and Application Insights, you now have:

  • Real-time visibility into agent behavior during development

  • Production monitoring with Azure Application Insights

  • Custom metrics for AI-specific concerns (function calls, response times, errors)

  • Distributed tracing for complex multi-step agent interactions

  • Structured logging for debugging and compliance

Start with the Aspire Dashboard for local development, then deploy the same instrumentation to Azure. Your future self (and your ops team) will thank you.