Overview Of Azure Stream Analytics

Introduction

As the Internet of Things (IoT) gains wide adoption, the devices connected to the internet generate a huge volume of data.

To analyze that volume of data in real time, Microsoft provides a fully managed, platform-as-a-service (PaaS) offering called Azure Stream Analytics (ASA).

Features of Azure Stream Analytics

  • It is an engine for processing live streams of data that are ingested into it.
  • It can aggregate the data that arrives within a window of time expressed in seconds, minutes, hours or days. The maximum window size is 7 days.
  • Input data can come from Azure resources such as Azure Event Hubs, IoT Hub and Blob Storage.
  • The ingested data is processed using SQL-like queries, and the output can be stored in other Azure data services such as Cosmos DB, Storage Queue, Service Bus Queue, Table Storage and many more.

Caveats of Azure Stream Analytics

  • If you need to call other services or access data in ways that go beyond the capabilities of the ASA query language, consider Azure Functions instead.
  • Readers of an event hub should not share a consumer group. If an Azure Function trigger is already hooked to the event hub's only consumer group, do not attach a Stream Analytics job to that same group; add a separate consumer group for the job.

Scenario

Enough theory; let's get practical with an example. Imagine a manufacturing organization with plants all over a country. Trucks frequently leave each plant to deliver the manufactured parts to customers.

Whenever a truck leaves a plant, details such as the truck number, plant name, departure time and customer name are sent to an Azure Event Hub. Azure Stream Analytics then analyzes these events to count the trucks that have left each plant for each customer, and stores the resulting insights in Azure Cosmos DB.
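To make the shape of such an event concrete, here is a minimal C# sketch of one departure event serialized to JSON. The TruckDeparture record, the sample values and the use of System.Text.Json are assumptions for illustration only; the field names follow those used in the sending code later in this article. It assumes .NET 5 or later (C# 9 records and top-level statements).

```csharp
using System;
using System.Text.Json;

// Hypothetical sample values; only the field names come from the scenario.
var departure = new TruckDeparture(
    TruckId: "TRK-001",
    PlantName: "Plant A",
    CustomerName: "Customer X",
    DepartureTime: new DateTime(2024, 1, 1, 9, 5, 0, DateTimeKind.Utc));

// System.Text.Json writes DateTime values in an ISO 8601 form,
// which keeps the timestamp unambiguous for downstream consumers.
string json = JsonSerializer.Serialize(departure);
Console.WriteLine(json);

// Hypothetical type describing one truck departure event.
record TruckDeparture(string TruckId, string PlantName, string CustomerName, DateTime DepartureTime);
```

Each event carries everything the job needs: the two grouping keys (PlantName and CustomerName) and the timestamp used for windowing (DepartureTime).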

Steps Involved

  1. Log in to your Azure portal and create an Azure Event Hub namespace with an Event Hub instance in it. I have created an event hub namespace called ContosoEventsNamespace and an event hub called ContosoTruckEvents.

  2. Create an Azure Cosmos DB SQL account called ContosoDB and a new collection in it to store the truck utilization insights.

  3. Create a Stream Analytics job called ContosoTrucksJob, which will take the events posted to the event hub as its input data, analyze them and post the output to the Cosmos DB SQL database.

  4. Configure the event hub created above as the input for the job, using the input alias ContosoEventsIn that the query in step 6 refers to.

    [Image: event hub]

  5. Configure the Cosmos DB SQL API account created above as the output for the job, using the output alias ContosoTrucksOut.

    [Image: Cosmos DB SQL API]

  6. Now configure the query below to group the input events from the event hub by plant name, customer name and tumbling window, and insert the results into Cosmos DB.

     SELECT PlantName, CustomerName, COUNT(*) AS [Count]
     INTO [ContosoTrucksOut]
     FROM [ContosoEventsIn] TIMESTAMP BY DepartureTime
     GROUP BY
         PlantName,
         CustomerName,
         TumblingWindow(minutes, 10)

  7. The TIMESTAMP BY DepartureTime clause tells the job to use the DepartureTime field of each event, rather than the time the event arrived at the event hub, when assigning events to tumbling windows.

  8. A tumbling window groups the events into fixed-size, non-overlapping, contiguous time intervals and evaluates the query once per interval, so each event is counted in exactly one window.

  9. Go to the overview page of the Stream Analytics job and click the “Start” button, select Now and click OK. The job will start and be ready to capture the data from the event hub and analyze it.

  10. Below is the code to send event data to the Azure Event Hub that we created.

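     // The two-argument CreateFromConnectionString and the synchronous Send
     // match the legacy WindowsAzure.ServiceBus package; Newtonsoft.Json is also required.
     using System;
     using System.Text;
     using Microsoft.ServiceBus.Messaging;
     using Newtonsoft.Json;

     var eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, eventHubName);
     var truckEvent = new {
         PlantName = "XXXXX Fittings",
         // The round-trip ("o") format yields an ISO 8601 timestamp for TIMESTAMP BY.
         DepartureTime = DateTime.UtcNow.ToString("o"),
         CustomerName = "XXXXXXX Pvt Ltd",
         TruckId = "XXX-8906"
     };
     eventHubClient.Send(new EventData(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(truckEvent))));
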
  11. Send a few more sample events with the above details to the Event Hub. Make sure the sample data contains repeating plant names and customer names so that Stream Analytics has something to group.

  12. Send your sample events within a few minutes of each other. We specified a 10-minute tumbling window, and the windows follow the DepartureTime clock rather than the job's start time, so events that fall on either side of a window boundary are counted separately.

  13. Once the window has closed (after about 10 minutes), check your Cosmos DB collection for the JSON documents created. Each document should contain the plant name, the customer name and the count, which is the total number of trucks that left the plant for that customer within the window.

    [Image: Cosmos DB]
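
To see what the query's grouping does without running a job, the tumbling-window count can be approximated in memory. The following is only a sketch of the idea, not how Stream Analytics is implemented: the sample events and the WindowEnd helper are hypothetical. It assumes windows aligned to 10-minute clock boundaries, with an event belonging to the window whose end time is the first boundary at or after its DepartureTime. It needs .NET 5 or later for top-level statements.

```csharp
using System;
using System.Linq;

var windowSize = TimeSpan.FromMinutes(10);

// Hypothetical sample departures (UTC).
var events = new[]
{
    (Plant: "Plant A", Customer: "Customer X", Departure: new DateTime(2024, 1, 1, 9, 1, 0, DateTimeKind.Utc)),
    (Plant: "Plant A", Customer: "Customer X", Departure: new DateTime(2024, 1, 1, 9, 7, 0, DateTimeKind.Utc)),
    (Plant: "Plant A", Customer: "Customer Y", Departure: new DateTime(2024, 1, 1, 9, 8, 0, DateTimeKind.Utc)),
    (Plant: "Plant A", Customer: "Customer X", Departure: new DateTime(2024, 1, 1, 9, 12, 0, DateTimeKind.Utc)),
};

// Equivalent of GROUP BY PlantName, CustomerName, TumblingWindow(minutes, 10).
var counts = events
    .GroupBy(e => (e.Plant, e.Customer, WindowEnd: WindowEnd(e.Departure, windowSize)))
    .Select(g => (g.Key.Plant, g.Key.Customer, g.Key.WindowEnd, Count: g.Count()));

foreach (var c in counts)
    Console.WriteLine($"{c.Plant} -> {c.Customer} [window ending {c.WindowEnd:HH:mm}]: {c.Count}");

// Hypothetical helper: rounds a timestamp up to the end of its window,
// so a timestamp exactly on a boundary belongs to the window ending there.
static DateTime WindowEnd(DateTime t, TimeSpan size)
{
    long windowIndex = (t.Ticks + size.Ticks - 1) / size.Ticks;
    return new DateTime(windowIndex * size.Ticks, t.Kind);
}
```

With these sample values, the 09:01 and 09:07 departures to Customer X share the window ending at 09:10 and are counted together, while the 09:12 departure lands in the next window and is counted on its own. This is why sample events sent across a window boundary produce separate documents in Cosmos DB.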