Processing Data With Azure Stream Analytics

Introduction

 
Azure Stream Analytics is a general-purpose solution for processing data in real time at IoT scale. That means lots of data from many sources being processed and analyzed in real time. That analysis happens using Stream Analytics queries, which are crafted in a SQL-like query language. We are going to be exploring Azure Stream Analytics with IoT Hub. IoT Hub and Stream Analytics work together right out of the box, no coding required. Once we link the two services together, data is pushed up to our hub and routed over to our Stream Analytics job in real time, where it can be processed immediately.
 
Road Map
  1. Stream Analytics with Azure IoT Hub
    1. Azure IoT Hub and Stream Analytics setup
    2. Linking IoT Hub and Stream Analytics
  2. Processing data with a Stream Analytics job
    1. Preparing sample data
    2. Writing a Stream Analytics query
  3. Stream Analytics output
    1. Creating a blob storage output
    2. Running a Stream Analytics job
  4. IoT Hub message routing
    1. Configuring IoT Hub message routing

Stream Analytics with Azure IoT Hub

 
Azure IoT Hub and Stream Analytics setup
  1. Let's head on over to the Azure portal to get started.
  2. Begin by creating a resource group. Click Resource groups from the menu on the left to access the Resource Groups window. You will see all the resource groups in your subscription listed in the window.
  3. Click Add (+) to create a new resource group. The Create Resource Group window appears.
  4. Provide the following information for the new resource group: a name, your subscription, and a region.
  5. Click Create.
  6. Once you're done creating the resource group, you will need to create an IoT Hub. Enter IoT Hub in the Search the Marketplace field.
  7. Select IoT Hub from the search results, and then select Create.
  8. Complete the following fields:

    1. Subscription
    2. Resource Group (Make sure it is the same one you created previously)
    3. Region
    4. IoT Hub Name

  9. Select Next: Size and scale to continue creating your hub.

    This screen allows you to set the following values:

      1. Pricing and scale tier
      2. IoT Hub units

        And a few advanced settings.
  10. Finally, select Review + create to review your choices, and then click Create to create your new hub.
  11. Once you're done creating the IoT hub, you will need to create a Stream Analytics job. Go into the resource group where you added the IoT hub and click the Add (+) button.
  12. Search for "Stream Analytics" and select the "Stream Analytics Job" option. Once you are on the service page, select Create to get started.

  13. Now fill out the required settings and then click Create to add the Stream Analytics job to your resource group.
  14. Specify how many streaming units you want to allocate to your job. For a real solution, some trial and error is required to determine the right number of units, but since this is a demo and in the interest of saving us money, I've selected one streaming unit.
Linking IoT Hub and Stream Analytics
 
We now have a new Azure Stream Analytics job. We just need to link that job to our existing IoT hub. In order to link our hub to our Stream Analytics job, we'll need to open our job settings. Let's go into Resource groups and into our target resource group, and finally, into our brand-new Stream Analytics job.
 
 
Our job has been created, but it isn't actually running yet. That's good, because we haven't told our job what to process yet. We can do that by defining inputs that link it to our hub. Under Job topology, let's go into Inputs, and from here there are two types of inputs we can add: a stream input or a reference input.
 
 
Reference data inputs are essentially lookup tables that you can join to your incoming data. You can read more about them in the official Microsoft documentation. What we want to add is a stream input, though. As you can see, we have a few options. The one we want is IoT Hub. All inputs need an alias. If you are familiar with SQL, you can think of this alias like a table name. This is what we'll refer to our data as in our Stream Analytics queries.
 
 
I have named mine datapoints. Now we need to specify which hub to use. We're going to use one from our subscription, and since I only have a single hub, the Azure portal helpfully populated everything else for me. There are other options we could adjust, but the defaults are exactly what we want in this case. So let's just save our input. Yes, it's time to wait again, but hopefully only for a few seconds this time, and once it completes you should see a success message. That means our input is now linked and we're ready to write our Stream Analytics query.
 

Processing data with a Stream Analytics job

 
Our job is in place, and it's linked to our hub. As data arrives at our hub, it is routed over to Stream Analytics via our input. Stream Analytics will then process that data using whatever query we define, and output it wherever we tell it to. But how do we write a query? What does that even look like? Azure Stream Analytics queries are written using a SQL-like language.
 
 
A query consists of the normal clauses you would expect, including a SELECT clause and a FROM clause. There's also a TIMESTAMP BY clause, which tells Stream Analytics which property from our input represents the timestamp for the event being processed. Queries can also contain aggregate functions like AVG, and they can contain a GROUP BY clause which groups by values in our input. And this is where the power of Stream Analytics starts to come in.
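
To make that concrete, here is a minimal sketch of that shape. It assumes our datapoints input and, purely for illustration, hypothetical SensorName, Value, and TimeCreated properties on the incoming events (use whatever properties your events actually carry); the output alias is just the editor's placeholder:

    SELECT
        SensorName,
        AVG(Value) AS AverageValue
    INTO
        [YourOutputAlias]
    FROM
        datapoints TIMESTAMP BY TimeCreated
    GROUP BY
        SensorName,
        TumblingWindow(second, 30)

Aggregates over streaming data are always computed over a time window; the TumblingWindow(second, 30) here chops the stream into consecutive 30-second buckets.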
 
Preparing sample data
 
First, we're going to manually provision a device. We can do that by going into our hub, which is on the dashboard, and then scrolling down to IoT devices. From here, just click Add and specify the properties for the new device. I'm going to name mine generator-01. I'm going to leave the authentication type set to symmetric key, and I'm going to allow it to auto-generate the keys for me, so all I need to do is save my new device.
 
 
Now I need to get my device's connection string, which I can get by clicking on the device and then using the copy button next to the primary connection string to copy it to my clipboard.
 
 
Now I'm going to go over to my terminal, and as you can see, I've already changed into the generator-sample project directory. All I need to do now is execute the "dotnet run" command and then paste in my device's connection string that I copied to my clipboard. Be careful here, though: be sure you put quotation marks around the connection string. Now I can press Enter to start the device.
 
 
My device is now happily sending data to the cloud. Let's leave it running for a couple of minutes so that we have some sample data to work with. After about two minutes we should have plenty of events, so I can go ahead and press Enter to stop the device, but I'm not going to close out of my console. We're actually going to run this same application again later in this module, so you might want to leave it open. That way we can quickly re-execute that same command later. For now, let's go write our query over in the Azure portal. Once again, we can get to our job by going to Resource groups, into our target resource group, and then going into our Stream Analytics job. There are actually a couple of ways we can get to the query editor.
 
There's one right here on the overview page.
 
 
This is what the query editor looks like.
 
We have our inputs and outputs on the left and a starting query over here in our editor.
 
 
We can actually test our query out as we go along by clicking the Test button here, or we'll be able to once we load in some sample data. I can load in some sample data by clicking here on my datapoints input, and then I can choose to either upload data from a file or to sample data directly from the input. The reason we ran our generator device was so that we'd have some sample data, so let's go ahead and choose Sample data from input. Now we get to choose the start date and time, and how much data to pull in.
 
Writing a Stream Analytics query
 
In this demo, we're going to use the sample data that we imported to write and test a Stream Analytics query. I'm still here in my Stream Analytics job. In the last demo, we pulled in our sample data, which will help us as we craft our query. We can see that sample data by updating our query to pull from our datapoints input like so. Now we can click Test to see the output. This will return one row per message that was processed. There are simple columns like SENSORNAME and VALUE, but there's also this IoTHub property. That property actually looks more like this.
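
As a rough sketch, the metadata record that Stream Analytics attaches to events from an IoT Hub input has a shape something like this (field names as I recall them from the Stream Analytics documentation; the values below are made up):

    {
      "MessageId": null,
      "ConnectionDeviceId": "generator-01",
      "EnqueuedTime": "2021-01-01T12:00:00.000Z"
    }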
 
 
It's an object with properties that are specific to IoT Hub, including the DeviceId, and when the message was enqueued at the hub. So what are we going to do with this data? Let's keep it simple. Let's gather the min, the max, and the average sensor values for each sensor within 15-second intervals. I'm going to go ahead and drop in a query here, and then we'll walk through it. And let's reformat this a bit so that we can actually see our full query. Well, that may be about as good as we can get. This should look familiar. We have the major pieces of a typical query here. There's a SELECT INTO clause, a FROM clause, and a GROUP BY clause. The FROM clause is a tad different from your typical SQL clause, though. It has a TIMESTAMP BY component.
 
 
The data has been grouped by ‘DeviceId’ and ‘SensorName’, and it has been grouped into 15-second intervals. Grouping the data this way makes it very easy to spot outliers like in this example here where the temperature and RPM spiked.
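
For reference, a query along those lines looks roughly like the following sketch. The field names are assumptions: SensorName and Value come from the sample data, TimeCreated is a hypothetical timestamp property, and the device ID is taken here from the IoTHub metadata record (if your payload carries its own DeviceId field, group by that instead). The output alias is still the editor's placeholder at this point:

    SELECT
        IoTHub.ConnectionDeviceId AS DeviceId,
        SensorName,
        MIN(Value) AS MinValue,
        MAX(Value) AS MaxValue,
        AVG(Value) AS AverageValue
    INTO
        [YourOutputAlias]
    FROM
        datapoints TIMESTAMP BY TimeCreated
    GROUP BY
        IoTHub.ConnectionDeviceId,
        SensorName,
        TumblingWindow(second, 15)

The TumblingWindow(second, 15) in the GROUP BY is what produces the 15-second intervals described above.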
 

Stream Analytics output

 
We now have a Stream Analytics query in place to process the data as it comes in, but where does that data go? We need to define an output and then modify our query to push the data there. Azure Stream Analytics can natively push data to several other Azure resources. If we want to push the data somewhere traditional, it can output directly to a SQL database or to Azure Cosmos DB. If we want a full-featured data processing and analytics platform, it also supports Azure Data Lake as an output. We can also target Azure Table and Blob storage, or Azure Service Bus, and if we want to push the data somewhere it can be quickly and easily visualized, Stream Analytics can push directly to Power BI. And if none of these work for us, we can output to an Azure function, which could then send the data anywhere we want. So yeah, we have a lot of choices and flexibility for where we send our output. For now, I'm going to show you how to send data to Blob storage, but the process is similar for all the other output types as well. Let's check it out.
 
Creating a blob storage output
 
We're now going to wire our existing Stream Analytics job to a new Blob storage container.
 
Let's go over to our Resource groups and into our target resource group and click the Add button. Storage account might be one of the suggestions that shows up here, but just in case it isn't for you, you can find it by searching for storage account. The one we want is "Storage account - blob, file, table, queue" by Microsoft. Go ahead and double-check that your storage account is in whatever subscription and resource group you've been using.
 
 
I'm going to go ahead and click Review and create. The validation all checks out, so I can go ahead and create my storage account. This is going to take anywhere from a few seconds to a few minutes.
 
Now we're ready to configure our output. Let's go back to our resource group, and then back into our Stream Analytics job. We can manage our job's outputs from here under Job topology. We, of course, don't have any outputs yet, so let's go ahead and click Add, and here are all the different places we could send our output to. We want to go with Blob storage. First things first, we need to give our Blob storage output a name. We could specify the Blob storage account to use manually, but instead I'm going to select one from my subscription, which the Azure portal has already helpfully filled in for me. If you have access to multiple storage accounts, be sure you select the right one. My storage account is empty, so I do need to go ahead and create a new container.
 
 
If you aren't familiar with Blob storage, you can think of a container as a top-level folder basically. I'm going to name mine datapoints. The defaults are good for everything else on here, so I'm just going to go ahead and save mine. Azure will take a minute to test our connection and make sure it works.
 
 
There, we've now defined our output, but our query isn't actually using it yet, so let's go back over to our query, and instead of selecting everything into the placeholder output alias, let's select everything into our new Blob storage output.
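
Assuming the Blob output was given an alias such as blob-output (substitute whatever name you actually gave it in the previous step), the query from earlier is unchanged except for the INTO target, roughly:

    SELECT
        IoTHub.ConnectionDeviceId AS DeviceId,
        SensorName,
        MIN(Value) AS MinValue,
        MAX(Value) AS MaxValue,
        AVG(Value) AS AverageValue
    INTO
        [blob-output]
    FROM
        datapoints TIMESTAMP BY TimeCreated
    GROUP BY
        IoTHub.ConnectionDeviceId,
        SensorName,
        TumblingWindow(second, 15)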
 
 
That looks good, but it won't actually do anything if we forget to save our changes, as I have done more than once in the past, so be sure you go ahead and click this Save button right here, right now. We should be ready to run our job, run our generator, and watch as data flows through our job into our output. Let's give it a shot.
 
Running a Stream Analytics job
 
We're ready to run our Stream Analytics job.
 
The very first thing we need to do is start our Stream Analytics job. We can do that by clicking the Start button here on the job overview.
 
 
It will take a few minutes for our job to start up. When it starts, you will see “Running” on the overview. Our job is now ready and waiting for data, so let's turn on our virtual generator. I'm going to run the command to start my generator, which is saved in my terminal history, but if you don't have that command saved, you just need to grab the connection string for your device that you created, and pass it in as an argument when you run dotnet run, like so.
 
 
Okay, there we go, the generator is now running. Let's go on over to the Azure portal. If we scroll down the summary page here and we wait a few minutes, we'll eventually see that there is data flowing in and out of our hub.
 
 
We can also monitor our resource utilization here as well. If we want to see the actual output, we can do that over at our Blob storage container. All we need to do is go to our resources, select the storage account we created, and click on Blobs. Then click on the container, which I named datapoints previously. The file that you see in the window is the output of our Stream Analytics job.
 
 
We could download the file to check out its contents, but we can actually also click on Edit blob here to take a peek. We won't make any changes, but we can see that, sure enough, here is the aggregate information that our job created, average, min, and max. All this data is here, ready for us to do something interesting with.
 

IoT Hub message routing

 
Our data is now being piped from our IoT Hub over to Stream Analytics. The aggregate information is being persisted over in Blob storage, ready for us to analyze further. But what about the raw data? Sometimes, having the raw data is necessary for debugging purposes, or sometimes it's even required for audit reasons. One option is to use IoT Hub message routing to send our incoming data to multiple places. We can still have our data flowing to Stream Analytics, but we can also define a second route that will send our raw incoming data directly to Azure Blob storage. These two routes will operate independently, so the performance of one is not affected by the other.
 
Configuring IoT Hub message routing
 
Now we're going to use IoT Hub message routing to send a copy of all of our raw data over to Azure Blob storage. We can manage the routing rules by going into our hub, then scrolling down until we see Message routing. This is where we can manage our routes and endpoints. 
 
 
Let's start by defining a custom endpoint. We don't actually have any custom endpoints yet, so let's go ahead and add one. For the sake of simplicity, we're going to use Blob storage.
 
 
First we need to name our endpoint. I'm going to call mine rawdata.
 
 
Now we need to pick a container to output our data to. First we choose the storage account that we want to use, then we pick an existing container or add a new one. We do not want to put this in the same datapoints container that our Stream Analytics job is outputting to. That would just get confusing for us later. Instead, let's create a new container. I'm going to be consistent and call this container rawdata.
 
 
That way it matches what I'm naming my custom endpoint as well. Now I just need to click on my new container, and select it. There are a few other options we can adjust here. The batch frequency controls the maximum amount of time that will pass before the data will be flushed to the blob. It can't be any more often than once a minute. The chunk window size controls the size of the blob in megabytes. Finally, we can control how the blob that gets created is named. This isn't actually as flexible as you might think or wish. You have to use all of these tokens in the path. I'm just going to leave that as it is. Let's go ahead and create this custom endpoint. It will take a few seconds for everything to refresh, and even after it completes you might not actually see your custom endpoint until you refresh the page, but eventually you should see your custom endpoint here under Blob storage.
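
For reference, the blob name format mentioned a moment ago is built from a set of required tokens. If I remember right, the tokens can be reordered but all of them have to appear, and the default spells out a path along these lines:

    {iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm}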
 
 
Now we're ready to define a route. Let's go ahead and add a new custom route. Again, just to be consistent, I'm going to name my new route rawdata. That's just a convention I'm following. The names do not have to match between your endpoint and your route.
 
 
Now I'm going to choose the custom endpoint I just made, which is this rawdata one here under Azure Storage containers.
 
 
The only other thing I need to do is define my query. I'm just going to put in a static value of true. That will cause every message to be sent through this route. I think everything else looks good, so let's go ahead and save our route.
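
As an aside, the routing query is just a boolean expression evaluated against each message, which is why a literal true matches everything. If you only wanted a subset of messages, the routing query language can also filter on message application properties or, for UTF-8 encoded JSON messages with the content type set appropriately, on the message body. The field name below is purely hypothetical, just to show the shape:

    $body.temperature > 100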
 
 
Perfect, now let's see if this works. Let's go back over to our console, and let's start up our trusty generator again. We'll need to let this run for a minute or so before the data will be batched and written to our blob.
 
 
Let's go back over to the portal, and let's go into our Blob storage account. The data is going to be here under Blobs, and if all goes well, it should be in the rawdata container here. The default output will create a folder for our hub, and another one for the partition, then a folder for the year, the month, the day, the hour of the day, and a file for the minute.
 
 
Let's see what's in this file using the Edit blob functionality.
 
 
All the raw data is there, and we can get it back out of this file if we need to. And while the raw data is being written here, the data is also being directed over to our Stream Analytics job. If we hadn't stopped that job, it would actually still be running and processing data independently of the data being archived here.
 
 

Summary

 
So far, we've learned all about Azure Stream Analytics. We covered the challenges of processing IoT data streams in real time, and we learned about Microsoft's solution to these challenges. We created our own Stream Analytics job and configured it to process data from our hub. We wrote our first Stream Analytics query to transform the raw data into something that's easier to work with, and we ran the job against data from a virtual generator device. As a bonus, we also learned a little bit about IoT Hub message routing. We configured a custom route so that our incoming IoT telemetry data could be sent to two places at once: our Stream Analytics job and Azure Blob storage. We can use IoT Hub to enable two-way communication with devices, we can auto-provision devices using the Device Provisioning Service, and now we can analyze incoming data in real time using Azure Stream Analytics. We're not quite finished, though.