Streaming Data From Azure Storage To Data Lake Store

In this article, I will explain how to create an HDInsight cluster that can access Azure Data Lake Store, and how to stream data from Azure Blob storage into Data Lake Store with a Stream Analytics job. The data we process is stored in Data Lake Store. The cluster can use Data Lake Store as its default storage, or we can add an additional storage account so that the cluster-related files are stored in Blob storage. With Data Lake Store in place, we can also write data to it from a Storm topology.

Prerequisites

  1. An active Azure subscription.
  2. An active Azure Data Lake Store.
  3. An active Azure storage account.

Step 1

Log in to the Azure portal.

Step 2

Before working with the HDInsight cluster, we need to create a Hadoop cluster in HDInsight; for details, refer to the article http://www.c-sharpcorner.com/article/hdinsight-cluster-based-with-linux-in-azure/. We also need to create a storage account.

Step 3

After creating the HDInsight cluster and the storage account, search for "Stream Analytics" in the search menu and click "Create Stream Analytics jobs".

 

Step 4

Now, we need to provide a few values for the Stream Analytics job: enter a unique job name, create a new resource group, select the datacenter location, and click Create.

 

Step 5

After the job is created, its page will open. From the left pane, click Inputs and then click Add.

 

Step 6

In the input blade, we need to provide the values given below.

Provide a name for the input alias.

Select Blob storage as the source.

Select Data stream as the source type.

Select the storage account that was created previously.

Click Create.
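
The input values above can be written down and checked before entering them in the portal. The following is a minimal sketch in Python; the class name BlobInput, its field names, and the example values are hypothetical and are not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import List

# Expected values taken from the steps above. This is only an illustrative
# checklist, not an Azure SDK type.
SOURCE = "Blob storage"
SOURCE_TYPE = "Data stream"


@dataclass(frozen=True)
class BlobInput:
    """Hypothetical record of the values entered in the input blade."""
    alias: str
    storage_account: str
    source: str = SOURCE
    source_type: str = SOURCE_TYPE

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means the values look complete."""
        issues = []
        if not self.alias.strip():
            issues.append("input alias is empty")
        if not self.storage_account.strip():
            issues.append("storage account is not selected")
        if self.source != SOURCE:
            issues.append("source should be " + repr(SOURCE))
        if self.source_type != SOURCE_TYPE:
            issues.append("source type should be " + repr(SOURCE_TYPE))
        return issues


# Example values are placeholders; use your own alias and storage account name.
blob_input = BlobInput(alias="bloginput", storage_account="mystorageaccount")
print(blob_input.problems())
```

An empty list means all four values from this step are filled in as described.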

 

Step 7

Switch to the Stream Analytics job page, click the Outputs tab, and click Add.

 

Step 8

Now, in the output blade, we need to provide the values given below.

Provide a name for the output alias, provide a path prefix pattern, and click Create.
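
To make the path prefix pattern more concrete, here is a minimal sketch in Python of how a pattern containing {date} and {time} tokens could expand into a folder path. The date format YYYY/MM/DD and the time format HH are assumptions chosen for illustration, and the function name expand_path_prefix and the sample pattern are hypothetical. The real expansion is performed by the Stream Analytics service, not by this code.

```python
from datetime import datetime


def expand_path_prefix(pattern: str, when: datetime) -> str:
    """Replace {date} and {time} tokens in a path prefix pattern.

    Assumes date format YYYY/MM/DD and time format HH (illustrative only).
    """
    date_part = when.strftime("%Y/%m/%d")
    time_part = when.strftime("%H")
    return pattern.replace("{date}", date_part).replace("{time}", time_part)


# Hypothetical pattern and timestamp.
print(expand_path_prefix("streamoutput/{date}/{time}", datetime(2017, 6, 5, 14, 30)))
# prints "streamoutput/2017/06/05/14"
```

Under these assumptions, each hour of output lands in its own folder, which keeps the files in Data Lake Store easy to browse by date.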

 

Step 9

I have provided only an overview of running a Stream Analytics job. We can write the query in the Query tab, replacing the input and output names in the query with the aliases we created.
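
As an illustration of replacing the input and the output in the query, the sketch below builds a simple pass-through query of the form SELECT * INTO [output] FROM [input]. The helper name build_passthrough_query and the alias values are hypothetical; substitute the aliases created in Step 6 and Step 8.

```python
def build_passthrough_query(input_alias: str, output_alias: str) -> str:
    """Build a pass-through query that copies every input event to the output."""
    for name in (input_alias, output_alias):
        # Reject empty aliases and brackets, which would break the [alias] quoting.
        if not name or "[" in name or "]" in name:
            raise ValueError("invalid alias: " + repr(name))
    return (
        "SELECT\n"
        "    *\n"
        "INTO\n"
        "    [" + output_alias + "]\n"
        "FROM\n"
        "    [" + input_alias + "]"
    )


# Hypothetical aliases; use the names you entered in the portal.
print(build_passthrough_query("bloginput", "datalakeoutput"))
```

The printed text can be pasted into the Query tab as a starting point and then extended with filters or aggregations as needed.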

 

Step 10

Click Save in the top menu and open the Overview tab. Click Start; in the dialog box, we can select Custom time and set the date and time. Click Start, and the job will begin processing data from the time we selected.