Azure Data Factory Tutorial

In the previous article, we were introduced to Azure Data Factory, with a brief look at datasets and an understanding of pipelines. In this article, we’ll create an Azure Data Factory from the Azure Portal. Next, we’ll learn to create datasets in the Azure Data Factory Studio, build a pipeline, and learn to debug and trigger that pipeline.

Azure Data Factory

Azure Data Factory is an Azure offering for building and managing complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. It makes it possible to orchestrate and operationalize the processes essential to refining the plethora of raw big data into business insights that can serve a business across many dimensions.

Let us now create the Azure Data Factory in Azure and walk through creating datasets, building a pipeline, and debugging and triggering that pipeline.

Step 1

Log in to the Azure Portal. The welcome page will look similar to the one below.

Here, click on Create a Resource.

Step 2

Now, under Categories, select Integration.

The popular Azure services in this category are now listed. Here, select Data Factory.

Step 3

Next, choose your resource group or create one if you don’t have one.

Step 4

Next, name your instance and select the region and the version. V2 is recommended as of May 2022.
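For reference, the portal steps above can also be done programmatically. Below is a minimal sketch using the Azure SDK for Python (the azure-identity and azure-mgmt-datafactory packages); the subscription ID, resource group, factory name, and region are placeholders, not values from this walkthrough.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

# Placeholder values: replace with your own subscription and names.
subscription_id = "<subscription-id>"
rg_name = "adf-demo-rg"
df_name = "adf-demo-factory"

# DefaultAzureCredential picks up an `az login` session, environment
# variables, or a managed identity, whichever is available.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Creates (or updates) a V2 data factory in the chosen region.
df = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="eastus"))
print(df.name, df.provisioning_state)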

Step 5

Now, go to the Git Configuration.

Here, you have the option to set up a GitHub account by providing the GitHub account link, repository name, and branch with the root folder details, or to use Azure DevOps instead.

For now, select Configure Git later. When you need to integrate GitHub in the future, make sure this configuration is set up.
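If you do want Git wired up at creation time, the SDK can pass the repository details along with the factory. This sketch reuses adf_client, rg_name, and df_name from the previous sketch, and all GitHub details are hypothetical placeholders.

from azure.mgmt.datafactory.models import Factory, FactoryGitHubConfiguration

# Hypothetical GitHub details: substitute your own account, repository,
# collaboration branch, and root folder.
repo_config = FactoryGitHubConfiguration(
    account_name="my-github-account",
    repository_name="adf-demo-repo",
    collaboration_branch="main",
    root_folder="/",
)

# Passing repo_configuration associates the factory with Git at creation time.
adf_client.factories.create_or_update(
    rg_name, df_name, Factory(location="eastus", repo_configuration=repo_config)
)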

Step 6

Next, click on Review + Create.

Step 7

Once the validation is done, the Create option appears. Select it.

Step 8

The deployment initialization process now begins.

Once the deployment is completed, we have the Go to Resource option. Click on it.

Step 9

Here, we can see the Data Factory that was created, along with different monitoring visualizations.

For now, click on Open Azure Data Factory Studio to explore the Azure Data Factory.

Step 10

We are now welcomed to Azure Data Factory Studio.

Step 11

Now, select the second option in the left menu to open Factory Resources.

Here, under Pipelines, select Dataset.

Step 12

Here, we have a multitude of dataset options.

Let us choose the Azure Blob Storage.

And under the Format, select Binary.

Once selected, click on Continue.

Step 13

Now, we have to set properties. Click on Linked Service.

Here, we are going to connect it to a storage account. You’ll need to have one created already, following the Azure Blob Storage article.

Now, under Account selection method, select From Azure Subscription.

Select the Azure subscription and the Storage account.

Once these are selected, we have the option to test the connection. Click the Test Connection button.

We can see the Connection Successful message. We are now connected to the storage account.

Finally, click on Create.

A notification will pop up.
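The same linked service can also be created through the SDK. This is a sketch under the assumption that you connect with a connection string from the storage account’s Access keys page; it reuses adf_client, rg_name, and df_name from the earlier sketches, and the linked service name AzureBlobStorage1 is a placeholder.

from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    LinkedServiceResource,
)

# Placeholder connection string: copy yours from the storage account's
# Access keys blade (account name and key are elided here).
conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"

ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(connection_string=conn_str)
)
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureBlobStorage1", ls)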

Step 14

Next, we are given the file path to set up for usage.

Step 15

Now, we can see that the Binary dataset has been set up and its linked service connected to Azure Blob Storage in the Azure Data Factory.
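Programmatically, the equivalent Binary dataset definition references that linked service and points at a blob location. In this sketch the container and folder names are placeholders; the dataset name Binary1 matches the one created in the Studio above.

from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    BinaryDataset,
    DatasetResource,
    LinkedServiceReference,
)

# Reference the linked service created earlier (placeholder name).
ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="AzureBlobStorage1"
)

# A Binary dataset pointing at a container/folder in the storage account.
binary_ds = DatasetResource(
    properties=BinaryDataset(
        linked_service_name=ls_ref,
        location=AzureBlobStorageLocation(container="input", folder_path="data"),
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "Binary1", binary_ds)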

Creating a Pipeline

Step 16

Now, let us select our dataset Binary1. Then, under Pipelines, select Pipeline.

Here, we are provided with the pipeline canvas. Under the Move and Transform option, we have Copy Data and Data Flow.

Drag and drop Copy Data onto the pipeline canvas.

Step 17

Next, we set the file path for the Source.

We have to set up the container now.

Once that is done, we can validate the pipeline.

The pipeline run will now be queued and executed.
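The same pipeline can be defined and run through the SDK. This sketch assumes a second binary dataset (here called Binary2, a hypothetical name) exists as the copy target, and it reuses the client and names from the earlier sketches.

from azure.mgmt.datafactory.models import (
    BinarySink,
    BinarySource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

# Source and sink dataset references; Binary2 is an assumed output dataset.
src = DatasetReference(type="DatasetReference", reference_name="Binary1")
dst = DatasetReference(type="DatasetReference", reference_name="Binary2")

# A Copy data activity moving binary blobs from source to sink.
copy = CopyActivity(
    name="CopyData1",
    inputs=[src],
    outputs=[dst],
    source=BinarySource(),
    sink=BinarySink(),
)
adf_client.pipelines.create_or_update(
    rg_name, df_name, "pipeline1", PipelineResource(activities=[copy])
)

# Queue a run immediately: the programmatic counterpart of a manual trigger.
run = adf_client.pipelines.create_run(rg_name, df_name, "pipeline1", parameters={})
print(run.run_id)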

Step 18

Once all our work is done, we can publish it.

Here, we can see that our pipeline1 with the dataset Binary1 is all set to be published.

Once the changes are deployed to the Azure Data Factory, we’ll be notified.
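Note that changes made through the SDK’s create_or_update calls take effect directly, so there is no separate publish step there; publishing belongs to the Studio authoring flow. A run started programmatically can be monitored by polling its status, as in this small sketch (continuing from the run variable above):

import time

# Poll the run started earlier until it leaves the queued/in-progress states.
while True:
    pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
print(pipeline_run.status)  # e.g. Succeeded or Failed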

Thus, we’ve successfully explored and used the Azure Data Factory.

Conclusion

In this article, we learned about Azure Data Factory, created one using the Azure Portal, and then went on to create datasets, build a pipeline, and walk through the process of debugging and triggering it.