Copying Data from Azure Blob to Microsoft Lakehouse

Introduction

Microsoft Fabric is an all-in-one analytics platform. It brings together data movement, data lakes, data engineering, data integration, data science, real-time analytics, and business intelligence, all on a shared foundation that keeps your data secure, governed, and compliant. In this article, we will learn how to create a data pipeline that copies data from Azure Blob Storage and ingests it into a Lakehouse in Microsoft Fabric. Let’s get started.

Azure Blob to Lakehouse Data Pipeline in Microsoft Fabric

The first thing we want to do is create a workspace, which is essentially a container or organizing structure that lets us collaborate on and manage content such as reports, dashboards, datasets, and more.

To create the workspace, click Workspaces and then click New Workspace.

Provide a name for the workspace. In this article, we use DataPipelineFromAzureBlob.

  • Next, in the Data Factory experience, select Data pipeline.
  • In the New pipeline box, we provided the name Azure Blob Data.
  • Click Create.
  • Under Start building your data pipeline, select Add pipeline activity and then select Copy data.
  • In the Name box on the General tab, we provided Customer Data.
  • On the Source tab, select External for the Data store type.
  • Select New to create a new connection.
  • In the New connection box, search for and select Azure Blob Storage.
  • Click Continue.

In the Account name or URL box under Connection settings, we provided the following.

https://azuresynapsestorage.blob.core.windows.net/sampledata/

On the Connection credentials tab, select Create a new connection in the Connection dropdown.

  • We provided Wide World Importers Public Sample as the Connection name.
  • The Authentication kind is set to Anonymous; a quick way to verify this anonymous access outside of Fabric is sketched right after these steps.
  • At the bottom left, click Create.
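
As an optional sanity check outside of Fabric, the short Python sketch below (using the azure-storage-blob package) lists the public sample files anonymously. The container name (sampledata) and folder prefix are taken from the connection details above; because the connection uses Anonymous authentication, no credential is passed.

    from azure.storage.blob import ContainerClient

    # Public storage account from the connection settings above; anonymous
    # access, so no credential is supplied.
    container = ContainerClient(
        account_url="https://azuresynapsestorage.blob.core.windows.net",
        container_name="sampledata",
    )

    # List the parquet files under the WideWorldImportersDW tables folder.
    for blob in container.list_blobs(name_starts_with="WideWorldImportersDW/tables/"):
        print(blob.name, blob.size)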

We want to copy the dimension_customer.parquet file from the public sample container at https://azuresynapsestorage.blob.core.windows.net/sampledata/. In the File path text boxes, we provided the following.

  • Container: sampledata.
  • File path - Directory: WideWorldImportersDW/tables.
  • File path - File name: dimension_customer.parquet.
  • In the File format dropdown, choose Parquet.
  • Select Preview data next to the File path setting to confirm the file can be read (a local way to preview the same file is sketched right after this list).
  • Click Cancel to close the Preview data window.
  • On the Destination tab, select Workspace for the Data store type.
  • In the Workspace data store type dropdown, select Lakehouse.
  • For the Lakehouse option, select New and provide the name CustomerData.
  • Click Create.
  • The Root folder should be set to Tables.
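
If you prefer to preview the source file outside of the pipeline editor, the short Python sketch below downloads the same parquet file and loads it with pandas. The URL is assembled from the container, directory, and file name entered above, and the snippet assumes the requests and pandas packages (with a parquet engine such as pyarrow) are installed; it is an optional check, not part of the pipeline itself.

    import io

    import pandas as pd  # needs pyarrow (or fastparquet) for read_parquet
    import requests

    # URL built from the container, directory, and file name used above.
    URL = (
        "https://azuresynapsestorage.blob.core.windows.net/sampledata/"
        "WideWorldImportersDW/tables/dimension_customer.parquet"
    )

    # The container allows anonymous reads, so a plain GET is enough.
    response = requests.get(URL, timeout=60)
    response.raise_for_status()

    df = pd.read_parquet(io.BytesIO(response.content))
    print(df.shape)   # number of rows and columns
    print(df.head())  # first few customer records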

In Table Name, select New and provide BlogCustomerInformation (you can choose whatever name you want).

  • Click Create.
  • Click Run at the top of the pipeline tab. You will be prompted to save your changes; go ahead and save and run the pipeline.

After the run completes, we can see from the pipeline output that the data pipeline run was successful.

Next, we need to check the data in the DataPipelineFromAzureBlob workspace we created initially. Click on the workspace.

Inside the workspace, we now have the CustomerData Lakehouse along with its SQL endpoint.
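
One quick way to confirm the rows landed is to open a notebook with the CustomerData Lakehouse attached as its default lakehouse and run a short PySpark cell. The sketch below assumes the table name used in this article (BlogCustomerInformation) and relies on the spark session that Fabric notebooks create automatically.

    # Runs inside a Microsoft Fabric notebook, where `spark` is pre-created.
    # Assumes CustomerData is attached as the default lakehouse.
    df = spark.read.table("BlogCustomerInformation")

    print(df.count())   # total rows copied by the pipeline
    df.show(5)          # peek at the first few customer rows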


When we open CustomerData through the SQL endpoint, we can begin to write queries against the data.

We executed a simple query against the BlogCustomerInformation table, and everything is working fine.
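
The same kind of check can also be scripted from outside Fabric through the SQL endpoint. The sketch below is a hedged example using pyodbc with Microsoft's ODBC Driver 18 and interactive Azure AD sign-in; the server name is a placeholder you would copy from the SQL endpoint's connection settings, and the dbo schema and table name follow this article's example.

    import pyodbc

    # Placeholder: copy the real server name from the SQL endpoint's
    # connection settings in the Fabric portal.
    SERVER = "<your-sql-endpoint>.datawarehouse.fabric.microsoft.com"

    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={SERVER};"
        "Database=CustomerData;"
        "Authentication=ActiveDirectoryInteractive;"
        "Encrypt=yes;"
    )

    cursor = conn.cursor()
    cursor.execute("SELECT TOP 5 * FROM dbo.BlogCustomerInformation;")
    for row in cursor.fetchall():
        print(row)
    conn.close()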
