Extract File Names And Copy From Source Path In Azure Data Factory

Introduction

Today we will walk through a real-time scenario: how to extract file names from a source path and then use them in any subsequent activity based on that output. This is useful when you need to extract file names, transform or copy data from CSV, Excel, or flat files stored in blob storage, or even maintain a table that records where the data came from.

As a first step, I have created an Azure Blob Storage account and added a few files that can be used in this demo.

Activity 1 - Get Metadata

Create a new pipeline in Azure Data Factory.

Next, in the newly created pipeline, we can use the ‘Get Metadata’ activity from the list of available activities. The Get Metadata activity pulls the metadata of any files stored in the blob container, and its output can be consumed by subsequent activity steps.

I have dragged the Get Metadata activity onto the canvas and renamed it to File_Name.

Create a Linked service pointing to the dataset if you have not done so already.

Once the linked service is selected, you have to add a new Field list entry. The Field list gives you the option to loop through the contents of the storage folder when you choose Child Items from the dropdown. Child Items comes from the JSON output of the metadata activity, which is the important part here. There are other options available as well that you can use based on your requirement.
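
To make this concrete, here is a trimmed sketch of what the Get Metadata activity looks like in the pipeline's JSON definition, assuming a dataset named SourceFolderDataset that points at the blob folder (the dataset name is a placeholder, not from the original setup):

{
    "name": "File_Name",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SourceFolderDataset",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ]
    }
}

With childItems in the field list, the activity output contains an array of entries such as { "name": "sales.csv", "type": "File" } (file name shown only as an example), and it is this array that the next activity will iterate over.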

Once that is done, it’s time to use the ForEach activity to loop through each file name and copy it to the output. Make sure to tick the Sequential checkbox so that the files are iterated one by one.

Items is where you pass the file names as an array; the ForEach loop then takes over to iterate and process them.

Use childItems as the array parameter to loop through the file names; follow the steps below in order.

@activity('File_Name').output.childItems
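
For reference, the ForEach portion of the pipeline JSON roughly takes the shape below, with the expression above wired into the Items property and Sequential switched on (activity names are illustrative):

{
    "name": "ForEach_FileName",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "File_Name", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "isSequential": true,
        "items": {
            "value": "@activity('File_Name').output.childItems",
            "type": "Expression"
        },
        "activities": [ ]
    }
}

The activities array stays empty for now; the copy activity added in the next step goes inside it.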

Now select the Activities tab in the ForEach and click Edit to open the inner activities. This is where you specify the activities that have to be performed. In this demo, we will copy the files to our destination location.

There are Source and Sink tabs, which are self-explanatory: they point to the source and the destination. However, you cannot simply reuse the dataset we created earlier as the source, because this second part of the demo has to copy the dynamic file names output by step one. Instead, create a new dataset, point it at the source Azure Blob Storage location only up to the folder, and leave the file name field to be parameterized dynamically.

Activity 2 - ForEach - File Copy

Now move on to the ‘Parameters’ tab and create a dynamic parameter called FileName, which can then be referenced in the file path on the ‘Connection’ tab.
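
Under the hood, the parameterized source dataset ends up looking roughly like the sketch below, assuming a delimited text (CSV) dataset on Azure Blob Storage with placeholder container and folder names; the key part is the @dataset().FileName expression in the location's file name:

{
    "name": "SourceFileDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileName": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().FileName",
                    "type": "Expression"
                },
                "folderPath": "input",
                "container": "source"
            }
        }
    }
}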

Going back to the pipeline, you can see that our newly created FileName parameter is now visible in the dataset properties. Set its value so that the file names are pulled from the storage using the @item().Name dynamic expression.
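
Inside the ForEach, the Copy activity then passes @item().Name into that dataset parameter. A minimal sketch, with dataset and activity names assumed rather than taken from the screenshots:

{
    "name": "Copy_File",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "SourceFileDataset",
            "type": "DatasetReference",
            "parameters": {
                "FileName": { "value": "@item().Name", "type": "Expression" }
            }
        }
    ],
    "outputs": [
        { "referenceName": "DestinationFolderDataset", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "DelimitedTextSink" }
    }
}

Because @item() refers to the current element of childItems, each iteration resolves the FileName parameter to the name of one file in the source folder.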

Now that the data source configuration is complete, move on to configuring the Sink, the destination folder. Point it to an existing folder in the Azure Blob location, or type a folder name that the sink should create automatically if it does not exist. Make sure to set Import schema to ‘None’, else it might throw an error.
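
The sink dataset is simpler, since it only needs the folder: no file name parameter and an empty schema, which corresponds to the Import schema = None setting (again, container and folder names here are placeholders):

{
    "name": "DestinationFolderDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": "output",
                "container": "destination"
            }
        },
        "schema": []
    }
}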

It is now time to test our pipeline. Go to the pipeline, validate and run it, and view the Output tab for the results.

We can see that the pipeline ran successfully, copying the files iteratively. Check the Azure Blob Storage destination folder, where all the files have been copied from source to destination.

Summary

The ‘Activity 2’ of this article, the file copy, is only one example of a subsequent activity based on the output of the ‘Get Metadata’ activity. I chose file copy for this demo; you can select any activity of your choice.

References

Microsoft Official Documentation