Overcoming Limitations of Get Metadata Activity in Azure Data Factory / Synapse

Problem Statement

A file uploaded to Azure Blob Storage / Azure Data Lake Storage has multiple properties associated with it.

Within pipelines, the Get Metadata Activity can return only a subset of these properties, namely the fields offered in its Field list.

Is it possible to get other file properties, such as Creation Time and Content-Type, in Synapse / Data Factory pipelines?

Prerequisites

  1. Azure Data Factory / Synapse
  2. Azure Blob Storage / Azure Data Lake Storage

Solution

1. Use the Azure Blob Storage REST API operation Get Blob to retrieve the blob's file properties, which it returns as response headers.
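
To see what the REST call looks like outside the pipeline, here is a minimal standard-library Python sketch. It assumes you already hold an Azure AD access token for the https://storage.azure.com/ resource (for example from `az account get-access-token --resource https://storage.azure.com/`); the function names and the URL are placeholders, not part of the original walkthrough. Get Blob (GET) also downloads the blob body, whereas the separate Get Blob Properties operation (HEAD) returns the property headers without the content, so the sketch lets you choose the method.

```python
import urllib.request


def build_get_blob_request(url: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a Get Blob (GET) or Get Blob Properties (HEAD) request.

    `url` and `token` are placeholders supplied by the caller; `token` is assumed
    to be an Azure AD access token issued for https://storage.azure.com/.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        # Azure AD (OAuth) authorization requires service version 2017-11-09 or later.
        "x-ms-version": "2017-11-09",
    }
    return urllib.request.Request(url, headers=headers, method=method)


def fetch_property_headers(req: urllib.request.Request) -> dict:
    """Send the request and return the response headers as a plain dict."""
    with urllib.request.urlopen(req) as resp:
        return dict(resp.headers.items())
```

For example, `fetch_property_headers(build_get_blob_request("https://mystorageacct.blob.core.windows.net/raw/sales.csv", token, method="HEAD"))` would return the headers for a hypothetical blob `sales.csv` without downloading its content.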

2. Grant the Synapse workspace / Data Factory managed identity the Storage Blob Data Reader role on the storage account, so that the pipeline can authenticate via Managed Identity.

a) Go to Access Control (IAM) of the storage account, click Add, and select Add role assignment.

b) Search for the Storage Blob Data Reader role, then assign it to the Synapse workspace / Data Factory managed identity.
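
The same role assignment can also be scripted with the Azure CLI. The Python sketch below only assembles and prints an `az role assignment create` command; it assumes you know the object (principal) ID of the workspace's or factory's system-assigned managed identity, and every ID and name in it is a placeholder. It scopes the role to the whole storage account, as the portal steps above do.

```python
import shlex


def role_assignment_command(principal_object_id: str, subscription_id: str,
                            resource_group: str, storage_account: str) -> list[str]:
    """Return an `az role assignment create` argument list that grants
    Storage Blob Data Reader on a storage account to a managed identity."""
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_object_id,
        # A system-assigned managed identity is a service principal.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Storage Blob Data Reader",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder values: print the command instead of executing it.
    cmd = role_assignment_command(
        "00000000-0000-0000-0000-000000000000", "<subscription-id>",
        "<resource-group>", "<storage-account>")
    print(shlex.join(cmd))
```

Printing rather than running the command keeps the sketch side-effect free; you can review the output before pasting it into a shell.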

3. Create a pipeline within Synapse / Data Factory that uses a Web Activity to call the REST API.

URL

For Azure Blob Storage:

https://<<StorageAccountName>>.blob.core.windows.net/<<ContainerName>>/<<FileName>>

For Azure Data Lake Storage:

https://<<DataLakeStorageName>>.dfs.core.windows.net/<<ContainerName>>/<<FileName/DirectoryName>>
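
The two URL templates differ only in the endpoint (blob vs dfs). A small Python helper, shown here as a sketch with hypothetical names, builds either form and percent-encodes the path so that names with spaces still form a valid URL.

```python
from urllib.parse import quote


def storage_url(account: str, container: str, path: str, data_lake: bool = False) -> str:
    """Build the Web Activity URL for a blob (blob endpoint) or a Data Lake path (dfs endpoint)."""
    endpoint = "dfs" if data_lake else "blob"
    # Percent-encode characters such as spaces, but keep '/' separators intact.
    return f"https://{account}.{endpoint}.core.windows.net/{container}/{quote(path.lstrip('/'))}"
```

Inside the pipeline, the same string would typically be assembled with dynamic content, for instance `@concat('https://', pipeline().parameters.StorageAccountName, '.blob.core.windows.net/', pipeline().parameters.ContainerName, '/', item().name)`, where the parameter names are hypothetical and `item()` assumes the Web Activity runs inside a ForEach.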

Method: GET

Authentication: System Assigned Managed Identity

Resource: https://storage.azure.com/

Headers:

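x-ms-version : 2017-11-09

Requests authorized with Azure AD, including Managed Identity, require version 2017-11-09 or later; this version also returns the x-ms-creation-time header.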
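
Taken together, the settings above correspond roughly to the following Web Activity JSON. The Python sketch builds it as a dict, assuming that the System Assigned Managed Identity option appears as `"type": "MSI"` in the activity's JSON view; compare it against the code view of your own pipeline before relying on it. The activity name is hypothetical.

```python
import json


def web_activity_definition(name: str, url: str) -> dict:
    """Sketch of a Web Activity that calls Get Blob with the
    factory/workspace system-assigned managed identity."""
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": url,
            "method": "GET",
            "headers": {"x-ms-version": "2017-11-09"},
            "authentication": {
                "type": "MSI",
                "resource": "https://storage.azure.com/",
            },
        },
    }


if __name__ == "__main__":
    activity = web_activity_definition(
        "GetBlobProperties",  # hypothetical activity name
        "https://<StorageAccountName>.blob.core.windows.net/<ContainerName>/<FileName>",
    )
    print(json.dumps(activity, indent=2))
```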

Output

Get Metadata Activity output

Web Activity Output (Azure Blob Storage)

Here, x-ms-creation-time holds the file's creation time; other properties such as Content-Type are also returned as response headers.
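
In a downstream activity, the creation time can be referenced with an expression such as `@activity('Web1').output.ADFWebActivityResponseHeaders['x-ms-creation-time']`, where `Web1` is a hypothetical activity name. The Python sketch below performs the same lookup on a copied Web Activity output, matching header names case-insensitively (as HTTP header names are) and parsing the RFC 1123 date into a timezone-aware datetime. The list of headers picked out is an illustrative choice.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional

# Property headers of interest; HTTP header names are case-insensitive.
WANTED_HEADERS = ("x-ms-creation-time", "Last-Modified", "Content-Type", "Content-Length")


def extract_properties(activity_output: dict) -> dict:
    """Pick selected properties out of a Web Activity output's response headers."""
    headers = activity_output.get("ADFWebActivityResponseHeaders", {})
    lowered = {key.lower(): value for key, value in headers.items()}
    # Missing headers map to None.
    return {name: lowered.get(name.lower()) for name in WANTED_HEADERS}


def creation_time(activity_output: dict) -> Optional[datetime]:
    """Parse x-ms-creation-time (an RFC 1123 date) into a timezone-aware datetime."""
    raw = extract_properties(activity_output)["x-ms-creation-time"]
    return parsedate_to_datetime(raw) if raw else None
```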

Web Activity Output (Azure Data Lake Storage)

Directory Property

Web Activity