Skip/Notify Processing of Empty files via Azure Data Factory/Synapse

Problem Statement

One of the challenges in data engineering is the presence of empty / blank files, or files with only headers, at the source.

Is there an automated way to skip such files, or send a notification about them, in Azure Data Factory / Synapse?

Prerequisites

  1. Azure Data Factory / Synapse

Solution

CASE 1. Empty / Blank File

[Screenshot: blank source file with a size of 0 B]

As seen above, a blank file has a size of 0 B.

Resolution

We can use the Get Metadata activity to retrieve the file size; if it is 0, we can conclude that the file is empty / blank.
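As a rough sketch, a Get Metadata activity that requests only the file size might be defined as follows. The activity name `Get File Size` and the dataset name `SourceFile` are placeholders assumed for illustration; they do not come from the original pipeline.

```json
{
  "name": "Get File Size",
  "description": "Hypothetical activity and dataset names; only the 'size' field is requested.",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "SourceFile",
      "type": "DatasetReference"
    },
    "fieldList": ["size"]
  }
}
```

Under these assumed names, a downstream condition could test the result with `@equals(activity('Get File Size').output.size, 0)`.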

[Screenshot: Get Metadata activity configuration]

Output

[Screenshot: Get Metadata activity output]

CASE 2. File with just Header

[Screenshot: source file containing only a header row]

Note: The size of a header-only file varies with the number of columns in the header, so file size alone cannot tell you whether a file contains only a header.

Resolution

We can use the Lookup activity to count the records in the file (excluding the header); if the count is 0, we can conclude that the file contains only a header and no data.

Dataset:

[Screenshot: source dataset settings]

The dataset has the "First row as header" option enabled.
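For reference, a delimited-text dataset with the header option turned on might look roughly like the sketch below. It assumes the file lives in ADLS Gen2; the dataset name `SourceCsv`, the linked service `SourceStorage`, the file system `input`, and the file name `sample.csv` are all hypothetical.

```json
{
  "name": "SourceCsv",
  "properties": {
    "description": "Hypothetical dataset; linked service, file system, and file name are placeholders.",
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "SourceStorage",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "input",
        "fileName": "sample.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```

With `firstRowAsHeader` set to `true`, the header line is treated as column names rather than as a data row, which is what lets the row count exclude it.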

ADF flow

[Screenshot: ADF pipeline flow]

Lookup activity

[Screenshot: Lookup activity settings]
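A Lookup activity along these lines might be defined as in the sketch below. The activity name `File LookUp` matches the name used in the expression in this article; the dataset name `SourceCsv` and the ADLS Gen2 read settings are assumptions. Note that `firstRowOnly` is set to `false`: the `count` property appears in the Lookup output only when all rows are returned rather than just the first one.

```json
{
  "name": "File LookUp",
  "description": "Dataset name and ADLS Gen2 read settings are assumptions for illustration.",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": false
      },
      "formatSettings": {
        "type": "DelimitedTextReadSettings"
      }
    },
    "dataset": {
      "referenceName": "SourceCsv",
      "type": "DatasetReference"
    },
    "firstRowOnly": false
  }
}
```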

If Condition activity

[Screenshot: If Condition activity settings]

Expression:
@equals(activity('File LookUp').output.count, 0)

If the row count is zero, raise an error with a Fail activity (as in this example), send an email notification, or skip the iteration for that file.
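Putting the condition and the Fail activity together, the If Condition might be sketched as follows. The names `If No Data Rows` and `Fail On Header Only File`, along with the message and error code, are hypothetical; only `File LookUp` and the expression come from the article.

```json
{
  "name": "If No Data Rows",
  "description": "Hypothetical name; evaluates to true when the Lookup returns zero rows.",
  "type": "IfCondition",
  "dependsOn": [
    {
      "activity": "File LookUp",
      "dependencyConditions": ["Succeeded"]
    }
  ],
  "typeProperties": {
    "expression": {
      "value": "@equals(activity('File LookUp').output.count, 0)",
      "type": "Expression"
    },
    "ifTrueActivities": [
      {
        "name": "Fail On Header Only File",
        "description": "Placeholder message and error code.",
        "type": "Fail",
        "typeProperties": {
          "message": "Source file contains no data rows.",
          "errorCode": "NO_DATA_ROWS"
        }
      }
    ]
  }
}
```

To notify or skip instead of failing, the Fail activity in `ifTrueActivities` could be swapped for a notification step, or the normal processing could be placed in `ifFalseActivities` so that files without data rows are simply passed over.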

Output

[Screenshot: pipeline run output]

Note: The Lookup activity approach also works for case 1, so it is the better and more efficient option: a single check validates both empty / blank files and files with only headers.

