How To View Data Flow Execution Plan In Azure Data Factory

Introduction

In this article, we will explore how to view the Data flow execution plan in Azure Data Factory. The execution plan provides partition and performance metrics for data engineers or the operations team to review. This crucial step allows developers to ensure that performance goals are met.

Challenges

In our previous article, we created a pipeline using Mapping Data flow. Now, we need to review the time it took to transform the data and build a better understanding of where we can optimize our solution.

Azure Data Factory Monitor provides this capability for us. It is available for both debug runs and scheduled pipeline runs. Before we dive into the tutorial, we need to understand 4 key concepts:

  1. Stage: A Mapping data flow pipeline is broken up into different stages. A stage consists of a group of activities that can be executed together.
  2. Partition: A dataset is split into multiple chunks so it can be processed in parallel. Each chunk is represented as a partition.
  3. Skewness: Skewness measures how evenly the data is distributed across the dataset's partitions. In data flow, the ideal range is between -3 and 3.
  4. Kurtosis: Kurtosis measures the prevalence of outliers in the dataset. A high kurtosis value means the data has heavy outliers. In data flow, the ideal value is less than 10.
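
These two statistics can be sketched in plain Python. This is an illustrative calculation using the moment-based (Pearson) definitions; the exact formulas ADF computes internally may differ, and the per-partition row counts below are made-up sample data:

```python
def skewness(xs):
    # Third standardized moment: near 0 for symmetric data,
    # large when a few partitions are much bigger than the rest.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def kurtosis(xs):
    # Fourth standardized moment (Pearson, not excess):
    # high values indicate heavy outliers.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

# Hypothetical per-partition row counts.
balanced = [1000, 1010, 990, 1005, 995] * 4
skewed = [1000, 1010, 990, 1005, 995] * 3 + [1000, 1010, 990, 1005, 20000]

print(skewness(balanced), kurtosis(balanced))  # within the healthy ranges
print(skewness(skewed), kurtosis(skewed))      # outside the healthy ranges
```

The balanced counts fall comfortably inside the -3 to 3 skewness and under-10 kurtosis ranges; the single oversized partition in the second list pushes both metrics outside them.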

Tutorial

  • In Azure Data Factory Studio, click on 'Monitor'. Under 'Runs', click on 'Pipeline runs'.

  • Select the pipeline and click on the 'eyeglasses icon' under 'Activity runs'.

  • Now, we see the data flow. Click on 'Stages' to see how long each stage took. A 'Stages' side panel will be shown.

  • Let's examine the different stages we have:

    • The first 2 stages read from the database and select the columns we need. They took 11s each.
    • The 3rd stage joins the datasets together. It took 5s.
    • The last stage aggregates the data, creates audit columns, and writes to the database table. It took 7s.
    • The total processing time is 26 seconds.

      The tables we are working on are relatively small, so the process completes quickly. Let's click 'Close'.
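
The per-stage timings above can also be summarized programmatically once exported from Monitor. A minimal sketch, assuming a simple list of (stage name, seconds) pairs; the names and structure are illustrative, not ADF's actual output schema. Note that stages can run in parallel, so the wall-clock total Monitor reports can be less than this simple sum:

```python
def summarize_stages(stages):
    """Return the summed seconds and the slowest stage from (name, seconds) pairs."""
    total = sum(seconds for _, seconds in stages)
    slowest = max(stages, key=lambda stage: stage[1])
    return total, slowest

# Stage durations from the walkthrough above (names are illustrative).
stages = [
    ("Read + select source A", 11),
    ("Read + select source B", 11),
    ("Join", 5),
    ("Aggregate + sink", 7),
]

total, slowest = summarize_stages(stages)
print(total, slowest)  # the read stages are the first optimization targets
```

Sorting or ranking stages like this is a quick way to decide where tuning effort will pay off most.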

  • Let's dive into the 'Join transformation' by clicking on the Join icon. We observe that:

    • The first thing we notice is that both Skewness and Kurtosis are within the ranges specified in the Challenges section.
    • The 'Partition chart' shows most partitions range from 1000 to just under 1200 records. The last few partitions hold a below-average number of records.


  • Finally, let's review the 'Aggregate transformation' by clicking on the Aggregate icon. We observe that:

    • Again, both the Skewness and Kurtosis are within the defined range.
    • The aggregate has produced a smaller number of records (down to 1,079 records).
    • The 'Partition chart' looks very different from before. Our aggregate calculates a daily profit, so the chart shows that some partitions contain more unique dates than others.
    • Finally, we observe that 2 rows were dropped, and we need to investigate why.
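
The row-count reduction at the aggregate is expected: grouping by date collapses many input rows into one output row per unique date. A small sketch of that effect, using made-up records rather than the actual dataset:

```python
from collections import defaultdict

def daily_profit(records):
    """Aggregate (date, profit) rows into one total per unique date."""
    totals = defaultdict(float)
    for date, profit in records:
        totals[date] += profit
    return dict(totals)

# Illustrative input: 6 rows spanning 3 unique dates.
records = [
    ("2023-01-01", 10.0), ("2023-01-01", 5.0),
    ("2023-01-02", 7.5),  ("2023-01-02", 2.5), ("2023-01-02", 1.0),
    ("2023-01-03", 4.0),
]

result = daily_profit(records)
print(len(records), "->", len(result))  # 6 -> 3
```

Comparing the expected output count against the actual one, as Monitor does here, is also how you would catch unexpected drops like the 2 rows noted above.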


Summary

In summary, it is very important to review the execution plan of a Mapping data flow. With Azure Data Factory Monitor, we can examine each data flow transformation's row changes, Skewness, Kurtosis, and the number of rows in each partition. This information enables us to optimize the pipeline when required.

Finally, remember to turn on verbose logging in the Data flow activity to generate all the monitoring details. Verbose logging has some performance impact, but without this information it is difficult to pinpoint where an issue lies.

Happy Learning!
