Integrate Pipelines With Azure Synapse Analytics

In line with our previous articles, today we will see how to create, schedule, and monitor a pipeline in Azure Synapse using Synapse Studio.

  • A pipeline is an ETL workflow that we execute to extract results. It can consist of a single activity or a group of activities to be run (a minimal sketch of such a definition follows this list).
  • An activity is a single task implemented and executed as part of the pipeline.
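For readers who prefer to see these moving parts as code, here is a minimal sketch of a pipeline wrapping a single notebook activity, written against the azure-synapse-artifacts Python SDK. This is an illustration under assumptions, not the article's actual setup: the workspace URL, pipeline name, and notebook name are all placeholders.

```python
# Minimal sketch: define a pipeline containing one Synapse notebook activity.
# Assumes the azure-synapse-artifacts and azure-identity packages are installed;
# the workspace URL, "DemoPipeline", and "MyNotebook" are hypothetical names.
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient
from azure.synapse.artifacts.models import (
    PipelineResource,
    SynapseNotebookActivity,
    SynapseNotebookReference,
)

client = ArtifactsClient(
    endpoint="https://<your-workspace>.dev.azuresynapse.net",
    credential=DefaultAzureCredential(),
)

# One activity that calls an existing notebook in the workspace.
notebook_activity = SynapseNotebookActivity(
    name="RunMyNotebook",
    notebook=SynapseNotebookReference(
        type="NotebookReference",
        reference_name="MyNotebook",  # hypothetical notebook name
    ),
)

# A pipeline is just a named collection of activities.
pipeline = PipelineResource(activities=[notebook_activity])
client.pipeline.begin_create_or_update_pipeline("DemoPipeline", pipeline).result()
```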

To keep it short, I am not going to explain pipelines in detail, since we have already discussed them in our Azure Data Factory articles. If you want to know in detail what pipelines are, I suggest you take a look at those first.

Once the pipeline is created, drag the Notebook activity from the Synapse section of the Activities pane onto the canvas. At this point the activity is blank and acts as an object that calls the code or workflow of the notebook it references.

Once we have named our notebook activity (optional), we go to its settings and select, from the Notebook dropdown, the notebook that holds our code or workflow. The activity also accepts base parameters: name/value pairs passed to the notebook at run time, which override the defaults defined in the notebook's parameters cell.
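To make that concrete, on the notebook side a base parameter is just a variable defined in the cell you mark as the parameters cell; matching base parameters from the pipeline override these defaults at run time. A hypothetical example:

```python
# Parameters cell in the Synapse notebook (toggle "parameters cell" on this cell).
# These defaults apply to interactive runs; a pipeline activity's base parameters
# with the same names override them. Both names here are hypothetical.
output_path = "abfss://container@account.dfs.core.windows.net/output/"
run_date = "2022-01-01"
```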

Once the pipeline has been set up, we can run it through a trigger or a manual run.

Before triggering this pipeline, we have to make sure it has been published, as pipelines cannot be run without publishing, except through the Debug option. I have published everything, so let's trigger it now.
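Triggering from the Studio UI is the simplest route, but for reference, a published pipeline can also be started on demand from the same SDK; this sketch reuses the hypothetical client and pipeline name from earlier.

```python
# Sketch: start an on-demand run of a published pipeline and keep the run ID
# for monitoring. "DemoPipeline" is the hypothetical name from the earlier sketch.
run = client.pipeline_run.create_pipeline_run("DemoPipeline")
print("Started pipeline run:", run.run_id)
```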

We will receive a notification that the pipeline has started. To monitor it, click the Monitor hub on the left side, where we can see the duration and status of each run. We can also click on a pipeline name to view its current status, which is useful when multiple pipelines are running in parallel.
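If you prefer monitoring from code rather than the Monitor hub, the same run can be polled by its run ID. A rough sketch, continuing from the snippet above:

```python
import time

# Poll the pipeline run until it reaches a terminal state.
while True:
    status = client.pipeline_run.get_pipeline_run(run.run_id)
    print(status.status, "-", status.duration_in_ms, "ms so far")
    if status.status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(30)
```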

If more than one notebook is attached to a single pipeline, the view highlights the one currently running. In our case, since we have only one, it just shows that.

After around three minutes, our pipeline run completes successfully.

We have a lot of options to filter and inspect the progress of our task, down to each activity run in the list and even its reads and writes.
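The same per-activity detail is also available programmatically. A sketch using the activity-runs query, assuming the run ID from the earlier snippets (the one-hour time window is arbitrary):

```python
from datetime import datetime, timedelta, timezone

from azure.synapse.artifacts.models import RunFilterParameters

# Sketch: list each activity run inside the pipeline run, with status and timing.
now = datetime.now(timezone.utc)
filters = RunFilterParameters(
    last_updated_after=now - timedelta(hours=1),
    last_updated_before=now,
)
activity_runs = client.pipeline_run.query_activity_runs(
    "DemoPipeline", run.run_id, filters
)
for ar in activity_runs.value:
    print(ar.activity_name, ar.status, ar.duration_in_ms, "ms")
```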

Now let's go to our storage account and check whether the files have been created.

The files have been created as per our PySpark query.
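As a quick alternative to browsing the portal, the output folder can also be listed with the Data Lake Storage SDK. The account, container, and folder names below are placeholders, since the article's actual paths are not shown:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Sketch: list the files our notebook wrote to ADLS Gen2.
# The account URL, "container", and "output" are hypothetical placeholders.
service = DataLakeServiceClient(
    account_url="https://<your-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("container")
for path in fs.get_paths(path="output"):
    print(path.name, path.content_length)
```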

Summary

This was a simple article on how to integrate pipelines in Azure Synapse Analytics using Synapse Studio. Pipelines are common to both Azure Data Factory and Azure Synapse; if you understand the concept in either one, the other will be easy.