Merge Multiple JSON Files via Synapse / Data Factory Pipelines

Problem Statement

Processes that write logs often generate many files with the same structure/schema, and extracting data from paginated APIs typically produces one file per page. Rather than processing each of these files separately, is there a way to combine them and consume a single final file instead? JSON is one of the most widely used data formats in the modern software landscape, so this use case uses JSON files as the sample files.

Prerequisites

  1. Azure Data Factory / Azure Synapse Analytics
  2. Azure Blob Storage

Solution

1. We use three JSON files stored in Azure Blob Storage as the sources for the merge.

[Images: the JSON files in Blob Storage, and sample contents of JSON File and JSON File 2]
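The sample file contents are only available as screenshots, so the sketch below uses three hypothetical files (the file names and fields are made up) to illustrate what "similar structure" means in practice: every file has the same top-level keys. It also checks that assumption locally before the files are uploaded, since the merge described here assumes the files share one schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample records: the real file contents are not reproduced in
# the article, so these fields are illustrative only.
SAMPLE_FILES = {
    "file1.json": {"id": 1, "name": "alpha", "status": "active"},
    "file2.json": {"id": 2, "name": "beta", "status": "inactive"},
    "file3.json": {"id": 3, "name": "gamma", "status": "active"},
}


def write_samples(folder: Path) -> list[Path]:
    """Write each sample record to its own JSON file and return the paths."""
    paths = []
    for file_name, record in SAMPLE_FILES.items():
        path = folder / file_name
        path.write_text(json.dumps(record), encoding="utf-8")
        paths.append(path)
    return paths


def shared_schema(paths: list[Path]) -> set[str]:
    """Return the common set of top-level keys, or raise if the files disagree."""
    schemas = {p.name: set(json.loads(p.read_text(encoding="utf-8"))) for p in paths}
    first = next(iter(schemas.values()))
    mismatched = [name for name, keys in schemas.items() if keys != first]
    if mismatched:
        raise ValueError(f"schema differs in: {mismatched}")
    return first


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_samples(Path(tmp))
        print(sorted(shared_schema(files)))  # ['id', 'name', 'status']
```

A key-set comparison is deliberately shallow; nested structures or differing value types would need a stricter check.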

2. To merge the JSON files, we use the Copy activity in Synapse / ADF.
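Behind the authoring UI, a pipeline activity is stored as JSON. The sketch below builds a skeleton of that definition as a Python dict so its shape is visible; the activity and dataset names are hypothetical, and only the overall structure (a `Copy` activity reading through a `JsonSource` and writing through a `JsonSink`) is meant to mirror the ADF activity format.

```python
import json

# Hypothetical names: the article does not give the activity or dataset names.
SOURCE_DATASET = "SourceJsonFiles"
SINK_DATASET = "MergedJsonFile"


def copy_activity(source_dataset: str, sink_dataset: str) -> dict:
    """Build the skeleton of a Copy activity that reads JSON and writes JSON."""
    return {
        "name": "MergeJsonFiles",
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Format-specific source and sink types for JSON datasets; the
            # store and format settings are filled in on the Source/Sink tabs.
            "source": {"type": "JsonSource"},
            "sink": {"type": "JsonSink"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(copy_activity(SOURCE_DATASET, SINK_DATASET), indent=2))
```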

a) Source Settings

[Images: Copy Data activity; source dataset]

The source dataset is of type JSON, and its location is the POC container in Azure Blob Storage, which holds the individual files.

[Image: JSON files in the POC container]
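As a rough sketch of the JSON behind these settings, the code below builds a JSON dataset pointing at the container and a Copy activity source that picks up every matching file. The linked service name is hypothetical, the container is written as lowercase `poc` because blob container names must be lowercase, and the `*.json` wildcard with recursive reading is an assumption (pointing the dataset at the container with no file name is another option).

```python
import json

# Hypothetical linked service name; the article only names the container (POC).
LINKED_SERVICE = "AzureBlobStorageLS"
# Blob container names must be lowercase, so the POC container is "poc".
CONTAINER = "poc"


def source_dataset(name: str = "SourceJsonFiles") -> dict:
    """JSON dataset whose location is the whole container (no file name set)."""
    return {
        "name": name,
        "properties": {
            "type": "Json",
            "linkedServiceName": {
                "referenceName": LINKED_SERVICE,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {"type": "AzureBlobStorageLocation", "container": CONTAINER}
            },
        },
    }


def source_settings(pattern: str = "*.json") -> dict:
    """Copy activity source that reads every file matching the (assumed) wildcard."""
    return {
        "type": "JsonSource",
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "recursive": True,  # also read matching files in subfolders
            "wildcardFileName": pattern,
        },
    }


if __name__ == "__main__":
    print(json.dumps(source_dataset(), indent=2))
    print(json.dumps(source_settings(), indent=2))
```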

b) Sink Settings

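On the Sink tab, the copy behavior is set to Merge files, so the contents of all the source files are written into a single output file. The sink dataset defines where that merged file is written.

[Images: Merge JSON Files sink settings; sink dataset; file sink]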
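In the activity JSON, the setting that performs the merge is the sink store setting `copyBehavior` with the value `MergeFiles` (the default is `PreserveHierarchy`). The sketch shows that fragment for a Blob Storage JSON sink. The `arrayOfObjects` file pattern is an assumption, since the article's sink screenshot is not available; `setOfObjects` is the other option.

```python
import json

# filePattern values for JSON sinks: "setOfObjects" writes one object per line
# (JSON Lines); "arrayOfObjects" writes a single JSON array.
FILE_PATTERNS = {"setOfObjects", "arrayOfObjects"}


def merge_sink(file_pattern: str = "arrayOfObjects") -> dict:
    """Copy activity sink that merges all source files into one output file."""
    if file_pattern not in FILE_PATTERNS:
        raise ValueError(f"unsupported filePattern: {file_pattern}")
    return {
        "type": "JsonSink",
        "storeSettings": {
            "type": "AzureBlobStorageWriteSettings",
            # MergeFiles (instead of the default PreserveHierarchy) is what
            # collapses the source files into a single file.
            "copyBehavior": "MergeFiles",
        },
        "formatSettings": {"type": "JsonWriteSettings", "filePattern": file_pattern},
    }


if __name__ == "__main__":
    print(json.dumps(merge_sink(), indent=2))
```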

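Output

After the pipeline run completes, the sink location contains a single file holding the merged content of the three source files.

[Images: pipeline output; merged file]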
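To see what the merged file should roughly look like before running the pipeline, the sketch below approximates the merge locally. It is not ADF's implementation: it reads files in sorted name order for a deterministic result (making no claim about the order ADF uses), treats a file holding a JSON array (for example one API page) as several records, and renders the two JSON file patterns.

```python
import json
import tempfile
from pathlib import Path


def merge_json_files(folder: Path, file_pattern: str = "arrayOfObjects") -> str:
    """Locally approximate a merge: one output string holding every source record."""
    records = []
    for path in sorted(folder.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # A file may hold one object or an array of objects (e.g. one API page).
        records.extend(data if isinstance(data, list) else [data])
    if file_pattern == "arrayOfObjects":
        return json.dumps(records)  # a single JSON array
    if file_pattern == "setOfObjects":
        return "\n".join(json.dumps(r) for r in records)  # one object per line
    raise ValueError(f"unsupported filePattern: {file_pattern}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Hypothetical inputs: two single-record files and one paginated page.
        (folder / "a.json").write_text('{"id": 1}', encoding="utf-8")
        (folder / "b.json").write_text('{"id": 2}', encoding="utf-8")
        (folder / "c.json").write_text('[{"id": 3}, {"id": 4}]', encoding="utf-8")
        print(merge_json_files(folder))  # [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
        print(merge_json_files(folder, "setOfObjects"))
```

Comparing this local output with the file the pipeline produces is a quick way to confirm that no source records were dropped.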
