Complex/Nested JSON To CSV Using ChoETL

Introduction

 
Its an ETL framework for .net, .net standard and core. Simple, intuitive Extract, transform and load (ETL) library for .NET. Extremely fast, flexible, and easy to use.
 
 
Cinchoo ETL is a code-based ETL framework for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.
 
This article talks about generating CSV file from complex/netsed JSON using Cinchoo ETL framework. It is very simple to use and fewer lines of code. You can convert large files as the conversion process is stream based, quite fast and with low memory footprint. It supports many configurations. you can refer more there gitrepo.
 

Install

 
To install Cinchoo ETL (.NET Framework), run the following command in the Package Manager Console,
 
PM> Install-Package ChoETL 
 
To install Cinchoo ETL (.NET Standard), run the following command in the Package Manager Console,
 
PM> Install-Package ChoETL.NETStandard 
 
You can also search the respective name in nuget package manager.
 
Add namespace to the program.
 

Start with Simple JSON

 
Let's start with simple example of converting the below JSON input file.
  1. {  
  2.     "fruit":[  
  3.         {  
  4.             "name":"Apple",  
  5.             "binomial name":"Malus domestica",  
  6.             "major_producers":[  
  7.                 "China",   
  8.                 "United States",   
  9.                 "Turkey"  
  10.             ],  
  11.             "nutrition":{  
  12.                 "carbohydrates":"13.81g",  
  13.                 "fat":"0.17g",  
  14.                 "protein":"0.26g"  
  15.             }  
  16.         },  
  17.         {  
  18.             "name":"Orange",  
  19.             "binomial name":"Citrus x sinensis",  
  20.             "major_producers":[  
  21.                 "Brazil",   
  22.                 "United States",   
  23.                 "India"  
  24.             ],  
  25.             "nutrition":{  
  26.                 "carbohydrates":"11.75g",  
  27.                 "fat":"0.12g",  
  28.                 "protein":"0.94g"  
  29.             }  
  30.         } ]  
  31. }  
Convert JSON file to CSV format with following code.
  1. var sampleJson = File.ReadAllText("Folder Path/sample.json");  
  2.   
  3.             StringBuilder stringBuilder = new StringBuilder();  
  4.             using (var fruitData = ChoJSONReader.LoadText(sampleJson)  
  5.                    .WithJSONPath("$..fruit[*]")  
  6.                                        )  
  7.             {  
  8.                 using (var w = new ChoCSVWriter(stringBuilder)  
  9.                     .WithFirstLineHeader()  
  10.                     .Configure(c => c.MaxScanRows = 1)  
  11.                     .Configure(c => c.ThrowAndStopOnMissingField = false)  
  12.                     )  
  13.                 {  
  14.                     w.Write(fruitData);  
  15.                 }  
  16.             }  
  17.   
  18.             File.WriteAllText("Folder Path/fruitData.csv", stringBuilder.ToString());  
The CSV formatted output is following
  1. name,binomial name,major_producers_0,major_producers_1,major_producers_2,nutrition_carbohydrates,nutrition_fat,nutrition_protein  
  2. Apple,Malus domestica,China,United States,Turkey,13.81g,0.17g,0.26g  
  3. Orange,Citrus x sinensis,Brazil,United States,India,11.75g,0.12g,0.94g  

Start with Complex/Nested/Dynamic JSON 

 
Let's start with a simple example of converting the below JSON input file.
  1. {  
  2.     "data": {  
  3.         "getUsers": [  
  4.             {  
  5.                 "UserProfileDetail": {  
  6.                     "UserStatus": {  
  7.                         "name""User One"  
  8.                     },  
  9.                     "UserStatusDate""2018-10-31T06:12:42+00:00",  
  10.                     "EnrollId""am**********************************",  
  11.                     "lastDate""2019-07-22T03:05:39.0245313-04:00"  
  12.                 },  
  13.                 "UserInformation": {  
  14.                     "Id": 1111122,  
  15.                     "firstName""*****",  
  16.                     "middleName"null,  
  17.                     "lastName""*****",  
  18.                     "otherNames"null,  
  19.                     "UserType": {  
  20.                         "name""CP"  
  21.                     },  
  22.                     "primaryState""MA",  
  23.                     "otherState": [  
  24.                         "MA",  
  25.                         "BA",  
  26.                         "DL",  
  27.                         "RJ"  
  28.                     ],  
  29.                     "UserLicense": [  
  30.                         {  
  31.                             "licenseState""MA",  
  32.                             "licenseNumber""000000000",  
  33.                             "licenseStatus"null,  
  34.                             "aaaaaaaaaaaaaaaaa""only one"  
  35.                         }  
  36.                     ],  
  37.                     "Setting""ADMINISTRATIVE",  
  38.                     "primaryEmail""*****@*****.com",  
  39.                     "modifiedAt"null,  
  40.                     "createdAt"null  
  41.                 }  
  42.             },  
  43.             {  
  44.                 "UserProfileDetail": {  
  45.                     "UserStatus": {  
  46.                         "name""User Two"  
  47.                     },  
  48.                     "UserStatusDate""2019-10-31T06:12:42+00:00",  
  49.                     "EnrollId""am**********************************",  
  50.                     "lastDate""2019-07-22T03:05:39.0245313-04:00"  
  51.                 },  
  52.                 "UserInformation": {  
  53.                     "Id": 443333,  
  54.                     "firstName""*****",  
  55.                     "middleName""Jhon",  
  56.                     "lastName""*****",  
  57.                     "otherNames"null,  
  58.                     "UserType": {  
  59.                         "name""AP"  
  60.                     },  
  61.                     "primaryState""AK",  
  62.                     "otherState": [  
  63.                         "MP",  
  64.                         "CLT"  
  65.                     ],  
  66.                     "UserLicense": [  
  67.                         {  
  68.                             "licenseState""KL",  
  69.                             "licenseNumber""000000220",  
  70.                             "licenseStatus""Valid"  
  71.                         }  
  72.                     ],  
  73.                     "Setting""ADMINISTRATIVE",  
  74.                     "primaryEmail""*****@*****.com",  
  75.                     "modifiedAt"null,  
  76.                     "createdAt"null  
  77.                 }  
  78.             }  
  79.         ]  
  80.     },  
  81.     "errors": [  
  82.         {  
  83.             "message""GraphQL.ExecutionError: 13614711 - NO__DATA",  
  84.             "extensions": {  
  85.                 "code""212"  
  86.             }  
  87.         },  
  88.         {  
  89.             "message""GraphQL.ExecutionError: 13614712 - NO__DATA",  
  90.             "extensions": {  
  91.                 "code""212"  
  92.             }  
  93.         },  
  94.         {  
  95.             "message""GraphQL.ExecutionError: Cannot return null for non-null type. Field: PrivilegeFlag, Type: Boolean!.\r\n   at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node)\r\n   at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",  
  96.             "locations": [  
  97.                 {  
  98.                     "line": 374,  
  99.                     "column": 9  
  100.                 }  
  101.             ],  
  102.             "path": [  
  103.                 "getUsers",  
  104.                 "3",  
  105.                 "UserAffiliation",  
  106.                 "0",  
  107.                 "Affiliation",  
  108.                 "admittingPrivilegeFlag"  
  109.             ],  
  110.             "extensions": {  
  111.                 "code""ID: 1454790"  
  112.             }  
  113.         },  
  114.         {  
  115.             "message""GraphQL.ExecutionError: Cannot return null for non-null type. Field: admittingArrangementFlag, Type: Boolean!.\r\n   at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node)\r\n   at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",  
  116.             "locations": [  
  117.                 {  
  118.                     "line": 403,  
  119.                     "column": 9  
  120.                 }  
  121.             ],  
  122.             "path": [  
  123.                 "getUsers",  
  124.                 "3",  
  125.                 "UserAffiliation",  
  126.                 "0",  
  127.                 "Affiliation",  
  128.                 "admittingArrangementFlag"  
  129.             ],  
  130.             "extensions": {  
  131.                 "code""ID: 13614790"  
  132.             }  
  133.         }  
  134.     ]  
  135. }  
As JSON message is hierarchical and structural format, you will have to flatten out in order to product CSV file.
 
List out the difficulties/Challenge of json,
  • Two different objects ‘Data’ and ‘Error’ and different types.
  • The ‘Data’ object always dynamic
  • 'otherState' have a different count in each data array 
  • The ‘error’ has different data.

Steps

  1. Create separate csv for ‘Data’ and ‘Error’.
  2. Set Configure > MaxScanRows based on ‘data’ count. Used to scan csv rows
  3. Set equal row count for each csv
  4. Merge csv and write it
The sample code is,
  1. // read json from file  
  2.             var jsonString = File.ReadAllText("CAQH_Sample.json");  
  3.             StringBuilder csvPractitioners = new StringBuilder();  
  4.   
  5.             // craete csv for getUsers node  
  6.             using (var practitioners = ChoJSONReader.LoadText(jsonString)  
  7.                     .WithJSONPath("$..getUsers[*]")  
  8.                     .Configure(c => c.JsonSerializerSettings = new JsonSerializerSettings  
  9.                     {  
  10.                         DateParseHandling = DateParseHandling.None,  
  11.                         DateTimeZoneHandling = DateTimeZoneHandling.Utc,  
  12.                         Formatting = Formatting.Indented  
  13.                     })  
  14.                     )  
  15.             {  
  16.                 // to get users count  
  17.                 var arrPractitioners = practitioners.ToArray();  
  18.                 int practitionersCount = arrPractitioners.Length;  
  19.   
  20.                 using (var w = new ChoCSVWriter(csvPractitioners)  
  21.                     .WithFirstLineHeader()  
  22.                     .Configure(c => c.MaxScanRows = practitionersCount)  
  23.                     .Configure(c => c.ThrowAndStopOnMissingField = false)  
  24.                     )  
  25.                 {  
  26.                     w.Write(arrPractitioners);  
  27.                 }  
  28.             }  
  29.   
  30.             // craete csv for Error node array  
  31.             StringBuilder csvErrors = new StringBuilder();  
  32.             using (var errors = ChoJSONReader.LoadText(jsonString)  
  33.                    .WithJSONPath("$..errors[*]")  
  34.                         .WithField("errors_message", jsonPath: "$.message", isArray: false)  
  35.                         .WithField("errors_extensions_code", jsonPath: "$.extensions.code", isArray: false)  
  36.                         .WithField("errors_locations", jsonPath: "$.locations[*]", isArray: false)  
  37.                         .WithField("errors_path", jsonPath: "$.path[*]"))  
  38.             {  
  39.                 var arrError = errors.ToArray();  
  40.                 int errorCount = arrError.Length;  
  41.   
  42.                 using (var w = new ChoCSVWriter(csvErrors)  
  43.                     .WithFirstLineHeader()  
  44.                     .Configure(c => c.MaxScanRows = errorCount)  
  45.                     .Configure(c => c.ThrowAndStopOnMissingField = false)  
  46.                     )  
  47.                 {  
  48.                     w.Write(arrError);  
  49.                 }  
  50.             }  
  51.   
  52.             char separator = ',';  
  53.            // Split csv by new line  
  54.             var objPractitioners = csvPractitioners.ToString().Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries).AsEnumerable();  
  55.            // remove newline of inside string  
  56.             var contents = Regex.Replace(csvErrors.ToString(), "\"[^\"]*(?:\"\"[^\"]*)*\"", m => m.Value.Replace("\n""").Replace("\r"""));  
  57.             var objErrors = contents.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries).AsEnumerable();  
  58.   
  59.             string defaultValue = string.Empty;  
  60.             int cnt = 0;  
  61.             if (objPractitioners.Count() < objErrors.Count())  
  62.             {  
  63.                 cnt = objPractitioners.FirstOrDefault()?.Split(separator).Length ?? 0;  
  64.             }  
  65.             else  
  66.             {  
  67.                 cnt = objErrors.FirstOrDefault()?.Split(separator).Length ?? 0;  
  68.             }  
  69.             if (cnt > 0)  
  70.                 defaultValue = defaultValue.PadLeft(cnt - 1, separator);  
  71.             // set both csv rows are equal, merge only work when rows are equal  
  72.             var fullString = objPractitioners.Merge(objErrors, defaultValue, (f, s) => string.Join(separator.ToString(), f, s)).ToArray();  
  73.             File.AppendAllLines("folderpath/OutputSample.csv", fullString);  
Here Merge is Extensions methods,
  1. public static IEnumerable<T> Merge<T>(this IEnumerable<T> first,  
  2.        IEnumerable<T> second, T defaultValue, Func<T, T, T> operation)  
  3.        {  
  4.            using (var iter1 = first.GetEnumerator())  
  5.            using (var iter2 = second.GetEnumerator())  
  6.            {  
  7.                while (iter1.MoveNext())  
  8.                {  
  9.                    if (iter2.MoveNext())  
  10.                    {  
  11.                        yield return operation(iter1.Current, iter2.Current);  
  12.                    }  
  13.                    else  
  14.                    {  
  15.                        yield return operation(iter1.Current, defaultValue);  
  16.                    }  
  17.                }  
  18.                while (iter2.MoveNext())  
  19.                {  
  20.                    yield return operation(defaultValue, iter2.Current);  
  21.                }  
  22.            }  
  23.        }  
The output is,
  1. UserInformation_Id,UserInformation_firstName,UserInformation_UserType_name,UserInformation_primaryState,UserInformation_otherState_0,UserInformation_otherState_1,UserInformation_otherState_2,UserInformation_otherState_3,UserInformation_otherState_4,UserInformation_createdAt,UserInformation_lastUpdatedDate,errors_message,errors_extensions_code,errors_locations_line,errors_locations_column,errors_path_0,errors_path_1,errors_path_2,errors_path_3,errors_path_4,errors_path_5  
  2. 1111122,*****1,CP,MA,MA,BA,,,,,2019-04-03T07:49:05.2827076-04:00,GraphQL.ExecutionError: 13614711 - NO__DATA,212,,,,,,,,  
  3. 222222,*****2,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,GraphQL.ExecutionError: 13614712 - NO__DATA,212,,,,,,,,  
  4. 33333,*****3,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,"GraphQL.ExecutionError: Cannot return null for non-null type. Field: PrivilegeFlag, Type: Boolean!.   at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node)   at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",ID: 1454790,374,9,getUsers,3,UserAffiliation,0,Affiliation,admittingPrivilegeFlag  
  5. 44444,*****4,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,"GraphQL.ExecutionError: Cannot return null for non-null type. Field: admittingArrangementFlag, Type: Boolean!.   at GraphQL.Execution.ExecutionStrategy.ValidateNodeResult(ExecutionContext context, ExecutionNode node)   at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node)",ID: 13614790,403,9,getUsers,3,UserAffiliation,0,Affiliation,admittingArrangementFlag  
  6. 555555,*****3,CPP,MPA,KL,TN,DL,AP,RJ,,2019-12-03T07:50:05.2827076-05:00,,,,,,,,,,  
Note
Try with latest pre release version ChoETL