Clean JSON String To Resolve HTML Content And Double Quotes Issue

Hello Techies. I hope you are doing great in your field.

Let's look at a solution to a well-known problem with JSON strings returned by an API: what do you do when the response contains HTML content and unescaped double quotes inside its string values?

I faced exactly this issue. I tried several HTML decode functions, but none of them worked, so I wrote custom functions and got a clean response that parses successfully.

Let's look at the JSON string I was getting and what was wrong with it. The following is the JSON string response I was receiving.

{
    "Link": "<a href="https://www.c-sharpcorner.com/members/jay-pankhaniya">Jay Pankhania</a>",
    "name": "<p>Jay Pankhania</p>",
    "Description":"Hello Folks, My Name is "Jay Pankhaniya". Currently, Working as the JR. Software
                   Developer at Binary Republik- Ahmedabad. I have Completed B.E. In Computer
                   Engineering from Marwadi University",
    "host": "c-sharp Corner",
}

Here, the JSON contains HTML markup inside its values and unescaped double quotes inside the Description value. To resolve this, I first tried HttpUtility.HtmlDecode("<any string>"), but it did not work, because decoding only converts HTML entities back to characters and does not remove tags. So, let's look at the following function, which worked for me as the first step towards getting the JSON object.

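private static string HtmlToPlainText(string jsonString) { // needs using System; and using System.Text.RegularExpressions;
    const string tagWhiteSpace = @"(>|$)(\W|\n|\r)+<"; // non-word characters (white space, line breaks) between tags
    const string stripFormatting = @"<[^>]*(>|$)"; // any tag, even if the closing '>' is missing
    const string lineBreak = @"<(br|BR)\s{0,1}\/{0,1}>"; // <br>, <br/>, <BR>, <BR />
    var lineBreakRegex = new Regex(lineBreak, RegexOptions.Multiline);
    var stripFormattingRegex = new Regex(stripFormatting, RegexOptions.Multiline);
    var tagWhiteSpaceRegex = new Regex(tagWhiteSpace, RegexOptions.Multiline);
    var text = jsonString;
    text = System.Net.WebUtility.HtmlDecode(text); // turn entities such as &lt; into characters
    text = tagWhiteSpaceRegex.Replace(text, "><"); // close the gaps between adjacent tags
    text = lineBreakRegex.Replace(text, Environment.NewLine); // <br> becomes a real line break
    text = stripFormattingRegex.Replace(text, string.Empty); // drop every remaining tag
    return text;
}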

Let's briefly go through the above function step by step, without going too deep.

  • The tagWhiteSpace pattern matches the white space and line breaks between a closing angle bracket and the next opening one.
  • The stripFormatting pattern matches any tag in angle brackets, even if the closing bracket is missing.
  • The lineBreak pattern matches the line break tag (<br>).
  • The next three lines create a Regex object from each of the above patterns.
  • The rest of the function decodes the HTML entities, applies the three expressions in turn to remove the HTML content, and returns the clean JSON string.
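To see what each pattern does on its own, here is a minimal sketch that applies the same three expressions to small sample strings. The class name HtmlPatternDemo, the method names, and the sample inputs are made up for illustration, and it uses "\n" instead of Environment.NewLine so that the result is the same on every platform.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: it only isolates the three patterns from HtmlToPlainText.
public static class HtmlPatternDemo
{
    private const string TagWhiteSpace = @"(>|$)(\W|\n|\r)+<";
    private const string StripFormatting = @"<[^>]*(>|$)";
    private const string LineBreak = @"<(br|BR)\s{0,1}\/{0,1}>";

    // Collapses the gap between one tag and the next into "><".
    public static string CollapseTagGaps(string s) =>
        Regex.Replace(s, TagWhiteSpace, "><", RegexOptions.Multiline);

    // Turns <br>, <br/>, <BR> and <BR /> into a line feed.
    public static string BreaksToNewLines(string s) =>
        Regex.Replace(s, LineBreak, "\n", RegexOptions.Multiline);

    // Removes every tag, including one whose closing '>' is missing.
    public static string StripTags(string s) =>
        Regex.Replace(s, StripFormatting, string.Empty, RegexOptions.Multiline);
}
```

For example, HtmlPatternDemo.StripTags("<a href=\"x\">Jay</a> <b>bold") gives "Jay bold", and an unclosed tag at the end of the input, as in "Jay <span", is removed as well.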

The HtmlToPlainText function removes the HTML code and returns the cleaned JSON string. The response after executing it is given below.

{
    "Link": "Jay Pankhania",
    "name": "Jay Pankhania",
    "Description":"Hello Folks, My Name is "Jay Pankhaniya". Currently, Working as the JR. Software
                   Developer at Binary Republik- Ahmedabad. I have Completed B.E. In Computer
                   Engineering from Marwadi University",
    "host": "c-sharp Corner",
}

As you can see, the HTML is gone, but the Description value still contains unescaped double quotes. Now, let's look at the following model and function, which give us the final clean JSON object.

public class CSharpProfile {
    public string Link { get; set; }
    public string name { get; set; }
    public string Description { get; set; }
    public string host { get; set; }
}
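// Requires the Newtonsoft.Json package (using Newtonsoft.Json;).
private static CSharpProfile DeserializeJson(string jsonString) {
    var loop = true;
    do {
        try {
            JsonConvert.DeserializeObject<CSharpProfile>(jsonString);
            loop = false;
        } catch (JsonReaderException ex) {
            // LinePosition is counted within the line where the error occurred,
            // so this indexing assumes the JSON is on a single line.
            var position = ex.LinePosition;
            if (position < 2 || position > jsonString.Length) throw;
            // The stray quote is expected just before the reported position.
            var invalidChar = jsonString.Substring(position - 2, 2);
            if (!invalidChar.Contains("\"")) throw; // not a quote problem, so avoid looping forever
            invalidChar = invalidChar.Replace("\"", "'");
            // Put the repaired pair back in place of the original two characters.
            jsonString = $"{jsonString.Substring(0, position - 2)}{invalidChar}{jsonString.Substring(position)}";
        }
    } while (loop);
    return JsonConvert.DeserializeObject<CSharpProfile>(jsonString);
}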

In the above code, I have created a simple model and a function that takes the JSON string containing the unescaped double quotes. Let's go through it briefly, step by step.

  • First, I use a loop. We know that deserializing a string that is not a valid JSON object throws an exception.
  • Here, we know the problem is the stray double quotes, so I try to deserialize the string and, when the exception is thrown, read the position at which it occurred (ex.LinePosition).
  • Based on that position, I take the characters just before it and replace the double quotes with single quotes.
  • The loop repeats until deserialization succeeds.
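One thing to keep in mind with the steps above is that the reported position is counted within a line, so using it directly as an index into the string only works when the JSON is on a single line. For a multi-line response, the line and position can first be converted into an index into the whole string. The following is a minimal sketch of that conversion. The helper name ToAbsoluteIndex is hypothetical, and it assumes that the line number is 1-based, that the position is a 0-based offset within that line, and that lines end with "\n" (optionally preceded by "\r").

```csharp
using System;

// Hypothetical helper, not part of Json.NET.
public static class JsonErrorPosition
{
    // Converts a (line, position-in-line) pair into an index into the whole string.
    // Assumption: lineNumber is 1-based and linePosition is a 0-based offset in that line.
    public static int ToAbsoluteIndex(string text, int lineNumber, int linePosition)
    {
        var lineStart = 0;
        // Skip one line feed for every line that comes before the target line.
        for (var line = 1; line < lineNumber; line++)
        {
            var newline = text.IndexOf('\n', lineStart);
            if (newline < 0)
                throw new ArgumentOutOfRangeException(nameof(lineNumber));
            lineStart = newline + 1;
        }
        return lineStart + linePosition;
    }
}
```

Under those assumptions, the value returned for ex.LineNumber and ex.LinePosition could be used in place of the raw position when taking the substrings.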

The following is the final JSON object after the DeserializeJson function completes successfully.

{
	"Link": "Jay Pankhania",
	"name": "Jay Pankhania",
	"Description": "Hello Folks, My Name is 'Jay Pankhaniya'. Currently, Working as the JR. Software 
                    Developer at Binary Republik - Ahmedabad.I have Completed B.E.In Computer 
                    Engineering from Marwadi University ",
	"host": "csharp Corner"
}

We can also paste this JSON into JSONLint to check whether it is a valid JSON object.

I have created two separate functions so that they can be reused whenever required. I hope this will be helpful to you. Stay safe and stay energetic to learn new things.
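The same kind of check can also be done in code. This is only a sketch: it uses System.Text.Json (built into .NET Core 3.0 and later) instead of the Json.NET parser used above, because its default settings parse strictly, and the JsonCheck class and IsValidJson method are names made up for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for a quick validity check.
public static class JsonCheck
{
    // Returns true when the text parses as JSON under System.Text.Json's default (strict) options.
    public static bool IsValidJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text))
            {
                return true;
            }
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

With this sketch, a value that uses single quotes inside the string passes, while the same value with unescaped double quotes fails. Note that a strict parser also rejects a trailing comma after the last property and literal line breaks inside a string value.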

Thank you.