Validating Input Xml Data Files

Abstract

Business applications frequently receive their input as XML data files. This is particularly true of project modules that deal with interoperability. These input files need to be validated, and the validation can occur at various stages of processing:

  • using a schema file
  • in the Database Code for validating against business rules.

This paper focuses on two methods of validating a given XML input data file using the classes provided by the .NET Framework.

Introduction

XML code is considered correct when it is both well-formed and valid:

  • Well-formed - The XML code must be syntactically correct or the XML parser will raise an error.

  • Valid - If the XML file has an associated XML Schema, the elements must appear in the defined structure and the content of the individual elements must conform to the declared data types specified in the schema.  

We concentrate here on two mechanisms for the latter category. Both inherently cover the former, since a file that is not well-formed cannot be parsed at all.
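To make the distinction concrete, the following is a minimal sketch of a well-formedness check only: it parses the text and consults no schema. The class and method names (WellFormedCheck, IsWellFormed) are hypothetical, and the sketch assumes .NET Framework 2.0 or later, where XmlReader.Create is available.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Returns true when the text parses as XML. No schema is consulted,
    // so this says nothing about whether the document is valid.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }   // the parser throws on a syntax error
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<book><title>A</title></book>")); // prints True
        Console.WriteLine(IsWellFormed("<book><title>A</book>"));         // prints False
    }
}
```

A document that passes this check may still be invalid against its schema, which is what the two mechanisms in this paper address.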

The Concept

XML files can be validated with various technologies, for example Visual Studio .NET. Many online tools are also available for validating an input .xml file.

Microsoft's Visual Studio .NET provides a number of classes for working with XML files, shipped as part of the base class libraries of the .NET Framework. XML functionality in .NET is primarily handled by the classes in the following namespaces:

  • System.Xml
  • System.Xml.Schema
  • System.Xml.XPath
  • System.Xml.Xsl

Note: All the Data Files mentioned in the Code Snippets are available in the accompanying Zip File.

Xml Schema Validation

An XML file is generally validated for conformance to a particular schema. The schema file is usually written in XML-Data Reduced (XDR) or XML Schema definition language (XSD). The input XML file can even be validated against a set of schema files. A schema file is a structural description of what a given XML data file should look like.

An input file can be validated against a schema using the following steps:

1. Create a class file named SchemaValidator.cs that includes the following namespaces and variable declarations:

    • System
    • System.Xml
    • System.Xml.Schema
    • System.Windows.Forms

      // variable declarations
      private string inputFilename;
      private string schemaFilename;
      private XmlSchemaCollection schemaCollection;
      private bool isValid = true; // cleared by the validation event handler

2. In the constructor, store the file names and load the schema file into the schema collection:

public SchemaValidator (string xmlFile, string schemaFile)
{
    this.inputFilename = xmlFile;
    this.schemaFilename = schemaFile;
    this.schemaCollection = new XmlSchemaCollection ();
    // add the schema file to the schema collection; passing null as the
    // namespace uses the targetNamespace defined in the schema itself
    this.schemaCollection.Add (null, schemaFilename);
}

3. Write an event handler that is invoked whenever, while the input XML file is being read, its content does not conform to the schema:

// invoked when the XML file does not conform to the schema
private void validationCallBack (object sender, ValidationEventArgs args)
{
    isValid = false;
    MessageBox.Show ("Validation error: " + args.Message);
}

4. Write a Validate function that reads the input XML file and validates it as it is read:

public void Validate ()
{
    XmlTextReader textReader = null;
    XmlValidatingReader vReader = null;
    isValid = true;
    try
    {
        textReader = new XmlTextReader (inputFilename);
        // create a validating reader on top of the text reader
        vReader = new XmlValidatingReader (textReader);
        // validate using the schemas stored in the schema collection
        vReader.Schemas.Add (schemaCollection);
        // set the validation event handler
        vReader.ValidationEventHandler += new ValidationEventHandler (validationCallBack);
        // read and validate the XML data
        while (vReader.Read ()) { }
        MessageBox.Show ("Validation finished. Validation " +
            (isValid ? "successful" : "failed"));
    }
    catch (Exception ex)
    {
        MessageBox.Show ("Exception : " + ex.Message);
    }
    finally
    {
        // close the readers, no matter what
        if (vReader != null) vReader.Close ();
        if (textReader != null) textReader.Close ();
    }
}

5. In the Main function, write the following code:

static void Main ()
{
    // passes, since the file conforms to the schema
    SchemaValidator validXmlFile =
        new SchemaValidator (@"D:\Xml\ValidBooks.xml", @"D:\Xml\books.xsd");
    validXmlFile.Validate ();
    // fails, since the file does not conform to the schema
    SchemaValidator invalidXmlFile =
        new SchemaValidator (@"D:\Xml\InvalidBooks.xml", @"D:\Xml\books.xsd");
    invalidXmlFile.Validate ();
}
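XmlValidatingReader and XmlSchemaCollection were marked obsolete from .NET Framework 2.0 onward. On newer framework versions the same steps can be sketched with XmlReaderSettings and XmlSchemaSet. This is a minimal sketch rather than the article's implementation: the class name SchemaValidatorSketch is hypothetical, the file paths are taken from the command line, and messages go to the console instead of a message box.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

public class SchemaValidatorSketch
{
    private readonly string inputFilename;
    private readonly XmlReaderSettings settings;
    private bool isValid;

    public SchemaValidatorSketch(string xmlFile, string schemaFile)
    {
        inputFilename = xmlFile;
        settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // null: use the targetNamespace declared inside the schema file
        settings.Schemas.Add(null, schemaFile);
        settings.ValidationEventHandler += OnValidationEvent;
    }

    // Invoked for each schema violation found while reading.
    private void OnValidationEvent(object sender, ValidationEventArgs args)
    {
        isValid = false;
        Console.WriteLine("Validation error: " + args.Message);
    }

    // Reads the whole file; returns true when no violation was reported.
    public bool Validate()
    {
        isValid = true;
        using (XmlReader reader = XmlReader.Create(inputFilename, settings))
        {
            while (reader.Read()) { }
        }
        return isValid;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: SchemaValidatorSketch <xml file> <xsd file>");
            return;
        }
        bool ok = new SchemaValidatorSketch(args[0], args[1]).Validate();
        Console.WriteLine("Validation " + (ok ? "successful" : "failed"));
    }
}
```

As in the original class, a file that is not well-formed raises an XmlException from Read rather than a validation event, so callers may want to catch it.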

Validation using XSL Technology

XSLT (Extensible Stylesheet Language Transformations) is used for transforming an input XML file from one form to another, such as .html or .xml. Here, we use XSLT to transform the input XML file into the form that we require for further processing. For instance, we could transform it into a form that can be passed directly as a parameter to a database stored procedure.

Note: Here, by transforming, we are actually filtering the records that satisfy a given condition.

In the sample, the source file books.xml contains 2 records. After the XSL transformation is applied, the result file contains only 1 record. Here, a book record is considered valid only if its price (one of the fields) is greater than 10.00.

This is achieved through the following code in the .xsl file :

<xsl:template match="book">
  <xsl:if test="price &gt; 10">
    <xsl:element name="title">
      <xsl:value-of select="title"/>
    </xsl:element>
    ....other sub - elements
  </xsl:if>
</xsl:template>

Thus, an input XML file can be validated not only against its schema (which is mostly validation of the XML tags), but also for the data that the tags contain.
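The filtering effect can be tried without any files on disk. The sketch below runs a stylesheet built around the same price test against two in-memory book records. The sample data, the wrapping books element, and the class name PriceFilterDemo are assumptions made for illustration, since the article's books.xml is not reproduced here. It uses XslCompiledTransform, which is available from .NET Framework 2.0 onward.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class PriceFilterDemo
{
    // Hypothetical sample data: one book below the price threshold, one above.
    private const string Xml =
        "<books>" +
        "<book><title>Cheap</title><price>5.00</price></book>" +
        "<book><title>Costly</title><price>25.00</price></book>" +
        "</books>";

    // Stylesheet that copies a book's title only when its price exceeds 10.
    private const string Xsl =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output omit-xml-declaration='yes'/>" +
        "<xsl:template match='/books'>" +
        "<books><xsl:apply-templates select='book'/></books>" +
        "</xsl:template>" +
        "<xsl:template match='book'>" +
        "<xsl:if test='price &gt; 10'>" +
        "<book><title><xsl:value-of select='title'/></title></book>" +
        "</xsl:if>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Applies the stylesheet to the sample data and returns the result text.
    public static string Run()
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(Xsl)))
        {
            transform.Load(xslReader);
        }
        StringWriter output = new StringWriter();
        using (XmlReader xmlReader = XmlReader.Create(new StringReader(Xml)))
        {
            transform.Transform(xmlReader, null, output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // The record priced at 5.00 is dropped; the one priced at 25.00 is kept.
        Console.WriteLine(Run());
    }
}
```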

The following steps implement validation with an .xsl file in .NET code:

1. Create a Class File named DataValidator.cs
 
2. Include the following namespaces and variables:

    • System
    • System.Xml
    • System.Xml.Schema
    • System.Xml.XPath
    • System.Xml.Xsl
    • System.Windows.Forms

      // variable declarations
      private string inputFilename;
      private string xslFilename;
      private string outputFilename;

3. In the constructor, store the names of the input XML file, the XSL file, and the output file:

public DataValidator (string xmlFile, string xslFile, string outputFilename)
{
    this.inputFilename = xmlFile;
    this.xslFilename = xslFile;
    this.outputFilename = outputFilename;
}

4. Write a ValidateAndTransform ( ) function that performs the actual transformation and writes the result to an output file:

public void ValidateAndTransform ()
{
    // a null encoding writes the output file as UTF-8
    XmlTextWriter textWriter = new XmlTextWriter (this.outputFilename, null);
    try
    {
        // load the source document
        XPathDocument xPathDocument = new XPathDocument (this.inputFilename);
        // load the stylesheet and apply it to the source document
        XslTransform xslTransform = new XslTransform ();
        xslTransform.Load (this.xslFilename);
        xslTransform.Transform (xPathDocument, null, textWriter);
    }
    catch (Exception ex)
    {
        MessageBox.Show ("Exception: " + ex.Message);
    }
    finally
    {
        textWriter.Close ();
    }
}

5. In the Main function, write the following code:

static void Main ()
{
    DataValidator dataValidator = new DataValidator (
        @"D:\Xml\books.xml",    // source XML file
        @"D:\Xml\books.xsl",    // transformation file
        @"D:\Xml\result.xml");  // result XML file
    dataValidator.ValidateAndTransform ();
}
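XslTransform was marked obsolete from .NET Framework 2.0 onward in favour of XslCompiledTransform. On newer framework versions the transformation step might be sketched as follows. The class name TransformSketch is hypothetical, and the three file paths are taken from the command line rather than hard-coded.

```csharp
using System;
using System.Xml.Xsl;

public static class TransformSketch
{
    // Applies the stylesheet to the input file and writes the result file.
    public static void Transform(string xmlFile, string xslFile, string outputFile)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(xslFile);
        transform.Transform(xmlFile, outputFile);
    }

    public static void Main(string[] args)
    {
        if (args.Length != 3)
        {
            Console.WriteLine("usage: TransformSketch <xml file> <xsl file> <output file>");
            return;
        }
        try
        {
            Transform(args[0], args[1], args[2]);
            Console.WriteLine("Transformation finished.");
        }
        catch (Exception ex)
        {
            Console.WriteLine("Exception: " + ex.Message);
        }
    }
}
```

The two-string overload of Transform opens and closes the files itself, so no explicit writer needs to be closed in a finally block.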

The data in result.xml can subsequently be read into a string variable and passed as a parameter to a database stored procedure, which can use SQLXML constructs such as OPENXML( ) and sp_xml_preparedocument to store the resulting records in database tables.

Conclusion

We have discussed only two of the several strategies for validating an input XML data file. Another possible method is to load the complete input file into a DOM object and validate each record programmatically. Every method stems from some business requirement and from constraints such as the technology available, the timeframe, and the volume of data. Taking all the relevant parameters into consideration, one has to implement the best option, which could be yet another method off the beaten track.
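As an illustration of the DOM approach mentioned above, the following is a minimal sketch that loads the input into an XmlDocument and checks each record in code. It reuses the price rule from the XSL example. The element names, the sample data, and the class name DomValidationSketch are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.Xml;

public static class DomValidationSketch
{
    // Counts the book records that fail the rule "price must be greater than 10".
    public static int CountInvalid(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);   // throws XmlException when the text is not well-formed

        int invalid = 0;
        foreach (XmlNode book in doc.SelectNodes("//book"))
        {
            XmlNode price = book.SelectSingleNode("price");
            decimal value;
            bool ok = price != null
                && decimal.TryParse(price.InnerText, NumberStyles.Number,
                                    CultureInfo.InvariantCulture, out value)
                && value > 10m;
            if (!ok) invalid++;
        }
        return invalid;
    }

    public static void Main()
    {
        // Hypothetical sample data: one record fails the rule, one passes.
        string xml =
            "<books>" +
            "<book><title>Cheap</title><price>5.00</price></book>" +
            "<book><title>Costly</title><price>25.00</price></book>" +
            "</books>";
        Console.WriteLine(CountInvalid(xml)); // prints 1
    }
}
```

Loading the whole file into memory is simple but may not suit very large inputs, which is one of the trade-offs the choice of method has to weigh.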

