Easily Find Tags and Values in a Large XML Document Using XmlTextReader in C#

Use XmlTextReader to parse large XML documents.

 

public void findAParticularNodesUsingTextReader()
{
    // Stream the file node by node instead of loading it all into memory.
    using (XmlTextReader txtreaderObj = new XmlTextReader(
        @"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml"))
    {
        txtreaderObj.WhitespaceHandling = WhitespaceHandling.None;
        while (txtreaderObj.Read())
        {
            if (txtreaderObj.Name.Equals("TotalPrice") &&
                txtreaderObj.IsStartElement())
            {
                // Advance from the start tag to its text node.
                txtreaderObj.Read();
                // Separate the values with a space, as in the result below.
                richTextBox1.AppendText(txtreaderObj.Value + " ");
            }
        }
    }
}

 

Result

 

12.36 11.99 7.97

 

For faster, read-only, query-based access to the data, use XPathDocument and XPathNavigator together with an XPath query.

 

public void FindTagsUsingXPthNaviatorAndXPathDocumentNew()
{
    // XPathDocument is a read-only, in-memory store intended for XPath queries.
    XPathDocument xpDoc = new XPathDocument(
        @"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml");
    XPathNavigator xpNav = xpDoc.CreateNavigator();
    // Compile the query once so the expression can be reused.
    XPathExpression xpExpression = xpNav.Compile(@"/Orders/Order/TotalPrice");
    XPathNodeIterator xpIter = xpNav.Select(xpExpression);
    while (xpIter.MoveNext())
    {
        // Separate the values with a space, as in the result below.
        richTextBox1.AppendText(xpIter.Current.Value + " ");
    }
}

 

Result

 

12.36 11.99 7.97

 

You can also combine XmlReader and XmlDocument. On the XmlReader, use the MoveToContent and Skip methods to skip unwanted items.

 

public void UserXmlReaderAndXmlDocument()
{
    using (XmlReader RdrObj = XmlReader.Create(
        @"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml"))
    {
        while (RdrObj.Read())
        {
            if (RdrObj.NodeType.Equals(XmlNodeType.Element) &&
                RdrObj.Name.Equals("TotalPrice") &&
                RdrObj.IsStartElement())
            {
                // Advance from the start tag to its text node.
                RdrObj.Read();
                // Separate the values with a space, as in the result below.
                richTextBox1.AppendText(RdrObj.Value + " ");
            }
        }
    }
}

 

Result

 

12.36 11.99 7.97
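
The MoveToContent and Skip calls mentioned above can be sketched as follows. This is only an illustration under assumptions: the sample XML is invented (the Items element in particular is hypothetical), and the class and method names are not from the original code. Skip() jumps over a whole unwanted subtree, so its children are never visited.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkipSketch
{
    // Hypothetical sample shaped like /Orders/Order/TotalPrice; Items is invented.
    public const string SampleXml =
        "<Orders>" +
        "<Order><Items><Item>a</Item><Item>b</Item></Items><TotalPrice>12.36</TotalPrice></Order>" +
        "<Order><Items><Item>c</Item></Items><TotalPrice>11.99</TotalPrice></Order>" +
        "</Orders>";

    public static List<string> ReadTotalPrices(string xml)
    {
        var prices = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Move past any prolog to the root element.
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Items")
                {
                    // Skip the entire Items subtree; the reader lands on the next node.
                    reader.Skip();
                }
                else if (reader.NodeType == XmlNodeType.Element && reader.Name == "TotalPrice")
                {
                    // Reads the text and moves past the end tag.
                    prices.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return prices;
    }
}
```

Because Skip() and ReadElementContentAsString() both advance the reader, the loop calls Read() only in the remaining case. Otherwise a node would be stepped over by accident.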

 

public void UserXmlReaderAndXmlDocumentNew()
{
    XmlDocument XmlDocObj = new XmlDocument();
    using (XmlReader RdrObj = XmlReader.Create(
        @"C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml"))
    {
        while (RdrObj.Read())
        {
            if (RdrObj.NodeType.Equals(XmlNodeType.Element) &&
                RdrObj.Name.Equals("TotalPrice"))
            {
                // Load only the matching element into the DOM. Calling
                // XmlDocObj.Load(RdrObj) after the loop would find the reader
                // already at end of file, with nothing left to load.
                using (XmlReader subtreeObj = RdrObj.ReadSubtree())
                {
                    XmlDocObj.Load(subtreeObj);
                }
                richTextBox1.AppendText(XmlDocObj.InnerText + " ");
            }
        }
    }
}

 

Design Considerations

 

  • Avoid XML as much as possible.
  • Avoid processing large documents.
  • Avoid validation. XmlValidatingReader is 2-3x slower than XmlTextReader.
  • Avoid DTD, especially IDs and entity references.
  • Use streaming interfaces such as XmlReader or SAXdotnet.
  • Consider hard-coded processing, including validation.
  • Shorten node name length.
  • Consider sharing a NameTable, but only when the same names are likely to recur. As more and more unrelated names accumulate in the table, lookups become slower and slower.
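
As a rough sketch of the NameTable tip, one table can be handed to several readers through XmlReaderSettings, which then allows element names to be compared by reference instead of character by character. The class name, method name, and input documents here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SharedNameTableSketch
{
    public static int CountElements(IEnumerable<string> documents, string elementName)
    {
        var names = new NameTable();
        // Atomize the target name once; the readers below return the same string instance.
        object target = names.Add(elementName);
        var settings = new XmlReaderSettings { NameTable = names };

        int count = 0;
        foreach (string xml in documents)
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read())
                {
                    // Reference comparison works because both strings come from the shared table.
                    if (reader.NodeType == XmlNodeType.Element &&
                        object.ReferenceEquals(reader.LocalName, target))
                    {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

This only pays off when the documents share a vocabulary. Every distinct name any reader encounters stays in the table.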

Parsing XML

 

  • Use XmlTextReader and avoid validating readers.
  • When only a single node is required, consider using XmlDocument.ReadNode() instead of loading the entire document with Load().
  • Set the XmlResolver property to null on readers that expose it, to avoid access to external resources.
  • Make full use of MoveToContent() and Skip(). They avoid extraneous name creation. However, the benefit all but disappears when you use XmlValidatingReader.
  • Avoid accessing Value for Text/CDATA nodes as much as possible.
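
The ReadNode() and XmlResolver tips might look like the sketch below. The helper name ReadFirst and the sample input are assumptions for illustration. The reader stops at the first matching element and builds a DOM node for that element only.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadNodeSketch
{
    // Returns the first element named elementName as a DOM node, or null if none is found.
    public static XmlNode ReadFirst(string xml, string elementName)
    {
        var doc = new XmlDocument();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Do not resolve external resources such as DTDs.
            reader.XmlResolver = null;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // Builds a node for this element only, not for the whole document.
                    return doc.ReadNode(reader);
                }
            }
        }
        return null;
    }
}
```

The returned node is created by the XmlDocument but is not attached to it, so the rest of the input never becomes part of a DOM tree.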

Validating XML

 

  • Avoid extraneous validation.
  • Consider caching schemas.
  • Avoid identity constraint usage. Not only because it stores key/fields for the entire document, but also because the keys are boxed.
  • Avoid extraneous strong typing. It results in calls to XmlSchemaDatatype.ParseValue(). On the other hand, typed access could let you avoid reading the Value string.
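
One way to cache schemas, sketched under assumptions: the schema is compiled once into a static XmlSchemaSet and shared by every validating reader. This uses XmlReaderSettings rather than the XmlValidatingReader named above, and the tiny schema, class name, and method names are all hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCacheSketch
{
    // Hypothetical schema: a single TotalPrice element holding a decimal.
    private const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='TotalPrice' type='xs:decimal'/>" +
        "</xs:schema>";

    // Parsed and compiled once, then reused for every document.
    private static readonly XmlSchemaSet CachedSchemas = BuildSchemas();

    private static XmlSchemaSet BuildSchemas()
    {
        var set = new XmlSchemaSet();
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            set.Add(null, schemaReader);
        }
        set.Compile();
        return set;
    }

    public static bool IsValid(string xml)
    {
        bool valid = true;
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = CachedSchemas
        };
        // Any reported validation event marks the document as invalid.
        settings.ValidationEventHandler += (sender, e) => valid = false;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return valid;
    }
}
```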

Writing XML

 

  • Write output directly wherever possible.
  • To save documents, XmlTextWriter without indentation is a better choice than saving to a TextWriter, Stream, or file (all of which indent), unless the output is meant for human reading.
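
A minimal sketch of writing output directly with XmlTextWriter and no indentation. The element names mirror the /Orders/Order/TotalPrice layout used earlier; the class and method names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DirectWriteSketch
{
    public static string WritePrices(IEnumerable<string> prices)
    {
        var output = new StringWriter();
        using (var writer = new XmlTextWriter(output))
        {
            // Formatting.None is the default; it is set here to make the choice explicit.
            writer.Formatting = Formatting.None;
            writer.WriteStartElement("Orders");
            foreach (string price in prices)
            {
                writer.WriteStartElement("Order");
                writer.WriteElementString("TotalPrice", price);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

Each element goes to the underlying writer as it is produced, so no intermediate DOM tree is built.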

DOM Processing

 

  • Avoid InnerXml. It internally creates XmlTextReader/XmlTextWriter. InnerText is fine.
  • Avoid PreviousSibling. XmlDocument is very inefficient for a backward traverse.
  • Append nodes as soon as possible. Adding a big subtree results in a longer extraneous run to check ID attributes.
  • Prefer FirstChild/NextSibling and avoid accessing ChildNodes. Accessing it creates an XmlNodeList, which is otherwise never instantiated.
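
The FirstChild/NextSibling tip can be sketched as a simple forward loop. The helper name ChildElementNames is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SiblingWalkSketch
{
    // Collects the names of the direct child elements without touching ChildNodes.
    public static List<string> ChildElementNames(XmlNode parent)
    {
        var names = new List<string>();
        for (XmlNode child = parent.FirstChild; child != null; child = child.NextSibling)
        {
            if (child.NodeType == XmlNodeType.Element)
            {
                names.Add(child.Name);
            }
        }
        return names;
    }
}
```

The loop only ever moves forward, which also keeps it clear of PreviousSibling.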

XPath Processing

 

  • Consider using XPathDocument, but only when you need the entire document. With XmlDocument you can use ReadNode(), but XPathDocument has no equivalent.
  • Avoid preceding-sibling and preceding axes queries, especially over XmlDocument. They would result in sorting, and for XmlDocument they need access to PreviousSibling.
  • Avoid // (descendant). The returned nodes are most likely to be irrelevant.
  • Avoid position(), last() and positional predicates (especially things like foo[last()-1]).
  • Compile the XPath string to an XPathExpression and reuse it for frequent queries.
  • Don't run XPath queries more often than necessary. Each query is costly since it always must Clone() XPathNavigators.
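
To illustrate compiling once and reusing, the sketch below keeps a single XPathExpression in a static field and runs it against any number of documents. The class and method names are assumptions. A shared expression like this is meant for single-threaded use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class CompiledQuerySketch
{
    // Compiled once and reused for every document.
    private static readonly XPathExpression TotalPriceQuery =
        XPathExpression.Compile("/Orders/Order/TotalPrice");

    public static List<string> SelectTotalPrices(string xml)
    {
        XPathNavigator navigator = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var results = new List<string>();
        XPathNodeIterator iterator = navigator.Select(TotalPriceQuery);
        while (iterator.MoveNext())
        {
            results.Add(iterator.Current.Value);
        }
        return results;
    }
}
```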

XSLT Processing

 

  • Reuse (cache) XslTransform objects.
  • Avoid key() in XSLT. It can return any kind of node, which prevents node-type based optimization.
  • Avoid document() especially with nonstatic argument.
  • Pull style (for example xsl:for-each) is usually better than template match.
  • Minimize output size. More importantly, minimize input.
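
A sketch of reusing a loaded transform. It assumes XslCompiledTransform, the newer replacement for the XslTransform class named above; the same caching idea applies to either. The stylesheet, class name, and method names are hypothetical, and the stylesheet uses the pull style (xsl:for-each) recommended above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class CachedTransformSketch
{
    // Hypothetical stylesheet: emits each TotalPrice followed by a semicolon.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:for-each select='Orders/Order'>" +
        "<xsl:value-of select='TotalPrice'/><xsl:text>;</xsl:text>" +
        "</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Loaded (compiled) once, then reused for every input document.
    private static readonly XslCompiledTransform Cached = LoadStylesheet();

    private static XslCompiledTransform LoadStylesheet()
    {
        var transform = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(reader);
        }
        return transform;
    }

    public static string Transform(string xml)
    {
        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            Cached.Transform(input, null, output);
        }
        return output.ToString();
    }
}
```

Loading the stylesheet is the expensive step, so keeping the loaded object around means repeated transforms do not pay that cost again.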