XML in SQL Server Part 1

Introduction

XML was developed by the World Wide Web Consortium (W3C). All companies using XML and related technologies try to follow the XML standard that is in place.

The main purpose of XML is to describe data. The convenience of XML lies, first, in that data are described in a regular text format and, second, in that the structure of an XML document is described in the document itself. In other words, the language makes it possible to make data independent of applications. Simple ASCII is ideally suitable to encode XML documents for sending them over networks, including over the Internet. It is possible that in the future XML will become a universal data transmittal language. XML is a close relative of the hypertext markup language (HTML), developed earlier by W3C and intended for presenting data on an output device, mainly a video display. Like HTML, XML uses tags. But whereas standard predefined tags are used to describe an HTML document, an XML document is described by tags defined in the document itself. A program to parse an XML document must understand the general rules for creating and defining these tags.

A tag is an instruction to form a structure in a document or to define the relationship among structure elements. Tags are specified using the < and > characters, for example, <tag>. Tags can be paired, with the opening tag accompanied by a tag with the same name but preceded with a slash. For example: <tag> </tag>.

To introduce you to the XML world, I propose that you consider a table structure and several ways to represent it XML format. Let the table consist of three columns, Number, Name, and Phone and have only three rows. Here is the first way of representing this table using XML:

<?xml version="1.0" encoding="utf-8" ?>
<Customers>
  <Number='1000' Name='Stephen' Phone='111111'/>
  <Number='1001' Name='Vince' Phone='222222'/>
  <Number='1002' Name='Janet' Phone='333333'/>
</Customers>

Document starts with the <?xml tag, which contains information about the XML version and encoding. In addition, an XML document must have one unique root element, with the remaining, daughter elements nested in it. Daughter elements can contain other nested elements, and so on.

In the example, the root element is specified by the opening <Customers> and closing </Customers> tags.

In general, XML documents must satisfy the following requirements:

  • The document header must contain an XML declaration, which specifies the document's markup language, its version, and auxiliary information.
  • Each opening tag that defines a data area in the document must have a matching closing tag; that is, unlike in HTML, closing tags cannot be omitted.
  • XML is case-sensitive.
  • All attribute values used in defining tags must be enclosed in quotation marks.
  • Tag nesting in XML is strictly controlled; therefore, you must ensure that the opening and closing tags are in the right order.
  • All information between the opening and the closing tags is considered data; therefore, all formatting characters are processed.

Keeping all this information in mind, take another look at the sample XML document. There is one opening and one closing tag in it: <Customers> and </Customers>, respectively. These tags specify the root element. I emphasize that an XML document can have nested elements but cannot have any intersecting elements. The root element contains three tags, called empty tags. They can be viewed as being the opening and closing tags at the same time and must end with the /> character sequence. These tags can contain attributes. In this case, there are three attributes: Number, Name, and Phone. Attributes have values, which must be enclosed in quotation marks. An important point is that XML is case-sensitive. If you make an error and enter number instead of Number, you enter a new attribute.

In the below example, table columns were specified using attributes and their values. There is another way to do this, which is as follows:

<?xml version="1.0" encoding="utf-8" ?>

<Customers>

  <Customer>

    <Number>1000</Number>

    <Name>Stephen</Name>

    <Phone>111111</Phone>

  </Customer>

  <Customer>

    <Number>1001</Number>

    <Name>Vince</Name>

    <Phone>222222</Phone>

  </Customer>

  <Customer>

    <Number>1002</Number>

    <Name>Janet</Name>

    <Phone>333333</Phone>

  </Customer>
</Customers> 

The principle that the preceding XML code is built on is that each column has its own opening and closing tags and the column values are specified by the data enclosed by the tags.

Finally, a mixed way to specify columns is possible:

<?xml version="1.0" encoding="utf-8" ?>

<Customers>

  <Number='1000'>

    <Name>Stephen</Name>

    <Phone>111111</Phone>

  </Number>

  <Number='1001'>

      <Name>Vince</Name>

      <Phone>222222</Phone>

  </Number>

  <Number='1002'>

        <Name>Janet</Name>

        <Phone>333333</Phone>

  </Number>

</Customers>

Starting with version 2000, SQL Server understands XML. In particular, the for section of the select statement and the openxml function were introduced in this version. But Server 2000 did not have an XML data type, so XML documents had to be stored in text format.

This had several shortcomings, such as being unable to effectively search through a document. In Server 2005, the XML data type was introduced.

As a result, it became possible to create XML indexes.
 
Let's continue our journey in the next part of the articles.