XML Entities - Introduction


In this article, we are going to learn about XML entities. We will briefly look at what XML is and how it is used in development, then delve into entities in XML.

What is XML?

XML stands for Extensible Markup Language. It is a language designed for storing and transporting data. Like HTML, it uses a tree-like structure of tags and data, but unlike HTML, XML does not describe how the data should be displayed; it simply carries the data. XML also has no predefined tags, so tags can be given names that describe the data they hold. Because it is so extensible, XML is supported by most programming languages as a format for storing and exchanging data, although for many uses it has been losing ground to JavaScript Object Notation (JSON).
This article assumes that you already have some knowledge of XML, so we will jump straight to XML entities.
What is an XML Entity?
Entities in XML play a role similar to variables in programming languages. A variable is a named storage location: instead of writing a value out explicitly every time, you store it in the variable once and then use the variable's name wherever the value is needed.
Example in VB.NET:
Dim a As Integer = 5
In the VB.NET example above, the variable is ‘a’, its data type is ‘Integer’, and it has been given the value 5. Entities in XML work the same way: instead of writing the data out each time, we store it in an entity and refer to the entity.
Example in XML:
  <!ENTITY mymessage "Hello World">
  <email>
      <to>Jane Doe</to>
      <body>&mymessage;</body>
  </email>
Here the entity is named mymessage and the data it carries is "Hello World"; in the document we refer to it as &mymessage; (an ampersand, the entity name, and a semicolon). So instead of explicitly writing the message we intend to send to Jane Doe as "Hello World", we store it in the mymessage entity, and each time we want to send the same message we simply use &mymessage;.
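To see the substitution happen, here is a minimal sketch that parses the email example with Python's standard-library xml.etree.ElementTree. It assumes the entity declaration is wrapped in the document's DOCTYPE, which, as the next section explains, is where entity declarations belong. The function name read_body is just for illustration.

```python
import xml.etree.ElementTree as ET

# The email example, with the entity declared inside the DOCTYPE's internal subset.
EMAIL_XML = """<?xml version="1.0"?>
<!DOCTYPE email [
  <!ENTITY mymessage "Hello World">
]>
<email>
  <to>Jane Doe</to>
  <body>&mymessage;</body>
</email>"""


def read_body(xml_text: str) -> str:
    """Parse the document and return the text of <body> after entity expansion."""
    root = ET.fromstring(xml_text)
    return root.find("body").text


print(read_body(EMAIL_XML))  # prints: Hello World
```

The application code never sees &mymessage;: the parser replaces the reference with its declared text before the element tree is built.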

DOCTYPE Declaration

In our earlier VB.NET example, we assigned 5 to the variable a:
Dim a As Integer = 5
This is called a variable declaration, and the same idea applies to XML:
  <!ENTITY mymessage "Hello World">
The statement above is an entity declaration, but where do we find it in a normal XML document?
  <?xml version="1.0"?>
  <!DOCTYPE exampleDoC [
  ...declare entities here...
  ]>
  <exampleDoC>
  ...Body of XML Document...
  </exampleDoC>
The DOCTYPE declaration names the document's root element and states what kind of XML document it is. The part between its square brackets (the internal subset) is where we declare our entities.
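A short sketch, again assuming Python's xml.etree.ElementTree, shows why the declaration has to live in the DOCTYPE: the same &mymessage; reference parses when the entity is declared there, and is rejected as an undefined entity when it is not. The helper body_or_error is a hypothetical name for this illustration.

```python
import xml.etree.ElementTree as ET

# Entity declared in the DOCTYPE's internal subset.
WITH_DTD = """<!DOCTYPE email [ <!ENTITY mymessage "Hello World"> ]>
<email><body>&mymessage;</body></email>"""

# Same reference, but no declaration anywhere in the document.
WITHOUT_DTD = "<email><body>&mymessage;</body></email>"


def body_or_error(xml_text: str) -> str:
    """Return the expanded <body> text, or the parser's error message."""
    try:
        return ET.fromstring(xml_text).findtext("body")
    except ET.ParseError as err:
        return f"error: {err}"


print(body_or_error(WITH_DTD))     # prints: Hello World
print(body_or_error(WITHOUT_DTD))  # prints an "undefined entity" error message
```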
  <?xml version="1.0"?>
  <!DOCTYPE CUSTOMERS [
  <!ENTITY firstchoice "Corporate">
  <!ENTITY secondchoice "Individual">
  ]>
  <CUSTOMERS>
      <CUSTOMER>
          <NAME>Jane Doe</NAME>
          <ADDRESS>1 Thornicroft Avenue</ADDRESS>
          <CUSTOMERTYPE>&secondchoice;</CUSTOMERTYPE>
      </CUSTOMER>
      <CUSTOMER>
          <NAME>Google</NAME>
          <ADDRESS>1 Amery Way</ADDRESS>
          <CUSTOMERTYPE>&firstchoice;</CUSTOMERTYPE>
      </CUSTOMER>
  </CUSTOMERS>
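As a rough illustration of what a consuming program receives, the sketch below parses the CUSTOMERS document with Python's xml.etree.ElementTree and lists each customer's name and type. The function name customer_types is a hypothetical helper, not part of any API.

```python
import xml.etree.ElementTree as ET

CUSTOMERS_XML = """<?xml version="1.0"?>
<!DOCTYPE CUSTOMERS [
<!ENTITY firstchoice "Corporate">
<!ENTITY secondchoice "Individual">
]>
<CUSTOMERS>
    <CUSTOMER>
        <NAME>Jane Doe</NAME>
        <ADDRESS>1 Thornicroft Avenue</ADDRESS>
        <CUSTOMERTYPE>&secondchoice;</CUSTOMERTYPE>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>Google</NAME>
        <ADDRESS>1 Amery Way</ADDRESS>
        <CUSTOMERTYPE>&firstchoice;</CUSTOMERTYPE>
    </CUSTOMER>
</CUSTOMERS>"""


def customer_types(xml_text: str) -> list[tuple[str, str]]:
    """Return (NAME, CUSTOMERTYPE) pairs; entity references are already expanded."""
    root = ET.fromstring(xml_text)
    return [(c.findtext("NAME"), c.findtext("CUSTOMERTYPE")) for c in root.iter("CUSTOMER")]


for name, kind in customer_types(CUSTOMERS_XML):
    print(f"{name}: {kind}")
# Jane Doe: Individual
# Google: Corporate
```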
When to use Entities
Entities are useful in cases such as the following:
  • When you want to reuse a default piece of text repeatedly, e.g. a signature or a default greeting message.
  • When a value occurs many times in a document and changes often, e.g. a tax rate. Declaring it as an entity means you change the rate only in the entity declaration, and every occurrence, whether there are hundreds or thousands of them, picks up the new value.
  • When you need characters that are not on your keyboard, such as ©; an entity can hold a character reference like &#169; so that you can refer to the character by a readable name.
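The tax-rate case can be sketched in a few lines of Python. The INVOICE document, its element names, and the taxrate entity are hypothetical examples made up for this illustration; the point is that only the entity declaration mentions the actual rate, so changing it there changes every parsed <TAX> value.

```python
import xml.etree.ElementTree as ET


def build_invoice(rate: str) -> str:
    """Build a hypothetical invoice whose <TAX> elements all refer to &taxrate;.

    rate must not contain %, & or a double quote, because those characters
    cannot appear literally inside an entity value.
    """
    return f"""<?xml version="1.0"?>
<!DOCTYPE INVOICE [
  <!ENTITY taxrate "{rate}">
]>
<INVOICE>
  <ITEM><NAME>Desk</NAME><TAX>&taxrate;</TAX></ITEM>
  <ITEM><NAME>Chair</NAME><TAX>&taxrate;</TAX></ITEM>
  <ITEM><NAME>Lamp</NAME><TAX>&taxrate;</TAX></ITEM>
</INVOICE>"""


def tax_values(xml_text: str) -> list[str]:
    """Return the expanded text of every <TAX> element."""
    root = ET.fromstring(xml_text)
    return [tax.text for tax in root.iter("TAX")]


print(tax_values(build_invoice("0.20")))  # ['0.20', '0.20', '0.20']
print(tax_values(build_invoice("0.15")))  # ['0.15', '0.15', '0.15']
```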
The Three Types of Entities
  • General – used in the document content to avoid typing long pieces of text repeatedly.
  • Parameter – used only inside the DTD, to parameterize long pieces of declaration text. Parameter entities are marked with a % before the entity name, both when they are declared (<!ENTITY % name "...">) and when they are referenced (%name;). The % marker cannot be used with general entities.
  • Predefined – used to represent special characters such as &, <, and >.
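Here is a sketch of a parameter entity at work, assuming Python's lower-level xml.parsers.expat module. In the internal subset a parameter entity reference may only appear between declarations, so the hypothetical customertypes entity below holds two complete general-entity declarations and is expanded with %customertypes;. Expat does not expand parameter entities by default, so the sketch turns that on with SetParamEntityParsing.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE CUSTOMERS [
  <!ENTITY % customertypes "
    <!ENTITY firstchoice 'Corporate'>
    <!ENTITY secondchoice 'Individual'>
  ">
  %customertypes;
]>
<CUSTOMERS><CUSTOMERTYPE>&firstchoice;</CUSTOMERTYPE><CUSTOMERTYPE>&secondchoice;</CUSTOMERTYPE></CUSTOMERS>"""


def customer_types(xml_text: str) -> list[str]:
    """Collect the text of each <CUSTOMERTYPE> element using expat callbacks."""
    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to expand parameter entities such as %customertypes;.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    types: list[str] = []
    current: list[str] | None = None  # text chunks of the open <CUSTOMERTYPE>, if any

    def start(name, attrs):
        nonlocal current
        if name == "CUSTOMERTYPE":
            current = []

    def end(name):
        nonlocal current
        if name == "CUSTOMERTYPE":
            types.append("".join(current))
            current = None

    def chars(data):
        if current is not None:
            current.append(data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return types


print(customer_types(DOC))
```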
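Entities can also be classified by where their content is stored:
  • Internal entities – their content is given in the same document where they are declared, as in the examples above.
  • External entities – refer to content stored outside the document, such as a file or a URL, identified with the SYSTEM keyword: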
<!DOCTYPE CUSTOMER [ <!ENTITY ext SYSTEM "http://mywebsite.com" > ]>
<!DOCTYPE CUSTOMER [ <!ENTITY ext SYSTEM "file:///path/myfiles/file" > ]>
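Because resolving external entities lets a document pull in arbitrary files or URLs (the basis of XML External Entity, or XXE, attacks), many parsers do not fetch them by default. The sketch below, assuming Python's xml.parsers.expat, shows the mechanism under the application's control: expat calls ExternalEntityRefHandler when it meets &ext;, and the handler feeds the content to a child parser. EXTERNAL_CONTENT is a hypothetical in-memory stand-in for the file, so nothing is read from disk or the network.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE CUSTOMER [ <!ENTITY ext SYSTEM "file:///path/myfiles/file" > ]>
<CUSTOMER>&ext;</CUSTOMER>"""

# Hypothetical stand-in for the external file's contents.
EXTERNAL_CONTENT = {"file:///path/myfiles/file": "Jane Doe, 1 Thornicroft Avenue"}


def expand_external(xml_text: str, store: dict[str, str]) -> str:
    """Return the document's character data, resolving external entities from store."""
    parser = xml.parsers.expat.ParserCreate()
    text: list[str] = []
    parser.CharacterDataHandler = text.append

    def load_external(context, base, system_id, public_id):
        # Called for &ext;; the application decides where the content comes from.
        if system_id not in store:
            return 0  # refuse unknown locations; expat then reports an error
        child = parser.ExternalEntityParserCreate(context)
        child.CharacterDataHandler = text.append
        child.Parse(store[system_id], True)
        return 1  # tell expat the entity was handled

    parser.ExternalEntityRefHandler = load_external
    parser.Parse(xml_text, True)
    return "".join(text)


print(expand_external(DOC, EXTERNAL_CONTENT))
```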
Pre-Defined Entities
Some characters cannot appear literally in XML text because they have a reserved meaning to the XML parser. For example, & marks the start of an entity reference, and < marks the start of a tag.
  <STUDENT>
     <NAME>Jane Doe</NAME>
     <HOBBIES>Basketball & Cricket</HOBBIES>
  </STUDENT>
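The XML above is not well-formed, because the parser reads the raw & in HOBBIES as the start of an entity reference. Predefined entities solve this: XML parsers recognize them without any declaration.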
Examples of Pre-Defined Entities
  Entity     Character
  &lt;       <
  &gt;       >
  &amp;      &
  &apos;     '
  &quot;     "
Using &amp; instead of a raw & makes the earlier example well-formed:
  <STUDENT>
     <NAME>Jane Doe</NAME>
     <HOBBIES>Basketball &amp; Cricket</HOBBIES>
  </STUDENT>
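A quick check with Python's xml.etree.ElementTree confirms the difference: the raw & is rejected, the &amp; version parses, and the parsed text contains a plain &. The sketch also uses xml.sax.saxutils.escape, which replaces &, < and > with their predefined entities, as one way to prepare text before inserting it into XML by hand. is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

BROKEN = "<STUDENT><NAME>Jane Doe</NAME><HOBBIES>Basketball & Cricket</HOBBIES></STUDENT>"
FIXED = "<STUDENT><NAME>Jane Doe</NAME><HOBBIES>Basketball &amp; Cricket</HOBBIES></STUDENT>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(BROKEN))                    # False
print(is_well_formed(FIXED))                     # True
print(ET.fromstring(FIXED).findtext("HOBBIES"))  # Basketball & Cricket
print(escape("Basketball & Cricket"))            # Basketball &amp; Cricket
```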


XML entities are as useful in documents as variables are in any programming language.