XML and How it Will Change the Web
by Doug Tidwell
Wednesday, 3rd August 2005
Why do we need XML?
When people first hear about XML, they often ask why we need another markup language. Everybody's browser supports HTML today, so why create more tags? Given that lots of HTML tags haven't been implemented the same way by the big browser vendors, why let anybody and everybody create their own tags?
The answer to these questions is that HTML and XML serve different functions: HTML tags describe how to render things on the screen, while XML tags describe what things are. Put another way, HTML tags are designed for the interaction between humans and computers; XML tags are designed for the interaction between two computers.
To see this difference, look at the HTML and XML versions of a short document. Listing 1 shows the HTML version.
Listing 1. The HTML version of an address
1401 Main Street
Anytown, NC 34829
When this document is rendered in a browser, it looks something like this:
Mrs. Mary McGoon
1401 Main Street
Anytown, NC 34829
Anyone familiar with postal addresses in the United States will recognize this document as someone's address. Even if you're from another country where postal codes and other conventions are different, you can still surmise that this is someone's address. Imagine writing code to interpret this document, however. To extract the zip code from this address, our algorithm might look like this: Given a <p> tag that contains two <br> tags, take the text of the second <br> tag. In that text, everything up to the comma is the name of the city, the two-character token following the comma is the name of the state, and the final token is the zip code.
While this algorithm would work for our sample HTML document, it's easy to think of a perfectly valid address that breaks our algorithm. We've also completely sidestepped the issue of distinguishing a <p> tag that contains an address from any other <p> tag. While the address formats beautifully in a browser, our HTML markup isn't nearly as well suited for use by another program.
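To make the fragility concrete, here is a minimal Python sketch of the rule described above. It assumes the address is marked up as a single paragraph whose three lines are separated by two line-break tags; the exact markup, the constant names, and the function name `zip_from_html` are illustrative assumptions, not part of the original listing.

```python
import re

# Hypothetical markup for the sample address (an assumption for illustration).
HTML_ADDRESS = (
    "<p><b>Mrs. Mary McGoon</b><br>"
    "1401 Main Street<br>"
    "Anytown, NC 34829</p>"
)

# A perfectly valid address with an extra line, which the rule cannot handle.
HTML_WITH_APARTMENT = (
    "<p><b>Mrs. Mary McGoon</b><br>"
    "1401 Main Street<br>"
    "Apt. 4<br>"
    "Anytown, NC 34829</p>"
)


def zip_from_html(html):
    """Apply the layout-based rule; return the zip code or None."""
    paragraph = re.search(r"<p>(.*?)</p>", html, re.DOTALL | re.IGNORECASE)
    if paragraph is None:
        return None
    # Split the paragraph body on line-break tags.
    parts = re.split(r"<br\s*/?>", paragraph.group(1), flags=re.IGNORECASE)
    if len(parts) != 3:  # the rule expects exactly two line breaks
        return None
    # Text up to the comma is the city; the rest should be "ST 12345".
    _city, _comma, rest = parts[2].partition(",")
    tokens = rest.split()
    if len(tokens) != 2 or len(tokens[0]) != 2:
        return None
    return tokens[1]


print(zip_from_html(HTML_ADDRESS))
print(zip_from_html(HTML_WITH_APARTMENT))
```

The first call finds the zip code, but the second returns `None` because one additional line shifts everything the rule depends on. Nothing in the markup says which text is the zip code, so the code can only guess from position and punctuation.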
Now let's take a look at an XML version of the same document in Listing 2.
Listing 2. The XML version of the same address
McGoon
1401 Main Street
Anytown
NC
34829
As with our HTML document, anyone familiar with U.S. postal addresses will recognize this document as an address. More importantly, a computer can recognize the parts of this address as well. Here's a much more robust algorithm for finding the zip code in our XML document:
The zip code is the text of the zip code tag.
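As a rough illustration, the lookup might be written in Python with the standard library's `xml.etree.ElementTree` module. The element names used here (`address`, `name`, `title`, `first-name`, `last-name`, `street`, `city`, `state`, `zip`) are assumptions made for this sketch; substitute whatever names the actual document uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for the sample address; the tag names are assumptions.
XML_ADDRESS = """
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city>Anytown</city>
  <state>NC</state>
  <zip>34829</zip>
</address>
"""


def zip_from_xml(xml_text):
    """Return the text of the first <zip> element, or None if there is none."""
    root = ET.fromstring(xml_text.strip())
    return root.findtext(".//zip")


print(zip_from_xml(XML_ADDRESS))
```

Because the lookup goes by element name rather than by position, adding another element, such as an apartment line, should not affect the result.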
This algorithm is much simpler to code, and it would be difficult, if not impossible, to write a valid address that breaks this algorithm. A computer can understand all of the parts of the address and how they relate to each other, and the computer can decide the best way to render that data. For example, the XML document might be rendered like this:
Mrs. Mary McGoon
1401 Main Street
Anytown, NC 34829
In rendering the XML tags in this style, you could convert them into HTML markup that's virtually identical to the earlier HTML document. If you want to print a mailing label for this address, you might render the document like this:
In this case, you print Mrs. McGoon's zip code as a bar code for the benefit of the scanners at the post office. The most important concept here is that content and presentation are separate. The data and its structure are tagged in a presentation-independent way, and the decision of how to render it is delayed as long as possible.
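The separation of content and presentation can be sketched in Python as one set of data rendered two ways. This is only an illustration under stated assumptions: the element names, the helper functions `address_fields`, `render_html`, and `render_label`, and the uppercase label layout are all hypothetical, and the bar code itself is not generated here.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML for the sample address; the tag names are assumptions.
XML_ADDRESS = """
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city>Anytown</city>
  <state>NC</state>
  <zip>34829</zip>
</address>
"""

FIELD_NAMES = ["title", "first-name", "last-name", "street", "city", "state", "zip"]


def address_fields(xml_text):
    """Read each field by element name; missing fields become empty strings."""
    root = ET.fromstring(xml_text.strip())
    return {name: (root.findtext(".//" + name) or "") for name in FIELD_NAMES}


def render_html(fields):
    """Render the address as HTML for display in a browser."""
    full_name = " ".join([fields["title"], fields["first-name"], fields["last-name"]])
    city_line = "{}, {} {}".format(fields["city"], fields["state"], fields["zip"])
    return "<p><b>{}</b><br>{}<br>{}</p>".format(
        escape(full_name), escape(fields["street"]), escape(city_line)
    )


def render_label(fields):
    """Render the same data as plain uppercase text for a mailing label."""
    full_name = " ".join([fields["title"], fields["first-name"], fields["last-name"]])
    city_line = "{} {} {}".format(fields["city"], fields["state"], fields["zip"])
    return "\n".join(line.upper() for line in [full_name, fields["street"], city_line])


fields = address_fields(XML_ADDRESS)
print(render_html(fields))
print(render_label(fields))
```

Both renderers read from the same `fields` dictionary, so a new output format means writing one more function while the data stays untouched.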
Doug Tidwell is a Senior Programmer and Cyber Evangelist at IBM. He has more than a seventh of a century of programming experience and has been working with XML-like applications for several years. His work as a Cyber Evangelist is basically to look busy, and to help customers evaluate and implement XML technology. Using a specially designed pair of zircon-encrusted tweezers, he holds a Masters Degree in Computer Science from Vanderbilt University and a Bachelors Degree in English from the University of Georgia. He can be reached at dtidwell@us.ibm.com.
