Extensible Markup Language (XML) was born at the height of the browser wars in the mid-1990s, as Microsoft, Netscape, and the W3C produced new and better versions of HTML. Jon Bosak, of Sun Microsystems, started the W3C SGML working group. Before long its name was changed to XML, and the lightweight cousin of SGML was born. Since that time XML has become a major server-side resource for web site presentation.
The Standard Generalized Markup Language (SGML) is a very complex and rigid markup language used mostly in the publishing industry. SGML predates HTML and has been around for a little over a decade. It is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML. To bring sanity to the chaos the browser wars created, HTML has also become a rigid and standardized markup language for web presentation. This leaves XML to fill the flexibility void that was formed. XML was designed from its inception to be flexible enough to describe any kind of markup scheme that the industry could devise.
This flexibility allowed for the development of many uses that were not envisioned when XML was formalized. XML use has diverged into the general categories of describing data structures, moving data around, transforming data from one format to another, and the current trend of Web Services. Web Services work as a language-neutral communication channel because XML's design is itself independent of any programming language.
When XML was first designed, many people thought that it would quickly be adopted as a replacement for HTML. At first, XML was used to describe specialized domains such as chemistry and music. These vocabularies never really caught on beyond the small groups who championed them. Instead, XML has become more of a server-side tool than a presentation tool. However, HTML 4 has been reformulated as XHTML. Even though this is a new standard, compatibility with existing HTML user agents is possible by following a small set of guidelines. This means that the original vision might yet be realized as more tools are developed that produce content in this format.
The most basic use for XML is to describe the structure of a piece of data. The following example of an employee record shows how this might be formatted.
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <name>Mr. XML Schema</name>
  <position>facilitator</position>
  <contact>
    <mail-stop>B8</mail-stop>
    <phone-ext>2388</phone-ext>
    <email></email>
  </contact>
</employee>
The first line declares which XML version is being used, which parsers rely on. The DOCTYPE line names the DTD, and the line after it is the opening <employee> tag. Everything between that point and the closing </employee> tag is considered to be an employee. This employee contains the child elements name and position, as well as a nested contact element that groups further elements. This self-documentation helps developers and parsers know what the data between the tags represents.
While this is a major breakthrough in the description of data, it doesn't define what is allowed as content. That description, and the validation based on it, is handled by the markup declarations of a document type definition (DTD), which can become complex. A DTD for the example above might look similar to this example.
<!ELEMENT employee (name, position, contact)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT position (#PCDATA)>
<!ELEMENT contact (mail-stop, phone-ext, email)>
<!ELEMENT mail-stop (#PCDATA)>
<!ELEMENT phone-ext (#PCDATA)>
<!ELEMENT email (#PCDATA)>
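To make the validation role of the DTD concrete, here is a minimal Java sketch that checks an employee record against the declarations above using the standard JAXP parser. The class name DtdValidationSketch and the method isValid are hypothetical, and the DTD is inlined as an internal subset (an assumption made so the sketch does not need an employee.dtd file on disk).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // Assumption: the DTD is embedded as an internal subset instead of being
    // loaded from employee.dtd, so the example is self-contained.
    static final String PROLOG =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE employee [\n"
        + "  <!ELEMENT employee (name, position, contact)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT position (#PCDATA)>\n"
        + "  <!ELEMENT contact (mail-stop, phone-ext, email)>\n"
        + "  <!ELEMENT mail-stop (#PCDATA)>\n"
        + "  <!ELEMENT phone-ext (#PCDATA)>\n"
        + "  <!ELEMENT email (#PCDATA)>\n"
        + "]>\n";

    // Returns true when the <employee> markup satisfies the DTD.
    static boolean isValid(String employeeXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask for a validating parser
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validation problems are reported to the handler; rethrow them
            // so that parse() fails on the first violation.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(PROLOG + employeeXml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or not valid against the DTD
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String contact = "<contact><mail-stop>B8</mail-stop>"
            + "<phone-ext>2388</phone-ext><email></email></contact>";
        String complete = "<employee><name>Mr. XML Schema</name>"
            + "<position>facilitator</position>" + contact + "</employee>";
        String missingPosition = "<employee><name>Mr. XML Schema</name>"
            + contact + "</employee>";
        System.out.println("complete record valid: " + isValid(complete));
        System.out.println("record without position valid: " + isValid(missingPosition));
    }
}
```

The second record leaves out the position element, so a validating parser should reject it even though it is well-formed XML.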
Getting this employee record into a web page for presentation requires a transformation from XML to HTML, which is done with the XML file, an XSL transformation file, and a parser (an XSLT processor) that applies one to the other. This is how XML was used right after it was invented. Over the years, transformations have grown in complexity so that PDF files, Scalable Vector Graphics, and other advanced outputs can be produced.
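The XML-to-HTML transformation described above can be sketched in Java with the standard javax.xml.transform API. The stylesheet, the class name XsltSketch, and the HTML layout it produces are all illustrative assumptions, and the input record omits the DOCTYPE line so that no employee.dtd file is needed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {

    // Hypothetical stylesheet: turns an <employee> record into a tiny HTML page.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/employee\">"
        + "<html><body>"
        + "<h1><xsl:value-of select=\"name\"/></h1>"
        + "<p><xsl:value-of select=\"position\"/>, ext. "
        + "<xsl:value-of select=\"contact/phone-ext\"/></p>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xml, String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String employee =
            "<employee><name>Mr. XML Schema</name><position>facilitator</position>"
            + "<contact><mail-stop>B8</mail-stop><phone-ext>2388</phone-ext>"
            + "<email></email></contact></employee>";
        System.out.println(transform(employee, STYLESHEET));
    }
}
```

Swapping in a different stylesheet changes the output format without touching the XML data or the Java code, which is the main appeal of this approach.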
Transformations have also made their way into the database world, where they are used to transfer data from one table schema to another. In the same vein, legacy data can be provided in new formats for new systems. These kinds of processes have greatly benefited the data warehouse industry. Without a transformation, the web browser will either show the raw XML or use its default XML viewer to display the data to the user, which doesn't look very good for general web viewing.
<?xml version="1.0"?>
<!DOCTYPE employee (View Source for full doctype...)>
<employee>
  <name>Mr. XML Schema</name>
  <position>facilitator</position>
  <contact>
    <mail-stop>B8</mail-stop>
    <phone-ext>2388</phone-ext>
    <email></email>
  </contact>
</employee>
Parsers come in two flavors: validating, which check the document against its DTD, and non-validating, which do not. Further, there are two different parsing styles: DOM and SAX. The XML developer must pick which parsing style to use and whether validation should be performed.
A DOM (Document Object Model) parser reads the entire XML document into memory as a document tree. The developer then traverses this tree, reading the value of each node. Because the entire document must be held in memory, this approach can be slow and memory-hungry for large documents, which leads to the next parsing style.
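As a rough illustration of the DOM style, the following Java sketch loads a record into a tree with the standard JAXP DocumentBuilder and then walks it. The class DomSketch and its helper methods are hypothetical names chosen for this example, and the record omits the DOCTYPE line so no external DTD is required.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomSketch {

    // Reads the whole document into an in-memory tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Depth-first walk that records each element name, indented by depth.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Text of the first element with the given tag name, or null if absent.
    static String firstText(Document doc, String tag) {
        NodeList matches = doc.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<employee><name>Mr. XML Schema</name><position>facilitator</position>"
            + "<contact><mail-stop>B8</mail-stop><phone-ext>2388</phone-ext>"
            + "<email></email></contact></employee>");
        StringBuilder outline = new StringBuilder();
        walk(doc.getDocumentElement(), 0, outline);
        System.out.print(outline);
        System.out.println("position: " + firstText(doc, "position"));
    }
}
```

Because the tree stays in memory, the code can visit nodes in any order and as many times as it likes, which is the convenience that the memory cost pays for.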
SAX (Simple API for XML) parsers are event based: they read the document sequentially and report each start tag, end tag, and run of text as an event. This lowers the memory requirement considerably, but provides a very narrow view of the contents. SAX is often the preferred parsing mode when documents are large. Based on the contents of each event you can determine what to do. This is useful for object creation using reflection.
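The event-based style can be sketched in Java with the standard SAXParser and a DefaultHandler. The class name SaxSketch and the "start:", "text:", and "end:" labels are assumptions made for illustration; a real handler would act on each event rather than just logging it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {

    // Parses the XML and returns the sequence of events the parser reported.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one run of text in several chunks;
                // this sketch simply logs each non-blank chunk it receives.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<employee><name>Mr. XML Schema</name>"
            + "<position>facilitator</position></employee>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

Nothing is retained between callbacks unless the handler stores it, which is why memory use stays low and why the view of the document is so narrow.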
Java and other languages provide a feature called reflection. Using reflection, data objects can be serialized into XML, transferred to a program written in another language, and de-serialized there. One way to do this is to have matching objects in both languages. A common reason for doing so is to provide a web front-end to a legacy system.
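A minimal sketch of reflection-based serialization in Java follows. It is an illustration under simplifying assumptions, not a general-purpose serializer: it only handles public fields holding simple values, and the names ReflectionXmlSketch, Employee, toXml, and fromXml are hypothetical.

```java
import java.io.StringReader;
import java.lang.reflect.Field;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReflectionXmlSketch {

    // Hypothetical data object; the "matching object" on the other side
    // would declare fields with the same names.
    public static class Employee {
        public String name;
        public String position;
    }

    // Writes each public field as <fieldName>value</fieldName>.
    static String toXml(Object obj) {
        Class<?> type = obj.getClass();
        String tag = type.getSimpleName().toLowerCase(Locale.ROOT);
        StringBuilder xml = new StringBuilder("<" + tag + ">");
        try {
            for (Field field : type.getFields()) {
                Object value = field.get(obj);
                if (field.isSynthetic() || value == null) {
                    continue; // skip compiler-generated and unset fields
                }
                String text = String.valueOf(value)
                    .replace("&", "&amp;").replace("<", "&lt;");
                xml.append('<').append(field.getName()).append('>')
                   .append(text)
                   .append("</").append(field.getName()).append('>');
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("field not accessible", e);
        }
        return xml.append("</").append(tag).append('>').toString();
    }

    // Creates an instance of type and fills String fields from child elements.
    static <T> T fromXml(String xml, Class<T> type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            T obj = type.getDeclaredConstructor().newInstance();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // Look up the field whose name matches the element name.
                    type.getField(child.getNodeName())
                        .set(obj, child.getTextContent());
                }
            }
            return obj;
        } catch (Exception e) {
            throw new IllegalStateException("could not rebuild object", e);
        }
    }

    public static void main(String[] args) {
        Employee employee = new Employee();
        employee.name = "Mr. XML Schema";
        employee.position = "facilitator";
        String xml = toXml(employee);
        System.out.println(xml);
        System.out.println(fromXml(xml, Employee.class).position);
    }
}
```

The XML in the middle is what makes the exchange language-neutral: the receiving side only needs an XML parser and an object with matching field names.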
XML is so popular that many new web sites move their data around in XML format and convert their object-oriented (OO) objects into XML for business logic processing. Java can provide this kind of functionality using the Domify library from SourceForge.net. Domify is a Java library which adapts an arbitrary graph of Java objects to a W3C DOM interface. The DOM nodes are lazy-loaded to minimize processing overhead. This feature is easily tied into the Model-View-Controller architecture for web presentation, which has been done in the Maverick MVC framework.
XML has become the de facto format for modern configuration files. The upside is that XML is much more powerful than the traditional ini file format. This allows for custom configurations, which Java's Enterprise JavaBeans (EJB) make use of. EJB configuration files can contain SQL statements, database pooling information, class names for reflection invocation, and many other possibilities. This is only one of many examples of how XML has become the server-side companion.
XML is often an integral part of the Open Source movement. The Jakarta Ant utility uses a file called build.xml to provide details about how to compile, build, and deploy applications. Another widely known Open Source example is the Apache web server, although its configuration file uses XML-like container tags rather than strict XML. With an army of switches, the configuration options are almost endless. So while XML may not be at the browser level, it has become an integral part of the web viewing experience.
The number one downside to using XML for application configuration is that there is a learning curve with every new application, which means many hours are spent reading the configuration manual to learn the options and the configuration structure. To help, most applications come with default configurations that are very close to what is needed. Often developers place the items that must be changed at the top of the configuration file. This issue will lessen as more XML editor tools are produced that give the user a drag-and-drop interface rather than a manual text editing process.
Web Services are the new hot battleground between Sun and Microsoft. Both sides are promoting their own view of internet communication; however, both sides agree that XML should be at the heart of this communication. A key component of Web Services is the XML registry, which allows registry operations across diverse registry providers.
The Java platform promotes JAXR (Java API for XML Registries) as a wrapper that hides the complexities of communicating with UDDI (Universal Description, Discovery, and Integration) and ebXML registries. These registries allow their users to find out what Web Service resources a company is currently advertising. The number one service protocol that businesses are advertising right now is SOAP. SOAP, which originated at Microsoft, uses XML to transport data inside "envelopes". The header carries information about how the message should be processed as it travels from the sender node to the receiver node. This is a sample SOAP message for travel reservations.
<?xml version='1.0'?>
<env:Envelope xmlns:env="http://www.w3.org/2002/06/soap-envelope">
  <env:Header>
    <m:reservation xmlns:m="..."
        env:role="http://www.w3.org/2002/06/soap-envelope/role/next"
        env:mustUnderstand="true">
      <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
      <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
    </m:reservation>
    <n:passenger xmlns:n="..."
        env:role="http://www.w3.org/2002/06/soap-envelope/role/next"
        env:mustUnderstand="true">
      <n:name>John Q. Public</n:name>
    </n:passenger>
  </env:Header>
  <env:Body>
    <p:itinerary xmlns:p="...">
      <p:departure>
        <p:departing>New York</p:departing>
        <p:arriving>Los Angeles</p:arriving>
        <p:departureDate>2001-12-14</p:departureDate>
        <p:departureTime>late afternoon</p:departureTime>
        <p:seatPreference>aisle</p:seatPreference>
      </p:departure>
      <p:return>
        <p:departing>Los Angeles</p:departing>
        <p:arriving>New York</p:arriving>
        <p:departureDate>2001-12-20</p:departureDate>
        <p:departureTime>mid-morning</p:departureTime>
        <p:seatPreference/>
      </p:return>
    </p:itinerary>
    <q:lodging xmlns:q="...">
      <q:preference>none</q:preference>
    </q:lodging>
  </env:Body>
</env:Envelope>