EOM June 2005 > Departments > Understanding Technology

The Extensible Markup Language (XML): Hype and Reality

Chris Andrews

In the years that I have been a consultant, I have encountered technical managers who wish to implement the Extensible Markup Language (XML) to solve problems ranging from application integration to poor project management. Next to the term "Web," XML is probably the most overused buzzword that I have encountered, both inside the geospatial industry and outside it. Even experienced developers sometimes abandon reason and tout XML as the solution to problems that require thought, not hype.

XML is a text-based data formatting language derived from the Standard Generalized Markup Language (SGML). SGML is used to define data formats that encompass both the data content and semantic information about that content. When used properly, XML is a tool that facilitates data exchange and supports functionality with its flexible, easily navigated format. While many people believe that the primary strength of XML is that it can be human-readable, its true strength is that it can be reliably validated and read using standardized, platform-independent text parsing tools.

Validation is generally overlooked as an important part of XML technology because, until recently, XML parsers implemented the standard XML validation tools incompletely. Within the last few years, the World Wide Web Consortium (W3C) shifted from the Document Type Definition (DTD), a non-XML definition language, to XML Schema, an XML-based language, as its preferred means of defining valid XML documents. This relatively minor shift in standards created uncertainty about XML validation tools, and that uncertainty led many managers and architects to ignore validation, one of the crucial benefits of using XML. Validation is an essential component of complex integrated systems because it allows workflows to fail quickly when interfaces are out of sync or incomplete. XML Schema is now here to stay.
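To make the "fail quickly" point concrete, here is a minimal Python sketch of checking a document at an interface boundary before any downstream work begins. Python's standard library does not include an XML Schema validator, so this sketch swaps in a well-formedness check plus a small hand-written structural check as a stand-in; a production system would more likely hand the document to a real schema validator. The element names (`parcel`, `id`, `owner`, `geometry`), the `REQUIRED_CHILDREN` table, and the `InvalidMessage` and `check_message` names are all hypothetical and invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical interface contract: these element names are invented for the
# sketch and stand in for what an XML Schema would normally declare.
REQUIRED_CHILDREN = {"parcel": ["id", "owner", "geometry"]}


class InvalidMessage(ValueError):
    """Raised when an incoming document breaks the agreed contract."""


def check_message(xml_text):
    """Parse xml_text and raise InvalidMessage on the first violation found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The document is not even well-formed XML.
        raise InvalidMessage(f"not well-formed: {exc}") from exc

    if root.tag not in REQUIRED_CHILDREN:
        raise InvalidMessage(f"unexpected root element <{root.tag}>")

    # Stop at the first missing required child instead of processing further.
    for name in REQUIRED_CHILDREN[root.tag]:
        if root.find(name) is None:
            raise InvalidMessage(f"<{root.tag}> is missing required <{name}>")
    return root
```

Calling `check_message` at the point where a document enters the workflow means that an incomplete or out-of-date interface produces one clear error immediately, rather than a confusing failure several steps later.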

Readability is both the boon and the bane of the XML-using world. Numerous standard toolkits exist to manipulate XML programmatically. However, because XML can be parsed by any language that can handle text data and file I/O, developers have wasted countless hours implementing custom tools to parse XML and extract data. The Java Architecture for XML Binding (JAXB), Microsoft's standard XML parsers for .NET, and the PyXML package for the Python programming language are all standard tools that incorporate XML validation and reading capability. When complex architectures fail to use such tools, the cost of reworking code in response to changing document schemas will quickly surpass the benefits of using XML at all.
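The cost of custom parsing can be illustrated with a short Python sketch. It compares a hand-rolled, text-matching extractor with the `xml.etree.ElementTree` parser from Python's standard library (used here only because it is readily available; it is not one of the toolkits named above). The two sample documents and the function names `name_by_regex` and `name_by_parser` are hypothetical and invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Two versions of the same hypothetical record. The second version adds
# attributes and line breaks, as often happens when a schema evolves.
DOC_V1 = "<site><name>Well 7</name><elevation>312.5</elevation></site>"
DOC_V2 = """<site>
  <name lang="en">Well 7</name>
  <elevation units="m">312.5</elevation>
</site>"""


def name_by_regex(text):
    """Custom text parsing: matches only the literal tag <name>."""
    match = re.search(r"<name>(.*?)</name>", text)
    return match.group(1) if match else None


def name_by_parser(text):
    """Standard parser: finds the element regardless of attributes or spacing."""
    return ET.fromstring(text).findtext("name")


for label, doc in (("v1", DOC_V1), ("v2", DOC_V2)):
    print(label, "regex:", name_by_regex(doc), "| parser:", name_by_parser(doc))
```

In this sketch the text-matching version stops finding the name as soon as the element gains an attribute, while the parser-based version needs no changes. That is the kind of rework that accumulates when custom tools stand in for standard ones.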