XML
Physical Evaluation of XML PAT Algebra Operators
This proposal outlines the foundation for a full evaluative report on how the Extensible Markup Language (XML) PAT Algebra Operators could be physically evaluated and each assigned a cost, so as to help standardize the XML query process. With advances in computing hardware and software, the internet has grown and transformed the world. Email, for example, has revolutionized communications, and other technological breakthroughs will continue to shape how politics, science and even business are conducted throughout the world. Technological advancement can be tied to certain practices, one of which is standardization.
In the 1920s, one of Henry Ford's top executives implemented a process of factory standardization, and it became the norm throughout the organization's automobile manufacturing plants. The process was so successful that it was later adopted by the aeronautics industry. "Coffin was one of the leading figures in promoting technical standardization through the Society of Automotive Engineers, and he later contributed to the development of aeronautics -- in fact, he was responsible for changing the Society's name from 'Automobile Engineers' to 'Automotive Engineers' so as to include the aircraft men." (Rae, p. 57)
Standardization has therefore greatly influenced and enhanced research, development and productivity in most industries today by helping to reduce maintenance and production costs while simplifying cycles of the production process. Even industries like food service use the process, as demonstrated by the McDonald's restaurant chain's many systems for serving the perfect French fry or Big Mac. Information technology firms like Microsoft, Cisco and Hewlett-Packard, along with business schools and universities throughout the world, present standardization as a way of establishing financial and industry growth. The Extensible Markup Language should be no exception. XML has consistently used operators and access methods to enhance the query process, and organizing the PAT Algebra Operators could greatly help advance current query processes.
Background
The Extensible Markup Language, or XML, was created to let users share and pass richly structured documents over the web in an easy, cost-effective manner. XML's predecessors were not well equipped to handle these responsibilities. "The only viable alternatives, HTML and SGML, are not practical for this purpose." (XML.com, 2005) SGML, for example, failed to be useful for richly structured documents on the web because it allowed an arbitrary structure that was far too difficult for the likes of a web browser to implement, and a full SGML system would have been too complex and expensive to justify the cost and implementation headaches.
Thus XML was the solution for the future. "While XML is being designed to deliver structured content over the web, some of the very features it lacks to make this practical, make SGML a more satisfactory solution for the creation and long-time storage of complex documents. In many organizations, filtering SGML to XML will be the standard procedure for web delivery." (XML.com, 2005)
XML was initially developed by the W3C Generic SGML Editorial Review Board, which was formed under the World Wide Web Consortium (W3C) in 1996 and chaired by Jon Bosak of Sun Microsystems. "XML is fully internationalized for both European and Asian languages, with all conforming processors required to support the Unicode character set in both its UTF-8 and UTF-16 encodings." (Oasis, 2005) XML is considered a markup language for internet-based documents that contain structured information. "Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays (for example, content in a section heading has a different meaning from content in a footnote, which means something different than content in a figure caption or content in a database table, etc.). Almost all documents have some structure." (XML.com, 2005)
The World Wide Web has created a need for very sophisticated markup languages such as XML. "The number of applications currently being developed that are based on, or make use of, XML documents is truly amazing (particularly when you consider that XML is not yet a year old)! For our purposes, the word 'document' refers not only to traditional documents, like this one, but also to the myriad of other XML 'data formats.' These include vector graphics, e-commerce transactions, mathematical equations, object meta-data, server APIs, and a thousand other kinds of structured information." (XML.com, 2005) A markup language can be considered a method of identifying the inherent structure of a document. XML is therefore a critical part of the World Wide Web because it defines a standard way to add markup to documents.
XML is very different from HTML in that HTML specifies both a fixed set of tags and the semantics of each tag. XML, on the other hand, specifies neither semantics nor a tag set. "In fact XML is really a meta-language for describing markup languages. In other words, XML provides a facility to define tags and the structural relationships between them." (XML.com, 2005) Because there are no predefined tag sets, XML does not confine the user to any preconceived semantics; the meaning of a document's tags is defined externally.
The Extensible Markup Language covers a full spectrum of logic and process concerns because it has become the universal format for the majority of structured documents and data on the internet and World Wide Web. However, using XML is not yet a completely simple process. "XML is primarily intended to meet the requirements of large-scale Web content providers for industry-specific markup, vendor-neutral data exchange, media-independent publishing, one-on-one marketing, workflow management in collaborative authoring environments, and the processing of Web documents by intelligent clients. It is also expected to find use in certain metadata applications." (Oasis, 2005)
XML is a subset of the Standard Generalized Markup Language, or SGML, as defined by ISO 8879. "The Extensible Markup Language (XML) is descriptively identified in the XML 1.0 W3C Recommendation as an extremely simple dialect [or 'subset'] of SGML the goal of which is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML, for which reason XML has been designed for ease of implementation, and for interoperability with both SGML and HTML." (Oasis, 2005)
With that being said, XML is technically only a part of SGML, and there are subtle differences between the two. Because XML is a subset, any fully conformant SGML system should also be capable of reading XML documents. One difference, for example, is that white space following tags may be handled differently in XML and SGML.
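The point that XML has no predefined tag set can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree parser; the tag names (recipe, ingredient), the sample document and the helper tag_names are all invented for illustration and come from no standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: XML itself defines none of these tag names.
DOC = """<recipe name="bread">
  <ingredient unit="g" amount="500">flour</ingredient>
  <ingredient unit="ml" amount="320">water</ingredient>
</recipe>"""


def tag_names(xml_text):
    """Return the distinct element names used in a document, in first-seen order."""
    root = ET.fromstring(xml_text)
    names = []
    for element in root.iter():  # walks the root and all of its descendants
        if element.tag not in names:
            names.append(element.tag)
    return names


print(tag_names(DOC))
```

The parser accepts the document even though it has never seen these tags before; what a recipe or an ingredient means is left to whatever application reads the document.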
History
We now have a very complex world of information technology. In the past, computers were often 'dumb' terminals capable only of accepting data from their mainframe overseer. The problem was that machines could not communicate with one another. Through standardization the industry found a solution, and from that point forward machines were able to pass data seamlessly. Offices with many machines soon required communication through a Local Area Network, or LAN. "It's depressing how often we see that those who don't remember history are doomed to repeat it. When cordless phones and the first analog cell phones hit the market, anybody with a scanner that operated at the right frequency could easily listen to calls not intended for them." (Gast, 2002)
Data was once easier to manage because only a few people had the necessary access to mainframes. Today, however, even less complicated networks consist of many individual nodes, each of which can be considered more powerful than the mainframes of only a few decades ago. Networks can also access the Internet, which opens up the entire world. "The Internet began in 1969 as the ARPANET, a project funded by the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense. One of the original goals of the project was to create a network that would continue to function even if major sections of the network failed or were attacked. The ARPANET was designed to reroute network traffic automatically around problems in connecting systems or in passing along the necessary information to keep the network functioning. Thus, from the beginning, the Internet was designed to be robust against denial-of-service attacks, which are described in a section below on denial of service." (Dekker, 2004)
Networks are series of integrated machines that require complicated software to produce seamless communication. "Today's network incorporates all sorts of wonderful but unsettling services. Voice data travels over the enterprise network. Files are shared. Corporate networks now include travelers and customers, often in the name of e-business and e-commerce." (Avolio, 2000)
Governing bodies
In less than six decades, the hardware and software industries have completely transformed themselves. Consider that the UNIVAC required several rooms, yet today a single PC easily dwarfs UNIVAC's capabilities. Networking came this far because of standardization: the ARPANET project and various standards bodies instituted standards that aided the development of networking systems like the internet.
One such body is the American National Standards Institute, or ANSI, a private non-profit organization whose standards the industry adopts voluntarily. Other influential standards organizations include the Institute of Electrical and Electronics Engineers, or IEEE, and the International Organization for Standardization, or ISO. The IEEE defined LAN standards in Project 802, also known as the 802 series. These projects could serve as blueprints for making XML more effective by standardizing PAT Algebra Operators for query needs.
XML PAT Algebra Operators
The internet is based on a foundation of distributed hypertext. It could also be regarded as a large distributed database in which millions to billions of queries are processed daily. "XML is too slow an exchange format for any large volume of data transfer. It is fine for exchange of small amounts of data on the fly but when you get to the stage of wanting to transfer Gigabytes of data when converted to XML that mushrooms pretty rapidly. Before XML becomes any more mainstream it should be looked at now to see how compression can be adopted. It must be backward compatible so that uncompressed XML will work with new compressed aware apps." (Tech Republic, 2005)
XML as a language addresses this view of the internet as a database. XML was designed to make the information scattered around the world seem more like a large database repository from which data can be retrieved. The problem is that databases such as Oracle or Microsoft Access can be queried with languages that are based on or incorporate the Structured Query Language, or SQL, whereas XML has had no comparable standard query facility. "The mission of the XML Query project is to provide flexible query facilities to extract data from real and virtual documents on the World Wide Web, therefore finally providing the needed interaction between the Web world and the database world. Ultimately, collections of XML files will be accessed like databases." (W3C, 2005)
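The idea of accessing XML "like a database" can be made concrete with a small sketch. It is not an implementation of the PAT algebra or of the W3C XML Query language; it relies only on the limited XPath subset in Python's standard xml.etree.ElementTree module, and the catalog data, element names and helper select are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element and attribute names are invented.
CATALOG = """<catalog>
  <book year="2004"><title>Title A</title><price>30</price></book>
  <book year="2005"><title>Title B</title><price>45</price></book>
  <book year="2005"><title>Title C</title><price>20</price></book>
</catalog>"""


def select(xml_text, path, field):
    """Return the text of child `field` for every element matched by `path`."""
    root = ET.fromstring(xml_text)
    return [node.findtext(field) for node in root.findall(path)]


# Roughly comparable to: SELECT title FROM book WHERE year = '2005'
titles_2005 = select(CATALOG, "./book[@year='2005']", "title")
print(titles_2005)
```

The path expression plays the role of the WHERE clause and the field name plays the role of the selected column. A query algebra would break such a request into individual operators, which is the level at which evaluation costs could be assigned.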