Metadata has become one of the hottest topics surrounding the World Wide Web. Metadata forms the basis for the development of the new Semantic Web, a technology touted as a revolutionary advance in how people use the Internet. The Semantic Web will be built on a technology-driven understanding of the meaning (or semantics) of information, and an accompanying understanding of the relationships between these meanings. It is thought that the Semantic Web will bring major advances in how the web is used, including improvements to scheduling and to online marketplaces.
Despite the touted advantages of using metadata in the new Semantic Web, there are some serious potential issues that must be worked out before the Semantic Web can become a viable and successful reality. Some of these issues are purely technical problems with creating meaning (semantics) out of the syntax of metadata. Other issues, which are potentially even more problematic, concern privacy. The Semantic Web may, paradoxically, provide anonymity to criminals while creating a variety of serious privacy issues for individuals and businesses.
This paper will review and analyze two recent Internet-based articles about the future of the Semantic Web and the uses of metadata. The specific articles reviewed within this paper are as follows:
1) T. Berners-Lee, J. Hendler, and O. Lassila, The Semantic Web, Scientific American (50), May, 2001. http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21
2) Ford, Paul. How Google Beat Amazon and EBay to the Semantic Web. Brooklyn, NY: Ftrain, 2002. http://www.ftrain.com/google_takes_all.html
What is Metadata?
Simply put, metadata is just data about data. It describes the content, characteristics, and quality of data. For different professional communities, metadata can have different definitions. For example, the term was first used mostly within communities that worked with geospatial data, where it referred to the standards, internal and external documentation, and other data needed to identify, represent, manage, and use the data held in an information system, and to ensure its performance (Gilliland-Swetland).
Gilliland-Swetland argues that metadata should be considered "the sum total of what one can say about any information object at any level of aggregation." In this definition, an information object is simply anything that can be manipulated as a discrete entity. It can be a single item or consist of many items. Information objects have three consistent features that can be reflected in metadata: 1) content, 2) context, and 3) structure (Gilliland-Swetland).
Metadata can take many forms. Administrative metadata, such as acquisition information, can be used to manage and administer information resources, while descriptive metadata, such as catalogue records, identifies or describes information resources (Gilliland-Swetland).
Ford - How Google Beat Amazon and EBay to the Semantic Web
Paul Ford's article, How Google Beat Amazon and EBay to the Semantic Web, is a fictional piece written as though it had been published in a business magazine in 2009. In the article, Ford describes a scenario where the search engine Google has come to dominate the Semantic Web. Google makes $17 billion per annum from Google Marketplace, while Amazon makes $1 billion, and EBay pulls in $1.8 billion.
Ford describes the Semantic Web as "just a way to describe things in a way that a computer can 'understand.'" He goes on to note that logic is the basis of the Semantic Web, which is built on a markup language called Resource Description Framework (RDF) that allows you to enter logical statements on the web, and have these statements searched, analyzed, and processed.
So far, the process for the Semantic Web is similar to that for a regular search engine. In the case of the Semantic Web, however, logical statements written in RDF can be combined. As such, the Semantic Web defines relationships between things: "whether one thing is a part of another, or how much a thing costs, or when it happened" (Ford).
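The idea of combining logical statements can be illustrated with a minimal Python sketch. It is not real RDF and does not come from Ford's article: the statements, the property names (part_of, price_usd), and the helper functions match and parts_of are all hypothetical, chosen only to show how separate "part of" statements might be chained together.

```python
# Each statement is a (subject, property, value) triple.
# All names and values here are hypothetical illustrations, not real RDF.
statements = {
    ("wheel", "part_of", "bicycle"),
    ("saddle", "part_of", "bicycle"),
    ("spoke", "part_of", "wheel"),
    ("wheel", "price_usd", 80),
}


def match(subject=None, prop=None, value=None):
    """Return every statement that fits the pattern; None acts as a wildcard."""
    return [
        (s, p, v)
        for (s, p, v) in statements
        if subject in (None, s) and prop in (None, p) and value in (None, v)
    ]


def parts_of(whole):
    """Combine part_of statements: a part of a part is also a part of the whole."""
    found = set()
    frontier = [whole]
    while frontier:
        current = frontier.pop()
        for (part, _, _) in match(prop="part_of", value=current):
            if part not in found:
                found.add(part)
                frontier.append(part)
    return found


print(sorted(parts_of("bicycle")))            # ['saddle', 'spoke', 'wheel']
print(match(subject="wheel", prop="price_usd"))
```

No single statement says that a spoke is part of a bicycle. That conclusion only appears when two statements are combined, which is the kind of relationship a keyword search engine would not find.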
In this fictional essay, Ford notes that the Semantic Web was designed to give the World Wide Web intelligence in expressing relationships. When Google first began to experiment with the Semantic Web in 2003, it was still little understood by most people, it was difficult to learn, and coders were scarce. The Semantic Web promised a great improvement in scheduling appointments, checking schedules, coordinating shipments, updating your computer, and searching for things.
One of the great challenges in creating the Semantic Web, notes Ford, was getting meaning (or semantics) out of information (or syntax). Humans are proficient at this task, but as of the early 2000s, computers were not. Ford notes that Google worked from the basic idea that meaning in the Semantic Web could be generated by throwing "together so much syntax from so many people that there's a chance to generate meaning out of it all" (Ford).
Ford notes that Google produced four innovative products that revolutionized the Semantic Web. These were: 1) Google Marketplace Search, 2) Google Personal Agent, 3) Google Verification Manager, and a software product, 4) Google Marketplace Manager. Google Marketplace Search used the logical statements and pointers within RDF code (such as verb phrases, nouns, and even URLs) to determine relationships within the web. Accountability became an important issue in Google Marketplace Manager, and soon RDF statements from credit cards were logically linked to people's identities, and then to their own recommendations, to determine the trustworthiness of sellers, sites, and so on.
In this fictional piece, Ford notes that the privacy afforded by the Semantic Web may make it easy for criminal elements to buy and sell illegal items. All they would have to do is create a "ghost taxonomy," where real items or services act as a code for illegal items. For example, a list of specialized yacht parts may simply be code for cocaine (sailcloth) or weapons-grade plutonium (engine).
Ford brings up an interesting and important issue in his discussion of the privacy implications of the Semantic Web. The Semantic Web, according to Ford's fictional description at least, brings a host of other potential privacy issues, which Ford does not investigate thoroughly enough. The Semantic Web allows agents to exchange personal data about individuals, compiling detailed databases about items purchased, sites visited, hobbies, and personal information. As such, the Semantic Web may only add to the complaints of a growing number of individuals who are worried about privacy on the existing web.
Berners-Lee, Hendler and Lassila - The Semantic Web
Berners-Lee, Hendler and Lassila's article, The Semantic Web, outlines a number of revolutionary new possibilities that may come out of the Semantic Web. They note that the Semantic Web may bring "structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users" (Berners-Lee, Hendler and Lassila). In their 2001 article, they noted that the first steps in creating the Semantic Web were already underway.
Berners-Lee, Hendler and Lassila provide a more detailed description than Ford of the technology that underlies the Semantic Web, including a description of how RDF works to create meaning. They note, "In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as 'is a sister of,' 'is the author of') with certain values (another person, another Web page)." Further, they describe how ontologies, which are simply a way of describing metadata, can be used to help define common meanings across databases.
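The thing/property/value pattern and the role of an ontology can be shown in a short Python sketch. This is an illustration under stated assumptions rather than the authors' method or real RDF syntax: the two databases, the people, the example.org addresses, the property name "wrote," and the EQUIVALENT table standing in for an ontology are all hypothetical.

```python
from collections import defaultdict

# Two hypothetical databases that record authorship as
# (thing, property, value) assertions, using different property names.
library_db = [("Alice", "is the author of", "http://example.org/page1")]
bookshop_db = [("Bob", "wrote", "http://example.org/page2")]

# A toy stand-in for an ontology: it declares that a property used by one
# database means the same thing as a shared, common term.
EQUIVALENT = {"wrote": "is the author of"}


def normalize(assertions):
    """Rewrite each property to its shared term, if the ontology defines one."""
    return [(thing, EQUIVALENT.get(prop, prop), value)
            for (thing, prop, value) in assertions]


def authors_by_page(assertions):
    """Index authors by page once both databases use a common vocabulary."""
    index = defaultdict(list)
    for thing, prop, value in normalize(assertions):
        if prop == "is the author of":
            index[value].append(thing)
    return dict(index)


merged = authors_by_page(library_db + bookshop_db)
print(merged)
```

Without the mapping, a program asking who "is the author of" a page would miss Bob's record entirely. The shared definition is what lets assertions from both databases be read together.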
While they note that the potential for the Semantic Web is great, the authors also acknowledge that the technology that would drive the "automated reasoning" of the Semantic Web is in its infancy. Adding logic to the existing web is a profoundly complex and difficult task that relies on a number of mathematical and engineering solutions.
Like Ford, Berners-Lee, Hendler and Lassila fail to provide a useful and thorough analysis of the privacy issues that come from the use of metadata on the Semantic Web. Certainly, from a personal point of view, privacy is becoming an important issue for a number of web users. In the past year, I have had several versions of spyware appear on my computer. These often-illegal programs send a variety of personal information to a centralized database, including websites visited, e-mail addresses, and items purchased. The Semantic Web may simply provide a sanctioned, legal way for companies to invade my privacy.
It is profoundly important to consider how privacy issues will be handled on the Semantic Web (and even the "old" web). It is the right of every individual to keep certain information private. As such, many people may be reluctant to take part in a new Semantic Web that sacrifices privacy for a great deal of convenience. While Berners-Lee, Hendler and Lassila note the importance of digital signatures in the Semantic Web, these are not a foolproof way to ensure personal privacy.
As a long-time web and technology user, I am beginning to feel the effects of these issues strongly, and I am often concerned about how a less technically adept user would respond. My personal firewall warns me constantly about unauthorized users trying to access my machine, I regularly receive viruses in e-mail attachments, web sites routinely place cookies on my computer, and I have to be vigilant about revealing my…