¶ … Wide Web is available around the world today, and consists of billions of pages of information, with several pages being added every second. As a result, billions of users are increasingly turning to the Web for answers, as well as recreation, shopping and even education. In addition, the healthcare community is increasingly turning to the World Wide Web to obtain information and to provide it to their peers and patients as well. One of the unfortunate consequences of the enormous amount of information that is available on the Web is the inability of many search engines to identify the precise information desired by the user. Indeed, a simple search might result in hundreds or even thousands (or tens of thousands) of irrelevant or spam-filled matches that only serve to delay the user in reaching the desired information. The purpose of the study was to identify current methods that optimize the World Wide Web for research purposes in general and for the research needs of the healthcare community, and physicians in particular. To this end, a review and meta-analysis of the relevant literature and related studies is followed by a summary of the research, conclusions and recommendations in the concluding chapter.
Table of Contents
Chapter 1: Introduction
Statement of the Problem
Purpose of Study
Importance of Study
Overview of Study
Chapter 2: Review of Related Literature
Chapter 3: Methodology
Description of the Study Approach
Data-gathering Method and Database of Study
Chapter 4: Data Analysis
Chapter 5: Summary, Conclusions and Recommendations
World Wide Web Search Optimization for Physician Research
Chapter 1:
Introduction
By any measure, researchers today enjoy a wide range of advantages compared to just a few years ago, and computer-based innovations continue to be introduced on virtually a daily basis. The so-called "Age of Information," though, was made possible by the introduction of the World Wide Web. For instance, Burnett and Marshall report that, "The 1990s saw the rise of the Internet (variously described as the Infobahn, the Information Highway, the Net, the Matrix, or the Web), mostly due to the establishment of the World Wide Web (WWW) as the user friendly multimedia portion of the Internet. The Web part of the Internet enabled access to increasing amounts of information and data and new possibilities for interaction" (2002, p. 1). Not surprisingly, healthcare practitioners have embraced the World Wide Web in the delivery of professional services of all types, including so-called "telemedicine." According to Miller, Hillman and Given (2004), the results of 1,200 physician responses to a Deloitte Research/Fulcrum Analytics questionnaire concerning office-based physician use of the Internet and other information technology determined that about half of medical doctors in the United States are currently using, or are prepared to use, information technology for clinical care purposes. Although many of the physician respondents voiced concerns about the privacy considerations involved in using email to communicate with their patients, the survey also showed that ". . . policies aimed at increasing physician IT use for clinical management should be tailored to specific segments of the physician IT user spectrum, rather than using a 'one-size-fits-all' policy approach" (Miller et al., 2004, p. 72).
Moreover, physicians are increasingly relying on the World Wide Web to help provide their patients with healthcare communications that may be superior to text-based guidance only. In this regard, Thompson, Dorsey, Miller and Parrott (2003) report that, "One strength of this communication medium is that health messages can be delivered to receivers through multiple communication channels such as text, graphics, photos, animations, audio, and video. These nontext channels may be more accessible and understandable by people with low health literacy than text-based messages alone" (p. 596). In this environment, identifying opportunities to improve the ability of healthcare professionals in general and physicians in particular to find relevant information on the Web has assumed new importance and relevancy; these issues also form the problem considered by this study, which is discussed further below.
Statement of the Problem
The World Wide Web is currently comprised of billions of Web pages, with several more being added every second, that are readily searchable and represent an enormous and valuable resource for the healthcare profession. There also exists a "dark" or "invisible" Web, though, that requires more sophisticated search techniques to access (Pedley, 2001). Moreover, the software engineers who developed the search algorithms that allow Web users to search for relevant material have focused on functionality from their perspective rather than what the people who are actually using these search engines may need. According to Diaper and Stanton (2004), "The core of the problem is the historical separation of software engineering and human-computer interaction. Many task analysis methods were developed by researchers with a psychological background, and these methods and their outputs often do not integrate well with those of software engineering" (p. 30). This point is also made by Welborn and Kasten (2003) who emphasize, "There is still an enormously tacit aspect in determining exactly what a given application should do. There are practices that guide technologists regarding how they should go about finding out what they need; however, the actual description of what an end user wants the application to do involves business people communicating with technologists across a significant language gulf" (p. 89). In response to these constraints in the use of the Web, a growing number of search algorithms have been developed that help fine-tune the types of site matches that are delivered to Web users. According to Jeh and Widom (2003), "Recent web search techniques augment traditional text matching with a global notion of 'importance' based on the linkage structure of the web, such as in Google's PageRank algorithm" (p. 1).
The trend toward providing Web users with the opportunity to home in more precisely on what they specifically need while avoiding thousands and thousands of irrelevant sites and spam messages is clear, but the process remains under development in many ways. For instance, Jeh and Widom add that, "For more refined searches, this global notion of importance can be specialized to create personalized views of importance -- for example, importance scores can be biased according to a user-specified set of initially-interesting pages" (p. 1). In reality, there are a number of constraints involved in providing a truly personalized search capability with the technology available today. According to Jeh and Widom, "Computing and storing all possible personalized views in advance is impractical, as is computing personalized views at query time, since the computation of each view requires an iterative computation over the Web graph" (2003, p. 1).
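The idea of biasing link-based importance toward a user-specified set of pages can be illustrated with a small sketch. The code below is not the algorithm of Jeh and Widom or of any search engine; it is a textbook-style power iteration over a toy link graph, in which the function name, the damping value of 0.85, the fixed iteration count and the page names are all assumptions made for illustration. It shows why each personalized view requires its own iterative computation over the graph.

```python
def personalized_pagerank(links, preferred, damping=0.85, iterations=50):
    """Rank pages in a toy link graph, biased toward a set of preferred pages.

    links:     dict mapping each page to the list of pages it links to
    preferred: non-empty set of pages the user has marked as interesting
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    if not preferred or not set(preferred) <= set(pages):
        raise ValueError("preferred must be a non-empty subset of the graph's pages")

    # The "teleport" distribution is concentrated on the preferred pages
    # instead of being spread uniformly over the whole graph.
    teleport = {p: (1.0 / len(preferred) if p in preferred else 0.0) for p in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes an equal share of its score along each outgoing link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links returns its score to the preferred set.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


# Hypothetical four-page Web used only to exercise the function.
toy_web = {
    "journal": ["clinic", "news"],
    "clinic": ["journal"],
    "news": ["journal", "shop"],
    "shop": ["news"],
}
physician_view = personalized_pagerank(toy_web, {"clinic"})
shopper_view = personalized_pagerank(toy_web, {"shop"})
```

In this toy graph the same page receives a different score in each view, which is the practical difficulty the authors describe: every distinct set of preferred pages needs a separate pass of the iteration.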
Therefore, innovations such as the Unified Modeling Language (UML) have been introduced in an effort to bridge the gulf between user accessibility, ease of use and relevancy of search results and Web site design. According to Yang and Lu (2005), "Unified Modeling Language is a standard language for specifying, visualizing, constructing and documenting the artifacts of software systems, as well as for business modeling and other non-software systems" (p. 3). In this regard, Welborn and Kasten add, "There are some development approaches, such as the methods used with the Unified Modeling Language that facilitate bridging the gulf, but generally it takes knowledgeable people working together to make effective applications" (p. 89). Although it is reasonable to posit that both software engineers and physicians are "knowledgeable people," there remains a fundamental need to identify ways to improve the ability of these disciplines to communicate with each other and to develop superior approaches to searching the World Wide Web to provide meaningful results in a timely fashion without being forced to wade through hundreds or thousands of unrelated or only tangentially relevant Web sites to find what is desired.
Purpose of Study
The purpose of the study was to identify current methods that optimize the World Wide Web for research purposes in general and for the research needs of the healthcare community and physicians in particular. In support of this purpose, it was the goal of the study to improve the quality of the search results based on implicit and explicit feedback by filtering the irrelevant search results based on information contained in user profiles.
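A minimal sketch of what filtering results against a user profile might look like is given below. It is an illustration under stated assumptions, not a description of a system built for this study: the class and function names, the feedback weights (plus or minus 1.0 for explicit ratings, up to 0.5 for dwell time) and the sample result titles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Term weights learned from a user's feedback (hypothetical structure)."""
    interests: dict = field(default_factory=dict)

    def explicit_feedback(self, terms, relevant):
        # Explicit feedback: the user states that a result was or was not relevant.
        delta = 1.0 if relevant else -1.0
        for term in terms:
            self.interests[term] = self.interests.get(term, 0.0) + delta

    def implicit_feedback(self, terms, seconds_on_page):
        # Implicit feedback: dwell time is treated as a weaker positive signal,
        # capped so that a single long visit cannot dominate the profile.
        delta = 0.5 * min(seconds_on_page / 60.0, 1.0)
        for term in terms:
            self.interests[term] = self.interests.get(term, 0.0) + delta


def score(profile, text):
    """Sum the profile weights of the words that appear in a result's text."""
    return sum(profile.interests.get(word, 0.0) for word in text.lower().split())


def filter_results(profile, results, threshold=0.0):
    """Drop results scoring at or below the threshold; return the rest, best first."""
    scored = [(score(profile, result), result) for result in results]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in kept]


profile = UserProfile()
profile.explicit_feedback(["cardiology", "trial"], relevant=True)
profile.explicit_feedback(["discount"], relevant=False)
profile.implicit_feedback(["statin"], seconds_on_page=120)

results = [
    "statin trial outcomes in cardiology",
    "discount pills online",
    "weather forecast today",
]
filtered = filter_results(profile, results)
```

With this profile only the first title survives the filter; the second is penalized by explicit negative feedback and the third matches nothing in the profile.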
Importance of Study
There is so much information available on the World Wide Web today that it is like trying to drink from a fire hose. Following the introduction of the Internet and World Wide Web in the closing years of the 20th century, a number of approaches have been experimented with in an effort to identify superior techniques to search for relevant information while weeding out what was irrelevant or insufficiently relevant to include in search results. In this regard, Welborn and Kasten (2003) advise, "Over the years, there have been many approaches to modeling and design, but it appears that the clearly preferred approach is object-oriented design, with the design expressed in the Unified Modeling Language" (p. 214). According to Welborn and Kasten, the return on investment for learning UML is well worthwhile: "Object-oriented design is a discipline that must be learned, and UML is a language in which we strive to be fluent. The reward for the effort of learning is access to a vocabulary that is shared by a very large population across all industries globally" (p. 214). Moreover, according to Bell, because UML is a language rather than a methodology, practitioners who are familiar with UML can join a project at any point from anywhere in the world and become productive right away. Therefore, Web applications that are built using UML provide a useful approach to helping professionals gain access to the information they need when they need it.
Overview of the Study
This paper used a five-chapter format to achieve the above-stated research purpose. Chapter one of the study was used to introduce the topic under consideration and to provide a statement of the problem, the purpose of the study and its importance. Chapter two of the study provides a review of the related peer-reviewed and scholarly literature concerning search optimization on the World Wide Web, and chapter three describes more fully the study's methodology, including a description of the study approach, the data-gathering method and the database of study consulted. Finally, chapter four consists of an analysis of the data developed during the research process, and chapter five presents a summary of the research, the study's conclusions and recommendations.
Chapter 2:
Review of the Related Literature
The World Wide Web
The World Wide Web (hereinafter the "WWW" or alternatively, "the Web") is a unique information environment because it is (a) very large and growing larger daily, (b) highly searchable, (c) virtually ubiquitous, and (d) potentially very useful (Ratner, 2003). By any measure, the Web is enormous and continues to grow at exponential rates. For example, in 2003, one new server was introduced to the WWW every 2 seconds, seven-and-a-half Web pages were added every second, and there were already 27.5 million Web sites and 413.7 million users (Ratner, 2003). Today, there are more than one-and-a-half billion Web users (Turner, 2009) and the WWW represents a highly accessible medium that features a wide range of search engines that are used to locate relevant and desired information (Ratner, 2003). Google, for example, handles hundreds of millions of searches each day (Ratner, 2003). According to Wade (2009), "Over the past decade, Google has revolutionized the internet. By devising complex search algorithms and amassing vast storehouses of computational power, the Mountain View, California-based company has democratized knowledge distribution to the point where every individual can now access the volumes of information that historically required the backing of an organization" (p. 37).
Moreover, the WWW has become increasingly available in other countries and access has been simplified in a number of ways; in addition, access to the Web can be achieved through the use of various handheld peripherals and television sets (Moyer, 2009). According to Ratner (2003), "Last but not least, the Web contains information that users want. A common phrase among Net-savvy users is 'You can find the answer on the Web'" (p. 267). Indeed, a commonly heard response to a question today is to "Google it." The WWW has introduced some fundamental improvements in the way people go about searching for information compared to years past, but some constraints to its effective use remain firmly in place. For instance, Ratner advises that, "The Web is larger, more searchable, more ubiquitous, and more useful than previous digital libraries. However, even though the Web has made a wide variety of information available, this increase in the amount of accessible information actually exacerbates the problem of information access, because as humans we have limited human capacity for absorbing information" (2003, p. 268).
As noted in the introductory chapter, there are also Web sites that are more difficult to find during searches, resulting in the reference to these resources as the "dark" or "invisible" Web (Pedley, 2001). On the one hand, Pedley notes that, "The visible web is the 'publicly indexable' or 'surface web' -- those Web sites that have been picked up and indexed by the search engines" (p. 4). On the other hand, there is the so-called "invisible" or "dark Web." In this regard, Pedley advises, "The phrase 'the invisible Web' refers to information that search engines cannot or do not index. The content that resides in searchable databases, for example, cannot be indexed or queried by traditional search engines because the results are generated dynamically in response to a direct query" (p. 4). The term, "invisible Web," refers to the hidden nature of the Web pages that are not readily accessed using standard search engines. For instance, according to Pedley, "Whilst the search engines might be able to index the home page of a database, they are unable to index each individual record within that database. So, in effect, an enormous amount of valuable content on the web is 'invisible' because it is locked up within databases" (p. 5). There are other constraints to providing efficient search results using many popular search engines. Beyond the "visible" and "invisible" Web exists yet another component that contains an enormous amount of information that is easier to access than the invisible Web but more difficult to access than the visible Web. In this regard, Pedley notes that, "The World Wide Web is so big that to index every single page available would put a great strain on the available computer power, and consequently the search engines may impose a limit on the number of pages that they retrieve from a Web site" (p. 6).
This constraint in particular is the result of a management decision on the part of the search engine industry that places limits on the number of pages that their services will index from a specific site; however, once a Web site is located using a search engine, it may be possible to access these "hidden" pages through the hyperlinks on the pages of the site that have been indexed, or through the site map of a given Web site (Pedley, 2001). This part of the WWW has been termed the "barely visible Web" or the "opaque Web" in contrast to the "dark" or "hidden" Web, and there are a number of important reasons why this segment of the WWW exists today, including the following:
1. Depth of crawl -- the search engines may have a fixed limit on how many pages they will index within a site;
2. Frequency of updating -- while some sites are updated many times a day, the search engines might only revisit the site every few weeks or months and so there will always be a time lag between new data being loaded onto a site and the search engines indexing that new information. The search engines are not geared up for sites with real-time or frequently updated content.
3. Robots.txt or the NOINDEX metatag -- search engines use "robots" in order to scan and index a website. It is possible to tell them which pages and directories they can index by using the robots.txt file; however, some ISP's might not let users have access to the robots.txt file, in which case they can use the NOINDEX and the NOFOLLOW metatags. A value of "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed. A value of "NOFOLLOW" allows the page to be indexed, but no links from the page are explored (Pedley, 2001, p. 7).
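The two mechanisms described in item 3 can be illustrated with Python's standard library. This is a sketch for illustration only: the robots.txt content, the crawler name "ExampleBot", the example.org URLs and the helper names are assumptions, and real crawlers apply many more rules than are shown here.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps all crawlers out of one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def page_policy(html):
    """Report whether a page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


noindex_page = '<html><head><meta name="robots" content="NOINDEX"></head></html>'
nofollow_page = '<html><head><meta name="ROBOTS" content="nofollow"></head></html>'
```

Under these rules, a page carrying only NOINDEX is left out of the index while its links are still explored, and a page carrying only NOFOLLOW is indexed while its links are not explored, which matches the behavior Pedley describes.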
Therefore, identifying the most relevant Web sites based on user profiles can help facilitate the search process by eliminating extraneous sites and targeting those that specifically match the search terms as well as the profile for the individual conducting the search. According to Colborn (2006), the search engine industry has become increasingly aggressive in its marketing efforts in an attempt to remain competitive with industry leaders such as Google, Yahoo!, MSN Search and Ask.com. A review of the top-ranked search engines compiled by Wall (2006) describes the attributes and weaknesses of these search engine services which are set forth in Table ____ below.
Table
Comparison of Respective Search Attributes and Weaknesses of Google, Yahoo!, MSN Search and Ask.com
Search Engine Service: Attributes/Weaknesses
Google
1. Has a great deal of experience in the search industry.
2. Is much better than the other engines at determining if a link is a true editorial citation or an artificial link.
3. Looks for natural link growth over time.
4. Heavily biases search results toward informational resources.
5. Trusts old sites far too much.
6. A page on a site or subdomain of a site with significant age or link related trust can rank much better than it should, even with no external citations.
7. They have aggressive duplicate content filters that filter out many pages with similar content.
8. If a page is obviously focused on a term they may filter the document out for that term. Page variation and link anchor text variation are important. A page with a single reference or a few references of a modifier will frequently outrank pages that are heavily focused on a search phrase containing that modifier.
9. Crawl depth determined not only by link quantity, but also link quality. Excessive low quality links may make a site less likely to be crawled deep or even included in the index.
10. Off topic reciprocal links are generally ineffective in Google when the associated opportunity cost is taken into account.
Yahoo!
1. Has been in the search game for many years.
2. Is better than MSN but nowhere near as good as Google at determining if a link is a natural citation or not.
3. Has an enormous amount of internal content and a paid inclusion program, both of which give them incentive to bias search results toward commercial results.
4. Off topic reciprocal links still function well in Yahoo!
MSN Search
1. Relatively new to the search industry.
2. Poor at determining if a link is natural or artificial in nature due to link analysis in which it places too much weight on the page content.
3. Their poor relevancy algorithms cause a heavy bias toward commercial results.
4. Prefers recent links.
5. New sites that are generally untrusted in other systems can rank quickly in MSN Search.
Ask.com
1. Looks at topical communities.
2. Due to their heavy emphasis on topical communities they are slow to rank sites until they are heavily cited from within their topical community.
3. Due to their limited market share they probably are not worth paying much attention to unless a company is in a vertical where they have a strong brand that drives significant search traffic.
Source: Wall, 2006, para. 4-5
Because it is one of the current search industry leaders, understanding how Google's search algorithm functions provides some useful insights into what is involved in searching the Web and what areas remain deficient. According to "Google Basics" (2009), "When you sit down at your computer and do a Google search, you're almost instantly presented with a list of results from all over the web" (para. 1). Google accomplishes this split-second search of billions of Web sites by "crawling," "indexing" and "serving." In this regard, Google reports that, "In the simplest terms, you could think of searching the web as looking in a very large book with an impressive index telling you exactly where everything is located. When you perform a Google search, our programs check our index to determine the most relevant search results to be returned ('served') to you" (Google Basics, 2009, para. 2).
The three foregoing key processes used to provide search results to Web users are described further in Table ____ below.
Table
Key Processes Used by Google to Search the World Wide Web
Key Process
Description
Crawling
Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index. Google uses a huge set of computers to fetch (or "crawl") billions of pages on the Web. The program that does the fetching is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. Google's crawl process begins with a list of Web page URLs, generated from previous crawl processes, and augmented with Sitemap data provided by webmasters. As Googlebot visits each of these websites it detects links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index. Google does not accept payment to crawl a site more frequently, and the company keeps the search side of its business separate from its revenue-generating AdWords service.
Indexing
Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, Google processes information included in key content tags and attributes, such as Title tags and ALT attributes. Googlebot can process many, but not all, content types. For example, Google is unable to process the content of some rich media files or dynamic pages.
Serving
When a user enters a query, Google searches the index for matching pages and returns the results that are believed to be the most relevant to the user. Relevancy is determined by over 200 factors, one of which is the PageRank for a given page. PageRank is the measure of the importance of a page based on the incoming links from other pages. In simple terms, each link to a page on a site from another site adds to the site's PageRank. Not all links are equal: Google works hard to improve the user experience by identifying spam links and other practices that negatively impact search results. The best types of links are those that are given based on the quality of site content. In addition, Google's Related Searches, Spelling Suggestions, and Google Suggest features are designed to help users save time by displaying related terms, common misspellings, and popular queries. Like the google.com search results, the keywords used by these features are automatically generated by Google's web crawlers and search algorithms. Google displays these suggestions only when it believes they might save the user time. If a site ranks well for a keyword, it is based on the algorithmic determination that its content is more relevant to the user's query.
Source: Google Basics, 2009 at http://www.google.com/support/webmasters/bin/answer.py?answer=70897#1
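The three processes in the table above can be made concrete with a toy sketch. The code below is not Google's implementation; it is a minimal illustration in which a small in-memory dictionary stands in for the Web, a plain count of inbound links stands in for PageRank and the other ranking factors, and all page names, page text and function names are hypothetical.

```python
from collections import defaultdict, deque

# A hypothetical three-page Web: URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("heart disease guidance for physicians",
                       ["a.example/trials", "b.example/news"]),
    "a.example/trials": ("clinical trials for heart disease",
                         ["a.example/home"]),
    "b.example/news": ("health news and physicians",
                       ["a.example/home", "a.example/trials", "b.example/gone"]),
}


def crawl(seed_urls, web):
    """Crawling: follow links outward from the seed URLs, skipping dead links."""
    seen = set(seed_urls)
    queue = deque(seed_urls)
    pages = {}
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to fetch
        text, links = web[url]
        pages[url] = (text, links)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Indexing: map every word to the set of pages on which it appears."""
    index = defaultdict(set)
    for url, (text, _links) in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def serve(query, index, pages):
    """Serving: return pages containing every query term, most linked-to first."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    inbound = defaultdict(int)
    for _url, (_text, links) in pages.items():
        for link in links:
            inbound[link] += 1
    # Inbound-link count is a crude stand-in for link-based importance.
    return sorted(matches, key=lambda url: (-inbound[url], url))


pages = crawl(["a.example/home"], TOY_WEB)
index = build_index(pages)
```

Calling serve("physicians", index, pages) returns the home page ahead of the news page because more pages link to it. The same lookup also shows why content that is never reached by the crawl, such as records generated only in response to a database query, cannot appear in the results.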
Although they differ in their algorithmic approaches to searching the Web, the algorithms used by the foregoing companies and others competing in the search industry share some commonalities. In this regard, Colborn makes the following points:
1. Search engines were initially designed to find the proverbial needle in a haystack and make sense of the millions, now billions, of Web pages on the Internet.
2. Search is a rapidly growing industry with a large amount of opportunity for all marketers, especially because many current players are not employing any specific type of strategy.
3. Companies learned that optimizing a Web site to gain stronger listings in search engines was an effective way of gaining prominent listings for certain searched terms.
4. As the opportunity to optimize Web sites in an ethical fashion emerged, so did unethical search optimization, which resulted in search engine listings being inundated with irrelevant matches.
5. New, stricter rules concerning relevancy have resulted in paid mechanisms that subsequently took the process a stage further by allowing non-optimized Web sites to be listed based on the amount of money advertisers were willing to pay (i.e., Google Ads).
6. Search is most commonly used for driving awareness, branding, sales, and leads through a company's Web site conversion points (Colborn, 2006).
According to Google, the optimization of search engines has been especially affected by number 4 above. In this regard, the company emphasizes that, "While search engine optimizers can provide clients with valuable services, some unethical search engine optimizers have given the industry a black eye through their overly aggressive marketing efforts and their attempts to manipulate search engine results in unfair ways" (Search Engine Optimization, 2009, para. 3).
Usability Engineering
A wide range of potential benefits can result from implementing a usability engineering initiative for a given Web site to improve targeted searches, depending on the business goals of the sponsoring company. In this regard, Ratner (2003) emphasizes that these benefits extend across the entire range of the enterprise. For instance, Ratner notes that, "The site development team realizes savings, as problems are identified early, when they are cheap to fix. The customer support team realizes a reduced call support burden. More usable sites will have higher buy-to-look ratios, a lower rate of abandoned shopping carts due to errors and confusion, and increased return visits" (p. 71). In addition, Web sites that have undergone usability engineering will tend to experience fewer failed searches, fewer errors, and therefore increased user productivity (Ratner, 2003).
The key aspects of user accessibility for Web sites are (a) audience, (b) content, and (c) usability; therefore, it is vitally important to identify the prospective targeted users for the Web site so that relevant information can be provided. In this regard, Chandra and Kumar (2001) report that, "Content of the web site is strongly linked with audiences' information needs and its presentation. The way to increase value of the web site to users is to enhance the quality of its contents" (p. 179). In fact, the usability of a Web site is designed to facilitate the ability of users to acquire specific types of information from it. As Chandra and Kumar emphasize, "The construction of the Web site offers utility to its primary user. It is designed to navigate important focus areas and drilling down for details on these topics. For a web-design to provide an effective user interface, its architecture must incorporate certain characteristics that can best be achieved using a modular Web design" (Chandra & Kumar, 2001).
Assimilation and dissemination are main components of a modular web design. Assimilation relates to identifying, locating, and selecting information pertinent to the web site.
1. Identifying published literature and related areas appearing in, (a) books and articles, (b) Web pages maintained by peers in academe, and industry.
2. Locating topics via search in research indexes.
3. Selecting information relevant to the focus of research. That is, pruning information for common interface.
The problems associated with assimilation of information relate to validity, currency, and reliability of sources. This is verified by ensuring that information captured:
1. Is relevant to the search criteria.
2. Is current in terms of its publication date or significance of the problem.
3. Can be relied upon based on its citation record and applicability to problems.
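The three checks above can be expressed as a simple screening function. The sketch below is only one possible reading of those criteria: the field names, the five-year currency window and the ten-citation floor are assumptions chosen for illustration, not thresholds given by Chandra and Kumar, and the sample sources are invented.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    year: int
    citations: int


def passes_screen(source, query_terms, current_year,
                  max_age_years=5, min_citations=10):
    """Apply the relevance, currency and reliability checks to one source."""
    title_words = set(source.title.lower().split())
    # 1. Relevant: at least one search term appears in the title.
    relevant = any(term.lower() in title_words for term in query_terms)
    # 2. Current: published within the allowed window.
    current = (current_year - source.year) <= max_age_years
    # 3. Reliable: cited often enough to be relied upon.
    reliable = source.citations >= min_citations
    return relevant and current and reliable


# Invented sample records used only to exercise the function.
candidates = [
    Source("Modular Web design for supply chain research", 2001, 42),
    Source("Early hypertext systems", 1989, 300),
    Source("Supply chain portals", 2002, 3),
]
accepted = [s.title for s in candidates
            if passes_screen(s, ["supply", "web"], current_year=2003)]
```

Only the first record passes all three checks here; the second fails the currency check and the third fails the citation check.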
The dissemination process involves creating a database and a Web site for its use. Its main activities are:
1. Organizing information in supply chain management in a database using clusters of related topics.
2. Accessing information for topics through structured menus and formatted pages.
3. Providing connections for links between information across networks.
Because of the enormous amount of information that is posted on the Web, investigating layouts, connecting methods and software tools enables the development of improved techniques for this activity. A strategy for the design of the Web site helps in focusing the information presented toward its target audience. A design also facilitates adopting appropriate model(s) for retrieving the information needs identified in the web-design problem. Scenarios varying in user profile suggest differing opportunities for interactivity and information manipulation in Web site design (Chandra & Kumar, 2001).
Optimizing Searches Based on Personal Preferences
Many Web users may not realize just how much information is already being collected about them during their normal use of the WWW. According to Burnett and Marshall (2002), "One of the key functions of any Web site that provides information to individuals is to have them provide information as a form of access to the site. Passwords and sign-ons are often that starting point for individuals to provide further demographic information and details of interest areas" (p. 29). These approaches to Web search optimization have a number of advantages for the users. For instance, Burnett and Marshall point out that, "Many users are drawn to free e-mail accounts offered by indexers such as Yahoo! or software providers such as Microsoft. With Yahoo.com, the new e-mail account personalizes their use of Yahoo!'s search engine devices; in setting up that account you are asked to indicate your age, occupation, address, previous e-mail account, income and gender along with other details of how you like to use the Internet" (p. 29). In fact, Google and other search engines already take into account the user's search history to help fine-tune the results that are provided, whether or not any actual demographic information has been provided to the service (Google Web history, 2009). From the perspective of the Web user, the information that is provided in response to these standard questions appears to be intended to fine-tune the search capabilities of the provider to better match the user's needs (Burnett & Marshall, 2002).
The downside to this approach to search optimization, though, is that users are trapped to some degree in a Web environment that is characterized by high levels of provider-affiliated content. In this regard, Burnett and Marshall note that, "With Yahoo!, the service moves beyond the provision of e-mail to helping set up a 'personalized' homepage that provides links to the user's requested areas of interest. The new homepage is thus composed of these categories that appear simply to connect to the user's interests; but they are linked to specific, allied Web sites for information, news and weather with which Yahoo! has developed content deals" (p. 30). This constraint means that users are not searching the entire WWW to find the most relevant matches to their queries, but are rather limited by the personal information they have provided. While it is possible to move beyond this personalized provider homepage to search the entire Web, researchers who rely strictly on their personalized homepage will not enjoy the benefits of information that may be more suited to their needs. According to Burnett and Marshall, "Although the user is free to move to the wider Web and other search engines, the presented space and information helps hold the user into a pattern of use. This constructing of a controlled space was pioneered by America Online, where content and surfing the net were more or less organized into a closed network" (2002, p. 30).
One approach to optimizing Web searches was introduced by Datahost (http://www.datahost.com/), which launched a new division called "Hero Web Marketing and Design" that focuses on search engine optimization. According to Datahost's chief executive officer, Michael Stearns (2008), "Since 1996, Datahost has provided Web site development services to hundreds of clients located throughout the United States. The addition of Hero Web expands the company's products and services to encompass skilled in-house search engine optimization services, search marketing solutions and strategic consulting as well as Web design and hosting" (quoted in Connor at 27). The focus on search optimization was based on the company's recognition that many users are faced with a flood of information when they actually require specific information that could be more readily provided through improved search algorithms. In this regard, Stearns added, "We know that in order to be successful in an increasingly competitive e-commerce world, our clients' Web sites need to be optimized for natural search engine traffic, and they need strategies to increase their online name recognition" (quoted in Connor at 27).
Another approach that has been used by the developers of CommunityWeb, Inc. (http://CommunityWeb.com) is to provide its subscribers with a proprietary, one-of-a-kind search technology that delivers a complete range of connectivity, as well as other Web-related solutions (Kuratko & Matthews, 2004). According to Kuratko and Matthews, the search technology that is used to support CommunityWeb "enables businesses, nonprofit organizations, and individuals to interact in a specific geographic locale via the Internet. People who use the CommunityWeb portal to search CommunityWeb's Web site database will find content that is of local relevance, including news, sports scores, classified ads, and even a community calendar" (p. 265).
Moreover, Web searches are tailored to the individualized needs of each subscriber based on demographic data, and the search zooms in on Web sites and pages that match the user's unique requirements. In this regard, Kuratko and Matthews note that, "This free and simple search process is done based on the individual needs of each end-user. This allows each user to search for information, goods, and services in their hometown or any specific area without wading through non-relevant links that traditional search engines yield" (p. 266).
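A locale-restricted search of this general kind can be sketched as follows. CommunityWeb's technology is proprietary, so this is only an illustration of the filtering idea; the Listing class, its fields, and the sample entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    title: str
    locale: str  # hypothetical field, e.g. a town name

def local_search(listings, query, locale):
    """Return listings in the given locale whose title contains every query word."""
    words = query.lower().split()
    return [
        item for item in listings
        if item.locale.lower() == locale.lower()
        and all(word in item.title.lower() for word in words)
    ]

# Illustrative data only.
listings = [
    Listing("Family clinic open house", "Muncie"),
    Listing("Family clinic flu shots", "Dayton"),
    Listing("High school sports scores", "Muncie"),
]
hits = local_search(listings, "family clinic", "Muncie")
```

Restricting matches to one locale before comparing keywords is what spares the user from wading through links that are relevant to the query but not to the community.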
In fact, even Microsoft's application, BackOffice, features a personalization system that allows Web sites to provide information and a more personalized online experience based on preset user preferences (Veltman, 2006). According to Veltman, though, these approaches to a personalized Web experience are not without their problems. For instance, this author notes that, "NetMind has a patented technology that lets you track any Web page at any level of detail, including images, forms, links and keywords, then alerts you via mobile access (cell phone, PDA, pager) or email" (p. 205). In response to these recent trends, an organization was formed called the "Personalization Consortium" that seeks to promote appropriate use of personalization applications (Veltman, 2006).
The official Web site for the Personalization Consortium states, "The Personalization Consortium is an international advocacy group formed to promote the development and use of responsible one-to-one marketing technology and practices on the World Wide Web. The consortium encourages the growth and success of electronic commerce that delivers the benefits of personalized electronic marketing while articulating best practices and technologies that protect the interests of consumers" (para. 1). In order to achieve its objective of expanding the range and employment of personalization technology that provides users with appropriate levels of privacy, the Personalization Consortium (a) provides a forum for industry discussion and information, (b) sponsors research, (c) develops standards for technology and best practices, and (d) works towards educating consumers concerning these applications (the Personalization Consortium, 2009). To achieve these goals, the consortium promulgated its Ethical Information and Privacy Management Objectives that set forth these objectives in order to provide Web users with the confidence and information they need to use personalization software effectively and safely (the Personalization Consortium, 2009).
Unified Modeling Language
One application that has been especially useful in designing Web sites that are easy to access and that help users find the precise information they are searching for is the Unified Modeling Language (UML). According to Diaper and Stanton (2004), the modeling techniques used by UML are highly regimented and require some degree of training to be used effectively and appropriately. For instance, these authors note that, "The UML specification (Unified Modeling Language Specification v 1.4, 2001) is very prescriptive in its symbolism of text, nodes, lines, and shapes" (Diaper & Stanton, 2004, p. 427). In addition, UML uses more than a dozen diagrams to illustrate the relationship between the constituent components of a model as described in Table __ below.
Table
Description of Diagrams Used in UML
Diagram | Description
Activity Diagram | Depicts high-level business processes, including data flow, or models complex logic within a system.
Class Diagram | Shows a collection of static model elements such as classes and types, their contents, and their relationships.
Communication Diagram | Shows instances of classes, their interrelationships, and the message flow between them. Communication diagrams typically focus on the structural organization of objects that send and receive messages. Formerly called a Collaboration Diagram.
Component Diagram | Depicts the components that compose an application, system, or enterprise. The components, their interrelationships, interactions, and their public interfaces are depicted.
Composite Structure Diagram | Depicts the internal structure of a classifier (such as a class, component, or use case), including the interaction points of the classifier to other parts of the system.
Deployment Diagram | Shows the execution architecture of systems. This includes nodes, either hardware or software execution environments, as well as the middleware connecting them.
Interaction Overview Diagram | A variant of an activity diagram which overviews the control flow within a system or business process. Each node/activity within the diagram can represent another interaction diagram.
Object Diagram | Depicts objects and their relationships at a point in time, typically a special case of either a class diagram or a communication diagram.
Package Diagram | Shows how model elements are organized into packages as well as the dependencies between packages.
Sequence Diagram | Models the sequential logic, in effect the time ordering of messages between classifiers.
State Machine Diagram | Describes the states an object or interaction may be in, as well as the transitions between states. Formerly referred to as a state diagram, state chart diagram, or a state-transition diagram.
Timing Diagram | Depicts the change in state or condition of a classifier instance or role over time. Typically used to show the change in state of an object over time in response to external events.
Use Case Diagram | Shows use cases, actors, and their interrelationships.
Source: ARIKAN Productivity Group GesmbH, 2007 at http://www.modelcvs.com/modelcvs/template.do?action=modelcvsbyAPG.uml2diagrams
The model-based approach that is used by UML is commonplace in software engineering applications. In this regard, Diaper and Stanton (2004) advise, "If we consider UML, one of the most successful model-based approach for the design of software systems, we notice a considerable effort is usually made to provide models and representations to support the various phases and parts of the design and the development of software applications" (p. 494). None of the nine modeling representations available through the use of UML, though, is especially effective in supporting the efficient design of user interfaces. It is possible, however, to integrate UML with task models to support user interface design using various strategies that can take advantage of the extensibility mechanisms built into UML itself (constraints, stereotypes, and tagged values); in this way, the usefulness of UML can be extended without requiring changes in the basic UML metamodel (Diaper & Stanton, 2004). A description of these various strategies is provided in Table __ below:
Table
Strategies for Integrating UML with Task Modeling for User Interface Design
Strategy | Description | Advantages/Disadvantages
Representing elements and operators of a task model by an existing UML notation. | For example, if the ConcurTaskTrees model is viewed as a forest of task trees, where ConcurTaskTrees (CTT) operands are nodes and operators are horizontally directed arcs between sibling nodes, designers can represent the model as UML class diagrams. Specific UML class and association stereotypes, tagged values, and constraints can be defined to factor out and represent properties of and constraints on CTT elements. | It would be possible to have a solution compliant with a standard that is already the result of many long discussions involving many people. This solution is surely feasible. Constraints associated with UML class and association stereotypes can be defined so as to enforce the structural correctness of ConcurTaskTrees models. However, two key issues arise: whether the notation has enough expressive power and whether the representations are effective and support designers in their work rather than complicate it. The usability aspect is important not only for the final application but also for the representations used in the design process. For example, activity diagrams are general and provide sufficient expressive power to describe activities; however, they tend to provide lower level descriptions than those in task models, and they require rather complicated expressions to represent task models describing flexible behaviors.
Developing automatic converters from UML to task models. | For example, it is possible to use the information contained in system-behavior models supported by UML (i.e., use cases, use case diagrams, and interaction diagrams) to develop task models. | Some of the problems associated with this strategy include the fact that it is difficult to first model a system in terms of object behaviors and then derive a meaningful task model from such models. The reason is that object-oriented approaches are usually effective for modeling internal system aspects but less adequate for capturing users' activities and their interactions with the system.
Building a new UML for interactive systems | A new UML can be obtained by explicitly inserting ConcurTaskTrees in the set of available notations while still creating semantic mapping of ConcurTaskTrees concepts into a UML metamodel. This encompasses identifying correspondences, at both the conceptual and structural levels, between ConcurTaskTrees elements and concepts and UML ones and exploiting UML extensibility mechanisms to support this solution. | This strategy offers more promise as a way to capture the requirements for an environment supporting the design of interactive systems; however, care should be taken to ensure that software engineers who are familiar with traditional UML can make the transition to this new method easily and that the degree of extension from the current UML standard remains limited. More specifically, use cases could be useful in identifying tasks to perform and related requirements, but then there is no notation suitable for representing task models, although there are various ways to represent the objects of the system under design. This means that there is a wide gap that needs to be filled in order to support models able to assist in the design of user interfaces.
Source: Diaper & Stanton, 2004, p. 494
During the definition of a UML for interactive systems, designers can explicitly introduce task models represented in ConcurTaskTrees (CTT). Not all UML notations are equally relevant to the design of interactive systems, though; the most important in this respect appear to be use cases, class diagrams, and sequence diagrams. In the initial part of the design process, during the requirement elicitation phase, use cases supported by related diagrams should be used. Use cases are defined as coherent units of externally visible functionality provided by a system unit. Their purpose is to define a piece of coherent behavior without revealing the internal structure of the system. They have been shown to be successful in industrial practice (Diaper & Stanton, 2004).
The task-modeling phase comes next and allows designers to obtain an integrated view of functional and interactional aspects. In particular, interactional aspects (aspects related to the ways of accessing system functionality) cannot be captured well in use cases. In order to overcome this limitation, use cases can be enriched with scenarios (i.e., informal descriptions of specific uses of the system). More user-related aspects can emerge during task modeling. In this phase, tasks should be refined, along with their temporal relationships and attributes. The support of graphically represented hierarchical structures, enriched by a powerful set of temporal operators, is particularly important. It reflects the logical approach of most designers, allows the description of a rich set of possibilities, is declarative, and generates compact descriptions (Diaper & Stanton, 2004).
In parallel with the task-modeling work, the domain modeling is also refined. The goal is to achieve a complete identification of the objects belonging to the domain considered and the relationships among them. At some point there is a need for integrating the information between the two models. Designers need to associate tasks with objects in order to indicate what objects should be manipulated to perform each task. This information can be directly introduced in the task model. In CTT it is possible to specify the relationships between tasks and objects. For each task, it is possible to indicate the related objects, including their classes and identifiers; in the domain model, though, more elaborate relationships among the objects are identified (e.g., association, dependency, flow, generalization, etc.), and they can be easily supported by UML class diagrams (Diaper & Stanton, 2004).
There are two general kinds of objects that should be considered: presentation objects, those composing the user interface, and application objects, which are derived from the domain analysis and responsible for the representation of persistent information, typically within a database or repository. These two kinds of objects interact with each other: Presentation objects are responsible for creating, modifying, and rendering application objects. The refinement of tasks and objects can be performed in parallel so that first the more abstract tasks and objects are identified and then the more concrete tasks and objects. At some point, the task and domain models should be integrated in order to clearly specify the tasks that access each object and, vice versa, the objects that are manipulated by each task (Diaper & Stanton, 2004).
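The integration of task and domain models described above can be sketched in a few lines. This is a simplified illustration rather than the ConcurTaskTrees notation or tooling itself; the Task class, the operator labels, and the sample search tasks are assumptions introduced for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    objects: list = field(default_factory=list)   # domain objects the task manipulates
    operator: str = ""                            # illustrative temporal operator to the next sibling
    children: list = field(default_factory=list)  # subtasks in the hierarchy

def walk(task):
    """Yield the task and all of its descendants, depth first."""
    yield task
    for child in task.children:
        yield from walk(child)

def objects_by_task(root):
    """Map each task name to the objects it manipulates."""
    return {t.name: list(t.objects) for t in walk(root) if t.objects}

def tasks_by_object(root):
    """Map each object to the names of the tasks that access it."""
    index = {}
    for t in walk(root):
        for obj in t.objects:
            index.setdefault(obj, []).append(t.name)
    return index

# Hypothetical task tree for a literature search.
root = Task("Find article", children=[
    Task("Enter query", objects=["Query"], operator="enabling"),
    Task("Browse results", objects=["Query", "ResultList"], operator="enabling"),
    Task("Open article", objects=["Article"]),
])
```

Building both lookups from the same tree gives the two views the text calls for: the objects manipulated by each task and, conversely, the tasks that access each object.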
According to Fesenko (2008), one way of integrating user profiles using UML is to use the "Profile" mechanism. In this regard, Fesenko reports that, "UML cannot cover needs of all possible domains. Standard UML metamodel has to be augmented in order to meet requirements of a particular domain. That's why Profile mechanism was created - a way to create a lightweight extension of Standard UML" (2008, para. 1). The workflow of creating and using a UML profile is illustrated in Figure __ below.
Figure ____. Workflow of Creating and Using UML Profile
Source: Fesenko, 2008, para. 2 at http://wiki.eclipse.org/MDT-UML2Tools_How_To_Use_UML_Profiles
The steps involved in the workflow needed to create and use the UML Profile feature for this purpose are as follows:
Step One: Create a Profile Definition Diagram using the 'Profile Definition Diagram' wizard from 'UML 2.1 Diagrams', located under the File > New > Other > ... menu item. The Profile is the root element of the created diagram. The key elements of a Profile Definition Diagram are: (a) Profile; (b) Stereotype; (c) Metaclass; and (d) Extension link.
Figure __. UML 2.1 Specification for Defining a Simple EJB Profile
Source: Fesenko, 2008, para. 3 at http://wiki.eclipse.org/MDT-UML2Tools_How_To_Use_UML_Profiles
Step Two: When the profile is complete, it must be defined. This step is obligatory because it saves the defined profile as a static Ecore structure in the UML model, which allows the profile content to be used subsequently. To define a profile in UML2 Tools, call the "Profile > Define" action from the context menu of the profile diagram:
Figure __. Definition of Profile
Source: Fesenko, 2008, para. 3 at http://wiki.eclipse.org/MDT-UML2Tools_How_To_Use_UML_Profiles
The next step is to register the profile. Adding the profile to the registry simplifies applying it later, although this step can be omitted. Profiles are registered in the plugin descriptor (the plugin.xml file) using the 'org.eclipse.uml2.uml.dynamic_package' extension point. The plugin should then be deployed to the platform. For example, UML Standard profile is registered this way:
The next step is to apply the profile so that the stereotypes defined in the profile can be applied to elements in the model. To apply the profile, call the 'Apply Profile > [Profile Name]' action from the context menu of the diagram as shown in Figure __ below.
Figure __. Apply UML Profile Function
Source: Fesenko, 2008, para. 4 at http://wiki.eclipse.org/MDT-UML2Tools_How_To_Use_UML_Profiles
The profiles that appear in the drop-down list of the 'Apply Profile' action are those that have been registered or that come from loaded resources. To make a particular profile appear in the list, register it or load the corresponding *.profile.uml resource.
The penultimate step is to load the resource that defines the profile, as follows:
1. Call 'Load resource...' action from the context menu of the diagram.
2. In the displayed wizard choose 'Browse Workspace...' button.
3. Choose the needed resource; the name of the file containing the profile ends with 'profile.uml':
Figure __. Loading Profile Resources Sample Screenshot
Source: Fesenko, 2008, para. 4 at http://wiki.eclipse.org/MDT-UML2Tools_How_To_Use_UML_Profiles
The final step in creating an extended element is applying a stereotype. Stereotypes can be applied to both nodes and links. Create an element of the extended metaclass and then, in the context menu of the newly created element, call 'ApplyStereotype>[name of stereotype]':
Figure ____. Stereotype Application in UML
Source: Fesenko, 2008, para. 5 at http://wiki.eclipse.org/MDT-UML2Tools_How_To_Use_UML_Profiles
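The ordering of the workflow above (define the profile, apply it to a model, then apply a stereotype to an element of the extended metaclass) can be mimicked in a small in-memory sketch. It does not use the Eclipse UML2 Tools API; the Profile and Model classes, their methods, and the "Patient" and "Entity" names are hypothetical and serve only to show why each step depends on the one before it.

```python
class Profile:
    def __init__(self, name):
        self.name = name
        self.stereotypes = {}   # stereotype name -> metaclass it extends
        self.defined = False

    def add_stereotype(self, name, metaclass):
        self.stereotypes[name] = metaclass

    def define(self):
        """Mark the profile as defined so that it can be applied to models."""
        self.defined = True

class Model:
    def __init__(self):
        self.applied_profiles = []
        self.elements = {}      # element name -> (metaclass, applied stereotypes)

    def apply_profile(self, profile):
        if not profile.defined:
            raise ValueError("profile must be defined before it is applied")
        self.applied_profiles.append(profile)

    def add_element(self, name, metaclass):
        self.elements[name] = (metaclass, [])

    def apply_stereotype(self, element, stereotype):
        """Apply a stereotype only if an applied profile offers it for this metaclass."""
        metaclass, applied = self.elements[element]
        for profile in self.applied_profiles:
            if profile.stereotypes.get(stereotype) == metaclass:
                applied.append(stereotype)
                return
        raise ValueError(f"no applied profile offers '{stereotype}' for {metaclass}")

# Illustrative walk through the workflow.
ejb = Profile("EJB")
ejb.add_stereotype("Entity", "Class")
ejb.define()
model = Model()
model.apply_profile(ejb)
model.add_element("Patient", "Class")
model.apply_stereotype("Patient", "Entity")
```

In this sketch, skipping the define step or applying a stereotype from a profile that has not been applied raises an error, which mirrors the dependency between the steps.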
Finally, a current catalog of modeling and metadata specifications for UML is provided at Appendix A.
Chapter 3:
Methodology
Description of the Study Approach
In order to provide a well-rounded and robust response to the research purpose described in the introductory chapter, this study used a mixed methodology consisting of a review of the relevant juried, scholarly and organizational literature concerning search optimization of the World Wide Web combined with a meta-analysis of relevant studies and texts. This approach is congruent with a number of social researchers who suggest that a review of the literature is an important first step in virtually any type of research project today. For example, Fraenkel and Wallen (2001) report that, "Researchers usually dig into the literature to find out what has already been written about the topic they are interested in investigating. Both the opinions of experts in the field and other research studies are of interest. Such reading is referred to as a review of the literature" (p. 48). Moreover, a well conducted literature review can also serve to identify any existing gaps in the relevant literature. For example, Gratton and Jones (2003) note that, "A literature review is the background to the research, where it is important to demonstrate a clear understanding of the relevant theories and concepts, the results of past research into the area, the types of methodologies and research designs employed in such research, and areas where the literature is deficient" (p. 51).
The use of the meta-analysis component is also congruent with various social researchers who emphasize the approach's ability to provide a synthesis of studies of various types such as those encountered in this study. In this regard, Baskin and Enright report that, "Meta-analysis is a popular vehicle of synthesizing results across multiple studies" (p. 79). Although a number of meta-analyses are primarily quantitative in nature, this study employed a qualitative meta-analysis of both qualitative and quantitative studies. This approach is congruent with Hall (1997) who advises, "A meta-analysis integrates the results of separate studies and investigations. A quantitative meta-analysis correlates diverse data and/or mathematical models. A qualitative meta-analysis links theories and equivalents by demonstrating logical relationships" (p. 387). Likewise, Allen, Burrell, Gayle and Preiss (2002) emphasize that, "The ability of this particular method to generate a synthesis permits the scientific community to generate a consensus about information that permits ease of representation and a more complete analysis" (p. 385). According to Banyard and Miller (1998), "qualitative research methods are ideally suited to putting a valuation of diversity into practice. The link between qualitative research and diversity can be seen when considering qualitative research as purely a set of tools or methods, and also when examining it as reflective of an alternative research paradigm" (p. 485).
Data-Gathering Method and Database of Study
The data-gathering method used in this study consisted of an exploratory approach that can be likened to an inverted pyramid. General information concerning the World Wide Web and search optimization was followed by a more specific search for ways in which user profiles could be used to optimize Web searches and what tools are available to software designers and engineers to facilitate this. The database of study consulted included both university and public libraries, as well as reliable online research services such as EBSCO and Questia.
Chapter 4:
Data Analysis
Table
World Wide Web
Author/Date/Title / Publication
Key Findings
Comments
Burnett, R. & Marshall, P.D. (2002). Web theory: An introduction. London: Routledge, pp. 28-29.
The Web has developed not only the graphic architecture of point and double-click from one Web page to another, but also the hypertextual interconnections of Web pages. The powerful search engines such as Lycos, Alta Vista, and Google provide the powers to do searches of material throughout the Web. The Web has transformed the personal computer into a smarter machine. The search technologies, which have been instrumental in connecting the wide array of Web sites, have gained in sophistication over time through essentially cybernetic techniques. The Boolean search, where searches of databases link the occurrence of one word or phrase with another and then display those sources that have both of these elements in their contents, is at the base of the Web search engines. The wondrous quality of the Web is this new power to search and find and, of course, enjoy and use what you have been able to locate. With the exponential growth of Web sites, the capacity to search the entire Web has made the search engines some of the nodal hubs of the entire Web - in essence weigh-stations for Web traffic.
The personal computer now has the capacity to seek out and through a combination of automated and human-led interventions arrive at particular points where one can find very specific material.
Hoy, M.G. & Lwin, M.O. (2007). Disclosures exposed: Banner ad disclosure adherence to FTC guidance in the top 100 U.S. Web sites. Journal of Consumer Affairs, 41(2), p. 285.
The relevancy of search results on the WWW is affected by Internet advertising, which appears in a variety of forms: search, classifieds, lead generation/referral, sponsorship, e-mail, and display. Search-related advertising encompasses a variety of categories including paid listings where text links appear along the side or top of the search results page for specific search terms, contextual search where a text link is within page content rather than resulting from user-generated terms, and paid inclusion that guarantees that the marketer's URL is indexed by the search engine and site optimization. For classifieds, advertisers pay a fee to online companies to list specific products or services. Similarly, for lead generation/referrals advertisers pay fees to online companies that refer qualified purchase inquiries.