
Literature survey on database advances and critical issues

Last reviewed: July 20, 2011

… geometrically, various query languages have been developed in response to help access and retrieve information of interest from these resources. Although query languages differ in terms of their functionality and applicability, they share certain commonalities and provide a useful framework in which to examine current trends and project future developments. To this end, this study provides a review of the relevant peer-reviewed and scholarly literature, as well as reliable online resources, to develop a background, an overview and specifics concerning query languages and query optimization. An analysis of current trends and projections of these trends into the future is followed by a summary of the research in the conclusion.

Query Languages and Query Optimization

Introduction

In the Age of Information, making sense of all of the available resources has been likened to trying to drink from a fire hose. Moreover, innovations in geographic technologies have added to the flood of information, and making sense of it all in terms of the when, where, and what aspects of an event or issue requires specialized queries. Therefore, identifying the current level of technological development as well as recent trends can establish relevant benchmarks for future researchers and provide a snapshot of these issues as they currently exist. This snapshot can be used to extend the recent trends in query languages and query optimization into the future, given the known increases in computer processing speeds. To this end, this study provides a review of the relevant peer-reviewed and scholarly literature, as well as reliable online resources, concerning query languages and query optimization, prefaced by a background and overview section and followed by a projection of recent trends into the future in the conclusion.

Review and Analysis

Background and Overview

In this regard, Bidlack and Wellman (2010) recently observed that an increasingly wide array of information resources has driven the demand for more efficient ways to access the specific information that is needed [1]. Indeed, the efficient use of information resources has become an integral part of many business models today. For instance, according to Webster, "It is impressive to watch the rapid development of the online world, encompassing the World Wide Web, journal and reference databases, library catalogs, e-books, and other e-content. We have come to depend on this newer online world as it grows more powerful and more complex" [2].

Satisfying the need for accessing information systems in general and databases in particular involves developing and applying some type of query. For instance, Calvanese and de Giacomo report that, "Data sources have been considered simply as systems that provide data but make no further contribution to the query-answering process" [3]. With Internet browsers, this need can be satisfied by a simple and straightforward search using Google, for example, but accessing various types of databases requires specialized query techniques and languages. For instance, a database management system (DBMS) is a program that can input, edit and retrieve information from a database. A database is a collection of information organized into records and fields, and stored as files on a computer. Sometimes the term database is used to include the DBMS as well. Relational, object-oriented, network, flat and hierarchical are all types of DBMS. They differ in how they organize information for storage. Retrieval from a DBMS requires a query language, a structured way of expressing search requests. Of these types, only relational DBMSs have a standard query language, SQL (Structured Query Language) [4]. SQL is in effect a programming language designed to get information out of, and put information into, a relational database. Queries are constructed from a command language that allows one to select, insert, update and locate data. SQL is a recognized standard [5]. Some other query languages have more focused applications, while others are designed to support searches in broader settings; these languages are discussed further below.
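As a minimal sketch of the select, insert, update and locate operations described above, the following uses Python's built-in sqlite3 module with an in-memory relational database. The table name "papers" and its columns and rows are hypothetical, chosen only for illustration, and are not drawn from the cited sources.

```python
import sqlite3

# In-memory relational database; the "papers" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")

# INSERT: put records (rows) made up of fields (columns) into the database.
conn.executemany(
    "INSERT INTO papers (title, year) VALUES (?, ?)",
    [("Query Languages", 2009), ("Query Optimization", 2010)],
)

# UPDATE: change a field in an existing record.
conn.execute("UPDATE papers SET year = 2011 WHERE title = ?", ("Query Optimization",))

# SELECT with WHERE: locate and retrieve only the records of interest.
recent_titles = [
    row[0]
    for row in conn.execute("SELECT title FROM papers WHERE year >= 2010 ORDER BY title")
]
print(recent_titles)
```

The WHERE clause is what performs the "locate" step: only the record whose year satisfies the condition is returned.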

Query Languages

In order to understand how query languages work, it is important to describe the environment in which they function. Although query languages differ in terms of purpose, functionality and interface, they share a common feature: they access data represented in a database in some fashion and return the aggregated results to the user. A representational schema needs to anticipate all possible queries and analyses to be performed in an information system. Because relational database management systems were dominant, the corresponding temporal languages were developed as extensions of Structured Query Language (SQL) [6]. Query languages can be designed to allow query computation to be performed internally and to relieve users of having to remember technical operations in a query process [7].

The when, where and what (the so-called "spatio-temporal") factors related to query languages are presented in Table 1 below.

Table 1

Query Language Spatio-Temporal Factors

| Query Language | Description |
| --- | --- |
| Query Spatio-Temporal Information about When | This kind of query is used to obtain information on temporal objects. Answers can be obtained by referring semantic or spatial objects to temporal objects through a proper relation table. Life-oriented questions inquire when birth, death, splitting, merging or reincarnation occur in a certain period of time, while motion-oriented questions ask when a move, jump or spread takes place. |
| Query Spatio-Temporal Information about Where | This type of query aims at obtaining information about spatial objects for locations and spatial properties of a semantic object at a specific time. Where questions can be static (asking whereabouts or states of entities or attributes) or dynamic (asking paths of an entity changing its location through time). |
| Query Spatio-Temporal Information about What | This type of query seeks information about changes in which the focal information is semantic objects such as changes of supermarket services for a particular area. We first identify the area of interest, and then examine what has been changed in that area by referring to its corresponding semantic objects at that time. |

Source: Frank 2001, p. 226
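The three query types in Table 1 can be illustrated with a small sketch. It assumes a single hypothetical "events" table that relates semantic objects (stores), spatial objects (districts) and temporal objects (years); the schema and data are invented for illustration and are not taken from Frank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation table linking an entity, an event, a place and a year.
conn.execute("CREATE TABLE events (entity TEXT, event TEXT, place TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("Store A", "birth", "North District", 2001),
        ("Store A", "move", "South District", 2005),
        ("Store B", "birth", "South District", 2003),
        ("Store B", "death", "South District", 2008),
    ],
)

# WHEN (motion-oriented): in which year did Store A move?
when_moved = [r[0] for r in conn.execute(
    "SELECT year FROM events WHERE entity = 'Store A' AND event = 'move'")]

# WHERE (dynamic): the path of Store A as its location changes through time.
path = [r[0] for r in conn.execute(
    "SELECT place FROM events WHERE entity = 'Store A' ORDER BY year")]

# WHAT: identify the area of interest first, then list what changed there.
changes = conn.execute(
    "SELECT entity, event, year FROM events "
    "WHERE place = 'South District' AND year BETWEEN 2003 AND 2008 "
    "ORDER BY year").fetchall()
```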

In recent years, database engineers have proposed the development of pure database models and query languages that can be used for representing and handling semi-structured data (SSD) in order to facilitate the processing of information exchanges and the efficiency of software systems in a broad array of applications, including genome databases, digital libraries and electronic commerce platforms [8]. According to Stefankis, "In a SSD set it is expected that there will be objects with missing attributes, objects with multiple occurrences of the same attribute, different data types associated with the same attribute in different objects, and/or semantically related information represented differently in various objects. All the above factors render traditional data models (such as the relational and the object-oriented model) inadequate to represent and handle SSD sets" [9].
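The irregularities listed in the quotation can be made concrete with a short sketch. The records below are hypothetical and use plain Python dictionaries rather than any particular SSD model; the point is only that code reading such data cannot assume a fixed schema.

```python
# Three hypothetical records describing books, none sharing a fixed schema.
records = [
    {"title": "Book A", "author": "Smith", "year": 1999},
    {"title": "Book B", "author": ["Jones", "Lee"]},             # repeated attribute, missing year
    {"title": "Book C", "author": "Kim", "year": "circa 2001"},  # same attribute, different data type
]

def authors_of(record):
    """Return the authors as a list, whether stored once, several times, or not at all."""
    value = record.get("author", [])
    return value if isinstance(value, list) else [value]

def numeric_year(record):
    """Return the year when it is stored as an integer, otherwise None."""
    year = record.get("year")
    return year if isinstance(year, int) else None

all_authors = [a for r in records for a in authors_of(r)]
years = [numeric_year(r) for r in records]
```

A relational table would need a single type and exactly one value per column for every row, which is why each accessor here has to handle the missing, repeated and differently typed cases explicitly.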

Recent innovations in technological development have provided two basic alternatives for modeling semi-structured data sets, with the first alternative being created by the database community and the second alternative from software engineers who are actively involved in the development of Web-based technologies to query semi-structured data [10]. Some of the most well-known examples of such languages include the Object Exchange Model (OEM) and the Lightweight Object REpository query Language (LOREL) [11].

Some other current examples of query languages and their applications are presented in Table 2 below.

Table 2

Query Language Types

Query Language

Description

eXtensible Markup Language (XML)

Web-based technology offers the eXtensible Markup Language (XML) and its surrounding technologies that are well suited for modeling and querying semi-structured data (SSD) sets [12]. Extensible markup language is a flexible way to create standard information formats and share both the format and the data on the World Wide Web. It improves the functionality of the Web by letting you identify your information in a more accurate, flexible, and adaptable way [13]. Prior to the introduction of XML, there was SGML (Standard Generalized Markup Language), which was developed in the early '80s and widely used for large documentation projects. The development of HTML (Hyper Text Markup Language) started in 1990. The designers of XML simply took the best parts of SGML, guided by the experience with HTML, and produced something that is no less powerful than SGML; however, it is vastly more regular and simpler to use. It must be said that SGML is mostly used for technical documentation and much less for other kinds of data; with XML, it is exactly the opposite. XML is a pared-down version of SGML, designed especially for Web documents. It allows designers to create customized TAGS (a special word inserted in a document that specifies how the document, or a portion of the document, should be formatted), enabling the definition, transmission, validation, and interpretation of data between applications and organizations [14].
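As a brief illustration of customized tags carrying data between applications, the sketch below parses a small XML document with Python's standard xml.etree.ElementTree module. The tag names (catalog, book, title, price) and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical document whose tags describe the data rather than its formatting.
document = """
<catalog>
  <book id="b1"><title>Query Languages</title><price currency="USD">30.00</price></book>
  <book id="b2"><title>Query Optimization</title><price currency="USD">45.50</price></book>
</catalog>
"""

root = ET.fromstring(document)

# The receiving application interprets the data by tag name.
titles = [book.findtext("title") for book in root.findall("book")]
total = sum(float(book.findtext("price")) for book in root.findall("book"))
```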

SPARQL

Relationship networks can be queried using the SPARQL Protocol and RDF Query Language (SPARQL, pronounced "sparkle"). SPARQL is an SQL-like query language for RDF data and is used here for querying within inference software [15].

Web Ontology Language (OWL-DL)

These standards-based knowledge representation mechanisms provide computationally feasible knowledge representation (KR) for business processes. The Web Ontology Language (OWL) is a W3C standard for semantic knowledge representation. Web Services and Web Services Architecture provide envelope and transport mechanisms for information and knowledge exchange [16].

Queries are posed in terms of a certain query language over the alphabet of the global ontology and are intended to extract a set of tuples of elements of the semantic domain. In accordance with what is typical in databases, each query is required to have an associated arity and to extract only tuples of that arity. Given a source database for O, the tuples of interest are those that are guaranteed to be in the answer to the query for every model of O with respect to the source database. In other words, it is the certain answers that are of interest.

One of the most common ways to express knowledge on a domain of interest is to use class-based formalisms, in which knowledge is represented in terms of objects grouped into classes and relationships between classes. Examples are entity-relationship diagrams in databases, UML class diagrams in software engineering, and ontology languages for the semantic web such as OWL-DL. All such formalisms can be captured in a fragment of first-order logic in which one can express inclusions and equivalences between classes and possibly pose additional constraints on the relations between classes. Such fragments correspond to a class of logics called description logics (DL) [17].

The Web Ontology Language (OWL) is a World Wide Web Consortium standard and a leading approach to semantic Web ontologies. OWL-Description Logics (OWL-DL) uses DL as its fundamental knowledge representation mechanism. Ontology descriptions are presented formally through description logics for theoretical soundness, and in machine-readable format using OWL-DL for practicality. Software reasoners, such as Racer, support concept consistency checking, T-Box reasoning, and A-Box reasoning on models developed using SHIQ description logics translated into OWL-DL. These provide the basis for development of a knowledge base of machine-interpretable knowledge representation, in OWL-DL format, that can be used for developing computational ontologies for knowledge integration in inter-organizational eBusiness processes [18].

Description Logic ALCQI

ALCQI is a notable example of an expressive DL that features constructs that are typical of conceptual modeling formalisms and that in fact allow ALCQI to capture the most important features of such formalisms. The ALCQI DL provides concept constructs for complement, intersection, union, existential restriction, universal quantification, and number restrictions. As for roles, it provides the construct for inverse roles [19].

Contextual Query Language (CQL)

This is a formal language for representing queries to information retrieval systems such as web indexes, bibliographic catalogs and museum collection information. The design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex languages.

Traditionally, query languages have fallen into two camps: powerful, expressive languages that are not easily read or written by non-experts (e.g. SQL, PQF, and XQuery), and simple, intuitive languages that are not powerful enough to express complex concepts (e.g. CCL and Google). CQL tries to combine simplicity and intuitiveness of expression for simple, everyday queries with the richness of more expressive languages to accommodate complex concepts when necessary [20]. CQL is so named ("Contextual Query Language") because it is founded on the concept of searching by semantics or context, rather than by syntax. The same search may be performed in different ways on very different underlying data structures in different servers, but the important thing is that both servers understand the intent behind the query. So that multiple communities can define their own semantics, CQL uses Context Sets to ensure cross-domain interoperability [21].

XPath

This is a node addressing language that is used with XML documents [22]. It allows users to query various index services; the results are gathered and returned to the user in the following steps:

1. The client sends the search request, containing the XPath query, to its nearest DDS.

2. The DDS contacts a Federated Advanced Directory Architecture (FADA) node to search for all the Index Services (ISs) of the system.

3. The ISs search request is broadcast to all the FADA nodes using the FADA internal protocol.

4. The FADA node returns the list of ISs to the DDS.

5. The DDS contacts each IS and issues the XPath query.

6. Finally, the results are gathered and returned to the client [23].
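The six steps above can be sketched as follows. This is an illustrative simulation only: it does not use any real FADA or DDS interface, the function names and index contents are hypothetical, and the XPath expression is limited to the subset supported by Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical index services, each holding a small XML index document.
INDEX_SERVICES = {
    "is-1": "<index><doc lang='en'>alpha</doc><doc lang='fr'>beta</doc></index>",
    "is-2": "<index><doc lang='en'>gamma</doc></index>",
}

def fada_list_index_services():
    """Steps 2-4: stand-in for the FADA lookup; returns the known index services."""
    return sorted(INDEX_SERVICES)

def query_index_service(name, xpath):
    """Step 5: run the XPath query against a single index service."""
    root = ET.fromstring(INDEX_SERVICES[name])
    return [element.text for element in root.findall(xpath)]

def dds_search(xpath):
    """Steps 1 and 6: accept the client's query and gather the results from every IS."""
    results = []
    for name in fada_list_index_services():
        results.extend(query_index_service(name, xpath))
    return results

english_docs = dds_search("./doc[@lang='en']")
```

The client sees a single result list and never needs to know how many index services were consulted.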

Z39.50 search standards for cross-search capabilities for library catalogs

The Open Archives Initiative (OAI) has become another approach to integrated online searching. OAI is a protocol for the automated harvesting of descriptive and location metadata about content from diverse online sources. Metadata is stored in a common index database where it can be searched. Searchers can then be automatically routed to the source content for any retrieved search results. This is basically the same technique employed by Web search engines, which use automated software to collect information about many Web sites, storing it in a common index. The metadata harvesting approach differs considerably from the approach taken by Z39.50.

Z39.50 searches many silos by passing a query to each separate database in a common query language. Responses are then received back from each database, in turn. The broadcast search method is similar to the approach used by many metasearch tools such as WebFeat and MuseGlobal. While Z39.50 searches rely on a common protocol and query language, metasearch tools may have to translate each separate query to suit the individual data source (information silo) being searched. While broadcast searching has had some success, as the number of different online resources grows, metadata harvesting seems to be the more promising approach to search integration [24].
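The difference between broadcast searching and metadata harvesting can be sketched in a few lines. The silos, identifiers and matching rule below are hypothetical simplifications; neither the Z39.50 protocol nor the OAI protocol is actually implemented here.

```python
# Hypothetical silos: each maps a record identifier to descriptive metadata.
SILOS = {
    "catalog": {"c1": "history of query languages", "c2": "library catalogs"},
    "journals": {"j1": "query optimization survey"},
}

def broadcast_search(term):
    """Broadcast style: pass the same query to every silo and collect each response in turn."""
    hits = []
    for silo_name, records in SILOS.items():
        hits.extend((silo_name, rid) for rid, text in records.items() if term in text)
    return sorted(hits)

def harvest():
    """Harvesting style: collect metadata from every silo once into a common index."""
    return {
        (silo_name, rid): text
        for silo_name, records in SILOS.items()
        for rid, text in records.items()
    }

def harvested_search(index, term):
    """Search only the common index; each hit still points back to its source silo."""
    return sorted(key for key, text in index.items() if term in text)

common_index = harvest()
```

Both approaches return the same hits in this toy case. The practical difference is where the work happens: the broadcast search contacts every silo at query time, whereas the harvested search contacts none once the common index has been built.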

ActivePrime

ActivePrime leverages AI-related techniques in three broad categories: lightweight ontologies, search-space reduction (SSR), and query optimization. In summary, lightweight ontologies are deployed as modules and classes in the Python programming language, enabling rapid, iterative development of ontologies using a popular scripting language. The ontologies also benefit from the large repository of built-in Python operators. Sophisticated operations on ontologies can be performed with just a few lines of code. SSR techniques are utilized when performing inexact matching on larger volumes of data, when record counts grow into the many thousands and millions. Query optimization techniques allow for real-time detection of duplicate records when matching one record to a large remote database [25].

Query Optimization

Query optimization is used to allow the most efficient matching of a queried record to a remote database. The query optimization process is fairly straightforward, but the actual process that is used by the query language and the search strategies that are employed by human users will inevitably differ. Generally speaking, though, in order to optimize a query, a user's query is analyzed using the context of the fields (such as company name or state name) together with relevant domain knowledge to expand the query to gather additional information or information about expanded areas of interest. In this regard, Bidlack and Wellman give the example, "For instance, the state Massachusetts may have MA and Mass as synonyms and the query is expanded appropriately" [27]. Although the query optimization process appears intuitive and easy from a human's perspective (perhaps because the process resembles how the human brain processes information in ways that allow for its later access and retrieval), the algorithms that guide the query optimization process are truly sophisticated and robust in specific areas depending on their application. According to Bidlack and Wellman, "Besides expansion through domain knowledge, query expansion occurs using phonetic rules as well as heuristics around transposition and removal of characters. The query optimizer effectively constructs a query that has a high probability of finding potential inexact matches while only retrieving a very small subset of the remote database. The subset of records is then analyzed using SSR techniques to compute actual inexact matches" [28]. The results that are returned to the user are therefore based on several steps that winnow data to provide the most meaningful results to the query, with the respective strengths and effectiveness of each query optimization approach largely depending on the platform or information resource that is involved.
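The query expansion described by Bidlack and Wellman can be approximated with a short sketch. Only the Massachusetts synonyms come from the quoted example; the function names, the field names and the rule for when each heuristic applies are assumptions made for illustration, and phonetic rules are omitted.

```python
# Domain knowledge: the synonyms follow the quoted example; the structure is hypothetical.
STATE_SYNONYMS = {"massachusetts": {"ma", "mass"}}

def character_variants(term):
    """Variants made by removing one character or transposing two adjacent characters."""
    removals = {term[:i] + term[i + 1:] for i in range(len(term))}
    transpositions = {
        term[:i] + term[i + 1] + term[i] + term[i + 2:] for i in range(len(term) - 1)
    }
    return removals | transpositions

def expand_query(field, value):
    """Expand a field value into the set of terms sent to the remote database."""
    term = value.lower()
    expanded = {term}
    if field == "state":
        # The context of the field selects which domain knowledge applies.
        expanded |= STATE_SYNONYMS.get(term, set())
    else:
        expanded |= character_variants(term)
    return expanded

state_terms = expand_query("state", "Massachusetts")
name_terms = expand_query("company", "Acme")
```

The expanded set is deliberately small: it is meant to retrieve a narrow subset of likely inexact matches, which would then be passed to a separate matching step such as the SSR techniques mentioned in the quotation.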

Semantic query optimization (SQO) is described in the context of a real-time computing system that retains local control while scaling to numerous other machines, without relying on centralized query optimization and scheduling techniques [29]. Such a system is characterized as follows: "An enterprise-class data federation system supports dynamic load balancing across system resources. As loads on individual machines and networks change, the system adapts and adjusts query execution. As a result, the system can support many machines with high performance and throughput. Such a system can be viewed as the complement to a transactional approach. From this view, this system is capable of obtaining the desired data in real-time or as near to real-time as possible" [30].

Semantic query optimization uses so-called "integrity constraints" to restrict a search to make it more efficient. According to Minker, integrity constraints that are specifically designed for SQO have been developed for application in deductive databases (DDBs) as well as relational databases; in addition, other researchers have also developed a partial subsumption algorithm for use with the SQO [31]. The general approach to SQO has also been refined:

(1) To allow bottom-up evaluations;

(2) To allow searches of databases with negation in the body of clauses;

(3) To provide processing of recursive rules; and

(4) To provide the foundation for cooperative answering systems and provide information to users about why a particular query succeeded or failed [32].
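A minimal sketch of how an integrity constraint can restrict a search is given below. The constraint, the employee data and the function name are hypothetical, and the check is a hand-written special case rather than the partial subsumption algorithm mentioned by Minker.

```python
# Hypothetical integrity constraint: every employee with role "manager" earns at least 50000.
CONSTRAINT = {"role": "manager", "min_salary": 50000}

EMPLOYEES = [
    {"name": "Ann", "role": "manager", "salary": 72000},
    {"name": "Bob", "role": "clerk", "salary": 31000},
]

def run_query(role, max_salary):
    """Return (names, scanned); scanned is False when the data never had to be read."""
    # Semantic check: if the constraint guarantees that no row can match,
    # answer immediately without touching the data.
    if role == CONSTRAINT["role"] and max_salary < CONSTRAINT["min_salary"]:
        return [], False
    names = [
        e["name"] for e in EMPLOYEES
        if e["role"] == role and e["salary"] <= max_salary
    ]
    return names, True
```

For example, `run_query("manager", 40000)` is answered from the constraint alone. The same check also tells a cooperative answering system why the query failed: no such record can ever exist, as opposed to none existing at present.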

The reasons a particular query succeeds or fails in a given setting can have important implications for future searches. For instance, Minker reports that, "When a query fails, a user, in general, cannot tell why the failure occurred. There can be several reasons: The database currently does not contain information to respond to the user, or there will never be an answer to the query. The distinction could be important to the user. Another aspect related to integrity constraints is that of user constraints" [33]. The constraints that users can place in their queries to optimize the searching function depend on their individual preferences and needs, with various Boolean and other operators being available in different query languages to facilitate the process. In this regard, Bidlack and Wellman report that, "A user constraint is a formula that models a user's preferences. It can constrain providing answers to queries in which the user might have no interest (for example, stating that in developing a route of travel, the user does not want to pass through a particular city) or provide other constraints that might restrict the search" [34]. In sum, various query optimization approaches have been developed that significantly improve the search function and performance of the operating systems on which they are used. These innovations are important for a number of reasons, for both the public and the private sectors. As Bidlack and Wellman point out, "With corporate databases for even midsize companies now growing into the millions of records, and the desire for better results as enabled by domain knowledge integration, high-performance matching is becoming critical to user adoption of any data-quality solution" [35].
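The travel-route example of a user constraint quoted above can be sketched as a breadth-first search that excludes the cities a user does not want to pass through. The road network and function name are hypothetical.

```python
from collections import deque

# Hypothetical one-way road network between cities.
ROADS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def shortest_route(start, goal, avoid=frozenset()):
    """Breadth-first search that never enters a city listed in `avoid`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for city in ROADS[route[-1]]:
            if city not in seen and city not in avoid:
                seen.add(city)
                queue.append(route + [city])
    return None  # no route satisfies the user's constraint
```

Calling `shortest_route("A", "D", avoid={"B"})` restricts the search space before any route through B is explored, so the constraint both reflects the user's preference and reduces the work done.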


Cite This Paper

PaperDue. (2011). Literature survey on database advances and critical issues. PaperDue. https://www.paperdue.com/essay/geometrically-various-query-languages-has-43424
