… relationships and distinctions between the information systems concepts of data warehousing and data mining, which combined with online analytical processing (OLAP) form the backbone of decision support capability in the database industry. Decision support applications impose demands on OLAP database technology that differ from those of the online transaction processing (OLTP) model that preceded it. Data mining with OLAP differs from OLTP queries in its use of multidimensional data models, different data query and analysis tools at both the user-facing front end and the database back end, and different mechanisms for extracting and preparing data before it can be loaded into a data warehouse. The construction of data warehouses entails the operations of data cleaning and data integration, which are key pre-processing steps for enabling data mining. Furthermore, metadata (data about data) is essential to the functioning of a data warehouse, and must be managed appropriately for an effective and efficient installation (Chaudhuri et al., 1997).
The major commercial players in the data warehousing market today include IBM, Oracle-Sun, Teradata and Microsoft. Data mining functionality is typically included within the data warehousing vendor's software suite. Some vendors have specialized further by creating product suites sold as data warehousing appliances. These consist of an integrated, pre-packaged combination of server and storage hardware, with pre-installed operating system and relational database software that has been optimized for typical medium to large scale customer implementations (Microsoft, 2008).
Gartner (2008) predicted that a fifth of all organizations worldwide would have customized software-as-a-service (SaaS) applications created to supplement their business intelligence operations by 2010. The value-added business of information aggregators is to provide domain-specific analysis capability using competitive business information as a base. This by its nature tends to generate monopolies in vertical information domains, due to the need for aggregators to ensure the confidentiality and secure protection of their clients' sensitive business data. Without proper integration with the proprietary internal information stored in data warehouses, customized SaaS-based tools cannot generate the benefits they are expected to provide.
Data warehousing may be defined in its simplest form as "a process of centralized data management and retrieval" (Palace, 1996). Ideally, a data warehouse is the centralized repository of all of an organization's data, made available for users to access and analyze according to their individual needs through the process of data mining. It provides the tools and mechanisms for business executives to systematically organize, comprehend, and utilize their data to make strategic decisions. In recent years, with competition mounting in every industry, data warehousing has become an essential method for organizations to retain customers by learning more about their needs, using a solid platform of consolidated historical data and powerful analysis and mining tools (Berson et al., 1997).
Data mining refers to the analysis, categorization and summarization of data from multiple angles or different dimensions. Palace (1996) defines data mining as "the process of finding correlations or patterns among dozens of fields in large relational databases." The relationships, associations, historical patterns and future trends extracted from data in the database are what constitute useful information or knowledge to the user. Data mining was initially used and promoted by consumer-oriented organizations that needed to deal with large volumes of data related to their business, finances, and customers, so that they could design and price their products effectively to address competition and meet customer priorities.
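To make the idea of "finding correlations ... among ... fields" concrete, the following is a minimal sketch in Python, not a description of any vendor's mining tool. The field names, the sample values, and the 0.8 threshold are all hypothetical, chosen only for illustration; the code computes the Pearson correlation coefficient for every pair of numeric fields and reports the strongly related pairs.

```python
import math
from itertools import combinations

# Hypothetical customer fields (illustrative values, not real data).
FIELDS = {
    "monthly_visits": [2, 4, 6, 8, 10],
    "monthly_spend": [20.0, 40.0, 60.0, 80.0, 100.0],
    "support_calls": [5, 4, 3, 2, 1],
}


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(fields, threshold=0.8):
    """List field pairs whose absolute correlation meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(fields), 2):
        r = pearson(fields[a], fields[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


pairs = correlated_pairs(FIELDS)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A production mining tool would work over far more fields and rows and would use more robust techniques than pairwise correlation; the sketch only shows the basic shape of the search.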
Douq (2009) outlines the set of marketing criteria that are most often addressed by vendors of data warehouse products when comparing their own offerings with those of competing providers. Physical architecture and design, scalability, parallelism, performance and optimization, system availability, and ease of operations and management are the subjects most frequently discussed and debated by vendors and analysts in industry circles.
It is useful to distinguish commercial relational databases from the multidimensional database structures used in data mining and warehousing. Traditional relational databases emphasize the operation of normalization (minimizing data redundancy), and are specifically tuned and organized to permit ad-hoc queries upon normalized data stored in tables and indexes. Multidimensional databases organize data in the form of data "cubes," which can be visualized as data sets and subsets implemented in array structures. A data cube consists of a large set of facts or measures, along with a number of associated dimensions. Dimensions are hierarchical entities about which the organization wants to record and keep information (Berson et al., 1997). For example, a 3-D data cube could display the measure of sales dollars according to the dimensions of city, product and month sold. A 4-D data cube could add the dimension of year sold to the original three. Figure 1 provides a simplified...
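The 3-D cube in the example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any commercial multidimensional database stores its data: the fact rows, the city and product names, and the helper functions `build_cube` and `roll_up` are all hypothetical. Each cell is keyed by (city, product, month) and holds the sales-dollars measure; rolling up sums the measure over the dimensions that are dropped.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, month, sales_dollars).
FACTS = [
    ("Boston", "Widget", "Jan", 120.0),
    ("Boston", "Widget", "Feb", 80.0),
    ("Boston", "Gadget", "Jan", 50.0),
    ("Denver", "Widget", "Jan", 200.0),
    ("Denver", "Gadget", "Feb", 75.0),
]

# Order of the dimensions inside each cell key.
DIMENSIONS = ("city", "product", "month")


def build_cube(facts):
    """Aggregate the sales measure into cells keyed by (city, product, month)."""
    cube = defaultdict(float)
    for city, product, month, sales in facts:
        cube[(city, product, month)] += sales
    return dict(cube)


def roll_up(cube, keep):
    """Sum the measure over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(float)
    for key, sales in cube.items():
        result[tuple(key[i] for i in idx)] += sales
    return dict(result)


cube = build_cube(FACTS)
sales_by_city = roll_up(cube, ["city"])
sales_by_city_month = roll_up(cube, ["city", "month"])
print(sales_by_city)
```

Adding a fourth dimension such as year would only lengthen the key tuple and the `DIMENSIONS` list; the roll-up logic stays the same. The dictionary-of-tuples layout is a design choice made for readability here, whereas the text describes array-based structures.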