Relationships and Distinctions Between the Information Systems Essay

Excerpt from Essay:

relationships and distinctions between the information systems concepts of data warehousing and data mining, which, combined with online analytical processing (OLAP), form the backbone of decision support capability in the database industry. Decision support applications impose different demands on OLAP database technology than did the online transaction processing (OLTP) model that preceded it. Data mining with OLAP differs from OLTP queries in its use of multidimensional data models, in the data query and analysis tools employed at both the user-facing front end and the database back end, and in the mechanisms for extracting and preparing data before it can be loaded into a data warehouse. The construction of data warehouses entails the operations of data cleaning and data integration, which are key pre-processing steps for enabling data mining. Furthermore, the concept of metadata (data about data) is essential to the functioning of a data warehouse, and must be managed appropriately for an effective and efficient installation (Chaudhuri et al., 1997).
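The cleaning and integration steps mentioned above can be illustrated with a minimal Python sketch. The two source systems, their field names and the sample records are all hypothetical; the point is only to show trimming, type conversion, de-duplication and the merging of two sources before loading.

```python
# Hypothetical records from two source systems; names and fields are illustrative.
crm_rows = [
    {"customer_id": " 101", "city": "boston ", "revenue": "1200.50"},
    {"customer_id": "102", "city": "Chicago", "revenue": ""},
    {"customer_id": "101", "city": "Boston", "revenue": "1200.50"},  # duplicate once cleaned
]
billing_rows = [
    {"cust": 101, "plan": "gold"},
    {"cust": 103, "plan": "basic"},
]


def clean(row):
    """Trim whitespace, normalize case and convert types; empty revenue becomes None."""
    revenue = row["revenue"].strip()
    return {
        "customer_id": int(row["customer_id"].strip()),
        "city": row["city"].strip().title(),
        "revenue": float(revenue) if revenue else None,
    }


def integrate(crm, billing):
    """Clean CRM rows, drop duplicates by customer_id and attach the billing plan."""
    plans = {b["cust"]: b["plan"] for b in billing}
    merged = {}
    for raw in crm:
        row = clean(raw)
        row["plan"] = plans.get(row["customer_id"])  # None when billing has no match
        merged.setdefault(row["customer_id"], row)   # keep the first occurrence only
    return list(merged.values())


warehouse_rows = integrate(crm_rows, billing_rows)
for row in warehouse_rows:
    print(row)
```

A production extract-transform-load process would also record metadata about each source and transformation; that is omitted here for brevity.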

The major commercial players in the data warehousing market today include IBM, Oracle-Sun, Teradata and Microsoft. Data mining functionality is typically included within the data warehousing vendor's software suite. Some vendors have specialized further by creating product suites sold as data warehousing appliances. These consist of an integrated, pre-packaged combination of server and storage hardware, with pre-installed operating system and relational database software that has been optimized for typical medium to large scale customer implementations (Microsoft, 2008).

Gartner (2008) predicted that a fifth of all organizations worldwide would have customized software-as-a-service (SaaS) applications created to supplement their business intelligence operations by 2010. The value-added business of information aggregators is to provide domain-specific analysis capability using competitive business information as a base. This by its nature tends to generate monopolies in vertical information domains, due to the need for aggregators to ensure the confidentiality and secure protection of their clients' sensitive business data. Without proper integration into proprietary internal information stored in data warehouses, customized SaaS-based tools cannot generate the benefits they are expected to provide.

Data warehousing may be defined in its simplest form as "a process of centralized data management and retrieval" (Palace, 1996). Ideally, a data warehouse is the centralized repository of all of an organization's data, made available for users to access and analyze according to their individual needs through the process of data mining. It provides the tools and mechanisms for business executives to systematically organize, comprehend, and utilize their data to make strategic decisions. In recent years, with competition mounting in every industry, data warehousing has become an essential method for organizations to retain customers by learning more about their needs using a solid platform of consolidated historical data and powerful analysis and mining tools (Berson et al., 1997).

Data mining refers to the analysis, categorization and summarization of data from multiple angles or dimensions. Palace (1996) defines data mining as "the process of finding correlations or patterns among dozens of fields in large relational databases." The relationships, associations, historical patterns and future trends extracted from data in the database are what constitute useful information or knowledge to the user. Data mining was initially used and promoted by consumer-oriented organizations that needed to deal with large volumes of data related to their business, finances, and customers, so as to be able to effectively design and price their products to address competition and meet customer priorities.
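As a rough illustration of "finding correlations ... among ... fields," the following Python sketch computes the Pearson correlation coefficient for every pair of numeric fields and reports the most strongly related pair. The field names and values are invented for the example, and Pearson correlation is only one of many techniques that fall under data mining.

```python
from itertools import combinations
from math import sqrt

# Hypothetical numeric fields taken from a handful of records.
fields = {
    "ad_spend": [1, 2, 3, 4, 5],
    "units_sold": [2, 4, 6, 8, 10],
    "returns": [5, 3, 4, 1, 2],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Score every pair of fields and pick the pair with the largest absolute correlation.
scores = {
    (a, b): pearson(fields[a], fields[b])
    for a, b in combinations(fields, 2)
}
strongest_pair = max(scores, key=lambda pair: abs(scores[pair]))

for pair, r in scores.items():
    print(pair, round(r, 3))
print("strongest:", strongest_pair)
```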

Douq (2009) outlines the set of marketing criteria that are most often addressed by vendors of data warehouse products in comparing their own offerings with those of competing providers. Physical architecture and design, scalability, parallelism, performance and optimization, system availability, and ease of operations and management are the subjects most frequently discussed and debated by vendors and analysts in industry circles.

It is useful to distinguish commercial relational databases from the multidimensional database structures used in data mining and warehousing. Traditional relational databases emphasize the operation of normalization (minimizing data redundancy), and are specifically tuned and organized to permit ad-hoc queries upon normalized data stored in tables and indexes. Multidimensional databases organize data in the form of data "cubes," which can be visualized as data sets and subsets implemented in array structures. A data cube consists of a large set of facts or measures, along with a number of associated dimensions. Dimensions are hierarchical entities that the organization wants to record and keep information about (Berson et al., 1997). For example, a 3-D data cube could display the measure of sales dollars according to the dimensions of city, product and month sold. A 4-D data cube could add the dimension of year sold to the original three. Figure 1 provides a simplified example of the 3-D case illustrating the conceptual model.
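The 3-D example can be sketched in a few lines of Python. This is a conceptual toy, not how any commercial OLAP server stores its cubes: the cube is a dictionary keyed by (city, product, month) coordinates, the measure is sales dollars, and the sample facts and helper names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, month, sales_dollars).
facts = [
    ("Boston", "Laptop", "Jan", 1000.0),
    ("Boston", "Laptop", "Feb", 1500.0),
    ("Boston", "Phone", "Jan", 700.0),
    ("Chicago", "Laptop", "Jan", 900.0),
    ("Chicago", "Phone", "Feb", 400.0),
    ("Boston", "Laptop", "Jan", 250.0),  # second sale landing in an existing cell
]

DIMENSIONS = ("city", "product", "month")


def build_cube(rows):
    """Sum the sales measure into one cell per (city, product, month) coordinate."""
    cells = defaultdict(float)
    for city, product, month, sales in rows:
        cells[(city, product, month)] += sales
    return dict(cells)


def slice_cube(cube, **fixed):
    """Return the cells whose coordinates match every fixed dimension value."""
    positions = {DIMENSIONS.index(name): value for name, value in fixed.items()}
    return {
        coords: value
        for coords, value in cube.items()
        if all(coords[i] == v for i, v in positions.items())
    }


def roll_up(cube, keep):
    """Aggregate away every dimension except those named in keep."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for coords, value in cube.items():
        totals[tuple(coords[i] for i in idx)] += value
    return dict(totals)


cube = build_cube(facts)
print(slice_cube(cube, month="Jan"))   # all January cells
print(roll_up(cube, ("city",)))        # total sales per city
```

Adding a fourth dimension such as year would mean extending each fact row and the `DIMENSIONS` tuple by one entry; the slicing and roll-up helpers would work unchanged.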

Figure 1. OLAP Cube (Microsoft TechNet, 2011)

Unlike traditional relational database implementations, data may be repeated or reorganized extensively within a multidimensional database to meet the needs for faster search and query operations. Therefore, the needs of data warehouses are most compatible with data mining operations carried out on multidimensional databases (Palace, 1996). Data warehouses commonly utilize a three-tier architecture. The first or bottom tier is the data warehouse database server's relational database system. The second or middle tier is an OLAP server implementing the multidimensional OLAP database functionality. The third or top tier is a client layer providing the user-facing query and reporting tools used for mining the data warehouse (Berson et al., 1997).

Two leading commercial implementations of data warehousing and data mining functionality come from Oracle Corporation and NCR Teradata. Both solutions are based upon relational database management systems (RDBMS) at their core. However, their origins, implementation specifics, and performance characteristics differ significantly. Oracle's database originally evolved to respond to the market for traditional online transaction processing (OLTP), then gradually incorporated data warehousing and mining capabilities through its online analytical processing (OLAP) offerings. OLAP functionality is encompassed within the larger Business Intelligence (BI) disciplines, and includes both relational queries and data mining functions to produce output reports oriented to the business functions of finance, marketing, and management. Oracle's OLAP implementation deals effectively with multi-dimensional data by using algorithms optimized to handle rapid drill-down and aggregation in large data sets. This enables the Oracle data warehouse system to respond to complex information queries that may be posed in different ways from different angles (Douq, 2009).
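Drill-down and aggregation operate along the levels of a dimension hierarchy. The Python sketch below shows the idea on a month-to-quarter time hierarchy. It is a generic illustration with made-up figures and hypothetical helper names, and says nothing about the algorithms Oracle actually uses.

```python
from collections import defaultdict

# Hypothetical time hierarchy: each month belongs to one quarter.
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}

# Hypothetical sales totals at the finest level of the hierarchy.
monthly_sales = {
    "Jan": 100.0, "Feb": 150.0, "Mar": 50.0,
    "Apr": 200.0, "May": 75.0, "Jun": 25.0,
}


def roll_up_to_quarter(by_month):
    """Aggregate month-level totals up one level of the hierarchy."""
    totals = defaultdict(float)
    for month, amount in by_month.items():
        totals[MONTH_TO_QUARTER[month]] += amount
    return dict(totals)


def drill_down(by_month, quarter):
    """Return the month-level detail that makes up one quarter's total."""
    return {
        month: amount
        for month, amount in by_month.items()
        if MONTH_TO_QUARTER[month] == quarter
    }


quarterly_sales = roll_up_to_quarter(monthly_sales)
print(quarterly_sales)
print(drill_down(monthly_sales, "Q2"))
```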

Teradata is generally acknowledged to be the original large scale data warehouse offering. It originated as part of NCR Corporation, and formally separated into its own entity in 2007. The Teradata relational database was created and architected from its earliest beginnings for optimized information retrieval. As such, it is arguably faster and more efficient for certain "pure" data warehousing implementations than Oracle (Douq, 2009).

At a smaller scale, data warehousing and mining capability can also be created using desktop tools such as Oracle MySQL, or Microsoft Access and Microsoft Excel spreadsheets. With the Microsoft product suite, features such as pivot tables, fact tables and the Query-By-Example function enable search indexing for practical performance on databases of over a million records, while bypassing the more sophisticated programming methods involving Structured Query Language (SQL) commonly found in commercial RDBMS products (Microsoft Corporation, 2009).

How effectively a vendor or small business is able to integrate the operations of warehousing and mining of data is a key determinant of not only its competitive strength, but also the type of target implementations where a satisfactory outcome is most likely to result for the end customer. As such, the strategic business intelligence derived from data warehousing and data mining has become a management tool of critical importance to gaining and retaining competitive advantage (King, 2009).

IBM made a strategic entry into the commercial data warehouse appliance space with its acquisition of Netezza as a subsidiary in 2010. Netezza-based appliances feature a proprietary hardware and software implementation called Asymmetric Massively Parallel Processing (AMPP). This architecture incorporates rack-mounted blade format servers and disk storage, with a hardware-based data filtering component using field-programmable gate arrays (FPGA). Following its acquisition of the ten-year-old Netezza technology, IBM has modified the TwinFin standard configuration to exchange processing modules for additional disk storage within the same two or four-rack assembly, to offer a "near-line" data warehouse appliance (Prickett Morgan, 2010, 2011). Figure 2 illustrates a typical example of a large-scale, commercial data warehouse appliance product, the IBM Netezza.

Figure 2. Typical data warehouse appliance (Prickett Morgan, 2011).

The technical implementation of a data warehouse RDBMS can differ substantially from a standard commercial implementation. For example, data warehouses are designed to optimize the speed of the complex data retrieval queries involved in data mining. To accomplish this, a data warehouse RDBMS may store multiple copies of the same data at different levels of granularity, a technique called aggregation. De-normalization of data (that is, the use of data repetition and grouping) is common for read-intensive database applications to ensure adequate query response times. Without de-normalization, performance can be seriously hindered by the overhead involved in accessing normalized logical views or join tables across multiple physical data files (Shin…
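The contrast between normalized storage, a de-normalized copy and a pre-computed aggregate can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical, and SQLite stands in for a warehouse RDBMS purely for illustration; on three rows there is no measurable performance difference, only a difference in how much work each query has to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: the city is stored once per customer.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Boston'), (2, 'Chicago');
INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

-- De-normalized copy: the city is repeated on every order row, so reads need no join.
CREATE TABLE orders_denorm AS
SELECT o.order_id, o.amount, c.city
FROM orders o JOIN customers c ON c.customer_id = o.customer_id;

-- Aggregation: a coarser-grained copy of the same data, computed once at load time.
CREATE TABLE sales_by_city AS
SELECT city, SUM(amount) AS total_amount
FROM orders_denorm
GROUP BY city;
""")

# Query against the normalized tables: a join plus grouping at read time.
normalized_result = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

# Same answer from the pre-computed aggregate: a plain scan of a small table.
aggregate_result = conn.execute(
    "SELECT city, total_amount FROM sales_by_city ORDER BY city"
).fetchall()

conn.close()
print(normalized_result)
print(aggregate_result)
```

The trade-off is that the repeated and summarized copies must be refreshed whenever the underlying data changes, which suits read-intensive workloads that are loaded in batches.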

Cite This Essay:

"Relationships And Distinctions Between The Information Systems" (2011, November 14) Retrieved December 2, 2016, from