Data Warehouse a Strategic Weapon of an Organization

Last reviewed: August 9, 2003

Data Warehousing: A Strategic Weapon of an Organization.

Within Chapter One, an introduction to the study will be provided. Initially, the overall aims of the research proposal will be discussed. Next, the overall objectives of the study will be delineated. After this, the significance of the research will be discussed, including a justification and rationale for the investigation.

The aims of the study are to further establish the degree to which data warehousing has been used by organizations in achieving greater competitive advantage within the industries and markets in which they operate. In a recent report in the Harvard Business Review (2003), it was suggested that companies faced with the harsh realities of the current economy want to have a better sense of how they are performing. With growing volumes of data available and increased efforts to transform that data into meaningful knowledge that can be used to aid in gaining competitive advantage, companies are increasingly recognizing how knowledge can be used as business intelligence to reduce risk and to accomplish business outcomes (Harvard Business Review, 2003). More importantly, companies want to make certain that enterprise data is integrated to the point that it can be used to guide the business in making critical decisions at the right time and right place in relation to customers.

As noted by Foote and Krishnamurthi (2001), until very recently, the forecasting process used by companies was relatively subjective and was dependent upon the opinions of company executives, sales force analysts, and industry analysts, who were not always reliable in helping the company to produce satisfactory outcomes. Quite frequently, as reported by Foote and Krishnamurthi, companies found that they had missed the mark in forecasting and consequently had failed to achieve profitability, reliability and a competitive vantage position in their industry. Thus, companies are increasingly recognizing the value of investing in an information system to support their forecasting process. According to Foote and Krishnamurthi, a data warehouse has come to be identified as assuming a pivotal role in gaining the knowledge needed by companies to implement reliable systems for forecasting. It enables companies to collect data from many sources, perform analyses, and make informed decisions in real time for the purposes of achieving competitive advantages and accuracy in their forecasting operations in an unprecedented manner.

As data warehousing has been identified as offering extensive promise to companies in improving and gaining greater accuracy in forecasting, it is the intent of the study to further examine the documented experiences of companies who have implemented data warehousing in order to gain a better understanding as to whether improved forecasting has been obtained within these companies. As well, it is the aim of the study to further determine whether companies have been able to achieve greater competitive advantage.

Objectives of the Study

The overall objective of the study is to further explore the degree to which data warehousing has been effective in assisting companies with the process and activities of forecasting as well as in gaining competitive advantage.

Significance of and Justification for the Study

As evidenced within the current literature, some companies have reported success with data warehousing while others have not. While Foote and Krishnamurthi (2001) developed a model of the stages of a data warehouse for the purposes of understanding and predicting how companies' data warehouses change over time, it would appear that this model may also offer utility in determining why some companies are more successful and gain greater competitive advantage than other companies. The proposed study offers the opportunity to examine the success or lack of success of data warehouses through the systematic examination of a number of different variables (i.e., those identified in objective 4 above). As well, Foote and Krishnamurthi's stages model of data warehousing has not yet been tested in this manner, and the results of the study may offer an opportunity to further validate the model while demonstrating its potential utility in examining the degree of competitive advantage achieved by companies on the basis of the stages model.

CHAPTER TWO

LITERATURE REVIEW

Data Warehousing: Background

During the 1990s, data warehousing emerged as one of the most important developments in the information systems field. Prior research has suggested that 95% of the Fortune 1000 companies either have a data warehouse in place or are planning to develop one (META Group, 1996). Predictions had suggested that the data warehousing market would grow to a $113.5 billion market by the year 2002, including the sales of systems, software, services, and in-house expenditures (Eckerson, 1998). Such predictions have not been surprising as research findings had suggested that company executives had identified data warehousing and electronic commerce as most critical for their company's strategic initiatives (Eckerson 1999).

As explained by Wixom and Watson (2001), data warehousing emerged largely in response to business need and technological advances. As the business environment has become more global, competitive, complex, and volatile, there has been a greater demand for data warehousing. Customer relationship management and e-commerce initiatives are creating requirements for large, integrated data repositories and advanced analytical capabilities. More data are captured by organizational systems (e.g., barcode scanning, clickstream) or can be purchased from companies. As further explained by Wixom and Watson, through hardware advances such as symmetric multi-processing, massive parallel processing, and parallel database technology, it has now become possible to load, maintain, and access databases of terabyte size. As a result of these changes, organizations have changed significantly in the manner in which they conduct business, particularly in sales and marketing, allowing companies to analyze the behavior of individual customers rather than demographic groups or product classes (Wixom & Watson, 2001).

As data warehousing has emerged, a number of definitions have been applied to describe the activities and tasks involved in the construction of a data warehouse. As defined by Inmon (1992), a data warehouse is a managed database in which the data is:

Subject oriented: There is a shift from application-oriented data (i.e., data designed to support application processing) to decision-support data (i.e., data designed to aid in decision making). If designed well, subject-oriented data provides a stable image of business processes, independent of legacy systems. In other words, it captures the basic nature of the business environment.

Integrated: The database consolidates application data from different legacy systems (usually old-style mainframe databases) which use different encoding, measurement units, and so on, and eliminates inconsistencies in the data.

Time-variant: Informational data has a time dimension: each data point is associated with a point in time, and data points can be compared along that time axis, unlike operational data, which is valid only at the moment of access.

Nonvolatile: New data is always appended rather than replaced. The database continually absorbs new data, integrating it with the previous data.
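
The time-variant and nonvolatile properties described above can be illustrated with a minimal sketch. It is not drawn from Inmon (1992); the class names, the subject key, and the values are hypothetical and serve only to show that every row carries a point in time and that rows are appended rather than replaced.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One immutable, time-stamped observation of a subject."""
    subject: str   # hypothetical subject key, e.g. a customer
    as_of: date    # the time dimension attached to every data point
    value: float


class AppendOnlyStore:
    """Nonvolatile store: rows are appended, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def append(self, row: Snapshot) -> None:
        self._rows.append(row)

    def value_as_of(self, subject: str, when: date) -> Optional[float]:
        """Return the latest value recorded on or before `when`."""
        candidates = [r for r in self._rows
                      if r.subject == subject and r.as_of <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.as_of).value


store = AppendOnlyStore()
store.append(Snapshot("customer-17", date(2003, 1, 31), 1200.0))
store.append(Snapshot("customer-17", date(2003, 2, 28), 1350.0))
```

Because the January row is retained when the February row arrives, a question such as "what was the value in mid-February?" can still be answered by calling `store.value_as_of("customer-17", date(2003, 2, 15))`, which is the kind of comparison along the time axis that operational data does not support.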

As Inmon suggested, data warehousing has been and continues to be an evolving concept. A data warehouse (or smaller-scale data mart) has been described as a specifically prepared repository of data created to support decision making (Gray & Watson, 1998). Data are extracted from source systems, cleaned/scrubbed, transformed, and placed in data stores (Gray & Watson 1998). As further explained by Gray and Watson, a data warehouse has data suppliers who are responsible for delivering data to the ultimate end users of the warehouse, such as analysts, operational personnel, and managers. The data suppliers make data available to end users either through SQL queries or custom-built decision-support applications.
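
The sequence of extraction, cleaning, transformation, and placement in a data store, followed by end-user access through SQL, can be sketched as follows. This is an illustrative example only, not a procedure taken from Gray and Watson (1998); the source rows, the `sales` table, and the cleaning rules are assumptions chosen for brevity, and an in-memory SQLite database stands in for the data store.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system.
source_rows = [
    {"store": " north ", "product": "Widget", "amount": "20.00"},
    {"store": "NORTH", "product": "widget", "amount": "5.50"},
    {"store": "south", "product": "Gadget", "amount": None},  # unusable
]


def clean(row):
    """Drop rows with no amount; trim and normalise the text fields."""
    if row["amount"] is None:
        return None
    return {
        "store": row["store"].strip().lower(),
        "product": row["product"].strip().lower(),
        "amount": float(row["amount"]),
    }


def load(rows):
    """Place the cleaned rows in an in-memory data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (:store, :product, :amount)", rows
    )
    return conn


cleaned = [c for c in (clean(r) for r in source_rows) if c is not None]
warehouse = load(cleaned)

# An end user's SQL query against the loaded data.
totals = warehouse.execute(
    "SELECT store, product, SUM(amount) FROM sales GROUP BY store, product"
).fetchall()
```

In this sketch the two differently formatted "north" rows are reconciled before loading, so the end user's query returns a single total per store and product.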

According to Foote and Krishnamurthi (2001), by definition, a data warehouse is a subject oriented (i.e., by product, store and department), integrated, time-variant and non-volatile collection of data to support decision-making. Conceptually, a data warehouse is created as data from older systems are copied into a new computer system dedicated completely to analyze the data. The purpose behind data analysis is to better understand what is happening, or what did happen within a company. The value of better understanding is translated into better decision-making. As further conceptualized by Foote and Krishnamurthi, there are four commonly used terms to describe the architecture and the functionality of a data warehouse. They are:

Data Mart: A data warehouse about a particular subject such as a store, a department and a product.

Data Warehouse: A repository of data from older systems and other sources that has been cleaned, transformed and duplicated into a data warehouse database.

Enterprise Data Warehouse: A data warehouse built for an entire company.

Operational Data Store: A data warehouse that requires faster response time and update capability. It is extensively used to provide an up-to-date view of data.

According to Babcock (1995), data warehousing tasks are oriented towards information, analysis and decision-making rather than operation or transaction processing. As suggested by Whitten, Bentley and Barlow (1994), data warehouses are best understood as stored data that has been extracted from production databases and conventional files. Kador (1995) emphasized that a complete data warehouse is not possible as the opportunities for developing and applying new tools and products are continually emerging.

On the basis of information provided by Information Sciences at the University of California-Berkeley (1997), there are a number of defining features associated with data warehousing as contrasted with the attributes of operational applications. These include the following:

The data warehouse is oriented around the major subjects of the enterprise, e.g., customer, vendor, product and productivity. Hence it focuses on data modeling and database design exclusively, and it excludes data that is not useful for DSS processing. In contrast, the operational applications are designed around processes and functions, e.g., loans, savings, bank card and trust for a financial institution. Consequently, they are concerned both with database design and process design, and they contain data that satisfies immediate functional/processing requirements.

There are differences in orientation between the data warehouse and operational applications in terms of the relationships of data. While data warehouse data spans a spectrum of time, maintains many relationships, and represents many static business rules (and correspondingly, many data relationships) between two or more tables, operational data maintains an ongoing relationship between two or more tables based on a business rule that is in effect.

Data contained within the boundaries of the warehouse is integrated, that is, stored in a singular, globally acceptable fashion, although the underlying operational systems may store the data in various different ways. Data warehouse systems prove most successful when data can be combined from multiple source applications, when all sorts of data inconsistencies have to be effectively addressed. "Data scrubbing" or "data staging" enables the DSS analyst to focus on using the data that is in the warehouse, without having to wonder about its credibility or consistency. The integration of data is manifested in many ways -- in consistent naming conventions, in consistent measurement of variables, in consistent encoding structures, in consistent physical definition of attributes, and so on.

All data in the data warehouse is "time variant," i.e., accurate as of some moment in time, whereas in the operational environment data is accurate as of the moment of access. Thus, the time horizon represented for the data warehouse is much longer (which can involve years) than that for the operational environment (which ranges from the current values of today to ninety days). Every key structure in the data warehouse contains an element of time either implicitly or explicitly.

The data warehouse is nonvolatile. Data warehouse data is a long series of snapshots, and cannot be updated once correctly recorded, while record-to-record real-time updates -- inserts, deletes, and changes -- are done regularly to the operational environment. That is, once data is loaded into the warehouse from the application-oriented operational environment (and/or external sources), it does not change, but is merely accessed there. Therefore, there is no need to be cautious of the update anomaly, an important factor to consider in operational application systems; nor does data warehousing require the complex technologies supporting backup and recovery, transaction and data integrity, and the detection and remedy of deadlock. Data "updating" in the data warehousing environment consists of periodic mass loading of data from the operational environment. The simplicity of data management and the much less rigid response time requirements allow data warehouse designers to take liberties in optimizing the access of data. De-normalization of the physical data model is conducted to enhance performance and simplicity, which are more prominent for data warehouse operations because the amount of data involved is typically very large.
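
The integration feature listed above, in which data is stored in one consistent encoding and unit of measurement regardless of how the operational systems store it, can be illustrated with a short sketch. The two source systems, their field names, and the code mapping are hypothetical and are not taken from the cited source.

```python
# Hypothetical records from two operational systems that describe the
# same kind of customer data using different conventions.
system_a = [{"cust_id": "A1", "gender": "m", "height_cm": 180.0}]
system_b = [{"cust_id": "B7", "gender": "1", "height_in": 70.0}]

# Assumed mapping from each source encoding to one warehouse encoding.
GENDER_CODES = {"m": "M", "f": "F", "1": "M", "0": "F"}
CM_PER_INCH = 2.54


def integrate(record):
    """Return the record in the single warehouse convention."""
    if "height_cm" in record:
        height_cm = record["height_cm"]
    else:
        height_cm = record["height_in"] * CM_PER_INCH
    return {
        "cust_id": record["cust_id"],
        "gender": GENDER_CODES[record["gender"].lower()],
        "height_cm": round(height_cm, 1),
    }


integrated = [integrate(r) for r in system_a + system_b]
```

After this step every record uses the same field names, the same gender codes, and centimetres, so an analyst querying the warehouse does not need to know which source system a record came from.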

Inmon (1999a) emphasized the importance of monitoring the environment of a data warehouse once it has been deployed. As explained by Inmon, in order to manage the data warehouse environment, two types of data warehouse monitors are required: activity monitors and data base monitors. A data warehouse activity monitor is one that analyses the activity (that is, the queries) operating against the data warehouse. The data warehouse activity monitor addresses the following questions in monitoring efforts:

Who is using the data warehouse?

How much is the data warehouse being used?

What is the nature of the queries that are being asked?

What time of day is the warehouse being used the most?

Are there periodic patterns of usage that are occurring that are notable on a weekly basis? On a monthly basis? On a quarterly basis? An annual basis?

How much growth is there in the usage of the data warehouse?

Should indexes be added to enhance performance?

How should the data warehouse be tuned?

As also noted by Inmon (1999a), it is also important to determine what data is being used in the data warehouse. The reality associated with warehouses is that as the data warehouse grows in size and in importance, the percentage of data that is used actually shrinks. Thus, as explained by Inmon, determination of what data is being used and what data is not being used serves as a basis for removing unused data rather than adding additional disk storage. The data warehouse activity monitor can be used in determining what data needs to be removed.
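
Several of the activity monitor questions, together with the identification of unused data, can be answered from a simple query log. The sketch below is an assumption-laden illustration rather than a description of any monitoring product discussed by Inmon; the log format, user names, and table names are hypothetical.

```python
from collections import Counter

# Hypothetical query log entries: (user, hour of day, table queried).
query_log = [
    ("analyst_1", 9, "sales"),
    ("analyst_1", 10, "sales"),
    ("manager_2", 9, "customers"),
    ("analyst_3", 14, "sales"),
]
# Hypothetical list of all tables held in the warehouse.
warehouse_tables = {"sales", "customers", "returns_1998"}

# Who is using the warehouse, and how much?
queries_per_user = Counter(user for user, _, _ in query_log)

# At what time of day is the warehouse used the most?
queries_per_hour = Counter(hour for _, hour, _ in query_log)
busiest_hour = queries_per_hour.most_common(1)[0][0]

# Tables never referenced in the log are candidates for removal.
unused_tables = warehouse_tables - {table for _, _, table in query_log}
```

In practice the log would cover a far longer period, so that the weekly, monthly, and quarterly usage patterns mentioned above could be examined before any data is removed.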

As discussed by Inmon (1999a), the data warehouse data base monitor is used to address the following questions:

How has growth been occurring in the data warehouse?

What profile is there of data in the data warehouse:

key data?

indexed data?

non-key data?

The data warehouse data base monitor is used to track the contents of the warehouse and how the contents have changed over time. Not only is standard detailed data tracked, but summarized data is tracked as well. In addition, as described by Inmon, the data base monitor is used to profile the classifications of record types within the data warehouse. The results of this form of monitoring may be used by the DSS analyst, who needs to know the profile of data subsets prior to the submission of a query.

According to Hackathorn (1995), five information flows are associated with data warehousing: the first four flows to get data in from legacy systems (Inflow), up to a more compact form (Upflow), down to archival storage (Downflow), and out to consumers (Outflow), and the fifth flow to manage the warehouse itself (Metaflow). Data warehouses require tools to make the functions associated with each flow more effective (Mattison, 1996).

According to Mattison (1996), when considering the tools necessary for developing data warehouses, there are three basic categories based on their activities: acquisition tools (for inflow), storage tools (for upflow and downflow), and access products (for outflow). As explained by Francett (1994), acquisition tools are critical in performing tasks such as modeling, designing, and populating data warehouses. These tools are used to extract data from various sources and transform it (i.e. condition it, clean it up, and denormalize it) to make the data usable in the data warehouse. As well, they are used to establish the meta data, where information about the data in the warehouse is stored.

As explained by Mattison (1996), storage is typically managed by relational databases and other special tools in a way that data is used for effective decision support. Access products, according to Mattison, include data mining tools such as multidimensional analysis products, neural networks, and data discovery tools that support end users in accessing and analyzing the data in the warehouse in various ways. Data mining is the process of making discoveries from large amounts of detailed data (Barry, 1995; Mason, 1995). Data mining tools are used to sift through the data in efforts to determine patterns or similarities in the data. With data mining, data is evolved to information, then to knowledge, resulting in business intelligence by means of a variety of statistical analyses and data visualization (Brown, 1995; Fogarty, 1994).

Deployment Obstacles

Inmon (1999b) identified a number of factors that have served as obstacles in deploying data warehouses. Each of these obstacles will be reviewed.

According to Inmon (1999b), accessing and pulling data from the source for the data warehouse represents one of the most challenging obstacles to the deployment of a data warehouse. Most often, the legacy systems environment serves as the source of the data needed for the warehouse. As noted by Inmon, a number of problems are associated with accessing and securing data from legacy systems including the following:

Finding legacy data: Data is often so secured and convoluted within the legacy system that accessing it without a map is extremely difficult.

Understanding what data exists and means in the legacy environment: Lack of documentation of data within legacy systems makes it difficult to know what data exists and what the data represents.

Efficient traversal of the legacy environment: Legacy systems are complex as is accessing the data secured within systems; thus, finding one's way while attempting to access data can be challenging.

Lack of integration of legacy data: The data found within legacy systems is most often not integrated, requiring data transformation prior to placing the data in the data warehouse.

Inmon (1999b) also explained that the uniqueness of the decision support system (DSS) environment itself represents another major challenge in data warehousing deployment. Most importantly, the DSS environment differs markedly from the classical operational environment, which is the environment that most information professionals are familiar with. According to Inmon, the DSS environment involves a spiral development life cycle while the operational environment is one that is built using the waterfall approach to development. As well, while the operational environment is focused on very current data, historical data is the focus of the DSS environment. Other differences identified by Inmon included the time required for transactions in both environments and the degree of integration representative of each environment.

A third obstacle identified by Inmon (1999b) is that data warehouse deployment requires data models in order to achieve integration. Data models are critical in serving as a model for the work that is required in data warehouse deployment. When data models are not present, as explained by Inmon, one must be designed and developed for utilization.

Inmon (1999b) also explained that data warehouse deployment requires that massive amounts of data be dealt with. While organizations may be used to acquiring massive amounts of data, the management of such data is frequently a new and untraditional activity.

The creation and integration of metadata into the DSS environment also presents challenges to the deployment. While developers and designers are frequently familiar with metadata, professionals working within the operational environment are not used to dealing with metadata. Consequently, as Inmon (1999b) explained, the creation and integration of metadata represents new and unfamiliar territory within the operational environment.

Similarly, as explained by Inmon (1999b), the job skills associated with DSS process and data warehousing are very different from those typically required within the operational environment and are not easily transferable. As well, DSS designers and developers are not familiar with the job skills utilized within the operational environment. Thus, according to Inmon, the lack of transferability of job skills from one environment to the other leads to problems in deployment.

A final major complication identified by Inmon (1999b) arose at the time data warehousing emerged, when established vendors of operational systems proclaimed that their products were what was needed for data warehousing. These products, while suitable for operational environments, were not appropriate for data warehousing, leading to confusion on the part of prospective customers.

Data Warehouse Design

As reviewed by Mailvaganam (2003), the two major design methodologies of data warehousing are those based on the work of Ralph Kimball and Bill Inmon. According to Mailvaganam, the design methodologies developed by Kimball and Inmon are very distinct from each other, with designers tending to represent one or the other of the two methodological schools. Kimball and Inmon share the commonality of viewing data warehousing as separate from OLTP and Legacy applications.

According to Mailvaganam (2003), Kimball views data warehousing as a constituency of data marts, used to enable businesses to achieve departmental objectives. Kimball reportedly recognized the data warehouse as a conformed dimension of the data marts. Thus, as explained by Mailvaganam, a unified view of the enterprise is best achieved via dimension modeling on a local departmental level. The following diagram represents a visual depiction of Kimball's data warehousing design methodology.

As further explained by Mailvaganam (2003), the data warehouse design methodology recommended by Inmon is one that proceeds on a subject-by-subject area basis. Subject areas are selected on the basis of current needs, with other subject areas added to the data warehouse as needs change. From Inmon's perspective, the data mart is the creation of a data warehouse's subject area. The subsequent diagram provides a visual depiction of Inmon's data warehouse design methodology.

Benefits and Disadvantages Associated with Data Warehousing

Sakaguchi and Frolick (1996) conducted a massive review of the literature for the purposes of identifying the benefits and disadvantages that had been documented in relation to data warehousing. The authors reportedly identified 788 articles published from April 1992 to July 1996. After analyzing all of the articles, Sakaguchi and Frolick eliminated some of the articles due to the limited coverage of data warehousing within the publication. The resulting total number of articles used within their comparative literature analysis was 456. On the basis of their review, Sakaguchi and Frolick identified a number of benefits as well as disadvantages that had been identified in the literature as associated with data warehousing. For the purposes of this literature review, the top 5 benefits and disadvantages identified will be reviewed.

As reported by Sakaguchi and Frolick (1996), in terms of benefits, the following factors were identified most frequently, resulting in their rank within the top 5 benefits:

Simplicity: As reported by the authors, simplicity represented the most frequently identified benefit of data warehousing. As summarized by Sakaguchi and Frolick, data warehousing was believed to aid organizations in making their activities simpler by: providing a single image of business reality through the integration of data; allowing existing legacy systems to continue in operation while consolidating inconsistent data from various legacy systems into one coherent set, and providing access to vital information about current operations; providing a means of monitoring and comparing past operations for utilization in predictions of future operations and devising new business processes as well as new operational systems to support those processes; offering a means for storage of and the transformation of massive amounts of historical data for the purposes of creating vital business information; offering a means for a single, centralized data location while maintaining local client/server distribution; and providing a system for communication throughout an organization.

Better quality data for improved productivity: Sakaguchi and Frolick indicated that data warehousing was frequently perceived as offering the opportunity for the use of better quality data, including that which shared the attributes of consistency, accuracy, and documentation. As indicated by the authors, quality data was recognized as providing the means for improved decision-making as well as improvements in productivity.

Fast access: According to Sakaguchi and Frolick, data warehousing was perceived often as reducing the response time necessary to retrieve data throughout the organization, as the data was located in a central place and users could access the data themselves rather than rely on others to retrieve it for them.

Easy to use: The benefits of ease of use were identified within almost one-half of all the articles reviewed by Sakaguchi and Frolick. Ease of use was seen as important because data warehouses do not interfere with normal operations; focus on subjects; support on-time, ad-hoc queries for fast decision-making and regular reporting; and are targeted at end users.

Separate decision-support operation from production operation: According to Sakaguchi and Frolick, data warehouses provide a means for separating operational, continually updated transaction data from historical, more static data required for business analysis, allowing for the use of historical data in decision-making without interfering with the production operation.

The top five disadvantages identified by Sakaguchi and Frolick (1996) included the following:

Complexity and anticipation in development: As documented by Sakaguchi and Frolick, this represented the top disadvantage identified within their literature review. The complexity associated with data warehousing was referred to in relation to building the data warehouse and the uniqueness of the architecture and the set of requirements associated with the individual needs of the organization. The design process was noted as complex and required an awareness on the part of the developers as to the importance of anticipating future ways that the data might be used by the organization as well as the constantly changing needs of the organization and the capabilities of the available and emerging hardware and software.

Takes time to build: According to Sakaguchi and Frolick, the time associated with building a data warehouse was also identified frequently as a major disadvantage, with emphasis given to the amount of time often required for justifying the need for the warehouse.

Expensive to build: As reported by the authors, the expense associated with building data warehouses was also identified as another primary disadvantage.

Lack of API: According to the authors, several articles mentioned that a disadvantage associated with data warehousing was that data warehousing software continues to lack a set of application programming interfaces (API) or other standards that move data smoothly through the entire warehouse process.

End-user training: The need to develop a new "mind-set" with all employees in relation to the use of and the innovative data analysis provided by data warehouses was identified as a disadvantage. The consequent need for extensive training was viewed as problematic.

As prior research has documented, a number of companies have reported greater success with their utilization of data warehousing (e.g., Beitler & Leary, 1997; Grim & Thorton, 1997) in spite of the fact that a data warehousing project is an expensive, risky undertaking. The typical project costs over $1 million in the first year alone (Watson & Haley, 1997). While hard figures are not available, it is estimated that one-half to two-thirds of all initial data warehousing efforts fail (Kelly, 1997). The most common reasons for failure include weak sponsorship and management support, insufficient funding, inadequate user involvement, and organizational politics (Watson et al., 1999).

Numerous studies cited in the literature investigate the factors that affect the implementation of decision-support applications (e.g., Guimares et al., 1992; Rainer & Watson 1995). Even though this information has utility, it does not provide important information about data warehousing as an IT infrastructure component that enables present and future business applications (Duncan 1995). Few studies have examined the implementation success of infrastructure projects (Duncan 1995; Parr et al. 1999).

A number of factors associated with the development and deployment of a data warehouse have been identified as important to further understanding the degree to which data warehousing can aid in achieving competitive advantage and the utility of the data warehouse in utilization for forecasting purposes. As identified within the literature, these factors suggest a stages approach to evaluating the effectiveness of data warehousing efforts and are as follows:

identification of techniques and strategies used in the creation of data warehouses by companies, including methods for extraction, data cleaning, data transformation, and data loading; (Foote & Krishnamurthi, 2001) identification of the characteristics of the data used in the creation of a data warehouse including the major subjects of the organization that the data are organized around (i.e., customers, sales or items produced); the degree to which the data are integrated; time variant associated with the data; and, the volatility of the data used with the data warehouse (e.g., is the data read only or can it be updated and changed by users?); (Inmon, 1992) identification of how the data warehouse is used to support decision making, the tools that are used by users to access the data, the applications that are used in relation to the data warehouse (e.g., a decision support system or an executive information system).

identification of the stage of the data warehouse in its evolution (i.e., initiation, growth, or maturity) by examining the following:

Data -- the number of subject areas, the data model(s) used, and the quantity of data stored

Architecture -- the structure of marts and warehouses

Stability of the production environment -- established processes for maintaining and expanding the warehouse

Warehouse staff -- the experience, skills, and specialization of the warehouse staff

Users -- the types, numbers, and locations of users of warehouse data

Impact on users' skills and jobs -- how users' jobs and required skills change because of the warehouse

Applications -- the kinds of applications that utilize warehouse data

Costs and benefits -- the costs and benefits associated with the warehouse

Organizational impact -- how much impact the warehouse has on organizational performance (Foote & Krishnamurthi, 2001).

Conclusions

While practitioners have provided ample anecdotal evidence of data warehousing success, such reports are no substitute for research efforts and the findings that emerge from systematic investigation of the factors associated with data warehousing success in aiding companies to gain competitive advantage. The lack of such research further justifies the need for the study and for further systematic examination of data warehousing success in terms of competitive advantage, with the stages model serving as the foundation and framework upon which the study is based. The results of the study will therefore be useful in developing a more thorough understanding of data warehousing as a means for gaining competitive advantage.

CHAPTER THREE

RESEARCH METHODOLOGY

Within Chapter Three, the research methodology utilized within the study will be presented. The research design will be addressed as will the procedures utilized for data collection and data analysis.

Research Design

The research design selected for implementation within the study was the case study method. This design was selected for a number of reasons. For example, Yin (1994) indicated that the case study method is a way to investigate real-life field settings and to investigate a phenomenon in its actual context. According to Walker (2002), case studies provide credible representations of reality and aim to let the reader see the reality through the eyes of the individuals and groups that have experienced it. The case study approach has become a major feature in the mainstream of social science and business research. It represents a shift from the use of quantitative to qualitative research methods and from measurements and statistics to descriptive case studies and field research (Walker, 2002).

Yin (1994) proposed the use of multiple-case studies as the preferred data-collection technique for use in a qualitative case study design. Yin asserted that the data obtained with multiple perspectives are rich and constitute the strength of a multiple-case study approach. Yin (1994) also suggested the preference for case studies when researchers pose "how" or "why" questions. Walker (2002) also advocated the use of case studies in qualitative research and Stake (1995) was equally in favor of qualitative case studies when investigating a specific context as it helps the reader to see the event through the description provided by the one that has experienced it.

According to Zucker (2001), a qualitative case study design can be viewed as an alternative to traditional quantitative approaches to research and emphasizes the perspective of the participants as central to the process. Several authors (Creswell, 1997; Stake, 1995; Yin, 1994; Yin, 1999) have used the case study method to develop comprehensive understandings of people and their experiences in relation to particular events and situations. Therefore, the multiple-case study approach was appropriate for this study since it sought to evaluate the experiences of businesses in relation to building and deploying data warehouses.

Data Collection

Using secondary databases, a minimum of four companies that have created data warehouses were identified for inclusion in the study. Using the stages framework for understanding the development and deployment of data warehouses, each of these companies was examined rigorously by reviewing available case information on its data warehousing efforts. Data were gathered on each company for every element of the stages framework outlined within the objectives.

Data Analysis

The information was gathered and analysis of the data was completed through a qualitative examination of the data. Using the stages framework, a case study profile of each company was developed. After review of the individual company profiles, a review of the multiple-case studies for similarities and differences was conducted and key themes, extremes, trends or patterns in the data were documented. No cause and effect relationship was reported, as that is not the intent of descriptive research (Leedy & Ormrod, 2001). Overall, the results of the analysis were used to determine the degree to which data warehousing has aided companies in achieving competitive advantage as well as the factors associated with successful data warehouses.

CHAPTER FOUR

RESULTS OF THE STUDY

Within this section, the results of the study will be presented. Using four existing case studies on companies that have built and deployed or are in the process of deploying a data warehouse, the results were obtained by evaluating each of the case studies using the stages model of data warehouse development and deployment as the framework for examining effectiveness of data warehousing in terms of competitive advantage and forecasting. The results were used to develop a new case study on each of the companies on the basis of the following stages model framework:

identification of techniques and strategies used in the creation of data warehouses by companies, including methods for extraction, data cleaning, data transformation, and data loading;

identification of the characteristics of the data used in the creation of a data warehouse, including the major subjects of the organization that the data are organized around (e.g., customers, sales or items produced); the degree to which the data are integrated; the time variance of the data; and the volatility of the data used with the data warehouse (e.g., is the data read only or can it be updated and changed by users?);

identification of how the data warehouse is used to support decision making, the tools that are used by users to access the data, the applications that are used in relation to the data warehouse (e.g., a decision support system or an executive information system).

identification of the stage of the data warehouse in its evolution (i.e., initiation, growth, or maturity).

Case Study One: Godrej Consumer Products Limited

Godrej Consumer Products Limited (GCPL) is the flagship company of the Mumbai-based Godrej Group. The company manufactures consumer products, including soaps, detergents, and hair care solutions. The group has 18 factories and 120 locations all over India. In 1995, in an effort to make more effective use of the data held in a standardized format on its servers, the company decided to build a data warehouse that would allow the data to be used to enhance business productivity.

Data Warehousing Creation Techniques and Strategies

In terms of the techniques and strategies used to create the data warehouse, GCPL instituted enterprise resource planning (ERP) software to coordinate the common functions of the enterprise. ERP software usually has a central database as its hub, allowing applications to share and reuse data more efficiently than previously permitted by separate applications (Smith, 2002). By capturing source data in one place, a central ERP database provides a foundation for developing a data warehouse in which that data can be manipulated for analysis (Smith, 2002). GCPL implemented MFG/PRO, an ERP software package from QAD, Inc.

Characteristics of the Data

In terms of the characteristics of the data used in the creation of GCPL's data warehouse, prior to building the data warehouse, the company had retained data associated with sales information on goods from the factory sold to distributors; sales information on goods sold by distributors to retailers; and sales information on goods sold by retailers to end-users. Inventory information was also being stored in relation to the factories, distributors and retailers. Data had also been kept and maintained on work flow and processes.

After the data warehouse was developed, the use of the ERP ensured that data was generated in a consistent and structured format which could be easily archived. The data is now highly integrated and allows GCPL to perform contribution analyses, profit and loss analyses, and sales breakup analyses. As detailed by GCPL, the company is now able to generate reports that show the amount of sales that particular products have made (product-wise contribution); the amount of profit that a particular customer has generated for the company (customer-wise contribution); the profit contribution of each factory for the same product; and comparisons of the efficiency of different factories. Data can also be generated to show trends in sales and costs. Vital data is collected and collated to the company's benefit.
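The report types described above amount to grouping sales records along one dimension and summing a margin figure. The following is a minimal sketch in Python under stated assumptions: the record fields (product, customer, factory, revenue, cost), the function name and all figures are invented for illustration and are not taken from GCPL's system.

```python
from collections import defaultdict

# Hypothetical sales facts; field names and figures are invented for illustration.
SALES = [
    {"product": "soap", "customer": "dist_a", "factory": "f1", "revenue": 120.0, "cost": 80.0},
    {"product": "soap", "customer": "dist_b", "factory": "f2", "revenue": 100.0, "cost": 90.0},
    {"product": "hair_dye", "customer": "dist_a", "factory": "f1", "revenue": 200.0, "cost": 150.0},
]


def contribution_by(rows, dimension):
    """Sum (revenue - cost) for each distinct value of the given dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["revenue"] - row["cost"]
    return dict(totals)


# Product-wise and customer-wise contribution.
product_wise = contribution_by(SALES, "product")
customer_wise = contribution_by(SALES, "customer")

# Profit contribution of each factory for the same product.
soap_rows = [row for row in SALES if row["product"] == "soap"]
soap_by_factory = contribution_by(soap_rows, "factory")
```

The same function serves every report because only the grouping dimension changes, which is one reason integrated, consistently structured data makes such analyses inexpensive to produce.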

Access and Utilization of the Data Warehouse

As identified by GCPL, employees of the company now depend heavily on the ERP for their daily operations. The system serves as a decision support platform based on historical sales and cost pattern analysis. As reported by GCPL, such uses of the data warehouse have helped to enhance product lines, build greater customization, and favorably impact the bottom line.

As reported by GCPL, data is extracted from the ERP system with the help of extract routines and uploaded into an Oracle warehouse with the help of upload routines once at the end of every month. This creates a separate undisturbed database in the warehouse. This data is now routinely processed for a few days and results derived from it. The company has planned and is in the process of implementing an e-commerce initiative on the basis of the greater business knowledge gained from use of the data warehouse.
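The month-end extract and upload cycle described above can be sketched as follows. This is an illustration only: two in-memory SQLite databases stand in for the ERP system and the Oracle warehouse, and the table names, column names and function names are hypothetical rather than GCPL's actual routines.

```python
import sqlite3


def extract_month(source, month):
    """Extract routine: read one month's rows (month given as 'YYYY-MM')."""
    cursor = source.execute(
        "SELECT order_id, product, amount, order_date FROM orders "
        "WHERE substr(order_date, 1, 7) = ?",
        (month,),
    )
    return cursor.fetchall()


def upload_month(warehouse, rows, month):
    """Upload routine: replace that month's snapshot so a re-run does not duplicate rows."""
    with warehouse:  # commits on success, rolls back on error
        warehouse.execute("DELETE FROM sales_fact WHERE load_month = ?", (month,))
        warehouse.executemany(
            "INSERT INTO sales_fact (order_id, product, amount, order_date, load_month) "
            "VALUES (?, ?, ?, ?, ?)",
            [(*row, month) for row in rows],
        )
    return len(rows)


# Stand-in for the ERP database (hypothetical schema and data).
erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, amount REAL, order_date TEXT)")
erp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "soap", 120.0, "2003-01-15"),
        (2, "detergent", 80.0, "2003-01-28"),
        (3, "soap", 60.0, "2003-02-03"),
    ],
)

# Stand-in for the warehouse: a separate database that analysis can use undisturbed.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE sales_fact (order_id INTEGER, product TEXT, amount REAL, "
    "order_date TEXT, load_month TEXT)"
)

loaded = upload_month(dw, extract_month(erp, "2003-01"), "2003-01")
```

Keeping the warehouse copy separate from the operational database is what allows the several days of analysis mentioned above to proceed without affecting daily ERP operations.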

The range of data warehousing products used by GCPL includes Oracle Express Server and the OLAP (OnLine Analytical Processing) client. The OLAP client draws on the collected data and performs analysis, calculation, and recalculation to support what-if scenarios and other strategy-setting aids. The Express Server uses a caching scheme to store, manage, and analyze relational data.
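A what-if scenario of the kind mentioned above can be thought of as recalculating a measure after changing one input while leaving the baseline untouched. The sketch below is a hedged illustration in Python, not a representation of how Oracle Express Server works internally; the factory names, unit costs and function names are all assumptions made for the example.

```python
# Hypothetical per-factory figures; names and numbers are invented for illustration.
BASELINE = {
    "factory_a": {"units": 1000, "unit_cost": 4.0},
    "factory_b": {"units": 1000, "unit_cost": 5.0},
}


def total_cost(plan):
    """Total production cost across all factories in a plan."""
    return sum(f["units"] * f["unit_cost"] for f in plan.values())


def shift_production(plan, source, target, units):
    """Return a new plan with `units` moved from source to target; the input is not modified."""
    if units < 0 or units > plan[source]["units"]:
        raise ValueError("cannot shift more units than the source factory produces")
    scenario = {name: dict(values) for name, values in plan.items()}
    scenario[source]["units"] -= units
    scenario[target]["units"] += units
    return scenario


# What if 400 units moved from the costlier factory to the cheaper one?
scenario = shift_production(BASELINE, "factory_b", "factory_a", 400)
saving = total_cost(BASELINE) - total_cost(scenario)
```

Because each scenario is a copy, several alternatives can be compared against the same baseline before any decision is taken.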

As reported by GCPL, data warehousing tools are used mostly by the company for analysis and trends that allow the company to create short- and long-term strategies and business problem solutions. Consequently, the company has suggested that they are now able to divert more production activity to the better performing factory. As noted by GCPL, the limit of the capabilities of the warehouse is bound by the creativity of the end-user.

Stage of the Data Warehouse

On the basis of the information provided by GCPL, it appears that in terms of the evolution of the data warehouse within the company, the company continues in a stage of growth.

GCPL continues to strategize and plan for new ways in which the data warehouse can be used to create business benefit. Efforts have been made, and reportedly continue to be directed, towards cultivating usage of the data warehouse by end users and towards further developing, through training and education, the capability of employees to extract useful business intelligence from the warehouse. The company started the warehouse implementation around three years ago. According to information made available by the company, its usage has recently picked up and is expected to be a driving factor for the business in the near future. There are plans to allow access to the warehouse on the Web. GCPL has reported that the warehouse and its utilization have provided the company with greater competitive advantage and the ability to manage resources better.

Case Study Two: Safeway

Safeway is a supermarket chain located within the UK. With reported annual sales of $10 billion, 70,000 employees, and more than 410 stores, Safeway is the third-largest grocery chain in the UK.

Prior to 2000, the company put together a strategic plan for growing its retail business and gaining competitive advantage in the future, calling the plan "Safeway 2000." The plan represented an effort by the company to focus on more than products and storefronts, with an interest in creating a "marketplace of one." The idea behind this plan was to market Safeway services to the individual, becoming increasingly customer-centered and creating greater business advantage. In order to fulfill Safeway 2000, the company recognized that it needed to make better use of the data available to it to identify and target the buying habits of individual shoppers.

Data Warehousing Creation Techniques and Strategies

Safeway first started working with IBM some years ago to develop better information systems to support its growth in the marketplace. To build a data warehouse that could meet the demands of Safeway 2000, the company utilized IBM's DB2.

According to Jones (2003), the DB2 family spans a wide variety of UNIX®, Linux and Windows platforms and the IBM iSeries™ (OS/400® operating system) and zSeries™ (OS/390®, z/OS®, VM, VSE, and Linux) server lines. As also explained by Jones, DB2 Everyplace™ supports handheld devices and embedded Linux environments and provides data synchronization with larger systems. As reported by Jones, DB2 technologies address emerging customer requirements in several areas:

Autonomic computing requires that servers, operating systems and middleware including DB2, diagnose and correct problems without human intervention. Database self-management and automation for the database administrator are areas of particular emphasis in the most recent edition of DB2.

Standards-based Web services have emerged as a new style of application processing with full support from DB2.

Grid computing, or the idea of large-scale computing resources used as a utility or service, including database services, takes advantage of the vast clustered scalability of DB2 to support large databases and large numbers of simultaneous users in a highly available manner. Standards-based Web services are another key component of grid computing supported by DB2.

The "e-business on demand" business model requires an operating environment built on open standards to allow quick and cost-effective innovation and reconfiguration. The infrastructure to support e-business on demand must be reliable, scalable and secure. DB2 is part of that infrastructure.

In terms of DB2's database administration, as reported by Jones (2003), the system has attempted to ease the burden of database administration in several ways:

Its Control Center provides a central place for DBAs to perform their work across networks of DB2 systems.

An array of advisor tools provide expert resource monitoring, problem diagnosis and corrective action. An example of this is the Configuration Advisor used to rapidly achieve peak DB2 performance for new installations on UNIX, Linux and Windows. Another is the Health Center, which serves as a centerpiece for much of the recent DB2 work on self-management. Its rules-based problem diagnosis and corrective action capabilities complement the new DB2 Performance Expert and DB2 Recovery Expert tools, an emerging class of IBM database tools providing more expert guidance and automatic action than previously possible.

Continued advancements in cost-based optimization and automatic query rewrite technologies, there since the beginning of DB2, continue to remove the burden of DB2 performance management from the database administrator. The goal for each new version of DB2 is to require fewer and fewer database administration resources. DB2 benefits from the overall IBM focus on and investment in autonomic computing.

As well, as noted by Jones, DB2 is the core of a wide variety of data management products and solutions in the areas of: business intelligence, enterprise content and records management, and federation and information integration.

Characteristics of the Data

Safeway is challenged by storing and using data on 410 stores and 25,000 product lines while attempting to serve the individual needs of more than six million customers by analyzing every item in each shopper's basket. Prior to the creation of Safeway's data warehouse using DB2, managing, using and accessing this data was reportedly a statistical nightmare. As reported by Safeway, the complexity of managing nearly 30 gigabytes of data each day, warehoused in two data centers linked by satellite network to each point of sale, represented a major challenge in itself. Equally challenging, prior to the deployment of the DB2 system, was the process of accessing the data on customer spending and utilizing it in such a way as to develop an understanding of the spending profile of individual customers. Through the deployment of the DB2 system, customer data has been structured and organized to create a dynamic database, with data mining utilized to work the data stream continuously, segmenting Safeway's Information Warehouse into hundreds of customer characteristics.
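To make the idea of deriving customer characteristics from basket data concrete, the sketch below aggregates basket lines into a per-customer spending profile and assigns a simple label. It is an assumption-laden illustration in Python: the source does not describe the data mining algorithms Safeway used, so a plain "largest spending category" rule stands in for them, and the customer identifiers, categories and amounts are invented.

```python
from collections import defaultdict

# Hypothetical basket lines: (customer_id, category, amount). Invented for illustration.
BASKET_LINES = [
    ("c1", "baby", 30.0),
    ("c1", "produce", 10.0),
    ("c2", "wine", 45.0),
    ("c2", "produce", 5.0),
    ("c3", "produce", 20.0),
]


def customer_profiles(lines):
    """Aggregate spend per customer and per category."""
    profiles = defaultdict(lambda: defaultdict(float))
    for customer, category, amount in lines:
        profiles[customer][category] += amount
    return profiles


def dominant_category(profile):
    """Label a customer by the category with the largest spend (a stand-in rule)."""
    return max(profile, key=profile.get)


profiles = customer_profiles(BASKET_LINES)
segments = {customer: dominant_category(profile) for customer, profile in profiles.items()}
```

A production system would derive many characteristics per customer rather than one label, but the shape is the same: item-level records are rolled up to the individual shopper before any targeting decision is made.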

Access and Utilization of the Warehouse

On the basis of information provided by Safeway, it would appear that the data is accessed primarily by data/system analysts within the company. The reports generated from the data warehouse are used by Safeway executives for a number of marketing activities, including tailored and individual mailings, as well as for analyzing product performance and forecasting shopping patterns. Reportedly, the data warehouse has come to actually drive the growth of Safeway's business by focusing on the meaning behind the data analyses generated. Data mining efforts have helped Safeway's marketing, finance, and retail departments get a handle on the strategic information buried in its data assets, allowing for identification of regional variations in sales and product trends. Safeway has indicated that the warehouse has aided it in decision-making and in making more effective marketing decisions, enabling the company to manage the business through data on customers.

Stage of the Data Warehouse

On the basis of the information provided by Safeway, it appears that in terms of the evolution of the data warehouse within the company, the company continues in a stage of growth. It continues to expand its utilization of the data warehouse in determining further ways in which the company can boost its business performance and advance its long-term strategies and goals.

Case Study Three: Wachovia Corporation

Wachovia Corporation is an interstate bank holding company that offers credit and deposit services plus insurance, investment and trust products to consumers in the southeastern U.S. In order to provide these services, Wachovia offers a network of retail offices and ATMs as well as telephone and Internet banking. Wachovia also serves customers nationwide through its credit card business. The company, while maintaining a commitment to grow the number of accounts, is also committed to a strategy centered on building relationships with existing customers so they bring repeat business and referrals. Increasingly, Wachovia has recognized the importance of marketing the right products to the right customers based on good information and advanced decision support tools. While many banks reportedly purchase name-and-address lists and send out millions of applications, Wachovia was interested in developing a strategy that would allow it to refrain from expending resources on mass mailings and instead market to clients who it could be reasonably certain would respond favorably to specific products.

Data Warehousing Creation Techniques and Strategies

Wachovia has built an enterprise-wide data warehouse. As reported by the company, the data warehouse runs on DB2 in a distributed UNIX (AIX) environment.

In building the data warehouse, the company wanted a warehousing strategy that would maximize its investment by processing the greatest amount of data in the least amount of time. To handle the warehouse processing load, Wachovia exploits RS/6000 SP technology, clustering multiprocessing nodes in a parallel environment, which allows for sorting and processing data in parallel.
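The general pattern behind sorting data in parallel is to partition the records, sort each partition independently, and merge the sorted runs. The following Python sketch illustrates that pattern only; it is not Wachovia's or IBM's implementation. Worker threads stand in for cluster nodes, and because of CPython's global interpreter lock the example shows the structure of the technique rather than a real speed-up. The function name and sample values are assumptions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(records, partitions=4):
    """Split records into partitions, sort each in a worker, then merge the sorted runs."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # Round-robin partitioning keeps the chunks roughly equal in size.
    chunks = [records[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        sorted_runs = list(pool.map(sorted, chunks))
    # A k-way merge combines the already sorted runs into one ordered sequence.
    return list(heapq.merge(*sorted_runs))


# Hypothetical account balances, invented for illustration.
balances = [530, 12, 999, 47, 305, 8, 761, 220, 64]
result = parallel_sort(balances, partitions=3)
```

On a real cluster each partition would live on a separate node with its own processors and disks, so the per-partition sort would run concurrently and only the final merge would need coordination.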
