Data Warehouse technology has changed the way that global organizations conduct business. Many have found it impossible to create a business strategy without a data warehouse. The purpose of this discussion is to research and explain the importance of data warehouse management.
We will begin by defining the data warehouse and describing the business uses for the technology. Our discussion will then focus on data warehouse management. We will examine the three components of data warehouse management. In addition, we will discuss the assurance of safety and privacy, which are needed to maintain the integrity of the data warehouse. Our discussion will also cover the availability and reliability of the data warehouse. Finally, we will investigate different management tools that are used to maintain the data warehouse.
Definition of Data Warehouse
A data warehouse is defined as "A Database system containing large amounts of data that uses sophisticated software optimized for fast searches and data retrieval." (Compact American Dictionary of Computer Words) There are two main components that make up the data warehouse: the data mart and the info mart. The data mart is defined as a file that houses data that is clean and ready to be analyzed and requires no additional manipulation on the part of the engineer. (The Quality Data Warehouse 1999) The info mart is described as data that is used to produce reports, user interfaces and graphs. An info mart aids users in making important strategic decisions. (The Quality Data Warehouse 1999)
Data warehouses are an indispensable part of any global organization. Data warehouses are used to keep track of sales, inventory, and customer spending patterns. ("Data Warehousing") In fact, "a data warehouse may contain very different things, ranging from the traditional financial, manufacturing, order and customer data, through document, legal and project data, on to the brave new world of market data, press, multi-media, and links to Internet and Intranet web sites." (Barker 1998)
Data warehouses allow firms to learn more about their customers so that they can develop strategies to maximize profits and minimize cost.
A white paper published by the SAS Institute explains that a data warehouse "delivers 'one version of the truth' across the enterprise. This allows meaningful comparisons between plants, production lines, and products. The data become information that is meaningful for all levels of decision-makers within the company. For the IT staff, data are in a clean, consistent, and documented format. For the engineer or analyst, data are convenient, in a common format, and if desired, exportable to other common formats." (The Quality Data Warehouse 1999)
The majority of data warehouses that exist today are created by integrating data from various sources into one database. (Barker 1998) However, some companies use more advanced data warehouses that can duplicate files such as graphs, images, sounds and drawings. Many data warehouses can also store a combination of structured and unstructured data. (Barker 1998)
Data Warehouse Management
Managing a data warehouse can prove to be a complicated task. There are several steps that must be taken to ensure the quality and safety of the data warehouse. Therefore the management of a data warehouse must be carefully planned and coordinated. This section of the report will discuss the tasks involved in the management of a data warehouse. Let's begin by discussing the three components of data warehouse management.
The Three Components of Data Warehouse Management
According to a report created by the Veritas Software Corporation, there are three components of data warehouse management. Without proper management of these components, maintenance of the data warehouse would be impossible. The components (shown in the image below) are load management, warehouse management and query management.
Load management is the most important of the three and involves "the collection of information from disparate internal or external sources." (Barker 1998) The loading component of data warehouse management is so important because it involves transforming data into a format that is conducive to processing. During load management, the raw data should also be retained within the data warehouse. (Barker 1998)
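The loading step described above can be illustrated with a minimal sketch. It is not taken from the Veritas report; the `Warehouse` class, the `normalize` and `load` functions, and the field names (`customer`, `cust_name`, `amount`, `total`) are all hypothetical, chosen only to show records from two differently shaped sources being transformed into one common format while the raw records are kept.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for a data warehouse."""
    raw: list = field(default_factory=list)      # source records, kept as received
    cleaned: list = field(default_factory=list)  # records in the common format


def normalize(record: dict) -> dict:
    """Map a source record onto one assumed common schema."""
    name = record.get("customer") or record.get("cust_name") or ""
    amount = record.get("amount", record.get("total", 0))
    return {"customer": name.strip().title(), "amount": float(amount)}


def load(warehouse: Warehouse, source_name: str, records: list) -> None:
    """Keep each raw record, then store its transformed counterpart."""
    for record in records:
        warehouse.raw.append({"source": source_name, "record": dict(record)})
        warehouse.cleaned.append(normalize(record))


warehouse = Warehouse()
load(warehouse, "sales_system", [{"customer": " alice smith ", "amount": "19.50"}])
load(warehouse, "web_orders", [{"cust_name": "BOB JONES", "total": 5}])
```

After both loads, `warehouse.cleaned` holds two records with the same keys, while `warehouse.raw` still holds the original values exactly as each source supplied them.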
Warehouse management concerns itself with the everyday management of the data warehouse. The maintenance of the data warehouse depends on maintaining its security, keeping it available to users and creating backups of the warehouse contents. (Barker 1998) Maintaining the warehouse in this manner ensures that the data warehouse works properly.
Query management is the process of granting permissions to users so that they have access to the contents of the warehouse. (Barker 1998) Query management is important because it restricts access to the people whom the company wants to have the information. In many cases permissions are enforced with passwords or through some other custom application that is already in the system. (Barker 1998)
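As a rough illustration of permission-based query management, the sketch below grants named users access to named tables and refuses any query that has not been granted. The `QueryManager` class, its methods, and the sample table and user names are hypothetical; a real warehouse would rely on the access controls of its own database system.

```python
class QueryManager:
    """Hypothetical permission check placed in front of warehouse queries."""

    def __init__(self):
        self._grants = {}  # user name -> set of table names the user may read

    def grant(self, user, table):
        self._grants.setdefault(user, set()).add(table)

    def revoke(self, user, table):
        self._grants.get(user, set()).discard(table)

    def can_query(self, user, table):
        return table in self._grants.get(user, set())

    def run_query(self, user, table, warehouse):
        """Return the rows of `table` only if `user` holds a grant for it."""
        if not self.can_query(user, table):
            raise PermissionError(f"{user} may not query {table}")
        return list(warehouse[table])


tables = {"sales": [("widget", 3)], "customers": [("Alice", "NY")]}
manager = QueryManager()
manager.grant("analyst", "sales")

rows = manager.run_query("analyst", "sales", tables)  # allowed
try:
    manager.run_query("analyst", "customers", tables)  # never granted
    denied = False
except PermissionError:
    denied = True
```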
Safety and Privacy
According to the Veritas report, safety is one of the most important aspects in the management of a data warehouse. The report asserts that there are some very simple things that should be done when an organization is considering the safety of a data warehouse. One of the primary considerations should be a backup system. All of the files that are contained within the warehouse should be backed up in the event of system failure or in case a virus infiltrates the system. (Barker 1998)
The Veritas report also claims that it is important to have a plan for disaster recovery. (Barker 1998) This plan is accomplished through the development of a recovery site which houses all of the pertinent files. These files are updated on a regular basis to ensure that users get the correct data in the event of a disaster. (Barker 1998)
Additionally, it is imperative that the data be balanced between offline and online storage. The report explains, "A well balanced system can help control the growth and avoid 'disk full' problems, which cause more than 20% of stoppages on big complex systems. Candidates for offline storage include old raw data, old reports, and rarely used multi-media and documents." (Barker 1998) When considering offline storage, Hierarchical Storage Management (HSM) also needs to be considered.
HSM involves the ability to quickly send data offline into a second storage location. (Barker 1998) The user can still access the data but it is stored in a different place. (Barker 1998) The report explains that, "when accessed, the file is returned to the online storage and manipulated by the user with only a small delay. The significance of this to a data warehousing environment in the first instance relates to user activities around the data warehouse." (Barker 1998)
Hierarchical Storage Management is also important because it addresses the disk space problems that are common in data warehouse management. (Barker 1998) Quite often the raw data that is stored in the online warehouse takes up a great deal of space. (Barker 1998) Using HSM to store the data in a different location frees disk space. HSM also shortens the time that it takes to back up files in the warehouse. (Barker 1998)
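The HSM behavior described above, in which rarely used files move to a second storage location and return when accessed, can be sketched as follows. This is a toy model under stated assumptions: the `HierarchicalStore` class is hypothetical, both storage tiers are plain dictionaries, and "rarely used" is approximated by least recent access. It does not represent how any particular HSM product works.

```python
class HierarchicalStore:
    """Toy two-tier store: an online tier and an offline tier."""

    def __init__(self):
        self.online = {}
        self.offline = {}
        self.last_access = {}  # file name -> logical time of last use
        self._clock = 0

    def _tick(self):
        self._clock += 1
        return self._clock

    def write(self, name, data):
        self.online[name] = data
        self.offline.pop(name, None)
        self.last_access[name] = self._tick()

    def migrate(self, keep_online):
        """Move all but the `keep_online` most recently used files offline."""
        by_recency = sorted(self.online, key=self.last_access.get, reverse=True)
        for name in by_recency[keep_online:]:
            self.offline[name] = self.online.pop(name)

    def read(self, name):
        """Return a file, recalling it to the online tier first if needed."""
        if name not in self.online:
            # Raises KeyError if the file exists in neither tier.
            self.online[name] = self.offline.pop(name)
        self.last_access[name] = self._tick()
        return self.online[name]


store = HierarchicalStore()
store.write("old_raw_data", b"raw")
store.write("old_report", b"report")
store.write("current_sales", b"sales")
store.migrate(keep_online=1)          # only the newest file stays online
recalled = store.read("old_raw_data")  # transparently brought back online
```

The user calls `read` the same way whether the file is online or offline, which mirrors the point that the data remains accessible even though it is stored in a different place.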
Another important part of data warehouse safety is versioning and data warehouse recovery. According to the report created by Veritas, versioning involves the creation of a clone or checkpoint that allows the user to view a "version" of the warehouse at a specific time. Barker explains the versioning process saying,
"This is achieved by creating a storage checkpoint in the file system that underpins the database and the normal files that constitute the online data warehouse. Subsequent changes of any block in the database continue to be held in the live database, while their corresponding before images of the blocks are held in a 'checkpoint file'. Thus applications could log on to a storage checkpoint version of the data warehouse, say taken at exactly 6:00 P.M. on a Friday, and conduct a whole series of analysis over several days purely on the warehouse data as it was at that instant." (Barker 1998)
Versioning and checkpoint recovery permit the organization to create complete backups of the data warehouse. They also make possible a new standard of decision making within the corporate entity. In addition, versioning allows the system to be recovered faster in the event of a system shutdown.
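The checkpoint mechanism in the quotation, where changed blocks stay in the live database and their before images go to a checkpoint file, can be modeled with a short copy-on-write sketch. The `CheckpointedStore` class, the block identifiers, and the checkpoint label are hypothetical, and the sketch assumes a fixed set of blocks held in memory; it illustrates the idea only and is not the Veritas implementation.

```python
class CheckpointedStore:
    """Toy block store with copy-on-write storage checkpoints."""

    def __init__(self, blocks):
        self.live = dict(blocks)  # block id -> current contents
        self.checkpoints = {}     # label -> {block id: before image}

    def create_checkpoint(self, label):
        # A new checkpoint starts empty: nothing has changed since it was taken.
        self.checkpoints[label] = {}

    def write(self, block_id, value):
        if block_id not in self.live:
            raise KeyError(block_id)  # assumption: the set of blocks is fixed
        for before_images in self.checkpoints.values():
            # Save the old contents only on the first change after a checkpoint.
            before_images.setdefault(block_id, self.live[block_id])
        self.live[block_id] = value

    def read(self, block_id, checkpoint=None):
        """Read live data, or the data as it was when `checkpoint` was taken."""
        if checkpoint is not None:
            before_images = self.checkpoints[checkpoint]
            if block_id in before_images:
                return before_images[block_id]
        return self.live[block_id]


store = CheckpointedStore({"b1": "sales=100", "b2": "stock=7"})
store.create_checkpoint("friday_6pm")
store.write("b1", "sales=250")
store.write("b1", "sales=300")

live_value = store.read("b1")
frozen_value = store.read("b1", checkpoint="friday_6pm")
```

Only the blocks that change after the checkpoint consume extra space, since unchanged blocks are read straight from the live data.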
Safety and privacy issues concerning the technology can also have an effect on consumers. Since many data warehouses contain sensitive information about consumers, organizations have to reassure consumers that their information is safe. Hackers can infiltrate a data warehouse and retrieve information about customers. They can then use this information to obtain credit cards and purchase products. Stolen identity is a major problem facing the 21st century consumer. When data is stored online it is vulnerable to hackers. This data is also vulnerable to hackers if it is stored offline and is not protected by a firewall or some other safety measure. When an organization puts safety first, it will be able to manage its data warehouses more effectively. (Barker 1998)
Organizations must also stipulate to customers whether or not their information will be sold to other companies. In…