Data Warehouse a Strategic Weapon of an Organization

Last reviewed: July 31, 2003

Growth Aided by Data Warehousing

Adaptability of data warehousing to changes

Using existing data effectively can lead to growth

Uses of data warehouses for Public Service

Getting investment through data warehouse

Using Data Warehouse for Business Information

Ongoing changes in Data Warehousing

The Origin of Data Warehousing and its current importance

Relationship between new operating system and data warehousing

Developing Organizations through Data Warehousing

Telephone and Data Warehousing

Choose your own partner

Data Warehousing for Societal Causes

Updating inaccessible data

Data warehousing for investors

Usefulness of Data warehouses for fashion industry

Section 7 -- Conclusion

Data Warehouses must expand to meet user needs

Costs of IT projects

Developing Software

Usefulness of Data mining

Business Information from Data Warehousing

Business Information is a step beyond Data Warehouse

Development of Data Warehousing into management tools

Data warehousing for e-business

Section 8 -- Bibliography

Section 9 -- Personal Evaluation

DATA WAREHOUSE: A STRATEGIC WEAPON OF AN ORGANIZATION

ABSTRACT

This is a summary of the business of data warehousing today. We first explain the meaning of the term "data warehouse." The discussion then covers different aspects of the data warehouse, including how the use of data in day-to-day operations differs from its use in business analysis. A few very recent instances in which data warehouses have been used successfully are then given. These cases show that data warehouses are used by practically every type of business, including hardcore engineering, banks, consumer soft goods such as apparel and outdoor gear, consumer services such as airlines and telephones, public service bodies such as cancer research organizations, and companies seeking investment by presenting themselves better to potential investors. One of the great qualities of data warehousing is that company data held in forms no longer used in the company can be brought back to life and made accessible with this technique. Of course, like any other business tool, it is not magic and cannot solve all possible problems of the organization.

After this, the problems faced by organizations with data warehouses are dealt with. The major difficulty noticed is that data warehouses are often under-budgeted, which results in a need for rapid expansion. This is expensive but essential, as most organizations have had an overflow of users after implementing or revamping data warehousing projects. Many of them are new users who try to use the resource in ways that require too much computer time and other resources to answer their queries. This, however, should be seen as positive feedback on the use of data warehouses rather than as a failure. Some new techniques have also emerged in the profession that allow other uses of the data warehouse, mainly for business applications. All this indicates that data warehousing is still an important and useful technique for most businesses. It is also a relatively recent technique and may develop further in the years to come.

TERMS OF REFERENCE

Data warehousing is one of the newest tools to have been made available to business. It was impossible in the days of paper information storage. The information may then have been available in various files in different places, but it was difficult to retrieve and analyze. Limitations of the medium and of the human mind came into play. Today, data is stored in electronic bits, which are consolidated into "bytes." This data can be transported freely over wire across previously unthinkable distances. As a result, data is available to everybody who wants it. Data can also be collected at remarkable speed and is easily available over the telephone and over the Internet.

A little more than ten years have passed since this first started. At that time, the main component of the data revolution, the computer, was much more primitive than it is today in almost every respect: processing speed, storage capacity, and data transfer. The Internet was also in its infancy, and the concept began more as an experiment than as a serious attempt to modify the way of doing business. Today, it has become much more important to business; many businesses are being affected and others infected. Many have succeeded in using these tools to improve the quality of their business. This has affected all areas of business, including the holiest cow of all, decision-making. Profits and competitiveness are being affected, and businesses are using this new tool for their survival and development. We have studied this in many cases, which are dealt with in the discussion of results.

One must continue to remember that this tool exists because of the development of new-age information technology, which has advanced at incredible speed. About ten years ago, the random access memory of a standard personal computer (PC) was 4, 8, or 16 MB (million bytes). Today, it is in the range of 256, 512, or 1,024 MB, a jump of 64 times. Processor speeds have increased from around 500 MHz (million cycles per second) about three years ago to nearly 2,400 MHz today. Hard disk capacity has increased by a similar factor of about 4. It is said that computer components become outdated in less than a year. The other part of data warehousing is the speed of data transfer. This depends on the communication links, which are basically the telephone links. These have also been developing very rapidly, thanks to improved links through satellites and undersea cables. Technological developments in cable production and in information transfer have enabled cables of similar size to carry much more data.
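
The growth factors quoted above can be checked with simple arithmetic. The short Python sketch below uses only the memory and processor figures given in the text; the helper name growth_factor is illustrative and not part of any cited source.

```python
def growth_factor(old: float, new: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


# RAM sizes in MB, as quoted in the text (about ten years apart).
ram_then = [4, 8, 16]
ram_now = [256, 512, 1024]
ram_factors = [growth_factor(o, n) for o, n in zip(ram_then, ram_now)]

# Processor clock speeds in MHz, as quoted in the text (about three years apart).
cpu_factor = growth_factor(500, 2400)

print(ram_factors)  # every pairing gives a factor of 64
print(cpu_factor)   # a little under 5
```

Each memory pairing gives the same factor of 64, while the processor figures work out to a little under five times over the shorter three-year span.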

The financial support for this expansion is coming not only from the U.S. but from all over the world. Data can often be transferred from one point to another through a number of routes, and the route is decided first, depending on which links are free. The Internet has developed from a military technology into a tool in every home. Customers are using these tools to make decisions on their purchases, and the role and effectiveness of the salesman has declined to the point where he is now redundant in many businesses. On the business side, these tools have pushed up the effective use of data warehousing, but new forms are beginning to emerge. These are right now a quantitative change, but they may soon develop further and take us to a stage where the concept of data warehousing itself may change. These are dealt with in our section on The Problems to be Answered and Possible Solutions. The results are the findings with reference to the targets that we have set for ourselves in this section.

LITERATURE REVIEW

Implementing data warehousing has become very popular among organizations that are trying to get their hands on information quickly (Adhikari, 1996; Kador, 1995). The term "data warehousing" was first coined by William Inmon in 1990 (Hackathorn, 1995; Kador, 1995), and since then most IS managers and vendors have found it attractive (Francett, 1994; Parsons, 1995). A survey conducted by the Meta Group stated that nearly 95% of the two hundred and fifty companies contacted planned to introduce or use data warehousing during the forthcoming year. Considering that only 15% had used it the previous year, this was a spectacular change (Bull, 1995b).

Another survey, conducted by Forrester Research Inc., found that nearly 96 of the senior IS managers surveyed at Fortune 1000 firms intended to use data warehouses. Of these, nearly 60% expected that data warehousing would improve and increase overall access to corporate data, and 32% felt that data warehousing was a way to broaden their corporate strategies and improve their business processes, along with improving customer support and enabling individuals to become acquainted with new prospects coming their way (Adhikari, 1996). Furthermore, many vendors, on realizing this, decided to produce software, hardware and tools that perform data warehousing functions more effectively, thus trying to increase their profits (Francett, 1994; Parsons, 1995). A study by the Meta Group stated that data warehousing was expected to grow to nearly eight million dollars by 1998 (Barney, 1995). Although people have shown an interest in data warehousing, not much academic research has been conducted or published.

The basic advantage of data warehousing is said to be that it is very "simple." The concept is said to be simple because it provides one view of the business as a whole by integrating data. Data warehousing allows legacy systems to continue operating while data from the various legacy systems is merged. This is intended to form a single coherent picture and to yield important information about the current operation of the organization (Hackathorn, 1995; Wallace, 1994a). Current operations can also be compared and evaluated against the existing historical data; alongside this, predictions about future operations and new business procedures can be developed, and new systems can be planned to sustain these processes (Fairhead, 1995; Hackathorn, 1995; Ricciuti, 1994a; Smith, 1995d; Wallace, 1994a; Weinberg, 1995a).
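
The idea of merging data from several legacy systems into one view of the business can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two legacy systems, their field names (CUST_NO, AMT_CENTS, customer, total) and the common SaleRecord format are all hypothetical and are not taken from any of the cited articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleRecord:
    """Common warehouse format (hypothetical)."""
    customer_id: str
    amount: float
    source: str


def from_billing(row: dict) -> SaleRecord:
    # Hypothetical legacy billing system: numeric customer key, amounts in cents.
    return SaleRecord(str(row["CUST_NO"]), row["AMT_CENTS"] / 100, "billing")


def from_orders(row: dict) -> SaleRecord:
    # Hypothetical legacy order system: padded text key, amounts stored as text.
    return SaleRecord(row["customer"].strip(), float(row["total"]), "orders")


billing_rows = [{"CUST_NO": 101, "AMT_CENTS": 2500}]
order_rows = [
    {"customer": " 101 ", "total": "40.00"},
    {"customer": "102", "total": "15.50"},
]

# Both legacy sources are converted into the one common format.
warehouse = [from_billing(r) for r in billing_rows] + [from_orders(r) for r in order_rows]

# One consolidated view: total sales per customer across both systems.
totals = {}
for rec in warehouse:
    totals[rec.customer_id] = totals.get(rec.customer_id, 0.0) + rec.amount

print(totals)
```

The legacy systems themselves are left untouched; only the copied and reformatted records are combined, which is what allows a single view of each customer.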

These data warehouses can also store huge amounts of historical and corporate data that most companies regard as potentially valuable for important business purposes (Bull, 1995b; Brown, 1995; Cafasso, 1994d; Eckerson, 1993b; Hackathorn, 1995; Lisker, 1994; Nash, 1995c; Smith, 1995f; Wallace, 1994a). Another advantage of these data warehouses is that they are single and centralized but can be used with local or distributed client-server systems (Ricciuti, 1994a). These warehouses are also company-wide systems (Hoffman and Nash, 1995) and hence can be used for corporate-wide communication (Seybold, 1995).

There is also mention of other data qualities such as consistency, accuracy and documentation (Ladaga, 1995; Ricciuti, 1994a; Wallace, 1994b). Decision-making was found to improve with the use of online analytical processing (OLAP), and a big improvement was seen in data mining analysis (Barry, 1995; Barquin, 1995; Broda, 1995). Another commonly mentioned advantage is that it allows speedy access. Because users can access their data on their own, the workload of IS is reduced. Because the information needed is available in one place, the response time of the system is reduced (Bull, 1995b; Fairhead, 1995; Lisker, 1994; Parsons, 1995; Reardon, 1995). Some articles in particular describe data warehouses as "easy to use" (Barquin, 1995; Broda, 1995). The best part of data warehousing is that even when many individuals are accessing the data for their business purposes, the operational database is not slowed down, because the operational data is kept in a separate database specially designed for that purpose (Bull, 1995b; Burleson, 1995; Fairhead, 1995; Lisker, 1994; Ricciuti, 1994a; Smith, 1995d; Smith, 1995f; Wallace, 1994a; Wallace, 1994b).

Most data warehouses concentrate on specific subjects (Barquin, 1995; Broda, 1995), are timely, and support ad hoc inquiries, which encourages immediate decision-making, as well as regular reporting; all of this is intended to make them more convenient for end users (Adhikari, 1996; Burleson, 1995; Smith, 1995d; Wallace, 1994a; Wallace, 1994b). Another reason for building data warehouses is to separate the data used for operations, which is constantly being updated, from the existing historical data, so that managers and analysts can use historical data to support decision-making without affecting production operations (Francett, 1995b; Taft, 1995; Wallace, 1994a).

A few articles have also mentioned that data warehousing makes business so much easier and more convenient that businesses benefit from it and are able to become more competitive; it also enables them to understand their consumers' needs better and to meet market requirements adequately (Wallace, 1994a; Wallace, 1994b). Though it is expensive, these benefits to the organization can justify the expense (Barquin, 1995). Some articles discuss how data warehouses help gather information from parts of the organization that are not easily accessible and put it to good use. For these functions, software such as middleware, data transfer software and client-server tools is used. It can therefore justifiably be said that a data warehouse is the ultimate distributed database (Burleson, 1995; Reardon, 1995; Wallace, 1994a).

Other articles state that data warehousing is a testing ground for newly architected operational systems (Hackathorn, 1995). With its help, paper files are greatly reduced (Cafasso, 1994d; Hackathorn, 1995; Ladaga, 1995; Parsons, 1995; Santosus, 1995; Wallace, 1994b), and although the initial investment is high, once it is covered the information technology staff do not need many more resources (Barquin, 1995). Another important point of discussion is that data warehouses handle large amounts of data from operational sources and, along with collecting this data, also manage its transfer. To meet changing business needs, the production systems keep changing and keep updating their data encoding and structures. The metadata of these data warehouses assists in a continual incremental refinement that keeps track of the production systems and the changing business environment (Barquin, 1995; Hackathorn, 1995). Still other articles suggest that processing data in parallel helps users perform their operations more quickly (Brown, 1995; Bull, 1995b; Stedman, 1995a).

Users can now also ask about details that were previously too technical to be answered, and ultimately data warehousing helps in managing more customers, more users, more transactions, and more messages. It supports the higher performance most needed in client-server transactions, provides unlimited scalability, and ultimately gives better performance, which leads to a better price (Capacity Management Review, 1995). Another article states that data warehouses enable users to obtain their data directly, to improve the data obtained from different software applications without affecting the operational database, and to incorporate various business tasks into one efficient process that is in turn supported by real-time information. This makes robust processing engines available to users (Goldberg, 1995b; Seybold, 1995). Other articles state that a data warehouse can be built on a high-end PC or even a mainframe, though most organizations use UNIX servers and run it in a client-server environment. IBM and five other software vendors formed a partnership to smooth the way for cross-platform data warehousing implementations. Some other software vendors have also formed such joint ventures. The independence that was not available in legacy systems is now available in data warehouses and has proved to be vital (Wallace, 1994a).

A few other articles on data warehousing mention that it helps the organization build a computing structure that can support any changes made to computer systems and business structures. Some articles also state that data warehouses make it easier for the employees of an organization to make decentralized decisions, as the data is easily accessible to users as well. They give end users faster access to information without having to draw on other systems or resources. This is also beneficial because users do not have to ask the IS managers for the information they need, which frees the IS managers to perform other tasks. It also removes the need for the middlemen who were used to transfer information to other places (Bull, 1995b; Seybold, 1995a).

One more benefit of data warehousing is pragmatic benchmarking. By providing the quantitative metrics, drawn from past data, on which most business processes need to be based, data warehouses allow business managers to evaluate progress. A few other articles discuss the fact that clients of a data warehouse cannot directly query the production database, which helps protect privacy and the level of production (Ricciuti, 1994a). A few warehouses specially designed for the purpose also provide security for various management services (Smith, 1996).

Data warehousing has its disadvantages too. The most frequently mentioned disadvantage is that it is complex to develop. IS needs to develop a warehouse whenever it requires one, as each warehouse is built for a specific purpose and cannot be bought off the shelf (Ladaga, 1995; Myers, 1995b). IS must also ask a large set of questions while building it (Redding, 1995; Goldberg, 1995b). While building the database, the builders need to pay enough attention to the structure, the definition and the flow of data, along with the hardware and software (Hildebrand, 1995; Adhikari, 1996; Edwards, 1995; Wallace, 1994a). For data warehouse construction it is also essential to analyze how the collected data will be accessed in the future (Goldberg, 1995b). The individuals developing the database need to keep in mind the changing aspects of their business and the changing hardware and software available (Lardear, 1995a). They need to be able to measure the warehouse requirements and meet user demand for quantity and intricacy (Lardear, 1995a), which makes their development more multifaceted. Alongside this, they may also face the difficulty of choosing the correct product (Harding, 1994). Precisely because such a database is huge and complex, specialists are needed to develop it (Harding, 1994).

Some other articles state that it takes nearly two to three years to build a database (Goldberg, 1995b; Hildebrand, 1995; Ladaga, 1995; Redding, 1995). It takes even longer where there is no executive sponsorship and no IS director or other person interested in developing the warehouse. Some articles also state that these warehouses are very expensive, costing nearly two to three million dollars. The main reason these data warehouses are expensive is that the data must first be converted from its existing databases into a common format before it is copied, at times probably manually, into the warehouse. Other authors suggest that the software still lacks adequate application programming interfaces (APIs) for transferring data through the warehouse process, such as the Open Database Connectivity (ODBC) interface (Microsoft Corp.). At the same time, the ODBC API, which allows a PC to access different databases, is not available everywhere. It is essential that employees be open to the change they will have to accept in the form of the innovative data analysis that data warehouses provide, and they need to be ready to take training to prepare themselves for that change. A communication plan is also essential for this.

Some writers state that if symmetric multiprocessing (SMP) and massively parallel processing (MPP) are introduced into data warehousing, it becomes more complex (Wallace, 1994b). They feel that synchronization and the ability to share data (Goldberg, 1995b; Burleson, 1995) will become difficult. Since the main purpose of data warehousing is to gather data together, it is a centralized option. This point is made in nearly 5, or 1%, of these articles. Where many companies are bringing their warehouse database together, the fact that such a database can be used at only one site because it is centralized is a drawback. Some authors have also noted that these databases are drawn from operational databases that are changing all the time. A real-time warehouse with a full-scale database is not possible and is simply an oxymoron. The corporate data stored in the warehouse (Burleson, 1995) belongs only to a particular period and hence eventually becomes out of date until the warehouse is reloaded.
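
The point that warehouse data belongs to a particular period until the warehouse is reloaded can be illustrated with a small Python sketch. It is a sketch under stated assumptions only: two in-memory SQLite databases stand in for the operational system and the warehouse, and the table names orders and orders_snapshot and both helper functions are hypothetical.

```python
import sqlite3

# Two separate in-memory databases stand in for the operational system
# and the warehouse (both hypothetical).
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

operational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
operational.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])
warehouse.execute("CREATE TABLE orders_snapshot (id INTEGER, amount REAL)")


def reload_warehouse() -> None:
    """Replace the warehouse snapshot with a fresh copy of the operational data."""
    rows = operational.execute("SELECT id, amount FROM orders").fetchall()
    warehouse.execute("DELETE FROM orders_snapshot")
    warehouse.executemany("INSERT INTO orders_snapshot VALUES (?, ?)", rows)


def warehouse_total() -> float:
    """Analytical query; it reads only the warehouse copy."""
    return warehouse.execute("SELECT SUM(amount) FROM orders_snapshot").fetchone()[0]


reload_warehouse()
total_after_load = warehouse_total()

# The operational system keeps changing after the load ...
operational.execute("INSERT INTO orders (amount) VALUES (?)", (5.0,))
total_while_stale = warehouse_total()   # ... but the snapshot has not moved

reload_warehouse()
total_after_reload = warehouse_total()  # the new order appears only now

print(total_after_load, total_while_stale, total_after_reload)
```

The analytical query never touches the operational database, which is the benefit described earlier, and the price of that separation is the staleness between reloads described here.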

The advantages of data warehousing are that the data structures are simple and easy to use and that they enable individuals to access data quickly and obtain good-quality output that benefits productivity and decision-making. At the same time, the complex procedures involved in development and the time and cost of building warehouses are the biggest disadvantages. This provides a wide range of issues for researchers to investigate.

METHODOLOGY

In order to answer the research questions and therefore satisfy the objectives of this dissertation, a research methodology needs to be ascertained.

Saunders et al. (2003) indicate that business and management research needs to engage with both the world of theory and the world of practice. Consequently, the problems addressed should grow out of the interaction between these two worlds rather than out of either on its own.

Different researchers have generated a variety of classifications of secondary data. Fig. 2 shows three main subgroups of secondary data: documentary data, survey-based data, and those compiled from multiple sources.

Fig. 2. Types of secondary data (Saunders et al., 2003)

Because of the specific nature of my topic, which concerns data warehousing, I am going to base my dissertation on the literature review already presented in the previous chapter and on secondary data. As my research project requires an understanding of several case studies that have already been conducted, secondary data will provide the main source for my work.

The research will consist of analysis based on data collected from secondary sources. The analysis will take a focused approach to the chosen topic.

I have chosen this specific topic for my research because this area of business has recently been widely explored by many authors, and several contributions have been written on the subject. But there are always questions to be answered. I expect that the knowledge gained while drawing up this dissertation will help me to understand data warehousing as a strategic weapon of an organization and to gain a new point of view on it.

RESULTS

Growth aided by Data Warehousing:

The first case discussed is that of Cabela's. This case clearly demonstrates one of the most important aspects of data warehousing, namely the growth of the organization. This growth was achieved because of the speed of data analysis and the storage capacities available. The company has talked in terms of 700 GB of data. In terms of printed paper, that would mean over 1 billion typed sheets. The cost of typing such amounts of data would have been too high for an organization like Cabela's to afford. Thus one may safely say that only the modern technique of data warehousing and storage, made possible by present-day technology, has made this success possible at Cabela's.

Adaptability of data warehousing to changes:

The second case, that of France Telecom, shows that after some time data warehousing systems need to change. This is only natural, especially when the data warehousing system is for a company in one of the highest-growth areas of today's business and industry. The resulting implementation has shown that data warehousing was able to rise to the challenge. This means that in similar applications data warehousing will remain useful in its present form for some time. Adaptation in terms of software and hardware, however, seems inevitable. Even a residential user has to update his computer regularly. The efficiency and the commercial requirement are also clearly evident. Again, this is evidently a tool that the business requires for its survival.

Using existing data effectively can lead to growth:

In the case of KDDI, the data already existed, and KDDI wanted to use it more effectively in view of changed business circumstances. For this purpose the data had to be processed, and new software was introduced for this processing. The important consideration here was the speed of processing. The same data could have been stored in another manner, but its existence in electronic form helped the speed of processing. Further, appropriate tools were required, and without them the processing would not have been fast enough either. This again was a requirement for software, and it again shows that data warehousing was essential for the company. It also shows that, although this is a relatively new field, the software has been developed well enough to meet many common requirements without too much difficulty.

Uses of data warehouses for Public Service:

Apart from the development of data warehouses for business purposes, as in the previous instances, the service is also required for the public good. The CDC and its assisted units store a great deal of data that is used for developing new techniques and medicines against different diseases. This data is also stored in electronic form and often needs updating as the technologies involved develop. People are getting to a stage where they would prefer to get all the data they need while sitting at their desks. This was once an impossible dream, and one had to make personal visits to collect the required data. This took time, and the spread of diseases was not controlled fast enough. The recent example cited here shows the importance of modern techniques to the possible control of one of the worst diseases known to man, cancer. The next case, cited for Boise, is a similar operation, though it is purely commercial. Often enough, the requirements of different types of organizations may be the same for technical reasons, though the organizations themselves may be quite different.

Getting investment through data warehouse:

It is often not enough to have the data; it also has to be adequately presented. This is most important when the presentation has to be made to users outside the company. They will naturally have less incentive to keep trying, and if they do not get enough data, or get out-of-date data, they will end up believing that the company is trying to hide something. Such impressions are often formed impulsively but are difficult to remove. In today's business world, most large companies are not owned by individuals but by anonymous investors from all over the world. These people have enough money to invest but are not committed to any company or industry. They are trying all the time to maximize their own returns. Different investor groups, their consultants, and mutual funds all have the same basic objective. Today we are gradually moving into a new world that will be very small in terms of communication, and it is a clear sign of the times to come that an organization like Siemens has decided to inform investors of its actual position as fast as it informs its own staff. It is hoped that this effort will succeed and that Siemens will see an improvement in its market valuation because of such a direct and daring step. This is a very important area of current business decisions, and the example of Siemens is critical.

Using Data Warehouse for Business Information:

This is one of the areas in which data warehousing is playing a major role. Organizations whose data is not very critical in terms of usage can also use their data effectively and produce reports that increase the effectiveness of the organization in raising profits. The role of business information in the development of the organization is no doubt very important, but without organized storage of data this business information cannot be collected. It is a natural development, and a sign of the organization's progress, when it starts using data warehouses for further applications.

Ongoing changes in Data Warehousing:

In many cases, when a corporate data warehouse is first set up, the management does not really know how it will be used. There is a natural tendency to err on the side of caution. When the business people get to use it, they discover the capacity of the warehouse to analyze data quickly and accurately and to give them analyses previously thought to be impossible. This encourages more and more people to use it, which builds up pressure on the system. Other people gradually get to know about it; certain databases that were not thought to be relevant also become required in the analyses and begin to be linked up. This begins to increase the size of the database. Thus the data warehouse keeps growing, and more people get to use it. In certain cases, like that of Japan Airways, the growth is huge and really surprising, but growth is certainly a regular phenomenon. This also shows the remarkable utility of a data warehouse.

Cost acceptability is a problem that will always arise whenever changes from existing systems are sought. The changes have to be looked at in business terms, and in many situations the solution suggested has to be implemented. It is also better to include a cost justification report as part of the project when it is first presented. Such a report will probably cause the project to be treated seriously from the beginning and not left on one side until the problems become insurmountable. The problem is as old as business itself, and getting a businessman to spend money has never been easy. Data warehousing projects are easily justifiable, and all they need for support is for people to push them with the proper reports.

The major problems with data warehousing are those that develop from the popularity of the concept and the volume of data that has built up. This is essentially a problem of quantity. There is also a question of diversity, which has come up because of the different types of data storage systems that are being used and have been used in the past. The different types of system in use are a natural consequence of the competitive nature of our business. This multiplicity of systems will remain, and the transfer of data between platforms will continue in the future. There can be no permanent solution to this problem, and newer software will continually be developed. This only shows that data warehousing is still very important to organizations and that the system problems can be taken care of by the professionals within the field.

Some of the required changes that are coming to light are simple. Attempts are being made to find ways in which data warehousing can be better used for the organization. The most important factor here is the speed of data analysis, and certain tools already exist for this. Experts in the matter have provided some guidelines, which have been noted. These are sets of important tips for the staff entrusted with responsibility for data warehousing in organizations, or for the consultants who help them solve their problems or suggest changes.

Data mining is a concept that has existed for quite some time but was believed to be too expensive and time consuming to carry out. This is because most people believed that the data had to be put in a separate data mart and mined separately for each purpose. Recent advances in technology have made it possible to process the data in parallel within the data warehouse itself. This has reduced cost and time, and the technique may today be well within the reach of organizations that have a proper data warehouse. It is a useful management tool for some decisions and essential under certain other conditions. It seems to have become a by-product of data warehouses that store data in a form that permits parallel processing. This has only increased the importance of data warehousing for organizations.

Business intelligence is a related tool that has been in existence for some time but has not been used properly, because it lacked support both from the organizational IT staff and from the actual users who were supposed to benefit from it. It also suffered from proprietary architecture and separate operations. Today information technology itself has advanced a great deal and systems have become much more attractive. Interaction with the computer has become much easier. As a result, this is another related activity that may be promoted.

The concept of business intelligence has advanced considerably, and data warehouses are an essential part of the system. Certain new techniques have emerged that make the extraction of information simpler and possibly less expensive. These techniques allow the data to be accessed immediately in its original location and remove the need for copying. This will make it possible for people to get the information almost immediately. This is an important development for data warehouses, as it will help them play a bigger role in decision making by the executives of the organization.

A new concept has emerged to make data warehousing totally dynamic by coupling it with the business information system. Such a system would not only store the data, as is done in data warehousing, but also use the data almost immediately to chart the future course of the organization. It was felt that this would help the new system guide the organization to bigger successes. This view may be applicable to certain organizations that are not data intensive, meaning that they do not need to store large amounts of data. A large amount of data may be very difficult to treat in this manner. Alternatively, the system may use only aggregates of certain data for this purpose and not the data itself. This is an interesting concept and needs more careful analysis before further development and application in real-life organizations. The concept of a new system of databases for e-business is quite esoteric; it may be useful for some organizations that deal specifically with such business problems, but it does not yet seem to be of enough general importance.

DISCUSSION

The Origin of Data Warehousing and its current importance:

The importance of data warehousing to companies in organizing their information became clear at the beginning of the 1990s. Since then there has been a period of trial and error through which the initial premise on which data warehousing was started has been established. Today most people believe that improvements in information lead to improvements in business. In the beginning data warehousing was considered important, but not critical, for business. Individual executives used the collected data to improve the quality of their decisions. These decisions mainly concerned the location of new factories, the launch of new products, the closing of branches and similar matters. The data was used to reinforce existing management practices, and not for fresh thought. The hardware and software of those days were less reliable than today and failed frequently, so information was sometimes not available when needed. Sometimes the updated data did not match the old data, but at that time this was not considered important. In the initial stages, data could not be fed into the computer systems where people were working, so they had to be called out and given the information. Beepers were often used for this purpose, and the more often a person's beeper rang, the greater his perceived importance. Gradually, the beeps kept increasing as data came to inform management decisions more and more.

Some of the stored data is critical to the operation of the business in terms of the revenue it earns. An example of this type of data is the state of seat reservations in an airline. Each seat sold is revenue earned by the company, and if the system is down for any reason, it is likely that some seats that could have been sold will not be sold. For the company this results in a revenue loss, and so the data storage and retrieval may be considered critical. This is a fairly simple example, but in other cases the benefits may not be seen so directly. Take the case of the Bureau of Transportation Statistics in the U.S. Department of Transportation. It provides a great deal of information, in both direct and analyzed form, for use by companies outside the U.S. Government. Its portal has recently been upgraded to provide information more quickly. Users can now access more recent data, analyzed into different forms, for business planning or analysis. The flow of goods is related to many connected industries, and this data may reveal the trends for their own business.

The other aspect is the quality of the decisions taken by the people who run the operations side. Today, they are informed immediately of changes that have taken place. When this information arrives quickly, it helps them make better decisions. They can analyze applications better, check on the legalities immediately if there is a need, and send out corporate alerts immediately through e-mail and other devices. This enables people even at lower levels, and not only top management, to know the correct decision. The third big impact of the availability of data is to improve the general efficiency of the business. This is viewed today as the most important aspect of business. Earlier, business concentrated on improving customer satisfaction through improvement of the marketing process. Today, business concentrates on the reduction of expenses, as the world has become much more competitive. It is a question of survival for some businesses. For others, it is a question of improving their own business. Thus, data warehousing has become very important to many industries. (Hackathorn, 2003)

Relationship between new operating system and data warehousing:

The concept of a data warehouse arises from the idea that data is best stored apart from the data being used in the operational systems. The reasons for this separation are in part historical. The older data storage systems, called legacy systems, shifted historical data from the operating disks onto tapes once it was no longer being used for operations. Analysis reports were run from these historical data tapes and not from the operational data. One of the reasons was that this approach had the least impact on the operational system. This remains one of the main reasons why the data continues to be separated even in present-day data warehouses, except that the decision now needs to be taken when the system is first set up.

Many advances have taken place in technology, and the processes of analysis are now much more sophisticated and complex. Decision makers today are not satisfied with standard reports alone but demand many more reports requiring online and multi-dimensional analysis. Another important reason for the separation of data is that successful data warehouses normally combine data from many sources, or many operational systems. When the data is combined for the purpose of the data warehouse, it is natural to think of combining it not at the operational place where it is used but at some other place. Before the development of data warehouses, this data would have been taken from different sources and then combined into a single spreadsheet or database for use. The data warehouse of today may effectively combine data from the different functional areas of business such as sales, marketing, finance, and production.

Some present-day systems also permit the addition of this data in installments on a regular basis. The main reason for combining data from different sources is to be able to cross-reference all the data from these sources. Basically, all data in a typical data warehouse is benchmarked against its time of origin. Thus time becomes the main criterion for separating out the data. The queries answered from the data in the warehouse are normally given for every week, month, quarter, or year. Another popular analysis is the comparison of performance for months and quarters on a year-to-year basis. An example would be comparing the data for July 2003 with the data for July 2002. This is expected to remove seasonal variations. It can also be used to judge the effectiveness of special efforts made during specific periods, such as promotion campaigns or special incentives. Attempts have even been made to evaluate general advertising campaigns in this manner. It is said that data warehouses give people within the organization the ability to understand the interplay of the activities of the different groups within it. This may be one of their biggest advantages.
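The year-to-year comparison described above can be illustrated with a small Python sketch. It is only an illustration under assumed conditions: the row layout (year, month, amount), the function names and the sample figures are hypothetical and are not taken from any system discussed in this paper.

```python
from collections import defaultdict


def monthly_totals(rows):
    """Sum amounts per (year, month) from (year, month, amount) rows."""
    totals = defaultdict(float)
    for year, month, amount in rows:
        totals[(year, month)] += amount
    return dict(totals)


def year_over_year(totals, year, month):
    """Fractional change against the same month one year earlier.

    Returns None when either period is missing or the earlier total is zero.
    """
    current = totals.get((year, month))
    previous = totals.get((year - 1, month))
    if current is None or not previous:
        return None
    return (current - previous) / previous


# Hypothetical sales rows: (year, month, amount).
sales = [
    (2002, 7, 100.0),
    (2002, 7, 50.0),
    (2003, 7, 180.0),
]
totals = monthly_totals(sales)
change = year_over_year(totals, 2003, 7)
print(change)
```

Comparing the same month across years, rather than adjacent months, is what removes the seasonal effect: July is measured only against July.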

The data warehouse not only combines data from different sources, but also makes the data comparable. If the same operational data was earlier stored in an old mainframe-based, specially developed legacy application and is now stored in a standard business application, the two sets of data would normally not be comparable. When the data is filtered and taken into the warehouse, it may be stored in a manner that makes the sets comparable to each other, even though the original storage methods were different. Another important reason for separating the warehouse data from the operational data is the concern that operational data can be affected by the analytical processes. For operational data, quality performance and immediate response times are most important. Responses slower than the previously defined transaction times may lead to loss of efficiency and other costs. It may seem that a five-second delay in a transaction is not material, but it has a compounded effect on a number of other processes that are taking place alongside that delay. Once the same data has reached the warehouse, a delay in analysis will not have a great or comparable effect on efficiency and costs. This is because of the difference in the designs of the two types of systems. For business operations, the expected peak loads can be estimated and the maximum permissible transaction times set to meet the organizational requirements.

For other types of loads the acceptable delays can also be determined. The cost of a system designed to give the acceptable response times at peak load can then be determined as well. It is for the decision makers to decide whether such a system is acceptable or should be changed. For an analysis like this, the costs of the operators, telecommunication costs, and the cost of lost business may be considered. If a query runs on the operational system, it will also be pre-defined in terms of time, assuming a certain predefined volume. The queries and reports coming from a data warehouse are also pre-defined, but the activity that follows from the answers to such queries or reports is almost impossible to define. When the data in a warehouse is explored, the business analyst often travels along uncharted paths.

Sometimes analysts are also faced with a large number of queries from people who have not been able to fully appreciate the results. Unexpected results in performance may also generate a large number of queries. Another aspect is that the processes of analysis in a warehouse tend to be very general, whereas in operations the processes may be specific and different. Sometimes the summary data will encourage users to start exploring the details behind a particular report. This could then be linked with some sales activity during that month so that the user may understand the reason for the sales being reflected. Sometimes users may end up asking queries that would be impossible to answer from operational data, but that the data warehouse may be in a position to handle.

Another important difference between operational data and the data in a warehouse is that data mostly comes to a warehouse after all the changes that could have taken place have already taken place. In short, the data has become sterile or non-volatile. It is basically historical in nature: the order status recorded in the warehouse will not change, whereas in the operational data it can keep changing. Similarly, the inventory positions will not change. For this to be so, it is important to consider when the data should be brought to the warehouse. In the operational situation the status of an item has to go through many changes before it is finished. An order that has been accepted may first be just an order, and then it has to be executed. For the data warehouse the order is only to be taken into account when it is nearly completed.

The warehouse need not take into account the in-between stages of the product, the sale or the order. Sometimes even snapshots of volatile data may be fed to the data warehouse. In practice, it is very difficult to maintain changing data in the warehouse, and many data warehousing projects have failed for this reason. Most operational systems send data to the archives after it has become inactive. For example, an order needs no further action after a pre-determined period from its completion. A bank account that has been closed for a period of time is in effect inactive. The primary reason for archiving inactive data is that it affects the performance of the operational system. Large amounts of inactive data mixed with live operational data slow down transactions that only need to process the active data. On the other hand, data warehouses are designed as the archives for data, and the data here can be saved for long periods.
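A minimal sketch of the archiving rule described above is given below, assuming that each order carries a completion date and that an order becomes inactive a fixed number of days after completion. The 90-day period, the field names and the sample orders are assumptions made purely for illustration.

```python
from datetime import date, timedelta


def split_active_and_archive(orders, today, inactive_days=90):
    """Separate live orders from those that have been inactive long enough.

    An order is archived when it was completed on or before the cutoff date.
    Orders that are not yet completed always stay in the active set.
    """
    cutoff = today - timedelta(days=inactive_days)
    active, archive = [], []
    for order in orders:
        completed = order["completed_on"]
        if completed is not None and completed <= cutoff:
            archive.append(order)
        else:
            active.append(order)
    return active, archive


# Hypothetical orders; None means the order is still being executed.
orders = [
    {"id": 1, "completed_on": date(2003, 1, 15)},
    {"id": 2, "completed_on": date(2003, 7, 20)},
    {"id": 3, "completed_on": None},
]
active, archive = split_active_and_archive(orders, today=date(2003, 7, 31))
print([o["id"] for o in active], [o["id"] for o in archive])
```

Only the archived set would be handed to the warehouse load, so the operational system keeps processing a small, live set of records.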

In fact, a data warehouse project may not need any archive to store data beyond the warehouse. The cost of maintaining the data once it is loaded in the data warehouse is very low. Most of the significant costs of any data warehouse are incurred in the transfer and cleaning of data. Storing data for more than five years is very common for data warehousing systems. In many industries the data warehousing project has encouraged managers to increase the length of time the data is stored. Originally, it may have been planned to store the data for two or three years. This period is often increased to five or more years once the wealth of business knowledge in the data warehouse is discovered.

The falling prices of hardware have also led successful data warehousing projects to store data for longer periods. In short, the separation of operational data from analysis data is one of the fundamental concepts of data warehousing. Here the data is stored in a structured manner outside the operational system. Many businesses today devote considerable resources to building data warehouses at the same time that new operational applications are started. Rather than archiving data to tape as an afterthought of implementing an operational system, data warehousing systems have become the primary interface for operational systems. (System Services Corporation, 1997)

Developing Organizations through Data Warehousing:

Cabela's is well known for its eight outdoor gear stores, especially in the Midwest, where it has its largest store. To expand its reach and sales it has established four telemarketing centers. In the mid-1990s the company felt that it needed more information from customers about their individual tastes, purchase preferences, and other behavioral characteristics, so that they could be categorized into the different groups that formed the company's client databases. This was a sort of marketing research exercise to be carried out with the customers. It was to help Cabela's build individual mailing lists for all the catalogs and promotions it wanted to send out. The existing database could not tackle the problem: it could not properly handle the company's sales growth or the data that was being, or could be, collected.

So the company chose a new database, namely IBM DB2 Universal Database Enterprise Edition, version 7.2. This brought its costs down by more than 20%, with reductions in software, services and human resources. There has been a very big impact on the loading of sales data: every night the company can handle more than four times the load it could handle previously. The responses received to the queries sent to customers have improved by more than 80%. Maintenance time has been halved, as have the maintenance costs. The information extracted from the data warehouse is the reason for the improvement in responses to the queries sent out. This has enabled Cabela's to improve its catalogs and e-publications. The warehouse currently holds 11 years of information amounting to nearly 700 gigabytes of data.

This includes sales statistics by product type, SKU number, and the specific catalog that drew the response. This helps Cabela's to continuously refine its catalog mailing lists. The data also tells Cabela's which products to place where, both among its stores and in the catalogs. The new system has been in operation for only a few months, but the impact in terms of increasing sales is already visible in many of the company's market segments. The company feels that it should soon move to online analysis and processing of its data. It also feels that this system helps it to understand the crucial relationships among its customers, markets, products, prices and geography. This is most important for the company's type of business, in terms of both sales and profits. (Rivkind, 2003)

Telephone and Data Warehousing:

Recently France Telecom, one of the largest telephone companies in the world, was looking for a system to store its data. It is one of the world's largest carriers of telephonic information, with over 91 million customers in 220 countries on five continents. Its system was upgraded in 2000 to improve services to customers. The plan was to have a system that could store all telecommunications traffic, help in the detection of fraud, and improve customer service, general operations, the marketing of services, and network traffic analysis. The most important issue was the volume of data to be handled. The network carried 500 million calls a day, which meant recording 500 million call detail records (CDR) every day. The existing system did not have enough flexibility, and problems could arise with increased loads. Finally, a new system based on Oracle and running on an HP V2500 was chosen, with Cap Gemini organizing the system.

The system is one of the largest in the world and holds 180 billion CDR. In terms of both database size and number of records, this is larger than the single largest database in the world as per a study conducted by Winter Corporation a year ago. This central data warehouse serves 8000 end users and is connected simultaneously to at least 600 of them at any time. The records have to be updated regularly, and 65 million CDR may be added in every peak hour. In terms of data, the system picks up 100GB every day, and this in turn is sent to the user and downstream units. The system itself is updated with the data every hour. In operation, the system has been found to be very stable. It meets the France Telecom standard that any online CDR enquiry be answered within 4 seconds. These standard enquiries require a date and an outgoing or incoming telephone number.

This system has already integrated 50 existing databases, which has resulted in considerable cost savings. The system has been used to detect fraud by quickly passing detailed information about telephone usage to the companies. The data is the main source for data marts serving marketing, network requirements, finance companies, etc. The advantage of the new system is that the fresh data for every hour is loaded into a separate "table space" on a separate machine. When loading is completed at the end of the hour, the table space is switched on to the online system. This is time-based partitioning, and it is very useful for manageability and query efficiency. The arrangement also prevents incoming new data from interfering with the existing data. New data is being continuously added to the system, and France Telecom has also taken up new applications, which have likewise been successful. This shows that the system can accommodate further growth and development in future years, as well as changes in usage. (Winter, 2003)
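The hourly load-then-switch arrangement can be sketched in a few lines of Python. This is a simplified in-memory model written only to show the idea of time-based partitioning; it is not the France Telecom or Oracle implementation, and the class name, method names and record fields are all hypothetical.

```python
class PartitionedStore:
    """Toy model of hourly partitions that are loaded offline, then published."""

    def __init__(self):
        self.online = {}   # hour key -> records visible to queries
        self.staging = {}  # hour key -> records still being loaded

    def load(self, hour, records):
        """Append records to the staging partition for the given hour."""
        self.staging.setdefault(hour, []).extend(records)

    def publish(self, hour):
        """Switch a fully loaded hour from staging to the online set."""
        self.online[hour] = self.staging.pop(hour)

    def query(self, hours, number):
        """Find calls to or from a number, scanning only the requested hours."""
        return [
            record
            for hour in hours
            for record in self.online.get(hour, [])
            if number in (record["caller"], record["callee"])
        ]


store = PartitionedStore()
hour = "2003-07-31T10"
store.load(hour, [{"caller": "0100", "callee": "0200"}])
before = store.query([hour], "0100")   # hour is still loading, not visible
store.publish(hour)
after = store.query([hour], "0100")    # hour is now online
print(len(before), len(after))
```

Because a query names the hours it needs, it touches only those partitions, and a partition that is still loading never disturbs the data already online.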

Choose your own partner:

As already mentioned, choosing a database is like choosing a partner in the United States. When you entrust your data to a new database system, you do it with every good intention of making a long-term commitment. In business terms it is a capital investment, and like any good businessman one expects the investment to keep producing long after the capital costs have been written off. That may not happen in the fast-changing world of computers. So, as with a life partner, you may develop differences of opinion about the utility of your warehousing system and shift to some other system. Luckily, in this divorce, the only alimony you may have to pay has probably already been paid, through the ineffective utilization of your data.

Carrying the analogy further, not all companies end up choosing the same system for their needs. In Japan, a new telephone company has been born from the union of three existing companies. The new company is called KDDI, and the merged companies were KDD, IDO and DDI. The merger naturally increased both the size of the database and the rate of accretion to it by a large proportion. The obvious solution was adopted, and the company took on a new data warehousing system. In the new system, KDDI had to store the data of the three companies that existed prior to the merger. Before this data was stored, it had to be crosschecked and errors weeded out. For this purpose the company used a Trillium Software system. This consolidation of data was done quite successfully.

After the merger, KDDI became the only company in Japan to offer the total range of telecommunication services. It has wireless services (AU), Internet connections (DION), both international and domestic telephone lines, and business-to-business or specialized customer linkages. The joining together of the three companies has reduced costs, and in the long run is expected to be beneficial for both the company and its customers. Since it now provides a number of services to the same customer, it must analyze the needs of that customer to be able to provide a better quality of service. For this purpose, it must keep proper records of the usage of each service by the customer.

For this the customers must be segmented according to the services they use. KDDI must also find out which customers use only part of its services and try to increase its business by convincing them to use the others. The critical point is matching the customer data across the different services. The company had tried to do this for the earlier data of the three companies using existing manual and system processes, but ultimately had to use Trillium Software. This software was installed on a UNIX server as a core component. The software identified and linked the millions of records in a short period of three months.
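The matching problem can be made concrete with a very small Python sketch. It is emphatically not the method used by Trillium Software, whose matching rules are not described in the source; it only assumes, for illustration, that two records refer to the same customer when their normalized name and telephone number agree. The field names, service labels and sample customers are hypothetical.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a crude matching key: lowercase letters of the name plus phone digits."""
    name = re.sub(r"[^a-z]", "", record["name"].lower())
    phone = re.sub(r"\D", "", record["phone"])
    return (name, phone)


def link_customers(*sources):
    """Map each matching key to the set of services that customer uses.

    Each source is a (service_name, records) pair.
    """
    linked = defaultdict(set)
    for service, records in sources:
        for record in records:
            linked[match_key(record)].add(service)
    return dict(linked)


# Hypothetical records from two of the merged companies' systems.
wireless = [{"name": "Taro Yamada", "phone": "03-1234-5678"}]
internet = [
    {"name": "TARO  YAMADA", "phone": "0312345678"},
    {"name": "Hanako Sato", "phone": "06-0000-1111"},
]
linked = link_customers(("wireless", wireless), ("internet", internet))

# Customers using only one service are candidates for cross-selling.
partial = {key for key, services in linked.items() if len(services) < 2}
print(linked)
print(partial)
```

Real customer data would need far more tolerant matching (spelling variants, changed addresses, shared telephone numbers), which is the kind of work a dedicated data quality tool is bought to do.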

This was the foundation for the information that the company needed. The database has also been used for other purposes since that time. The software cut down the development time that the company would otherwise have required. The success at KDDI has encouraged many other Japanese companies, and KDDI has succeeded in establishing its image as a technology user. The company now wants to use even more advanced systems for its requirements. The business of KDDI is also expanding following the analysis of its database using Trillium software. (Takase, May 2003)

Data Warehousing for Societal Causes:

One of the data warehouses that has recently been updated is that of the California Cancer Registry (CCR). This is the storehouse for records of all cancer cases in the State of California. The data is used for funding research and development against the disease over a large geographic area and for maintaining the demographic details of patients. The registry covers nine different geographic regions. Six of them had an existing records system of one type, while the other three had a system of a different type. All the existing systems were supposed to store data similar to the data now being stored by the new system. Because of the differences between the systems, the data could not be distributed to all the regions. The central repository also ended up without all the data from the regional systems.

Thus CCR felt the need for a new system that would make all the data easily available to it. Accordingly, a project was started with the support of the Centers for Disease Control (CDC) and the National Institutes of Health (NIH). The job was given to CRI, which has developed a data warehouse with a web interface. The new system can provide instant information and access to all research personnel in all the regions. It permits researchers to drill down in full depth and recover the details they specifically need. The data made available is current and complete. The system also took care of the processes of data accumulation, data quality control and assimilation of new data with the old. The result was a 50% increase in the efficiency of the data entry personnel, and the time freed can be better utilized in assisting the research group. Today, the CCR staff themselves can make required changes to the system, send and receive file formats, change the rules for editing and logic, or optimize pages. The system is essentially Microsoft-based, with DNA architecture, SQL Server, XML, Microsoft Transaction Server, Active Server Pages, and Internet Information Server. (CRI Advantage, 2003)

Updating inaccessible data:

Sometimes being an early user of a system creates its own problems. Boise Building Solutions had such a problem. Its manufacturing division sells six families of forest products, and the data had been stored for 35 years in IBM DB2 and DL/1 databases. These systems are outdated, and the stored data could not be used by most of the people who required it for sales projections, forecasting, production plans, finance, etc. The solution found was to import the data into a SQL Server database using Microsoft DTS and SQL Server scripts. The new system is updated regularly from the existing systems that are in use, which keeps the content current. This helps the company forecast sales more accurately while keeping inventories low. This is a fairly simple operation by today's standards. (CRI Advantage, Boise, 2003)

Data warehousing for investors:

Today one of the main criteria by which international companies are evaluated is the value of their stock in the open market. This value depends on the information about the company that potential investors can get. It is important that large multinational companies store their data on a system that can be easily accessed by outsiders interested in the company, for whatever purpose. Siemens AG is one of the world's largest companies in the field of electrical engineering and electronics. In 2000 it had some 460,000 employees in 190 countries and sales of over $78 billion. The company is a world leader in the fields of information and communication, automation and control, power, transportation, medical instruments and lighting. Every day 2700 users log into the site through the Siemens intranet to find out details about the company. The users normally seek business and financial data from SAP solutions.

The company wanted to cut down the processing time for the data people sought, improve the ease of access to the site and enhance the quality of the data supplied. This was to increase the attractiveness of the stock and to prove the soundness of the decisions taken by the company's management. To achieve the same objectives, Siemens had earlier adopted the Generally Accepted Accounting Principles (GAAP). This led the markets to believe that Siemens was a good company to invest in, and the reporting function previously housed in SAP R/3 and the controlling system could not cope with the demand. This compelled Siemens first to start a SAP consolidation system. When this had been in operation for about a year and the company was ready for the new system, it started with SAP BW 2.1, which was live with the consolidation system. This was a part of the declared ESPIRIT project (Enhancement of Siemens in Reporting and Information Technology).

Today all the reporting and consolidation have been transferred to the new platform. Some interface problems and conflicts of interest had arisen earlier, and they have been solved. The earlier systems have all been transferred to the new system. The system is proving useful not only to potential investors, but also internally. The business data from the divisions, regions and separate companies is now easily available within the company. This is speeding up the overall consolidation process. The board can now react faster to changes in the equipment business, which is both high in risk and high in volume. The location of the user is unimportant for getting the information; all users can view the same information as well as download or print it. The same data is used for management needs. Users have the flexibility to prepare the graphs and charts they require. The data is kept secure, since a person must have an authorization and a user ID to enter the intranet. The company feels that the change has served its purpose. (Neuburger, 2003)

Usefulness of Data warehouses for fashion industry:

Some businesses are viewed as lightweight industries compared with the "brick and mortar" types of industries that we have dealt with up to this point. One of these is the fashion industry. Cutter and Buck is a renowned company in this segment, manufacturing and marketing the famous Cutter and Buck range of sportswear and outerwear. These items are sold in the high-class markets of golf pro shops and resorts. The company also sells directly to corporate sales accounts and in selected specialty retail stores. This is not a mass-market product. Recently the company went through a complete reorganization through Business Objects. Previously, inventory management was a centralized function, but after the reorganization the responsibility was transferred to six strategic business units (SBUs). The strategic business units have been organized from the market point of view. Now all the senior managers in these new units need key business data to be able to control their inventories and for other purposes.

The senior management of the company knew this when the reorganization exercise started, and began developing much more powerful data analysis and reporting capabilities. For this purpose they decided to use their old historical data. This data was not being used because it was in a different form from their current data. The important question was how to divide the data among the six newly established strategic business units. The data had to be made available to the leaders of the strategic business units and to the analysts in those units, in a form that let them easily access the data and the reports built from it, and find any key information they needed.

This should help them make their own decisions and run the business in the best possible manner. It was felt that they should not be dependent on the IT people in the organization for occasional reports. In short, the data had to be made fit for regular use and presented in report forms that the users could understand directly. In terms of operation, the transfer has been quite successful. Through a Web browser, the system supplies all types of reports to the executives, both corporate and personal. The reports also come at different frequencies: weekly, monthly and quarterly. In short, the system is working well in terms of the reports that it was expected to supply.

The business of Cutter and Buck is highly competitive. The system designed by their consultants, Business Objects, is specialized software built on an enterprise-standard business intelligence (BI) system. This system tracks bookings, sales, margins and the status of inventories. It has an integrated set of query, reporting and analysis solutions. The system will be able to analyze not only the data the company generates but also outside data. The analysis will track large volumes of customer sales and purchase data. This data will be correlated with other factors such as the sales channel, the sales representative making the sale, the geography of the sale and the season.

The inventory levels will be monitored to ensure that the correct apparel is always available to meet the demand from the various channels. As mentioned earlier, the system is already sending many reports to the managers on time, and the managers seem to be satisfied. So far, all strategic business units have had their business intelligence needs met both by the software and by the efforts of the IT staff within the company. The strategic business unit with the greatest need for information was the SBU dealing with golf. This unit sells most of the golf apparel to golf pro-shops and golf resorts. It is most important for this unit to follow bookings, margins, invoicing and stock closely.

Over a period of time the reports have changed in form. In the beginning the reports were very general, but they became more developed with time. The teams at the strategic business unit level, as well as the IT people, first had to get a clear idea of the capabilities of the system, and the business process had to be clearly understood. Now key reports for the sales function include the daily bookings report. This tells the organization about immediate sales and makes sure that the bookings are serviced. The other important report is the comparison of net bookings with net invoicing, since invoicing marks the completion of the sale. The reports also provide precise sales and operational details and give a clear picture of returns and cancellations. The sales management knows clearly the area where an item was returned and the reason for the return.

Some of these factors, the measures concerning sales, can be accessed and analyzed online. Quarter-to-quarter bookings can be compared directly through the Web browser. The system also permits analysis that moves from high-level sales figures down to exact, specific data. It is easy to move from overall bookings to regions, then to individual sales persons, and even to a specific account. The detailed facts are required for management decisions such as whether an account should be continued, or whether the method of working with a particular account should be changed.
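
The move from overall bookings to regions, sales persons and accounts can be pictured as grouping the same records at increasing levels of detail. The following is only a minimal sketch of that idea, not the Business Objects implementation; the records, names and the `drill_down` helper are hypothetical and exist purely for illustration.

```python
from collections import defaultdict

# Hypothetical booking records: (region, salesperson, account, bookings in dollars).
BOOKINGS = [
    ("West", "Avery", "Pine Ridge Golf Resort", 12000.0),
    ("West", "Avery", "Bayview Pro Shop", 4500.0),
    ("West", "Jordan", "Canyon Links", 8000.0),
    ("East", "Morgan", "Harbor Club", 9500.0),
    ("East", "Morgan", "Lakeside Pro Shop", 3000.0),
]

# The drill-down hierarchy, from coarsest to finest level.
LEVELS = ("region", "salesperson", "account")


def drill_down(records, depth):
    """Total bookings grouped by the first `depth` levels of the hierarchy.

    depth=0 gives the overall total, depth=1 totals per region,
    depth=2 per (region, salesperson), depth=3 per account.
    """
    if not 0 <= depth <= len(LEVELS):
        raise ValueError("depth must be between 0 and %d" % len(LEVELS))
    totals = defaultdict(float)
    for record in records:
        key = record[:depth]          # the grouping columns for this level
        totals[key] += record[-1]     # the last field is the booking amount
    return dict(totals)


if __name__ == "__main__":
    for depth in range(len(LEVELS) + 1):
        print(depth, drill_down(BOOKINGS, depth))
```

Each deeper level splits the totals of the level above it, so the figures at any level always add up to the overall bookings.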

Apart from sales analysis, the staff also take a very personal interest in the computer-based information. The sales staff become entitled to a bonus when their sales reach a set figure compared with the previous year's sales. Calculating this used to take over a thousand hours of processing time; with the new system, it takes only 10 hours. The system had to run, analyze, format and distribute all the reporting for this initiative. The faster processing and better business information have also pleased the staff. (Leech, 2003)

CONCLUSION

Data Warehouses must expand to meet user needs:

The concept of a data warehouse is not new to All Nippon Airways Ltd. (ANA). It has been using a large-scale data warehouse based on HP V-class servers since 1999. In April 1999, The Brain, the company's name for the data warehouse, began operations in its first phase. The system had three major goals. The first was performance: the database server had to be able to search a database of three million entries within one minute, and to join one million entries to another one million entries. The second was scalability: the system had to be able to scale up rapidly. ANA already had 600GB of data just for its domestic passenger transportation services, and the system was to have the capacity to handle ten times that volume. The third, and the most important for an operating airline, was fault tolerance: the system simply cannot be "down."

The HP V-class server was selected because it could meet all of the above requirements. The data warehouse and the data mart ran in a cluster configuration consisting of 2 HP V-class servers (V2250) and an Oracle database. ANA installed 3 HP Lh4 net servers at the front end as Web servers to meet user needs. This allowed users to access data through Web browsers and to analyze it simply and quickly. The easier access brought more sophisticated analysis from existing users and attracted new users, which increased the system load. The need to use the data more aggressively in developing business strategies and improving business processes was recognized. The need for data analysis must have existed before The Brain project was implemented. The ANA users were very conscientious about their responsibilities and sensitive to problems, and they regularly tried to improve tasks and processes. Before The Brain, they could not get enough data for this purpose, but The Brain gave them this opportunity. Of course, the Information Technology Services group already analyzed data for business needs and published its reports. But once users had the possibility of analyzing the data themselves, they exploited the opportunity.

ANA decided to increase the amount of data stored in The Brain in December 1999. The original data stored in The Brain were the flight data, cargo data and FFP data, that is, the international flight boarding data warehouse, the domestic flight boarding data warehouse and the data mart. An accounting (cost) data mart was now added. As a result the data volume exploded. In the first phase, the system had a 900GB data capacity (disk capacity was 1.8TB because of mirroring). But now four times the amount of data capacity was required. The international and domestic flight boarding data warehouses save the transaction data (entered when the passengers board the planes) as raw data, and the number of records is huge. To increase the data storage area, ANA decided to expand The Brain. ANA increased the number of CPUs in the V2250s from 8 to 12. Then it added one more 8-CPU V2250.

Finally, it adopted a mutual back-up cluster configuration consisting of 3 servers. ANA also expanded the disks by adding the HP Surestore e disk array xp256, HP's high-end disk array system. A total disk space of 6TB was made available, up from the 900GB originally provided. This second-phase system started operation in April 2000. The decision to add more CPUs to the V2250s was made to cope with an expected increase in system load. The actual increase exceeded all expectations. The analyses now taking place were more sophisticated and, as a result, used larger volumes of data over longer time ranges. The number of users was also increasing rapidly. In the first phase there were about 50 users of the data warehouse and 250 of the data mart. After two years, the data warehouse had 300 users and the data mart 900, an increase of about 4 times. As a result, the system load was also increasing drastically.
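
The growth figures quoted in this case can be checked with simple ratios. The sketch below is only an illustrative calculation using the numbers given in the text; the `growth_factor` helper and the decimal conversion of 1TB = 1000GB are assumptions made for the example.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1TB = 1000GB


def growth_factor(before, after):
    """How many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("the starting value must be positive")
    return after / before


# Storage figures from the text.
phase1_capacity_gb = 900                      # first-phase data capacity
phase1_mirrored_gb = phase1_capacity_gb * 2   # mirroring doubles the disk needed
phase2_disk_gb = 6 * GB_PER_TB                # total disk space in the second phase

# User figures from the text: data warehouse users plus data mart users.
users_phase1 = 50 + 250
users_after_two_years = 300 + 900

capacity_growth = growth_factor(phase1_capacity_gb, phase2_disk_gb)
user_growth = growth_factor(users_phase1, users_after_two_years)

if __name__ == "__main__":
    print("mirrored disk in phase one (GB):", phase1_mirrored_gb)
    print("disk space growth: %.1f times" % capacity_growth)
    print("user growth: %.1f times" % user_growth)
```

The mirrored figure matches the 1.8TB quoted for the first phase, and the user ratio matches the stated increase of about 4 times.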

As a company, ANA had by then increased its international flights and flight data. The capacity of the existing system was heavily used, and it was felt that it might be worthwhile to re-evaluate the entire system. The result was a decision to develop a new system infrastructure, and the plan was to upgrade to the HP Superdome. It was felt necessary to set up a large-scale Storage Area Network (SAN) with a high-end disk array system and high-speed backups, all connected by Brocade Fibre Channel switches. The reason for upgrading the server was to deliver real-time results to the users, which was felt to require more processing power. Data access in the online system placed its load on the CPU.

Because of this, the CPU could not service the users adequately in the daytime, and most of the batch processes for analysis were being run at night. The organization wanted to increase the online hours for users during the daytime and also to speed up the processes run at night. The system infrastructure was migrated in November 2001. The objective of managing more data centrally has been achieved, and processing speeds have increased. Data storage has increased from the earlier 6TB to 20TB, which will hopefully meet future needs. Another server is being installed for re-establishing international routes; this will take place in April 2002. It is hoped that this will maximize data storage and analysis by the users.

Now, The Brain can provide the number of data marts that the users require for their analysis. These data marts are created every night in the nighttime processes. The number of batches processed in a night can be 1500 to 2000. It also takes a total of eight hours to back up the data completely. The importance of CPU capacity is not easily realized: most batch processes require a CPU to service them. The new system can support up to 16 CPUs, which helps with processes that may require more than one CPU. The transfer of operations was done in two stages. First the storage systems were changed so that the servers could access them through the SAN in place of direct connections. Older disks were replaced with high-end disk arrays.

This improved access speeds and increased the capacity to more than 27TB. The extra capacity lets the system store data from other systems connected via the SAN while still guaranteeing 20TB for The Brain. Only after this was completed were the servers replaced. Two Superdomes were partitioned into seven units. The workloads of the three earlier servers now run on six of these partitions. The last partition is reserved for data recovery and other measures. To improve accessibility for the users, the number of Web servers was also increased to six. The planning for the shift started in June 2001. The methods were tested from July to September, and the operating methods were finalized. The physical migration took place over a long weekend, from November 23rd through November 25th. In total it was a six-month operation.

This is a relatively large company that started on the path of a corporate data warehouse in 1999. Within a short period it understood that the provision it had made was inadequate and that capacity had to be increased by about 20 times. Machines had to be changed to speed up operations. The primary objective was to meet the needs of competition and update the organization. Once the system was made available, the decision makers joined in using it, and more and more jobs were given to it. The management saw that the facility was being used and backed it with enough funds for very rapid expansion. This shows that the need was probably underestimated when the warehouse was originally set up. This sort of underestimation also comes from a spirit of conservative management or from not being able to understand the potential of the use. Certainly, there is no doubt that the data warehouse has been found useful by the organization. (Tanaka, 2001)

Costs of IT projects:

One of the problems with any new IT project, including data warehousing, is the cost of financing it. This is more so in the present times, when money is increasingly difficult to come by. For any information technology project, the question most often asked is what can be expected in terms of business value. For any successful data warehousing project, a specific estimate is made of the targets in terms of business value, and the project is funded accordingly. This goal is met, and a suitable return on investment realized, through partnership between IT, business analysts and the executives sponsoring the project. The return on investment (ROI) aspect of IT projects has to be understood clearly. All projects start with the belief that a business value will be demonstrated, and this is often the case, but ROI can be achieved only if the business analysts are clear in their thinking. ROI is achieved only when the total benefits exceed the costs; the excess gained is the ROI. This can be roughly determined by totaling up the costs and the expected benefits. When totaling up the benefits, one should remember that they will accrue over a period of time and have to be brought down to today's value by proper discounting. Similarly, the costs have to be broken down over time and brought down to today's value. This is quite simple in theory, but the question is how to do it with reasonable accuracy.
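
The discounting described above can be sketched in a few lines. This is only an illustrative calculation under stated assumptions: costs and benefits are given as yearly amounts with year 0 being today, a single annual discount rate is used, and the function names and the sample figures are hypothetical. The ratio of net benefit to discounted cost is one common way to express ROI as a percentage; the source itself speaks only of the excess of benefits over costs.

```python
def present_value(yearly_amounts, rate):
    """Bring a series of yearly amounts down to today's value.

    yearly_amounts[0] falls today and is not discounted;
    yearly_amounts[n] falls n years from now and is divided by (1 + rate) ** n.
    """
    return sum(amount / (1 + rate) ** year
               for year, amount in enumerate(yearly_amounts))


def discounted_roi(costs, benefits, rate):
    """Return (net benefit in today's value, net benefit as a fraction of cost)."""
    pv_costs = present_value(costs, rate)
    pv_benefits = present_value(benefits, rate)
    if pv_costs <= 0:
        raise ValueError("discounted costs must be positive")
    net_benefit = pv_benefits - pv_costs   # the excess of benefits over costs
    return net_benefit, net_benefit / pv_costs


if __name__ == "__main__":
    # Hypothetical three-year project: a large cost up front, benefits arriving later.
    costs = [500_000, 100_000, 100_000]
    benefits = [0, 400_000, 500_000]
    net, ratio = discounted_roi(costs, benefits, rate=0.10)
    print("net benefit today: %.0f, ROI: %.1f%%" % (net, ratio * 100))
```

Because later benefits are discounted more heavily than early costs, a project whose undiscounted totals look attractive can show a much smaller, or even negative, net benefit once both series are brought to today's value.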

Costs are normally known in advance, and at least today's prices are known. They consist of 3 parts: labor, hardware and software. All three are important. One often neglects to take the cost of labor into account, assuming that the existing staff will do the work, but any project has to pay its own costs. Software costs are quite obvious, as they include licensing and support. During the course of the project, additional costs may be incurred for upgrades. In view of the large number of software vendors, there may be some fine print and additional costs for items like metadata management, security management and portal integration. These questions should be clarified. Hardware costs include the processing systems and disk storage. Of these, disk costs tend to keep increasing as more data piles up, and they often turn out to be the most expensive component.

For these costs, the cost over the entire project period should be calculated, including the possible need for upgrades at a later date. Here open systems are preferred, as they give the project manager greater choice and more competitive pricing in the future. Direct labor costs can be calculated, but the problem is to account for the consulting time that may be required from experts, both internal and external. Again, as the new systems are implemented, the scope of the project keeps changing, and this changes the costs. It may be better to review the costs again at different stages of the project's implementation. By then, some of the costs will already have been partially recovered through the business benefits.

In any Information Technology project there are often large benefits that offset even the largest expenses. Most often the benefits are not properly understood. This can be seen from the complaints made in business about the failure to identify key business indicators or product benefits. The executives often address these problems. Some IT companies suggest projects which they say will lead to cost savings, for example automatic report generation to replace manual reports. Data warehousing often results in faster, more accurate and better answers to queries. Other more advanced tools like data mining and OLAP can also be considered. The value of better decisions is, however, very difficult to quantify, and the help of business analysts should be sought. Sometimes the benefits are clearly visible only after the project has been implemented.

The calculations are often pretty difficult and time consuming - and the end result may not even be very accurate. However, any project that will generate large expenses without showing a corresponding benefit is likely to be rejected outright by the management. Even upgrades may need to be justified. After all, any business has to first look at its main objective - generating profits. (Stackowiak, 2003)

Developing Software:

Collecting and storing data is today one of the most important functions of an organization. This technology and its developments are now among the main reasons for the success of an organization. It gives a better understanding of clients and prospective clients. This understanding leads to better service to clients and to the increase in business that results from better service. One also gets a clearer idea of the market and of the supply chains that feed the market. It supports decision-making and thus helps the organization gain market share. In short, it is a most useful asset of the organization. Originally many data systems had been built on these ideas.

Companies were using these systems to make more effective use of their data. When the data was to be used by the organization that generated it, the system was called a data warehouse. There were also other stores of data, like data marts and operational data stores. Today, these data loads have grown to very large sizes, and terabyte-sized databases are quite common. The other factor is that the data has become locked up in different types of storage systems, so accessing it has become difficult in certain cases. This problem is reflected in such new initiatives as Microsoft's .NET, HP's Zero Latency Enterprise, and the Gartner Group's Business Activity Monitoring. It may be called the problem of making the organization a real-time enterprise.

The problem can be defined as being able to access the different types of data present in the company's ERM systems, data warehouses, storage area networks and all other systems. This has to be done immediately, which requires application integration, data movement and transformation while the business continues to operate. For many businesses the scale of such operations is huge. Sometimes temporary solutions are adopted that collect data and make it available at a particular point, but ideally the data should be available throughout the organization. A new system called Golden Gate Global Data Synchronization has come up to tackle this problem; it distributes data in real time, delivering it at the database layer. This system is compatible with all major database platforms, hardware architectures and operating systems. It is being used for secure data processing on the busiest shared ATM network in the world, which has a very high transaction volume. A major teaching hospital uses it to provide continuous information to all doctors and nurses. A major payment solutions provider also uses it to provide regular up-to-date information to customers through a Web site. This software can move data at sub-second speeds, handle terabyte-sized loads, and interface with a variety of databases and operating systems. It can also be used as a stand-alone platform. (Seashols, 2003)

As we noted in the case of the telephone companies, the volume of data that needs to be stored in a data warehouse is huge. Such volumes could not even have been dreamt of in the age before electronic data. One of the major challenges in managing such data is making it quickly available. Like the old adage, here is a proof it. Data-intensive applications have to run really fast, and some tools for this purpose are already available. The problems may lie in the volume of the database itself, batch processing, ETL, Web log processing, legacy migration or data mart implementation. The important factor to consider when looking for a solution is not the volume of data to be handled, but the number of times the data will have to be read for processing. A tool can reduce this by speeding up the passes or by combining key steps such as summarizing, copying, extracting, filtering, joining, merging, ordering, pattern matching, partitioning, segmenting, transforming, and validating. Combining these processes with sorting speeds up the loading of the data and gives results faster. This is one of the ways in which data warehousing managers can meet the challenges they face. Some systems like Oracle and Sybase can speed up the loading of large relational databases. This process can cut the time by more than 50% compared with what it would otherwise have taken.
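
The point about the number of times the data is read can be illustrated with a small sketch. It is not taken from any of the tools mentioned; the call records and both functions are hypothetical. The first function filters, transforms and summarizes in three separate passes, while the second combines the same steps so that each record is read only once.

```python
from collections import defaultdict

# Hypothetical call records: (customer_id, duration_seconds, status).
CALLS = [
    ("c1", 120, "ok"),
    ("c2", 30, "dropped"),
    ("c1", 300, "ok"),
    ("c3", 60, "ok"),
    ("c2", 90, "ok"),
]


def multi_pass(records):
    """Filter, transform and summarize in three separate passes over the data."""
    kept = [r for r in records if r[2] == "ok"]               # pass 1: filtering
    minutes = [(cid, secs / 60) for cid, secs, _ in kept]     # pass 2: transforming
    totals = defaultdict(float)
    for cid, mins in minutes:                                 # pass 3: summarizing
        totals[cid] += mins
    return dict(totals)


def single_pass(records):
    """The same three steps combined so that each record is read only once."""
    totals = defaultdict(float)
    for cid, secs, status in records:
        if status != "ok":          # filtering
            continue
        totals[cid] += secs / 60    # transforming and summarizing together
    return dict(totals)


if __name__ == "__main__":
    print(multi_pass(CALLS))
    print(single_pass(CALLS))
```

Both functions give the same totals; the difference is that the combined version avoids building and re-reading intermediate copies, which matters when the input is a very large file rather than a short list.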

Another common problem with databases is the difference in recorded formats and data types. This happens when the data comes not from one source but from a variety of sources, for example from different Web sources or from direct customer queries. The system has to assimilate the data and prepare the output in the form required for further operations. Here the solution is to use an engine that aggregates the data in advance. The data can then be used as aggregates. Most experts favor this approach for answering queries and other functions. Using the aggregates gives results in minutes rather than the hours that would be needed if individual data were used.
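
A minimal sketch of aggregating in advance is shown below. It is an illustration only, not any vendor's engine; the detail rows, the choice of grouping by product and month, and the function names are all assumptions. The detail rows from several sources are scanned once to build a table of totals, and later queries read the small aggregate table instead of the detail.

```python
from collections import defaultdict

# Hypothetical detail rows from several sources: (source, product, month, amount).
SALES = [
    ("web", "jacket", "2003-01", 100.0),
    ("store", "jacket", "2003-01", 50.0),
    ("web", "polo", "2003-01", 40.0),
    ("phone", "jacket", "2003-02", 75.0),
]


def build_aggregates(rows):
    """Scan the detail rows once and total the amounts by (product, month)."""
    totals = defaultdict(float)
    for _source, product, month, amount in rows:
        totals[(product, month)] += amount
    return dict(totals)


def monthly_total(aggregates, product, month):
    """Answer a query from the aggregate table without touching the detail rows."""
    return aggregates.get((product, month), 0.0)


if __name__ == "__main__":
    aggregates = build_aggregates(SALES)   # done ahead of time, e.g. in a nightly batch
    print(monthly_total(aggregates, "jacket", "2003-01"))
```

The trade-off is that the aggregates answer only the questions they were grouped for; a query at a finer level than product and month would still need the individual rows.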

Cite This Paper
PaperDue. (2003). Data Warehouse a Strategic Weapon of an Organization. PaperDue. https://www.paperdue.com/essay/data-warehouse-a-strategic-weapon-of-an-151671