

Last reviewed: February 8, 2010

Data Mining

Evaluating Data Mining as a Strategic Technology

When data mining techniques are used, it becomes possible to gain insights quickly from a diverse and often incompatible set of databases and data sets. Data mining is the process by which very large data sets are analyzed for trends, patterns, insights and intelligence that cannot be discerned from a cursory manual analysis of the data sets themselves (Osei-Bryson, Rayward-Smith, 2009). It is the study of how to glean insights and intelligence from data sets that are often not integrated with each other in a common database, which adds a level of abstraction to the analysis and makes its interpretation even more difficult (Buddhakulsomsiri, Zakarian, 2009). Evaluating data mining as a strategic technology can yield considerable insight. The use of data mining in auto warranty analysis (Buddhakulsomsiri, Zakarian, 2009), where a massive amount of data must be interpreted to complete government reporting requirements, is a case in point. The intent of this analysis is to evaluate data mining as a strategic technology.

Evaluating Data Mining as a Strategic Technology

Data mining continues to evolve, at an accelerating pace, from a technology into a platform on which solutions for analysis, monitoring and definition are built (Osei-Bryson, Rayward-Smith, 2009). Economic uncertainty and companies' need to compete on intelligence are among the primary factors driving its adoption and growth (Li, Wu, 2010). Global economic recessions tend to act as catalysts for information technologies that can deliver large gains in insight and in competitive and market intelligence. The use of data mining is accelerating as companies across all industries seek a competitive advantage through analysis of their channels, customers, suppliers and internal processes.

Examples of data mining abound in industries that have collected exceptionally large amounts of information from customers. These include, but are not limited to, aerospace and defense (Cressionnie, 2008), automobile manufacturing, including aftermarket auto warranty analysis and lifetime product quality (Buddhakulsomsiri, Zakarian, 2009), customer relationship management (Sun, 2006), education (Velasquez, Gonzalez, 2010), healthcare (Li, Wu, 2010) and many others. Despite their diversity, these industries share a common need to gain greater insight into the interrelationships hidden in the structured and unstructured content of their organizations. All of them also need to use their data to understand how the strategies in place today will yield results in the future (Kuhn, Ducasse, Girba, 2007). Data mining requires intensive data integration across databases and legacy, often standalone, systems, in addition to a redefinition of the most critical processes used to accumulate information in the first place (da Cunha, Agard, Kusiak, 2010). Although this data, system and process integration is demanding, it can yield significant insights and intelligence that could not be captured before.

The intent of this analysis is to evaluate the essentials of data mining, including its definitions; assess data mining as a technology trend; analyze how data mining and its many associated technologies are managed and used at Google; and assess the future direction of data mining. Data mining is also leading to the development of text mining applications that take in massive amounts of unstructured text and create linguistic models from it so that new insights can be found, including in the emerging field of customer sentiment analysis (Li, Wu, 2010). CRM-based implementations of data mining often include sentiment analysis, which provides insights into branding and into perceptions of companies expressed through social networks (Sun, 2006). The future of data mining will include sentiment analysis and the ability to ascertain attitudinal data from the massive amounts of data being generated by social networks (Lai, Liu, 2009).
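
Sentiment analysis can take many forms; the simplest is lexicon-based scoring, in which words are matched against lists of positive and negative terms. The sketch below assumes small, hypothetical word lists (`POSITIVE`, `NEGATIVE`, `NEGATORS`) and handles only one-word negation. It is not the linguistic modeling approach of the cited authors, only an illustration of how unstructured text can be turned into an attitudinal score.

```python
import re

# Hypothetical sentiment lexicons; real systems use much larger, curated lists.
POSITIVE = {"great", "love", "reliable", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "hate", "rude"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> int:
    """Return (#positive - #negative) words, flipping polarity after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        # Flip polarity when the immediately preceding token is a negator.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


# Hypothetical social-network posts about a product.
posts = [
    "Love the new phone, battery is great",
    "Support was rude and the app is slow",
    "Not bad at all",
]
for post in posts:
    print(sentiment_score(post), post)
```

Summing such scores across many posts gives a rough indicator of brand perception; production systems typically replace the fixed lexicon with trained models.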

Defining Data Mining

Definitions of data mining vary significantly in scope and in the inclusion or exclusion of key concepts. The most common definition centers on four types of relationships: classes, clusters, associations and sequential patterns (Han, Kamber, 2000). Definitions also vary in the level of insight and intelligence these processes are expected to deliver, with the most recent concentrating on linguistic modeling that can determine sentiment and attitudinal scaling from the unstructured content of social networks (Li, Wu, 2010). The more mainstream definition of data mining, however, concentrates on integrating disparate, often non-integrated systems so that a single system of record can be produced, upon which analysis, queries and advanced extraction can be performed (Berry, 2004). Extract, Transform and Load (ETL) and Online Analytical Processing (OLAP) technologies are often used to create the reporting and analytical frameworks that organizations rely on to streamline the analysis, reporting and continual updating of a data warehouse, which is then used to complete data mining tasks (Rutledge, 2009).
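
To make the OLAP idea concrete, the sketch below treats a handful of hypothetical sales facts as a small cube with three dimensions (region, quarter, product) and "rolls up" the revenue measure along any chosen subset of dimensions. The data and dimension names are assumptions for illustration; OLAP servers precompute and index such aggregates, whereas this is only an in-memory example using the Python standard library.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, revenue)
FACTS = [
    ("East", "Q1", "Widget", 100.0),
    ("East", "Q2", "Widget", 150.0),
    ("East", "Q1", "Gadget", 80.0),
    ("West", "Q1", "Widget", 120.0),
    ("West", "Q2", "Gadget", 60.0),
]
DIMENSIONS = ("region", "quarter", "product")


def roll_up(facts, keep):
    """Sum revenue grouped by the dimensions in `keep`; all others are aggregated away."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the measure is the last field
    return dict(totals)


print(roll_up(FACTS, ["region"]))             # revenue per region
print(roll_up(FACTS, ["region", "quarter"]))  # revenue per region and quarter
print(roll_up(FACTS, []))                     # grand total
```

Moving from `["region", "quarter"]` to `["region"]` is a roll-up; moving the other way is a drill-down.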

While these definitions differ considerably, they all share the common mission of unifying the analytical, transaction and customer databases that are prevalent throughout organizations. Data mining applications are used to determine patterns, relationships and the relative strength or weakness of causal links in data sets, often with the aim of bringing greater intelligence to transaction-based records and databases (Maggioni, 2009). In many data mining systems the overarching objective is to gain greater insight into transactions so that more effective selling and CRM-based strategies (Sun, 2006) can be pursued. Definitions of data mining also vary in their reliance on the underlying technologies for finding relationships in the data. Traditionally, statistically based analytics applications were used to examine causality and the strength or weakness of interrelationships in the data (Cressionnie, 2008). Other data mining applications build neural networks (Han, Kamber, 2000) that can interpolate the relationships between data elements and create causal models over time. Google uses data mining to determine how users access its search engine, to define personalization (Stamou, Ntoulas, 2009) and to develop linguistic models through latent semantic indexing (Kuhn, Ducasse, Girba, 2007), which gives the search engine provider a better understanding of how to index the Internet.
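
The statistical approach described above often begins with a correlation coefficient, which measures the strength and direction of a linear relationship between two variables (correlation on its own does not establish causality). The sketch below computes Pearson's r in plain Python; the promotion and order figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: promotions sent per month vs. orders received.
promotions = [1, 2, 3, 4, 5]
orders = [12, 15, 21, 24, 30]
print(round(pearson_r(promotions, orders), 3))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.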

Classes, clusters, associations and sequential patterns are the four types of relationships that data mining applications seek to discover and add insight to (Stamou, Ntoulas, 2009). Classes, as the name suggests, are stored data grouped into predefined categories that provide segmentation-based insights, such as the purchasing behavior and demographic characteristics of customers. Classes are often used as segmentation criteria across all industries that rely on data mining. Clusters are the second type of relationship that data mining applications look for in analyzing data sets and systems of record (Stamou, Ntoulas, 2009). Clusters are data items grouped according to logical relationships such as customer preferences, and as a result they are also used in the development of market segments. Clustering has also been used in linguistic modeling to identify customer audiences within segments, including consumer affinities for particular communication channels and for methods of learning about new products (Sun, 2006). Data modeling of this kind has been instrumental in developing entirely new approaches to managing communications and to integrating social networking applications into companies' multichannel messaging strategies. The third type of relationship that data mining applications look to capture, validate and report on is the association. The classic example is the finding that husbands and young fathers often buy beer and diapers on the same grocery run (Li, Wu, 2010). The last type of relationship that data mining applications seek to find is the sequential pattern, which is used to predict the future behavior of a specific audience or customer segment, including the development of mass customization options for build-to-order products and services (da Cunha, Agard, Kusiak, 2010).
The use of sequential patterns to generate cross-sell and up-sell selections is becoming more prevalent as this type of data mining gains adoption and integration into e-commerce platforms. Mass customization product strategies also depend heavily on the ability to determine sequential associations between products. Google's use of linguistic modeling and latent semantic indexing is another example of this approach to discovering and analyzing sequential patterns over time (Stamou, Ntoulas, 2009). Using these linguistic models to determine the personalization requirements of each Google search is an example of data mining taken to a highly personalized level (Stamou, Ntoulas, 2009).
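
Associations and sequential patterns can be illustrated with two standard rule metrics: support (the share of transactions containing an itemset) and confidence (how often the consequent appears when the antecedent does). The sketch below applies them to hypothetical shopping baskets for the beer-and-diapers example, then counts ordered "bought A, later bought B" pairs across hypothetical customer purchase histories as a minimal form of sequential pattern discovery. Algorithms such as Apriori and GSP scale these ideas to large data sets; this code does not implement them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (one set of items per transaction).
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]


def count(itemset, baskets):
    """Number of baskets containing every item in `itemset`."""
    return sum(itemset <= b for b in baskets)


def support(itemset, baskets):
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent in basket | antecedent in basket)."""
    denom = count(antecedent, baskets)
    return count(antecedent | consequent, baskets) / denom if denom else 0.0


print(support({"beer", "diapers"}, BASKETS))       # 3 of 5 baskets
print(confidence({"diapers"}, {"beer"}, BASKETS))  # 3 of the 4 diaper baskets

# Hypothetical purchase histories, in time order, one list per customer.
HISTORIES = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "case"],
    ["phone", "case"],
]


def sequential_pairs(histories, min_count=2):
    """Count ordered pairs (a, b) with a bought before b, once per customer."""
    counts = Counter()
    for history in histories:
        # combinations() preserves input order, so a precedes b in time.
        counts.update(set(combinations(history, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(sequential_pairs(HISTORIES))
```

A pair such as ("phone", "case") that recurs across customers is a candidate cross-sell suggestion to show after the first purchase.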

The foundation of all data mining definitions also includes five major process areas that a data mining application must complete to be successful (Li, Wu, 2010). The first is the extract, transform and load (ETL) stage, in which data is moved into data warehouse systems (Stamou, Ntoulas, 2009) so that it can be queried quickly and used to build models for continual analysis. The second process area is storing, managing and using data in a multidimensional database system. Using a database as the system of record is a step common to all data mining definitions, and it is critically important for creating a standardized set of query commands and data models. The more stable and scalable the system of record, the better a data mining application will be able to deliver critical relationship data and predictive analytics and to accurately reflect the associations that matter most to companies (Kuhn, Ducasse, Girba, 2007). Multidimensional database systems are essential for creating this system of record, and data warehouses serve as the system of record on which data mining applications rely for more extensive analysis of the data sets available to them. The third process area is the development of user-based applications that make queries of the data sets possible, including role-based access to the data over time (Cressionnie, 2008). Role-based access to data mining application data is critically important in developing CRM-based strategies, where reports are often used to plan marketing campaigns and strategies and to predict customer purchase patterns and response rates to specific promotions (Sun, 2006).
Google uses the reporting layer of its data mining applications to give its managers, directors and senior executives insight into how its search engine, related products and services, and specific language sites perform over time. This data is invaluable to Google in creating new online products and services, which have a higher probability of success because they are based on customer needs discovered through data mining. The fourth process area comprises the more advanced applications used to analyze the data and present it to line-of-business managers, directors and senior management. These advanced applications are critically important for creating and continually monitoring the four types of relationships in data (da Cunha, Agard, Kusiak, 2010). Combined, these four associative models also provide a rich set of insights and intelligence for creating predictive marketing, selling and service strategies (Sun, 2006). Application software for analyzing the data is also undergoing a revolution of its own, as AJAX (Asynchronous JavaScript and XML) and XML networks streamline the Web-based applications used for intensive data mining tasks. The streamlined design of AJAX applications is leading to Web Services that can scale to support more of the front-end analysis at the client level of networks (Nayak, 2008). The next generation of data mining applications, discussed at the end of this analysis, is already being built on AJAX-based technology that integrates with XML networks optimized for performance. The last process area is presenting data in useful and readable formats, another area strongly influenced by the adoption of AJAX development techniques and tools for Web-based data mining applications (Nayak, 2008).
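
The first and third process areas can be sketched with Python's built-in `sqlite3` module: records from two hypothetical source systems with inconsistent formats are extracted, transformed to a common schema and loaded into one warehouse table, and a simple role check then limits which columns a user may query. The table, column and role names are assumptions for illustration; real deployments use dedicated ETL tools and database-level access control.

```python
import sqlite3

# Extract: hypothetical rows from two source systems with different conventions.
CRM_ROWS = [{"cust": "C1", "region": "east", "spend": "120.50"}]
BILLING_ROWS = [("C2", "WEST", 9900)]  # spend stored in cents


def transform():
    """Normalize both sources to (customer_id, REGION, spend_in_dollars)."""
    rows = [(r["cust"], r["region"].upper(), float(r["spend"])) for r in CRM_ROWS]
    rows += [(cid, region.upper(), cents / 100) for cid, region, cents in BILLING_ROWS]
    return rows


def load(conn, rows):
    """Load the cleaned rows into a single warehouse table (the system of record)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, region TEXT, spend REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical role-based access: each role may see only certain columns.
ROLE_COLUMNS = {
    "analyst": ["region", "spend"],
    "account_manager": ["customer_id", "region", "spend"],
}


def query_for_role(conn, role):
    cols = ROLE_COLUMNS[role]  # unknown roles raise KeyError: deny by default
    # Column names come only from the fixed whitelist above, never from user input.
    sql = f"SELECT {', '.join(cols)} FROM sales ORDER BY region"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform())
print(query_for_role(conn, "analyst"))
print(query_for_role(conn, "account_manager"))
```

Keeping the role-to-column mapping separate from the query code mirrors the idea that access rules belong to the system of record rather than to each report.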

Assessing Data Mining as a Technology Trend

The catalyst of data mining's growth continues to be the unmet information needs of organizations seeking to gain a competitive advantage from the vast amounts of data they have accumulated. The convergence of hardware advances, such as the virtualization of servers and its use to accelerate complex processing tasks (Luo, Lu, Huang, He, Shi, 2006), with the development of text mining, clustering and relational analytics engines (Berry, 2004) is drastically reordering the data mining landscape. In addition, the acceptance of AJAX as a development approach of choice for data-intensive applications has accelerated the adoption of data mining throughout geographically dispersed organizations (Nayak, 2008). Software-as-a-Service (SaaS) platforms are also emerging from these trends, including virtualization, AJAX and thin-client computing (Nayak, 2008). Together, these technologies make it possible to define the associations in data more quickly and thoroughly and to progress through the five process areas described in the previous section.

The more fundamental catalysts of this trend, however, are the unmet needs of organizations, both for-profit and non-profit, to gain greater insight and intelligence into their customers, operations and processes. Data mining has driven the creation of more capable analytical tools built with AJAX, .NET, Java (J2EE) and Web Services (Nayak, 2008), and a cycle of continuous innovation is occurring as a result. These technologies continually provide greater flexibility and depth of analysis while making the creation of reports and online scorecards more efficient. The net result of these usability gains is a continual improvement in how reports and analysis can be tailored to the needs of information users. This convergence of technologies and needs is, for the first time, enabling role-based access to vast amounts of data analyzed through data mining engines and constraint-based modeling techniques (Sun, 2006). It is also fueling the use of data mining for predictive analytics in small and medium businesses, as the applications are delivered over the Internet (Nayak, 2008). Organizations are also using data mining to drive their Business Intelligence (BI) and advanced data warehousing (DW) platforms and programs, which make strategies more achievable through greater intelligence and more real-time feedback. In short, users' analytical needs are growing more complex and demanding while data mining, business intelligence and data warehouses continue to evolve, further raising expectations. This cycle of innovation will continue to accelerate as technology advances and as users devise creative new ways to use the data and capitalize on the insights it delivers.

Use of Data Mining at Google

Google uses data mining both for the search services it delivers and for the extensive CRM platforms and systems it uses to target new corporate accounts, define customer and audience segments, and devise new approaches to serving advertisers. Of all these customer groups, advertisers are the company's largest single source of revenue, through its AdWords program. Google uses data mining to determine how effective specific programs are for its advertisers, to track trends in specific queries, to improve the performance of its servers and virtualization routines, and to identify the most promising new products to launch. Google's latent semantic indexing technology is used for pattern matching (Buddhakulsomsiri, Zakarian, 2009) as well as for linguistic modeling and analysis. Google uses these technologies to create predictive linguistic models that help the company manage the search process more effectively. Latent semantic indexing also makes more effective use of the computing time on the company's servers, in addition to making the search models themselves more effective and streamlined in terms of the linguistic associations they make (Berry, 2004). Google's goal is to create an intelligent data mining technology that learns patterns in data over time, so that queries of its search engine and associated products can be handled more efficiently.
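
The paper's sources do not describe how Google implements latent semantic indexing, so the sketch below illustrates only the generic technique: a truncated singular value decomposition (SVD) of a term-document matrix, after which documents that use related vocabulary lie close together in a few latent dimensions. The five-term, four-document matrix is hypothetical, and the example relies on NumPy's `numpy.linalg.svd`.

```python
import numpy as np

TERMS = ["car", "automobile", "engine", "flower", "garden"]
# Hypothetical term-document count matrix: rows are terms, columns are documents.
A = np.array(
    [
        [1, 0, 0, 0],  # car
        [0, 1, 0, 0],  # automobile
        [1, 1, 0, 0],  # engine
        [0, 0, 1, 1],  # flower
        [0, 0, 1, 0],  # garden
    ],
    dtype=float,
)


def lsi_doc_vectors(matrix, k):
    """Project documents into k latent dimensions via truncated SVD."""
    _u, s, vt = np.linalg.svd(matrix, full_matrices=False)  # s is sorted descending
    # Each column of vt[:k] is one document; scale by the top-k singular values.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


docs = lsi_doc_vectors(A, k=2)
print("raw cosine d0-d1:", cosine(A[:, 0], A[:, 1]))
print("LSI cosine d0-d1:", cosine(docs[0], docs[1]))
print("LSI cosine d0-d2:", cosine(docs[0], docs[2]))
```

Documents 0 ("car engine") and 1 ("automobile engine") share only one word, so their raw cosine similarity is 0.5, but in the two-dimensional latent space they become nearly identical, which is the kind of linguistic association the paragraph above attributes to latent semantic indexing.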
