This paper evaluates data mining as a strategic technology by examining its core definitions, foundational relationship types, and the five-process framework that underlies effective implementation. Drawing on applications across aerospace, healthcare, automotive warranty analysis, and customer relationship management, the paper demonstrates how data mining enables organizations to extract competitive intelligence from disparate data systems. It analyzes Google's use of data mining—including latent semantic indexing, search personalization, and process improvement—as an advanced real-world case. The paper also surveys technology trends driving adoption, including AJAX, XML networking, virtualization, and Software-as-a-Service platforms, and concludes by projecting a future in which data mining becomes a fully integrated, role-based Web Service.
The paper demonstrates concept triangulation: it introduces each major term (e.g., data mining, clustering, sequential patterns) using multiple scholarly sources with differing emphases, then synthesizes them into a unified working definition. This approach signals awareness that no single definition is authoritative and models how to build a defensible conceptual framework from a contested literature.
The paper follows a classic analytical essay structure: an introductory section that states scope and significance; a definitions section that builds the conceptual vocabulary; a trend-analysis section that situates the technology historically and competitively; a case study section (Google) that applies the framework; a forward-looking section on future directions; and a conclusion that synthesizes key findings. Each section feeds logically into the next, making the overall argument cumulative rather than episodic.
Data mining techniques make it possible to gain insights quickly from a diverse and often incompatible set of databases and data sets. Data mining is the process by which very large data sets are analyzed for trends, patterns, insights, and intelligence not discernible from a cursory or manual examination of the data sets themselves (Osei-Bryson & Rayward-Smith, 2009). It is also the study of how to glean insights and intelligence from data sets that are often not integrated with each other in a common database, which adds a further layer of abstraction to the analysis and makes interpretation more difficult (Buddhakulsomsiri & Zakarian, 2009).
There is an exceptional level of insight to be gained by evaluating data mining as a strategic technology. The use of data mining for automotive warranties, for example—where a massive amount of data must be interpreted to meet government reporting requirements—illustrates the point concretely (Buddhakulsomsiri & Zakarian, 2009). The intent of this analysis is to evaluate data mining as a strategic technology by examining its definitions, assessing it as a technology trend, analyzing how data mining and its associated technologies are used at Google, and projecting its future direction.
The refinement of data mining from a technology into a platform on which solutions for analyzing, monitoring, and defining business intelligence are built is proceeding at an accelerating pace (Osei-Bryson & Rayward-Smith, 2009). Economic uncertainty and the competitive pressure companies face are among the primary factors driving adoption and growth (Li & Wu, 2010). Global economic recessions tend to catalyze information technologies that have the potential to deliver disproportionately large increases in insight and market intelligence. Consequently, companies across all industries are accelerating their use of data mining to gain competitive advantages through the analysis of channels, customers, suppliers, and internal processes.
Examples of data mining abound in industries that have accumulated exceptionally large amounts of customer information. These include aerospace and defense (Cressionnie, 2008), automotive manufacturing including aftermarket warranty analysis and lifetime product quality (Buddhakulsomsiri & Zakarian, 2009), customer relationship management (Sun, 2006), education (Velasquez & Gonzalez, 2010), healthcare (Li & Wu, 2010), and many others. Despite their diversity, all of these industries share a common need to gain greater insight into the interrelationships hidden in structured and unstructured content held within their organizations. They also share the need to use existing data to understand how strategies in place today will yield results in the future (Kuhn, Ducasse, & Girba, 2007). Data mining requires intensive data integration across databases, legacy systems, and often standalone systems, in addition to a redefinition of the most critical processes used for accumulating information in the first place (da Cunha, Agard, & Kusiak, 2010). That intensive integration, however, can yield significant insights that were previously impossible to capture.
Data mining is also leading to the development of text mining applications that ingest massive amounts of unstructured text and construct linguistic models from the data, enabling new insights such as customer sentiment analysis (Li & Wu, 2010). CRM-based implementations of data mining often include sentiment analysis, which provides insights into branding and perceptions of companies gathered through social networks (Sun, 2006). The future of data mining will increasingly incorporate sentiment analysis and the ability to extract attitudinal data from the vast amounts of information being generated across social networks (Lai & Liu, 2009).
Definitions of data mining vary significantly in scope and in the key concepts they include or exclude. The most common definition encompasses four types of relationships: classes, clusters, associations, and sequential patterns (Han & Kamber, 2000). Definitions also vary in their emphasis on the level of insight these processes deliver, with the most recent concentrating on linguistic modeling capable of determining sentiment and attitudinal scaling based on unstructured social network content (Li & Wu, 2010). The more mainstream definition focuses on integrating disparate, often non-integrated systems so that a single system of record can be produced upon which analysis, queries, and advanced extraction can be performed (Berry, 2004). Technologies such as Extract, Transform, and Load (ETL) and Online Analytical Processing (OLAP) are frequently used to create the reporting and analytical frameworks that organizations use to streamline analysis, reporting, and continual database updating within a data warehouse (Rutledge, 2009).
While these definitions differ, they all share the common mission of unifying the analytical, transactional, and customer-based databases prevalent throughout organizations. Data mining applications are used to identify patterns, relationships, and the relative strength or weakness of causality in data sets, often aiming to bring greater intelligence to transaction-based records (Maggioni, 2009). In many systems the overarching objective is to find greater insight into transactions so that more effective selling and CRM-based strategies can be accomplished (Sun, 2006). Definitions of data mining also vary in their reliance on underlying technologies for finding relationships within data. Traditionally, statistically based analytics applications were used to examine causality and the strength of interrelationships (Cressionnie, 2008). There are also data mining applications that seek to create neural networks capable of interpolating relationships between data elements and building causal models over time (Han & Kamber, 2000). Google uses data mining not only to determine how users access its search engine and to define personalization (Stamou & Ntoulas, 2009), but also to develop linguistic models through latent semantic indexing (Kuhn, Ducasse, & Girba, 2007), giving the company a better understanding of how to index the Internet.
Classes, clusters, associations, and sequential patterns are the four types of relationships that data mining applications seek to discover and enrich with insight (Stamou & Ntoulas, 2009). Classes are stored data that provide segmentation-based insights, including the purchasing behavior and demographic characteristics of customers. They are frequently used as segmentation criteria across all industries that rely on data mining. Clusters are data items grouped through previously defined customer relationships and preferences; like classes, they are used in the development of market segments. Clustering has also been applied to linguistic modeling to identify customer audiences within segments, including the definition of consumer affinities for particular communication channels and methods of learning about new products (Sun, 2006). Data modeling in this regard has been instrumental in the development of entirely new approaches to managing communications and integrating social networking applications into multichannel messaging strategies.
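The idea of grouping customers into clusters can be illustrated with a short sketch. The sources cited here do not name a particular algorithm; k-means is used below as an assumed stand-in because it is a common choice. The customer figures and the `kmeans` function are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (store visits per month, annual spend)


def _closest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid nearest to point (squared Euclidean distance)."""
    distances = [
        (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centroids
    ]
    return distances.index(min(distances))


def kmeans(
    points: List[Point], centroids: List[Point], max_iter: int = 100
) -> List[int]:
    """Assign each point to a cluster, starting from the given centroids."""
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [_closest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        updated: List[Point] = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Move the centroid to the mean of its current members.
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return labels


# Hypothetical customers, invented for illustration only.
CUSTOMERS: List[Point] = [
    (2.0, 150.0), (3.0, 180.0), (1.0, 120.0),
    (12.0, 900.0), (14.0, 1100.0), (11.0, 950.0),
]

if __name__ == "__main__":
    # Seed the two clusters with one occasional and one frequent shopper.
    print(kmeans(CUSTOMERS, [CUSTOMERS[0], CUSTOMERS[3]]))
```

The features are left unscaled for brevity, so annual spend dominates the distance calculation; a real segmentation would normally standardize each feature first.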
The third type of relationship that data mining applications capture, validate, and report on is associations. The classic example of husbands and young fathers who purchase beer and diapers in the same grocery store trip illustrates this relationship type (Li & Wu, 2010). The fourth type is sequential patterns, which are used to predict the future behavior of a specific audience or customer segment, including the development of mass customization selections for build-to-order products and services (da Cunha, Agard, & Kusiak, 2010). The use of sequential patterns for cross-sell and up-sell selections in e-commerce systems is becoming more prevalent as this type of data mining gains adoption. Mass customization product strategies are highly dependent on the ability to determine sequential associations between products. Google's use of linguistic modeling and latent semantic indexing is another example of discovering and analyzing sequential patterns over time (Stamou & Ntoulas, 2009), and applying those models to define personalization requirements for each search is an example of data mining taken to a highly individualized level.
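The association relationship is commonly quantified with three standard measures: support, confidence, and lift. The following minimal sketch is not drawn from the cited sources; the baskets and the `rule_metrics` function are hypothetical, chosen to mirror the beer-and-diapers example.

```python
from typing import Dict, FrozenSet, List


def rule_metrics(
    baskets: List[FrozenSet[str]], antecedent: str, consequent: str
) -> Dict[str, float]:
    """Return support, confidence, and lift for antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    with_consequent = sum(1 for b in baskets if consequent in b)
    if n == 0 or with_antecedent == 0 or with_consequent == 0:
        raise ValueError("both items must appear in at least one basket")
    support = both / n                    # share of baskets with both items
    confidence = both / with_antecedent   # P(consequent | antecedent)
    lift = confidence / (with_consequent / n)  # above 1 means co-occurrence beyond chance
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical transactions, invented for illustration only.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"diapers", "beer", "chips"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
]

if __name__ == "__main__":
    print(rule_metrics(BASKETS, "diapers", "beer"))
```

In this invented data, two of the five baskets contain both items (support 0.4), and two of the three diaper baskets also contain beer (confidence of about 0.67). The resulting lift is slightly above 1, which indicates a weak positive association.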
All data mining definitions also incorporate five major process steps required for successful implementation (Li & Wu, 2010). The first is the extraction, transformation, and loading (ETL) of data into the data warehouse so it can be quickly queried and used to build models for continual analysis (Stamou & Ntoulas, 2009). The second is the storage, management, and use of data within a multidimensional database system. The stability and scalability of this system of record directly determine a data mining application's ability to deliver the relationship data, predictive analytics, and associative accuracy most critical to organizations (Kuhn, Ducasse, & Girba, 2007). The third process is the development of user-based applications that enable queries of data sets, including role-based access over time (Cressionnie, 2008). Role-based access is critically important for CRM-based strategies where reports are used to plan marketing campaigns, predict customer purchase patterns, and gauge response rates to specific promotions (Sun, 2006). Google uses the reporting layer of its data mining applications to provide managers, directors, and senior executives with insights into how its search engine, related products, and language-specific sites are performing, informing the development of new online products and services.
The fourth process involves more advanced applications for analyzing data and presenting it to line-of-business managers and senior management. These advanced applications are essential for creating and continuously monitoring the four types of data relationships (da Cunha, Agard, & Kusiak, 2010). When combined, these four associative models provide a rich set of insights for creating predictive marketing, selling, and service strategies (Sun, 2006). This analytical layer is itself undergoing a revolution as AJAX (Asynchronous JavaScript and XML) and XML networks streamline the Web-based applications used for intensive data mining tasks (Nayak, 2008). The streamlined design of AJAX applications is leading to Web Services that can scale to support more front-end analysis at the client level. The next generation of data mining applications is already being built on AJAX-based technology integrated with XML networks optimized for performance. The fifth and final process is presenting data in useful, readable formats, an area also heavily influenced by AJAX development tools for Web-based data mining applications (Nayak, 2008).
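The five process steps can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the implementation described in any cited source: an in-memory SQLite table stands in for the multidimensional data warehouse, a `GROUP BY` query stands in for the analytical layer, and the sample data, the `ROLE_REGIONS` mapping, and all function names are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical raw export from a source system, invented for illustration.
RAW_CSV = """region,product,amount
 East ,widget,100.50
west,widget,80
East,gadget,
west,gadget,45.25
"""

# Hypothetical role-to-region mapping standing in for role-based access.
ROLE_REGIONS: Dict[str, Tuple[str, ...]] = {
    "east_manager": ("east",),
    "executive": ("east", "west"),
}


def extract(raw: str) -> List[Dict[str, str]]:
    """Step 1 (extract): read rows from the source export."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[str, str, float]]:
    """Step 1 (transform): normalize text and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records rather than guess a value
        cleaned.append(
            (row["region"].strip().lower(), row["product"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn: sqlite3.Connection, rows: List[Tuple[str, str, float]]) -> None:
    """Steps 1 (load) and 2: store cleaned rows in a queryable table."""
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def sales_by_product(conn: sqlite3.Connection, role: str) -> List[Tuple[str, float]]:
    """Steps 3 and 4: aggregate sales over the regions a role may see."""
    regions = ROLE_REGIONS[role]
    placeholders = ",".join("?" for _ in regions)
    query = (
        "SELECT product, SUM(amount) FROM sales "
        f"WHERE region IN ({placeholders}) GROUP BY product ORDER BY product"
    )
    return conn.execute(query, regions).fetchall()


def present(rows: List[Tuple[str, float]]) -> str:
    """Step 5: format the result as a small fixed-width report."""
    return "\n".join(f"{product:<10}{total:>10.2f}" for product, total in rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load(connection, transform(extract(RAW_CSV)))
    print(present(sales_by_product(connection, "executive")))
```

Separating the steps into small functions mirrors the framework's layering: the role filter is applied at query time, so the same stored data can serve different audiences without being duplicated.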
The primary catalyst of data mining's growth continues to be the unmet information needs of organizations seeking competitive advantage from the vast data they have accumulated. The convergence of hardware advances, particularly virtualization of server technologies used to accelerate complex processing tasks (Luo et al., 2006), with the development of text mining, clustering, and relational analytics engines (Berry, 2004) is radically reshaping the data mining landscape. The acceptance of AJAX as a development technique of choice for data-intensive Web applications has further accelerated adoption across geographically dispersed organizations (Nayak, 2008). Software-as-a-Service (SaaS) platforms are emerging from these trends, combining virtualization and thin-client computing to make the five process areas of data mining faster and more accessible (Nayak, 2008).
At a more fundamental level, the trend is driven by the unmet needs of both for-profit and non-profit organizations to gain greater intelligence about their customers, operations, and processes. Data mining has created increasingly powerful analytical tools through AJAX programming, .NET, Java (J2EE), and the development of Web Services (Nayak, 2008). A cycle of continuous innovation is underway: technologies continually fuel greater flexibility and depth of analysis while simultaneously creating more efficient approaches to building reports and online scorecards. The net result is continual improvement in how analysis can be tailored to the needs of information users. For the first time, this convergence is enabling role-based access to vast quantities of data analyzed through data mining engines and constraint-based modeling techniques (Sun, 2006). It is also extending the use of predictive analytics models to small and medium businesses as applications are delivered over the Internet (Nayak, 2008).
Organizations are using data mining to drive their Business Intelligence (BI) and advanced data warehousing (DW) strategies, making strategic goals more achievable through greater intelligence and more real-time feedback. User needs are growing more complex and demanding in terms of analytics while data mining, business intelligence, and data warehousing are simultaneously evolving, further raising expectations. This cycle of innovation will continue to accelerate as technological gains enable users of these systems to devise ever more creative ways to exploit the insights they deliver.
Data mining began as a series of applications for managing database queries and layering in analytics to better understand query results. Its growth is predicated on the four types of relationships identified in this analysis and on the five-step process framework. There is now a greater focus on creating role-based data mining scenarios than in previous technology generations, driven by more powerful technological platforms and the development of streamlined XML-based networks connecting systems and databases.
The current generation of data mining systems relies on an extensive system-of-record concept that ensures scalability while providing insightful analysis tailored to each employee's and user's needs. The future of data mining is one where Web Services become more commonplace and where streamlined, browser-like interfaces are used to manage data and analytics. The creation of personalized taxonomies will make context-based and context-aware data mining applications possible. All of these capabilities will be built on virtualization-based platforms that will eventually lead to SaaS-based data mining applications, enabling queries to multiple systems of record from anywhere, at any time. The vision of entirely role-based data access focused on individual user needs will ultimately be attained, and the use of data mining for creating predictive models will become a standard business practice. These factors collectively drive rising user expectations for data mining and will continue to fuel the development of data mining applications for the foreseeable future.