Data mining techniques and applications

Last reviewed: June 3, 2014

Data Mining

Predictive analytics helps companies understand the behavior of consumers. It works by using data from the past to refine predictions about the future (CGI, 2013). Companies analyze demand in terms of a wide range of variables in order to arrive at a better estimate of future outcomes than they otherwise would have found. The principle is the same as predicting that a colder, snowier winter will help Wal-Mart sell more snowblowers, but with hard data, sophisticated algorithms and reliable outputs, such as a model stating that x snow days will correspond to y snowblowers sold.
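To make the snowblower example concrete, the sketch below fits a straight line to past seasons and uses it to forecast sales. It is a minimal illustration only: the figures are invented, ordinary least-squares linear regression is an assumed choice of model (CGI, 2013 does not prescribe one), and a real demand model would use many more variables.

```python
# Hypothetical historical data: (snow days in a season, snowblowers sold).
# These figures are invented for illustration only.
history = [(10, 120), (15, 170), (20, 230), (25, 270), (30, 330)]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(history)
forecast = intercept + slope * 22  # predicted sales for a 22-snow-day season
print(f"y = {intercept:.1f} + {slope:.1f}x; forecast for 22 days: {forecast:.0f}")
```

The slope is the estimated number of extra snowblowers sold per additional snow day, which is the kind of "x snow days, y snowblowers" output the paragraph describes.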

One of the more interesting elements of predictive analytics is association discovery, which has been used fairly extensively in retail. Association discovery identifies correlations between items that might not otherwise have been apparent. The link between snow days and snowblowers is an obvious association, but association discovery might also reveal links between snowblowers and seemingly unrelated products. Maybe sales of hot chocolate go up, because people want a hot chocolate after they have been out with the snowblower. The associations are not necessarily intuitive at first, but they say a lot about consumer behavior. Amazon does this when it shows the "people who bought X also bought Y" prompt. Sometimes Y is rather obvious, such as an album by the same band, but other times it is not, and that is where the value of associations lies.
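The "people who bought X also bought Y" idea can be expressed with two standard association-rule measures: support (the share of all baskets containing both items) and confidence (the share of baskets containing X that also contain Y). The following is a minimal sketch for item pairs only; the baskets, the thresholds and the function name are hypothetical, and it is not a description of how Amazon or any other retailer actually computes its recommendations. Algorithms such as Apriori are commonly used to extend this idea to larger sets of items.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"snowblower", "hot chocolate", "gloves"},
    {"snowblower", "hot chocolate"},
    {"snowblower", "shovel"},
    {"hot chocolate", "marshmallows"},
    {"snowblower", "hot chocolate", "shovel"},
]


def association_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (X, Y, support, confidence) for item pairs meeting both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Sorting makes each pair's key independent of set iteration order.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # test the rule in both directions
            confidence = count / item_counts[x]  # P(Y in basket | X in basket)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


for x, y, s, c in association_rules(baskets):
    print(f"people who bought {x} also bought {y}: support={s:.2f}, confidence={c:.2f}")
```

Note that a rule can hold in one direction only: in this toy data everyone who bought a shovel also bought a snowblower, but only half of the snowblower buyers bought a shovel.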

In recent years, companies have used the Internet as a major source of data. They gather information from their customers, process it to find valuable associations and then use those associations to increase sales. Collecting and sifting through this data is commonly called data mining, and turning the results into information that supports decisions is known as business intelligence. Nearing (2013) suggests that companies with greater access to information can gain a competitive advantage over their rivals, which implies that the company with the superior ability to gather data is best positioned to succeed.

Once the business intelligence has been gathered and the associations are known, companies can apply a technique known as clustering. Clustering is defined as the "unsupervised classification of patterns into groups" (Jain, Murty & Flynn, 1999). Clustering allows trends to be identified more easily. The process is typically automated, so the clusters are drawn without the interference of human classification. This is particularly important in marketing. Businesses have long clustered their customers by demographic, psychographic and other criteria, but typically this has been done by using intuition to form a hypothesis and then testing it. An automated clustering process in effect tests thousands of candidate groupings at once and returns several clusters that hold up statistically.
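As an illustration of automated clustering, the sketch below implements k-means, one of the many algorithms surveyed by Jain, Murty and Flynn (1999). The customer features (store visits per year and average spend), the function names and the choice of three clusters are hypothetical, and the deterministic farthest-first seeding is an assumption made so the example behaves predictably; randomized seeding schemes such as k-means++ are more common in practice.

```python
import math

# Hypothetical customer features: (store visits per year, average spend in dollars).
customers = [(2, 20), (3, 25), (4, 22), (30, 15), (32, 18), (28, 12), (10, 200), (12, 220)]


def farthest_first(points, k):
    """Seed with the first point, then repeatedly add the point that is
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm): alternate assignment and update steps."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(customers, k=3)
for centroid, members in zip(centroids, clusters):
    print(f"centroid ({centroid[0]:.1f}, {centroid[1]:.1f}): {members}")
```

In real use the features would normally be standardized first, because k-means measures Euclidean distance and a variable with a large range (here, spend) would otherwise dominate the grouping.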

Determining the "reliability" of data mining algorithms requires some clarification. The statistical reliability of an analysis can be calculated as part of the analysis itself: standard statistical software reports measures such as confidence intervals and significance levels for a given analysis of a given data set. These measures depend heavily on sample size, which in most data mining applications is sufficiently large. Where errors do occur, they are usually related to issues with the data. For instance, the data set might not be large enough, and no statistical analysis works well on a small data set. The data set might also produce false positives: for example, it might suggest that people who buy Big Macs prefer Coke while people who buy Quarter Pounders prefer Sprite, when the pattern is merely chance. Even when the sample is large enough, it might not be dispersed enough. If all the data was collected in New York, the findings might not hold across the rest of the country, or they might hold in most of the country but not in Miami. Errors of interpretation are most likely when the limitations of the data are not understood.
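One common example of the reliability figures that statistical software reports is a confidence interval. The sketch below computes the margin of error of a normal-approximation (Wald) 95% interval for a proportion, using a hypothetical 55% share of Big Mac buyers choosing Coke, to show how the margin narrows as the sample grows. The function name and the 55% figure are illustrative assumptions. Note that the calculation captures sampling error only: a tight interval computed from New York data says nothing about whether the result holds in Miami.

```python
import math


def proportion_margin(p_hat, n, z=1.96):
    """Half-width of a normal-approximation (Wald) confidence interval for a
    proportion; z = 1.96 corresponds to roughly 95% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Same hypothetical observed share (55%) at increasing sample sizes.
for n in (50, 500, 5000):
    print(f"n={n:>5}: 55% +/- {100 * proportion_margin(0.55, n):.1f} percentage points")
```

Because the margin shrinks with the square root of n, a tenfold larger sample cuts it by a factor of about 3.2, not 10.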

Data mining raises considerable privacy concerns. These concerns are probably somewhat overblown with respect to businesses, because at this point a business generally neither knows nor cares about particularly personal information. There are, however, concerns about government gathering personal data, and businesses could in the future use personal data against consumers, since it increases their bargaining power. Olavsrud (2014) notes that privacy is a major issue for consumers and that businesses need to work to protect identities and to ensure a high level of security for their data sets. The latter is also a major concern for businesses themselves, not just for their customers, since the data is proprietary and valuable.
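One way businesses can protect identities while still mining their data is pseudonymization: replacing direct identifiers with tokens before analysis. The sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library); the key, the records and the function name are hypothetical, and Olavsrud (2014) does not prescribe this particular technique. Because the same customer always maps to the same token, purchase patterns can still be analyzed without exposing the underlying e-mail addresses.

```python
import hashlib
import hmac

# Hypothetical placeholder key; a real key would be kept secret and stored
# separately from the data set it protects.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(customer_id: str) -> str:
    """Replace a customer identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(SECRET_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical purchase records: (customer e-mail, item bought).
records = [
    ("alice@example.com", "snowblower"),
    ("bob@example.com", "hot chocolate"),
    ("alice@example.com", "hot chocolate"),
]
safe_records = [(pseudonymize(customer), item) for customer, item in records]
for token, item in safe_records:
    print(token[:12], item)
```

Pseudonymized data is not fully anonymous, since other fields may still allow re-identification, so this complements rather than replaces the data security the paragraph calls for.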

4a.

Three concerns that consumers have raised with respect to privacy are that security breaches could leak their information, that companies would know too much about them, and that government could learn too much about them by using the courts to gain access to the information companies hold about them.

4b.

Each of these concerns is valid. The first is borne out by well-known security breaches, including ones that exposed credit card information; such breaches clearly harm consumers. Companies knowing too much is probably the least serious of these concerns. Through credit cards and loyalty programs, companies already know a great deal about consumers, so big data might add to that knowledge but does not represent a major shift in privacy. However, the more information about people that is in the public sphere, the easier it is for government to gain access to it.

4c.

Concerns can be allayed at the user end. The problem is self-defining: if people are not worried about something, the concern ceases to exist. This is essentially what happened with concerns about buying things over the Internet. Today that behavior has been normalized, and few people think twice about it. Part of allaying concerns, then, is simply building a track record of problem-free years that allows the behavior to become normalized.

References

CGI. (2013). Predictive analytics. CGI. Retrieved June 2, 2014 from http://www.cgi.com/sites/default/files/white-papers/Predictive-analytics-white-paper.pdf
Jain, A., Murty, M. & Flynn, P. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264-323.
Nearing, B. (2013). Mining Internet for chunks of gold. Times Union. Retrieved June 2, 2014 from http://www.timesunion.com/business/article/Mining-Internet-for-chunks-of-gold-5056469.php
Olavsrud, T. (2014). CIOs should push big data projects but prioritize privacy. CIO Magazine. Retrieved June 2, 2014 from http://www.cio.com/article/753612/CIOs_Should_Push_Big_Data_Projects_but_Prioritize_Privacy

Cite This Paper

PaperDue. (2014). Data mining techniques and applications. PaperDue. https://www.paperdue.com/essay/predictive-analytics-189647