Preserving Privacy of Individuals in Data Mining

Last reviewed: October 20, 2017 ~6 min read

Introduction
The number of data collections containing person-specific information is growing exponentially. The organizations that collect this data are entrusted to ensure that it remains private and that no external entities gain access to it. However, the same data can be valuable to researchers and analysts, and in many cases organizations would like to share it while protecting the privacy of the individuals it describes. The difficulty is that protecting privacy tends to reduce the utility of the data, which results in less accurate analytical outcomes (Sweeney, 2002). Data owners therefore need a way to transform datasets containing highly sensitive information into privacy-preserving records that they can share with researchers or corporate partners. There have been numerous cases of organizations releasing datasets that they believed were anonymized, only for the records to be re-identified. It is therefore vital for organizations to understand how anonymization techniques work and to assess how they can be safely applied to datasets.
This is where k-anonymity comes into play. K-anonymity is a privacy model applied to protect the privacy of data subjects when data is shared. A release of data has the k-anonymity property if the data for each individual contained in the release cannot be distinguished from that of at least k-1 other individuals whose data also appears in the release. K-anonymity reduces the risk of re-identification by ensuring that any attempt to link the release to other datasets matches at least k candidate records rather than a single individual. To achieve the k-anonymity property, the dataset is made less precise in some way while its usability for research or other purposes is preserved (Fung, Wang, Fu, & Philip, 2010).
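The k-anonymity property described above can be checked mechanically. The following is a minimal sketch, not taken from the cited works: the function name `is_k_anonymous`, the choice of quasi-identifiers, and the toy records are all assumptions made for illustration. It groups records by their quasi-identifier values and verifies that every group contains at least k records.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (hypothetical helper)."""
    classes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in classes.values())


# Toy release (invented data): age and zip code act as quasi-identifiers.
release = [
    {"age": "≤ 20", "zip": "021**", "diagnosis": "flu"},
    {"age": "≤ 20", "zip": "021**", "diagnosis": "asthma"},
    {"age": "> 20", "zip": "021**", "diagnosis": "flu"},
    {"age": "> 20", "zip": "021**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(release, ["age", "zip"], k=2))  # True
print(is_k_anonymous(release, ["age", "zip"], k=3))  # False
```

Each group of records sharing the same quasi-identifier values is an equivalence class; the release is 2-anonymous here because both classes contain two records, but it is not 3-anonymous.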
The Article’s Proposed Method/Approach
The article being reviewed is titled “The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.” The article combines generalization and suppression in order to reduce the likelihood of records in the dataset being re-identified (Kohlmayer, Prasser, & Kuhn, 2015). Generalization replaces individual attribute values with a broader category, which prevents re-identification based on the exact values. For example, the value ‘19’ of the age attribute could be replaced with ‘≤ 20’. This anonymizes the age values and makes re-identification harder. Suppression replaces certain attribute values with an asterisk. All or only some of the values in a column may be replaced. For example, every value of the name attribute could be replaced with an asterisk, or some of the digits of a zip code could be replaced with asterisks.
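The two transformations can be illustrated with a short sketch. The function names, the age threshold of 20, and the number of zip-code digits retained are assumptions chosen to mirror the examples in the text; they are not the implementation used by Kohlmayer et al. (2015).

```python
def generalize_age(age, threshold=20):
    """Generalization: replace an exact age with a broader category."""
    return f"≤ {threshold}" if age <= threshold else f"> {threshold}"


def suppress_zip(zip_code, keep=3):
    """Partial suppression: keep the first `keep` characters and
    replace the rest with asterisks."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def anonymize(record):
    """Apply full suppression to the name, generalization to the age,
    and partial suppression to the zip code (illustrative policy only)."""
    return {
        "name": "*",
        "age": generalize_age(record["age"]),
        "zip": suppress_zip(record["zip"]),
        "diagnosis": record["diagnosis"],
    }


# Invented record for illustration.
original = {"name": "Alice", "age": 19, "zip": "02139", "diagnosis": "flu"}
print(anonymize(original))
# {'name': '*', 'age': '≤ 20', 'zip': '021**', 'diagnosis': 'flu'}
```

The sensitive attribute (the diagnosis) is left untouched, since it is the value researchers need; only the attributes that could identify the person are transformed.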
Each method has limitations when used alone, and combining them decreases the risk of the data being re-identified. Kohlmayer et al. (2015) posit that combining the two techniques preserves the truthfulness of the information in the dataset while still protecting the privacy of the individuals. Any identifying information left in place by one method can be removed by the other, which helps ensure that the released dataset does not violate the privacy of the individuals. The combination has a cost, however: many of the privacy problems that are monotonic when the data is transformed using generalization alone become non-monotonic when generalization is combined with tuple suppression. Kohlmayer et al. (2015) note that further generalization might merge a previously suppressed non-anonymous equivalence class with an anonymous equivalence class. This could produce a non-anonymous class that cannot be suppressed because of the suppression limit, thus rendering the overall dataset non-anonymous.
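The role of the suppression limit can be sketched as follows. This is a simplified illustration of tuple suppression under k-anonymity, assuming the limit is expressed as the maximum fraction of records that may be removed; the function name and data are hypothetical, and the sketch does not reproduce the algorithm from the reviewed article.

```python
from collections import Counter


def suppress_small_classes(records, quasi_identifiers, k, suppression_limit):
    """Drop every record whose equivalence class has fewer than k members.

    Returns the remaining records, or None when more than
    `suppression_limit` (a fraction of all records) would have to be
    suppressed, in which case the dataset stays non-anonymous.
    """
    def class_key(record):
        return tuple(record[q] for q in quasi_identifiers)

    sizes = Counter(class_key(r) for r in records)
    kept = [r for r in records if sizes[class_key(r)] >= k]
    suppressed = len(records) - len(kept)
    if records and suppressed / len(records) > suppression_limit:
        return None
    return kept


# Invented generalized table with one outlier record.
table = [
    {"age": "≤ 20", "zip": "021**"},
    {"age": "≤ 20", "zip": "021**"},
    {"age": "> 20", "zip": "021**"},
    {"age": "> 20", "zip": "021**"},
    {"age": "> 20", "zip": "946**"},
]

print(suppress_small_classes(table, ["age", "zip"], k=2, suppression_limit=0.2))
print(suppress_small_classes(table, ["age", "zip"], k=2, suppression_limit=0.1))
```

With a limit of 20 percent the single outlier can be removed and four records remain; with a limit of 10 percent the same outlier cannot be removed, so this particular generalization cannot be released.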
Addressing and Mitigating Privacy Concerns in Data Mining
According to Sweeney (2002), generalized data can over time become subject to a temporal inference attack. This is possible when a subsequent release of information does not take the previously released information into account. Failing to do so can reveal sensitive information and compromise the k-anonymity protection. To guard against such an attack, the data holder should therefore consider how a new release could be joined with earlier releases and other external information before publishing it. Combining generalization and suppression can help overcome this problem. It is not easy for organizations to recall all the information they have released over time, which makes it hard, when using generalization alone, to ensure that each release treats the same attributes consistently. When the two approaches are combined, the released information is better placed to protect the identity of the individuals.
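The risk of a later release ignoring an earlier one can be made concrete with a small sketch. The data, the function name `link_releases`, and the assumption that the sensitive attribute is unique per record are all invented for illustration. Each release is 2-anonymous on its own, yet joining them on the shared diagnosis column recovers both the exact age and the exact zip code.

```python
from collections import Counter


def link_releases(release_a, release_b, join_on):
    """Join two releases on a shared attribute, keeping the more
    specific (non-suppressed) value of every other attribute."""
    index = {}
    for row in release_b:
        index.setdefault(row[join_on], []).append(row)
    linked = []
    for row in release_a:
        matches = index.get(row[join_on], [])
        if len(matches) == 1:  # only unambiguous matches are linked
            other = matches[0]
            linked.append(
                {key: row[key] if row[key] != "*" else other[key] for key in row}
            )
    return linked


# First release suppresses the zip code; the later one suppresses the age.
release_1 = [
    {"age": 19, "zip": "*", "diagnosis": "asthma"},
    {"age": 19, "zip": "*", "diagnosis": "flu"},
    {"age": 34, "zip": "*", "diagnosis": "diabetes"},
    {"age": 34, "zip": "*", "diagnosis": "migraine"},
]
release_2 = [
    {"age": "*", "zip": "02139", "diagnosis": "asthma"},
    {"age": "*", "zip": "02141", "diagnosis": "flu"},
    {"age": "*", "zip": "02139", "diagnosis": "diabetes"},
    {"age": "*", "zip": "02141", "diagnosis": "migraine"},
]

linked = link_releases(release_1, release_2, join_on="diagnosis")
class_sizes = Counter((row["age"], row["zip"]) for row in linked)
print(linked)
print(min(class_sizes.values()))  # 1: the linked table is no longer 2-anonymous
```

Basing the second release on the first, for example by keeping the zip code suppressed, would leave the attacker with nothing new to join.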
Another concern with data mining is the loss of information that results from anonymizing data with generalization alone. The resulting data loss makes the released information less useful and limits the purposes for which it can be used. To reduce this loss, the combined use of generalization and suppression is proposed, which reduces data loss by up to 90 percent. This decrease means that the released information is privacy-protected while remaining usable by researchers without large parts of the information being missing. A complementary release attack can also be prevented by ensuring that subsequent releases are based on the previously released table. Eliminating the risk of linkage ensures that the released data is privacy-protected and that any subsequent releases are based only on the initially released information (Sweeney, 2002).
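The information-loss argument can be illustrated with a toy calculation. The three-level age hierarchy, the loss measure (generalization level divided by the maximum level, with a suppressed record counting as total loss), and the data are assumptions made for this sketch; they are not the metrics or results reported in the reviewed article, and the size of the reduction depends entirely on the dataset.

```python
from collections import Counter

MAX_LEVEL = 2  # hypothetical hierarchy: 0 = exact age, 1 = decade, 2 = "*"


def generalize(age, level):
    """Generalize an age to the given level of the toy hierarchy."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def loss_at_level(ages, level, k, suppression_limit):
    """Average per-record loss at this level, or None if k-anonymity
    cannot be reached within the suppression limit."""
    values = [generalize(a, level) for a in ages]
    sizes = Counter(values)
    outliers = sum(1 for v in values if sizes[v] < k)
    if outliers / len(ages) > suppression_limit:
        return None
    kept = len(ages) - outliers
    # Kept records lose level/MAX_LEVEL each; suppressed records lose 1.0.
    return (kept * level / MAX_LEVEL + outliers * 1.0) / len(ages)


def minimal_loss(ages, k, suppression_limit):
    """Smallest loss over all levels, or None if no level is feasible."""
    losses = [
        loss_at_level(ages, level, k, suppression_limit)
        for level in range(MAX_LEVEL + 1)
    ]
    return min((x for x in losses if x is not None), default=None)


ages = [21, 23, 25, 27, 64]  # invented data with one outlier
print(minimal_loss(ages, k=2, suppression_limit=0.0))  # generalization only
print(minimal_loss(ages, k=2, suppression_limit=0.2))  # with suppression
```

Without suppression, the single outlier forces every age to be generalized to "*", so all precision is lost. Allowing one record to be suppressed lets the remaining four keep their decade, which lowers the average loss from 1.0 to 0.6 in this toy case.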
Summary
This paper examined k-anonymity by considering how it has been applied by other researchers. Combining generalization and suppression has been shown to be the most effective way of ensuring that information is privacy-protected and that individuals are unlikely to be re-identified. Privacy protection is a concern that many organizations struggle with when they want to release information that would be beneficial for research purposes. By employing generalization and suppression together, an organization can be more confident that it is not violating the privacy of the individuals whose information it holds. Using the two methods together also offsets the limitations of each method with the strengths of the other, which helps ensure that the released information conforms to privacy-protection strategies. Finally, the data loss caused by using a single method, which can prevent the released information from being used for its intended purposes, is reduced when the two methods are combined.


References
Fung, B. C., Wang, K., Fu, A. W.-C., & Philip, S. Y. (2010). Introduction to privacy-preserving data publishing: Concepts and techniques. Boca Raton, FL: CRC Press.
Kohlmayer, F., Prasser, F., & Kuhn, K. A. (2015). The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss. Journal of Biomedical Informatics, 58, 37-48.
Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557-570.
 

Cite This Paper
PaperDue. (2017). Preserving Privacy of Individuals in Data Mining. PaperDue. https://www.paperdue.com/essay/preserving-privacy-of-individuals-in-data-2166259
