Data re-Identification

From GM-RKB
Jump to navigation Jump to search

A Data re-Identification is a Data Anonymization that ...



References

2019

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Data_re-identification Retrieved:2019-11-15.
    • Data re-identification or de-anonymization is the practice of matching anonymous data (also known as de-identified data) with publicly available information, or auxiliary data, in order to discover the individual to which the data belong to. This is a concern because companies with privacy policies, health care providers, and financial institutions may release the data they collect after the data has gone through the de-identification process. The de-identification process involves masking, generalizing or deleting both direct and indirect identifiers; the definition of this process is not universal, however. Information in the public domain, even seemingly anonymized, may thus be re-identified in combination with other pieces of available data and basic computer science techniques. The Protection of Human Subjects ('Common Rule#Signatories'), a collection of multiple U.S. federal agencies and departments including the U.S. Department of Health and Human Services, speculate that re-identification is becoming gradually easier because of “big data” - the abundance and constant collection and analysis of information along the evolution of technologies and the advances of algorithms. However, others have claimed that de-identification is a safe and effective data liberation tool and do not view re-identification as a concern.

      More and more data are becoming publicly available over the Internet. These data are released after applying some anonymization techniques like removing personally identifiable information (PII) such as names, addresses and social security numbers to ensure the sources' privacy. This assurance of privacy allows the government to legally share limited data sets with third parties without requiring written permission. Such data has proved to be very valuable for researchers, particularly in health care.

      A 2000 study found that 87 percent of the U.S. population can be identified using a combination of their gender, birthdate and zip code. Others do not think that re-identification is a serious threat, and call it a "myth"; they claim that the combination of zip code, date of birth and gender is rare or partially complete, such as only the year and month birth without the date, or the county name instead of the specific zip code, thus the risk of such re-identification is reduced in many instances.