Do anonymous data exist?

The possibility of identifying individuals in presumably anonymous data has received ample attention under the names “re-identification” and “de-anonymization”. Such attacks have been widely successful, and sophisticated techniques have been developed. Overviews of techniques and well-known cases are given, for example, by Mark Lennox[1], Natasha Lomas[2], Rocher et al.[3] and Dwork et al.[4].
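
The core idea behind many such attacks is a simple linkage: records in a supposedly de-identified dataset are matched against an auxiliary dataset that still carries names, using quasi-identifiers such as ZIP code, date of birth, and sex. The following minimal Python sketch illustrates this idea only; the datasets, column names, and function are invented for illustration and do not come from the cited studies.

```python
# Minimal illustration of a linkage (record-matching) attack:
# a "de-identified" dataset is joined to an auxiliary, named dataset
# on quasi-identifiers. All data below are invented.

deidentified_records = [
    {"zip": "02138", "birth_date": "1961-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1985-02-14", "sex": "M", "diagnosis": "asthma"},
]

public_records = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1961-07-31", "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_date": "1985-02-14", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def link(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Re-identify records by matching on the quasi-identifier columns."""
    index = {tuple(r[k] for k in keys): r for r in public}
    matches = []
    for record in deidentified:
        hit = index.get(tuple(record[k] for k in keys))
        if hit is not None:
            matches.append((hit["name"], record["diagnosis"]))
    return matches

print(link(deidentified_records, public_records))
# [('Jane Doe', 'hypertension'), ('John Roe', 'asthma')]
```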

Some kinds of data have been found to be very difficult to anonymize. Most prominently, this holds for location data[5]. Here, even a generalization to country level may not be sufficient[6]. Moreover, to reduce the identification potential of data, transformations must be applied that reduce the level of detail and truthfulness of the data. This raises the question of whether successfully anonymized data are still fit for the purposes of processing.
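
One way to see why location data resist anonymization is the “unicity” measure used in this line of research[5]: how often do a few spatio-temporal points from a person's trace match that person and nobody else? The toy sketch below computes such a measure; the traces, time buckets, and function names are invented purely for illustration.

```python
# Sketch of a "unicity" measure for location traces:
# given coarse (hour bucket, cell id) points, how often do a few
# randomly chosen points from one trace match that trace and no other?
import random

# Toy traces: user id -> set of (hour_bucket, cell_id) points. Invented data.
traces = {
    "u1": {(8, "A"), (12, "B"), (19, "C"), (22, "A")},
    "u2": {(8, "A"), (13, "D"), (19, "C"), (23, "E")},
    "u3": {(9, "F"), (12, "B"), (18, "G"), (22, "H")},
}

def unicity(traces, num_points=2, trials=1000, rng=random.Random(0)):
    """Fraction of sampled point sets that match exactly one trace."""
    unique = 0
    users = list(traces)
    for _ in range(trials):
        user = rng.choice(users)
        sample = set(rng.sample(sorted(traces[user]), num_points))
        matches = [u for u, t in traces.items() if sample <= t]
        if matches == [user]:
            unique += 1
    return unique / trials

print(f"unicity with 2 points: {unicity(traces):.2f}")
```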

Many scholars have concluded that anonymous data which remain useful likely do not exist. This position was most prominently voiced by Ohm, who expresses doubt about the existence of anonymous data in a legal context, stating: “This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention”[7]. From a more technical point of view, Cynthia Dwork, a co-inventor of differential privacy, has coined the phrase “de-identified data isn’t” (i.e., it either isn’t de-identified or it isn’t useful data)[8].
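
Dwork's remark reflects the trade-off that differential privacy makes explicit: query answers are perturbed with noise calibrated to the query's sensitivity and a privacy parameter ε, so stronger privacy (smaller ε) means noisier, less useful answers. The following minimal sketch of the Laplace mechanism is only meant to illustrate that trade-off; the counts and parameter values are invented.

```python
# Minimal sketch of the Laplace mechanism from differential privacy:
# a counting query is answered with noise of scale sensitivity/epsilon,
# making the privacy-utility trade-off explicit. Values are illustrative.
import math
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise via inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count, epsilon, sensitivity=1.0, rng=random.Random(0)):
    """Release a count with Laplace noise calibrated to sensitivity/epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

true_count = 1234  # e.g. number of records matching some query
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(true_count, epsilon)
    print(f"epsilon={epsilon:>4}: noisy count ~ {noisy:.1f}")
```

With ε = 10 the released count stays close to the true value, while ε = 0.1 can shift it by tens of records: the more protection the mechanism gives, the less accurate the published statistic becomes.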
References


[1] Mark Lennox, No such thing as anonymous data, dev.to, Oct 2, 2019, https://dev.to/mlennox/no-such-thing-as-anonymous-data-13kk (last visited 8/4/2021).

[2] Natasha Lomas, Researchers spotlight the lie of ‘anonymous’ data, TechCrunch, July 24, 2019, https://techcrunch.com/2019/07/24/researchers-spotlight-the-lie-of-anonymous-data/ (last visited 8/4/2021).

[3] Rocher, L., Hendrickx, J.M. & de Montjoye, YA., Estimating the success of re-identifications in incomplete datasets using generative models, Nat Commun 10, 3069 (2019), https://doi.org/10.1038/s41467-019-10933-3 (last visited 8/4/2021).

[4] Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, Exposed! A Survey of Attacks on Private Data, Annual Review of Statistics and Its Application 4:1, 61-84 (2017), https://privacytools.seas.harvard.edu/files/privacytools/files/pdf_02.pdf (last visited 8/4/2021).

[5] See for example, de Montjoye, YA., Hidalgo, C., Verleysen, M. et al., Unique in the Crowd: The privacy bounds of human mobility, Sci Rep 3, 1376 (2013), https://doi.org/10.1038/srep01376 (last visited 9/4/2021).

[6] Ali Farzanehfar, Florimond Houssiau, Yves-Alexandre de Montjoye, The risk of re-identification remains high even in country-scale location datasets, Patterns, Volume 2, Issue 3, 2021, 100204, ISSN 2666-3899, https://doi.org/10.1016/j.patter.2021.100204 (last visited 12/8/2021).

[7] Ohm, Paul (2009), Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, UCLA Law Review 57, http://www.uclalawreview.org/pdf/57-6-3.pdf (last visited 4/8/2021).

[8] Cynthia Dwork, Introduction: The Definition of Differential Privacy, Institute for Advanced Study, Four Facets of Differential Privacy, November 12, 2016, https://youtu.be/lg-VhHlztqo?t=180 (last visited 8/4/2021).
