Options to deal with presumed anonymous data?

The present section discusses how controllers can deal with the uncertainty of assessing the “success state” in terms of which (truly) anonymous data is defined. It first briefly reflects on the sources of this uncertainty and then discusses the options at controllers’ disposal.

Anonymous has been defined as a “success state” in which no actor can identify data subjects in the data with means reasonably likely to be used. Whether this “success state” holds often depends on the possible external actors, their know-how about re-identification and de-anonymization methods, the additional information they have at their disposal, the resources they are likely to employ, and the state of technology potentially decades into the future. It is likely impossible for controllers to obtain sufficient information about these factors.

Consequently, the evaluation of “success states” is in many cases a highly difficult task for controllers and the resulting assessment is often plagued by a significant level of uncertainty. The following looks in more detail at how controllers can best manage this uncertainty and the resulting risks.

Controllers must decide, even before a data set is created (through data collection from data subjects or by derivation from another data set), what kind of data they are dealing with. Even if the controller presumes that the data are anonymous, owing to this uncertainty, one of the following cases could occur:

  • Identification is already possible,
  • identification will eventually be possible, or
  • the data are “truly” anonymous.

In the first two cases, the data are personal and the GDPR applies[1]; in the third case, it does not. Considering the potentially significant uncertainty in assessing the type of data, two risks emerge:

  • Controllers erroneously classify personal data as anonymous and consequently fail to comply with the requirements of the GDPR, and
  • controllers, possibly out of prudence, treat anonymous data as if they were personal and make an unnecessary effort of implementing the requirements of the GDPR.

Figure 23 gives an overview of all possible cases. The rows represent the possible actual data types; the columns show the controller’s decision to treat the data as anonymous or as personal data, respectively.

Data type: “truly” anonymous
  • Treat as anonymous data [correct classification]: GDPR-compliant.
    Obligations according to GDPR: in some cases, a Data Protection Impact Assessment (DPIA) is required before anonymization[2].
  • Treat as personal data [incorrect classification]: GDPR-compliant (extra effort is allowed).
    Obligations according to GDPR: none, but the implemented measures insure against the consequences of a classification error.

Data type: identification will eventually be possible
  • Treat as anonymous data [incorrect classification]: GDPR violation; potentially irreparable damage for data subjects.
    Obligations according to GDPR: mandatory damage control, possible termination of processing, consequences of the GDPR violation, and potential liability claims.
  • Treat as personal data [correct classification]: GDPR-compliant.
    Obligations according to GDPR: implementation of technical and organizational measures.

Data type: identification is already possible
  • Treat as anonymous data [incorrect classification]: GDPR violation; potentially irreparable damage for data subjects.
    Obligations according to GDPR: mandatory damage control, possible termination of processing, consequences of the GDPR violation, and potential liability claims.
  • Treat as personal data [correct classification]: GDPR-compliant.
    Obligations according to GDPR: implementation of technical and organizational measures.
Figure 23: The different options available to controllers to deal with presumed anonymous data.
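
To make the decision matrix of Figure 23 concrete, the following minimal Python sketch models it as a lookup from the actual data type and the controller’s treatment decision to the resulting compliance outcome. All type and function names are illustrative assumptions, not terms from the GDPR or any library.

```python
from enum import Enum

class ActualType(Enum):
    TRULY_ANONYMOUS = "truly anonymous"
    EVENTUALLY_IDENTIFIABLE = "identification will eventually be possible"
    ALREADY_IDENTIFIABLE = "identification is already possible"

class Treatment(Enum):
    AS_ANONYMOUS = "treat as anonymous data"
    AS_PERSONAL = "treat as personal data"

def classification_outcome(actual: ActualType, treatment: Treatment) -> str:
    """Return the compliance outcome for an (actual type, treatment) pair,
    mirroring Figure 23. Illustrative only, not legal advice."""
    if treatment is Treatment.AS_PERSONAL:
        # Treating data as personal is always GDPR-compliant; at worst it
        # costs unnecessary effort when the data were truly anonymous.
        return "GDPR-compliant (possibly extra, unnecessary effort)"
    if actual is ActualType.TRULY_ANONYMOUS:
        return "GDPR-compliant (correct classification)"
    # Personal data treated as anonymous: a GDPR violation with
    # potentially irreparable damage for data subjects.
    return "GDPR violation: mandatory damage control required"

# Example: a controller presumes anonymity, but identification is possible.
print(classification_outcome(ActualType.ALREADY_IDENTIFIABLE,
                             Treatment.AS_ANONYMOUS))
```

The asymmetry of the matrix is visible in the code: treating data as personal never violates the GDPR, while treating them as anonymous is only safe when the presumption of anonymity actually holds.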

The following describes in more detail the obligations facing a controller when the classification of data as anonymous is discovered to be incorrect. It covers in particular the following:

  1. What are examples of the potentially irreparable damage and disadvantages for data subjects?
  2. What are the possible consequences of a GDPR violation?
  3. What does the mandatory damage control consist of?
  4. How substantial is the effort of treating presumed anonymous data as personal whenever there is any doubt?

Potential damage and disadvantage to data subjects

The very objective of the GDPR is to protect the rights and freedoms of data subjects when their personal data is being processed by controllers. When personal data is processed without observing the obligations of the GDPR, data subjects are therefore deprived of these rights and freedoms.

For example, when data is erroneously presumed to be anonymous, data subjects are typically not informed about the processing of their data (lack of transparency), and thus cannot exercise their rights, such as objecting to the processing on the basis of their specific situation. Beyond this, the data may not be managed with the safeguards prescribed by the GDPR. This deprives data subjects of the necessary protection and exposes them to increased risks of disadvantage or damage. Further, when controllers fail to have a legitimate legal basis, the power imbalance between controller and data subject is tilted all the way in favor of the controller.

It is evident that the above consequences cannot be remedied in retrospect.

Beyond the above impact on their rights and freedoms, data subjects can experience irreparable damage. Assume, for example, that unsuccessfully anonymized medical data about a sensitive disease (such as HIV) are published and it is later discovered that some of the data subjects can be identified. As a result, these data subjects may suffer highly adverse consequences at their workplace, in their careers, and in their relationships.

It is also evident here that once such damage is done, it is irreversible and beyond remediation.

Consequences of a GDPR violation

In the options above, the GDPR was violated when personal data was treated as if it were anonymous. In such a case, the controller typically assumed that the processing was not subject to the GDPR and consequently failed to satisfy its requirements.

The extended version of this analysis (see https://uldsh.de/PseudoAnon) provides reasons why this situation could be considered to be a data breach according to Art. 4(12) GDPR. It is a cautious course of action for controllers to treat it as such.

According to Art. 33(1) GDPR, “[i]n the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority competent in accordance with Article 55, unless the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons.” The decision not to notify a personal data breach can thus only be made on the basis of a risk assessment.
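
As a rough illustration of this timing rule, the sketch below, assuming a boolean outcome of the risk assessment and purely illustrative names, computes the 72-hour notification deadline from the moment the controller becomes aware of the breach.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)  # Art. 33(1) GDPR

def breach_notification(aware_at: datetime, risk_is_unlikely: bool) -> str:
    """Decide whether the supervisory authority must be notified and by when.

    `risk_is_unlikely` stands in for the outcome of a documented risk
    assessment; only if a risk to the rights and freedoms of natural
    persons is unlikely may notification be omitted.
    """
    if risk_is_unlikely:
        return "No notification required (document the risk assessment)."
    deadline = aware_at + NOTIFICATION_WINDOW
    return f"Notify the competent supervisory authority by {deadline.isoformat()}."

# Example: the controller became aware of the misclassification just now.
print(breach_notification(datetime.now(timezone.utc), risk_is_unlikely=False))
```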

Evidently, in any case, controllers have to take rapid action to satisfy the GDPR requirements (which would not have been necessary for anonymous data). This is discussed in the following subsection.

Mandatory damage control when presumed anonymous data is discovered to be personal

The following looks in further detail at which obligations of the GDPR were disregarded when data were wrongly assumed to be anonymous and what damage control is required. It provides a short summary of the extended analysis of this topic (see the link in the previous subsection).

Since the processing needs to comply with the GDPR, all its requirements must be met as rapidly as possible or else any further processing has to be terminated.

The following summarizes the kinds of actions required to contain the damage, looking first at past and then at present processing operations; a minimal checklist sketch follows the two lists:

Past processing operations:

  • Create retrospective compliance where possible (e.g., retrospectively finding a legal basis).
  • Implement delayed compliance (e.g., informing data subjects about the processing, handling invocations of data subject rights).
  • Reverse effects of unlawful processing (e.g., deleting data and results).
  • Report irreversible effects of unlawful processing to the competent supervisory authority.
  • Inform possible third party recipients of the need for equivalent damage control action.

Present processing operations:

  • Stop processing until indispensable prerequisite obligations are fulfilled (e.g., legal basis, DPIA).
  • Satisfy obligations as quickly as possible during processing (e.g., designate a DPO, create more efficient processes to handle data subject rights, implement additional and improved technical and organizational measures).
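
One way to operationalize these actions, sketched below under the assumption that a simple tracked checklist suffices, is to record each item together with its completion status so that no required step is silently skipped; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DamageControlItem:
    action: str
    done: bool = False

@dataclass
class DamageControlPlan:
    # Items mirror the two lists above (texts abbreviated).
    past: list = field(default_factory=lambda: [
        DamageControlItem("Create retrospective compliance where possible"),
        DamageControlItem("Implement delayed compliance (inform data subjects, handle rights)"),
        DamageControlItem("Reverse effects of unlawful processing (delete data and results)"),
        DamageControlItem("Report irreversible effects to the supervisory authority"),
        DamageControlItem("Inform third-party recipients of needed damage control"),
    ])
    present: list = field(default_factory=lambda: [
        DamageControlItem("Stop processing until prerequisite obligations are met"),
        DamageControlItem("Satisfy remaining obligations as quickly as possible"),
    ])

    def open_items(self) -> list:
        """Return all actions that have not yet been completed."""
        return [i.action for i in self.past + self.present if not i.done]

plan = DamageControlPlan()
plan.past[0].done = True  # e.g., a legal basis was found retrospectively
print(plan.open_items())
```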

The most critical aspect of the damage control action is how to handle irreversible effects of unlawful processing. This includes (but is not necessarily limited to):

  • Unlawful transfer of data to third party recipients (possibly even in third countries),
  • unlawful publication of data, and
  • irreversible effects of unlawful processing on data subjects (such as decision-making affecting data subjects[3]).

Implementing GDPR requirements for presumed anonymous data

The previous two subsections have discussed the consequences when a controller falsely treats data as anonymous but finds out at a later point that it is personal after all. This subsection briefly looks at what exactly has to be done to “play it safe” and treat presumed anonymous data as personal data.

The effort is usually quite contained, at least for organizations that are already familiar with the requirements of data protection[4].

The most significant difference compared to treating the data as anonymous is that confidentiality is required. Publication of the data, i.e., disclosure to arbitrary third-party recipients, is evidently the opposite of confidentiality. Disclosure to selected recipients remains possible, however, when there is a valid legal basis for such disclosure.

In any case, the controller disclosing data to third parties must make clear that the data are considered personal data and require the protections afforded to data subjects by the GDPR.

A best practice for propagating the necessary obligations and limitations to recipients is the stipulation of a legal agreement. This plays a role similar to that of the agreement the GDPR requires for processors (see Art. 28(3) GDPR). A U.S. example[5] of such an agreement from research practice with pseudonymous (and likely presumed anonymous) data is in common use by the Healthcare Cost and Utilization Project (HCUP)[6]. Before stipulating the contractual agreement, HCUP even vets recipients and requires, among other things, that they pass a test showing that they understand their responsibilities[7].

Such a contractual agreement between a controller and a third party recipient could regulate the following:

  • Obligation to treat the data as personal data under the GDPR including implementing measures that guarantee confidentiality;
  • Potentially an obligation to report any breach of confidentiality to the controller;
  • Prohibition of any attempt of re-identification or de-anonymization;
  • Obligation to refrain from further disclosing the data to external recipients or, alternatively, to do so only under the same contractual conditions;
  • Potentially the obligation to report any (successful or failed) attempt of re-identification or suitable emerging methodology thereof to the controller;
  • Potentially a limitation of the purposes for which the data can be used (e.g., in the case where the initial disclosure was based on consent);
  • Potentially, where the data permit this, a certain technical protocol for notifications of invocations of data subject rights according to Art. 19 GDPR (a hypothetical message format is sketched after this list);
  • Potentially an obligation to terminate processing and delete the data in the presence of any violation of the agreement.
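
For the Art. 19 notification protocol mentioned above, controller and recipient might, for instance, agree on a simple machine-readable message for propagating rectification, erasure, or restriction requests. The following sketch is purely hypothetical; the field names and message format are assumptions, not a standardized protocol.

```python
import json
from datetime import datetime, timezone

def art19_notification(record_ids: list, right_invoked: str) -> str:
    """Build a notification message informing a recipient that data subject
    rights (rectification, erasure, or restriction; cf. Art. 16-18 GDPR)
    were invoked for specific records. Hypothetical, contract-agreed format."""
    assert right_invoked in {"rectification", "erasure", "restriction"}
    return json.dumps({
        "notification_type": "art19-gdpr",
        "right_invoked": right_invoked,
        "record_ids": record_ids,  # pseudonymous record identifiers
        "issued_at": datetime.now(timezone.utc).isoformat(),
        "required_action": f"apply {right_invoked} and confirm to controller",
    })

# Example: propagate an erasure request for two records to a recipient.
print(art19_notification(["rec-0017", "rec-0094"], "erasure"))
```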

Avoiding publication and other forms of disclosure that are not bound to obligations removes the major issue of irreversible actions discussed above in the context of damage control. Confidentiality and controlled disclosure are thus the most important components of an insurance against incorrect classification of the data.
References


1This is because the concept of “the means reasonably likely to be used” is inherently a forward-looking criterion.

2For example, in Germany, in the private sector, the list according to Art. 35(4) GDPR of processing operations that require a Data Protection Impact Assessment includes the anonymization of special categories of personal data (according to Art. 9 GDPR). See Nr. 15, page 4, https://www.lda.bayern.de/media/dsfa_muss_liste_dsk_de.pdf (last visited 12/8/2021).

3An example of such decision-making would be the refusal of credit or a service, or the denial of a right.

4For example, such organizations already have knowledge of their obligations and have appointed a DPO (if required).

5https://www.hcup-us.ahrq.gov/team/NationwideDUA.jsp (last visited 10/5/2021).

6https://www.hcup-us.ahrq.gov/ (last visited 10/5/2021).

7See https://aircloak.com/the-five-private-eyes-part-1-the-surprising-strength-of-de-identified-data/ under HCUP, (last visited 10/5/2021).

 
