Technical and organizational measures for pseudonymization

The following provides more detail on additional technical and organizational measures that a controller can consider implementing in the context of pseudonymization. It focuses on both (i) measures that are applied to the split-off additional information and enforce the required separation, and (ii) measures that prevent direct identification of data subjects in the strictly pseudonymous data.

(i) Measures to protect the additional information:
The following lists measures that implement the separation of split-off additional information from the processing of the strictly pseudonymous data. The additional information is necessary to re-identify the pseudonymous data and thus to exit the realm of pseudonymization. The following measures prevent or control such an exit.
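To make the notion of split-off additional information concrete, the following Python sketch replaces a direct identifier with a random pseudonym and returns the pseudonym-to-identifier mapping as a separate object. The function name `pseudonymize`, the output field `pseudonym` and the choice of random (rather than, for example, keyed-hash) pseudonyms are assumptions made for illustration only.

```python
import secrets
from typing import Dict, List, Tuple

Record = Dict[str, str]


def pseudonymize(records: List[Record], id_field: str) -> Tuple[List[Record], Dict[str, str]]:
    """Replace the identifying field of each record with a random pseudonym.

    Returns (pseudonymous_records, additional_info). additional_info maps each
    pseudonym back to the original identifier; it is the split-off additional
    information and has to be stored and protected separately.
    """
    pseudonymous: List[Record] = []
    additional_info: Dict[str, str] = {}  # pseudonym -> identifier
    assigned: Dict[str, str] = {}         # identifier -> pseudonym (same person, same pseudonym)
    for record in records:
        identifier = record[id_field]
        if identifier not in assigned:
            pseudonym = secrets.token_hex(8)
            while pseudonym in additional_info:  # regenerate on the (unlikely) collision
                pseudonym = secrets.token_hex(8)
            assigned[identifier] = pseudonym
            additional_info[pseudonym] = identifier
        # Copy every field except the identifier, then attach the pseudonym.
        stripped = {k: v for k, v in record.items() if k != id_field}
        stripped["pseudonym"] = assigned[identifier]
        pseudonymous.append(stripped)
    return pseudonymous, additional_info
```

Only the first return value would be handed to those processing the pseudonymous data; the second is the additional information whose separation the measures in this section are meant to enforce.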

Technical measures such as encryption of the additional information while it is at rest, and access control while it is in use, are obviously necessary. Access control includes authentication, authorization and logging of access (creating an audit trail).
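A minimal Python sketch of the access-control part (encryption at rest is not shown): every lookup is authenticated, then authorized, then logged. The class `GuardedAdditionalInfo`, its methods and the in-memory storage are hypothetical; an actual deployment would rely on the organization's identity and access management and on persistent, protected logs.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple


class GuardedAdditionalInfo:
    """Hypothetical access-control wrapper around split-off additional information."""

    def __init__(self, additional_info: Dict[str, str]) -> None:
        self._info = dict(additional_info)
        self._credentials: Dict[str, Tuple[bytes, bytes]] = {}  # user -> (salt, password hash)
        self._authorized: Set[str] = set()
        self.access_log: List[dict] = []

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        # PBKDF2 from the standard library; the iteration count is illustrative.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, user: str, password: str, authorized: bool = False) -> None:
        salt = os.urandom(16)
        self._credentials[user] = (salt, self._hash(password, salt))
        if authorized:
            self._authorized.add(user)

    def lookup(self, user: str, password: str, pseudonym: str) -> str:
        # 1. Authentication: compare password hashes in constant time.
        cred = self._credentials.get(user)
        authenticated = cred is not None and hmac.compare_digest(
            cred[1], self._hash(password, cred[0])
        )
        # 2. Authorization: only explicitly authorized users may look up identities.
        granted = authenticated and user in self._authorized
        # 3. Logging: record every attempt, granted or not, as an audit trail.
        self.access_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "pseudonym": pseudonym,
            "granted": granted,
        })
        if not granted:
            raise PermissionError(f"access denied for {user!r}")
        return self._info[pseudonym]
```

Logging denied attempts as well as granted ones is a design choice here: repeated failures can themselves be a signal worth reviewing.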
As recommended in Recital 29 GDPR, the controller should explicitly authorize the personnel who have access to the split-off additional information and can thus exit the realm of pseudonymization. It is good practice to document such authorizations and to keep them up to date as personnel change.
The conditions under which access to the split-off additional information (and thus re-identification) is authorized by the controller shall be explicitly specified and documented.
The controller could authorize and document the procedures to be followed when accessing the split-off additional information for re-identification. Such a procedure can, for example, ensure that all access conditions have been verified and that access has been properly authorized.
Since access to the split-off additional information is typically the key to re-identification, a more comprehensive procedure that covers the complete re-identification could be defined. Besides accessing the split-off additional information, such a procedure also has to access the strictly pseudonymous data. The procedure could then, for example, minimize the re-identified data by restricting the additional information used to that of a single data subject and by limiting the associated pseudonymous data to just those data elements that are relevant for the purposes.
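The minimization described above could look roughly like this in Python. The function `reidentify_single_subject`, the `pseudonym` field and the boolean flag standing in for the documented access-condition check are assumptions; the sketch only shows that the procedure uses one entry of the additional information and only the relevant data elements of that one data subject.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def reidentify_single_subject(
    pseudonym: str,
    relevant_fields: Iterable[str],
    additional_info: Dict[str, str],
    pseudonymous_records: List[Record],
    access_conditions_verified: bool,
) -> Dict[str, object]:
    """Hypothetical minimized re-identification of a single data subject."""
    if not access_conditions_verified:
        raise PermissionError("access conditions have not been verified")
    # Use only the entry of the additional information for this one data subject.
    if pseudonym not in additional_info:
        raise KeyError(pseudonym)
    identity = additional_info[pseudonym]
    fields = set(relevant_fields)
    # Take only this subject's records and only the data elements relevant to the purpose.
    records = [
        {k: v for k, v in r.items() if k in fields}
        for r in pseudonymous_records
        if r.get("pseudonym") == pseudonym
    ]
    return {"identity": identity, "records": records}
```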
An audit trail could be created that documents the decision to access the split-off additional information, its justification, and the responsible decision maker.
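One way to make such an audit trail tamper-evident, which is not required by the text but is a common technique, is to chain the entries with SHA-256 hashes so that altering an earlier entry breaks verification. The class `ReidentificationAuditTrail` and its field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import List


class ReidentificationAuditTrail:
    """Hypothetical append-only audit trail with hash-chained entries."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: List[dict] = []

    @staticmethod
    def _digest(entry: dict) -> str:
        # Hash every field except the stored hash itself, in a canonical order.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, pseudonym: str, justification: str, decision_maker: str, approved: bool) -> dict:
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "pseudonym": pseudonym,
            "justification": justification,
            "decision_maker": decision_maker,
            "approved": approved,
            "prev_hash": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return False if any entry was altered or the chain was broken."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True
```

Hash chaining only detects changes; the trail itself would still need to be stored where the persons it records cannot rewrite it wholesale.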
While Recital 29 GDPR contemplates that the additional information may be kept by the same controller, instituting an independent internal entity or an external (trusted) third party to guard and technically control access to the split-off additional information^[1] provides an even stronger separation. Such entities are then better placed to defend the interests of data subjects, potentially even against the interests of the controller.
Additional organizational measures can ensure that the personnel dealing with these tasks are aware of the correct behavior (e.g., through training) and are possibly legally bound (e.g., through a formal agreement to follow the above rules and procedures).

(ii) Measures to protect the strictly pseudonymous data:
While not explicitly stated in Art. 4(5) GDPR, controllers (and processors) shall also implement technical and organizational measures to protect the strictly pseudonymous data. These measures aim at preventing (direct) identification of data subjects in these pseudonymous data.

The key measure to prevent (direct) identification of data subjects in the pseudonymized data is a data pseudonymization that is far-reaching enough to prevent direct identification. For example, a data pseudonymization that only removes unique handles from the data may be insufficient, since data subjects can still be directly identified through unique values or unique combinations of values.
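Whether unique values or combinations remain after removing the handles can be checked before the data is released. The Python sketch below counts how often each combination of selected attributes occurs; combinations that occur only once point to records that may still be directly identifiable, and the smallest group size corresponds to the k of k-anonymity. The attribute names are hypothetical, and passing such a check does not by itself show that the pseudonymization is sufficient.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def combination_counts(records: List[Record], attributes: Sequence[str]) -> Counter:
    """Count how often each combination of the given attribute values occurs."""
    return Counter(tuple(r.get(a, "") for a in attributes) for r in records)


def unique_combinations(records: List[Record], attributes: Sequence[str]) -> List[Tuple[str, ...]]:
    """Combinations shared by no other record, i.e. potential direct identifiers."""
    return [combo for combo, n in combination_counts(records, attributes).items() if n == 1]


def smallest_group_size(records: List[Record], attributes: Sequence[str]) -> int:
    """Size of the smallest group of records sharing a combination (k in k-anonymity)."""
    counts = combination_counts(records, attributes)
    return min(counts.values()) if counts else 0
```

With three records sharing the same postal code, the code alone singles out nobody, yet adding birth year and sex can leave one record whose combination is unique.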
Pseudonymous data are still personal data and therefore require confidentiality protection. This excludes access to the data by any unauthorized external or internal party. Confidentiality measures typically include an access control system with authentication, authorization and, possibly, logging of access^[2].
The controller should generally keep the group of persons assigned to work on the pseudonymized data distinct from the group authorized to access the split-off additional information. This helps to impose restrictions on re-identification: for example, it makes it possible to restrict the amount of pseudonymous data that is re-identified to a necessary subset, or to limit re-identification to selected data subjects only. If a single person had access to both all the pseudonymized data and all the split-off additional information, such restrictions would be very difficult or impossible to implement.
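The separation of the two groups can also be enforced at the moment roles are granted. The Python sketch below assumes two hypothetical role names and a hypothetical `RoleRegistry` class that refuses to give one person both roles.

```python
from typing import Dict, Set

# Hypothetical role names for the two groups that must stay distinct.
PSEUDONYMOUS_DATA = "pseudonymous_data"
ADDITIONAL_INFO = "additional_info"
CONFLICTING_ROLE = {PSEUDONYMOUS_DATA: ADDITIONAL_INFO, ADDITIONAL_INFO: PSEUDONYMOUS_DATA}


class RoleRegistry:
    """Grants roles while enforcing separation of duties between the two groups."""

    def __init__(self) -> None:
        self.roles: Dict[str, Set[str]] = {}

    def grant(self, person: str, role: str) -> None:
        if role not in CONFLICTING_ROLE:
            raise ValueError(f"unknown role: {role!r}")
        held = self.roles.get(person, set())
        if CONFLICTING_ROLE[role] in held:
            raise PermissionError(
                f"{person!r} already holds the conflicting role {CONFLICTING_ROLE[role]!r}"
            )
        self.roles.setdefault(person, set()).add(role)

    def revoke(self, person: str, role: str) -> None:
        self.roles.get(person, set()).discard(role)

    def violations(self) -> Set[str]:
        """Persons holding both roles (should always be empty)."""
        return {p for p, r in self.roles.items() if {PSEUDONYMOUS_DATA, ADDITIONAL_INFO} <= r}
```

When a person changes duties, the old role has to be revoked before the new one can be granted.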
When determining the recipients to whom the pseudonymous data is disclosed, a controller could, where necessary and possible, check whether they might be motivated to re-identify the pseudonymous data. Where recipients are persons, a close relationship with the data subjects could indicate such a motivation, for example curiosity. For instance, the fact that employees are working with pseudonymous data about a group of persons to which they belong or once belonged could point to a motivation to find out who is behind certain pseudonymous data.
Similarly, where the recipient is a commercial enterprise that could identify potential customers in the pseudonymous data, a controller may want to verify whether such a motivation for re-identification exists.
Such vetting could also be used to identify personnel who are likely to possess specific knowledge about data subjects that would allow them to recognize (i.e., identify) persons in the data set. Again, a relationship between the personnel and the data subjects could be an indicator.
Since it is probably infeasible to determine all the knowledge that personnel could possess about data subjects, a controller may consider implementing a way for employees to declare a possible “conflict of interest” and thus avoid working with certain data records. These records can then be processed by other employees who do not have such a conflict of interest. Such a conflict of interest may, for example, be recognized by the fact that a data subject resides in the same general area as the employee processing the data.
In a similar fashion, a controller can try to assign data records to personnel in a way that reduces the potential for employees to recognize data subjects. For example, a national enterprise can assign data records from one geographic region to personnel from another geographic region, making it less likely that data subjects are acquainted with the personnel processing their data.
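Declared conflicts of interest and cross-regional assignment can be combined in a simple assignment routine. The Python sketch below assumes that each record carries hypothetical `pseudonym` and `region` fields, that each employee has a known home region, and that conflicts are declared per pseudonym; records for which no employee qualifies are left unassigned (`None`) for manual handling.

```python
from typing import Dict, List, Optional, Set


def assign_records(
    records: List[Dict[str, str]],   # each record has "pseudonym" and "region"
    staff_regions: Dict[str, str],   # employee -> home region
    conflicts: Dict[str, Set[str]],  # employee -> pseudonyms with a declared conflict of interest
) -> List[Optional[str]]:
    """Assign each record to an employee from another region without a declared conflict."""
    load = {employee: 0 for employee in staff_regions}
    assignment: List[Optional[str]] = []
    for record in records:
        eligible = [
            employee
            for employee, region in staff_regions.items()
            if region != record["region"]
            and record["pseudonym"] not in conflicts.get(employee, set())
        ]
        if not eligible:
            assignment.append(None)  # no suitable employee: handle manually
            continue
        chosen = min(eligible, key=lambda e: load[e])  # spread the workload evenly
        load[chosen] += 1
        assignment.append(chosen)
    return assignment
```

Balancing by current workload is just one possible tie-breaking rule; it keeps the routine simple while still honoring both constraints.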
The controller should consider specifying a procedure for the case where an employee recognizes (i.e., identifies) a data subject despite the measures taken. The employee should report this to the controller and be bound to non-disclosure. The controller should then take steps to limit the possible damage to the data subject arising from the identification^[3]. Further, the controller may consider notifying the concerned data subject of the “breach”^[4].
User interfaces used by personnel should be designed to show only those data elements that are necessary for the processing step at hand. Showing only a subset of the data elements reduces the probability of recognizing (i.e., identifying) a person. If processing steps can be completely automated without showing any data in a user interface, the possibility of recognition by personnel is eliminated altogether.
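Restricting what a user interface displays can be driven by an explicit list of data elements per processing step. In this Python sketch the step names, field names and the `view_for_step` function are hypothetical; a step without a defined list is refused rather than shown every field (deny by default).

```python
from typing import Dict, FrozenSet, Mapping

# Hypothetical mapping from processing step to the data elements it needs.
FIELDS_PER_STEP: Mapping[str, FrozenSet[str]] = {
    "billing_check": frozenset({"pseudonym", "amount", "billing_code"}),
    "quality_review": frozenset({"pseudonym", "diagnosis", "treatment"}),
}


def view_for_step(record: Dict[str, str], step: str) -> Dict[str, str]:
    """Return only the data elements needed for the given processing step."""
    allowed = FIELDS_PER_STEP.get(step)
    if allowed is None:
        # Deny by default: an undefined step must not fall back to showing everything.
        raise ValueError(f"no field list defined for step {step!r}")
    return {k: v for k, v in record.items() if k in allowed}
```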
Personnel who have access to the pseudonymous data should be made aware that identifying persons in the data is not permitted. This can be achieved, for example, through training or through a contractual obligation for employees.
To separate the pseudonymous data from additional information^[5] that exists externally, measures shall prevent situations in which:
- pseudonymous data can leave the (controlled) premises of the controller (e.g., by personnel taking copies home on a USB stick),
- external data (i.e., additional information suited to identify data subjects) can be accessed on or copied to the computing systems where the pseudonymous data resides, and
- software suitable for linking the pseudonymous data to other data sets (i.e., additional information) can be installed or used^[6] on the computing systems where pseudonymous data reside.

References

¹Note that this does not necessarily mean that the third party actually stores the additional information. It may suffice that the third party holds a key that is necessary to decrypt the additional information. This could for example be achieved by the controller encrypting the additional information with the public key of the third party.

²Note that logging that turns into surveillance of personnel can also be problematic from a data protection point of view, in this case with the employees as the data subjects.

³An obvious example is that the concerned employee stops any further access to the personal data record as soon as the identification is suspected or recognized. This may limit the amount of information learned from the identification.

⁴At the time of writing (January 2021), the European Data Protection Board is expected to take a position on these kinds of “breaches”, at least in the context of anonymization.

⁵Note that this is different from the split-off additional information that is created as an output of data pseudonymization.

⁶Note that so-called “portable” software does not require installation and can be used directly, for example from a USB stick.