Definition of Identification

(Direct and indirect) identification plays a central role in the definition and understanding of pseudonymization and anonymization. It thus requires a precise analysis.

For this purpose, the following provides a technical interpretation of what happens when a person is identified in a data set. This is done in terms of actions that successful identification enables, rather than in terms of which data elements are necessary to achieve direct identification.

This approach attempts to be more precise and general than most texts on the subject, including the Article 29 Data Protection Working Party’s Opinion 4/2007 on the concept of personal data[1]. The latter states that “[…] in practice, the notion of ‘identified person’ implies most often a reference to the person’s name.” [2]

Defining successful identification of a person (equivalent to direct identification) in terms of elements contained in the data set, often the name, is probably the most common approach. It fails to explain why the name leads to identification, however. Nor does it answer the question of exactly which other data elements can also lead to identification, under what circumstances, and why.

In an attempt to understand the concept more deeply, the model proposes actions that become available to an actor only if a person has been successfully identified in the data. This model can explain why the name, in many common circumstances, leads to identification. Beyond just the name, the model is also applicable to other data elements.

The model of identification is illustrated in Figure 3 and described below.

Figure 3: Identification of a data subject.

Identification is about relating a record of a data set, shown on the right, with a data subject[3], shown on the left. Identification requires an actor who attempts the identification. Actors have certain assets at their direct disposal, relative to which the identification is performed.

These assets include information assets consisting of

  • knowledge of data subjects in the mind of the actor, and
  • data records representing data subjects in some processing system.

These information assets can be seen as a virtual model of the world that includes representations of real persons.

In addition to information assets, actors also have access to systems that permit them to interact with persons. The most common examples of such systems are communication systems such as telephone, e-mail, messaging, or postal mail. Actors can also interact physically with persons by meeting them.

Actors have successfully identified a data subject in the data record when they are able to perform certain actions, namely the following:

  • Actors are able to consult and/or manipulate the representation of the data subject in their virtual model of the world represented by data or knowledge contained in their information assets.
  • Actors are able to interact with the physical person through a system of interaction that is available to them.

The former kinds of actions are for example enabled when the matching virtual representation of the data subject in the information assets can be established through lookup[4] (e.g., based on a unique handle contained in the data record) or recognition[5] (e.g., based on a unique combination of identity-relevant properties). It is evident that a name is in many cases a suitable handle to look up persons in information assets. It is also clear that this is only the case when the name is contained in the information assets. Furthermore, information elements different from a name can enable lookup or recognition.
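The difference between lookup and recognition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the model itself: the `known_persons` list, its field names, the example records, and the helpers `lookup` and `recognize` are all hypothetical.

```python
# Hypothetical information assets of an actor: a small "virtual model of the world".
known_persons = [
    {"name": "Alice Example", "birth_year": 1980, "postcode": "1010", "profession": "dentist"},
    {"name": "Bob Example", "birth_year": 1980, "postcode": "1010", "profession": "teacher"},
    {"name": "Carol Example", "birth_year": 1975, "postcode": "1020", "profession": "teacher"},
]


def lookup(record, assets, handle="name"):
    """Lookup: match on a handle contained in the record.

    Succeeds only if the handle is present in the record and exactly
    one entry in the actor's assets carries the same value.
    """
    if handle not in record:
        return None
    matches = [p for p in assets if p.get(handle) == record[handle]]
    return matches[0] if len(matches) == 1 else None


def recognize(record, assets, properties):
    """Recognition: match on a combination of identity-relevant properties.

    Succeeds only if the combination is unique within the actor's assets.
    """
    matches = [
        p for p in assets
        if all(k in record and p.get(k) == record[k] for k in properties)
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical data records to be identified.
record_with_name = {"name": "Bob Example", "diagnosis": "asthma"}
record_without_name = {"birth_year": 1980, "postcode": "1010", "profession": "dentist"}

by_lookup = lookup(record_with_name, known_persons)
by_recognition = recognize(
    record_without_name, known_persons, ["birth_year", "postcode", "profession"]
)
# Birth year and postcode alone match two persons, so recognition fails.
ambiguous = recognize(record_without_name, known_persons, ["birth_year", "postcode"])
```

In this sketch, the name works as a handle only because it appears in `known_persons`; a record naming a person unknown to the actor yields no match, while a nameless record can still be matched once its combination of properties is unique.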

The latter kinds of actions are typically enabled when the data record contains an address that identifies a data subject in a given system of interaction. Addresses are in most cases unique handles in the identity domain defined by the system of interaction. Instead of an address, however, a time and place that permit an actor to meet and physically interact with a person can serve the same purpose. If the time and place are imprecise, additional information elements may be needed that allow the actor to recognize the person with whom to interact. For example, a description or picture of the person may serve this purpose. Other examples are unique properties like a description of what a person wears or carries[6].

The relevant concepts of identification are formalized in the following definitions.

Definition: identified

A data subject described by a data record is considered to be identified when a whole data record, a subset thereof, or data elements that are derived from it can be linked to a unique handle for persons

  • used in a model of the world (i.e., knowledge) in the mind of a human actor,
  • used in a virtual model of the world (i.e., data) available to the actor, or
  • used as address in some real-world interaction system accessible to the actor.

The linking can be deterministic or probabilistic. For a data subject to be identified, deterministic linking needs to be unique and probabilistic linking must single out exactly one person with sufficiently high probability.
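The condition for probabilistic linking can be illustrated with a small Python sketch. The function name `single_out`, the representation of candidates as a mapping from a person handle to a non-negative match weight, and the threshold of 0.95 are assumptions made for illustration; the text itself does not quantify what "sufficiently high" means.

```python
def single_out(match_weights, threshold=0.95):
    """Return the handle of the person singled out by probabilistic linking.

    match_weights maps candidate person handles to non-negative match
    weights (hypothetical output of some linking method). The weights are
    normalized to probabilities. A person counts as identified only if
    their probability reaches the threshold. With a threshold above 0.5,
    at most one candidate can qualify, so the result is unique.
    """
    total = sum(match_weights.values())
    if total <= 0:
        return None  # no candidates or no evidence at all
    best = max(match_weights, key=match_weights.get)
    probability = match_weights[best] / total
    return best if probability >= threshold else None


# One candidate clearly dominates: the data subject is identified.
clear_case = single_out({"person-1": 98, "person-2": 1, "person-3": 1})

# Two equally likely candidates: nobody is singled out.
ambiguous_case = single_out({"person-1": 50, "person-2": 50})
```

Deterministic linking is the limiting case in which exactly one candidate matches and all others are excluded outright.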

The direction in which identification is achieved is irrelevant: Either identification yields the person described by a given set of data elements, or it yields the data elements belonging to a given person.

The linking can be based on the comparison of unique handles, quasi-identifiers, identity-relevant properties, or (unique) combinations thereof. It results in the association between the data record and a mental representation, data record, or interaction address of the related person in the domain of the actor.

Identification is considered to be direct if it happens solely based on the assets that are directly available to an actor. These include the knowledge and data the actor possesses, as well as other assets that are readily at hand, such as the results of a simple internet search or phone book lookup.

Definition: directly available assets

An asset is considered directly available to an actor if the actor knows about its existence and can access it with limited effort. Most prominently, this is the case for assets that are under the direct control of the actor.

Definition: direct identification

Direct identification is based on linking between the data record and a unique handle contained in directly available assets.

Identification is considered to be indirect if it is only possible with assets that the actor cannot readily access. Such assets are typically called additional information. Information is considered additional information for example if the actor is initially unaware of its existence and can find it only with a significant search effort.

Definition: not directly available assets; additional information

An asset is considered not directly available to an actor if the actor initially does not know about its existence or can access it only with significant effort. Not directly available information assets are typically called additional information.

Definition: indirect identification

Indirect identification is based on multi-step linking from the data record, via not directly available assets, to a unique handle contained in directly available assets. In most cases, the initial data record is first linked to additional information and from there to directly available assets.
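The distinction between direct and indirect identification can be sketched as a one-step versus a two-step link. The following Python example is a simplified illustration: the pseudonym, the e-mail address, the dictionaries, and the function `identify` are hypothetical, and real linking would rarely be a plain dictionary lookup.

```python
# Hypothetical data record held by the actor.
record = {"pseudonym": "X-4711", "diagnosis": "asthma"}

# Directly available assets: e.g., the actor's own address book,
# mapping an interaction address (unique handle) to a known person.
directly_available = {"alice@example.org": "Alice Example"}

# Additional information: not directly available to the actor,
# e.g., a mapping table kept by another party.
additional_information = {"X-4711": "alice@example.org"}


def identify(record, directly_available, additional_information=None):
    """Classify how (and whether) the record can be linked to a person."""
    key = record["pseudonym"]
    # One step: the record links straight into directly available assets.
    if key in directly_available:
        return "direct", directly_available[key]
    # Two steps: record -> additional information -> directly available assets.
    if additional_information and key in additional_information:
        handle = additional_information[key]
        if handle in directly_available:
            return "indirect", directly_available[handle]
    return "not identified", None


without_extra = identify(record, directly_available)
with_extra = identify(record, directly_available, additional_information)
```

Without the mapping table the actor cannot identify the data subject; once the table is obtained, the same record becomes linkable in two steps.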

Definition: identifiable

A data subject described by a data record is considered to be identifiable if any actor exists at present or in the future who is able to identify (i.e., render identified) the data subject by using any realistically available additional information and linking methodology[7].

Note that the concept of identifiable is not easy to evaluate since the evaluator may not know about all possible actors and the additional information and linking methodology available to them. In addition, such actors, additional information, and linking methodology may not yet exist at the present time but only materialize in the future.
 

References


[1] Article 29 Data Protection Working Party, WP136, Opinion 4/2007 on the concept of personal data, https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2007/wp136_en.pdf (last visited 28/6/2021).

[2] Idem, 2nd paragraph on page 13.

[3] Note that in the general case, instead of an individual data subject, the relation could also be made to a class of data subjects or to a session. The model is thus also applicable to c- and s-Identification proposed by Leenes in: R. Leenes, ‘Do They Know Me? Deconstructing Identifiability’ (2008) 4(1&2) University of Ottawa Law & Technology Journal 135, 141-142, https://pure.uvt.nl/ws/portalfiles/portal/1310856/Leenes_Do_they_know_me_110216_publishers_immediately.pdf (last visited 29/6/2021).

[4] This corresponds to l-identification proposed by Leenes.

[5] This corresponds to r-identification proposed by Leenes.

[6] Such as “a red carnation in the buttonhole, a copy of the Times under the left arm”.

[7] This definition aims at being in line with sentences 3 and 4 of Recital 26 GDPR.

 
