Defining data storage policies

According to Article 5(1)(e) of the GDPR, personal data should be “kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed”. This requirement is twofold. On the one hand, it relates to identification: data should be stored in a form which permits identification of data subjects for no longer than necessary. Consequently, AI developers should implement policies that remove the possibility of identification as soon as it is no longer necessary for the processing. This involves adopting adequate measures to ensure that, at any moment, only the minimal degree of identification necessary to fulfil the purposes is used (see the “Storage limitation” section in the “Principles” chapter).
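As an illustration of reducing identifiability once direct identification is no longer needed, the following minimal sketch drops direct identifiers from a record and replaces the subject identifier with a keyed hash. The field names (`name`, `email`, `subject_id`) and the function `pseudonymize` are hypothetical, chosen only for this example. Note that this is pseudonymization, not anonymization: whoever holds the key can re-link the records, so the output remains personal data under the GDPR.

```python
import hashlib
import hmac

# Hypothetical field names, assumed for illustration only.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(record: dict, secret_key: bytes, id_field: str = "subject_id") -> dict:
    """Return a copy of `record` without direct identifiers.

    The value in `id_field` is replaced by an HMAC-SHA256 pseudonym, so
    records of the same subject can still be linked without exposing the
    original identifier. The input record is not modified.
    """
    # Keep every field that is not listed as a direct identifier.
    reduced = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Derive a stable pseudonym from the original identifier and the key.
    raw_id = str(record[id_field]).encode("utf-8")
    reduced[id_field] = hmac.new(secret_key, raw_id, hashlib.sha256).hexdigest()
    return reduced
```

The key should be stored separately from the dataset and under stricter access control, since deleting it is what later makes re-linking substantially harder.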

On the other hand, storage limitation implies that data can only be stored for a limited period: the time that is strictly necessary for the purposes for which the data are processed. However, the GDPR permits ‘storage for longer periods’ if the data are processed solely for scientific or historical research purposes, archiving purposes in the public interest or statistical purposes, subject to appropriate technical and organisational safeguards (see the “Data protection and scientific research” section in the “Concepts” chapter).

In the case of AI development, this exception raises the risk that developers decide to keep the data longer than strictly needed, so that they remain available for purposes other than those for which they were originally collected. Controllers should be aware that even though the GDPR might allow storage for longer periods, they should have a good reason to opt for such an extended period (see the “Temporal aspect” subsection in the “Storage limitation” section of the “Principles” chapter). Since a real risk arises from failing to respect the purpose limitation principle, a compatibility test should be carried out before any potential reuse of the data.

The intention of the lawmaker appears to have been to dissuade unlimited storage even under this special regime, and to guard against the use of scientific research as a pretext for prolonged storage for other, private purposes. If in doubt, the controller should consider whether a new legal basis is appropriate (see “Lawfulness” section in the corresponding part of General Exposition in AI). Storage periods should therefore be proportionate to the aims of the processing: “In order to define storage periods (timelines), criteria such as the length and the purpose of the research should be taken into account. It has to be noted that national provisions may stipulate rules concerning the storage period as well.”[1]

Thus, if controllers do not need the data, and no legal obligation requires them to preserve the data, they should anonymize or delete them. Researchers who intend to store data for a long period should consult their DPOs and be aware of the applicable national regulation. This is also a good moment to set time limits for the erasure of the different categories of data and to document these decisions (see the “Accountability” section in the “Principles” chapter).
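One way to make such time limits concrete is to record them as a documented retention schedule that can be checked automatically. The sketch below is only an illustration under stated assumptions: the data categories, retention periods and justifications are invented placeholders, and `RetentionRule`, `SCHEDULE` and `due_action` are hypothetical names. Actual periods must be decided by the controller with the DPO and in line with national law.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    category: str        # category of personal data
    retention_days: int  # time limit counted from the collection date
    action: str          # what to do on expiry: "delete" or "anonymize"
    justification: str   # documented reason, useful for accountability


# Placeholder schedule: every value here is an assumption for illustration.
SCHEDULE = {
    "contact_details": RetentionRule(
        "contact_details", 365, "delete",
        "needed only while participants can be contacted"),
    "training_features": RetentionRule(
        "training_features", 3 * 365, "anonymize",
        "needed for model development and validation"),
}


def due_action(category: str, collected_on: date, today: date,
               schedule: dict = SCHEDULE) -> Optional[str]:
    """Return the action that is due for a record, or None if the
    retention period for its category has not yet expired."""
    rule = schedule[category]
    expiry = collected_on + timedelta(days=rule.retention_days)
    return rule.action if today >= expiry else None
```

Keeping the justification next to each rule means the schedule itself serves as part of the documentation of the decision, and a periodic job can call `due_action` for each stored record to flag data that should be erased or anonymized.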

[1] EDPB (2020) Guidelines 03/2020 on the processing of data concerning health for the purpose of scientific research in the context of the COVID-19 outbreak. Adopted on 21 April 2020. European Data Protection Board, Brussels, p.10. Available at (accessed 23 April 2020).

