The minimization principle states that personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (see the “Data minimization” section in the Principles part of these Guidelines). According to this principle, controllers should be aware of the goal to be reached through the processing, so as to avoid using more data than needed. Furthermore, controllers should avoid using special categories of personal data if they are not strictly necessary.
When researchers/innovators gather data from social networks, they might end up processing far more personal and sensitive data than they really need for the specific purposes of the research. There are several ways to avoid such a scenario. In principle, controllers should promote the use of anonymized data (see the “Identification, Pseudonymization, and Anonymization” subsection in the Main Concepts section of the General Part of these Guidelines). Indeed, avoiding the identification of specific individuals through big data analytics, or the re-identification of data subjects whose data have been pseudonymized, is a fundamental safeguard against undue impacts of data processing on data subjects[1]. If researchers do not need personal data, they could ask the social network to provide them with anonymized data. Of course, they could also anonymize the data once gathered, but in that case they should not forget that anonymization itself involves data processing and thus requires a legal basis that legitimates it (see the “Identification, Pseudonymization, and Anonymization” subsection in the Main Concepts section of the General Part of these Guidelines).
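To make the distinction concrete, the following minimal Python sketch pseudonymizes direct identifiers with a keyed hash (HMAC-SHA256). All names and data are hypothetical; note that the output is pseudonymized rather than anonymized data, since whoever holds the key can still re-identify the data subjects.

```python
import hmac
import hashlib

# Hypothetical setup: the secret key must be stored separately from the
# dataset. Anyone holding it can re-identify subjects, so the result
# below remains personal data under the GDPR.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymize(user_id: str) -> str:
    """Replace a direct identifier with a keyed hash (pseudonym)."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

records = [
    {"user_id": "alice_1984", "post_count": 12},
    {"user_id": "bob_2001", "post_count": 7},
]

# Strip the direct identifier and keep only the pseudonym.
pseudonymized = [
    {"pseudonym": pseudonymize(r["user_id"]), "post_count": r["post_count"]}
    for r in records
]
print(pseudonymized)
```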
Furthermore, researchers/innovators should keep in mind that anonymization might be hard to achieve. Quite often, aggregation and data-inference practices can easily de-anonymize datasets. Thus, controllers should not presume that their anonymization processes will suffice to preserve data subjects’ privacy. Instead, they should perform DPIAs and risk assessments to verify that this is actually the case (see accountability in this part of the Guidelines).
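As a rough illustration of the kind of check such an assessment may involve, the sketch below computes the k-anonymity of a dataset over a chosen set of quasi-identifiers. This is only one common re-identification metric, and the dataset and attribute names are hypothetical; a result of 1 means at least one record is unique over those attributes and therefore potentially re-identifiable.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the quasi-identifiers.

    A result of k means every combination of quasi-identifier values is
    shared by at least k records; low values signal re-identification risk.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical dataset: even without names, a combination of attributes
# may single out an individual.
records = [
    {"age_band": "30-39", "city": "Ghent", "income": 3100},
    {"age_band": "30-39", "city": "Ghent", "income": 2900},
    {"age_band": "60-69", "city": "Ghent", "income": 4500},  # unique combination
]

print(k_anonymity(records, ["age_band", "city"]))  # -> 1: at least one record is unique
```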
An alternative to anonymization as such is the use of aggregated data. In the context of data protection, two kinds of aggregation have to be distinguished (see the “Data minimization” section in the Principles part of these Guidelines):
- Aggregation over a single person: aggregating data elements pertaining to one individual. For example, storing a person’s average monthly income over a year instead of twelve monthly figures reduces the information content pertaining to that person.
- Aggregation over multiple persons: aggregating data elements pertaining to a multitude of persons. For example, taking the average yearly income over a group of persons also reduces the overall information content (data minimization). In addition, it weakens the degree of association between a data element and a given person. This kind of aggregation is therefore also pertinent to storage limitation. Both kinds are illustrated in the sketch below.
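A minimal Python sketch of both kinds of aggregation, using hypothetical figures, might look as follows:

```python
from statistics import mean

# Hypothetical monthly incomes per person over one year.
monthly_income = {
    "subject_a": [3000, 3100, 2950, 3050, 3000, 3100, 3000, 2900, 3050, 3100, 3000, 2950],
    "subject_b": [2500, 2600, 2550, 2500, 2650, 2600, 2550, 2500, 2600, 2650, 2550, 2500],
}

# (1) Single-person aggregation: one average per person replaces twelve
# values, reducing the information content pertaining to that person.
per_person_avg = {person: mean(values) for person, values in monthly_income.items()}

# (2) Multi-person aggregation: a single group average replaces the
# per-person figures, weakening the link between any data element and a
# given person.
group_avg = mean(per_person_avg.values())

print(per_person_avg)
print(group_avg)
```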
When the purpose of the processing can be achieved using aggregated data, doing so is recommended (see the “Data minimization principle” subsection of the Main Principles section of the General Part of these Guidelines). Under such circumstances, no one but the data subject should access the raw data (obtained or observed data), unless an extremely relevant reason applies (for example, national security issues, interpreted restrictively). Indeed, a specific research project sometimes needs only aggregated data and has no need for the raw data collected from the social networks. In that case, controllers must delete the raw data as soon as they have extracted the data required for their processing. As a principle, deletion should take place as close as possible to the point of collection of the raw data (e.g. on the same device, immediately after processing), as in the sketch below.
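A minimal sketch of this aggregate-then-delete pattern, with hypothetical file paths and figures, might look as follows (note that deleting a file does not guarantee secure erasure; further measures may be needed depending on the storage medium):

```python
import os
import tempfile
from statistics import mean

def collect_raw_data(path):
    """Stand-in for raw data gathered from a social network (assumption)."""
    with open(path, "w") as f:
        f.write("3000\n3100\n2950\n")

raw_path = os.path.join(tempfile.gettempdir(), "raw_incomes.txt")
collect_raw_data(raw_path)

with open(raw_path) as f:
    values = [int(line) for line in f]

aggregate = mean(values)   # the only figure the research purpose requires

os.remove(raw_path)        # delete raw data immediately after extraction
print(aggregate)
```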
References
1. WP29 Opinion 03/2013 on purpose limitation (p. 3) highlights the adoption of safeguards to prevent undue impacts on data subjects as a key factor to take into account when evaluating the compatible further uses of data.