The use of data gathered by social networks raises, in itself, certain data processing challenges that should be taken into account. These challenges can be even more pronounced when the processing is carried out for research purposes. The main issues involved in the use of data gathered through social networks for research purposes are the following:

  • Social networks favor and enhance the constant reuse of data, which poses risks related to:
    • the application of principles such as purpose limitation (Art. 5.1.b), storage limitation (Art. 5.1.e), and integrity and confidentiality (Art. 5.1.f), among others;
    • the legal status of personal profiles and other derived data, in particular whether they remain personal data and whether they are also works of intellectual property (IP) (that is, whether inferred data are personal data or merely the IP of their producers).
  • The choice and correct use of a legal basis for the collection of data from social networks, which requires an adequate understanding of, and compliance with, the requirements of the networks' Developer Policies.
  • The choice of a legal basis for the re-use of data obtained through social networks and the adequate use of those data according to the basis selected:
    • Consent (including the possibility of obtaining “altruistic consent”, especially in light of the proposed Data Governance Act)
    • Legitimate interest
    • Public interest
    • Research exception
  • The identification of risks arising from research with social media data, among which the following stand out:
    • harm to individual privacy, and to group privacy, through mass analysis of personal or non-personal data, e.g. through the identification (or re-identification) of data subjects from personal profiles (the risk is extremely high where the research aims at large-scale data analysis that could lead to profiling);
    • damage to the honour, privacy or image of individuals or groups, for example, by publishing raw data without first applying an appropriate aggregation or pseudonymization process.
  • The expansive nature of personal data, which makes it advisable to assume by default that personal data are being processed, even though at first sight this may not appear to be the case.[1]
  • The possibility that the purpose of the processing changes over time: although research through social networks is increasingly conceived as research from the outset, it is also common for a researcher’s social network profiles not to have a research purpose initially and to acquire it only after some time.
  • The common assumption that data made public through social media can be used freely. This assumption is untrue unless the data are actually published in fully public profiles (“manifestly made public by the data subject”), and it must be carefully avoided.
  • Finally, the opacity of data processing algorithms, which can have a negative impact on users and discourage research (see the section “General Exposition” in Part III AI of these Guidelines).




[1] The Historic Graves project is a community-focused grassroots heritage project. Local community groups are trained in low-cost, high-tech field surveying of historic graveyards and in recording their own oral histories. They build a multimedia online record of the historic graves in their own areas and unite to form a national resource. Since this project collects data from graveyards, one might think that the data concern only deceased persons and that the GDPR therefore does not apply (Recital 27). However, the data about graveyards and tombs are provided by the relatives of the deceased, who are living persons, and in providing data about their deceased relatives they are also providing their own personal data.

