Business understanding - Guidelines Panelfit

Description

“The initial business understanding phase focuses on understanding the project objectives from a business perspective, converting this knowledge into a data mining problem definition, and then developing a preliminary plan designed to achieve the objectives. In order to understand which data should later be analyzed, and how, it is vital for data mining practitioners to fully understand the business for which they are finding a solution. The business understanding phase involves several key steps, including determining business objectives, assessing the situation, determining the data mining goals, and producing the project plan.”^[1]

In the context of R&D on crime prediction and prevention technologies conducted in the framework of H2020, the general description and structure of tasks must be adjusted accordingly. This may imply that both the terminology and the concrete contents of the tasks need to be interpreted and modified to fit the particular objectives.

The above-mentioned general objectives involve four main tasks:

Determine the Project Objectives. This means:
- Uncover the primary objectives as well as the related questions the project (envisaged solution) would like to address
- Determine the measure of success.
Assess the Situation
- Identify the resources available to the project, both material and personal.
- Identify what data is available to meet the primary goal.
- List the assumptions made in the project.
- List the project risks, list potential solutions to those risks, create a glossary of project and data processing terms, and construct a cost-benefit analysis for the project.
Determine the Data Processing Goals: decide what level of predictive accuracy is expected to consider the project successful.
Produce a Project Plan: Describe the intended plan for achieving the data processing goals, including outlining specific steps and a proposed timeline. Provide an assessment of potential risks and an initial assessment of the tools and techniques needed to support the project.

Main actions that need to be addressed

Defining project objectives

For our scenario, the general objectives are defined by the respective call. The projects mentioned above pertain to the SEC-12-FCT-2016-2017 call: Technologies for prevention, investigation, and mitigation in the context of the fight against crime and terrorism.^[2] The Specific Challenge is described as “Organized crime and terrorist organizations are often at the forefront of technological innovation in planning, executing and concealing their criminal activities and the revenues stemming from them. Law Enforcement agencies (LEAs) are often lagging behind when tackling criminal activities supported by “advanced” technologies”.

The scope of this call comprises:

New knowledge and targeted technologies for fighting both old and new forms of crime and terrorist behaviors supported by advanced technologies;
Test and demonstration of newly developed technology by LEAs involved in proposals;
Innovative curricula, training and (joint) exercises to be used to facilitate the EU-wide take-up of these new technologies, in particular in the fields of the following sub-topics:

1. cyber-crime: virtual/crypto currencies de-anonymization/tracing/impairing where they support underground markets in the darknet.

2. detection and neutralization of rogue/suspicious light drone/UAV flying over restricted areas, and involving as beneficiaries, where appropriate, the operators of infrastructure

3. video analysis in the context of legal investigation

and a fourth open sub-topic.

The conditions set in this call allow for some, although limited, discretion to design the project. The applicants are free to choose the type of technologies; however, non-technical solution strategies appear not to be eligible for funding. Even though the range of technologies remains open, the call clearly demands technical solutions, thus excluding approaches that solve the addressed security problems without involving potentially highly intrusive technologies. The term “advanced technologies” at least suggests investigating the development and use of artificial intelligence and machine learning technologies. Limited choice also exists regarding the objective, e.g. which forms of crime or terrorist behaviors the project targets. Therefore, it is essential to involve end-users, i.e. law enforcement agencies (LEAs), already in the decision-making phase on objectives and the means to achieve them.

The selection of specific technologies or, in a more general context, of specific methods also influences the range of ethics or legal compliance issues raised by the project. In the case of security research, the selected technologies, in our case particular AI or machine learning approaches, may, apart from the usual ethics issues like the processing of personal data, raise additional ethics concerns related to dual use, to the exclusive focus of the research on civil applications, or to misuse, which requires taking the related regulations into account.

Opting for technical solutions with explainability and transparency

Whereas explainability and transparency constitute generic requirements for AI tools, they form mandatory obligations in the case of AI technologies applied to or having consequences for humans (see also the “Lawfulness, fairness and transparency” section in “Principles” chapter). In the case of AI used for profiling or decision support in a security context, these principles are fundamental. AI tools are prone to bias; explainability and transparency can help detect and remove biases of algorithms created by such methods. Technologies supporting crime prevention, detection and prosecution need to provide provable and attestable results that can serve as valid evidence, including in court. Inaccurate findings may have severe consequences for individuals, particularly in the form of false positives, or fatal outcomes in the case of false negatives. Therefore, it may be necessary to implement the AI tool as a support for decisions by humans, together with mandatory measures accompanying its deployment. These measures should make sure that the people in charge do not simply adopt the system’s suggestion as their own decision, but understand that they have to justify their decision, both when following a suggestion and when rejecting it. To enable humans to understand the suggestion of an AI tool, these systems need to be very transparent regarding the factors influencing the outcome of a calculation. In the end, humans need to take responsibility as well as liability for a decision. Transparency is also essential to ensure sufficient understanding of the model and data used and the results produced, particularly in the case of complaints or when proof of evidence is needed.

Developers of AI tools used in this context could facilitate the implementation by programming supporting applications for the whole decision process, like having a mandatory field to fill in when a decision is made upon the suggestion by the system before the outcome could be processed further.
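The mandatory field described above could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `DecisionRecord` type, the `record_decision` function and all field names are hypothetical. The only point it shows is that a decision cannot be processed further without a written justification, regardless of whether the human follows or rejects the system's suggestion.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical record of a human decision made with AI support."""
    case_id: str
    system_suggestion: str
    human_decision: str
    justification: str
    decided_by: str
    decided_at: datetime

    @property
    def follows_suggestion(self) -> bool:
        # True when the human decision matches the system's suggestion.
        return self.human_decision == self.system_suggestion


def record_decision(case_id, system_suggestion, human_decision,
                    justification, decided_by):
    """Create a DecisionRecord, refusing decisions without a justification.

    The justification is required both when following and when rejecting
    the suggestion, so that the system's output is never simply adopted.
    """
    if not justification or not justification.strip():
        raise ValueError(
            "A written justification is required before the case can proceed."
        )
    return DecisionRecord(
        case_id=case_id,
        system_suggestion=system_suggestion,
        human_decision=human_decision,
        justification=justification.strip(),
        decided_by=decided_by,
        decided_at=datetime.now(timezone.utc),
    )
```

Storing the suggestion next to the human decision also makes it possible to review later how often suggestions were followed, which may help detect over-reliance on the tool.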

Implementing a training program

In our case, “training and (joint) exercises to be used to facilitate the EU-wide take-up of these new technologies” are already included in the call description. Such training exercises must not be restricted to the use of the developed technologies, but must start at the very beginning of research activities and, in particular, include all persons involved in the design of AI technologies (e.g., algorithm designers, developers, programmers, coders, data scientists, engineers). This action is one of the essential pieces of advice to be considered from the very first moment of a crime prediction and prevention project. Algorithm designers, who occupy the first link in the algorithmic chain, are likely to be unaware of the ethical and legal implications of their actions. One of the main problems of AI tools devoted to dealing with crime and terrorism is that they often use personal data that are included in large datasets, comprising large fractions of citizens, e.g. users of specific social networks. Whereas the analysis of mass surveillance data by AI tools may be permissible under specific national jurisdictions or transpositions of the Data Protection Law Enforcement Directive (Directive 2016/680), it is still very problematic for several reasons. First, legal compliance may be a necessary condition for conformity with ethics principles, but it can never be regarded as a sufficient condition. An information document provided by the European Commission on “Ethics and data protection”^[3] clearly states that “The fact that some data are publicly available does not mean that there are no limits to their use” (see Box 4 on page 13). Second, compliance with national or EU legislation does not necessarily imply legal compliance with fundamental rights.
The Data Retention Directive^[4] is a prominent related example, as it was annulled by the Court of Justice of the European Union (CJEU) in a ruling of 8 April 2014^[5] because the Court considered that the directive ‘entails a wide-ranging and particularly serious interference with the fundamental rights to the respect for private life and to the protection of personal data, without that interference being limited to what is strictly necessary’. Third, public opinion and acceptability by citizens must be respected. Large-scale citizen consultations on surveillance technologies revealed that citizens in general accept serious intrusions into their privacy if they are based on concrete and plausible suspicion but reject untargeted mass surveillance measures.^[6] Applying data mining to detect criminal or terrorist activities can be compared to finding the needle in the haystack.^[7] This also means that the processing will include personal data of data subjects who are not, and have never been, involved in any criminal or terrorist activities. Depending on the targeting of the data analyzed, the data processed may predominantly or almost exclusively concern innocent individuals. Such data processing violates the presumption of innocence, changes the relationship between citizens and state and may have grave societal and individual (in case of false positives) consequences.

You, as an algorithm designer, must therefore be able to understand the implications of your actions, both for individuals and society, and be aware of your responsibilities by learning to show continued attention and vigilance. Following this advice may help you in avoiding or mitigating many ethical and legal issues. In that sense, optimal training for all persons involved in the project, even before it starts, could be one of the most efficient tools to save time and resources in terms of compliance with data protection, ethics, EU and national law or societal acceptability. This also implies the participation of ethical and legal experts both in training activities and in the execution of the project. Adequate measures to ensure confidentiality also deserve particular attention (see “Measures in support of confidentiality” subsection in the “Integrity and confidentiality” section in “Principles” chapter). Security and confidentiality of processed data, on the one hand, is essential; general knowledge about the types of mined data, persons concerned or algorithms applied, on the other hand, is mandatory to guarantee compliance with human rights and European values. Compliance with the regulations of the most restrictive member state also supports business objectives, allowing the implementation and use of developed systems without the need for individual adjustments.

Using the legal framework applicable to data processing

For security-related R&D projects this step is particularly complex and challenging. For the research project as such, GDPR regulations apply; for later implementations, the rules and provisions of the Data Protection Law Enforcement Directive (Directive 2016/680) must be followed. In addition, possibly diverging legislation of the involved (member) states needs to be taken into account. Therefore, the developed technologies and systems must at least provide for adjustability and flexibility to cope with different regulations. From a human rights and ethics perspective, compliance with the most restrictive regulations should be incorporated in the created technologies, thus supporting maximum respect for fundamental rights and related values and, at the same time, as already mentioned, reducing or eliminating the need for modifications if applied in countries with diverging regulations.

According to Article 5(1)(b) of the GDPR, personal data shall be “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes”. The concept of legitimacy is not well defined in the GDPR, but the Article 29 Working Party stated that legitimacy involves that data must be processed “in accordance with the law”, and “law” should be understood as a broad concept that includes “all forms of written and common law, primary and secondary legislation, municipal decrees, judicial precedents, constitutional principles, fundamental rights, other legal principles, as well as jurisprudence, as such ‘law’ would be interpreted and taken into account by competent courts”.^[8]

Therefore, legitimacy is a wider concept than lawfulness. It involves compliance with the main values of applicable regulations and the main ethical principles at stake. For instance, some concrete AI tools will need the intervention of an ethics committee. In other cases, guidelines or any other kind of soft regulation might be applicable. You should ensure adequate compliance with this requirement by designing a plan for this preliminary stage of the lifecycle of the tool (see “Legitimacy and lawfulness” part in “Lawfulness, fairness and transparency” in “Principles” chapter). To this purpose, you should be particularly aware of the requirements posed by the applicable regulation at the national level. Developing algorithms related to crime prediction and prevention clearly requires the involvement of ethics committees from an early stage and, according to Art. 35 GDPR, a data protection impact assessment. As already mentioned, Art. 10 GDPR requires checking whether the processing is authorized by Union or Member State law in the case of processing personal data relating to criminal convictions and offences or related security measures. Make sure that your research plan fits well with all these requirements for both phases: the conduct of the research project and future implementations of the developed systems.

The ethics guidance provided for EU funded research (see footnote 426) constitutes a comprehensive framework for checking ethics compliance, which should be consulted in addition to institutional ethics regulations or codes of conduct, regardless of whether your research actually receives funding from the EC. Please be aware that ethics evaluation is not a checklist activity but always also comprises a weighing of potentially conflicting norms. In particular, the application of emerging ICT in such a sensitive area, with privacy by design in mind, requires forward thinking on both sides: the researchers involved and the ethics evaluators.

Even if your project or your research institution is not subject to specific ethics regulations, observation of and compliance with relevant national or EU regulations is essential. As soon as you bring the developed technologies and systems to the market, compliance is essential both for implementation within the EU and for obtaining export licences for non-EU commercial exploitation.

Adopting a risk-based thinking approach

The creation of your algorithm will probably involve the use of several special categories of personal data, e.g. political opinions, religious or philosophical beliefs or data concerning a natural person’s sex life or sexual orientation in the case of data mining of social networks. Therefore you must ensure that you implement appropriate measures to minimize the risks to data subjects’ rights, interests, and freedoms (see “Integrity and confidentiality” in “Principles” chapter). To this purpose, you must assess the risks to the rights and freedoms of individuals participating in the research and development process and judge what is appropriate to protect them. In all cases, you need to ensure compliance with data protection requirements.

In the context of crime prediction, prevention, detection or investigation technologies a risk-based approach makes a DPIA (Data Protection Impact Assessment) obligatory as at least one of the three specific conditions of Art. 35(3) GDPR necessarily will apply:

“3. A data protection impact assessment referred to in paragraph 1 shall in particular be required in the case of:

(a) a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person;

(b) processing on a large scale of special categories of data referred to in Article 9(1), or of personal data relating to criminal convictions and offences referred to in Article 10; or

(c) a systematic monitoring of a publicly accessible area on a large scale.”

The risk-based analysis should also include potential ethics issues related to misuses^[9] of the developed technologies and to dual use^[10] related export restrictions that may apply to the developed systems.

Consider also that the risks are not limited to data protection and privacy violating impacts of the developed systems. Constitutional rights and other human rights such as the presumption of innocence, equal access to justice, non-discrimination or freedom of expression may also be violated or impaired. Moreover, these effects are not limited to potential suspects, but affect society as a whole. They are exacerbated by a lack of transparency and human controllability of many AI tools.

Preparing the documentation of processing

Whoever processes personal data (both controllers and processors) needs to document their activities, primarily for the use by qualified/relevant Supervisory Authorities. You must do this through records of processing that are maintained centrally by your organization across all its processing activities, and additional documentation that pertains to individual data processing activities (see “Documentation of Processing” section in “Main tools and actions” chapter). This preliminary stage is the perfect moment to set up a systematic way of collecting the necessary documentation, since it will be the time when you can conceive and plan the processing activity.

The development of your AI tool might involve the use of different datasets. The records must ensure the traceability of the processing, the information about possible reuse of data, and the use of data pertaining to different datasets in different or in the same stages of the life cycle.

For systems used for law enforcement purposes, the documentation of processing must also comprise the documentation of access to the system once implemented in order to prevent and to detect possible misuses, e.g. non-authorized access to generated results.
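As an illustration of such access documentation, the sketch below records every attempt to read a generated result and lets a reviewer list the unauthorized ones. It is only a sketch under stated assumptions: the role names in `AUTHORIZED_ROLES`, the helper functions and the in-memory list used as a log are all hypothetical, and a real system would need tamper-resistant, persistent storage.

```python
from datetime import datetime, timezone

# Hypothetical roles allowed to view generated results.
AUTHORIZED_ROLES = {"investigator", "supervisor"}


def log_access(log, user_id, role, resource):
    """Append an access entry to the log and return whether access is allowed.

    Every attempt is recorded, including refused ones, so that misuse
    can be detected afterwards.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "role": role,
        "resource": resource,
        "authorized": role in AUTHORIZED_ROLES,
    }
    log.append(entry)
    return entry["authorized"]


def unauthorized_attempts(log):
    """Return all logged attempts that were not authorized."""
    return [entry for entry in log if not entry["authorized"]]
```

Logging refused attempts as well as granted ones is a deliberate choice here: prevention alone would leave no trace of who tried to reach results they were not entitled to see.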

As stated in the Requirements and acceptance tests for the purchase and/or development of the employed software, hardware, and infrastructure (subsection of the Documentation of Processing section), the risk evaluation and the decisions taken “have to be documented in order to comply with the requirement of data protection by design (of Art. 25 GDPR). Practically, this can take the form of:

Data protection requirements specified for the purchase (e.g., a tender) or development of software, hardware and infrastructure,

Acceptance tests that verify that the chosen software, systems and infrastructure are fit for purpose and provide adequate protection and safeguards.

Such documentation should be an integral part of the DPIA.”

Finally, you should always be aware that, according to Art. 32(1)(d) of the GDPR, data protection is a process. Therefore, you should test, assess, and evaluate the effectiveness of technical and organizational measures regularly. This stage is a perfect moment to build a strategy aimed at facing these challenges.

Checking regulatory framework

The GDPR includes specific rules regarding processing for the purposes of scientific research (see “Data protection and scientific research” section in “Concepts” chapter).^[11] Your AI tool might be classified as scientific research, irrespective of whether it is created for profit or not. “Union or Member State law may provide for derogations from the rights referred to in Articles 15, 16, 18 and 21 subject to the conditions and safeguards referred to in paragraph 1 of this Article in so far as such rights are likely to render impossible or seriously impair the achievement of the specific purposes, and such derogations are necessary for the fulfilment of those purposes” (Art. 89(2) GDPR). Furthermore, according to Article 5(1)(b), “further processing of the data gathered, in accordance with Article 89(1), would not be considered to be incompatible with the initial purposes (‘purpose limitation’). Some other particular exceptions to the general framework applicable to processing for research purposes (such as storage limitation) should also be considered”.

Possibly you might profit from this favorable framework, depending on the countries where the research is conducted and on the legal form of the involved partners, e.g. whether they are academic or commercial entities. Nevertheless, you must be aware of the concrete (national) regulations that apply to this research (mainly, the safeguards to be implemented). They might include specific requirements, depending on respective national laws.

Being careful also implies that you have to consider both legal and ethical limitations to the planned research. Just because specific (national) regulations allow for the intended data processing does not imply that it is also acceptable or compliant from an ethics perspective. By analogy, ethics compliance must not be misused as an escape^[12] from regulations.

Defining data storage policies

According to Article 5(1)(e) GDPR, personal data should be “kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed” (see “Storage limitation” section in “Principles” chapter). This requirement is twofold. On the one hand, it relates to identification: data should be stored in a form which permits identification of data subjects for no longer than necessary. Consequently, you should implement policies devoted to avoiding identification as soon as it is not necessary for processing. Such policies involve the adoption of adequate measures to ensure that, at any moment, only the minimal degree of identification that is necessary to fulfil the purposes is used (see “Temporal aspect” subsection in “Storage limitation” section in “Principles” chapter).

On the other hand, data storage implies that data can only be stored for a limited period: the time that is strictly necessary for the purposes for which the data are processed. However, the GDPR permits storage for longer periods if the sole purpose is scientific research (which might be the case for the R&D phase).

The scientific research exception raises the risk that you decide to keep the data longer than strictly needed. You must be aware that even though the GDPR might allow storage for longer periods, you should have justifiable reasons to opt for such an extended period. For the developed systems you must include organisational and technical precautions to be able to comply with different national legal regulations concerning the maximum data storage periods. This could also be an excellent moment to envisage time limits for (automatic) erasure of different categories of data and to document these decisions (see Accountability Principle in Principles chapter).
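Time limits for automatic erasure could be documented and enforced along the following lines. The categories and periods in `RETENTION_DAYS` are purely hypothetical placeholders; the actual limits must follow from your purposes and from the applicable national rules. Keeping the limits in a single configurable table is one way to adjust a system to different maximum storage periods without changing code.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category, in days.
# Real values depend on the purpose and on applicable national law.
RETENTION_DAYS = {
    "raw_social_media": 90,
    "pseudonymized_features": 365,
    "access_logs": 730,
}


def erasure_date(category, collected_on):
    """Return the date on which data of this category must be erased.

    Raises KeyError for a category without a documented retention period,
    so that undocumented data categories cannot slip through silently.
    """
    return collected_on + timedelta(days=RETENTION_DAYS[category])


def due_for_erasure(records, today):
    """Return the records whose retention period has expired by `today`."""
    return [
        record for record in records
        if erasure_date(record["category"], record["collected_on"]) <= today
    ]
```

A scheduled job could call `due_for_erasure` regularly and erase or anonymize the returned records, logging each run as part of the accountability documentation.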

Appointing a Data Protection Officer

According to Art. 37(1) GDPR, you must appoint a Data Protection Officer (DPO) if any of the following conditions applies:

“1. The controller and the processor shall designate a data protection officer in any case where:

(a) the processing is carried out by a public authority or body, except for courts acting in their judicial capacity;

(b) the core activities of the controller or the processor consist of processing operations which, by virtue of their nature, their scope and/or their purposes, require regular and systematic monitoring of data subjects on a large scale; or

(c) the core activities of the controller or the processor consist of processing on a large scale of special categories of data pursuant to Article 9 and personal data relating to criminal convictions and offences referred to in Article 10.”

References

¹Shearer, Colin, The CRISP-DM Model: The New Blueprint for Data Mining, p. 14.

²https://ec.europa.eu/research/participants/data/ref/h2020/wp/2016_2017/main/h2020-wp1617-security_en.pdf

³https://ec.europa.eu/info/sites/default/files/5._h2020_ethics_and_data_protection_0.pdf

⁴Directive 2006/24/EC of the European Parliament and of the Council of 15 March 2006 on the retention of data generated or processed in connection with the provision of publicly available electronic communications services or of public communications networks and amending Directive 2002/58/EC, Official Journal of the European Union <https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32006L0024&from=en>

⁵https://www.europarl.europa.eu/legislative-train/theme-area-of-justice-and-fundamental-rights/file-data-retention-directive

⁶Strauß, S. (2015). D 6.10–Citizen Summits on Privacy, Security and Surveillance: Synthesis Report. <http://surprise-project.eu/wp-content/uploads/2015/02/SurPRISE-D6.10-Synthesis-report.pdf>

⁷Which also means that searching more data just increases the haystack, not necessarily the number of needles.

⁸Article 29 Working Party (2013) Opinion 03/2013 on purpose limitation. Adopted on 2 April 2013, WP203. European Commission, Brussels, p. 20. Available at: https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2013/wp203_en.pdf

⁹See https://ec.europa.eu/research/participants/data/ref/h2020/other/hi/guide_research-misuse_en.pdf

¹⁰See https://ec.europa.eu/research/participants/data/ref/h2020/other/hi/guide_research-dual-use_en.pdf

¹¹This specific framework also includes historical research purposes or statistical purposes. However, ICT research is not usually related to these purposes. Therefore, we will not analyse them here.

¹²Wagner, B. (2018). Ethics as an Escape from Regulation: From ethics-washing to ethics-shopping? In E. Bayamlıoğlu, I. Baraliuc, L. Janssens, & M. Hildebrandt (Eds.), Being Profiled (pp. 84-89). Amsterdam University Press.