Modeling (Training) - Guidelines Panelfit

“In this phase, various modeling techniques are selected and applied and their parameters are calibrated to optimal values. Typically, several techniques exist for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase may be necessary. Modeling steps include the selection of the modeling technique, the generation of test design, the creation of models, and the assessment of models.”^[1]

This phase involves several key tasks. Overall, the developer must do the following:

Select the modeling technique that will be used. Depending on the type of technique, risks such as data inference, opacity or bias are more or less likely to arise.
Decide on the training tool to be used. This enables the developer to measure how well the model can predict history before using it to predict the future. Training always involves empirical testing with personal data. Sometimes, developers test the model with data that differ from those used to generate it. At this stage one can therefore distinguish between different types of datasets. It can sometimes be difficult to identify the individuals to whom the training data relate. This creates issues for fulfilling individuals’ rights, which should be addressed appropriately.
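The idea of testing a model on data different from those used to generate it can be sketched as a simple hold-out split. This is only a minimal illustration, not a recommended tool: the function name `split_records`, the `subject_id` field and the sample records are all hypothetical. The sketch also keeps a small index recording which dataset each pseudonymous identifier ended up in, which is one possible way to make individuals easier to locate when they exercise their rights.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Split records into disjoint training and hold-out lists (hypothetical helper)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible, so it can be documented and audited.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: each keeps a pseudonymous subject_id (an assumption of
# this sketch) so the controller can later find where an individual's data went.
records = [
    {"subject_id": f"s{i:03d}", "feature": i, "label": i % 2}
    for i in range(10)
]

train, holdout = split_records(records)

# Index from pseudonymous identifier to the dataset that contains it.
dataset_index = {r["subject_id"]: "train" for r in train}
dataset_index.update({r["subject_id"]: "holdout" for r in holdout})
```

With ten records and the default fraction, eight records go to training and two are held out for testing. No record appears in both sets.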

These are the main actions that need to be addressed in this stage

References

¹Shearer, C. (2000) ‘The CRISP-DM model: the new blueprint for data mining’, Journal of Data Warehousing 5(4): 13-23, p.17. Available at: https://mineracaodedados.files.wordpress.com/2012/04/the-crisp-dm-model-the-new-blueprint-for-data-mining-shearer-colin.pdf (accessed 15 May 2020).

Checklist: Modelling (training)

☐ The controllers have determined the purpose of the AI system’s use at the outset of its training or deployment, and have re-assessed this determination if the system’s processing produced unexpected results.

☐ The controllers have purged the data used during the training phase of all information not strictly necessary for training of the model.

☐ The controllers have considered implementing technical tools that could help detect biases, such as the Algorithmic Impact Assessment.

☐ The controllers have considered conducting a DPIA at this stage.

☐ The controllers have ensured that they are able to respond to data subjects’ access requests, unless exceptions to the right of access apply.

☐ The controllers can guarantee the right to rectification of the data, especially data generated by the inferences and profiles drawn up during AI development.

☐ The controllers are able to respond to requests for erasure, unless a relevant exemption applies and provided the data subject has appropriate grounds.
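Two of the checklist items above, purging unnecessary information from the training data and checking for bias, can be illustrated with a short sketch. It is not the Algorithmic Impact Assessment and not a complete fairness audit: it only shows field-level minimisation and a plain comparison of positive-prediction rates between groups. The helper names `minimise` and `positive_rate_by_group`, the field names and the sample values are all hypothetical.

```python
def minimise(records, needed_fields):
    """Keep only the fields listed in needed_fields (hypothetical helper)."""
    return [{k: r[k] for k in needed_fields if k in r} for r in records]


def positive_rate_by_group(predictions, groups):
    """Return the share of positive predictions (value 1) for each group."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical raw records; name and email are assumed not to be needed for training.
raw = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "income": 41000},
    {"name": "B. Example", "email": "b@example.org", "age": 51, "income": 38000},
]
training_rows = minimise(raw, needed_fields=["age", "income"])

# Hypothetical model outputs and group labels, used only to show the comparison.
rates = positive_rate_by_group(
    predictions=[1, 0, 1, 1, 0, 0],
    groups=["x", "x", "x", "y", "y", "y"],
)
# A large gap between groups is a signal to investigate, not proof of unlawful bias.
gap = max(rates.values()) - min(rates.values())
```

Which fields count as strictly necessary, and how large a gap warrants investigation, are decisions for the controller to make and document. The sketch does not settle either.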