Biases can be caused by a number of different issues: when data are gathered, they may contain socially constructed biases, inaccuracies, errors and mistakes. There are multiple reasons for these biases. Sometimes datasets are biased as a result of malicious actions. Feeding malicious data into an AI system can change its behavior, particularly in self-learning systems. For instance, in the case of the chatbot Tay, developed by Microsoft, a huge number of internet users posted racist and sexist comments that fed the algorithm; as a result, Tay started sending racist and sexist tweets after just a few hours of operation. In other cases, the data are simply of poor quality, and this creates bias. For example, data taken from social media platforms present serious risks for researchers, because the characteristics of the online environment do not guarantee the accuracy or representativeness of the data.
Another reason for bias is imbalanced training data (see Box 8), which arises when the proportions of the different categories in the training data are unbalanced. For instance, in the context of clinical trials, there might be far more data from males than from females. In such cases, females are likely to be discriminated against by the resulting AI model. Issues related to the composition of the databases used for training therefore raise crucial ethical and legal questions, not merely technical questions of efficiency.
|Box 8. Biases caused by imbalanced training data
The Beauty.AI case
Launched in 2016, the Beauty.AI tool was created to select “the First Beauty Queen or King Judged by Robots”, using age- and facial-recognition algorithms. Seven thousand people sent in their pictures through an app, but most of the 44 winners were white; only a handful were Asian, and only one had dark skin. This was despite the fact that, although the majority of contestants were white, many people of color submitted photos, including large groups from Africa and India. The result was immediately considered racist, a consequence of the poor selection of the training dataset: the data the project used to establish standards of beauty consisted mainly of images of white people. Although the developers did not build the algorithm to treat light skin as a sign of beauty, the input data effectively led the robot judges to that conclusion.
The Amazon recruiting tool
In 2018, Amazon scrapped its AI recruiting tool after the company discovered that the system showed bias against women. Amazon had been building computer programs since 2014 to review job applicants’ resumes, with the aim of mechanizing the search for top talent. The tool used AI to score job candidates from one to five stars. In 2015, however, Amazon discovered that the tool was not rating candidates for software developer jobs and other technical posts in a gender-neutral way. This was because Amazon’s computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry.
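The imbalance problem described above can be illustrated with a minimal sketch. All numbers, names and the "one-rule" learner below are invented for illustration; they are not taken from any cited study. The point is simply that a model which minimizes overall error on a 90/10 split can serve the majority group perfectly while failing the minority group entirely:

```python
from collections import Counter

# Hypothetical clinical dataset: 900 male records, 100 female records.
# Assume (for illustration) that the optimal dose genuinely differs by sex.
patients = ["M"] * 900 + ["F"] * 100

def optimal_dose(sex):
    # Invented ground truth: males respond to the high dose, females to the low dose.
    return "high" if sex == "M" else "low"

# A naive "one-rule" learner that ignores sex picks the single dose that is
# optimal for the largest share of the training set -- the majority group.
votes = Counter(optimal_dose(s) for s in patients)
learned_dose = votes.most_common(1)[0][0]

def group_accuracy(group):
    rows = [s for s in patients if s == group]
    return sum(learned_dose == optimal_dose(s) for s in rows) / len(rows)

print(learned_dose)          # "high" -- the male-optimal dose
print(group_accuracy("M"))   # 1.0
print(group_accuracy("F"))   # 0.0 -- the minority group bears all the error
```

Overall accuracy here is 90%, which looks acceptable in aggregate; only a per-group evaluation reveals that the female subgroup is misclassified every time.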
Thirdly, the training data may reflect past discrimination produced by societal trends (see Box 9). If controllers use historical data, they should be aware of the probable differences between the social context in which the data were produced and the present day; otherwise, biases will be unavoidable. Sometimes biases arise because the community that provided the data differs socially from the community that is meant to use the algorithm. If the controller does not pay careful attention to this, biases will again probably be present in the tool.
|Box 9. Biases produced by societal trends
In the past, loan applications from women were rejected more frequently than those from men, due to prejudice. In this case, any AI model trained on historical data is likely to reproduce the same pattern of discrimination. These issues can occur even if the training data do not contain any protected characteristics, such as gender or race. A variety of features in the training data are often closely correlated with protected characteristics: occupation, for example, is frequently correlated with gender. These ‘proxy variables’ enable the model to reproduce patterns of discrimination associated with protected characteristics, even if its designers did not intend this.
These problems can occur in any statistical model. However, they are more likely to occur in AI systems, which can include a greater number of features and may identify complex combinations of features that act as proxies for protected characteristics. Many modern machine-learning methods are more powerful than traditional statistical approaches precisely because they are better at uncovering non-linear patterns in high-dimensional data; however, the patterns they uncover can also include patterns that reflect discrimination.
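The proxy-variable mechanism can be sketched in a few lines. The data below are entirely invented: a historical loan dataset in which occupation happens to be perfectly correlated with gender. A simple frequency-based model trained on occupation alone, with gender removed from the features, still reproduces the historical disparity:

```python
from collections import Counter, defaultdict

# Invented history: (gender, occupation, past decision). Gender and
# occupation are perfectly correlated here, and women were rejected more often.
history = (
    [("M", "engineer", "approve")] * 80 + [("M", "engineer", "reject")] * 20 +
    [("F", "nurse", "approve")] * 40 + [("F", "nurse", "reject")] * 60
)

# "Train" on occupation only: predict the majority historical decision
# for each occupation. Gender never enters the model.
by_occupation = defaultdict(Counter)
for _, occupation, decision in history:
    by_occupation[occupation][decision] += 1

def predict(occupation):
    return by_occupation[occupation].most_common(1)[0][0]

print(predict("engineer"))  # "approve"
print(predict("nurse"))     # "reject" -- past discrimination, via a proxy
```

Dropping the protected attribute from the feature set is therefore not, by itself, a safeguard: as long as some remaining feature is correlated with it, the model can recover the discriminatory pattern.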
Finally, biases may be caused by a poorly designed AI tool (see Box 11). The designer may introduce proxy variables whose correlations do not hold in reality. If this is the case, the model will make inaccurate predictions, since its conceptual basis is not solid.
|Box 11. Bias caused by a poorly designed AI tool: a health-risk algorithm
The US healthcare system uses commercial algorithms to guide health decisions. Obermeyer et al. found evidence of racial bias in one widely used algorithm: among black and white patients assigned the same level of risk by the algorithm, the black patients were sicker than the white ones. The authors estimated that this racial bias reduced the number of black patients identified for extra care by more than half. The bias occurred because the algorithm used health costs as a proxy for health needs. Less money was spent on black patients with the same level of need as white patients, and the algorithm thus falsely concluded that black patients were healthier than equally sick white patients. In reality, the lower expenditure was caused by a number of racially biased factors, such as unequal access to treatment, different levels of trust in the system, and biases on the part of healthcare providers.
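The flaw in such a design is a badly chosen proxy target, and it can be shown with a toy ranking. The patients and numbers below are invented for illustration only (they are not figures from the Obermeyer et al. study): two patients have identical health needs, but historically less has been spent on one of them, so ranking by cost demotes the patient who needs care just as much:

```python
# Invented records: (patient, race, true need score, historical cost in $).
# Patients A and B have the same true need, but less was spent on A.
patients = [
    ("A", "black", 8, 3000),
    ("B", "white", 8, 5000),
    ("C", "white", 5, 3500),
]

# The flawed design: rank patients for extra care by historical cost,
# using cost as a proxy for health need.
ranked_by_cost = sorted(patients, key=lambda p: p[3], reverse=True)

# What the ranking should reflect: true health need.
ranked_by_need = sorted(patients, key=lambda p: p[2], reverse=True)

print([p[0] for p in ranked_by_cost])  # ['B', 'C', 'A']
print([p[0] for p in ranked_by_need])  # ['A', 'B', 'C']
```

Under the cost proxy, patient A falls from the top of the queue to the bottom, even though A is exactly as sick as B; the proxy target, not any explicit use of race, produces the disparity.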
1 High-Level Expert Group on AI (2019) Ethics guidelines for trustworthy AI. European Commission, Brussels, p. 17. Available at: https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai (accessed 20 May 2020).
2 Levin, S. (2016) ‘A beauty contest was judged by AI and the robots didn’t like dark skin’, The Guardian, 8 September. Available at: www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-black-people (accessed 26 May 2020).
3 Dastin, J. (2018) ‘Amazon scraps secret AI recruiting tool that showed bias against women’, Reuters, 10 October. Available at: www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
4 ICO (2020) AI auditing framework: draft guidance for consultation, p. 54. Information Commissioner’s Office, Wilmslow. Available at: https://ico.org.uk/media/about-the-ico/consultations/2617219/guidance-on-the-ai-auditing-framework-draft-for-consultation.pdf (accessed 26 May 2020).
5 Obermeyer, Z. et al. (2019) ‘Dissecting racial bias in an algorithm used to manage the health of populations’, Science, 25 October, 447–453.