=== Bias ===
{{Main|Algorithmic bias}}
Different machine learning approaches can suffer from different data biases. A machine learning system trained specifically on current customers may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on human-made data, machine learning is likely to pick up the ingrained and unconscious biases already present in society.<ref name="Garcia-2016">{{Cite journal |last=Garcia |first=Megan |date=2016 |title=Racist in the Machine |journal=World Policy Journal |language=en |volume=33 |issue=4 |pages=111–117 |doi=10.1215/07402775-3813015 |issn=0740-2775 |s2cid=151595343}}</ref> Systems trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitising cultural prejudices.<ref>{{Cite web |last=Bostrom |first=Nick |date=2011 |title=The Ethics of Artificial Intelligence |url=http://www.nickbostrom.com/ethics/artificial-intelligence.pdf |archive-url=https://web.archive.org/web/20160304015020/http://www.nickbostrom.com/ethics/artificial-intelligence.pdf |archive-date=4 March 2016 |access-date=11 April 2016}}</ref> For example, in 1988, the UK's [[Commission for Racial Equality]] found that [[St George's, University of London|St. George's Medical School]] had been using a computer program trained on the decisions of previous admissions staff, and that this program had denied nearly 60 candidates who were women or had non-European-sounding names.<ref name="Garcia-2016" /> Using hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants by their similarity to previous successful applicants (see the illustrative sketch at the end of this section).<ref name="Edionwe Outline">{{cite web |last1=Edionwe |first1=Tolulope |title=The fight against racist algorithms |url=https://theoutline.com/post/1571/the-fight-against-racist-algorithms |url-status=live |archive-url=https://web.archive.org/web/20171117174504/https://theoutline.com/post/1571/the-fight-against-racist-algorithms |archive-date=17 November 2017 |access-date=17 November 2017 |website=The Outline}}</ref><ref name="Jeffries Outline">{{cite web |last1=Jeffries |first1=Adrianne |title=Machine learning is racist because the internet is racist |url=https://theoutline.com/post/1439/machine-learning-is-racist-because-the-internet-is-racist |url-status=live |archive-url=https://web.archive.org/web/20171117174503/https://theoutline.com/post/1439/machine-learning-is-racist-because-the-internet-is-racist |archive-date=17 November 2017 |access-date=17 November 2017 |website=The Outline}}</ref> Another example is predictive policing company [[Geolitica]]'s algorithm, which resulted in "disproportionately high levels of over-policing in low-income and minority communities" after being trained on historical crime data.<ref name="Silva-2018">{{Cite journal |last1=Silva |first1=Selena |last2=Kenney |first2=Martin |date=2018 |title=Algorithms, Platforms, and Ethnic Bias: An Integrative Essay |url=https://brie.berkeley.edu/sites/default/files/brie_wp_2018-3.pdf |url-status=live |journal=Phylon |volume=55 |issue=1 & 2 |pages=9–37 |issn=0031-8906 |jstor=26545017 |archive-url=https://web.archive.org/web/20240127200319/https://brie.berkeley.edu/sites/default/files/brie_wp_2018-3.pdf |archive-date=27 January 2024}}</ref>

While responsible [[Data collection|collection of data]] and documentation of the algorithmic rules used by a system are considered a critical part of
machine learning, some researchers attribute machine learning's vulnerability to bias to the lack of participation and representation of minority populations in the field of AI.<ref>{{Cite journal |last=Wong |first=Carissa |date=30 March 2023 |title=AI 'fairness' research held back by lack of diversity |url=https://www.nature.com/articles/d41586-023-00935-z |url-status=live |journal=Nature |language=en |doi=10.1038/d41586-023-00935-z |pmid=36997714 |s2cid=257857012 |archive-url=https://web.archive.org/web/20230412120505/https://www.nature.com/articles/d41586-023-00935-z |archive-date=12 April 2023 |access-date=9 December 2023|url-access=subscription }}</ref> According to research carried out by the Computing Research Association (CRA) in 2021, "female faculty merely make up 16.1%" of all faculty members who focus on AI among several universities around the world.<ref name="Zhang">{{Cite journal |last=Zhang |first=Jack Clark |title=Artificial Intelligence Index Report 2021 |url=https://aiindex.stanford.edu/wp-content/uploads/2021/11/2021-AI-Index-Report_Master.pdf |url-status=live |journal=Stanford Institute for Human-Centered Artificial Intelligence |archive-url=https://web.archive.org/web/20240519121545/https://aiindex.stanford.edu/wp-content/uploads/2021/11/2021-AI-Index-Report_Master.pdf |archive-date=19 May 2024 |access-date=9 December 2023}}</ref> Furthermore, among the group of "new U.S. resident AI PhD graduates," 45% identified as white, 22.4% as Asian, 3.2% as Hispanic, and 2.4% as African American, further demonstrating a lack of diversity in the field.<ref name="Zhang" />

Language models learned from data have been shown to contain human-like biases.<ref>{{Cite journal |last1=Caliskan |first1=Aylin |last2=Bryson |first2=Joanna J. |last3=Narayanan |first3=Arvind |date=14 April 2017 |title=Semantics derived automatically from language corpora contain human-like biases |journal=Science |language=en |volume=356 |issue=6334 |pages=183–186 |arxiv=1608.07187 |bibcode=2017Sci...356..183C |doi=10.1126/science.aal4230 |issn=0036-8075 |pmid=28408601 |s2cid=23163324}}</ref><ref>{{Citation |last1=Wang |first1=Xinan |title=An algorithm for L1 nearest neighbor search via monotonic embedding |date=2016 |work=Advances in Neural Information Processing Systems 29 |pages=983–991 |editor-last=Lee |editor-first=D. D. |url=http://papers.nips.cc/paper/6227-an-algorithm-for-l1-nearest-neighbor-search-via-monotonic-embedding.pdf |access-date=20 August 2018 |archive-url=https://web.archive.org/web/20170407051313/http://papers.nips.cc/paper/6227-an-algorithm-for-l1-nearest-neighbor-search-via-monotonic-embedding.pdf |archive-date=7 April 2017 |url-status=live |publisher=Curran Associates, Inc. |last2=Dasgupta |first2=Sanjoy |editor2-last=Sugiyama |editor2-first=M. |editor3-last=Luxburg |editor3-first=U. V. |editor4-last=Guyon |editor4-first=I.}}</ref> Because human languages contain biases, machines trained on language ''[[Text corpus|corpora]]'' will necessarily also learn these biases.<ref>{{cite arXiv |eprint=1809.02208 |class=cs.CY |author=M.O.R. Prates |author2=P.H.C. Avelar |title=Assessing Gender Bias in Machine Translation – A Case Study with Google Translate |date=11 March 2019 |author3=L.C. Lamb}}</ref><ref>{{cite web |last=Narayanan |first=Arvind |date=24 August 2016 |title=Language necessarily contains human biases, and so will machines trained on language corpora |url=https://freedom-to-tinker.com/2016/08/24/language-necessarily-contains-human-biases-and-so-will-machines-trained-on-language-corpora/ |url-status=live |archive-url=https://web.archive.org/web/20180625021555/https://freedom-to-tinker.com/2016/08/24/language-necessarily-contains-human-biases-and-so-will-machines-trained-on-language-corpora/ |archive-date=25 June 2018 |access-date=19 November 2016 |website=Freedom to Tinker}}</ref> In 2016, Microsoft tested [[Tay (chatbot)|Tay]], a [[chatbot]] that learned from Twitter, and it quickly picked up racist and sexist language.<ref>{{Cite news |last=Metz |first=Rachel |date=24 March 2016 |title=Why Microsoft Accidentally Unleashed a Neo-Nazi Sexbot |url=https://www.technologyreview.com/s/601111/why-microsoft-accidentally-unleashed-a-neo-nazi-sexbot/ |url-access=limited |url-status=live |archive-url=https://web.archive.org/web/20181109023754/https://www.technologyreview.com/s/601111/why-microsoft-accidentally-unleashed-a-neo-nazi-sexbot/ |archive-date=9 November 2018 |access-date=20 August 2018 |work=MIT Technology Review |language=en}}</ref>

In an investigation by [[ProPublica]], an [[investigative journalism]] organisation, a machine learning algorithm used to predict recidivism among prisoners was found to falsely flag "black defendants high risk twice as often as white defendants".<ref name="Silva-2018" /> In 2015, Google Photos tagged two black people as gorillas, which caused controversy. Google subsequently removed the gorilla label and, as of 2023, the service still could not recognise gorillas.<ref>{{Cite news |last1=Vincent |first1=James |date=12 January 2018 |title=Google 'fixed' its racist algorithm by removing gorillas from its image-labeling tech |url=https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai |url-status=live |archive-url=https://web.archive.org/web/20180821031830/https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai |archive-date=21 August 2018 |access-date=20 August 2018 |work=The Verge}}</ref> Similar issues with recognising non-white people have been found in many other systems.<ref>{{Cite news |last1=Crawford |first1=Kate |date=25 June 2016 |title=Opinion {{!}} Artificial Intelligence's White Guy Problem |url=https://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html |url-access=subscription |url-status=live |archive-url=https://web.archive.org/web/20210114220619/https://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html |archive-date=14 January 2021 |access-date=20 August 2018 |work=[[New York Times]] |language=en}}</ref> Because of such challenges, effective adoption of machine learning in other domains may take longer.<ref>{{Cite news |last=Simonite |first=Tom |date=30 March 2017 |title=Microsoft: AI Isn't Yet Adaptable Enough to Help Businesses |url=https://www.technologyreview.com/s/603944/microsoft-ai-isnt-yet-adaptable-enough-to-help-businesses/ |url-status=live |archive-url=https://web.archive.org/web/20181109022820/https://www.technologyreview.com/s/603944/microsoft-ai-isnt-yet-adaptable-enough-to-help-businesses/ |archive-date=9 November 2018 |access-date=20 August 2018 |work=MIT Technology Review |language=en}}</ref>

Concern for [[Fairness (machine learning)|fairness]] in machine learning, that is, reducing bias in machine learning and propelling its use for human good, is increasingly expressed by artificial intelligence scientists, including [[Fei-Fei Li]], who said that "[t]here's nothing artificial about AI. It's inspired by people, it's created by people, and—most importantly—it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility."<ref>{{Cite news |last=Hempel |first=Jessi |date=13 November 2018 |title=Fei-Fei Li's Quest to Make Machines Better for Humanity |url=https://www.wired.com/story/fei-fei-li-artificial-intelligence-humanity/ |url-status=live |archive-url=https://web.archive.org/web/20201214095220/https://www.wired.com/story/fei-fei-li-artificial-intelligence-humanity/ |archive-date=14 December 2020 |access-date=17 February 2019 |magazine=Wired |issn=1059-1028}}</ref>
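The hiring example above can be made concrete with a minimal sketch, assuming synthetic data and the [[scikit-learn]] library; the features, numbers, and the favoured group are hypothetical and chosen only to illustrate the mechanism, not drawn from any real system:

<syntaxhighlight lang="python">
# Illustrative sketch: a classifier fitted to historically biased hiring
# decisions reproduces that bias when scoring new applicants.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical applicant features: a qualification score and a binary
# group membership standing in for a protected attribute.
qualification = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

# Historical labels: past recruiters favoured group 1 regardless of
# qualification, so the recorded "hired" outcome is biased.
hired = (qualification + 1.5 * group + rng.normal(scale=0.5, size=n)) > 1.0

X = np.column_stack([qualification, group])
model = LogisticRegression().fit(X, hired)

# Score two applicants with identical qualifications but different groups:
# the model favours the group-1 applicant, duplicating the historical bias
# rather than judging qualification alone.
same_skill = np.array([[0.5, 0], [0.5, 1]])
print(model.predict_proba(same_skill)[:, 1])
</syntaxhighlight>

Although the two applicants scored at the end have identical qualification values, the fitted model ranks them differently, because the historical labels it learned from already encode the recruiters' bias.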