Dissecting racial bias in an algorithm used to manage the health of populations
Racial bias in health algorithms
The U.S. health care system uses commercial algorithms to guide health decisions. Obermeyer et al. find evidence of racial bias in one widely used algorithm, such that Black patients assigned the same level of risk by the algorithm are sicker than White patients (see the Perspective by Benjamin). The authors estimated that this racial bias reduces the number of Black patients identified for extra care by more than half. Bias occurs because the algorithm uses health costs as a proxy for health needs. Less money is spent on Black patients who have the same level of need, and the algorithm thus falsely concludes that Black patients are healthier than equally sick White patients. Reformulating the algorithm so that it no longer uses costs as a proxy for needs eliminates the racial bias in predicting who needs extra care.
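To make the mechanism concrete, the following is a minimal simulation sketch, not the paper's code or data: the group sizes, the 0.7 spending factor, and the 3% flagging threshold are all invented assumptions. Both simulated groups draw true health need from an identical distribution, but less is spent on Black patients at the same level of need, and a cost-based score then selects patients for extra care.

    # Minimal, self-contained simulation of the bias mechanism described
    # above. All parameters are illustrative assumptions, not the paper's.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Both groups draw true health need from the same distribution.
    group = rng.choice(["Black", "White"], size=n)
    need = rng.gamma(shape=2.0, scale=1.0, size=n)

    # Assumption: unequal access means less is spent on Black patients
    # with the same level of need (the 0.7 factor is arbitrary).
    access = np.where(group == "Black", 0.7, 1.0)
    cost = need * access * rng.lognormal(0.0, 0.25, size=n)

    # A cost-trained risk score reduces, in this sketch, to observed cost.
    score = cost

    # Flag the top 3% of scores for extra care, as a care-management
    # program might.
    flagged = score >= np.quantile(score, 0.97)

    for g in ("Black", "White"):
        mask = group == g
        print(f"{g}: share flagged = {flagged[mask].mean():.2%}, "
              f"mean need among flagged = {need[mask & flagged].mean():.2f}")

Running this reproduces both signatures described above: fewer Black patients clear the score threshold, and the Black patients who do are sicker on average than the White patients flagged alongside them, because a Black patient must be sicker to generate the same cost.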
Abstract
Health systems rely on commercial prediction algorithms to identify and help patients with complex health needs. We show that a widely used algorithm, typical of this industry-wide approach and affecting millions of patients, exhibits significant racial bias: At a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses. Remedying this disparity would increase the percentage of Black patients receiving additional help from 17.7% to 46.5%. The bias arises because the algorithm predicts health care costs rather than illness, but unequal access to care means that we spend less money caring for Black patients than for White patients. Thus, despite health care cost appearing to be an effective proxy for health by some measures of predictive accuracy, large racial biases arise. We suggest that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts.
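The diagnostic implied by the abstract, holding the algorithm's risk score fixed and comparing a direct measure of health across races, can be sketched in a few lines. This is an illustrative sketch, not the authors' released code; the column names risk_score, n_chronic_conditions, and race are assumptions, and the released synthetic dataset may use different ones.

    # Compare mean illness burden by race within risk-score bins.
    import pandas as pd

    def health_by_score_bin(df, score="risk_score",
                            health="n_chronic_conditions",
                            race="race", bins=10):
        """Mean health burden per race within equal-sized score bins."""
        out = df.copy()
        out["score_bin"] = pd.qcut(out[score], q=bins, labels=False)
        return out.groupby(["score_bin", race])[health].mean().unstack(race)

    # Usage (hypothetical file name):
    # df = pd.read_csv("synthetic_patients.csv")
    # print(health_by_score_bin(df))

If the score were an unbiased proxy for health, the per-race means within each bin would coincide; the finding reported above is that, at the same score, Black patients carry a substantially heavier burden of uncontrolled chronic illness.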
Supplementary Material
Summary
Materials and Methods
Figs. S1 to S5
Tables S1 to S4
File: aax2342_obermeyer_sm.pdf
Information & Authors
Published In
Science, Volume 366, Issue 6464, 25 October 2019
Copyright
Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
This is an article distributed under the terms of the Science Journals Default License.
Submission history
Received: 8 March 2019
Accepted: 4 October 2019
Published in print: 25 October 2019
Acknowledgments
We thank S. Lakhtakia, Z. Li, K. Lin, and R. Mahadeshwar for research assistance and D. Buefort and E. Maher for data science expertise. Funding: This work was supported by a grant from the National Institute for Health Care Management Foundation. Author contributions: Z.O. and S.M. designed the study, obtained funding, and conducted the analyses. All authors contributed to reviewing findings and writing the manuscript. Competing interests: The analysis was completely independent: None of the authors had any contact with the algorithm’s manufacturer until after it was complete. No authors received compensation, in any form, from the manufacturer or have any commercial interests in the manufacturer or competing entities or products. There were no confidentiality agreements that limited reporting of the work or its results, no material transfer agreements, no oversight in the preparation of this article (besides ethical oversight from the approving IRB, which was based at a non-profit academic health system), and no formal relationship of any kind between any of the authors and the manufacturer. Data and materials availability: Because the data used in this analysis are protected health information, they cannot be made publicly available. We provide instead a synthetic dataset (using the R package synthpop) and all code necessary to reproduce our analyses at https://gitlab.com/labsysmed/dissecting-bias.