Racial bias in health algorithms
The U.S. health care system uses commercial algorithms to guide health decisions. Obermeyer et al. find evidence of racial bias in one widely used algorithm, such that Black patients assigned the same level of risk by the algorithm are sicker than White patients (see the Perspective by Benjamin). The authors estimated that this racial bias reduces the number of Black patients identified for extra care by more than half. Bias occurs because the algorithm uses health costs as a proxy for health needs. Less money is spent on Black patients who have the same level of need, and the algorithm thus falsely concludes that Black patients are healthier than equally sick White patients. Reformulating the algorithm so that it no longer uses costs as a proxy for needs eliminates the racial bias in predicting who needs extra care.
Health systems rely on commercial prediction algorithms to identify and help patients with complex health needs. We show that a widely used algorithm, typical of this industry-wide approach and affecting millions of patients, exhibits significant racial bias: At a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses. Remedying this disparity would increase the percentage of Black patients receiving additional help from 17.7 to 46.5%. The bias arises because the algorithm predicts health care costs rather than illness, but unequal access to care means that we spend less money caring for Black patients than for White patients. Thus, despite health care cost appearing to be an effective proxy for health by some measures of predictive accuracy, large racial biases arise. We suggest that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts.
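The mechanism described above, in which a well-calibrated predictor of cost produces a racially biased ranking of health need, can be illustrated with a minimal simulation. This is a hedged sketch, not the authors' model: the group labels, the gamma distribution of need, and the 0.7 "access penalty" (group B generates 70% of the cost of group A at equal need) are all hypothetical parameters chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent health need (e.g., number of active chronic conditions);
# by construction, the same distribution in both groups.
group = rng.integers(0, 2, n)  # 0 = group A, 1 = group B
need = rng.gamma(shape=2.0, scale=1.5, size=n)

# Observed cost is proportional to need, but group B receives less care
# (and so generates less cost) at the same level of need.
access_penalty = 0.7  # hypothetical: B's cost is 70% of A's at equal need
cost = need * np.where(group == 1, access_penalty, 1.0) + rng.normal(0, 0.2, n)

# Best case for any algorithm trained to predict cost: score by cost itself.
score = cost

# Flag the top decile of scores for extra care, as a care-management
# program might, and compare true need at the same score threshold.
cutoff = np.quantile(score, 0.9)
flagged = score >= cutoff
need_a = need[flagged & (group == 0)].mean()
need_b = need[flagged & (group == 1)].mean()
print(f"mean need among flagged, group A: {need_a:.2f}")
print(f"mean need among flagged, group B: {need_b:.2f}")
```

Even though the score predicts cost essentially perfectly here, group B patients must be sicker to reach the same score, so flagged group B patients have higher mean need than flagged group A patients, and fewer group B patients are flagged overall. The bias lives entirely in the choice of label, not in any error of the predictor.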
Materials and Methods
Figs. S1 to S5
Tables S1 to S4
Volume 366 | Issue 6464
25 October 2019
Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
This is an article distributed under the terms of the Science Journals Default License.
Received: 8 March 2019
Accepted: 4 October 2019
Published in print: 25 October 2019
Acknowledgments: We thank S. Lakhtakia, Z. Li, K. Lin, and R. Mahadeshwar for research assistance and D. Buefort and E. Maher for data science expertise.
Funding: This work was supported by a grant from the National Institute for Health Care Management Foundation.
Author contributions: Z.O. and S.M. designed the study, obtained funding, and conducted the analyses. All authors contributed to reviewing findings and writing the manuscript.
Competing interests: The analysis was completely independent: none of the authors had any contact with the algorithm’s manufacturer until after the analysis was complete. No authors received compensation, in any form, from the manufacturer, nor do they have any commercial interests in the manufacturer or in competing entities or products. There were no confidentiality agreements that limited reporting of the work or its results, no material transfer agreements, no oversight in the preparation of this article (besides ethical oversight from the approving IRB, which was based at a nonprofit academic health system), and no formal relationship of any kind between any of the authors and the manufacturer.
Data and materials availability: Because the data used in this analysis are protected health information, they cannot be made publicly available. We instead provide a synthetic dataset (generated with the R package synthpop) and all code necessary to reproduce our analyses at https://gitlab.com/labsysmed/dissecting-bias.