Racial bias in health algorithms

The U.S. health care system uses commercial algorithms to guide health decisions. Obermeyer et al. find evidence of racial bias in one widely used algorithm, such that Black patients assigned the same level of risk by the algorithm are sicker than White patients (see the Perspective by Benjamin). The authors estimated that this racial bias reduces the number of Black patients identified for extra care by more than half. Bias occurs because the algorithm uses health costs as a proxy for health needs. Less money is spent on Black patients who have the same level of need, and the algorithm thus falsely concludes that Black patients are healthier than equally sick White patients. Reformulating the algorithm so that it no longer uses costs as a proxy for needs eliminates the racial bias in predicting who needs extra care.
Science, this issue p. 447; see also p. 421
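
To make the calibration comparison described above concrete, here is a minimal sketch, not the authors' released code, of the kind of check the summary describes: bin patients by algorithmic risk score and compare average illness burden by race within each bin. The column names ("risk_score", "race", "active_chronic_conditions") are hypothetical placeholders.

# Illustrative sketch only; column names are assumed placeholders.
import pandas as pd

def illness_by_score_bin(patients: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Mean illness burden within each risk-score bin, split by race.

    A score calibrated across groups would show similar illness burden for
    Black and White patients in the same bin; the finding reported here is
    that Black patients are sicker at a given level of the score.
    """
    df = patients.copy()
    df["score_bin"] = pd.qcut(df["risk_score"], q=n_bins, labels=False, duplicates="drop")
    return (
        df.groupby(["score_bin", "race"])["active_chronic_conditions"]
        .mean()
        .unstack("race")
    )

# Usage: illness_by_score_bin(patients) returns one row per risk-score decile
# with the mean number of active chronic conditions for each racial group.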

Abstract

Health systems rely on commercial prediction algorithms to identify and help patients with complex health needs. We show that a widely used algorithm, typical of this industry-wide approach and affecting millions of patients, exhibits significant racial bias: At a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses. Remedying this disparity would increase the percentage of Black patients receiving additional help from 17.7 to 46.5%. The bias arises because the algorithm predicts health care costs rather than illness, but unequal access to care means that we spend less money caring for Black patients than for White patients. Thus, despite health care cost appearing to be an effective proxy for health by some measures of predictive accuracy, large racial biases arise. We suggest that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts.
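
The label-choice mechanism described in the abstract can be sketched in a few lines: the features and the model stay the same, and only the training label changes from cost to a direct measure of illness. Everything below, the column names, the linear model, and the 97th-percentile cut-off used to define "extra help", is an assumption for illustration, not the commercial algorithm or the authors' reformulation.

# Hedged sketch of the label-choice issue: identical features, two different labels.
# All names (total_cost, active_chronic_conditions, feature columns) are assumed.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def share_black_among_flagged(patients: pd.DataFrame, label: str,
                              features: list, flag_quantile: float = 0.97) -> float:
    """Train a simple predictor of `label`, flag patients above `flag_quantile`
    of predicted risk, and return the share of flagged patients who are Black."""
    model = LinearRegression().fit(patients[features], patients[label])
    scores = model.predict(patients[features])
    cutoff = np.quantile(scores, flag_quantile)
    flagged = patients[scores >= cutoff]
    return float((flagged["race"] == "black").mean())

# Predicting cost versus predicting illness can flag very different groups of
# patients; that gap is what the abstract quantifies (17.7% versus 46.5%):
# share_black_among_flagged(patients, label="total_cost", features=feature_cols)
# share_black_among_flagged(patients, label="active_chronic_conditions", features=feature_cols)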

Supplementary Material

Summary

Materials and Methods
Figs. S1 to S5
Tables S1 to S4
References (46–51)

Resources

File (aax2342_obermeyer_sm.pdf)

References and Notes

1. J. Angwin, J. Larson, S. Mattu, L. Kirchner, “Machine Bias,” ProPublica (23 May 2016); www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
2. S. Barocas, A. D. Selbst, Big data’s disparate impact. Calif. Law Rev. 104, 671 (2016).
3. A. Chouldechova, A. Roth, The frontiers of fairness in machine learning. arXiv:1810.08810 [cs.LG] (20 October 2018).
4. A. Datta, M. C. Tschantz, A. Datta, Automated experiments on ad privacy settings. Proc. Privacy Enhancing Technol. 2015, 92–112 (2015).
5. L. Sweeney, Discrimination in online ad delivery. Queue 11, 1–19 (2013).
6. M. Kay, C. Matuszek, S. A. Munson, in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (ACM, 2015), pp. 3819–3828.
7. B. F. Klare, M. J. Burge, J. C. Klontz, R. W. Vorder Bruegge, A. K. Jain, Face Recognition Performance: Role of Demographic Information. IEEE Trans. Inf. Forensics Security 7, 1789–1801 (2012).
8. J. Buolamwini, T. Gebru, in Proceedings of the Conference on Fairness, Accountability and Transparency (PMLR, 2018), pp. 77–91.
9. A. Caliskan, J. J. Bryson, A. Narayanan, Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
10. S. Corbett-Davies, S. Goel, The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning. arXiv:1808.00023 [cs.CY] (31 July 2018).
11. M. De-Arteaga, A. Romanov, H. Wallach, J. Chayes, C. Borgs, A. Chouldechova, S. Geyik, K. Kenthapadi, A. T. Kalai, Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting. arXiv:1901.09451 [cs.IR] (27 January 2019).
12. M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, S. Venkatasubramanian, in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2015), pp. 259–268.
13. J. Kleinberg, H. Lakkaraju, J. Leskovec, J. Ludwig, S. Mullainathan, Human decisions and machine predictions. Q. J. Econ. 133, 237–293 (2018).
14. C. S. Hong, A. L. Siegel, T. G. Ferris, Caring for high-need, high-cost patients: What makes for a successful care management program? Issue Brief (Commonwealth Fund) 19, 1–19 (2014).
15. N. McCall, J. Cromwell, C. Urato, “Evaluation of Medicare Care Management for High Cost Beneficiaries (CMHCB) Demonstration: Massachusetts General Hospital and Massachusetts General Physicians Organization (MGH)” (RTI International, 2010).
16. J. Hsu, M. Price, C. Vogeli, R. Brand, M. E. Chernew, S. K. Chaguturu, E. Weil, T. G. Ferris, Bending The Spending Curve By Altering Care Delivery Patterns: The Role Of Care Management Within A Pioneer ACO. Health Aff. 36, 876–884 (2017).
17. L. Nelson, “Lessons from Medicare’s demonstration projects on disease management and care coordination” (Working Paper 2012-01, Congressional Budget Office, 2012).
18. C. Vogeli, A. E. Shields, T. A. Lee, T. B. Gibson, W. D. Marder, K. B. Weiss, D. Blumenthal, Multiple chronic conditions: Prevalence, health consequences, and implications for quality, care management, and costs. J. Gen. Intern. Med. 22 (suppl. 3), 391–395 (2007).
19. D. W. Bates, S. Saria, L. Ohno-Machado, A. Shah, G. Escobar, Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Aff. 33, 1123–1131 (2014).
20. J. Kleinberg, J. Ludwig, S. Mullainathan, Z. Obermeyer, Prediction Policy Problems. Am. Econ. Rev. 105, 491–495 (2015).
21. G. Hileman, S. Steele, “Accuracy of claims-based risk scoring models” (Society of Actuaries, 2016).
22. J. Kleinberg, S. Mullainathan, M. Raghavan, Inherent Trade-Offs in the Fair Determination of Risk Scores. arXiv:1609.05807 [cs.LG] (19 September 2016).
23. A. Chouldechova, Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5, 153–163 (2017).
24. V. de Groot, H. Beckerman, G. J. Lankhorst, L. M. Bouter, How to measure comorbidity: a critical review of available methods. J. Clin. Epidemiol. 56, 221–229 (2003).
25. J. J. Gagne, R. J. Glynn, J. Avorn, R. Levin, S. Schneeweiss, A combined comorbidity score predicted mortality in elderly patients better than existing scores. J. Clin. Epidemiol. 64, 749–759 (2011).
26. A. K. Parekh, M. B. Barton, The challenge of multiple comorbidity for the US health care system. JAMA 303, 1303–1304 (2010).
27. D. Ettehad, C. A. Emdin, A. Kiran, S. G. Anderson, T. Callender, J. Emberson, J. Chalmers, A. Rodgers, K. Rahimi, Blood pressure lowering for prevention of cardiovascular disease and death: a systematic review and meta-analysis. Lancet 387, 957–967 (2016).
28. K.-T. Khaw, N. Wareham, R. Luben, S. Bingham, S. Oakes, A. Welch, N. Day, Glycated haemoglobin, diabetes, and mortality in men in Norfolk cohort of European Prospective Investigation of Cancer and Nutrition (EPIC-Norfolk). BMJ 322, 15 (2001).
29. K. Fiscella, P. Franks, M. R. Gold, C. M. Clancy, Inequality in quality: Addressing socioeconomic, racial, and ethnic disparities in health care. JAMA 283, 2579–2584 (2000).
30. N. E. Adler, K. Newman, Socioeconomic disparities in health: Pathways and policies. Health Aff. 21, 60–76 (2002).
31. N. E. Adler, W. T. Boyce, M. A. Chesney, S. Folkman, S. L. Syme, Socioeconomic inequalities in health. No easy solution. JAMA 269, 3140–3145 (1993).
32. M. Alsan, O. Garrick, G. C. Graziani, “Does diversity matter for health? Experimental evidence from Oakland” (National Bureau of Economic Research, 2018).
33. K. Armstrong, K. L. Ravenell, S. McMurphy, M. Putt, Racial/ethnic differences in physician distrust in the United States. Am. J. Public Health 97, 1283–1289 (2007).
34. M. Alsan, M. Wanamaker, Tuskegee and the health of black men. Q. J. Econ. 133, 407–455 (2018).
35. M. van Ryn, J. Burke, The effect of patient race and socio-economic status on physicians’ perceptions of patients. Soc. Sci. Med. 50, 813–828 (2000).
36. K. M. Hoffman, S. Trawalter, J. R. Axt, M. N. Oliver, Racial bias in pain assessment and treatment recommendations, and false beliefs about biological differences between blacks and whites. Proc. Natl. Acad. Sci. U.S.A. 113, 4296–4301 (2016).
37. J. J. Escarce, F. W. Puffer, “Black-White Differences in the Use of Medical Care by the Elderly: A Contemporary Analysis” in Racial and Ethnic Differences in the Health of Older Americans (National Academies Press, 1997), chap. 6; www.ncbi.nlm.nih.gov/books/NBK109841/.
38. S. Passi, S. Barocas, Problem Formulation and Fairness. arXiv:1901.02547 [cs.CY] (8 January 2019).
39. S. Mullainathan, Z. Obermeyer, Does Machine Learning Automate Moral Hazard and Error? Am. Econ. Rev. 107, 476–480 (2017).
40. K. E. Joynt Maddox, M. Reidhead, J. Hu, A. J. H. Kind, A. M. Zaslavsky, E. M. Nagasako, D. R. Nerenz, Adjusting for social risk factors impacts performance and penalties in the hospital readmissions reduction program. Health Serv. Res. 54, 327–336 (2019).
41. K. E. Joynt Maddox, M. Reidhead, A. C. Qi, D. R. Nerenz, Association of Stratification by Dual Enrollment Status With Financial Penalties in the Hospital Readmissions Reduction Program. JAMA Intern. Med. 179, 769–776 (2019).
42. K. Lum, W. Isaac, To predict and serve? Significance 13, 14–19 (2016).
43. I. Ajunwa, “The Paradox of Automation as Anti-Bias Intervention,” available at SSRN (2016); https://ssrn.com/abstract=2746078.
44. S. DellaVigna, M. Gentzkow, “Uniform pricing in US retail chains” (National Bureau of Economic Research, 2017).
45. C. A. Gomez-Uribe, N. Hunt, The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Manag. Inf. Syst. 6, 13 (2016).
46. Z. Obermeyer, S. Mullainathan, in Proceedings of the Conference on Fairness, Accountability, and Transparency (ACM, 2019), p. 89.
47. G. Weiss, L. T. Goodnough, Anemia of chronic disease. N. Engl. J. Med. 352, 1011–1023 (2005).
48. M. Tonelli, N. Wiebe, B. Culleton, A. House, C. Rabbat, M. Fok, F. McAlister, A. X. Garg, Chronic kidney disease and mortality risk: A systematic review. J. Am. Soc. Nephrol. 17, 2034–2047 (2006).
49. H. Ujiie, M. Kawasaki, Y. Suzuki, M. Kaibara, Influence of age and hematocrit on the coagulation of blood. J. Biorheol. 23, 111–114 (2009).
50. M. G. Silverman, B. A. Ference, K. Im, S. D. Wiviott, R. P. Giugliano, S. M. Grundy, E. Braunwald, M. S. Sabatine, Association Between Lowering LDL-C and Cardiovascular Risk Reduction Among Different Therapeutic Interventions: A Systematic Review and Meta-analysis. JAMA 316, 1289–1297 (2016).
51. B. Nowok, G. M. Raab, C. Dibben, synthpop: Bespoke creation of synthetic data in R. J. Stat. Softw. 74, 1–26 (2016).

Information & Authors

Published In

Science
Volume 366 | Issue 6464
25 October 2019

Submission history

Received: 8 March 2019
Accepted: 4 October 2019
Published in print: 25 October 2019

Acknowledgments

We thank S. Lakhtakia, Z. Li, K. Lin, and R. Mahadeshwar for research assistance and D. Buefort and E. Maher for data science expertise. Funding: This work was supported by a grant from the National Institute for Health Care Management Foundation. Author contributions: Z.O. and S.M. designed the study, obtained funding, and conducted the analyses. All authors contributed to reviewing findings and writing the manuscript. Competing interests: The analysis was completely independent: None of the authors had any contact with the algorithm’s manufacturer until after it was complete. No authors received compensation, in any form, from the manufacturer or have any commercial interests in the manufacturer or competing entities or products. There were no confidentiality agreements that limited reporting of the work or its results, no material transfer agreements, no oversight in the preparation of this article (besides ethical oversight from the approving IRB, which was based at a non-profit academic health system), and no formal relationship of any kind between any of the authors and the manufacturer. Data and materials availability: Because the data used in this analysis are protected health information, they cannot be made publicly available. We provide instead a synthetic dataset (using the R package synthpop) and all code necessary to reproduce our analyses at https://gitlab.com/labsysmed/dissecting-bias.
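
For readers who want to experiment before cloning the repository above, the general idea behind the synthetic release can be imitated with a deliberately crude Python analogue: resample each column so that no output row corresponds to a real patient. This is only an illustration of the concept; the authors' actual release uses the R package synthpop, which fits sequential conditional models and preserves far more of the joint structure than the sketch below.

# Deliberately simplified illustration of the synthetic-data idea (NOT synthpop
# and NOT the authors' pipeline): independent resampling preserves each column's
# marginal distribution but destroys correlations between columns.
import numpy as np
import pandas as pd

def naive_synthetic_copy(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Return a same-sized dataset in which every column is resampled with
    replacement, so no output row corresponds to any real patient."""
    rng = np.random.default_rng(seed)
    synthetic = {col: rng.choice(df[col].to_numpy(), size=len(df), replace=True)
                 for col in df.columns}
    return pd.DataFrame(synthetic)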

Authors

Affiliations

Ziad Obermeyer
School of Public Health, University of California, Berkeley, Berkeley, CA, USA.
Department of Emergency Medicine, Brigham and Women’s Hospital, Boston, MA, USA.
Brian Powers
Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA.
Christine Vogeli
Mongan Institute Health Policy Center, Massachusetts General Hospital, Boston, MA, USA.
Sendhil Mullainathan
Booth School of Business, University of Chicago, Chicago, IL, USA.

Funding Information

National Institute for Health Care Management Foundation

Notes

*These authors contributed equally to this work.
†Corresponding author. Email: [email protected]

