Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium
Science Translational Medicine • 20 Apr 2011 • Vol 3, Issue 79 • p. 79re1 • DOI: 10.1126/scitranslmed.3001807
Where Electronic Records and Genomics Meet
There has been a surge of interest in using electronic medical records in hospitals and clinics to capture information about patients that is normally buried in doctors’ handwritten notes. Indeed, the U.S. government has made the implementation of electronic medical records a priority area and has instigated standards for the recording and use of these records. The clinical data captured in electronic medical records, including diagnoses, medical tests, and medications, provide accurate clinical information that will improve patient care. With the ability to sequence the genomes of individuals faster and more cheaply than ever before, it may be possible in the future to include the genome sequences of patients in their electronic medical records. A consortium called the Electronic Medical Records and Genomics Network (eMERGE) has set out to investigate whether clinical data captured in electronic medical records could be used to accurately identify patients with particular diseases for inclusion in genome-wide association studies (GWAS). GWAS scrutinize the genomes of individuals with particular diseases to identify tiny genetic variations that are associated with the risk of developing that disease. Here, the eMERGE consortium reports its study of the electronic medical records from five clinical centers and how accurately these records identified patients with one of five diseases: dementia, cataracts, peripheral arterial disease, type 2 diabetes, and cardiac conduction defects. The investigators show that even though the electronic medical records were of different types and did not all use natural language processing to extract information from the records, they were able to obtain robust positive and negative predictive values for identifying patients with these diseases, with sufficient accuracy for use in GWAS. They conclude that widespread adoption of electronic medical records will provide real-world clinical data that will be valuable for GWAS and other types of genetic research.
Abstract
Clinical data in electronic medical records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73 to 98% and negative predictive values of 98 to 100%. Most EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
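The positive and negative predictive values quoted above follow the standard definitions: the fraction of algorithm-identified cases (or controls) that a reference review confirms. The sketch below illustrates only that arithmetic. The counts, the `ValidationCounts` class, and the function names are hypothetical and are not taken from the eMERGE study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Outcome of comparing a phenotype algorithm against a reference review.

    All numbers used with this class below are made up for illustration.
    """
    true_pos: int   # algorithm: case,    reference review: case
    false_pos: int  # algorithm: case,    reference review: not a case
    true_neg: int   # algorithm: control, reference review: control
    false_neg: int  # algorithm: control, reference review: case


def ppv(c: ValidationCounts) -> float:
    """Positive predictive value: confirmed cases / all algorithm-identified cases."""
    return c.true_pos / (c.true_pos + c.false_pos)


def npv(c: ValidationCounts) -> float:
    """Negative predictive value: confirmed controls / all algorithm-identified controls."""
    return c.true_neg / (c.true_neg + c.false_neg)


# Hypothetical validation sample: 100 algorithm-identified cases and
# 100 algorithm-identified controls, each checked against a reference review.
example = ValidationCounts(true_pos=90, false_pos=10, true_neg=99, false_neg=1)
print(f"PPV = {ppv(example):.0%}, NPV = {npv(example):.0%}")
```

With these invented counts, 90 of 100 flagged cases are confirmed and 99 of 100 flagged controls are confirmed, so the PPV is 90% and the NPV is 99%.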
Information & Authors
Information
Published In

Science Translational Medicine
Volume 3 | Issue 79
April 2011
Copyright
Copyright © 2011, American Association for the Advancement of Science.
Submission history
Received: 14 October 2010
Accepted: 1 April 2011
Acknowledgments
Funding: The eMERGE Network was initiated and funded by the National Human Genome Research Institute, with additional funding from the National Institute of General Medical Sciences through grants U01-HG-004610 (Group Health Cooperative), U01-HG-004608 (Marshfield Clinic), U01-HG-04599 (Mayo Clinic), U01-HG-004609 (Northwestern University), and U01-HG-04603 (Vanderbilt University, also serving as the Coordinating Center), and the State of Washington Life Sciences Discovery Fund award to the Northwest Institute of Medical Genetics. The Vanderbilt BioVU and the Synthetic Derivative were supported in part by Clinical and Translational Research Award grant 1 UL1 RR024975 from the National Center for Research Resources, NIH. The Northwestern Enterprise Data Warehouse (EDW) was supported in part by Clinical and Translational Research grant UL1RR025741 from the National Center for Research Resources, NIH.
Author contributions: All authors participated in the design and interpretation of the experiments and results. A.N.K., J.A.P., P.L.P., L.R., K.M.N., N.W., P.K.C., J.P., C.G.C., S.J.B., R.L.C., and J.C.D. participated in the acquisition and analysis of data. A.N.K., J.A.P., P.L.P., C.G.C., K.M.N., N.W., I.J.K., J.C.D., and P.K.C. performed statistical analysis. A.N.K., P.L.P., K.M.N., J.C.D., and C.G.C. led data collection and validation from each participating site. All authors contributed toward writing and editing the manuscript.
Competing interests: The authors declare that they have no competing interests.