An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People
A Deep Look Into Our Genes
Recent debates have focused on the extent of human genetic variation and its impact on health at the genomic level (see the Perspective by Casals and Bertranpetit). Tennessen et al. (p. 64, published online 17 May), looking at all of the protein-coding genes in the human genome, and Nelson et al. (p. 100, published online 17 May), looking at genes that encode drug targets, address this question through deep sequencing of samples from multiple individuals. The findings suggest that most human variation is rare and not shared between populations, and that rare variants are likely to play a role in human health.
Abstract
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
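To make the "1 every 17 bases" figure concrete, the sketch below compares it with the variant density expected under a simple constant-size neutral model, using Watterson's relation E[S] = θ · L · a_n (reference 66). This is an illustrative calculation only, not the authors' analysis: the per-base θ of 0.001 is an assumed, commonly quoted round number for human diversity, the sample is treated as 2 × 14,002 autosomal chromosomes, and the function names are hypothetical.

```python
import math


def harmonic_number(n_chromosomes: int) -> float:
    """Watterson's a_n: the sum of 1/i for i = 1 .. n-1."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


def expected_variant_spacing(theta_per_base: float, n_chromosomes: int) -> float:
    """Expected number of bases per segregating site under a
    constant-size neutral model, i.e. 1 / (theta * a_n)."""
    return 1.0 / (theta_per_base * harmonic_number(n_chromosomes))


N_INDIVIDUALS = 14_002             # sample size stated in the abstract
N_CHROMOSOMES = 2 * N_INDIVIDUALS  # ASSUMPTION: diploid, autosomal loci
THETA_PER_BASE = 1e-3              # ASSUMPTION: illustrative value, not from the paper
OBSERVED_SPACING = 17              # "1 every 17 bases" from the abstract

expected_spacing = expected_variant_spacing(THETA_PER_BASE, N_CHROMOSOMES)
fold_excess = expected_spacing / OBSERVED_SPACING

print(f"Constant-size expectation: 1 variant every {expected_spacing:.0f} bases")
print(f"Observed density is about {fold_excess:.1f}x higher than that expectation")
```

Under these assumptions the constant-size model predicts on the order of one variant per 90 bases, several-fold sparser than the observed density. An excess of this kind is what the abstract attributes to rapid population growth and weak purifying selection, although the exact ratio depends heavily on the assumed θ.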
Supplementary Material
Summary
Materials and Methods
Supplementary Text
Figs. S1 to S15
Tables S1 to S17
Databases S1 to S3
Resources
References and Notes
1
Pritchard J. K., Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124 (2001).
2
Kryukov G. V., Pennacchio L. A., Sunyaev S. R., Most rare missense alleles are deleterious in humans: Implications for complex disease and association studies. Am. J. Hum. Genet. 80, 727 (2007).
3
Marth G. T., et al., 1000 Genomes Project, The functional spectrum of low-frequency coding variation. Genome Biol. 12, R84 (2011).
4
Manolio T. A., et al., Finding the missing heritability of complex diseases. Nature 461, 747 (2009).
5
Eichler E. E., et al., Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet. 11, 446 (2010).
6
Asimit J., Zeggini E., Rare variant association analysis methods for complex traits. Annu. Rev. Genet. 44, 293 (2010).
7
Gravel S., et al., 1000 Genomes Project, Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. U.S.A. 108, 11983 (2011).
8
Coventry A., et al., Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat. Commun. 1, 131 (2010).
9
Ohta T., Slightly deleterious mutant substitutions in evolution. Nature 246, 96 (1973).
10
Williamson S. H., et al., Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. U.S.A. 102, 7882 (2005).
11
Muller H. J., Our load of mutations. Am. J. Hum. Genet. 2, 111 (1950).
12
Russ A. P., Lampel S., The druggable genome: An update. Drug Discov. Today 10, 1607 (2005).
13
Materials and methods are available as supplementary materials on Science Online.
14
Jobling M. A., Hurles M., Tyler-Smith C., Human Evolutionary Genetics: Origins, Peoples and Disease (Garland Science, 2003).
15
Livi-Bacci M., A Concise History of World Population (Wiley-Blackwell, ed. 2, 2007), pp. 1 to 250.
16
Wakeley J., Takahashi T., Gene genealogies when the sample size exceeds the effective size of the population. Mol. Biol. Evol. 20, 208 (2003).
17
Awadalla P., et al., Direct measure of the de novo mutation rate in autism and schizophrenia cohorts. Am. J. Hum. Genet. 87, 316 (2010).
18
Conrad D. F., et al., 1000 Genomes Project, Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43, 712 (2011).
19
Messer P. W., Measuring the rates of spontaneous mutation from deep and large-scale polymorphism data. Genetics 182, 1219 (2009).
20
Price A. L., et al., Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832 (2010).
21
Kotowski I. K., et al., A spectrum of PCSK9 alleles contributes to plasma levels of low-density lipoprotein cholesterol. Am. J. Hum. Genet. 78, 410 (2006).
22
Hindorff L. A., et al., Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. U.S.A. 106, 9362 (2009).
23
Salmela E., et al., Genome-wide analysis of single nucleotide polymorphisms uncovers population structure in Northern Europe. PLoS ONE 3, e3519 (2008).
24
Bustamante C. D., Burchard E. G., De la Vega F. M., Genomics for the world. Nature 475, 163 (2011).
25
Lao O., et al., Correlation between genetic and geographic structure in Europe. Curr. Biol. 18, 1241 (2008).
26
Lander E. S., Schork N. J., Genetic dissection of complex traits. Science 265, 2037 (1994).
27
Akey J. M., et al., Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2, e286 (2004).
28
SeattleSNPs, http://pga.gs.washington.edu (2012).
29
Ahituv N., et al., Medical sequencing at the extremes of human body mass. Am. J. Hum. Genet. 80, 779 (2007).
30
Durbin R. M., et al., A map of human genome variation from population-scale sequencing. Nature 467, 1061 (2010).
31
Firmann M., et al., The CoLaus study: A population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc. Disord. 8, 6 (2008).
32
Preisig M., et al., The PsyCoLaus study: Methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).
33
Kooner J. S., et al., Genome-wide scan identifies variation in MLXIPL associated with plasma triglycerides. Nat. Genet. 40, 149 (2008).
34
Ling H., et al., Genome-wide linkage and association analyses to identify genes influencing adiponectin levels: the GEMS Study. Obesity (Silver Spring) 17, 737 (2009).
35
Wyszynski D. F., et al., Relation between atherogenic dyslipidemia and the Adult Treatment Program-III definition of metabolic syndrome (Genetic Epidemiology of Metabolic Syndrome Project). Am. J. Cardiol. 95, 194 (2005).
36
Assimes T. L., et al., Myocardial Infarction Genetics Consortium, Wellcome Trust Case Control Consortium, Cardiogenics, Lack of association between the Trp719Arg polymorphism in kinesin-like protein-6 and coronary artery disease in 19 case-control studies. J. Am. Coll. Cardiol. 56, 1552 (2010).
37
Kraus V. B., et al., The Genetics of Generalized Osteoarthritis (GOGO) study: Study design and evaluation of osteoarthritis phenotypes. Osteoarthritis Cartilage 15, 120 (2007).
38
Vignal C., et al., Genetic association of the major histocompatibility complex with rheumatoid arthritis implicates two non-DRB1 loci. Arthritis Rheum. 60, 53 (2009).
39
Baranzini S. E., et al., Genome-wide association analysis of susceptibility and clinical phenotype in multiple sclerosis. Hum. Mol. Genet. 18, 767 (2009).
40
Oksenberg J. R., et al., Mapping multiple sclerosis susceptibility to the HLA-DR locus in African Americans. Am. J. Hum. Genet. 74, 160 (2004).
41
Cree B. A., et al., Modification of multiple sclerosis phenotypes by African ancestry at HLA. Arch. Neurol. 66, 226 (2009).
42
Patterson N., et al., Methods for high-density admixture mapping of disease genes. Am. J. Hum. Genet. 74, 979 (2004).
43
Heinzen E. L., et al., Rare deletions at 16p13.11 predispose to a diverse spectrum of sporadic epilepsy syndromes. Am. J. Hum. Genet. 86, 707 (2010).
44
Kasperavičiūtė D., et al., Common genetic variation and susceptibility to partial epilepsies: a genome-wide association study. Brain 133, 2136 (2010).
45
Li H., et al., Candidate single-nucleotide polymorphisms from a genomewide association study of Alzheimer disease. Arch. Neurol. 65, 45 (2008).
46
Muglia P., et al., Genome-wide association study of recurrent major depressive disorder in two European case-control cohorts. Mol. Psychiatry 15, 589 (2010).
47
Francks C., et al., Population-based linkage analysis of schizophrenia and bipolar case-control cohorts identifies a potential susceptibility locus on 19q13. Mol. Psychiatry 15, 319 (2010).
48
Vestbo J., et al., ECLIPSE investigators, Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-points (ECLIPSE). Eur. Respir. J. 31, 869 (2008).
49
Pillai S. G., et al., ICGN Investigators, A genome-wide association study in chronic obstructive pulmonary disease (COPD): Identification of two major susceptibility loci. PLoS Genet. 5, e1000421 (2009).
50
Ashburn T. T., Thor K. B., Drug repositioning: Identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 3, 673 (2004).
51
Lussier Y. A., Chen J. L., The emergence of genome-based drug repositioning. Sci. Transl. Med. 3, 96ps35 (2011).
52
Harrow J., et al., GENCODE: Producing a reference annotation for ENCODE. Genome Biol. 7, (Suppl 1), S4, 1 (2006).
53
Ashburner M., et al., The Gene Ontology Consortium, Gene ontology: Tool for the unification of biology. Nat. Genet. 25, 25 (2000).
54
Li R., Li Y., Kristiansen K., Wang J., SOAP: Short oligonucleotide alignment program. Bioinformatics 24, 713 (2008).
55
Li R., et al., SNP detection for massively parallel whole-genome resequencing. Genome Res. 19, 1124 (2009).
56
Heid I. M., et al., Estimating the single nucleotide polymorphism genotype misclassification from routine double measurements in a large epidemiologic sample. Am. J. Epidemiol. 168, 878 (2008).
57
Saunders I. W., Brohede J., Hannan G. N., Estimating genotyping error rates from Mendelian errors in SNP array genotypes and their impact on inference. Genomics 90, 291 (2007).
58
Ihaka R., Gentleman R., R: A language for data analysis and graphics. J. Comput. Graph. Statist. 5, 299 (1996).
59
Ramensky V., Bork P., Sunyaev S., Human non-synonymous SNPs: Server and survey. Nucleic Acids Res. 30, 3894 (2002).
60
Ng P. C., Henikoff S., Predicting deleterious amino acid substitutions. Genome Res. 11, 863 (2001).
61
Stabenau A., et al., The Ensembl core software libraries. Genome Res. 14, 929 (2004).
62
Siepel A., et al., Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034 (2005).
63
Blekhman R., et al., Natural selection on genes that underlie human disease susceptibility. Curr. Biol. 18, 883 (2008).
64
MacArthur D. G., Tyler-Smith C., Loss-of-function variants in the genomes of healthy humans. Hum. Mol. Genet. 19, (R2), R125 (2010).
65
Szpiech Z. A., Jakobsson M., Rosenberg N. A., ADZE: A rarefaction approach for counting alleles private to combinations of populations. Bioinformatics 24, 2498 (2008).
66
Watterson G. A., On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256 (1975).
67
Ebersberger I., Metzler D., Schwarz C., Pääbo S., Genomewide comparison of DNA sequences between humans and chimpanzees. Am. J. Hum. Genet. 70, 1490 (2002).
68
Nachman M. W., Crowell S. L., Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297 (2000).
69
Schaffner S. F., et al., Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15, 1576 (2005).
70
Excoffier L., Foll M., fastsimcoal: A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27, 1332 (2011).
71
Coffey A. J., et al., The GENCODE exome: Sequencing the complete human exome. Eur. J. Hum. Genet. 19, 827 (2011).
72
Nelson M. R., et al., The Population Reference Sample, POPRES: A resource for population, disease, and pharmacological genetics research. Am. J. Hum. Genet. 83, 347 (2008).
73
Bacanu S. A., Whittaker J. C., Nelson M. R., How informative is a negative finding in a small pharmacogenetic study? Pharmacogenomics J. 12, 93 (2012).
74
Hunter S., et al., InterPro: The integrative protein signature database. Nucleic Acids Res. 37, (Database issue), D211 (2009).
75
Blake J. A., Bult C. J., Kadin J. A., Richardson J. E., Eppig J. T., Mouse Genome Database Group, The Mouse Genome Database (MGD): Premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 39, (Database issue), D842 (2011).
76
Wang J. H., et al., Australian and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene), Modeling the cumulative genetic risk for multiple sclerosis from genome-wide association data. Genome Med. 3, 3 (2011).
77
Wang K. S., Liu X. F., Aragam N., A genome-wide meta-analysis identifies novel loci associated with schizophrenia and bipolar disorder. Schizophr. Res. 124, 192 (2010).
78
Cho M. H., et al., Variants in FAM13A are associated with chronic obstructive pulmonary disease. Nat. Genet. 42, 200 (2010).
79
Lettre G., et al., Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: The NHLBI CARe Project. PLoS Genet. 7, e1001300 (2011).
80
Terracciano A., et al., Genome-wide association scan of trait depression. Biol. Psychiatry 68, 811 (2010).
81
De Jager P. L., et al., International MS Genetics Consortium, Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat. Genet. 41, 776 (2009).
82
Hafler D. A., et al., International Multiple Sclerosis Genetics Consortium, Risk alleles for multiple sclerosis identified by a genomewide study. N. Engl. J. Med. 357, 851 (2007).
83
Bahlo M., et al., Australia and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene), Genome-wide association study identifies new multiple sclerosis susceptibility loci on chromosomes 12 and 20. Nat. Genet. 41, 824 (2009).
84
Sullivan P. F., et al., Genomewide association for schizophrenia in the CATIE study: Results of stage 1. Mol. Psychiatry 13, 570 (2008).
85
Aouizerat B. E., et al., GWAS for discovery and replication of genetic loci associated with sudden cardiac arrest in patients with coronary artery disease. BMC Cardiovasc. Disord. 11, 29 (2011).
Information & Authors
Information
Published In

Science
Volume 337 | Issue 6090
6 July 2012
Copyright
Copyright © 2012, American Association for the Advancement of Science.
Submission history
Received: 14 December 2011
Accepted: 3 May 2012
Published in print: 6 July 2012
Acknowledgments
We thank GSK colleagues who advised on the selection of genes and collections, especially W. Anderson, L. Condreay, P. Agarwal, A. Hughes, J. Rubio, C. Spraggs, and D. Waterworth; the sample preparation team, especially J. Charnecki, M. E. Volk, D. Duran, D. Briley, and K. King for data preparation; A. Slater for subject selection and preparation of genome-wide genotype data; E. Woldu for capillary sequencing; A. Nelsen, S. Buhta-Halburnt, L. Amos, and J. Forte for consent review; M. Lawson for assistance in running the association analyses; J. Brown for discussions about gene feature analyses; S. Ghosh for providing reviews of the manuscript; and G. Tian, H. Jiang, Z. Su, X. Sun, L. Yang, and X. Zhang at BGI for sequencing. We acknowledge the work of collaborating clinicians and researchers who contributed to recruiting and characterizing subjects (13). J.N. and D.W. were supported by a Searle Scholars Program award to J.N. D.K. is supported by an NIH Genome Analysis training grant. All variants described in this study have been submitted to dbSNP; accession nos. are included in database S2. Subject-level sequence data for the CoLaus and LOLIPOP studies are available in dbGaP. Additional subject-level sequence data can be made available upon request from the authors under a Data Transfer Agreement for the purpose of understanding, assessing, or extending the conclusions of this paper.