A Deep Look Into Our Genes

Recent debates have focused on the degree of genetic variation in humans and its impact on health at the genomic level (see the Perspective by Casals and Bertranpetit). Tennessen et al. (p. 64, published online 17 May), looking at all of the protein-coding genes in the human genome, and Nelson et al. (p. 100, published online 17 May), looking at genes that encode drug targets, address this question through deep sequencing of samples from multiple individuals. The findings suggest that most human variation is rare and not shared between populations, and that rare variants are likely to play a role in human health.

Abstract

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.

Supplementary Material

Summary

Materials and Methods
Supplementary Text
Figs. S1 to S15
Tables S1 to S17
References (30 to 85)
Databases S1 to S3

Resources

File (nelson.sm.pdf)
File (nelson_science_data_table_s1.zip)
File (nelson_science_data_table_s2.zip)
File (nelson_science_data_table_s3.zip)

References and Notes

1
Pritchard J. K., Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124 (2001).
2
Kryukov G. V., Pennacchio L. A., Sunyaev S. R., Most rare missense alleles are deleterious in humans: Implications for complex disease and association studies. Am. J. Hum. Genet. 80, 727 (2007).
3
Marth G. T., et al.; 1000 Genomes Project, The functional spectrum of low-frequency coding variation. Genome Biol. 12, R84 (2011).
4
Manolio T. A., et al., Finding the missing heritability of complex diseases. Nature 461, 747 (2009).
5
Eichler E. E., et al., Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet. 11, 446 (2010).
6
Asimit J., Zeggini E., Rare variant association analysis methods for complex traits. Annu. Rev. Genet. 44, 293 (2010).
7
Gravel S., et al.; 1000 Genomes Project, Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. U.S.A. 108, 11983 (2011).
8
Coventry A., et al., Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat. Commun. 1, 131 (2010).
9
Ohta T., Slightly deleterious mutant substitutions in evolution. Nature 246, 96 (1973).
10
Williamson S. H., et al., Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. U.S.A. 102, 7882 (2005).
11
Muller H. J., Our load of mutations. Am. J. Hum. Genet. 2, 111 (1950).
12
Russ A. P., Lampel S., The druggable genome: An update. Drug Discov. Today 10, 1607 (2005).
13
Materials and methods are available as supplementary materials on Science Online.
14
M. A. Jobling, M. Hurles, C. Tyler-Smith, Human Evolutionary Genetics: Origins, Peoples and Disease (Garland Science, 2003).
15
M. Livi-Bacci, A Concise History of World Population (Wiley-Blackwell, ed. 2, 2007), pp. 1 to 250.
16
Wakeley J., Takahashi T., Gene genealogies when the sample size exceeds the effective size of the population. Mol. Biol. Evol. 20, 208 (2003).
17
Awadalla P., et al., Direct measure of the de novo mutation rate in autism and schizophrenia cohorts. Am. J. Hum. Genet. 87, 316 (2010).
18
Conrad D. F., et al.; 1000 Genomes Project, Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43, 712 (2011).
19
Messer P. W., Measuring the rates of spontaneous mutation from deep and large-scale polymorphism data. Genetics 182, 1219 (2009).
20
Price A. L., et al., Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832 (2010).
21
Kotowski I. K., et al., A spectrum of PCSK9 alleles contributes to plasma levels of low-density lipoprotein cholesterol. Am. J. Hum. Genet. 78, 410 (2006).
22
Hindorff L. A., et al., Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. U.S.A. 106, 9362 (2009).
23
Salmela E., et al., Genome-wide analysis of single nucleotide polymorphisms uncovers population structure in Northern Europe. PLoS ONE 3, e3519 (2008).
24
Bustamante C. D., Burchard E. G., De la Vega F. M., Genomics for the world. Nature 475, 163 (2011).
25
Lao O., et al., Correlation between genetic and geographic structure in Europe. Curr. Biol. 18, 1241 (2008).
26
Lander E. S., Schork N. J., Genetic dissection of complex traits. Science 265, 2037 (1994).
27
Akey J. M., et al., Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2, e286 (2004).
29
Ahituv N., et al., Medical sequencing at the extremes of human body mass. Am. J. Hum. Genet. 80, 779 (2007).
30
Durbin R. M., et al., A map of human genome variation from population-scale sequencing. Nature 467, 1061 (2010).
31
Firmann M., et al., The CoLaus study: A population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc. Disord. 8, 6 (2008).
32
Preisig M., et al., The PsyCoLaus study: Methodology and characteristics of the sample of a population-based survey on psychiatric disorders and their association with genetic and cardiovascular risk factors. BMC Psychiatry 9, 9 (2009).
33
Kooner J. S., et al., Genome-wide scan identifies variation in MLXIPL associated with plasma triglycerides. Nat. Genet. 40, 149 (2008).
34
Ling H., et al., Genome-wide linkage and association analyses to identify genes influencing adiponectin levels: the GEMS Study. Obesity (Silver Spring) 17, 737 (2009).
35
Wyszynski D. F., et al., Relation between atherogenic dyslipidemia and the Adult Treatment Program-III definition of metabolic syndrome (Genetic Epidemiology of Metabolic Syndrome Project). Am. J. Cardiol. 95, 194 (2005).
36
Assimes T. L., et al.; Myocardial Infarction Genetics Consortium; Wellcome Trust Case Control Consortium; Cardiogenics, Lack of association between the Trp719Arg polymorphism in kinesin-like protein-6 and coronary artery disease in 19 case-control studies. J. Am. Coll. Cardiol. 56, 1552 (2010).
37
Kraus V. B., et al., The Genetics of Generalized Osteoarthritis (GOGO) study: Study design and evaluation of osteoarthritis phenotypes. Osteoarthritis Cartilage 15, 120 (2007).
38
Vignal C., et al., Genetic association of the major histocompatibility complex with rheumatoid arthritis implicates two non-DRB1 loci. Arthritis Rheum. 60, 53 (2009).
39
Baranzini S. E., et al., Genome-wide association analysis of susceptibility and clinical phenotype in multiple sclerosis. Hum. Mol. Genet. 18, 767 (2009).
40
Oksenberg J. R., et al., Mapping multiple sclerosis susceptibility to the HLA-DR locus in African Americans. Am. J. Hum. Genet. 74, 160 (2004).
41
Cree B. A., et al., Modification of multiple sclerosis phenotypes by African ancestry at HLA. Arch. Neurol. 66, 226 (2009).
42
Patterson N., et al., Methods for high-density admixture mapping of disease genes. Am. J. Hum. Genet. 74, 979 (2004).
43
Heinzen E. L., et al., Rare deletions at 16p13.11 predispose to a diverse spectrum of sporadic epilepsy syndromes. Am. J. Hum. Genet. 86, 707 (2010).
44
Kasperaviciūte D., et al., Common genetic variation and susceptibility to partial epilepsies: a genome-wide association study. Brain 133, 2136 (2010).
45
Li H., et al., Candidate single-nucleotide polymorphisms from a genomewide association study of Alzheimer disease. Arch. Neurol. 65, 45 (2008).
46
Muglia P., et al., Genome-wide association study of recurrent major depressive disorder in two European case-control cohorts. Mol. Psychiatry 15, 589 (2010).
47
Francks C., et al., Population-based linkage analysis of schizophrenia and bipolar case-control cohorts identifies a potential susceptibility locus on 19q13. Mol. Psychiatry 15, 319 (2010).
48
Vestbo J., et al.; ECLIPSE investigators, Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-points (ECLIPSE). Eur. Respir. J. 31, 869 (2008).
49
Pillai S. G., et al.; ICGN Investigators, A genome-wide association study in chronic obstructive pulmonary disease (COPD): Identification of two major susceptibility loci. PLoS Genet. 5, e1000421 (2009).
50
Ashburn T. T., Thor K. B., Drug repositioning: Identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 3, 673 (2004).
51
Lussier Y. A., Chen J. L., The emergence of genome-based drug repositioning. Sci. Transl. Med. 3, 96ps35 (2011).
52
Harrow J., et al., GENCODE: Producing a reference annotation for ENCODE. Genome Biol. 7, (Suppl 1), S4, 1 (2006).
53
Ashburner M., et al.; The Gene Ontology Consortium, Gene ontology: Tool for the unification of biology. Nat. Genet. 25, 25 (2000).
54
Li R., Li Y., Kristiansen K., Wang J., SOAP: Short oligonucleotide alignment program. Bioinformatics 24, 713 (2008).
55
Li R., et al., SNP detection for massively parallel whole-genome resequencing. Genome Res. 19, 1124 (2009).
56
Heid I. M., et al., Estimating the single nucleotide polymorphism genotype misclassification from routine double measurements in a large epidemiologic sample. Am. J. Epidemiol. 168, 878 (2008).
57
Saunders I. W., Brohede J., Hannan G. N., Estimating genotyping error rates from Mendelian errors in SNP array genotypes and their impact on inference. Genomics 90, 291 (2007).
58
Ihaka R., Gentleman R., R: A language for data analysis and graphics. J. Comput. Graph. Statist. 5, 299 (1996).
59
Ramensky V., Bork P., Sunyaev S., Human non-synonymous SNPs: Server and survey. Nucleic Acids Res. 30, 3894 (2002).
60
Ng P. C., Henikoff S., Predicting deleterious amino acid substitutions. Genome Res. 11, 863 (2001).
61
Stabenau A., et al., The Ensembl core software libraries. Genome Res. 14, 929 (2004).
62
Siepel A., et al., Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034 (2005).
63
Blekhman R., et al., Natural selection on genes that underlie human disease susceptibility. Curr. Biol. 18, 883 (2008).
64
MacArthur D. G., Tyler-Smith C., Loss-of-function variants in the genomes of healthy humans. Hum. Mol. Genet. 19, (R2), R125 (2010).
65
Szpiech Z. A., Jakobsson M., Rosenberg N. A., ADZE: A rarefaction approach for counting alleles private to combinations of populations. Bioinformatics 24, 2498 (2008).
66
Watterson G. A., On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256 (1975).
67
Ebersberger I., Metzler D., Schwarz C., Pääbo S., Genomewide comparison of DNA sequences between humans and chimpanzees. Am. J. Hum. Genet. 70, 1490 (2002).
68
Nachman M. W., Crowell S. L., Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297 (2000).
69
Schaffner S. F., et al., Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15, 1576 (2005).
70
Excoffier L., Foll M., fastsimcoal: A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27, 1332 (2011).
71
Coffey A. J., et al., The GENCODE exome: Sequencing the complete human exome. Eur. J. Hum. Genet. 19, 827 (2011).
72
Nelson M. R., et al., The Population Reference Sample, POPRES: A resource for population, disease, and pharmacological genetics research. Am. J. Hum. Genet. 83, 347 (2008).
73
Bacanu S. A., Whittaker J. C., Nelson M. R., How informative is a negative finding in a small pharmacogenetic study? Pharmacogenomics J. 12, 93 (2012).
74
Hunter S., et al., InterPro: The integrative protein signature database. Nucleic Acids Res. 37, (Database issue), D211 (2009).
75
Blake J. A., Bult C. J., Kadin J. A., Richardson J. E., Eppig J. T.; Mouse Genome Database Group, The Mouse Genome Database (MGD): Premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 39, (Database issue), D842 (2011).
76
Wang J. H., et al.; Australian and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene), Modeling the cumulative genetic risk for multiple sclerosis from genome-wide association data. Genome Med. 3, 3 (2011).
77
Wang K. S., Liu X. F., Aragam N., A genome-wide meta-analysis identifies novel loci associated with schizophrenia and bipolar disorder. Schizophr. Res. 124, 192 (2010).
78
Cho M. H., et al., Variants in FAM13A are associated with chronic obstructive pulmonary disease. Nat. Genet. 42, 200 (2010).
79
Lettre G., et al., Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: The NHLBI CARe Project. PLoS Genet. 7, e1001300 (2011).
80
Terracciano A., et al., Genome-wide association scan of trait depression. Biol. Psychiatry 68, 811 (2010).
81
De Jager P. L., et al.; International MS Genetics Consortium, Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat. Genet. 41, 776 (2009).
82
Hafler D. A., et al.; International Multiple Sclerosis Genetics Consortium, Risk alleles for multiple sclerosis identified by a genomewide study. N. Engl. J. Med. 357, 851 (2007).
83
Bahlo M., et al.; Australia and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene), Genome-wide association study identifies new multiple sclerosis susceptibility loci on chromosomes 12 and 20. Nat. Genet. 41, 824 (2009).
84
Sullivan P. F., et al., Genomewide association for schizophrenia in the CATIE study: Results of stage 1. Mol. Psychiatry 13, 570 (2008).
85
Aouizerat B. E., et al., GWAS for discovery and replication of genetic loci associated with sudden cardiac arrest in patients with coronary artery disease. BMC Cardiovasc. Disord. 11, 29 (2011).

Published In

Science
Volume 337 | Issue 6090
6 July 2012

Submission history

Received: 14 December 2011
Accepted: 3 May 2012
Published in print: 6 July 2012

Acknowledgments

We thank GSK colleagues who advised on the selection of genes and collections, especially W. Anderson, L. Condreay, P. Agarwal, A. Hughes, J. Rubio, C. Spraggs, and D. Waterworth; the sample preparation team, especially J. Charnecki, M. E. Volk, D. Duran, D. Briley, and K. King for data preparation; A. Slater for subject selection and preparation of genome-wide genotype data; E. Woldu for capillary sequencing; A. Nelsen, S. Buhta-Halburnt, L. Amos, and J. Forte for consent review; M. Lawson for assistance in running the association analyses; J. Brown for discussions about gene feature analyses; S. Ghosh for providing reviews of the manuscript; and G. Tian, H. Jiang, Z. Su, X. Sun, L. Yang, and X. Zhang at BGI for sequencing. We acknowledge the work of collaborating clinicians and researchers who contributed to recruiting and characterizing subjects (13). J.N. and D.W. were supported by a Searle Scholars Program award to J.N. D.K. is supported by a NIH Genome Analysis training grant. All variants described in this study have been submitted to dbSNP; accession nos. are included in database S2. Subject-level sequence data for CoLaus and LOLIPOP studies are available in dbGaP. Additional subject-level sequence data can be made available upon request from the authors under a Data Transfer Agreement for the purpose of understanding, assessing, or extending the conclusions of this paper.

Authors

Affiliations

Matthew R. Nelson*†
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Daniel Wegmann*
Department of Ecology and Evolutionary Biology, University of California–Los Angeles, Los Angeles, CA 90095, USA.
Margaret G. Ehm
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Darren Kessner
Department of Ecology and Evolutionary Biology, University of California–Los Angeles, Los Angeles, CA 90095, USA.
Pamela St. Jean
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Claudio Verzilli
Department of Quantitative Sciences, GSK, Stevenage SG1 2NY, UK.
Judong Shen
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Zhengzheng Tang
Department of Genetics and Biostatistics, University of North Carolina–Chapel Hill, Chapel Hill, NC 27599, USA.
Silviu-Alin Bacanu
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Dana Fraser
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Liling Warren
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Jennifer Aponte
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Matthew Zawistowski
Department of Biostatistics, University of Michigan–Ann Arbor, Ann Arbor, MI 48109, USA.
Xiao Liu
BGI, Shenzhen 518083, China.
Hao Zhang
BGI, Shenzhen 518083, China.
Yong Zhang
BGI, Shenzhen 518083, China.
Jun Li
Department of Human Genetics, University of Michigan–Ann Arbor, Ann Arbor, MI 48109, USA.
Yun Li
Department of Genetics and Biostatistics, University of North Carolina–Chapel Hill, Chapel Hill, NC 27599, USA.
Li Li
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Peter Woollard
Department of Quantitative Sciences, GSK, Stevenage SG1 2NY, UK.
Simon Topp
Department of Quantitative Sciences, GSK, Stevenage SG1 2NY, UK.
Matthew D. Hall
Department of Quantitative Sciences, GSK, Stevenage SG1 2NY, UK.
Keith Nangle
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
Jun Wang
BGI, Shenzhen 518083, China.
Department of Biology, Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen 3393 9524, Denmark.
Gonçalo Abecasis
Department of Biostatistics, University of Michigan–Ann Arbor, Ann Arbor, MI 48109, USA.
Lon R. Cardon
Department of Quantitative Sciences, GSK, Upper Merion, PA 19406, USA.
Sebastian Zöllner
Department of Biostatistics, University of Michigan–Ann Arbor, Ann Arbor, MI 48109, USA.
Department of Psychiatry, University of Michigan–Ann Arbor, Ann Arbor, MI 48109, USA.
John C. Whittaker
Department of Quantitative Sciences, GSK, Stevenage SG1 2NY, UK.
Stephanie L. Chissoe
Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, NC 27709, USA.
John Novembre†
Department of Ecology and Evolutionary Biology, University of California–Los Angeles, Los Angeles, CA 90095, USA.
Vincent Mooser
Department of Quantitative Sciences, GSK, Upper Merion, PA 19406, USA.

Notes

*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected] (M.R.N.); [email protected] (J.N.)
