Long-read sequence assembly of the gorilla genome
Improving on the gorilla genome
Access to complete, high-quality genomes of nonhuman primates will also help us understand human biology. Gordon et al. used long-read sequencing technology to improve genome data on our close relative the gorilla. Sequencing from a single individual decreased assembly fragmentation and recovered previously missed genes and noncoding loci. Mapping short-read sequences from additional gorillas helped reconstruct a “pan” gorilla sequence documenting genetic variation. Comparison with human genomes revealed species-specific differences ranging in size from one to thousands of bases, including some that are likely to affect gene regulation.
Science, this issue p. 10.1126/science.aae0344
Structured Abstract
INTRODUCTION
The accurate sequence and assembly of genomes is critical to our understanding of evolution and genetic variation. Despite advances in short-read sequencing technology that have decreased cost and increased throughput, whole-genome assembly of mammalian genomes remains problematic because of the presence of repetitive DNA.
RATIONALE
The goal of this study was to sequence and assemble the genome of the western lowland gorilla by using primarily single-molecule, real-time (SMRT) sequencing technology and a novel assembly algorithm that takes advantage of long (>10 kbp) sequence reads. We specifically compare the properties of this assembly to gorilla genome assemblies that were generated by using more routine short sequence read approaches in order to determine the value and biological impact of a long-read genome assembly.
RESULTS
We generated 74.8-fold SMRT whole-genome shotgun sequence from peripheral blood DNA isolated from a western lowland gorilla (Gorilla gorilla gorilla) named Susie. We applied a string graph assembly algorithm, Falcon, and a consensus algorithm, Quiver, to generate a 3.1-Gbp assembly with a contig N50 of 9.6 Mbp. Short-read sequence data from an additional six gorilla genomes were mapped to reduce indel errors and improve the accuracy of the final assembly. We estimate that 98.9% of the gorilla euchromatin has been assembled into 1854 sequence contigs. The assembly represents an improvement in contiguity: >800-fold with respect to the published gorilla genome assembly and >180-fold with respect to a more recently released upgrade of the gorilla assembly. Most of the sequence gaps are now closed, considerably increasing the yield of complete gene models. We estimate that 87% of the missing exons and 94% of the incomplete genes are recovered. We find that the sequence of most full-length common repeats is resolved, with the most significant gains occurring for the longest and most G+C–rich retrotransposons. Although complex regions such as the major histocompatibility locus are accurately sequenced and assembled, both heterochromatin and large, high-identity segmental duplications are not, because read lengths are too short to traverse these repetitive structures. The long-read assembly produces a much finer map of structural variation down to 50 bp in length, facilitating the discovery of thousands of lineage-specific structural variant differences that have occurred since divergence from the human and chimpanzee lineages. This includes the disruption of specific genes and loss of predicted regulatory regions between the two species. We show that use of the new gorilla genome assembly changes estimates of divergence and diversity, resulting in subtle but substantial effects on previous population genetic inferences, such as the timing of species bottlenecks and changes in the effective population size over the course of evolution.
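The contig N50 quoted in the results is a standard contiguity statistic: the length of the contig at which, going from the longest contig downward, the running total first reaches half of the assembled bases. The sketch below shows one common way to compute it. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half = sum(ordered) / 2                  # half of the total assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Toy example (hypothetical lengths, not from the gorilla assembly):
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted contig lengths, it rewards a few very long contigs and is not by itself a measure of base-level accuracy.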
CONCLUSION
The genome assembly that results from using the long-read data provides a more complete picture of gene content, structural variation, and repeat biology, improving population genetic and evolutionary inferences. Long-read sequencing technology now makes it practical for individual laboratories to generate high-quality reference genomes for complex mammalian genomes.

Long-read sequence assembly of the gorilla genome.
(A) Susie, a female western lowland gorilla, was used as the reference sample for full-genome sequencing and assembly [photograph courtesy of Max Block]. (B and C) Treemaps representing the differences in fragmentation of the long-read and short-read gorilla genome assemblies. The rectangles are the largest contigs that cumulatively make up 300 Mbp (~10%) of the assembly.
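The selection described in the caption, taking the largest contigs until they cumulatively reach a fixed number of bases, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `largest_contigs_covering` and the toy lengths are hypothetical, and the plotting step that would turn the result into a treemap is omitted.

```python
def largest_contigs_covering(lengths, target_bp):
    """Return the longest contigs whose cumulative length first reaches target_bp.

    Contigs are taken in descending order of length; selection stops as
    soon as the running total is at least target_bp.
    """
    chosen = []
    covered = 0
    for length in sorted(lengths, reverse=True):
        if covered >= target_bp:
            break
        chosen.append(length)
        covered += length
    return chosen


# Toy example (hypothetical lengths); for the figure the target would be 300 Mbp.
print(largest_contigs_covering([5, 1, 9, 3, 7], 15))
```

A more contiguous assembly reaches the same target with fewer, larger rectangles, which is the contrast the two treemaps are meant to show.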
Abstract
Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.
Supplementary Material
Summary
Supplementary Text
Figs. S1 to S63
Tables S1 to S36
Published In

Science
Volume 352 | Issue 6281
1 April 2016
Copyright
Copyright © 2016, American Association for the Advancement of Science.
Submission history
Received: 7 December 2015
Accepted: 26 February 2016
Published in print: 1 April 2016
Acknowledgments
We are grateful to A. Scally and Z. Ning for early access to the upgraded Kamilah gorilla assembly (gorGor4) and for discussion regarding its assembly. We thank M. Duyzend, L. Harshman, and C. Lee for technical assistance and quality control in generating sequencing data and H. Li for helpful suggestions for the PSMC analysis. The authors thank M. Heget, K. Gillespie, and M. Shender from the Lincoln Park Zoo for providing gorilla peripheral blood and T. Brown for assistance in editing this manuscript. This work was supported, in part, by grants from the U.S. National Institutes of Health (NIH grant HG002385 to E.E.E. and HG007635 to R.K.W. and E.E.E.; HG003079 to R.K.W.; HG007990 to D.H. and B.P.; and HG007234 to B.P.). E.E.E., J.S., and D.H. are investigators of the Howard Hughes Medical Institute. E.E.E. is on the scientific advisory board (SAB) of DNAnexus and was a SAB member of Pacific Biosciences (2009–2013); E.E.E. is a consultant for Kunming University of Science and Technology (KUST) as part of the 1000 China Talent Program. M.J.P.C. is a former employee of Pacific Biosciences (2009–2012) and owns shares in the company. On 24 February 2011, Pacific Biosciences filed a patent entitled “Sequence assembly and consensus sequence determination” (U.S. patent no. US20120330566, issued 27 December 2012); M.J.P.C. is identified as inventor of this patent. Pacific Biosciences has filed two patents related to the Falcon assembler algorithm entitled “String graph assembly for polyploid genomes” (U.S. patent no. US2015/0169823 A1 filed 18 December 2014, and U.S. patent no. US2015/0286775 A1 filed 18 June 2015); C.C. is identified as inventor for both patents. The Susie3 assembly, PacBio and Illumina sequencing data for Susie, and clone sequences have been deposited in the European Nucleotide Archive under the project accession PRJEB10880. E.E.E., D.G., J.H., M.J.P.C., C.M.H., and Z.N.K. designed experiments; K.M.M., M.M., and C.B. prepared libraries and generated sequencing data; D.G., J.H., M.J.P.C., C.M.H., Z.N.K., L.W.H., and A.R. performed bioinformatics analyses; I.F., J.A., M.D., B.P., R.K.W., and D.H. analyzed gene accuracy. J.S. helped in the evaluation of Hi-C data. C.D. and C.-S.C. aided in Falcon assembler modifications. J.H. deposited SMRT sequencing data into SRA. E.E.E., D.G., J.H., M.J.P.C., C.M.H., and Z.N.K. wrote the manuscript.