The genetic and epigenetic landscape of the Arabidopsis centromeres
A closer look at centromeres
Centromeres are key for anchoring chromosomes to the mitotic spindle, but they have been difficult to sequence because they can contain many repeating DNA elements. These repeats, however, carry regularly spaced, distinctive sequence markers because of sequence heterogeneity between the mostly, but not completely, identical DNA sequence repeats. Such differences aid sequence assembly. Naish et al. used ultra-long-read DNA sequencing to establish a reference assembly that resolves all five centromeres in the small mustard plant Arabidopsis. Their view into the subtly homogenized world of centromeres reveals retrotransposons that interrupt centromere organization and repressive DNA methylation that excludes centromeres from meiotic crossover repair. Thus, Arabidopsis centromeres evolve under the opposing forces of sequence homogenization and retrotransposon disruption. —PJH
Structured Abstract
INTRODUCTION
The centromeres of eukaryotic chromosomes assemble the multiprotein kinetochore complex and thereby position attachment to the spindle microtubules, allowing chromosome segregation during cell division. The key function of the centromere is to load nucleosomes containing the CENTROMERE SPECIFIC HISTONE H3 (CENH3) histone variant [also known as centromere protein A (CENPA)], which directs kinetochore formation. Despite their conserved function during chromosome segregation, centromeres show radically diverse organization between species at the sequence level, ranging from single nucleosomes to megabase-scale satellite repeat arrays, which is termed the centromere paradox. Centromeric satellite repeats are variable in sequence composition and length when compared between species and show a high capacity for evolutionary change, both at the levels of primary sequence and array position along the chromosome. However, the genetic and epigenetic features that contribute to centromere function and evolution are incompletely understood, in part because of the challenges of centromere sequence assembly and functional genomics of highly repetitive sequences. New long-read DNA sequencing technologies can now resolve these complex repeat arrays, revealing insights into centromere architecture and chromatin organization.
RATIONALE
Arabidopsis thaliana is a model plant species; its genome was first sequenced in 2000, yet the centromeres, telomeres, and ribosomal DNA repeats have remained unassembled, owing to their high repetition and similarity. Genomic repeats are difficult to assemble from fragmented sequencing reads, with longer, high-identity repeats being the most challenging to correctly assemble. As sequencing reads have become longer and more accurate, eukaryotic de novo genome assemblies have captured an increasingly complete picture of the repetitive component of the genome, including the centromeres. For example, Oxford Nanopore Technologies (ONT) reads have become longer and more accurate, now reaching >100 kilo–base pairs (kbp) in length with 95 to 99% modal accuracy. PacBio high-fidelity (HiFi) reads, although shorter (~15 kbp), are >99% accurate. Using ONT and HiFi reads, it is possible to bridge across interspersed unique marker sequences and accurately assemble centromere sequences. In this study, we used long-read DNA sequencing to generate a genome assembly of the A. thaliana accession Columbia (Col-CEN) that resolves all five centromeres. We use the Col-CEN assembly to derive insights into the chromatin and recombination landscapes within the Arabidopsis centromeres and how these regions evolve.
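The trade-off between read length and accuracy described above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not part of the study's methods: it converts a per-base accuracy to a Phred-scaled quality value and estimates the expected number of erroneous bases per read. The function names are hypothetical, and the lengths and accuracies are the round figures quoted in the text (100 kbp and 15 kbp; 99% is used as the upper end of the ONT range and as a lower bound for HiFi).

```python
import math


def phred(accuracy):
    """Phred-scaled quality for a per-base accuracy strictly between 0 and 1."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(read_length_bp, accuracy):
    """Expected number of erroneous bases in a read of the given length."""
    return read_length_bp * (1 - accuracy)


# Figures taken from the text; treating them as exact is an assumption.
ont_q = phred(0.99)                           # about Q20
ont_errors = expected_errors(100_000, 0.99)   # about 1000 bases per 100-kbp read
hifi_errors = expected_errors(15_000, 0.99)   # about 150 bases, an upper bound for >99% accuracy
```

Under these assumptions a single ultra-long read spans far more sequence but carries more errors in absolute terms, which is one way to see why the two read types are complementary.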
RESULTS
The Col-CEN assembly reveals that the Arabidopsis centromeres consist of megabase-scale tandemly repeated satellite arrays, which support high CENH3 (the centromere-specific histone variant that recruits kinetochores) occupancy and are densely DNA methylated. We show patterns of higher-order repetition within centromeres and that many satellite variants are private to each chromosome, which has implications for the recombination pathways acting in the centromeres. CENH3 preferentially occupies the satellites with the least amount of divergence and that show higher-order repetition. The Arabidopsis centromeres are mainly composed of satellite repeats that are ~178 bp in length, termed the CEN180 satellites. Arabidopsis centromeres have also been invaded by ATHILA long terminal repeat–class retrotransposons, which disrupt the genetic and epigenetic organization of the centromeres. Using chromatin immunoprecipitation sequencing (ChIP-seq) and immunofluorescence, we demonstrate that the centromeres show a hybrid chromatin state that is distinct from euchromatin and heterochromatin. We show that crossover recombination is suppressed within the centromeres, yet low levels of meiotic double-strand breaks occur, which are regulated by DNA methylation. Together, our Col-CEN assembly reveals the genetic and epigenetic landscapes within the Arabidopsis centromeres.
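The notion of satellite "divergence" used above can be illustrated with a minimal sketch. This is not the authors' annotation pipeline: it assumes the satellite monomers have already been extracted and aligned to equal length, builds a majority-rule consensus, and counts mismatches of each monomer against that consensus. The function names and the toy 8-bp sequences are hypothetical; real CEN180 monomers are ~178 bp and would require a proper multiple alignment first.

```python
from collections import Counter


def consensus(monomers):
    """Majority-rule consensus of equal-length, pre-aligned monomers."""
    length = len(monomers[0])
    if any(len(m) != length for m in monomers):
        raise ValueError("monomers must be pre-aligned to equal length")
    return "".join(
        Counter(m[i] for m in monomers).most_common(1)[0][0]
        for i in range(length)
    )


def divergence(monomer, cons):
    """Number of positions at which a monomer differs from the consensus."""
    return sum(a != b for a, b in zip(monomer, cons))


# Toy data standing in for aligned satellite monomers (hypothetical).
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TCGTACGT"]
cons = consensus(toy)
scores = [divergence(m, cons) for m in toy]
```

In this toy set the consensus is the most common monomer, so two repeats score zero and the two variants score one each. Ranking repeats by such a score is one simple way to ask which satellites lie closest to the consensus.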
CONCLUSION
Our Col-CEN assembly and functional genomics analysis have implications for understanding centromere sequence evolution in eukaryotes. We propose that a recombination-based homogenization process, occurring between allelic or nonallelic locations on the same chromosome, maintains the CEN180 library close to the consensus optimal for CENH3 recruitment. The advantage conferred to ATHILA retrotransposons by integration within the centromeres is presently unclear. They may be engaged in centromere drive, supporting the hypothesis that centromere satellite homogenization acts as a mechanism to purge driving elements. Each Arabidopsis centromere appears to represent different stages in cycles of satellite homogenization and ATHILA-driven diversification. These opposing forces provide both a capacity for homeostasis and a capacity for change during centromere evolution. In the future, assembly of centromeres from multiple Arabidopsis accessions and closely related species may further clarify how centromeres form and the evolutionary dynamics of CEN180 and ATHILA repeats.

Assembly of the Arabidopsis centromeres.
The structure of Arabidopsis centromere 1 is shown by fluorescence in situ hybridization (top) [upper-arm bacterial artificial chromosomes (BACs) (green), ATHILA (purple), CEN180 (blue), the telomeric repeat (green), and bottom-arm BACs (yellow)] and a long-read genome assembly (middle). The density of centromeric histone CENH3 binding measured by ChIP-seq is shown (black), alongside the frequency of CEN180 centromere satellite repeats. Red and blue represent forward- and reverse-strand satellites, respectively. The heatmap (bottom) shows patterns of sequence identity across the centromere between nonoverlapping 5-kbp windows. Chr 1, chromosome 1.
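A window-by-window identity heatmap of the kind described in this caption can be sketched as follows. This is a simplified stand-in, not the method used for the figure: instead of aligning windows, it scores each pair of nonoverlapping windows by the Jaccard similarity of their k-mer sets. The function names, the window size of 8 bp, and k = 3 are assumptions chosen so the toy example stays readable; the figure itself uses 5-kbp windows.

```python
def windows(seq, size):
    """Split a sequence into nonoverlapping windows, dropping any short tail."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def kmers(s, k):
    """Set of all k-mers occurring in a string."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def identity_matrix(seq, size, k):
    """Pairwise Jaccard similarity of k-mer sets between windows."""
    sets = [kmers(w, k) for w in windows(seq, size)]
    n = len(sets)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = sets[i] | sets[j]
            mat[i][j] = len(sets[i] & sets[j]) / len(union) if union else 0.0
    return mat


# Toy sequence: two identical repeat windows followed by an unrelated one.
toy_seq = "ACGTACGT" * 2 + "TTTTGGGG"
mat = identity_matrix(toy_seq, size=8, k=3)
```

Here the two repeat windows score 1.0 against each other and 0.0 against the unrelated window, which is the block-like pattern a satellite array interrupted by a different element would be expected to produce in such a matrix.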
Abstract
Centromeres attach chromosomes to spindle microtubules during cell division and, despite this conserved role, show paradoxically rapid evolution and are typified by complex repeats. We used long-read sequencing to generate the Col-CEN Arabidopsis thaliana genome assembly that resolves all five centromeres. The centromeres consist of megabase-scale tandemly repeated satellite arrays, which support CENTROMERE SPECIFIC HISTONE H3 (CENH3) occupancy and are densely DNA methylated, with satellite variants private to each chromosome. CENH3 preferentially occupies satellites that show the least amount of divergence and occur in higher-order repeats. The centromeres are invaded by ATHILA retrotransposons, which disrupt genetic and epigenetic organization. Centromeric crossover recombination is suppressed, yet low levels of meiotic DNA double-strand breaks occur that are regulated by DNA methylation. We propose that Arabidopsis centromeres are evolving through cycles of satellite homogenization and retrotransposon-driven diversification.
Supplementary Materials
Other Supplementary Material for this manuscript includes the following:
Table S6
MDAR Reproducibility Checklist
Published In

Science
Volume 374 | Issue 6569
12 November 2021
Copyright
Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
This is an article distributed under the terms of the Science Journals Default License.
Submission history
Received: 31 March 2021
Accepted: 27 September 2021
Published in print: 12 November 2021
Acknowledgments
This paper is dedicated to Simon Chan. We thank I. Thompson for ATHILA analysis, S. Henikoff for the generous gift of CENH3 antibodies, A. Shumate for help with gene Liftoff interpretation, B. Fischer for advice on high–molecular weight DNA isolation, and M. Pouch for assistance designing FISH probes.
Funding: This work was supported by BBSRC grants BB/S006842/1, BB/S020012/1, and BB/V003984/1 to I.R.H.; European Research Council Consolidator Award ERC-2015-CoG-681987 “SynthHotSpot” to I.R.H.; Marie Curie International Training Network “MEICOM” to I.R.H.; Human Frontier Science Program award RGP0025/2021 to T.K., M.C.S., and I.R.H.; US National Institutes of Health grant S10OD028632-01; US National Science Foundation grants DBI-1350041 and IOS-1732253 to M.C.S.; Royal Society awards UF160222 and RGF/R1/180006 to A.B.; the Czech Science Foundation grant no. 21-03909S to M.A.L.; the Gregor Mendel Institute to F.B.; grants Fonds zur Förderung der wissenschaftlichen Forschung (FWF) P26887, P28320, P30802, P32054, and TAI304 to F.B. and chromatin dynamics W1238 to A.S. and B.J.; Leverhulme Trust Research Leadership grant RL-2012-042 to J.T.; and grants from the Howard Hughes Medical Institute and US National Institutes of Health (R01GM067014) to R.A.M.
Author contributions: M.N. sequenced DNA; performed genome assembly and analysis, ChIP-seq, and DNA methylation analysis; and wrote the manuscript. M.A. performed genome assembly, polishing, validation, annotation, and analysis and wrote the manuscript. P.W. performed satellite repeat annotation and genome analysis and wrote the manuscript. A.J.T. performed short-read alignment and genome analysis and wrote the manuscript. B.W.A. sequenced DNA, performed optical mapping, and contributed to the assembly. A.S. performed chromatin immunofluorescence analysis. B.J. provided ChIP-seq data. C.L. and P.K. performed immunocytology. N.Y. generated the DMC1 epitope-tagged line. N.H. and K.C. sequenced DNA and contributed to the assembly. L.M.S., J.T., and K.S. performed PacBio sequencing. T.K. and R.A.M. provided intellectual input. T.M. and M.A.L. performed FISH. F.B. supervised ChIP-seq and immunofluorescence analysis and wrote the manuscript. A.B. performed ATHILA annotation and genome analysis and wrote the manuscript. T.P.M. supervised DNA sequencing and genome assembly and analysis and wrote the manuscript. M.C.S. supervised genome assembly, validation, annotation, and analysis and wrote the manuscript. I.R.H. supervised DNA sequencing, genome assembly, validation, annotation, and analysis and wrote the manuscript.
Competing interests: The authors have no competing interests.
Data and materials availability: The ONT sequencing reads used for assembly are available for download at ArrayExpress accession E-MTAB-10272 (www.ebi.ac.uk/arrayexpress/). The PacBio HiFi reads are available for download at European Nucleotide Archive accession number PRJEB46164 (www.ebi.ac.uk/ena/browser/view/PRJEB46164). All data, code, and materials are available in the manuscript or the supplementary materials and at https://github.com/schatzlab/Col-CEN.
Funding Information
National Science Foundation: DBI-1350041
National Science Foundation: IOS-1732253
National Institutes of Health: S10OD028632-01
H2020 European Research Council: ERC-2015-CoG-681987
Biotechnology and Biological Sciences Research Council: BB/S006842/1
Biotechnology and Biological Sciences Research Council: BB/V003984/1
Czech Science Foundation: 21-03909S
Marie Curie International Training Network: MEICOM
Howard Hughes Medical Institute and National Institutes of Health: RO1GM067014
Human Frontier Science Program: RGP0025/2021
Biotechnology and Biological Sciences Research Council: BB/S020012/1
Austrian Science Fund: P26887
Austrian Science Fund: P28320
Austrian Science Fund: P30802
Austrian Science Fund: P32054
Austrian Science Fund: TAI304
Leverhulme Trust: RL-2012-042
Royal Society: UF160222
Royal Society: RGF/R1/180006