Sequence and epigenetic landscapes of active and silent nucleolus organizer regions in Arabidopsis

Arabidopsis thaliana has two ribosomal RNA (rRNA) gene loci, nucleolus organizer regions NOR2 and NOR4, whose complete sequences are missing in current genome assemblies. Ultralong DNA sequences assembled using an unconventional approach yielded ~5.5- and 3.9-Mbp sequences for NOR2 and NOR4 in the reference strain, Col-0. The distinct rRNA gene subtype compositions of the NORs enabled the positional mapping of their active and inactive regions, using RNA sequencing to identify subtype-specific transcripts and DNA sequencing to identify subtypes associated with flow-sorted nucleoli. Comparisons of wild-type and silencing-defective plants revealed that most rRNA gene activity occurs in the central region of NOR4, whereas most, but not all, genes of NOR2 are epigenetically silenced. Intervals of low CG and CHG methylation overlap regions where gene activity and gene subtype homogenization are high. Collectively, the data reveal the genetic and epigenetic landscapes underlying nucleolar dominance (differential NOR activity) and implicate transcription as a driver of rRNA gene concerted evolution.

The PDF file includes:
Figs. S1 to S8
Tables S1 and S2
Legends for data S1 to S10

Other Supplementary Material for this manuscript includes the following:
Data S1 to S10

Fig. S1. Variation identified by PacBio sequencing of rRNA gene repeat units. Genomic DNA was digested with the rDNA-specific endonuclease I-PpoI, and the resulting ~10 kb fragments were gel-purified and deep-sequenced. Individual reads were then aligned to an rRNA gene reference sequence (see file S1). Sequences identical to the reference are shown in gray. Differences relative to the reference are shown in black or other colors. Intervals corresponding to the VLEs used to define rRNA gene subtypes are shown at the bottom of the figure, as in Figure 1B.
Fig. S2A. Gene subtypes defined by their VLE compositions. The 59 VLEs summarized in Figure 1B occur in 74 different combinations among rRNA genes, thus defining 74 distinct rRNA gene subtypes. The most prevalent VLE is VARA, occurring in 64 gene subtypes and corresponding to the presence of two tandem C repeats within the 5' ETS. VLEs corresponding to different 3' ETS sequences provide the next largest groupings, with 27 gene subtypes having VAR1-class VLEs (with point mutations allowing 1.1 or 1.2 sub-classifications), 24 having VAR3.1 or 3.2 VLEs, and 19 having VAR2.1 or 2.2 VLEs. The least frequent subtypes carry VAR4, VAR5, and VAR6 VLEs. Subtype-specific VLEs occur primarily in the IGS, where there can be as few as one, or as many as five, clusters of Sal repeats (each repeat being 20-21 nt long) that vary in number. Sal repeat cluster lengths, in bp, are given in the gene subtype names. Likewise, variable numbers of spacer promoters (SPs) occur when there are two or more Sal repeat clusters. In total, there are 42 different Sal cluster-SP arrangements, 27 of which are unique to single rRNA gene subtypes. The remaining 15 IGS arrangements occur in two or more subtypes, an example being 294-SP-1272, having Sal repeat clusters of 294 bp and 1272 bp separated by a single spacer promoter. This IGS arrangement is present in six subtypes that collectively represent the three major 3' ETS classes, VAR1, VAR2, and VAR3.
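The IGS arrangement naming described in this legend can be illustrated with a short Python sketch. It assumes that an arrangement name is a hyphen-separated string in which integers are Sal repeat cluster lengths (bp) and each `SP` token is one spacer promoter, as in the example 294-SP-1272. The function name `parse_igs_arrangement` is hypothetical and is not part of the authors' analysis pipeline.

```python
from typing import List, Tuple


def parse_igs_arrangement(name: str) -> Tuple[List[int], int]:
    """Split an IGS arrangement name into Sal cluster lengths and an SP count.

    Assumption: tokens are separated by '-', integer tokens are Sal repeat
    cluster lengths in bp, and every 'SP' token is one spacer promoter.
    """
    tokens = name.split("-")
    sal_cluster_lengths = [int(t) for t in tokens if t != "SP"]
    spacer_promoter_count = tokens.count("SP")
    return sal_cluster_lengths, spacer_promoter_count


# The example arrangement given in the legend.
sal_lengths, sp_count = parse_igs_arrangement("294-SP-1272")
print(sal_lengths, sp_count)
```

For the legend's example, the sketch yields two Sal repeat clusters (294 bp and 1272 bp) and one spacer promoter.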

Fig. S4. Physical mapping test of the NOR assemblies using custom guide RNAs that direct Cas9 cleavage of rRNA gene 3' ETS regions. (A) The diagram shows the 3' ETS variable region and the positions of PCR primers that flank the region. PCR amplification of genomic DNA using these primers yields products of different lengths, corresponding to the abundant genes bearing the VAR1, VAR2 and VAR3 VLEs and the less abundant genes bearing the VAR4 VLE. The stained agarose gel shows the amplification products obtained using either uncut genomic DNA (left lanes) or DNA that had been incubated with Cas9 and three different sgRNAs prior to PCR. Two of the sgRNAs guide the digestion of genes of two different VLE classes that share the same sgRNA target sequence (either VAR1 + VAR2 or VAR3 + VAR4), and the third sgRNA targets three VLE classes (VAR1 + VAR3 + VAR4). Note that the targeted VLE classes are depleted among the PCR amplification products, demonstrating the specificity and efficacy of the sgRNA-Cas9 complexes. (B) In silico prediction of large sgRNA-Cas9 digestion fragments of NOR2 and NOR4 based on the sgRNA specificities demonstrated in panel A. The sizes of the expected fragments are shown, with color-coding showing the regions of the NORs giving rise to the fragments. (C) sgRNA-Cas9 digestion products visualized by CHEF gel electrophoresis and Southern blotting with a 25S rRNA probe. For this experiment, ultra-high molecular weight gDNA was embedded in agarose plugs and subjected to Cas9 digestion programmed by individual sgRNAs, as in panel A, or a mix of all three sgRNAs. I-PpoI and no-digestion controls are included in the first and last lanes. The DNA fragments were resolved by CHEF electrophoresis and visualized by Southern blotting and hybridization to the 25S rDNA probe. Predicted large fragments (see panel B) were observed.
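The logic of the in silico prediction in panel B can be sketched in Python as follows. This is a simplified illustration, not the authors' procedure: it assumes every repeat unit is exactly 10 kb (the legends give only an approximate ~10 kb unit length), that Cas9 cuts once at the 3' end of each targeted gene, and that the gene order is a made-up toy array. The labels `sgRNA_1` to `sgRNA_3` and the function `predict_fragments` are hypothetical names.

```python
from typing import Dict, List, Set

# sgRNA specificities as described in panel A; the sgRNA labels are hypothetical.
SGRNA_TARGETS: Dict[str, Set[str]] = {
    "sgRNA_1": {"VAR1", "VAR2"},
    "sgRNA_2": {"VAR3", "VAR4"},
    "sgRNA_3": {"VAR1", "VAR3", "VAR4"},
}


def predict_fragments(gene_classes: List[str], targets: Set[str],
                      unit_bp: int = 10_000) -> List[int]:
    """Return fragment lengths (bp) for a tandem array cut at targeted genes.

    Assumptions: all repeat units have the same length (unit_bp), and each
    targeted gene is cut once, at the 3' end of its repeat unit.
    """
    fragments: List[int] = []
    current = 0
    for vle_class in gene_classes:
        current += unit_bp
        if vle_class in targets:
            fragments.append(current)  # a cut closes the current fragment
            current = 0
    if current:
        fragments.append(current)  # uncut remainder at the end of the array
    return fragments


# Toy gene order along an array (illustrative only, not the real NOR layout).
toy_array = ["VAR1", "VAR1", "VAR3", "VAR3", "VAR3", "VAR1"]
fragments = predict_fragments(toy_array, SGRNA_TARGETS["sgRNA_2"])
print(fragments)
```

In this toy case, a run of untargeted VAR1 genes survives as one larger fragment while the targeted VAR3 genes are reduced to single-unit pieces. The same reasoning explains why only long stretches lacking the targeted VLE classes appear as large bands on the CHEF gel.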

Fig. S5. VLEs that are common to both NORs. The positions of six VLEs present within genes of both NOR2 and NOR4 are indicated by colored horizontal lines. Tel and Cen indicate the telomere-proximal and centromere-proximal ends of the NORs. TAIR10 indicates where the current sequences of chromosomes 2 and 4 begin in the TAIR10 genome assembly.

Table S1. ONT sequencing run statistics.

Table S2. SNPs / small indels detected by Illumina sequencing.
Data S1. (separate file) 45S rRNA gene consensus reference sequence (fasta file) used for VLE analyses and sequence for subtype #10, used for dot-plot analyses of ONT reads.
Data S2. (separate file) NOR assembly landmarks, observed vs. predicted coverage based on sequencing depth.
Data S3. (separate file) ONT read alignments based on VLEs for NOR2 telomere-proximal end.
Data S4. (separate file) ONT read alignments based on VLEs for NOR2 centromere-proximal end.
Data S5. (separate file) ONT read alignments based on VLEs for NOR4 telomere-proximal end.
Data S6. (separate file) ONT read alignments based on VLEs for NOR4 central region.
Data S7. (separate file) ONT read alignments based on VLEs for NOR4 centromere-proximal end.