A conserved oligomerization domain in the disordered linker of coronavirus nucleocapsid proteins

The nucleocapsid (N-)protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a key role in viral assembly and scaffolding of the viral RNA. It promotes liquid-liquid phase separation (LLPS), forming dense droplets that support the assembly of ribonucleoprotein particles with as-of-yet unknown macromolecular architecture. Combining biophysical experiments, molecular dynamics simulations, and analysis of the mutational landscape, we describe a heretofore unknown oligomerization site that contributes to LLPS, is required for the assembly of higher-order protein-nucleic acid complexes, and is coupled to large-scale conformational changes of N-protein upon nucleic acid binding. The self-association interface is located in a leucine-rich sequence of the intrinsically disordered linker between N-protein folded domains and formed by transient helices assembling into trimeric coiled-coils. Critical residues stabilizing hydrophobic and electrostatic interactions between adjacent helices are highly protected against mutations in viable SARS-CoV-2 genomes, and the oligomerization motif is conserved across related coronaviruses, thus presenting a target for antiviral therapeutics.

Oligomeric configurations of FL-N-protein predicted by ColabFold.
(A) Hexamer of six linker-CTD constructs (N:181-364) with three CTD dimer contacts and two LRS trimers. Only three chains are highlighted in blue, red, and green tones for clarity. Truncation was required due to limitations in ColabFold GPU memory and computation time. (B) Tetramer of full-length N-proteins with pairwise CTD dimer contacts and LRS tetramer contacts. Two chains in the symmetric tetramer are colored in blue and red tones, respectively. While the CTD of each chain is engaged in dimeric contacts, the LRS regions form a parallel coiled-coil. The NTD of each chain is linked by disordered regions and in a random position not in contact with any other domain.

Supporting figure S4
Oligomeric structures of the LRS peptide predicted by ColabFold. Good prediction confidence was achieved, highest for the trimer with a maximum pLDDT score of 86.8, closely followed by the dimer (84.9) and the tetramer (76.7). (A) Predicted alignment error (PAE) of the N LRS peptide trimer, depicting high-confidence predictions in blue. The PAE matrix shows high-confidence intermolecular contacts in the off-diagonal blocks. Highlighted in the structure as pseudo-bonds are contacts (residues within 3.5 Å) between helices A and B (side view) and between A and B/C (top view). The residues in contact with the highest confidence are labeled. (B) Predicted structures of the trimer, dimer, and tetramer, with highlighted side chains of residues of the blue and magenta helices that are within 3.5 Å. The oligomeric structures are close to symmetric with regard to their conformation and intermolecular contacts. (C) The trimer surface, colored by hydrophobicity, shows the shape complementarity of the chains. Residues highly protected from mutations are labeled.
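The 3.5 Å contact criterion used in this caption can be illustrated with a minimal sketch. It assumes that two residues count as "in contact" when any pair of their atoms lies within the cutoff; the caption does not state which atoms were used, so this definition, the function name `residue_contacts`, and the toy coordinates are hypothetical and are not taken from the predicted structures.

```python
import math
from itertools import product


def residue_contacts(chain_a, chain_b, cutoff=3.5):
    """List residue pairs from two chains whose closest atoms lie within `cutoff` (in Å).

    Each chain is a dict mapping a residue label to a list of (x, y, z)
    atom coordinates. The any-atom-pair definition is an assumption.
    """
    contacts = []
    for (res_a, atoms_a), (res_b, atoms_b) in product(chain_a.items(), chain_b.items()):
        # Shortest distance between any atom of res_a and any atom of res_b
        closest = min(math.dist(p, q) for p in atoms_a for q in atoms_b)
        if closest <= cutoff:
            contacts.append((res_a, res_b, round(closest, 2)))
    return sorted(contacts)


# Toy coordinates (hypothetical, for illustration only)
helix_a = {"A1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], "A2": [(0.0, 6.0, 0.0)]}
helix_b = {"B1": [(3.0, 0.0, 0.0)], "B2": [(0.0, 9.0, 0.0), (10.0, 10.0, 10.0)]}

contacts = residue_contacts(helix_a, helix_b)
print(contacts)
```

In practice the coordinates would be read from the predicted model file, and the resulting pairs could be filtered further by the corresponding PAE values.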

Supporting figure S5
Snapshots along the MD simulations for the N LRS monomer and oligomers. (B) Hydrophobic interactions of the residues on the stabilizing surface patch. This patch is not a major stabilizing force for dimers, but it is significant in higher-order oligomers.

Supporting figure S7
N LRS mutants abrogating helix formation and self-association.
(A) Snapshots of MD simulations. Top row: Peptides N LRS :L222P and N LRS :R226P were introduced to disrupt the helix. The simulations show that P222 disrupts the cohesiveness of the surface hydrophobic patch in all the oligomers, which is effectively kept in place in the reference N LRS peptide by leucine. The hinge, clearly observed in the monomer, repositions the upper portion of the helices so that A217 and A218 in the oligomers take the role of L222, partially compensating for the loss of the hydrophobic interaction with L222. The hinge introduced by P226 is less acute, as the interactions between surrounding residues and their sizes prevent complete helix bending. However, the loss of the salt bridge with E231 in the oligomers is not compensated, which likely affects complex stability, particularly in the trimer, where it is stronger. Middle row: Three mutations were introduced at position L224 to probe the role of this critical leucine on the surface hydrophobic patch: in N LRS :L224A, the smaller size of alanine weakens the hydrophobic forces that keep adjacent helices in place. Polar or charged residues at position 224 also destabilize the monomer due to the weakened hydrophobic forces mediated by the bulky/branched leucine in N LRS . In the oligomers, N LRS :L224S disrupts the hydrophobic interactions without any compensating force and destabilizes the helix, creating transient kinks; distortions of α-helices are common even in the absence of prolines. Bottom row: N LRS :L224D behaves similarly to N LRS :L224S, although it also interacts electrostatically with R226. Mutant N LRS :L219A shows little difference in the monomer's behavior, but alanine has a deleterious effect on the stability of the hydrophobic cluster in the oligomers, similar to that in N LRS :L224A. In particular, the smaller alanine disengages L221, leading to a measurable opening of the upper portion of the oligomers and leaving the D216 residues more exposed to the solvent.
(B) CD spectra of the same mutants at different concentrations. As a reference, spectra for N LRS without mutations are shown at 1.2 mM without TFE, and at 0.4 mM with 30% TFE (solid and dashed black lines). Most mutants show little or no concentration dependence, with the exception of N LRS :L222P.
(C) Concentration dependence of sedimentation coefficient distributions for the same mutants. For reference, the distributions of N LRS without mutations at 78 µM and 980 µM are shown as solid and dashed black lines. None of the mutants displays self-association.

Supporting figure S8
N LRS mutants enhancing helix formation and self-association. Temperature-dependent folding of N LRS and enhancing and abrogating mutants. Raw CD data as a function of wavelength and temperature (Left Column) are decomposed into spectral components (Middle Column) and their corresponding temperature-dependent amplitudes (Right Column). All measurements are carried out with identical optical pathlengths. However, normalization to mean residue ellipticity is not possible due to partial sedimentation of particles formed above the transition temperature, leading to reduced concentrations in the light path. (A) The mutant N LRS :L222P was shown to abrogate helix formation and self-association. In the temperature scan it shows little helicity below a transition at 36.5 °C to a helical state, which unfolds into a largely disordered state at 42.5 °C.
(B,C) The reference peptide N LRS exhibits concentration-dependent helicity. From the temperature scan a transition can be discerned at 42 °C (0.4 mM) and 52.5 °C (1.2 mM) to a state with a greater magnitude of the ratio of signals at 208 nm and 222 nm, and a slight shift of the first minimum to lower wavelength (at 1.2 mM, the minimum is at 208 nm at low versus 202 nm at high temperature). This is indicative of a state at higher temperature that is still largely helical but with a higher disordered fraction.
(D) The mutant N LRS :G215C has been shown to enhance helicity and self-association. The temperature-dependent CD spectra show a transition similar to that of the reference N LRS peptide, but at a higher temperature of 55.8 °C. From the slightly higher ratio of θ(208)/θ(222)=1.34 compared to the reference peptide (θ(208)/θ(222)=1.17 for N LRS at 1.2 mM), as well as the smaller shift in the first minimum (206 nm at high temperature), the high-temperature state appears to maintain a greater degree of folding than N LRS .
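The two spectral descriptors discussed in panels B to D, the ratio θ(208)/θ(222) and the position of the first minimum, can be computed as in the following minimal sketch. The function name `cd_helix_metrics` and the spectrum values are hypothetical: the numbers are synthetic, chosen only so that the ratio reproduces the value 1.17 quoted above, and are not measured data. The sketch also assumes that the first minimum is the most negative point of the sampled spectrum.

```python
def cd_helix_metrics(spectrum):
    """Return (theta208 / theta222, wavelength of the most negative signal).

    `spectrum` maps wavelength in nm to ellipticity in arbitrary units and
    must contain entries at 208 and 222 nm.
    """
    ratio = spectrum[208] / spectrum[222]
    # Assumption: the "first minimum" is the global minimum of the sampled range
    minimum_wavelength = min(spectrum, key=spectrum.get)
    return ratio, minimum_wavelength


# Synthetic spectrum (hypothetical values, for illustration only)
example_spectrum = {
    200: -5.0, 202: -8.0, 204: -9.5, 206: -10.5, 208: -11.7,
    210: -11.0, 215: -9.8, 222: -10.0, 230: -4.0,
}

ratio, minimum_wavelength = cd_helix_metrics(example_spectrum)
print(f"theta(208)/theta(222) = {ratio:.2f}, minimum at {minimum_wavelength} nm")
```

Because the ratio is dimensionless, it remains comparable between samples even when normalization to mean residue ellipticity is not possible, provided both wavelengths are read from the same spectrum.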

Supporting figure S10
Properties of the naturally occurring mutant L222M in N LRS . The effect of TFE on FL-N. CD spectra of FL-N in working buffer (green) and supplemented with 30% TFE (magenta).

Supporting figure S12
Probing NA binding of N LRS . SV experiment of N LRS (green), T 20 (blue), and a mixture (magenta) under conditions where strong binding of NA to FL-N is observed, and with a large molar excess of N LRS . Shown are sedimentation coefficient distributions c(s) for data acquired at 230 nm (solid lines) and with the interference optical detection system (dashed lines). The absorbance signal is dominated by T 20 , whereas the interference signal is dominated by the peptide. 1:1 complexes would be expected to sediment between 1.5 and 2.5 S. The absence of absorbance c(s) signal in this range places a lower limit for K D of a potential N LRS -T 20 interaction in the mM range.

Supporting figure S13
Predicted trimeric structure and concentration-dependent linked self-association/folding of SARS-CoV-1 nucleocapsid LRS peptide.