Structural impact on SARS-CoV-2 spike protein by D614G substitution

Substitution for aspartic acid by glycine at position 614 in the spike (S) protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the ongoing pandemic, appears to facilitate rapid viral spread. The G614 variant has now replaced the D614-carrying virus as the dominant circulating strain. We report here cryo-EM structures of a full-length S trimer carrying G614, which adopts three distinct prefusion conformations differing primarily by the position of one receptor-binding domain (RBD). A loop disordered in the D614 S trimer wedges between domains within a protomer in the G614 spike. This added interaction appears to prevent premature dissociation of the G614 trimer, effectively increasing the number of functional spikes and enhancing infectivity. The loop transition may also modulate structural rearrangements of S protein required for membrane fusion. These findings extend our understanding of viral entry and suggest an improved immunogen for vaccine development.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an enveloped positive-stranded RNA virus, is the cause of the ongoing COVID-19 pandemic. It probably originated when related viruses circulating in bats acquired the ability to infect human cells 1. While viral evolution is believed to be slow, owing to the RNA proofreading capability of its replication machinery 2, a variant with a single-residue substitution (D614G) in its spike protein has rapidly become the dominant strain throughout the world 3. Understanding the molecular features of the now most prevalent virus strain can guide intervention strategies to control the crisis.
SARS-CoV-2 initiates infection by fusion of its envelope lipid bilayer with the membrane of a host cell. This critical step in the viral life cycle is catalyzed by the trimeric spike (S) protein, which is produced as a single-chain precursor and subsequently processed by a furin-like protease into the receptor-binding fragment S1 and the fusion fragment S2 4 .
After engagement of the receptor-binding domain (RBD) in S1 with the viral receptor angiotensin converting enzyme 2 (ACE2) on the host cell surface, followed by a second proteolytic cleavage within S2 (S2' site) 5 , the S protein undergoes large conformational changes, prompting dissociation of S1 and irreversible refolding of S2 into a postfusion structure 6,7 . Formation of the postfusion S2 provides energy for overcoming the kinetic barrier of membrane fusion and effectively brings the viral and cellular membranes close together to induce fusion of the two.
Rapid advances in the structural biology of the SARS-CoV-2 S protein, an important target for development of diagnostics, therapeutics and vaccines, include structures of S protein fragments derived from the original virus carrying D614: the S ectodomain stabilized in its prefusion conformation 8,9, RBD-ACE2 complexes 10-13, and segments of S2 in the postfusion state 14. In the prefusion ectodomain structure, S1 folds into four domains, namely the NTD (N-terminal domain), the RBD, and two CTDs (C-terminal domains), and wraps around the prefusion conformation of S2, with the RBD sampling two distinct conformations: "up" for a receptor-accessible state and "down" for a receptor-inaccessible state. We and others have also reported structures of a purified, full-length D614 S protein in both prefusion and postfusion conformations 15,16. Studies by cryo-electron tomography, with chemically inactivated SARS-CoV-2 preparations, using both D614 and G614 variants have revealed additional structural details of S proteins present on the surface of virions 17-20.
Epidemiological surveillance indicated that the SARS-CoV-2 carrying G614 outcompeted the original virus and became the globally dominant form within a month 3,21,22. This single-residue substitution appears to correlate with high viral loads in infected patients and high infectivity of pseudotyped viruses, but not with disease severity 3. The G614 virus has comparable (or even slightly higher) sensitivity to neutralization by convalescent human sera or vaccinated hamster sera 3,23-25, suggesting that the response to vaccination with immunogens containing D614 remains effective against the new strain. Further studies have demonstrated that S1 dissociates more readily from the D614 virus than from G614 virus 26, indicating that the D614 viral spike is substantially less stable than the G614 variant. The stabilized, soluble S ectodomain trimer with G614 samples the RBD-up conformations more frequently than does the D614 trimer 25,27, but it is puzzling why the former binds more weakly to recombinant ACE2 than the latter 27. The known S trimer structures indicate that the D614G change breaks a salt bridge between D614 and a positively charged residue (K854) in the fusion peptide proximal region (FPPR) 15, which may help clamp the RBD in the prefusion conformation. This observation can explain why the G614 trimer favors the RBD-up conformations, but does not account for its increased stability. To resolve these issues, which relate directly to viral entry mechanism and to vaccine development, we report here the structural consequences of the D614G substitution in the context of the full-length S protein.

Characteristics of the full-length SARS-CoV-2 S protein carrying G614
Following our established protocols for producing the full-length S protein containing D614 solubilized in detergent 15, we transfected HEK293 cells with a construct expressing a full-length wildtype SARS-CoV-2 S with G614. We compared the membrane fusion activity of the G614 S protein with that of the full-length D614 S construct in a β-galactosidase-based cell-cell fusion assay 15. As shown in Fig. S1A, all the cells expressing S fused efficiently with cells transfected with a human ACE2 construct, demonstrating that the S proteins expressed on the cell surfaces are fully functional. At low transfection levels, the G614 S had higher fusion activity than the D614 S, but the difference diminished as the amount of transfected DNA increased, suggesting that high expression levels can compensate for any possible defects associated with the D614 S protein. We also tested inhibition of the cell-cell fusion by an engineered trimeric ACE2-based inhibitor that competes with the receptor on the target cells 28, showing that the G614 trimer is slightly more sensitive than the D614 trimer (Fig. S1B).
To purify the full-length S protein, we used an expression construct fused with a C-terminal strep-tag, which was as active in cell-cell fusion as the untagged version (Fig. S1A), and purified both G614 and D614 proteins under identical conditions. We lysed the transfected cells, solubilized membrane-bound proteins in 1% detergent dodecyl-β-D-maltopyranoside (DDM), and purified the strep-tagged S proteins by elution from strep-tactin resin in 0.3% DDM followed by gel filtration chromatography in 0.02% DDM. The D614 protein eluted in three peaks representing the prefusion S trimer, the postfusion S2 trimer and the dissociated monomeric S1, respectively, as we have reported previously 15. The G614 protein eluted as a single major peak, corresponding to the prefusion S trimer (Fig. 1A). Coomassie-stained SDS-PAGE analysis confirmed that the G614 peak contained mainly the cleaved S1/S2 complex (~90%) and a small amount of the uncleaved S precursor (~10%). We found similar patterns when the two proteins were purified in detergent NP-40, indicating that the choice of detergent did not affect the relative stability of the two spike variants. We conclude that the single-residue substitution has a striking effect on the stability of the SARS-CoV-2 S trimer, even as a purified protein.
We measured by bio-layer interferometry (BLI) binding of the prefusion trimer fractions of the full-length proteins to recombinant soluble ACE2 (Fig. 1B). The S trimers bound more strongly to a dimeric ACE2 than to a monomeric ACE2, as expected. The G614 protein bound ACE2 less tightly than did the D614 protein, consistent with the measurements reported by others using soluble constructs 27 . This observation appears inconsistent with reports that the G614 trimer has a more exposed RBD than the D614 trimer 17,18,25,27 .

Cryo-EM structures of the full-length S trimer with G614
We determined the cryo-EM structures of the full-length S trimer carrying G614 in both DDM and NP-40. Cryo-EM images were acquired on a Titan Krios electron microscope operated at 300 keV and equipped with a Gatan K3 direct electron detector. We used  Table S2). The same three classes were observed for the samples purified in both detergents, giving essentially identical maps of the corresponding classes after refinement (Fig. S6). These results demonstrate that detergent has little impact on the S structure at least in the visible regions of the ectodomain.
The overall structure of the full-length S protein with G614 in the closed, three RBD-down prefusion conformation is very similar to that of the D614 S trimer that we have published recently (Fig. 2; ref. 15). In the three RBD-down structure, the four domains in each S1, including NTD, RBD, CTD1 and CTD2, wrap around the three-fold axis of the trimer, protecting the prefusion S2. The furin cleavage site at the S1/S2 boundary remains disordered, making it difficult to determine whether this structure represents the uncleaved or cleaved trimer, although the preparation contains primarily the cleaved forms (Fig. 1A). The S2 fragment folds around a central three-stranded coiled coil that forms the most stable part of the structure with the strongest density in the entire S trimer; it is also the least variable region among all the known S trimer structures. The S2 structure is identical in the two structures of the G614 trimer with one RBD projecting upwards, either completely or partially (Figs. S7 and S8A). In the conformation with one RBD fully up, the two neighboring NTDs, including the one from the same protomer, shift away from the three-fold axis (Fig. S7). In the RBD-intermediate conformation, only the NTD from the adjacent protomer packing directly against the moving RBD shifts, suggesting that there is at least one local free-energy minimum along the pathway of the RBD upward movement.
The D614G substitution eliminates a salt bridge between residue 614 in CTD2 of one subunit and residue 854 in the FPPR of the adjacent subunit, probably destabilizing the latter 15. Nonetheless, the FPPR in the three RBD-down conformation of the G614 trimer is structured, although the density in the region is slightly weaker than in the D614 map (Fig. S8B). A major difference between the G614 and D614 trimer structures is that a ~20-residue segment (620-640) in the CTD2, largely disordered in the D614 trimer, has become structured in the G614 trimer. There is no density for the C-terminal segments, including HR2, TM and CT, consistent with the flexibility near residue Pro1162 found in cryo-ET studies 17,18.

Structural consequences of the D614G substitution
To examine the structural changes resulting from the D614G substitution, we superposed the structures of the G614 trimer onto the D614 trimer in the closed conformation, aligning them by the invariant S2 (Fig. 2B). A shift by a clockwise, outward rotation of all three S1 subunits, relative to the D614 structure, is evident even for the G614 trimer in the closed conformation. A similar shift was also observed in the RBD-intermediate and RBD-up G614 structures. Thus, the D614G substitution has led to a slightly more open conformation than that of the D614 trimer, even when all three RBDs are down. The D614G change has apparently also rigidified a neighboring segment of CTD2, residues 620-640, which we designate the "630 loop". This loop inserts into a gap, slightly wider in the G614 than in the D614 trimer, between the NTD and CTD1 of the same protomer. In the immediate vicinity of residue 614, the change from Asp to Gly did not cause any large local structural rearrangements except for loss of the D614-K854 salt bridge we had previously predicted 15, and a small shift of residue 614 towards the three-fold axis (Fig. 4A). The position of the FPPR and the conformation of K854 allow a hydrogen bond between K854 Nζ and the main-chain carbonyl of G614, perhaps accounting for the subtlety of the structural difference. The loss of the salt bridge involving D614, at least partially compensated by the new hydrogen bond between G614 and K854, was apparently not sufficient to destabilize the packing of the FPPR against the rest of the trimer, but it did weaken the FPPR density, especially between residues 842-846. The 630 loop, which packs directly against the NTD, CTD1 and CTD2 of the same protomer, lies close to the S1/S2 boundary of the same protomer and the FPPR of an adjacent protomer (Fig. 4B). The loop inserts between the NTD and CTD1 (Fig. 4C), probably requiring the shifts illustrated in Fig. 2B.
This wedge-like loop may also help secure the NTD and CTDs and enhance G614 S trimer stability.
CTD2 is formed by two stacked, four-strand β-sheets, with a fifth strand in one sheet contributed by the connector between the NTD and RBD. In the other sheet, an interstrand loop contains the S1/S2 cleavage site, and thus one strand is the N-terminal segment of S2 (Fig. 4B). In the G614 trimer, one side of the 630 loop packs along a long hydrophobic surface, largely solvent-exposed in the D614 trimer, formed by residues on the "upward" facing surface of the CTD2 along with Pro295 from the NTD (Fig. 4D).
Pro631, Trp633 and Val635 of the 630 loop appear to contribute to this interaction. Since S1 dissociation from S2 likely requires destabilization of the CTD2 to free the β-strand from the N-terminal end of S2, an ordered 630 loop that completes folding of the CTD2 by closing off an exposed, hydrophobic surface may retard S1 shedding, thereby enhancing the stability of a cleaved S trimer.
We note that although S1 in the G614 trimer moves outwards from its position in the D614 trimer, the extent of the shift is still appreciably smaller than the shift seen in soluble S trimers stabilized by a trimerization foldon tag and two proline mutations (Fig. S9). The comparison suggests that the soluble trimer may not completely mimic all the physiologically relevant conformations of the S trimer.

Structural basis of increased stability and infectivity of the G614 spike
The structures described here allow us to propose an explanation of why the virus carrying the D614G substitution, with a more stable S trimer, is more infectious than the original strain. We consider the schematic free energy landscape of the S-protein conformational distribution in Fig. 5. In the D614 trimer, the kinetic barrier for transition from the RBD- Our data can also explain why the G614 trimer binds somewhat less tightly to ACE2 than does the D614 trimer, despite the greater proportion of well-exposed RBDs, both as seen here and as reported by others 17,18,25,27. A kinetic barrier for an upward shift in the RBD that is higher than that in the D614 trimer would create a significant hurdle for the remaining RBDs in the same G614 trimer, or for the closed G614 trimers, to adopt an ACE2-accessible state. We note that the second binding event with the dimeric ACE2 has a slower on-rate and also a slower off-rate for the G614 than the D614 trimer (Table S1). It is therefore not surprising that the G614 trimer preparation showed weaker binding to both monomeric and dimeric ACE2 than did a D614 S preparation. We suggest that the enhanced infectivity of the G614 virus largely results from the increased stability of the S trimer, rather than the better exposed RBDs. Indeed, if the virus that passed from bats to humans or to an intermediate vector contained D614 (also present in the bat coronavirus BatCoV RaTG13 1), then it could have gained fitness in the new host by acquiring changes such as G614 for greater stability and infectivity than the parental form.
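The barrier argument above can be made semi-quantitative with a minimal sketch, assuming transition-state (Eyring-type) scaling of rates and a simple two-state equilibrium. The barrier and free-energy values below are hypothetical and purely illustrative; they are not measurements from this study, and the function names are ours.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def relative_rate(delta_barrier_kcal, temperature_k=310.15):
    """Ratio k_new / k_ref when the activation barrier rises by delta_barrier_kcal.

    Assumes the same pre-exponential factor for both transitions.
    """
    return math.exp(-delta_barrier_kcal / (R_KCAL * temperature_k))


def two_state_population(delta_g_kcal, temperature_k=310.15):
    """Equilibrium fraction in state B when G_B - G_A = delta_g_kcal."""
    return 1.0 / (1.0 + math.exp(delta_g_kcal / (R_KCAL * temperature_k)))


# Hypothetical barrier increases (kcal/mol), chosen only for illustration.
for ddg in (0.0, 0.5, 1.0, 2.0):
    print(f"barrier +{ddg:.1f} kcal/mol -> relative rate {relative_rate(ddg):.3f}")
```

Under these assumptions a modest increase in barrier height slows the transition several-fold, while the equilibrium populations depend only on the free-energy difference between the two states. This is why a trimer can show a larger fraction of exposed RBDs and still have a slower on-rate for further RBD opening.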

Membrane fusion mechanism
We previously hypothesized that the FPPR might modulate the fusogenic structural rearrangements of S protein, as it retains the RBDs in the down conformation but moves and free the N-terminal segment of S2 to dissociate from S1, if the furin site has already been cleaved, and release S1 altogether. Dissociation of S1 would then initiate a cascade of refolding events in the metastable prefusion S2, allowing the fusogenic transition to a stable postfusion structure. We note that this model is very similar to that proposed for membrane fusion catalyzed by HIV envelope protein, in which gp120 dissociation triggers refolding of gp41 to complete the fusion process 30 .

Implications for vaccine development
The SARS-CoV-2 S protein is the centerpiece of almost all the first-generation vaccines under development; these efforts began at early stages of the pandemic and used the D614 sequence. We have previously suggested that the inactivated-virus vaccines might have too many postfusion spikes and induce mainly non-neutralizing antibodies 15. Indeed, these vaccine candidates induced the lowest level of neutralizing antibody responses; other S constructs containing stabilization modifications to prevent conformational changes have given much stronger responses 31, but their protective efficacy still needs to be evaluated in phase 3 clinical trials. The G614 S trimer is naturally constrained in a prefusion state that presents both the RBD-down and RBD-up conformations with great stability. It is therefore likely to be a superior immunogen, whether in a protein or nucleic acid form, for eliciting protective neutralizing antibody responses, which appear largely to target the RBD and NTD 32,33.

Acknowledgments:
We thank the SBGrid team for technical support, K. Arnett for support and advice on the

Expression and purification of recombinant proteins
Expression and purification of the full-length S protein carrying G614 were carried out as previously described 15. Expi293F cells transfected with monomeric ACE2 or dimeric ACE2 expression construct were grown in 250 ml roller bottles with DMEM containing 10% FBS. The supernatant of the cell culture was collected by centrifugation at 2,524 × g for 30 minutes. The monomeric ACE2 was purified by affinity chromatography using Ni-NTA agarose (Qiagen, Hilden, Germany), followed by gel filtration chromatography, as described previously 34,35. The peak fractions were pooled and concentrated to 10 mg/ml using a 30 kDa MWCO Millipore filter (MilliporeSigma, Burlington, MA). The supernatant of dimeric ACE2 was loaded onto a column packed with GammaBind Plus Sepharose beads (GE Healthcare). The column was washed with PBS. The protein was eluted using 100 mM glycine (pH 2.5) and neutralized immediately with 2 M Tris-HCl (pH 8.0). The eluted protein was further purified by gel filtration chromatography on a Superdex 200 Increase 10/300 GL column. The peak fractions were pooled and concentrated to 5 mg/ml using a 50 kDa MWCO Millipore filter.

Binding assay by bio-layer interferometry (BLI)
Binding of monomeric or dimeric ACE2 to the full-length Spike protein was measured using an Octet RED384 system (ForteBio, Fremont, CA). Each ACE2 protein was diluted using the running buffer (PBS, 0.02% Tween 20, 1 mg/ml BSA) and transferred to a 96-well plate. The full-length S protein was immobilized to Amine Reactive 2nd Generation (AR2G) biosensors (ForteBio), following a protocol recommended by the manufacturer.
After equilibrating in the running buffer for 5 minutes, the sensors with immobilized Spike protein were dipped in the wells containing the ACE2 protein at various concentrations (5.56-450 nM for monomeric ACE2; 2.78-225 nM for dimeric ACE2) for 5 minutes to measure the association rate. The sensors were then dipped in the running buffer for 10 minutes to determine the dissociation rate. Control sensors with no Spike protein were also dipped in the ACE2 solutions and the running buffer as references.
Recorded sensorgrams with background subtracted from the references were analyzed using the software Octet Data Analysis HT Version 11.1 (ForteBio). The curves for monomeric ACE2 were fit to a 1:1 binding model, while those for dimeric ACE2 were fit to a bivalent binding model.
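As an illustration of the 1:1 model named above, the following minimal sketch simulates the association and dissociation phases with the stated timings (5 minutes and 10 minutes). The rate constants and Rmax are hypothetical placeholders, not fitted values from this study. The three-fold, five-point dilution series is also an assumption: the text gives only the concentration ranges, and this series is simply one scheme that reproduces both endpoints. The bivalent model used for dimeric ACE2 is not covered here.

```python
import math


def dilution_series(top_nm, fold, n_points):
    """Concentrations (nM) starting at top_nm, each step fold-times more dilute."""
    return [top_nm / fold ** i for i in range(n_points)]


def association(t, conc_m, kon, koff, rmax):
    """1:1 Langmuir association phase: R(t) = Req * (1 - exp(-kobs * t))."""
    kobs = kon * conc_m + koff
    req = rmax * kon * conc_m / kobs
    return req * (1.0 - math.exp(-kobs * t))


def dissociation(t, r0, koff):
    """1:1 dissociation phase starting from response r0."""
    return r0 * math.exp(-koff * t)


# Hypothetical kinetic parameters, for illustration only.
KON = 1.0e5    # association rate constant, 1/(M*s)
KOFF = 1.0e-3  # dissociation rate constant, 1/s
RMAX = 1.0     # maximal response, nm
KD = KOFF / KON  # equilibrium dissociation constant, M

# Assumed three-fold, five-point series matching the reported ranges.
monomer_nm = dilution_series(450.0, 3.0, 5)
dimer_nm = dilution_series(225.0, 3.0, 5)

for c_nm in monomer_nm:
    r_end = association(300.0, c_nm * 1e-9, KON, KOFF, RMAX)  # 5 min association
    r_off = dissociation(600.0, r_end, KOFF)                   # 10 min dissociation
    print(f"{c_nm:7.2f} nM: end of association {r_end:.3f}, after wash {r_off:.3f}")
```

In a real analysis the same two expressions would be fit globally across all concentrations to recover kon and koff, with KD obtained as their ratio.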

Cell-cell fusion assay
The cell-cell fusion assay, based on the α-complementation of E. coli β-galactosidase, was conducted to quantify the fusion activity mediated by SARS-CoV-2 S protein, as described 15. Briefly, various amounts of the full-length SARS-CoV-2 (614D or 614G) S construct (0.025-10 µg) and the α fragment of E. coli β-galactosidase construct (10 µg), or the full-length ACE2 construct (10 µg) together with the ω fragment of E. coli β-galactosidase construct (10 µg), were transfected into HEK293T cells using polyethylenimine (PEI) (80 µg). After a 24-hour incubation at 37 °C, the cells were detached using DPBS buffer with 5 mM EDTA and resuspended in complete DMEM medium. 50 µl S-expressing cells (1.0 × 10^6 cells/ml) were mixed with 50 µl ACE2-expressing cells (1.0 × 10^6 cells/ml) to allow cell-cell fusion to proceed at 37 °C for 2 hours. Cell-cell fusion activity was quantified using a chemiluminescent assay system, Gal-Screen (Applied Biosystems, Foster City, CA), following the standard protocol recommended by the manufacturer. The substrate was added to the mixture of the cells and allowed to react for 90 minutes in the dark at room temperature. The luminescence signal was recorded with a Synergy Neo plate reader (Biotek).
For the inhibition assay, the S-expressing cells were incubated with a trimeric ACE2 variant, ACE2₆₁₅-foldon T27W 28, at varied concentrations (6.25-200 µg/ml) for 1 hour at 37 °C. After the incubation, the ACE2-expressing cells were added to the mixture, followed by a 2-hour incubation at 37 °C. The fusion activity was quantified using the Gal-Screen system as mentioned above.

Cryo-EM sample preparation and data collection
To prepare cryo grids, 3.5 µl of the freshly purified G614 sample in NP-40 at ~0.3 mg/ml was applied to a 1.2/1.3 Quantifoil grid with continuous carbon support (Quantifoil Micro Tools GmbH), which had been glow discharged with a PELCO easiGlow™ Glow Discharge Cleaning system (Ted Pella, Inc.) for 60 s at 15 mA. For the G614 sample in DDM, 3.5 µl of the peak fraction from gel filtration chromatography at ~1.0 mg/ml was also applied to the glow-discharged 1.2/1.3 Quantifoil grids (Quantifoil Micro Tools GmbH). Grids were immediately plunge-frozen in liquid ethane using a Vitrobot Mark IV (ThermoFisher Scientific), and excess protein was blotted away using grade 595 filter paper (Ted Pella, Inc.) with a blotting time of 4 s and a blotting force of -12 at 4 °C in 100% humidity. The grids were first screened for ice thickness and particle distribution using a Talos Arctica transmission electron microscope (ThermoFisher Scientific), operated at 200 keV and equipped with a K3 direct electron detector (Gatan). For data collection, images were acquired with selected grids using a Titan Krios transmission electron microscope (ThermoFisher Scientific) operated at 300 keV and equipped with a BioQuantum GIF/K3 direct electron detector. Automated data collection was carried out using

Image processing and 3D reconstructions
Drift correction for cryo-EM images was performed using MotionCor2 37, and the contrast transfer function (CTF) was estimated by CTFFIND4 38 using motion-corrected sums without dose-weighting. Motion-corrected sums with dose-weighting were used for all other image processing. RELION 3.0.8 was used for particle picking, 2D classification, 3D classification and refinement. Approximately 3,000 particles were manually picked for each protein sample and subjected to 2D classification to generate the templates for automatic particle picking. For the NP-40 sample, after manual inspection of auto-picked particles, a total of 3,640,242 particles were extracted from 11,577 images. The selected particles were subjected to 2D classification, giving a total of 2,657,624 good particles. A low-resolution negative-stain reconstruction of the sample was low-pass filtered to 30 Å resolution and used as an initial model for 3D classification with C3 symmetry. One major class showing clear structural features was subjected to another round of 3D classification with C1 symmetry, giving one major class. A third round of 3D classification with C1 symmetry and further local angular search produced three major classes, representing the closed, three RBD-down conformation, the one RBD-up conformation and the RBD-intermediate conformation, respectively. The three classes were then subjected to 3D auto-refinement, followed by particle polishing and signal-subtraction classification focused on the apex region of the S trimer. The three classes containing 63,558, 86,353 and 93,426 particles were subjected to 3D refinement with C1 (RBD-intermediate and RBD-up) and C3 (closed) symmetry using an overall mask, resulting in three final reconstructions at 3.3 Å, 3.5 Å and 3.1 Å resolutions, respectively. For the DDM sample, a total of 5,652,781 particles were extracted from 11,092 images by reference-based auto-picking. Two rounds of 2D classification were performed, giving 3,207,721 good particles.
These particles were subjected to two rounds of 3D classification with C3 symmetry, yielding one major class with clear structural features. The refined map from the first dataset was low-pass filtered to 30 Å resolution and used as the initial model for 3D classification.
A third round of 3D classification with C1 symmetry and further local angular search were performed, generating three major classes, which are similar to those from the NP-40 sample. CTF refinement and Bayesian polishing were performed for these classes. The RBD-up class containing 55,537 particles gave a 3D reconstruction at 3.2 Å resolution after 3D auto-refinement with C1 symmetry and an overall mask; the RBD-intermediate class with 63,040 particles and the closed conformation class with 55,096 particles were subjected to the same refinement strategy, yielding 3D maps with resolutions of 3.2 Å and 3.4 Å. The best map from each class was used for model building.
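The particle bookkeeping for the two datasets can be summarized with a short sketch. It uses only the counts quoted in the text; the dictionary layout and the `retention` helper are our own naming, introduced for illustration.

```python
# Particle counts as quoted in the text for each detergent dataset.
datasets = {
    "NP-40": {
        "extracted": 3_640_242,
        "after_2d": 2_657_624,
        "final_classes": [63_558, 86_353, 93_426],
    },
    "DDM": {
        "extracted": 5_652_781,
        "after_2d": 3_207_721,
        "final_classes": [55_537, 63_040, 55_096],
    },
}


def retention(counts):
    """Fractions of extracted particles surviving 2D classification and final 3D classes."""
    final_total = sum(counts["final_classes"])
    return {
        "final_total": final_total,
        "kept_after_2d": counts["after_2d"] / counts["extracted"],
        "kept_in_final_maps": final_total / counts["extracted"],
    }


for name, counts in datasets.items():
    stats = retention(counts)
    print(
        f"{name}: {stats['final_total']} particles in final maps "
        f"({100 * stats['kept_in_final_maps']:.1f}% of extracted; "
        f"{100 * stats['kept_after_2d']:.1f}% passed 2D classification)"
    )
```

Only a small fraction of the extracted particles contributes to the final reconstructions in either detergent.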
Reported resolutions are based on the gold-standard Fourier shell correlation (FSC) using the 0.143 criterion. All density maps were corrected for the modulation transfer function of the K3 detector and then sharpened by applying a temperature factor that was estimated using post-processing in RELION. Local resolution was determined using RELION with half-reconstructions as input maps.
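To illustrate how a resolution is read from an FSC curve with the 0.143 criterion, and what applying a negative temperature factor does to map amplitudes, here is a minimal sketch. The FSC values and the B-factor are hypothetical, and the linear interpolation is a simplification; this is not the RELION post-processing code.

```python
import math


def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (Å) where the FSC curve first drops below the threshold.

    freqs: spatial frequencies in 1/Å, in increasing order.
    fsc:   FSC value at each frequency.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(freqs)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing shells.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


def sharpening_factor(freq, b_factor):
    """Amplitude scale exp(-B * s^2 / 4); a negative B boosts high frequencies."""
    return math.exp(-b_factor * freq ** 2 / 4.0)


# Hypothetical FSC curve, for illustration only.
example_freqs = [0.05, 0.10, 0.20, 0.30, 0.35]
example_fsc = [1.00, 0.98, 0.80, 0.243, 0.043]

res = resolution_at_threshold(example_freqs, example_fsc)
print(f"FSC = 0.143 crossing at {res:.2f} Å")
print(f"amplitude scale at 0.3 1/Å for B = -100 Å^2: {sharpening_factor(0.3, -100.0):.2f}")
```

The crossing frequency is converted to a resolution by taking its reciprocal, so a curve that stays above 0.143 to higher spatial frequency corresponds to a better (numerically smaller) resolution.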

Model building
The initial templates for model building used the stabilized SARS-CoV-2 S ectodomain trimer structure (PDB ID 6XR8) for the prefusion conformation. Several rounds of manual building were performed in Coot 39. The model was then refined in Phenix 40 against the 3.1 Å (closed), 3.2 Å (RBD-intermediate) and 3.5 Å (RBD-up) cryo-EM maps. Refinement was performed iteratively in both Phenix (real-space refinement) and ISOLDE 41, and the Phenix refinement strategy included minimization_global, local_grid_search, and adp, with rotamer, Ramachandran, and reference-model restraints, using 6XR8 as the reference model. The refinement statistics are summarized in Table S2. Structural biology applications used in this project were compiled and configured by SBGrid 42.