The emergence of SARS-CoV-2 in Europe and North America
A series of unfortunate events
The history of how severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread around the planet has been far from clear. Several narratives have been propagated by social media and, in some cases, national policies were forged in response. Now that many thousands of virus sequences are available, two studies analyzed some of the key early events in the spread of SARS-CoV-2. Bedford et al. found that the virus arrived in Washington state in late January or early February. The viral genome from the first case detected had mutations similar to those found in Chinese samples and rapidly spread and dominated subsequent undetected community transmission. The other viruses detected had origins in Europe. Worobey et al. found that early introductions into Germany and the west coast of the United States were extinguished by vigorous public health efforts, but these successes were largely unrecognized. Unfortunately, several major travel events occurred in February, including repatriations from China, with lax public health follow-up. Serial, independent introductions triggered the major outbreaks in the United States and Europe that still hold us in the grip of control measures.
Abstract
Accurate understanding of the global spread of emerging viruses is critical for public health responses and for anticipating and preventing future outbreaks. Here we elucidate when, where, and how the earliest sustained severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission networks became established in Europe and North America. Our results suggest that rapid early interventions successfully prevented early introductions of the virus from taking hold in Germany and the United States. Other, later introductions of the virus from China to both Italy and Washington state, United States, founded the earliest sustained European and North American transmission networks. Our analyses demonstrate the effectiveness of public health measures in preventing onward transmission and show that intensive testing and contact tracing could have prevented SARS-CoV-2 outbreaks from becoming established in these regions.
In late 2019, the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), ignited a pandemic that has been associated with more than 500,000 deaths globally as of July 2020. As the original outbreak in Hubei province, China, spilled into other countries, containment strategies focused on travel restrictions, isolation, and contact tracing. Given the virus’s exponential growth rate, delaying the onset of community transmission by even a few weeks likely bought government officials valuable time to establish diagnostic testing capacity and implement social distancing plans.
Viral genetic sequence data can provide critical information about whether viruses separated by time and space are likely to be epidemiologically linked. Genomic data have suggested differences in the timing, spatial origins, and transmission dynamics of early SARS-CoV-2 outbreaks in multiple North American locations, including Washington state (1, 2); the East Coast of the United States (3, 4); California (5); and British Columbia (BC), Canada (5, 6). The first confirmed U.S. case was associated with a virus strain (“WA1”) isolated in Washington state from a traveler who returned from Wuhan, China, on 15 January 2020 (7). No onward transmission was detected after extensive follow-up in what appeared to be successful containment of the country’s first known incursion of the virus (8). However, subsequent identification of viruses that were genetically similar to WA1—first in Washington, then in Connecticut (3), California (5), BC (6), and elsewhere—raised the possibility that WA1 had actually established chains of cryptic transmission that started on 15 January and went undetected for several weeks (1, 2). If true, this introduction would predate early SARS-CoV-2 community transmission chains documented elsewhere on the continent (3–5) and would establish the Seattle area as the epicenter of the North American epidemic. Hence, it is necessary to resolve this question to determine where the virus first initiated substantial community outbreaks and whether the earliest coast-to-coast spread of the virus within the United States (3) was from west to east or from east to west.
In Europe, the first diagnosed case occurred in an employee of an automobile supplier who visited the company’s headquarters in Bavaria, Germany, from Shanghai, China, on 20 January 2020 (9). She had been infected with SARS-CoV-2 in Shanghai (after her parents had visited from Wuhan) (10) and transmitted the virus to a German man who tested positive on 27 January (11) and whose viral genome (“BavPat1”) was sampled on 28 January (10). In total, this outbreak infected 16 employees but was apparently contained through rapid testing and isolation (9). Italy’s first major outbreak in Lombardy, which was apparent by ~20 February 2020, was associated with viruses closely related to BavPat1 but of a separate lineage (designated “B.1”), which differs from BavPat1 (a lineage “B” virus) by just 1 nucleotide in the nearly 30,000-nucleotide genome. A narrative took hold that the virus from Germany had not been contained but had been transmitting undetected for weeks and had been carried to Italy by an infected German (9, 12). In addition to igniting a severe outbreak in Italy, this B.1 lineage subsequently spread widely across Europe and beyond, initiating outbreaks in many countries, including the intense U.S. outbreak in New York City (NYC) (13, 14). Greater clarity about the effectiveness of Germany’s early contact tracing efforts has implications for the feasibility of controlling the virus through nonpharmaceutical interventions.
There are a number of limitations in phylogenetic and spatial inferences drawn from SARS-CoV-2 genomic data. SARS-CoV-2 has a relatively long (~29 kb) positive-sense single-stranded RNA genome that evolves at a rate of <1 × 10⁻³ substitutions per site per year, amounting to ~2 substitutions per genome per month. This rate is slower than that of most RNA viruses, owing to the “proofreading” activity encoded by the nonstructural gene nsp14 (15). Consequently, the entire global population of SARS-CoV-2 through March 2020 differed by only 0 to 12 nucleotide substitutions from the inferred ancestor of the entire pandemic. Transmission clusters tend to be defined by 1 to 3 nucleotide differences across the entire viral genome. Phylogeographic inferences are further confounded by the relatively low availability of genomic sequence data from locations that experienced early outbreaks, including Italy, Iran, and the original epicenter in Hubei. The combination of the relatively slow rate of SARS-CoV-2 evolution, its rapid dissemination within and between locations, and unrepresentative sampling presents risks for serious misinterpretation.
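As a back-of-the-envelope check on the figures above, a per-site annual rate can be converted into expected substitutions per genome per month. The following is a minimal sketch under a constant-rate assumption; the function name is hypothetical, the genome length is rounded to 29,000 nucleotides as in the text, and the example rate of 0.8 × 10⁻³ is the fixed rate used for the outbreak simulations described later.

```python
def substitutions_per_genome_per_month(rate_per_site_per_year, genome_length):
    """Expected substitutions per genome per month for a constant clock rate.

    Hypothetical helper for illustration only; it ignores rate variation
    among sites and through time.
    """
    return rate_per_site_per_year * genome_length / 12.0


# Illustrative inputs: ~29 kb genome, 0.8e-3 substitutions per site per year.
monthly = substitutions_per_genome_per_month(0.8e-3, 29_000)
print(f"{monthly:.1f} substitutions per genome per month")
```

Values close to 2 per month are what make transmission clusters separated by only 1 to 3 substitutions so hard to resolve.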
In this study, we investigated fundamental questions about when, where, and how SARS-CoV-2 established itself globally. We developed phylogenetic inferences from multiple sources of information—including airline passenger flow data between potential sources and destinations of viral dispersals early in the pandemic, as well as disease incidence data in Hubei province and other locales that likely affected the probability of infected travelers moving the virus around the globe. By combining a genomic epidemiology approach, which aims to account for the effects of undersampling viral genetic diversity in the epicenter of the pandemic, with consideration of expected evolutionary patterns for a novel pathogen with low diversity, we resolved key questions about how and when the SARS-CoV-2 pandemic unfolded in Europe and North America.
Emergence of SARS-CoV-2 in the United States
A key turning point in the U.S. outbreak occurred when researchers sequenced the first viral genome recovered from a putative case of community transmission in the United States (“WA2,” sampled in the Seattle area on 24 February 2020), reporting on 29 February that it was similar to WA1, the viral variant from the first-diagnosed COVID-19 patient (1). This finding led to the suggestion that WA1 might have established cryptic transmission in Washington state in mid-January (1). The researchers did, however, acknowledge the possibility of an independent introduction of WA2 separate from that of WA1. This finding fundamentally altered the picture of the SARS-CoV-2 situation in the United States, playing a decisive role in Washington state’s early adoption of intensive social distancing efforts. This, in turn, appeared to explain Washington state’s relative success in controlling the outbreak, compared with that of states that adopted a delayed approach, such as New York.
The availability of hundreds of SARS-CoV-2 genomes sampled in Washington state by mid-March revealed that WA2 belongs to a large, monophyletic clade of “A.1” lineage viruses that accounted for ~85% of cases in Washington state at that point, designated the “Washington state outbreak clade” (2) (hereafter “WA outbreak clade”). To investigate whether the WA outbreak clade was initiated in mid-January by WA1, we used these data to simulate the epidemic under the constraint that it had been established by WA1 and then compared the observed evolutionary patterns with those expected under that scenario. A range of phylogenetic patterns could have been observed in this large sample (e.g., Fig. 1, A to C) but were not (Fig. 1D).

Fig. 1 Schematic showing a hypothetical path that the key mutations in the WA outbreak could have taken in a susceptible population, alongside the inferred phylogeny.
(A) Scenario in which a hypothetical mutation occurs from WA1-like genomes. (B) Hypothetical phylogeny in which C17747 and A17858 from the original WA1 virus are maintained in the population and sampled at the end. (C) Hypothetical scenario in which a virus that differs from WA1 by one mutation (A17858G) is maintained in the population. (D) Observed tree from the WA outbreak.
To investigate whether the observed pattern of evolution reported in (1, 2) was consistent with the WA outbreak clade having descended from WA1, we used FAVITES (FrAmework for VIral Transmission and Evolution Simulation) to simulate outbreaks (16) (fig. S1 and table S1). These simulated outbreaks, including those that originated from so-called “superspreading” events (fig. S2), had a median doubling time of 4.7 days (95% range across simulations: 4.2 to 5.1 days) and a fixed evolutionary rate of 0.8 × 10⁻³ substitutions per site per year. A duration of 2 months (61 days) was chosen to reflect the time period between WA1 and the implementation of disease mitigation efforts that would affect the median doubling time.
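To give a sense of scale for these parameters, the doubling time can be translated into a daily growth rate and a number of doublings over the simulated period. This is only a sketch assuming idealized, deterministic exponential growth from a single infection; it is not the FAVITES model, which is stochastic and includes outbreaks that die out, and the function names are hypothetical.

```python
import math


def growth_rate_per_day(doubling_time_days):
    """Exponential growth rate r such that size(t) = exp(r * t)."""
    return math.log(2) / doubling_time_days


def number_of_doublings(duration_days, doubling_time_days):
    """How many times the epidemic doubles over the given duration."""
    return duration_days / doubling_time_days


def idealized_fold_increase(duration_days, doubling_time_days):
    """Fold increase under deterministic exponential growth (an upper-level idealization)."""
    return 2 ** number_of_doublings(duration_days, doubling_time_days)


# Median doubling time and duration reported for the Washington simulations.
r = growth_rate_per_day(4.7)
k = number_of_doublings(61, 4.7)
fold = idealized_fold_increase(61, 4.7)
print(f"r = {r:.3f} per day, {k:.1f} doublings, ~{fold:,.0f}-fold increase")
```

Roughly 13 doublings in 61 days is why even a modest sample of genomes would be expected to capture early-diverging lineages if they existed.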
We examined the phylogenetic structure of maximum likelihood trees inferred from subsampled simulated viral sequences to determine how frequently they matched the observed relationship between WA1 and the WA outbreak clade. Specifically, a simulation tree matching the observed tree must produce a single branch emanating from WA1 that experiences at least two mutations (C17747T and A17858G in the observed tree) before establishment of a single outbreak clade (Fig. 2A). Alternative patterns include: (i) a virus identical to WA1 (Fig. 2B), (ii) a virus that differs from WA1 by a single mutation (Fig. 2C), (iii) a viral lineage forming a basal polytomy with WA1 and the outbreak clade (Fig. 2D), and (iv) a viral lineage that is a “sibling” of the outbreak clade but experiences fewer than two mutations before divergence (Fig. 2E). The frequency of alternative phylogenetic patterns in the simulated epidemics represents the probability that the true topology (Fig. 2A) could not have occurred if the WA outbreak clade had been initiated by WA1.

Fig. 2 Potential phylogenetic relationships between WA1 and the WA outbreak clade and their occurrence probabilities.
(A) Observed pattern in which the WA1 genome is the direct ancestor of the outbreak clade, separated by at least two mutations. (B) Identical sequence to that of WA1. (C) Sequence that diverges from the WA1 sequence by one mutation. (D) Lineage forming a basal polytomy with WA1 and the outbreak clade. (E) Sibling lineage to the outbreak clade, with fewer than two mutations from WA1 before divergence. The frequency of each relationship across 1000 simulations is reported in the gray box.
In 70.1% of simulations, we observed at least one virus that is genetically identical to WA1, with a median of 12 identical viruses in each simulation (95% range: 0 to 85 identical viruses) (Fig. 2). Not observing a virus identical to WA1 in the real Washington data does not significantly differ from expectation (P = 0.299). However, viruses with one mutation from WA1 were observed in 95.5% of simulations, indicating a low probability of failing to detect even a single sequence from Washington within one mutation of WA1 (P = 0.045). Lineages forming a basal polytomy with WA1 and the epidemic clade were observed in 99.7% of simulations (P = 0.003), and 100% of simulations had at least one sibling lineage that diverged before experiencing two mutations and the formation of the outbreak clade (P < 0.001). Therefore, even if C17747T and A17858G were linked (a possibility because they are both nonsynonymous mutations located in the nsp13 helicase gene), we would still expect to see descendants of their predecessors in Washington before 15 March. In summary, when we simulated the Washington outbreak beginning with WA1 on 15 January 2020 and sampled 294 genomes in the first two months of this outbreak, we failed to observe a single simulated epidemic that had the characteristics of the real phylogeny (Fig. 2). These findings were robust to simulations that used a slower epidemic doubling time of 5.6 days (95% range: 5.2 to 5.9 days) or an accelerated substitution rate of 1.6 × 10⁻³ substitutions per site per year (16) (supplementary text).
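The P values above follow directly from the simulation frequencies: each is the fraction of simulations in which the alternative pattern was absent, as it was in the real data. A minimal sketch of that calculation is given below; the function names are hypothetical, and the counts are derived from the reported percentages under the assumption of 1000 simulations, as stated in the Fig. 2 caption.

```python
def empirical_p_value(n_with_pattern, n_simulations):
    """Fraction of simulations that, like the observed data, lack the pattern."""
    if not 0 <= n_with_pattern <= n_simulations:
        raise ValueError("count must lie between 0 and the number of simulations")
    return (n_simulations - n_with_pattern) / n_simulations


def format_p(p, n_simulations):
    """Report zero matches as an upper bound of one over the number of simulations."""
    if p == 0:
        return f"P < {1 / n_simulations:g}"
    return f"P = {p:.3f}"


# Counts implied by the reported percentages of 1000 simulations.
patterns = {
    "identical to WA1": 701,
    "one mutation from WA1": 955,
    "basal polytomy": 997,
    "sibling lineage": 1000,
}
for name, count in patterns.items():
    print(name, format_p(empirical_p_value(count, 1000), 1000))
```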
Although WA outbreak-related genomes lacking one or the other of the clade-defining substitutions C17747T and A17858G (Fig. 2, C and E) were absent in this initial large sample from Washington state, such genomes have been reported to be very common in nearby BC, Canada (supplementary text). Genomes with the ancestral C17747 state constituted 16 of the first 27 WA outbreak-related genomes sequenced in BC and have been sampled occasionally at much lower frequency in several U.S. states (3). Such a high frequency of these viruses in BC but not in Washington state raises the possibility that BC, rather than Washington state, was the site of introduction of the founding virus of this key lineage. Another possibility is that these BC genomes are descendants of a separate A.1 lineage introduction from China. The first scenario seems unlikely because of epidemiological evidence that the outbreak was larger in Washington state than in BC in February and March; the second scenario is unlikely because it would necessitate both introduced lineages to independently acquire the C17747T mutation.
We therefore considered a third hypothesis: that these 16 BC viral genomes contain a sequencing error at position 17747 and, in reality, bear the derived C17747T mutation. We reasoned that if this were the case, some of these genomes might share additional derived mutations with C17747T and A17858G genomes sampled in the same location (i.e., they might be identical or highly similar except for a spurious C17747 base) (supplementary text). As shown in Fig. 3, this is indeed the case: Each of the six C17747 genomes from BC that contain one or more derived mutations at positions other than 17747 and 17858 shared one to four of these mutations with other locally sampled genomes. Such a pattern is virtually impossible to explain through homoplasy events. Observing even one such homoplasy in a genome with more than 29,000 bases is rare; the probability of observing more than one is infinitesimally small. Similarly, the hypothesis that the C17747 state in these genomes is due to multiple, independent T17747C reversions is untenable. Occasional C17747 genomes from California, Oregon, Wyoming, Minnesota, Washington state, and elsewhere also share derived mutations with viruses sampled in the same location (Fig. 3, table S2, and supplementary text). Most of these genomes were generated through the amplicon-based ARTIC protocol, and we speculate that mistaken incorporation of a primer sequence containing C17747 (“nCov2019_58_RIGHT”) may be the cause.
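The comparison described above amounts to asking, for each genome with the ancestral C17747 state, which derived mutations outside sites 17747 and 17858 it shares with genomes sampled in the same location. The sketch below illustrates that logic on hypothetical toy data; the genome names and all mutations other than C17747T and A17858G are invented for illustration and do not correspond to real sequences.

```python
CLADE_DEFINING_SITES = {17747, 17858}


def site(mutation):
    """Extract the numeric position from a label such as 'C17747T'."""
    return int(mutation[1:-1])


def shared_derived_mutations(query, local_genomes):
    """Map each local genome to the derived mutations it shares with the query,
    ignoring the two clade-defining sites."""
    informative = {m for m in query if site(m) not in CLADE_DEFINING_SITES}
    shared = {}
    for name, mutations in local_genomes.items():
        common = informative & mutations
        if common:
            shared[name] = common
    return shared


# Hypothetical toy data: the query carries the ancestral C17747 state.
query = {"A17858G", "G1000A", "C2000T"}
local = {
    "local_1": {"C17747T", "A17858G", "G1000A", "C2000T"},
    "local_2": {"C17747T", "A17858G", "G3000T"},
}
result = shared_derived_mutations(query, local)
print(result)
```

In this toy case the query shares two derived mutations with `local_1`, the kind of match that is easier to explain by a spurious base call at 17747 than by repeated homoplasy.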

Fig. 3 Phylogeny of representative sequences, showing connections between sequences that share derived mutations despite differences at the key site 17747.
Derived mutations from ancestral states (relative to the reference sequence hCoV-19/Wuhan/Hu-1/2019|EPI_ISL_402125) are shown above each branch, with position numbers indicated. Branches are connected to taxon names with horizontal dotted lines. The taxon names include a two-letter state or province code, as well as the GISAID accession number. In cases for which more than one sequence is represented, the total number of additional, identical sequences is indicated after the “+” symbol. Sequences that share derived mutations are connected with colored lines on the right, with colors indicating the locations where the connected sequences were sampled. Some lines on the right are dashed for clarity. Names of sequences that contain the derived nucleotide at site 17747 are shaded in gray.
When we investigated an exhaustive collection of genomes sampled in Washington state, including those of viruses sampled after 15 March that are related to the WA outbreak clade (supplementary text), we detected a single virus lineage (“WA-S566,” sampled on 29 March 2020) that lacked the derived C17747T and A17858G mutations found in the rest of the WA outbreak clade. The phylogenetic position of this virus matches the pattern in Fig. 2D, though it differs from WA1 at seven additional sites. Hence, the observed pattern in this larger, and later, sample of ~1000 viral genomes reflects the scenario depicted in Fig. 1A rather than that in Fig. 1D. Consequently, we revisited our WA simulations, sampled 1000 genomes instead of the original 294, and looked for instances in which more than two lineages diverged before the formation of the outbreak clade. In 11.2% of the simulations, we observed two or fewer basally divergent lineages and, therefore, cannot reject a scenario in which WA1 gave rise to only two lineages that diverge as a basal polytomy (P = 0.112). However, in 99.0% of simulations, we observed three or more divergent lineages before two mutations (i.e., lineages that experienced zero or one mutation from WA1 before diverging; fig. S3). As a result, it is unlikely that, had it been the ancestral virus, WA1 would have given rise to only the S566 lineage and the WA outbreak clade (P = 0.010). Therefore, to explain the presence of S566 and the WA outbreak clade, we must seriously consider the possibility that there were multiple introductions of genetically similar viruses into the United States.
We thus turned to a distinct phylogeographic approach that explicitly considers the relatively late sampling time of WA-S566, along with other temporal, epidemiological, and geographic data. This method accounts for geographical gaps in sampling and integrates relevant covariates for global spatial spread in a Bayesian framework (16). We investigated how tree topologies were affected by the inclusion of unsampled viruses assigned to 12 of the most severely undersampled locations, both in China and globally, on the basis of COVID-19 incidence data (16). Realistic sampling time distributions were also inferred from COVID-19 incidence data. To better inform placement of unsampled viruses on the phylogeography, we adopted a generalized linear model formulation of the phylogenetic diffusion process (17). This approach estimates a significant contribution for both air passenger flow and asymmetric flow in and out of Hubei (both with Bayes factors >8042 and positive log effect sizes; supplementary text).
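Bayes factors for covariates in models of this kind are commonly expressed as the ratio of the posterior odds to the prior odds that a covariate is included. The sketch below shows only that generic ratio; the settings behind the values reported here are in the supplementary text and are not reproduced, so the function name and the probabilities used in the example are hypothetical.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds divided by prior odds for including a covariate.

    Both probabilities must lie strictly between 0 and 1; a posterior of
    exactly 1 would need a separate convention that is not assumed here.
    """
    for p in (posterior_prob, prior_prob):
        if not 0 < p < 1:
            raise ValueError("probabilities must be strictly between 0 and 1")
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds


# Hypothetical example: posterior inclusion 0.999 against an even prior.
bf = bayes_factor(0.999, 0.5)
print(f"Bayes factor = {bf:.0f}")
```

Under this definition, very large Bayes factors arise when a covariate is included in nearly every posterior sample.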
The resulting phylogeny (Fig. 4) provides one reconstruction of the possible evolutionary relationships of WA outbreak viruses and their closest relatives that realistically accounts for major gaps in sequence data. For low-diversity data, a single phylogeny has a resolution that is largely not supported by the full posterior tree distribution containing several plausible phylogeographic scenarios that must be considered, all of which are compatible with the genetic data [e.g., the mutation trees in (2) and those available at nextstrain.org]. The posterior maximum clade credibility (MCC) tree (Fig. 4) suggests that the WA outbreak clade (plus S566 and a sibling virus sampled in New York, “NY”) resulted from an introduction from Zhejiang, China, as supported by the clustering of sampled and unsampled taxa from this location. Additionally, although an introduction from a Chinese location other than Hubei yields considerable posterior support (bar chart inset in Fig. 4), Hubei is preferred over Zhejiang for the entire posterior sample as the most likely source for this introduction. Notably, although the genome from NY (near S566 in Fig. 4) is identical to that of WA1, its much more recent sampling time separates it from WA1 (and, similarly, early Chinese sampling) in the time-calibrated phylogeographic reconstruction. The more recent collection date for both this NY sample and S566, as well as modest support [posterior probability (pp) = 0.67] indicating that they share a U.S. location with the WA outbreak viruses, results in a reconstruction with a single introduction for these viruses. Using Markov jump estimates that account for phylogenetic uncertainty (18), we inferred 1 February 2020 [95% highest posterior density (HPD): 14 January to 15 February] as the time for this introduction, consistent with the observation that viruses from the WA outbreak clade were likely present during the voyage of the Grand Princess cruise ship to Mexico starting on 11 February (5).

Fig. 4 Hypothesis of SARS-CoV-2 entry into Washington state.
A subtree of the maximum clade credibility (MCC) tree is shown, depicting the evolutionary relationships inferred between (i) the first identified SARS-CoV-2 case in the United States (WA1); (ii) the clade associated with the Washington state outbreak (including WA2) and related viruses (WA-S566 and a virus from New York); and (iii) closely related viruses that were identified in multiple locations in Asia. Genome sequences sampled at the tips of the phylogeny are represented by circles shaded according to sampling location. Internal node circles, representing posterior clade support values, and branches are shaded similarly by location. Dotted lines represent branches associated with unsampled taxa assigned to Hubei and Zhejiang, China. Posterior location state probabilities are shown for three well-supported key nodes (circle color indicates inferred location state). The inset bar chart summarizes the probability by location for a second introduction giving rise to the WA outbreak clade. The mean date and 95% HPD intervals represent estimated time of introduction from Hubei.
Through a comparison with a time-inhomogeneous model, we show that our estimates are relatively robust to the assumption of constant covariate effect sizes through time (fig. S4). Although the time-inhomogeneous model was fitted to a dataset without unsampled viruses, it also provides strong support for an independent introduction from Hubei (fig. S5). Without unsampled taxa, we estimate a somewhat earlier date for the introduction of the ancestor of the WA outbreak clade plus S566 [26 January 2020 (95% HPD: 15 January to 7 February)], likely because the time-homogeneous analysis allows unsampled taxa from Hubei or other Chinese locations (as in the MCC tree in Fig. 4) to branch off near the WA outbreak clade. In the light of the travel restrictions, specifically those from Hubei, the earlier mean date obtained without unsampled taxa may be the more realistic estimate.

Fig. 5 MCC tree of SARS-CoV-2 entry into Europe.
A subtree was inferred for viruses from (i) the first outbreak in Europe (Germany, BavPat1) and identical viruses from China, (ii) outbreaks in Italy and New York, and (iii) other locations in Europe. Dotted lines represent branches associated with unsampled taxa assigned to Italy and Hubei, China. Country codes are shown at branch tips for genomes sampled from travelers returning from Italy (BR, Brazil; FI, Finland; DE, Germany; NG, Nigeria; MX, Mexico; GB, United Kingdom of Great Britain and Northern Ireland). The inset bar chart summarizes the probability distribution for the location state ancestral to the Italian clade. Other features as described in Fig. 4.
The MCC tree suggests that a Malaysian virus also descended from this introduction (i.e., that it resulted from a subsequent United States–to–Malaysia jump). It is, however, much more plausible that this virus was introduced directly from China to Malaysia, but both the sequence and covariate data in the phylogeographic model lack the information to strongly support this scenario. In light of the simulation results, there is a distinct possibility that S566 and the related NY virus may have descended from a separate introduction from Asia, with the site of arrival in the United States unresolved owing to the presence of both a West Coast and East Coast virus in the clade. Accordingly, an analysis that does not assign a known location to S566 and the related NY virus supports independent introductions from Hubei for these viruses and for the WA outbreak clade (fig. S6), with 7 February (95% HPD: 23 January to 18 February) as the date for the latter.
Consistent with estimates of the introduction date of this viral lineage into Washington state, the Seattle Flu Study tested 6908 archived samples from January and February, of which only 10, from the end of February, were positive (19). Our estimates of the introduction date of the WA outbreak clade into Washington state around the end of January or beginning of February 2020 are ~2 weeks later than they would be if the outbreak had originated with WA1’s arrival on 15 January (2), implying that: (i) archived “self-swab” samples retrospectively detected the virus within a few weeks of its arrival (19), (ii) this Washington state outbreak may have been smaller than estimates based on the assumption of a 15 January arrival of WA1, and (iii) the individual who introduced the founding virus likely arrived in the United States when entry to the country was suspended for non-U.S. residents from China (beginning on 2 February 2020) (20), perhaps during the concurrent period when ~40,000 U.S. residents were repatriated from China, with screening described as cursory or lax (21). These passengers were directed to a short list of airports, including those in Los Angeles, San Francisco, New York, Chicago, Newark, Detroit, and Seattle (21). The late-February timing of COVID-19 cases in Solano County and Santa Clara County in California (5) (supplementary text) suggests that self-limited outbreaks may have originated from returning U.S. residents during this period. So although our reconstructions incorporating unsampled lineages do not account for travel restrictions, the remaining influx likely provided an opportunity for a second introduction of virus (distinct from the WA1 lineage), or even multiple such introductions, into Washington state. Recent inferences that there have been >1000 independent introductions of SARS-CoV-2 into the United Kingdom (22) lend support to this idea.
Early establishment of SARS-CoV-2 in Europe
We used a similar approach to investigate whether the Northern Italy SARS-CoV-2 outbreak was introduced from the German outbreak or independently from China: We simulated the Northern Italy outbreak under the hypothetical constraint that it was initiated by a virus imported from the German outbreak (fig. S7) and conducted phylogeographic analyses (Fig. 5). Our simulation framework suggested that the outbreak in Bavaria, Germany, was unlikely to be responsible for initiating the Italian outbreak (see fig. S7 and supplementary results for detailed phylogenetic scenarios). We again used realistic epidemiological parameters to simulate the origins of the Italian outbreak under the assumption that it was associated with viruses genetically related to the German virus BavPat1. Simulations with a median doubling time of 3.4 days (95% range: 2.9 to 4.4 days) resulted in a median epidemic size (including outbreaks that died out) of 725 infections (95% range: 140 to 2847 infections) after 36 days. In the observed phylogeny, the Italian outbreak is the sole descendant lineage from BavPat1. Within the Italian outbreak, no viruses are identical to BavPat1, and 4 of the 27 related viruses included in this analysis are separated from BavPat1 by a single mutation. In simulation, the distributions of identical and one-mutation-divergent viruses are not significantly different from expectation (P = 0.156 and 0.157, respectively). However, the lack of at least one descendant lineage that forms a polytomy with BavPat1 and the Italian outbreak significantly differs from expectation (P = 0.004). Therefore, it is highly unlikely that BavPat1 or a virus identical to it initiated the Italian outbreak (fig. S7). As with the WA outbreak, these findings were robust to different infection rates and faster evolutionary rates (supplementary text). 
Notably, therefore, both a WA1-origin of the WA outbreak and a German origin of the Italian outbreak are rejected even by misspecified models of the epidemiological and evolutionary process.
An alternative scenario in which the outbreaks in both Germany and Italy were independently introduced from China is further supported by our phylogeographic inference (Fig. 5). The resulting reconstruction provides stronger support for independent viral introductions from China into Germany and into Italy (pp = 0.84) than for a direct connection between Germany and Italy (pp = 0.16) (Fig. 5). Similar support is obtained for this scenario by a time-inhomogeneous inference without unsampled taxa (fig. S8). These findings emphasize that epidemiological linkages inferred from genetically similar SARS-CoV-2 associated with outbreaks in different locations can be highly tenuous, given low levels of sampled viral genetic diversity and insufficient background data from key locations.
Our approach infers that the European B.1 clade (emanating from the green node labeled 0.86 in Fig. 5), which also dominates in NYC (14) and Arizona (23), originated in Italy, as might be expected from the epidemiological evidence. Both travel history and unsampled diversity contribute to this inference. Although only two samples in our dataset are from Italy, five additional genomes were obtained from people who arrived from Italy (Fig. 5). The unsampled taxa from Italy further contributed to a reconstruction with stronger support for Italy at the origin of the entire clade (Fig. 5 versus fig. S8; also see fig. S9). The introduction from Hubei to Italy was dated to 28 January 2020 (95% HPD: 20 January to 6 February). This Italian-and-European cluster, in turn, was the source of multiple introductions to NYC (14). Using the same approach, we dated the introduction that led to the largest NYC transmission cluster to 12 February 2020 (95% HPD: 3 February to 22 February). This is consistent with the finding that the earliest seropositive samples in NYC were from the week of 17 February through 23 February (24).
Hence, even though a second introduction into Washington state (independent of WA1) implies that the Washington transmission cluster had a more recent origin date than under the WA1-origin scenario (~1 February versus 15 January, if it had originated with WA1), the WA outbreak clade still predates the earliest genomically identified transmission clusters elsewhere in the United States: the large one in NYC (4) and two smaller, apparently self-limited clusters in California (in Solano County and Santa Clara County) that appear to have been introduced from China (5). Of these, the transmission cluster from Santa Clara County appears older, dating to before 22 February 2020 (95% HPD: 5 February to 29 February) (supplementary text).
Discussion
Despite the early successes in containment, SARS-CoV-2 eventually took hold in both Europe and North America during the first 2 months of 2020: first in Italy around the end of January, then in Washington state around the beginning of February, and then in NYC later that month. Our analyses therefore delineate when widespread community transmission was first established on both continents (Fig. 6) and clarify the period before SARS-CoV-2 establishment when contact tracing and isolation might have been most effective.

Fig. 6 SARS-CoV-2 introductions to Europe and the United States.
Pierce projection mapping of early and apparently “dead-end” introductions of SARS-CoV-2 to Europe and the United States. Successful dispersals between late January and mid-February are shown with solid arrows: from Hubei Province, China, to Northern Italy; from China to Washington state; and later from Europe (as the Italian outbreak spread more widely) to NYC and from China to California. Dashed arrows indicate dead-end introductions.
Our findings highlight the potential value of establishing intensive, community-level respiratory virus surveillance architectures, such as the Seattle Flu Study, during a pre-pandemic period. The value of detecting cases early, before they have bloomed into an outbreak, cannot be overstated in a pandemic situation (25). Given that every delay in case detection reduces the feasibility of containment, it is also worth assessing the impact of lengthy delays in the U.S. Food and Drug Administration’s approval of testing the Seattle Flu Study’s stored samples for SARS-CoV-2.
The public health response to the WA1 case in Washington state and the particularly impressive response to an early outbreak in Germany delayed local COVID-19 outbreaks by a few weeks and bought crucial time for U.S. and European cities, as well as those in other countries, to prepare for the virus when it finally did arrive. Surveillance efforts and genomic analyses subsequently helped close the gap between the onset of sustained community transmission and mitigation measures in Washington state, relative to other locales such as NYC. However, our evidence suggests that the period between the founding of the outbreak and the initiation of mitigation measures in Washington state was not as long as supposed under the WA1-origin hypothesis and that the outbreak may therefore have been smaller than some estimates based on that hypothesis.
Because the evolutionary rate of SARS-CoV-2 is slower than its transmission rate, many identical genomes are rapidly spreading. This genetic similarity places limitations on some inferences, such as calculating the ratio of imported cases to local transmissions in a given area. Nevertheless, we have shown that, precisely because of this slow rate, a difference of as little as one mutation between viral genomes can provide enough information for hypothesis testing when appropriate methods are employed. Bearing this in mind will put us in a better position to understand SARS-CoV-2 in the coming years.
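The point that many identical genomes circulate can be made concrete with a back-of-the-envelope sketch. This is not the authors' method: the substitution rate, genome length, and serial interval below are illustrative assumptions rather than values taken from this paper, and substitutions are modeled as a simple Poisson process.

```python
import math

# Illustrative assumptions (NOT values reported in this paper).
SUBS_PER_SITE_PER_YEAR = 8e-4  # assumed evolutionary rate
GENOME_LENGTH = 29_903         # assumed genome length in nucleotides
SERIAL_INTERVAL_DAYS = 5.0     # assumed time between successive infections


def expected_mutations(days: float) -> float:
    """Expected number of substitutions across the genome over `days`."""
    return SUBS_PER_SITE_PER_YEAR * GENOME_LENGTH * days / 365.0


def prob_identical(days: float) -> float:
    """Poisson probability that no substitution occurs over `days`."""
    return math.exp(-expected_mutations(days))


lam = expected_mutations(SERIAL_INTERVAL_DAYS)
p0 = prob_identical(SERIAL_INTERVAL_DAYS)
print(f"expected substitutions per transmission: {lam:.2f}")
print(f"probability that two linked genomes are identical: {p0:.2f}")
```

Under these assumed values, fewer than one substitution is expected per transmission, so most directly linked infections would carry identical genomes. A single shared or missing mutation is therefore a comparatively rare and informative event.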
Acknowledgments
We thank the patients and healthcare workers who made the collection of this global viral dataset possible and all those who made viral genomic data available for analysis. We thank N. Moshiri for guidance on FAVITES, T. Bedford for insights into how viral genomic inferences influenced public health responses in Washington state, and L. du Plessis for insights into the timing of the origin of the WA outbreak clade on the basis of Grand Princess voyage dates. Funding: M.W. was supported by the David and Lucile Packard Foundation as well as the University of Arizona College of Science. This work was supported by the Multinational Influenza Seasonal Mortality Study (MISMS), an ongoing international collaborative effort to understand influenza epidemiology and evolution, led by the Fogarty International Center, NIH. The research leading to these results has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant agreement 725422-ReservoirDOCS) and from the European Union’s Horizon 2020 project MOOD (grant agreement 874850). The Artic Network receives funding from the Wellcome Trust through project 206298/Z/17/Z. J.O.W. acknowledges funding from the NIH (K01AI110181, AI135992, and AI136056). P.L. acknowledges support by the Research Foundation–Flanders (“Fonds voor Wetenschappelijk Onderzoek–Vlaanderen,” G066215N, G0D5117N, and G0B9317N). M.A.S. acknowledges support from NIH U19 AI135995. J.B.J. is thankful for support from the Canadian Institutes of Health Research Coronavirus Rapid Response Programme 440371 and Genome Canada Bioinformatics and Computational Biology Programme grant 287PHY. J.P. acknowledges funding from the NIH (T15LM011271). V.H. acknowledges funding from the Biotechnology and Biological Sciences Research Council (BBSRC) (grant BB/M010996/1). The content of this paper is solely the responsibility of the authors and does not necessarily represent official views of the NIH. 
We gratefully acknowledge support from NVIDIA Corporation, with the donation of parallel computing resources used for this research. Author contributions: Conceptualization: M.W. Methodology: M.W., J.P., M.A.S., P.L., and J.O.W. Software: J.P., M.A.S., P.L., and J.O.W. Validation: J.P., M.A.S., and P.L. Formal analysis: M.W., J.P., P.L., and M.A.S. Investigation: M.W., J.P., B.B.L., J.B.J., A.R., M.I.N., and V.H. Resources: M.W., P.L., and M.A.S. Data curation: B.B.L., J.B.J., and V.H. Writing – original draft: M.W. and M.I.N. Writing – review and editing: M.W., B.B.L., M.A.S., J.O.W., J.B.J., and A.R. Visualization: B.B.L., J.O.W., and A.R. Supervision: M.W. and J.O.W. Project administration: M.W. Funding acquisition: M.W., M.A.S., and J.O.W. Competing interests: J.O.W. has received funding from Gilead Sciences, LLC (completed) and the CDC (ongoing) via grants and contracts to his institution that are unrelated to this research. M.A.S. receives funding from Janssen Research & Development, IQVIA and Private Health Management via contracts unrelated to this research. Data and materials availability: All data used in this analysis are free to access: a BEAST .xml file example, FAVITES simulated phylogenies, the GISAID accession numbers for all sequences used in the analysis, and alignments are hosted at GitHub (https://github.com/Worobeylab/SC2_outbreak) and Zenodo (26). The GISAID data are also provided in table S3. This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.
Supplementary Material
Summary
Materials and Methods
Supplementary Text
Figs. S1 to S9
Tables S1 to S4
MDAR Reproducibility Checklist
Resources
Correction (23 December 2020): The authors had originally referred readers to an external website that housed the acknowledgments table of the GISAID data used for the study. This table is now included in the supplementary materials as table S3, and the “Data and materials availability” section of the Acknowledgments has been updated. The authors regret the error and thank all GISAID contributors for the use of their data.
References and Notes
1
Trevor Bedford (@trvrb), “The team at the @seattleflustudy have sequenced the genome the #COVID19 community case reported yesterday from Snohomish County, WA, and have posted the sequence publicly to gisaid.org. There are some enormous implications here. 1/9” Twitter, 29 February 2020, 11:20 p.m.; https://twitter.com/trvrb/status/1233970271318503426.
2
T. Bedford, A. L. Greninger, P. Roychoudhury, L. M. Starita, M. Famulare, M.-L. Huang, A. Nalla, G. Pepper, A. Reinhardt, H. Xie, L. Shrestha, T. N. Nguyen, A. Adler, E. Brandstetter, S. Cho, D. Giroux, P. D. Han, K. Fay, C. D. Frazar, M. Ilcisin, K. Lacombe, J. Lee, A. Kiavand, M. Richardson, T. R. Sibley, M. Truong, C. R. Wolf, D. A. Nickerson, M. J. Rieder, J. A. Englund, The Seattle Flu Study Investigators, J. Hadfield, E. B. Hodcroft, J. Huddleston, L. H. Moncla, N. F. Müller, R. A. Neher, X. Deng, W. Gu, S. Federman, C. Chiu, J. S. Duchin, R. Gautom, G. Melly, B. Hiatt, P. Dykema, S. Lindquist, K. Queen, Y. Tao, A. Uehara, S. Tong, D. MacCannell, G. L. Armstrong, G. S. Baird, H. Y. Chu, J. Shendure, K. R. Jerome, Cryptic transmission of SARS-CoV-2 in Washington state. Science 370, 571–575 (2020).
3
J. R. Fauver, M. E. Petrone, E. B. Hodcroft, K. Shioda, H. Y. Ehrlich, A. G. Watts, C. B. F. Vogels, A. F. Brito, T. Alpert, A. Muyombwe, J. Razeq, R. Downing, N. R. Cheemarla, A. L. Wyllie, C. C. Kalinich, I. M. Ott, J. Quick, N. J. Loman, K. M. Neugebauer, A. L. Greninger, K. R. Jerome, P. Roychoudhury, H. Xie, L. Shrestha, M.-L. Huang, V. E. Pitzer, A. Iwasaki, S. B. Omer, K. Khan, I. I. Bogoch, R. A. Martinello, E. F. Foxman, M. L. Landry, R. A. Neher, A. I. Ko, N. D. Grubaugh, Coast-to-Coast Spread of SARS-CoV-2 during the Early Epidemic in the United States. Cell 181, 990–996.e5 (2020).
4
A. S. Gonzalez-Reiche, M. M. Hernandez, M. J. Sullivan, B. Ciferri, H. Alshammary, A. Obla, S. Fabre, G. Kleiner, J. Polanco, Z. Khan, B. Alburquerque, A. van de Guchte, J. Dutta, N. Francoeur, B. S. Melo, I. Oussenko, G. Deikus, J. Soto, S. H. Sridhar, Y.-C. Wang, K. Twyman, A. Kasarskis, D. R. Altman, M. Smith, R. Sebra, J. Aberg, F. Krammer, A. García-Sastre, M. Luksza, G. Patel, A. Paniz-Mondolfi, M. Gitman, E. M. Sordillo, V. Simon, H. van Bakel, Introductions and early spread of SARS-CoV-2 in the New York City area. Science 369, 297–301 (2020).
5
X. Deng, W. Gu, S. Federman, L. du Plessis, O. G. Pybus, N. R. Faria, C. Wang, G. Yu, B. Bushnell, C.-Y. Pan, H. Guevara, A. Sotomayor-Gonzalez, K. Zorn, A. Gopez, V. Servellita, E. Hsu, S. Miller, T. Bedford, A. L. Greninger, P. Roychoudhury, L. M. Starita, M. Famulare, H. Y. Chu, J. Shendure, K. R. Jerome, C. Anderson, K. Gangavarapu, M. Zeller, E. Spencer, K. G. Andersen, D. MacCannell, C. R. Paden, Y. Li, J. Zhang, S. Tong, G. Armstrong, S. Morrow, M. Willis, B. T. Matyas, S. Mase, O. Kasirye, M. Park, G. Masinde, C. Chan, A. T. Yu, S. J. Chai, E. Villarino, B. Bonin, D. A. Wadford, C. Y. Chiu, Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California. Science 369, 582–587 (2020).
6
Trevor Bedford (@trvrb), “This separate introduction may have been to British Columbia or may have been elsewhere. Better resolving this introduction geographically would benefit from additional sequencing of samples collected closer in time to the introduction event. 14/18.” Twitter, 25 May 2020, 7:35 p.m.; https://twitter.com/trvrb/status/1265063937663328256.
7
M. L. Holshue, C. DeBolt, S. Lindquist, K. H. Lofy, J. Wiesman, H. Bruce, C. Spitters, K. Ericson, S. Wilkerson, A. Tural, G. Diaz, A. Cohn, L. Fox, A. Patel, S. I. Gerber, L. Kim, S. Tong, X. Lu, S. Lindstrom, M. A. Pallansch, W. C. Weldon, H. M. Biggs, T. M. Uyeki, S. K. Pillai; Washington State 2019-nCoV Case Investigation Team, First Case of 2019 Novel Coronavirus in the United States. N. Engl. J. Med. 382, 929–936 (2020).
8
A. Harmon, “Inside the Race to Contain America’s First Coronavirus Case.” The New York Times, 5 February 2020; www.nytimes.com/2020/02/05/us/corona-virus-washington-state.html.
9
D. A. Bolduc, “Webasto disputes link to Italy coronavirus outbreak.” Automotive News, 9 March 2020; www.autonews.com/suppliers/webasto-disputes-link-italy-coronavirus-outbreak.
10
M. M. Böhmer, U. Buchholz, V. M. Corman, M. Hoch, K. Katz, D. V. Marosevic, S. Böhm, T. Woudenberg, N. Ackermann, R. Konrad, U. Eberle, B. Treis, A. Dangel, K. Bengs, V. Fingerle, A. Berger, S. Hörmansdorfer, S. Ippisch, B. Wicklein, A. Grahl, K. Pörtner, N. Muller, N. Zeitlmann, T. S. Boender, W. Cai, A. Reich, M. An der Heiden, U. Rexroth, O. Hamouda, J. Schneider, T. Veith, B. Mühlemann, R. Wölfel, M. Antwerpen, M. Walter, U. Protzer, B. Liebl, W. Haas, A. Sing, C. Drosten, A. Zapf, Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: A case series. Lancet Infect. Dis. 20, 920–928 (2020).
11
C. Rothe, M. Schunk, P. Sothmann, G. Bretzel, G. Froeschl, C. Wallrauch, T. Zimmer, V. Thiel, C. Janke, W. Guggemos, M. Seilmaier, C. Drosten, P. Vollmar, K. Zwirglmaier, S. Zange, R. Wölfel, M. Hoelscher, Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany. N. Engl. J. Med. 382, 970–971 (2020).
12
P. Forster, L. Forster, C. Renfrew, M. Forster, Phylogenetic network analysis of SARS-CoV-2 genomes. Proc. Natl. Acad. Sci. U.S.A. 117, 9241–9243 (2020).
13
A. Rambaut, E. C. Holmes, V. Hill, Á. O’Toole, J. T. McCrone, C. Ruis, L. du Plessis, O. G. Pybus, A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology. bioRxiv 2020.04.17.046086 [Preprint]. 19 April 2020. https://doi.org/10.1101/2020.04.17.046086
14
M. T. Maurano, S. Ramaswami, G. Westby, P. Zappile, D. Dimartino, G. Shen, X. Feng, A. M. Ribeiro-dos-Santos, N. A. Vulpescu, M. Black, M. Hogan, C. Marier, P. Meyn, Y. Zhang, J. Cadley, R. Ordonez, R. Luther, E. Huang, E. Guzman, A. Serrano, B. Belovarac, T. Gindin, A. Lytle, J. Pinnell, T. Vougiouklakis, L. Boytard, J. Chen, L. H. Lin, A. Rapkiewicz, V. Raabe, M. I. Samanovic-Golden, G. Jour, I. Osman, M. Aguero-Rosenfeld, M. J. Mulligan, P. Cotzia, M. Snuderl, A. Heguy, Sequencing identifies multiple, early introductions of SARS-CoV2 to New York City Region. medRxiv 2020.04.15.20064931 [Preprint]. 19 August 2020. doi:10.1101/2020.04.15.20064931.
15
E. Minskaia, T. Hertzig, A. E. Gorbalenya, V. Campanacci, C. Cambillau, B. Canard, J. Ziebuhr, Discovery of an RNA virus 3′→5′ exoribonuclease that is critically involved in coronavirus RNA synthesis. Proc. Natl. Acad. Sci. U.S.A. 103, 5108–5113 (2006).
16
Materials and methods are available as supplementary materials.
17
P. Lemey, A. Rambaut, T. Bedford, N. Faria, F. Bielejec, G. Baele, C. A. Russell, D. J. Smith, O. G. Pybus, D. Brockmann, M. A. Suchard, Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLOS Pathog. 10, e1003932 (2014).
18
V. N. Minin, M. A. Suchard, Fast, accurate and simulation-free stochastic mapping. Philos. Trans. R. Soc. B 363, 3985–3995 (2008).
19
H. Y. Chu, J. A. Englund, L. M. Starita, M. Famulare, E. Brandstetter, D. A. Nickerson, M. J. Rieder, A. Adler, K. Lacombe, A. E. Kim, C. Graham, J. Logue, C. R. Wolf, J. Heimonen, D. J. McCulloch, P. D. Han, T. R. Sibley, J. Lee, M. Ilcisin, K. Fay, R. Burstein, B. Martin, C. M. Lockwood, M. Thompson, B. Lutz, M. Jackson, J. P. Hughes, M. Boeckh, J. Shendure, T. Bedford; Seattle Flu Study Investigators, Early Detection of Covid-19 through a Citywide Pandemic Surveillance Platform. N. Engl. J. Med. 383, 185–187 (2020).
20
The White House, “Proclamation on Suspension of Entry as Immigrants and Nonimmigrants of Persons who Pose a Risk of Transmitting 2019 Novel Coronavirus” (2020); www.whitehouse.gov/presidential-actions/proclamation-suspension-entry-immigrants-nonimmigrants-persons-pose-risk-transmitting-2019-novel-coronavirus/.
21
S. Eder, H. Fountain, M. H. Keller, M. Xiao, A. Stevenson, “430,000 People Have Traveled From China to U.S. Since Coronavirus Surfaced.” The New York Times, 15 April 2020; www.nytimes.com/2020/04/04/us/coronavirus-china-travel-restrictions.html.
22
O. Pybus, A. Rambaut, L. du Plessis, A. E. Zarebski, M. U. G. Kraemer, J. Raghwani, B. Gutiérrez, V. Hill, J. McCrone, R. Colquhoun, B. Jackson, Á. O’Toole, J. Ashworth; COG-UK consortium, “Preliminary analysis of SARS-CoV-2 importation and establishment of UK transmission lineages.” Virological, 8 June 2020; https://virological.org/t/preliminary-analysis-of-sars-cov-2-importation-establishment-of-uk-transmission-lineages/507.
23
J. T. Ladner, B. B. Larsen, J. R. Bowers, C. M. Hepp, E. Bolyen, M. Folkerts, K. Sheridan, A. Pfeiffer, H. Yaglom, D. Lemmer, J. W. Sahl, E. A. Kaelin, R. Maqsood, N. A. Bokulich, G. Quirk, T. D. Watt, K. Komatsu, V. Waddell, E. S. Lim, J. G. Caporaso, D. M. Engelthaler, M. Worobey, P. Keim, Defining the Pandemic at the State Level: Sequence-Based Epidemiology of the SARS-CoV-2 virus by the Arizona COVID-19 Genomics Union (ACGU). medRxiv 2020.05.08.20095935 [Preprint]. 13 May 2020. doi:10.1101/2020.05.08.20095935.
24
D. Stadlbauer, J. Tan, K. Jiang, M. Hernandez, S. Fabre, F. Amanat, C. Teo, G. Asthagiri Arunkumar, M. McMahon, J. Jhang, M. Nowak, V. Simon, E. Sordillo, H. van Bakel, F. Krammer, Seroconversion of a city: Longitudinal monitoring of SARS-CoV-2 seroprevalence in New York City. medRxiv 2020.06.28.20142190 [Preprint]. 29 June 2020. doi:10.1101/2020.06.28.20142190.
25
M. Worobey, Epidemiology: Molecular mapping of Zika spread. Nature 546, 355–357 (2017).
26
Worobeylab, Worobeylab/SC2_outbreak: Release with published paper (Version v1.0), Zenodo (2020); http://doi.org/10.5281/zenodo.3979896.
27
Z. Dezső, A.-L. Barabási, Halting viruses in scale-free networks. Phys. Rev. E 65, 055103 (2002).
28
J. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba, J. Wallinga, J. Heijne, M. Sadkowska-Todys, M. Rosinska, W. J. Edmunds, Social contacts and mixing patterns relevant to the spread of infectious diseases. PLOS Med. 5, e74 (2008).
29
J. O. Lloyd-Smith, S. J. Schreiber, P. E. Kopp, W. M. Getz, Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359 (2005).
30
S. J. Spielman, C. O. Wilke, Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies. PLOS ONE 10, e0139047 (2015).
31
B. Q. Minh, H. A. Schmidt, O. Chernomor, D. Schrempf, M. D. Woodhams, A. von Haeseler, R. Lanfear, IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534 (2020).
32
J. Huerta-Cepas, F. Serra, P. Bork, ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol. Biol. Evol. 33, 1635–1638 (2016).
33
K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
34
A. Rambaut, T. T. Lam, L. Max Carvalho, O. G. Pybus, Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007 (2016).
35
N. Moshiri, M. Ragonnet-Cronin, J. O. Wertheim, S. Mirarab, FAVITES: Simultaneous simulation of transmission networks, phylogenetic trees and sequences. Bioinformatics 35, 1852–1861 (2019).
36
M. A. Suchard, P. Lemey, G. Baele, D. L. Ayres, A. J. Drummond, A. Rambaut, Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
37
D. L. Ayres, M. P. Cummings, G. Baele, A. E. Darling, P. O. Lewis, D. L. Swofford, J. P. Huelsenbeck, P. Lemey, A. Rambaut, M. A. Suchard, BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics. Syst. Biol. 68, 1052–1061 (2019).
38
P. Zhou, X.-L. Yang, X.-G. Wang, B. Hu, L. Zhang, W. Zhang, H.-R. Si, Y. Zhu, B. Li, C.-L. Huang, H.-D. Chen, J. Chen, Y. Luo, H. Guo, R.-D. Jiang, M.-Q. Liu, Y. Chen, X.-R. Shen, X. Wang, X.-S. Zheng, K. Zhao, Q.-J. Chen, F. Deng, L.-L. Liu, B. Yan, F.-X. Zhan, Y.-Y. Wang, G.-F. Xiao, Z.-L. Shi, A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020).
39
F. Bielejec, P. Lemey, G. Baele, A. Rambaut, M. A. Suchard, Inferring heterogeneous evolutionary processes through time: From sequence substitution to phylogeography. Syst. Biol. 63, 493–504 (2014).
40
A. Rambaut, A. J. Drummond, D. Xie, G. Baele, M. A. Suchard, Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Information & Authors
Information
Published In

Science
Volume 370 | Issue 6516
30 October 2020
Copyright
Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Submission history
Received: 18 May 2020
Accepted: 3 September 2020
Published in print: 30 October 2020
Funding Information
National Institutes of Health: K01AI110181
National Institutes of Health: AI135992
National Institutes of Health: AI136056
National Institutes of Health: AI135995
National Institutes of Health: T15LM011271
Canadian Institutes of Health Research Coronavirus Rapid Response Programme: 440371
Genome Canada for Bioinformatics and Computational Biology Programme 28PHY: N/A
Wellcome: 206298/Z/17/Z
H2020 European Research Council: 725422-ReservoirDOCS
H2020 European Research Council: 874850
Research Foundation -- Flanders: G066215N
Research Foundation -- Flanders: G0D5117N
Research Foundation -- Flanders: G0B9317N
Biotechnology and Biological Sciences Research Council: BB/M010996/1
Multinational Influenza Seasonal Mortality Study: N/A
Hunting for the origin of the globally dominant coronavirus type B-D614G: Next steps.
Peter Forster (1, 2), Christoph Becker (2), Bernd Brinkmann (2), Carsten Hohoff (2), Lucy Forster (3)
1. McDonald Institute for Archaeological Research, University of Cambridge, CB2 3ER, UK
2. Institute for Forensic Genetics, 48161 Muenster, Germany
3. Lakeside Healthcare Group at Cedar House Surgery, St Neots PE191BQ, United Kingdom
It is constructive and to be welcomed that Muehlemann and colleagues acknowledge our conclusion that symptomatic Bavarian Patient 4 had unwittingly escaped quarantine and had travelled towards the Italian border in January 2020. She carried two types of coronavirus (ancestral nucleotide 6446G and derived nucleotide 6446A). Hence, we ask of Muehlemann and colleagues that they correct the GISAID database entry for Bavarian Patient 4, which misleadingly scores position 6446 as "A" instead of "R" (A/G). Otherwise researchers may come to the mistaken conclusion that Bavarian Patient 4 cannot have led to the Italian outbreak, the Italian side having 6446G. We further request that Muehlemann and colleagues should assist the scientific community by noting, in the GISAID database, the mode of transport that Bavarian Patient 4 took from her Bavarian home to Innsbruck. With the information provided by ourselves and hopefully by Muehlemann and colleagues, Worobey et al. can proceed with designing realistic hypotheses and simulations which can address the question whether the globally dominant virus type B-D614G travelled from China via Bavaria to Italy, or directly from China to Italy.
Footnote: Most comments provided in the two responses are not useful and are ignored here. We focus on the major consequence of this correspondence, i.e. the necessary amendments of the primary data for Bavarian Patient 4.
RE: Independent marker mutations argue against the seeding of the SARS-CoV-2 outbreak in Northern Italy from Germany in early 2020
On January 27, 2020, the first known case of SARS-CoV-2 was reported in Germany in a patient working for a company in the Munich area. The index case, a person travelling from China, had visited the company for a business meeting between the 19th and 22nd of January, and tested positive for SARS-CoV-2 upon returning to China. Subsequently, a total of 16 cases associated with the German outbreak were identified. Following extensive contact tracing, isolation of confirmed cases, and quarantining of contacts, the outbreak was contained.
On February 20, 2020, the first case of SARS-CoV-2 was reported in Lombardy, Northern Italy. Subsequently, Northern Italy saw widespread transmission, and the lineage associated with the outbreak initiated outbreaks all over the world.
Genome sequencing showed that all the sequences from the Munich cases (belonging to lineage B) differed from the early sequences associated with the outbreak in Italy (lineage B.1) by at least one nucleotide, always including an identical substitution at position 14408 (Böhmer et al. 2020; Zehender et al. 2020). Mainstream media articles also reported that one of the patients from the Munich outbreak went on a skiing trip to Tyrol from 24 to 26 January, 2020 ("Frau Mit Coronavirus Im Kühtai" 2020, "Coronavirus: Erkrankte Aus Deutschland War in Tirol Zu Besuch" 2020). Together, this led to speculation that the outbreak in Northern Italy was caused by a virus introduced due to insufficient containment of the Munich outbreak (Forster et al. 2020).
Simulation and phylogeographic analyses (Worobey et al. 2020), as well as the detection of sequences belonging to lineage B in China, have already shown that the direct seeding of the Northern Italy outbreak from Munich is unlikely. Here, we highlight additional evidence against a spread of the disease from Munich to Northern Italy by drawing on two of our already published studies (Wölfel et al. 2020; Böhmer et al. 2020), particularly related to the patient who travelled to Tyrol before being diagnosed with the disease.
The sequences associated with the Munich outbreak can be divided into two groups. The first group comprises four sequences, none of which have the C → T mutation at position 14408 that was present in all early sequences from Northern Italy, including in the very first case sequenced from Lombardy. The four patients from whom those sequences were recovered either reported direct contact with the Chinese index case, with each other, or were household members. Sequences in the second group have an additional substitution, G → A at position 6446, relative to the sequences from the first group, and also do not have the C14408T mutation present in the majority of the early sequences from Northern Italy.
The epidemiological investigation suggests that the patient who travelled to Tyrol (Patient 4) was one of four cases in the Munich cluster infected directly by the index patient who travelled from China (Böhmer et al. 2020). Sequences generated from upper respiratory tract (oropharyngeal and nasopharyngeal swab) and sputum samples of this patient vary at position 6446. We found that the original nucleotide (G) is present in the sputum, while a derived nucleotide (A) is found in the upper respiratory tract (see Table 1 in Wölfel et al. 2020). The derived nucleotide was passed on to all but one of the second-generation cases in the Munich cluster that we were able to sequence (the exception is a household member of a patient with a sequence belonging to group one, mentioned above). The epidemiological investigation also suggests that Patient 4 transmitted the virus to at least one second-generation case before the skiing trip (Böhmer et al. 2020). Since all second-generation cases had the G6446A substitution, it is unlikely that Patient 4 passed on a virus lacking this mutation during the skiing trip and that the G6446A mutation arose and was transmitted to the other patients in the Munich cluster only afterwards.
In summary, two independent lines of evidence support the conclusion that the outbreak in Northern Italy was not due to a chain of transmission from Patient 4: 1) a mutation found in all early Italian sequences (C14408T) was not present in any patient sequenced in Munich, and 2) a mutation that had already occurred and been transmitted within the Munich cluster (G6446A) prior to a skiing trip to Tyrol by one Munich patient was not found thereafter in the Italian population.
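The two lines of evidence can be illustrated with a small Python sketch. It is a deliberate simplification and not the analysis performed in the cited studies: each group of sequences is reduced to the two substitutions named above, expressed relative to the Munich index-case sequence, and back-mutation is assumed not to occur. The names `could_descend_from` and `evidence_summary` are hypothetical.

```python
# Substitutions relative to the Munich index-case sequence, restricted to the
# two positions discussed in the text (a simplification for illustration).
MUNICH_GROUP_1 = frozenset()             # lacks both G6446A and C14408T
MUNICH_GROUP_2 = frozenset({"G6446A"})   # carries G6446A, lacks C14408T
EARLY_ITALY = frozenset({"C14408T"})     # carries C14408T, lacks G6446A


def could_descend_from(descendant, ancestor):
    """Return True if the descendant carries every substitution of the ancestor.

    Assumption: substitutions are never reverted along a transmission chain.
    """
    return ancestor <= descendant


def evidence_summary():
    """Evaluate both lines of evidence on the simplified genotypes."""
    return {
        # Evidence 1: is the Italian marker present in any Munich group?
        "c14408t_seen_in_munich": any(
            "C14408T" in group for group in (MUNICH_GROUP_1, MUNICH_GROUP_2)
        ),
        # Evidence 2: could the Italian genotype derive from the G6446A
        # variant that was already being transmitted before the ski trip?
        "italy_from_group_2": could_descend_from(EARLY_ITALY, MUNICH_GROUP_2),
        # Genotype alone cannot exclude group one as an ancestor.
        "italy_from_group_1": could_descend_from(EARLY_ITALY, MUNICH_GROUP_1),
    }


if __name__ == "__main__":
    for name, value in evidence_summary().items():
        print(f"{name}: {value}")
```

In this sketch the group-one check returns True, which is why the argument above also relies on the epidemiological timing: Patient 4 had already passed on the G6446A variant before the trip to Tyrol.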
References
Böhmer, Merle M., Udo Buchholz, Victor M. Corman, Martin Hoch, Katharina Katz, Durdica V. Marosevic, Stefanie Böhm, et al. 2020. "Investigation of a COVID-19 Outbreak in Germany Resulting from a Single Travel-Associated Primary Case: A Case Series." The Lancet Infectious Diseases. https://doi.org/10.1016/s1473-3099(20)30314-5.
"Coronavirus: Erkrankte Aus Deutschland War in Tirol Zu Besuch." 2020. Tiroler Tageszeitung. January 30, 2020. https://www.tt.com/artikel/30714426/coronavirus-erkrankte-aus-deutschlan....
Forster, Peter, Lucy Forster, Colin Renfrew, and Michael Forster. 2020. "Phylogenetic Network Analysis of SARS-CoV-2 Genomes." Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.2004999117.
"Frau Mit Coronavirus Im Kühtai." 2020. Red, tirol.ORF.at. January 31, 2020. https://tirol.orf.at/stories/3032480.
Wölfel, Roman, Victor M. Corman, Wolfgang Guggemos, Michael Seilmaier, Sabine Zange, Marcel A. Müller, Daniela Niemeyer, et al. 2020. "Virological Assessment of Hospitalized Patients with COVID-2019." Nature 581 (7809): 465–69.
Worobey, Michael, Jonathan Pekar, Brendan B. Larsen, Martha I. Nelson, Verity Hill, Jeffrey B. Joy, Andrew Rambaut, Marc A. Suchard, Joel O. Wertheim, and Philippe Lemey. 2020. "The Emergence of SARS-CoV-2 in Europe and North America." Science, September. https://doi.org/10.1126/science.abc8169.
Zehender, Gianguglielmo, Alessia Lai, Annalisa Bergna, Luca Meroni, Agostino Riva, Claudia Balotta, Maciej Tarkowski, et al. 2020. "Genomic Characterization and Phylogenetic Analysis of SARS-COV-2 in Italy." Journal of Medical Virology, March. https://doi.org/10.1002/jmv.25794.
Viral genomic epidemiologic evidence indicates that Germany was not the source of the Italian SARS-CoV-2 outbreak
We agree with Forster et al. (1) that it is vitally important to study how the lineage of SARS-CoV-2 that ignited the outbreak in Northern Italy originated, both to understand the effectiveness of Germany's early outbreak response and to understand the origins of viruses with the D614G substitution that became globally dominant. However, Forster et al. have misunderstood both our methods (2) and SARS-CoV-2 genomic evolution in general (3). Their conjecture that the Italian outbreak was seeded by a patient from a cluster of cases established in January 2020 near Munich, Germany, remains strongly refuted by the available evidence. Critically, Forster et al. fail to account for the enormous spatiotemporal sampling bias of genome sequences, a recurring problem in SARS-CoV-2 phylogenetic analysis that our paper explicitly sought to redress.
Forster et al.'s letter will leave readers with the misleading impression that Kühtai, Austria, where one of the Munich cluster cases travelled, is geographically close to the town of Codogno, where the earliest documented symptomatic case in Northern Italy occurred. In fact, Codogno is 420 km by the fastest road, or a ~5-hour drive, from Kühtai, not so different from the ~6-hour drive from Codogno to Munich.
Moreover, we are perplexed by Forster et al.'s assertion that our simulations (2) did not account for the possibility of travel linking the Italian outbreak to the German one. We were aware that unrestricted travel between Germany and Italy was occurring during January and February of 2020, and a major point of our simulation study was to explicitly consider that possibility: "We simulated the Northern Italy outbreak under the hypothetical constraint that it was initiated by a virus imported from the German outbreak" (2). Our results showed that, if the Italian outbreak had been established by a virus identical to that sampled from the first German patient in the Munich cluster (Patient 1) (4, 5), then it would be exceedingly unlikely to observe the particular genomic variants that actually circulated in Italy and beyond (all of which contain a distinguishing C14408T mutation, absent in the Munich cluster) without also seeing additional lineages derived from a Patient 1-like virus (P = 0.004; see ref. 2). Nothing proposed by Forster et al. challenges that finding.
To augment our earlier findings, we took the opportunity of this rebuttal to retrieve all Italian B.1 D614G genomes sampled prior to April 2020, many of which have recently become available in the GISAID repository. Of the 546 genomes now available, many are from Lombardy (n = 387), the epicenter in Italy. The data show none of the patterns expected under the 'German-origin' scenario: no genomes identical to Patient 1 in the Munich cluster, no genomes one mutation away (and different from the Italian C14408T variant), and no polytomy lineages—lineages that descended from the Munich cluster but took a different path than the C14408T-containing viruses; such genomes are expected to be observed if the Italian outbreak really did emerge from Germany rather than China (2). Moreover, not a single SARS-CoV-2 genome out of >40,000 sampled throughout Europe after the Munich outbreak—including ones from Munich or elsewhere in Germany, or from Austria, where Forster et al. speculate cryptic transmission might have become established by Patient 4—bears the genomic fingerprint of having descended from the Munich cluster.
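The screening logic described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the pipeline we used: each genome is represented only by its set of substitutions relative to Patient 1, the function names are hypothetical, and the toy input is invented purely to show the call pattern (the labels "N1" and "N2" stand in for arbitrary additional substitutions).

```python
from collections import Counter

ITALIAN_MARKER = "C14408T"  # substitution shared by the Italian B.1 viruses


def classify(mutations):
    """Assign a genome to one of the patterns discussed in the text.

    `mutations` is the set of substitutions relative to Patient 1.
    """
    muts = frozenset(mutations)
    if not muts:
        return "identical_to_patient_1"
    if ITALIAN_MARKER in muts:
        return "c14408t_lineage"
    if len(muts) == 1:
        return "one_step_non_c14408t"
    # Two or more substitutions without the Italian marker: a candidate for
    # a lineage that took a different path from Patient 1.
    return "other_munich_descendant"


def tally(genomes):
    """Count how many genomes fall into each pattern."""
    return Counter(classify(m) for m in genomes)


if __name__ == "__main__":
    # Hypothetical toy input, NOT real GISAID data.
    toy_genomes = [{"C14408T"}, {"C14408T", "N1"}, {"C14408T", "N2"}]
    print(tally(toy_genomes))
```

Under the 'German-origin' scenario one would expect non-zero counts in the categories other than `c14408t_lineage`; the real data, as described above, contain none.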
Regarding the claim that the "ancestral type" in the Italian genomes is found in Sichuan and Bavaria (i.e., the Munich area) but not Hubei Province (1), genomes identical to Patient 1 in the Munich cluster have indeed been sampled in Sichuan (as well as in Guangdong) and these were included in our reconstructions (2). But that does not mean that the virus was transmitted directly from either of these locations to Germany, or that either was the source when the ancestor of the B.1 lineage independently dispersed from China to Italy. It is more likely that they were all independently introduced from Hubei, the clear epicenter of the pandemic. In fact, it is well established that the patient who seeded the Munich cluster travelled from Shanghai but had been in contact with her parents who had been visiting from Wuhan just before her trip to Germany (4). This is the travel history information that was included in our reconstructions (2). It is odd that Forster et al. would make this assertion since they themselves note that the Munich cluster was established by a patient "with links to...Wuhan" (i.e., Hubei Province) (1).
Finally, with respect to Forster et al.'s comment that the Italian government had issued a flight ban between China and Italy from Jan 31 onwards, and the suggestion that this is somehow at odds with our conclusions, our estimate for the introduction of SARS-CoV-2 from China to Italy is January 28th, 2020 (95% highest posterior density interval: January 20th–February 6th) (2). So, not only is the credible interval compatible with an introduction prior to that travel ban, even the posterior mean estimate of the introduction date falls before those travel restrictions were put into effect.
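The date comparison is simple enough to check directly. The following Python sketch uses only the dates quoted above; the variable names are ours, and the final quantity is a count of calendar days, not a posterior probability.

```python
from datetime import date

# Dates quoted in the text above.
FLIGHT_BAN_START = date(2020, 1, 31)   # China-Italy flight ban takes effect
INTRO_ESTIMATE = date(2020, 1, 28)     # estimated introduction into Italy
HPD_LOWER = date(2020, 1, 20)          # lower bound of the 95% HPD interval
HPD_UPPER = date(2020, 2, 6)           # upper bound of the 95% HPD interval

# Does the point estimate precede the ban?
estimate_before_ban = INTRO_ESTIMATE < FLIGHT_BAN_START

# Does the interval include dates before the ban?
interval_includes_pre_ban = HPD_LOWER < FLIGHT_BAN_START

# Calendar days of the interval that fall before the ban (a plain day count,
# not a statement about posterior mass).
pre_ban_days = (min(HPD_UPPER, FLIGHT_BAN_START) - HPD_LOWER).days
interval_days = (HPD_UPPER - HPD_LOWER).days

if __name__ == "__main__":
    print(estimate_before_ban, interval_includes_pre_ban)
    print(pre_ban_days, "of", interval_days, "days precede the ban")
```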
If the Italian outbreak had originated in the fashion advocated by Forster et al. it would be important to know this and take what lessons the case would provide. The confusion or obfuscation on the part of Forster et al., however, threatens to muddy the waters our study helped clear: the available evidence indicates that the efforts to contain that early outbreak in Germany—rapid testing, contact tracing, case isolation, and quarantine—worked, and that the Italian outbreak, and all that followed from it, originated from an independent dispersal of the virus from China, not from Germany.
References
1. P. Forster, C. Becker, B. Brinkmann, C. Hohoff, L. Forster. Two views on the origin of the current SARS-CoV-2 strain. Science eLetter (2020).
2. M. Worobey, et al. The emergence of SARS-CoV-2 in Europe and North America. Science https://doi.org/10.1126/science.abc8169 (2020).
3. C. Mavian, et al. Sampling bias and incorrect rooting make phylogenetic network tracing of SARS-CoV-2 infection unreliable. Proc. Natl. Acad. Sci. U. S. A. 117, 12522-12523 (2020).
4. M. M. Böhmer, et al. Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: a case series. Lancet Infect. Dis. doi:10.1016/S1473-3099(20)30314-5 (2020).
5. R. Wölfel, et al. Virological Assessment of Hospitalized Patients with COVID-2019. Nature 581, 465–469 (2020).
JOW has received funding from Gilead Sciences, LLC (completed) and the CDC (ongoing) via grants and contracts to his institution unrelated to this research. MAS receives funding from Janssen Research & Development, IQVIA and Private Health Management via contracts unrelated to this research.
RE: Two views on the origin of the current SARS-CoV-2 strain
Two views on the origin of the current SARS-CoV-2 strain
Peter Forster1,2, Christoph Becker2, Bernd Brinkmann2, Carsten Hohoff2, Lucy Forster3
1 McDonald Institute for Archaeological Research, University of Cambridge, CB2 3ER, United Kingdom
2 Institute for Forensic Genetics, 48161 Muenster, Germany
3 Lakeside Healthcare Group at Cedar House Surgery, St Neots PE19 1BQ, United Kingdom
In the course of March and April 2020, a new SARS-CoV-2 genomic variant successfully spread worldwide, becoming the dominant viral subtype. We had distinguished ancestral A-types from derived B types and had classified this emergent type as a B-subtype in early April 2020 (Forster et al. 2020). We had suggested that such subtypes are the result of founder effects or of evolutionary adaptation. Korber et al. (2020) have since provided experimental data to support the latter suggestion. Merging our own nomenclature with that of Korber et al (2020), we will call this globally dominant subtype "B-D614G" for the purpose of this note.
Naturally it is of interest to understand where this globally dominant subtype arose. There is general agreement that B-D614G was prominent in the outbreak in Lombardy, northern Italy, with the first recorded indigenous Italian patient being a 38-year-old man from Codogno who reported symptoms starting on 2020 Feb 14 (Visetti 2020). The virus variant has since spread from Italy across Europe, North America, and the rest of the world. Our phylogenetic analysis (Forster et al. 2020) and initially that of others (reviewed by Kupferschmidt 2020) suggested a link to the Bavarian Patient 1, i.e. the first known German patient, who fell ill on 2020 Jan 24 following a visit from a Chinese colleague with links to Shanghai and Wuhan (Böhmer et al. 2020, Rothe et al. 2020, Wölfel et al. 2020). The possibility that the German outbreak penetrated to Italy was debated in Science in March 2020 (Kupferschmidt 2020). The alternative hypothesis discussed was that B-D614G was a direct introduction from China to Italy.
Now, Worobey et al. (2020) have used computer simulations to conclude that the German outbreak had been contained and that a hypothetical Chinese carrier from Hubei had by coincidence reintroduced the same viral genomic type from China to Italy between 2020 Jan 20 and Feb 7. However, we feel their conclusions may need reassessment in the light of the case history. First, the ancestral type in their data is found in Sichuan and Bavaria, but not in Hubei. Second, the Italian government had issued a flight ban between China and Italy from Jan 31 onwards, still in force at the time the first known Italian patient became ill on Feb 14. Third, it is far from certain that the German outbreak was contained by quarantine measures: it is not widely known that one of the first four German patients had travelled to Tyrol, Austria, where she stayed in the small ski resort of Kuehtai on Jan 24-26 in close contact with a group of 23 people, according to Austrian state and private media (ORF Tirol 2020, Tiroler Tageszeitung 2020). We, the authors PF and LF, are keen skiers and happen to be familiar with Kuehtai. More specifically, the group stayed, with other residents, in the hotel "Dortmunder Huette", which is served by an enclosed cabin lift (the Kaiserbahn) seating eight people per cabin. The resort is situated at 2000 metres' altitude in the valley and exceeds 2500 metres on the upper slopes, with a correspondingly cold climate. At weekends it is a popular skiing destination for local day trippers from Innsbruck, situated near the major Brenner Pass to Italy. Neither the 23 companions nor the other residents of the hotel were quarantined or extensively traced, in line with official policy at a time when asymptomatic carriers were not considered a significant risk (Tiroler Tageszeitung 2020).
The resort is only 20 miles (30 km), as the crow flies, from the North Italian border, and 150 miles (250 km) from Codogno, where the first known indigenous Italian patient developed symptoms 19 days later.
Clearly the theoretical simulations by Worobey and colleagues did not focus on this specific history. This may be because Austrian media reports dating from 2020 Jan 30 have not, to our knowledge, been taken up by the English-speaking media or by the scientific literature.
In conclusion, today's globally dominant virus strain B-D614G was in a symptomatic and unquarantined patient within a few miles of the Brenner Pass, one of the busiest gateways to northern Italy, in the period Jan 24-26, and in a genomic type that was immediately ancestral to the viral genome of the first known infected indigenous Italian across the open border, who fell ill 19 days later (Forster et al. 2020). No reproach can or should be made of the persons involved, as the alarm was not raised until Jan 27, and as there was no scientific consensus on asymptomatic transmission.
References
M. M. Böhmer, U. Buchholz, V. M. Corman, M. Hoch, K. Katz, D. V. Marosevic, S. Böhm, T. Woudenberg, N. Ackermann, R. Konrad, U. Eberle, B. Treis, A. Dangel, K. Bengs, V. Fingerle, A. Berger, S. Hörmansdorfer, S. Ippisch, B. Wicklein, A. Grahl, K. Pörtner, N. Muller, N. Zeitlmann, T. S. Boender, W. Cai, A. Reich, M. An der Heiden, U. Rexroth, O. Hamouda, J. Schneider, T. Veith, B. Mühlemann, R. Wölfel, M. Antwerpen, M. Walter, U. Protzer, B. Liebl, W. Haas, A. Sing, C. Drosten, A. Zapf, Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: A case series. Lancet Infect. Dis. 20, 920–928 (2020). doi:10.1016/S1473-3099(20)30314-5
P. Forster, L. Forster, C. Renfrew, M. Forster, Phylogenetic network analysis of SARS-CoV-2 genomes. Proc. Natl. Acad. Sci. U.S.A. 117, 9241–9243 (2020). doi:10.1073/pnas.2004999117
B. Korber, W. M. Fischer, S. Gnanakaran, H. Yoon, J. Theiler, W. Abfalterer, N. Hengartner, E. E. Giorgi, T. Bhattacharya, B. Foley, K. M. Hastie, M. D. Parker, D. G. Partridge, C. M. Evans, T. M. Freeman, T. I. de Silva; Sheffield COVID-19 Genomics Group, C. McDanal, L. G. Perez, H. Tang, A. Moon-Walker, S. P. Whelan, C. C. LaBranche, E. O. Saphire, D. C. Montefiori, Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell 182, 812–827.e19 (2020). doi:10.1016/j.cell.2020.06.043
K. Kupferschmidt, Mutations can reveal how the coronavirus moves – but they're easy to overinterpret. Science (2020). doi:10.1126/science.abb6526
ORF Tirol, Frau mit Coronavirus im Kühtai [Woman with coronavirus in Kühtai]. ORF Tirol, 30 January 2020. https://tirol.orf.at/stories/3032480
C. Rothe, M. Schunk, P. Sothmann, G. Bretzel, G. Froeschl, C. Wallrauch, T. Zimmer, V. Thiel, C. Janke, W. Guggemos, M. Seilmaier, C. Drosten, P. Vollmar, K. Zwirglmaier, S. Zange, R. Wölfel, M. Hoelscher, Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany. N. Engl. J. Med. 382, 970–971 (2020). doi:10.1056/NEJMc2001468
Tiroler Tageszeitung, Coronavirus: Erkrankte Deutsche war in Tirol zu Besuch [Coronavirus: Ill German woman was visiting Tyrol]. Tiroler Tageszeitung, 30 January 2020. https://www.tt.com/artikel/30714426/coronavirus-erkrankte-aus-deutschlan...
G. Visetti, Codogno, i medici dell'ospedale in trincea: "Quelle accuse del premier fanno più male della malattia" [Codogno, hospital doctors in the trenches: "Those accusations by the prime minister do more harm than the disease"]. La Repubblica, 26 February 2020.
R. Wölfel, V. M. Corman, W. Guggemos, M. Seilmaier, S. Zange, M. A. Müller, D. Niemeyer, T. C. Jones, P. Vollmar, C. Rothe, M. Hoelscher, T. Bleicker, S. Brünink, J. Schneider, R. Ehmann, K. Zwirglmaier, C. Drosten, C. Wendtner, Virological assessment of hospitalized patients with COVID-2019. Nature 581, 465–469 (2020). doi:10.1038/s41586-020-2196-x
M. Worobey, J. Pekar, B. B. Larsen, M. I. Nelson, V. Hill, J. B. Joy, A. Rambaut, M. A. Suchard, J. O. Wertheim, P. Lemey, The emergence of SARS-CoV-2 in Europe and North America. Science (2020). doi:10.1126/science.abc8169