Estimation of incubation period distribution of COVID-19 using disease onset forward time: A novel cross-sectional and forward follow-up study

This study estimated the incubation distribution of COVID-19 using the theory of renewal process.


INTRODUCTION
The Center for Disease Control and Prevention (CDC) of China and the World Health Organization are closely monitoring the current outbreak of coronavirus disease 2019 (COVID-19). As of 22 February 2020, the National Health Commission of China had confirmed a total of 76,936 cases of COVID-19 in mainland China, including 2442 fatalities and 22,888 recoveries (1). Various containment measures, including travel restrictions, isolation, and quarantine, have been implemented in China with the aim of minimizing virus transmission via human-to-human contact (2). Quarantine of individuals exposed to infectious pathogens has long been an effective approach for containing contagious diseases. One of the critical factors in determining the optimal quarantine of presymptomatic individuals is a good understanding of the incubation period, and this has been lacking for COVID-19.
The incubation period of an infectious disease is the time elapsed between infection and the appearance of the first symptoms and signs of disease. Precise knowledge of the incubation period helps to determine an optimal length of quarantine for disease control and is also essential for investigating the mechanism of transmission and developing treatments. For example, the distribution of the incubation period is used to estimate the reproductive number R, that is, the average number of secondary infections produced by a primary case. The reproductive number is a key quantity that affects the potential size of an epidemic. Despite the importance of the incubation period, it is often poorly estimated on the basis of limited data.
To the best of our knowledge, there are only a handful of studies estimating the incubation period of COVID-19. Among them are Li et al. (3), Zhang et al. (4), Guan et al. (5), Backer et al. (6), Linton et al. (7), and Lauer et al. (8). The estimates of the incubation period from these studies, together with results for two other coronavirus diseases, severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), are listed in Table 1. In Li et al. (3), the first 425 lab-confirmed cases, reported as of 22 January 2020, were included in the study, but the exact dates of exposure could be identified in only 10 of these cases. The distribution of the incubation period was subsequently approximated by fitting a lognormal distribution to these 10 data points, resulting in a mean incubation period of 5.2 days [95% confidence interval (CI): 4.1 to 7.0] and a 95th percentile of 12.5 days. Similarly, in Zhang et al. (4), 49 cases with no travel history who were identified by prospective contact tracing were used to estimate the incubation period by fitting a lognormal distribution, resulting in a mean incubation period of 5.2 days (1.8 to 12.4). However, given the limited sample size, it is challenging to make a solid inference on the distribution of the incubation period. A different result was reported by Guan et al. (5), based on 291 patients who had clear information regarding the specific date of exposure as of 29 January 2020, stating that the median incubation period was 4.0 days (interquartile range, 2 to 7). However, this estimate can be highly influenced by the individuals' recall bias or by interviewers' judgment on the possible dates of exposure, which may differ from the actual dates of exposure that, in turn, might not be accurately monitored and determined, thus leading to a high percentage of error. In Backer et al. (6), 88 confirmed cases detected outside Wuhan were used to estimate the distribution of the incubation period. For each selected case, a right-censored observation of the incubation period can be obtained from travel history and symptoms onset. The distribution of the incubation period can then be estimated by fitting a Weibull, Gamma, or lognormal distribution to the censored data. However, this method is subject to two types of sampling bias: (i) Patients with longer incubation periods who resided in Wuhan but developed symptoms outside Wuhan were more likely to be observed (a patient with a shorter incubation period would develop symptoms before the planned trip and possibly cancel the trip; hence, such a case would not be observed), which leads to an overestimation; (ii) if the follow-up time (from infection to the end of the study) is short, then only the shorter incubation periods would be observed, which leads to an underestimation. As an example of the second bias, assume that information on confirmed cases was collected from day 1 to day 10 and that two patients, A and B, were both infected on day 5. If patient A had an incubation period of 2 days while patient B had an incubation period of 8 days, then only patient A would be included in the data; patient B would develop symptoms after day 10 and hence would not be included. Linton et al. (7) proposed an approach similar to that of Backer et al. (6) with a larger sample size of 152 and, in addition, corrected the second sampling bias. However, the first sampling bias remains unresolved. In Lauer et al. (8), a pooled dataset of 181 cases was used to estimate the incubation period. All cases in the pooled data had identifiable exposure and symptom onset windows, of which 161 had a known recent history of travel to or residence in Wuhan, which was the same kind of data collected in Backer et al. (6) and Linton et al. (7); the others had evidence of contact with travelers from Hubei or persons with known infection. An approach similar to that of Backer et al. (6) was used, and the two sampling biases described above remain unresolved. Lauer et al. (8) reported that 2.5% of patients developed symptoms after 11.5 days and claimed that it was highly unlikely that further symptomatic infections would be undetected after 14 days, while the same coauthors reported that 5% of patients had symptoms onset after 14 days in the study of Bi et al. (9).
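The second sampling bias discussed above (a short follow-up window favoring short incubation periods) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reanalysis of any cited dataset: infection days are assumed uniform over a 10-day collection window, incubation periods are drawn from a Weibull distribution with illustrative shape and scale values, and the function name `simulate_truncation` is hypothetical.

```python
import random


def simulate_truncation(n_cases=20000, study_end=10.0, shape=2.0, scale=9.0, seed=1):
    """Compare the mean incubation period of all simulated cases with the
    mean among cases whose symptom onset falls inside the collection window.

    All parameter values are illustrative assumptions, not estimates
    taken from the paper.
    """
    rng = random.Random(seed)
    all_periods = []
    observed_periods = []
    for _ in range(n_cases):
        infection_day = rng.uniform(0.0, study_end)
        # random.weibullvariate takes (scale, shape), in that order.
        incubation = rng.weibullvariate(scale, shape)
        all_periods.append(incubation)
        # A case is recorded only if symptoms appear before collection ends.
        if infection_day + incubation <= study_end:
            observed_periods.append(incubation)
    true_mean = sum(all_periods) / len(all_periods)
    observed_mean = sum(observed_periods) / len(observed_periods)
    return true_mean, observed_mean


if __name__ == "__main__":
    true_mean, observed_mean = simulate_truncation()
    print(f"mean of all incubation periods:      {true_mean:.2f} days")
    print(f"mean among cases observed by day 10: {observed_mean:.2f} days")
```

Under these assumptions, the mean among observed cases falls below the mean of all simulated cases, which is the underestimation described in (ii).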
To overcome the aforementioned problems, we propose a novel method to estimate the incubation period of COVID-19 by using the well-known renewal theory in probability (10). Such a method enhances the accuracy of estimation by reducing recall bias and by using the abundant, readily available forward time data with a large sample size of 1084. To the best of our knowledge, our study of the distribution of the incubation period involves the largest number of samples to date. We find that the estimated median of the incubation period is 7.76 days (95% CI: 7.02 to 8.53), the mean is 8.29 days (95% CI: 7.67 to 8.90), the 90th percentile is 14.28 days (95% CI: 13.64 to 14.90), and the 99th percentile is 20.31 days (95% CI: 19.15 to 21.47). Furthermore, by including the possibility that a small portion of patients may contract the disease on their way out of Wuhan, the estimated tail probability that the incubation period is longer than 14 days is between 5 and 10%. In general, it is difficult to estimate the proportion of incubation periods beyond 14 days if the sample size is small. Because our sample size is much larger than that of other studies published to date, we have confidence in the robustness of our findings. Our estimated incubation period of COVID-19 is longer than those given by previous research on SARS, MERS, and COVID-19 in Table 1.

Motivations
As described in the previous section, the distribution of the incubation period in most of the literature is described either through a parametric model or through the empirical distribution of incubation periods observed in contact tracing data. However, contact tracing data are challenging and expensive to obtain, and their accuracy can be highly influenced by recall bias. Hence, a low-cost and high-accuracy method to estimate the incubation distribution is needed. In this study, we make use of confirmed cases detected outside Wuhan with known histories of travel or residency in Wuhan to estimate the distribution of incubation times. Renewal theory is applied by treating the incubation period of a prevalent case as a renewal process. See section S1 for more details of the renewal process and the corresponding assumptions. The date of symptoms onset in these data refers to the date reported by the patient on which the clinical symptoms first appeared, where the clinical symptoms include fever, cough, nausea, vomiting, diarrhea, and others. Among 12,963 confirmed cases, 6345 cases had their dates of symptom onset collected, 3169 cases had histories of travel or residency in Wuhan, 2514 cases had their dates of departure recorded, and 1922 cases had records of both dates of departure from Wuhan and dates of symptoms onset. However, not all 1922 cases should be included in the analysis. After examining the collected data, we identified a total of 1084 cases that met the criteria described in section S2 and were followed forward.
Figure 1 shows the design of the cross-sectional and forward follow-up study. The dot on the left end of each segment is a date of infection, while the square on the right end is a date of symptoms onset. The date of departure from Wuhan cuts the line segment in between. Note that only the cases shown as solid lines were followed in our cohort; the cases shown as dashed lines were not followed because their dates of departure from Wuhan are not between 19 January 2020 and 23 January 2020.
Among the 1084 cases with gender information in the study, 468 (43.30%) are female. The mean age of patients was 41.31 years, and the median age was 40 years. More than 80% of the cases were between 20 and 60 years old. The youngest confirmed case in our cohort was 6 months old, while the oldest was 86 years old. Table 2 shows the demographic characteristics of patients with COVID-19 in the Wuhan departure cohort and in the entire data collected as of 15 February 2020. Although there are slight differences between the selected cases and all cases, we explored the correlation between forward time and age and found it to be −0.0309. Hence, there is no evidence that the incubation time depends on age in this dataset, and the observed forward times should be able to represent those in the general population.
More demographic characteristics of patients are summarized in section S2.

Estimation of incubation period distribution of COVID-19
Let $Y$ be the incubation period of an infected case with probability density function $f(y)$, where $y > 0$. Let $A$ be the duration from infection in Wuhan to the departure from Wuhan, which can be considered as the backward time in a renewal process. Let $V$ denote the duration between the departure from Wuhan and the onset of symptoms, which can be considered as the forward time in a renewal process. Then, $V$ has the density

$$g(v) = \frac{\bar{F}(v)}{\mu}, \quad v > 0, \tag{1}$$

where $\bar{F}(\cdot)$ is the survival function corresponding to $f(\cdot)$, and $\mu = \int_0^\infty y f(y)\,dy$ is the mean incubation period. Note that $A$ and $V$ have the same density marginally, and the aforementioned sampling bias can be corrected by using Eq. 1. See more technical details in section S3.
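The forward time density can be evaluated numerically once a form for the incubation distribution is chosen. The sketch below is illustrative only: it assumes a Weibull incubation period with survival function exp(−(λy)^α), a parametrization that is our assumption (the source does not print the formula in this section), and the helper names and parameter values are hypothetical. It checks that the forward time density integrates to 1 and computes the mean forward time.

```python
import math


def weibull_survival(y, alpha, lam):
    """Survival function of the assumed Weibull incubation period."""
    return math.exp(-((lam * y) ** alpha))


def weibull_mean(alpha, lam):
    """Mean incubation period under the assumed parametrization."""
    return math.gamma(1.0 + 1.0 / alpha) / lam


def forward_density(v, alpha, lam):
    """Forward time density: survival function divided by the mean."""
    return weibull_survival(v, alpha, lam) / weibull_mean(alpha, lam)


def trapezoid(func, lower, upper, steps=20000):
    """Composite trapezoid rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


if __name__ == "__main__":
    alpha, lam = 2.0, 0.1  # illustrative values only
    area = trapezoid(lambda v: forward_density(v, alpha, lam), 0.0, 100.0)
    mean_forward = trapezoid(lambda v: v * forward_density(v, alpha, lam), 0.0, 100.0)
    print(f"area under forward density: {area:.4f}")
    print(f"mean forward time:          {mean_forward:.2f} days")
    print(f"mean incubation period:     {weibull_mean(alpha, lam):.2f} days")
```

The mean forward time is shorter than the mean incubation period in this example, which is why observed forward times cannot be read directly as incubation periods.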
In our cohort of COVID-19 cases, we assume that the incubation period is a Weibull random variable; the estimates in the Weibull model can be obtained by maximizing the corresponding likelihood function. The mean and percentiles of the incubation period can be calculated from the parametric Weibull distribution. The CIs in this study are obtained using the bootstrap method with B = 1000 resamples. Note that Gamma and lognormal distributions were also fitted for the incubation period; both provide estimates of quantiles similar to those from the Weibull distribution.
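The estimation steps above (maximum likelihood on the forward times, then bootstrap intervals) might be sketched as follows. Several things here are assumptions rather than the authors' implementation: the Weibull parametrization with survival function exp(−(λy)^α), the golden-section search over a profile likelihood, the percentile form of the bootstrap, and all function names. The data are synthetic, generated from the forward time density itself.

```python
import math
import random


def sample_forward_times(n, alpha, lam, rng):
    """Draw synthetic forward times; (lam * V) ** alpha is Gamma(1/alpha, 1)."""
    return [rng.gammavariate(1.0 / alpha, 1.0) ** (1.0 / alpha) / lam
            for _ in range(n)]


def profile_loglik(alpha, times):
    """Log likelihood at the best lam for a given alpha, and that lam."""
    n = len(times)
    power_sum = sum(v ** alpha for v in times)
    # Setting the derivative in lam to zero gives a closed form.
    lam = (n / (alpha * power_sum)) ** (1.0 / alpha)
    loglik = n * math.log(lam) - n * math.lgamma(1.0 + 1.0 / alpha) - n / alpha
    return loglik, lam


def fit_forward_weibull(times, lower=0.3, upper=8.0, tol=1e-5):
    """Golden-section search over alpha; assumes a single interior maximum."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if profile_loglik(c, times)[0] < profile_loglik(d, times)[0]:
            a = c  # the maximum lies to the right of c
        else:
            b = d  # the maximum lies to the left of d
    alpha = 0.5 * (a + b)
    return alpha, profile_loglik(alpha, times)[1]


def weibull_median(alpha, lam):
    """Median incubation period implied by the fitted parameters."""
    return math.log(2.0) ** (1.0 / alpha) / lam


def bootstrap_ci(times, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the median incubation period."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_boot):
        resample = [rng.choice(times) for _ in times]
        medians.append(weibull_median(*fit_forward_weibull(resample)))
    medians.sort()
    low_index = int(n_boot * (1.0 - level) / 2.0)
    high_index = int(n_boot * (1.0 + level) / 2.0) - 1
    return medians[low_index], medians[high_index]


if __name__ == "__main__":
    rng = random.Random(2020)
    times = sample_forward_times(1000, alpha=2.0, lam=0.1, rng=rng)
    alpha_hat, lam_hat = fit_forward_weibull(times)
    print(f"alpha_hat = {alpha_hat:.2f}, lam_hat = {lam_hat:.3f}")
    print("95% CI for the median:", bootstrap_ci(times, n_boot=200))
```

Profiling out λ reduces the problem to a one-dimensional search, which keeps the bootstrap loop cheap; a general-purpose optimizer over both parameters would serve equally well.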

Sensitivity analysis
It is arguable that people who left Wuhan might also have been infected on the day of departure, since they had a higher chance of being exposed to this highly contagious, human-to-human-transmitted virus in a crowded environment as cases were increasing. In this case, the duration between departure from Wuhan and onset of symptoms is no longer only the forward time but a mixture of the incubation period and the forward time. Unfortunately, it is unclear who was infected before departure and who was infected at the event of departure. Hence, a mixture sensitivity forward time model is proposed, that is,

$$h(v; \alpha, \lambda, \pi) = \pi f(v; \alpha, \lambda) + (1 - \pi)\, g(v; \alpha, \lambda), \tag{2}$$

where $\pi$ is the proportion of individuals infected at the event of departure, $f$ is the Weibull density of the incubation period with parameters $\alpha$ and $\lambda$, and $g$ is the corresponding density of the forward time (Eq. 1). If $\pi \neq 1$, then it is possible to identify all underlying parameters. We explore the sensitivity of the estimates of the incubation period by assuming a range of values of $\pi$, that is, $\pi = 0$, 0.05, 0.1, and 0.2, and estimate $\alpha$ and $\lambda$ by maximizing the product of likelihoods, $\prod_{i=1}^{I} h(v_i; \alpha, \lambda, \pi)$, with respect to $\alpha$ and $\lambda$, where $v_i$ is the observed forward time of the $i$th individual and $I$ is the sample size of the study cohort.
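The mixture likelihood used in the sensitivity analysis can be written down directly. This is a minimal sketch, assuming the Weibull density αλ(λy)^(α−1) exp(−(λy)^α) for the incubation period (our assumed parametrization); the function names and the toy forward times are hypothetical and serve only to show how the log likelihood is evaluated for each fixed mixing proportion.

```python
import math


def weibull_pdf(y, alpha, lam):
    """Assumed Weibull incubation density f(y; alpha, lam)."""
    return alpha * lam * (lam * y) ** (alpha - 1.0) * math.exp(-((lam * y) ** alpha))


def forward_pdf(v, alpha, lam):
    """Forward time density g(v; alpha, lam): survival divided by the mean."""
    return lam * math.exp(-((lam * v) ** alpha)) / math.gamma(1.0 + 1.0 / alpha)


def mixture_pdf(v, alpha, lam, pi):
    """Mixture density h: weight pi on f and weight 1 - pi on g."""
    return pi * weibull_pdf(v, alpha, lam) + (1.0 - pi) * forward_pdf(v, alpha, lam)


def mixture_loglik(times, alpha, lam, pi):
    """Log of the product of h(v_i) over the observed forward times.

    Times must be strictly positive when alpha < 1, because the Weibull
    density is unbounded at zero in that case.
    """
    return sum(math.log(mixture_pdf(v, alpha, lam, pi)) for v in times)


if __name__ == "__main__":
    toy_times = [1.0, 3.0, 4.0, 5.0, 5.0, 7.0, 9.0, 12.0]  # made-up values
    for pi in (0.0, 0.05, 0.1, 0.2):
        value = mixture_loglik(toy_times, alpha=2.0, lam=0.1, pi=pi)
        print(f"pi = {pi:.2f}: log likelihood = {value:.3f}")
```

For each fixed mixing proportion, any numerical optimizer could then maximize `mixture_loglik` over the two Weibull parameters.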

RESULTS
By fitting the observed forward times $v_i$ of the 1084 cases in our cohort to the likelihood function (Eq. 2), we find that $\pi = 0$ gives the largest log likelihood; hence, we set $\pi = 0$ as the reference scenario. The maximum likelihood estimates are $\hat{\alpha} = 1.97$ (95% CI: 1.75 to 2.28) and $\hat{\lambda} = 0.11$ (95% CI: 0.10 to 0.12) in our reference scenario. The estimated 5th, 25th, 50th, 75th, 90th, 95th, 99th, and 99.9th percentiles of the incubation period are 2.07 (95% CI: 1.60 to 2.69), 4.97 (95% CI: 4.25 to 5.78), 7.76 (95% CI: 7.02 to 8.53), 11.04 (95% CI: 10.34 to 11.66), 14.28 (95% CI: 13.64 to 14.90), 16.32 (95% CI: 15.62 to 17.04), 20.31 (95% CI: 19.15 to 21.47), and 24.95 (95% CI: 23.04 to 26.81) days, respectively. The mean incubation period is 8.29 (95% CI: 7.67 to 8.90) days. Estimates based on the Gamma and lognormal distributions provide very similar results: the 50th percentile is 8.16 and 8.42, respectively, the 90th percentile is 14.23 and 14.11, respectively, and the log likelihoods are −2843.34 and −2845.57, which are slightly smaller than that of the Weibull distribution. The average time from leaving Wuhan to symptom onset is 5.30 days, the sample median is 5 days, and the maximum is 22 days. Figure 2 overlays the fitted density function in Eq. 2 (solid line) on the histogram of observed forward times; the dashed line is the Weibull probability density function for the incubation period distribution. Note that Eq. 2 fits the observed forward times well, suggesting that our model is reasonable and the results are therefore trustworthy.
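The reported mean and percentiles follow from the fitted Weibull parameters in closed form. The sketch below assumes the parametrization with survival function exp(−(λy)^α), which is our assumption. Because the published scale estimate 0.11 is rounded to two decimals, plugging it in gives somewhat smaller values than those reported; the value 0.107 used alongside it is our own back-calculation from the reported median, not a figure from the paper.

```python
import math


def weibull_quantile(p, alpha, lam):
    """Quantile of the assumed Weibull incubation distribution."""
    return (-math.log(1.0 - p)) ** (1.0 / alpha) / lam


def weibull_mean(alpha, lam):
    """Mean of the assumed Weibull incubation distribution."""
    return math.gamma(1.0 + 1.0 / alpha) / lam


def tail_probability(days, alpha, lam):
    """Probability that the incubation period exceeds the given number of days."""
    return math.exp(-((lam * days) ** alpha))


if __name__ == "__main__":
    alpha_hat = 1.97
    # 0.11 is the rounded published estimate; 0.107 is a back-calculated
    # value (an assumption) that rounds to 0.11.
    for lam_hat in (0.11, 0.107):
        print(f"lam_hat = {lam_hat}")
        print(f"  mean            = {weibull_mean(alpha_hat, lam_hat):.2f} days")
        for p in (0.50, 0.90, 0.99):
            q = weibull_quantile(p, alpha_hat, lam_hat)
            print(f"  {int(p * 100)}th percentile = {q:.2f} days")
        print(f"  P(incubation > 14 days) = {tail_probability(14.0, alpha_hat, lam_hat):.3f}")
```

The comparison of the two scale values shows how sensitive the upper percentiles are to the second decimal of the scale parameter.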
Table 3 summarizes the estimates of the parameters and the mean and percentiles of the incubation period. We can see that the estimates for the mean and percentiles decrease as the proportion of people who were infected at the event of departure, $\pi$, increases. However, the variation of the results from $\pi = 0$ to 0.2 is only about 1 day, which we believe is still in an acceptable range.

DISCUSSION
A sound estimate of the distribution of the incubation period plays a vital role in epidemiology. Its applications include decisions regarding the length of quarantine for prevention and control, dynamic models that accurately predict the disease process, and determination of the contaminated source in foodborne outbreaks. Here, we propose a novel method to estimate the incubation distribution that only requires information on travel histories and dates of symptoms onset. This method enhances the accuracy of estimation by reducing recall bias and by using the abundant, readily available forward time data. To the best of our knowledge, this study of the incubation period involves the largest number of samples to date. In addition, this is the first article to consider the incubation period for COVID-19 as a renewal process, which is a well-studied methodology and has a solid theoretical foundation. The estimated incubation period has a median of 7.76 days (95% CI: 7.02 to 8.53) and a mean of 8.29 days (95% CI: 7.67 to 8.90), the 90th percentile is 14.28 days (95% CI: 13.64 to 14.90), and the 99th percentile is 20.31 days (95% CI: 19.15 to 21.47). By including the possibility that a small portion of patients may contract the disease on their way out of Wuhan, the estimated tail probability that the incubation period is longer than 14 days is between 5 and 10%.
Compared with the results published in Li et al. (3), Guan et al. (5), Backer et al. (6), and Linton et al. (7), the incubation period estimated in our study is notably longer. Below is some evidence that may support our finding of a long incubation period:
1) In the study of Guan et al. (5) on behalf of the China Medical Treatment Expert Group for COVID-19, the incubation period had a reported median of 4 days, a first quartile of 2 days, and a third quartile of 7 days. By fitting a commonly used Weibull distribution to these quartiles, we obtain $\hat{\alpha} = 1.24$ and $\hat{\lambda} = 0.186$, as defined in Eq. 2. As a consequence, the estimated 90th, 95th, and 99th percentiles are, respectively, 10.54, 13.04, and 18.45 days, which indicates that some patients may have extended incubation periods. In addition, in the commentary published in NEJMqianyan by the authors of Guan et al. (5), it was reported that the incubation period of one patient in each of the severe and nonsevere groups was up to 24 days, with 13 cases (12.7%) having an incubation period greater than 14 days and 8 cases (7.3%) having an incubation period greater than 18 days, which is close to what we have found in our study (11).
2) One particular case reported by the Yibin municipal health commission in China was a 64-year-old female diagnosed with COVID-19 on 11 February 2020 in Yibin, Sichuan Province, 20 days after returning from Wuhan. This patient was under self-quarantine at home with her family for 18 days, from January 23 to February 9. On February 8, the patient developed mild symptoms of cough with sputum production (12).
3) It was reported in Bai et al. (13) that the incubation period for patient 1 was 19 days. However, the claimed 19-day incubation was the time difference between departure from Wuhan and symptoms onset, namely, the forward time in our study. The actual incubation period should be longer than 19 days.
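The Weibull fit to the quartiles of Guan et al. (5) mentioned in point 1 can be approximated in closed form. The source does not say how the fit to the three quartiles was carried out; matching two of them exactly (here the median and the third quartile) is one simple possibility and an assumption on our part, as are the function names and the parametrization with survival function exp(−(λy)^α).

```python
import math


def weibull_from_two_quantiles(p1, q1, p2, q2):
    """Solve for (alpha, lam) so that the Weibull quantiles at
    probabilities p1 and p2 equal q1 and q2 exactly."""
    c1 = -math.log(1.0 - p1)
    c2 = -math.log(1.0 - p2)
    # From log(q) = log(c) / alpha - log(lam), subtract the two equations.
    alpha = math.log(c2 / c1) / math.log(q2 / q1)
    lam = c1 ** (1.0 / alpha) / q1
    return alpha, lam


def weibull_quantile(p, alpha, lam):
    """Quantile of the assumed Weibull incubation distribution."""
    return (-math.log(1.0 - p)) ** (1.0 / alpha) / lam


if __name__ == "__main__":
    # Median of 4 days and third quartile of 7 days, as reported in (5).
    alpha, lam = weibull_from_two_quantiles(0.50, 4.0, 0.75, 7.0)
    print(f"alpha = {alpha:.3f}, lam = {lam:.4f}")
    print(f"implied first quartile = {weibull_quantile(0.25, alpha, lam):.2f} days")
    for p in (0.90, 0.95, 0.99):
        print(f"{int(p * 100)}th percentile = {weibull_quantile(p, alpha, lam):.2f} days")
```

With these two quartiles as input, the solved parameters land close to the values quoted in point 1, and the implied first quartile stays near the reported 2 days.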
On the basis of the estimated incubation distribution in this study, about 10% of patients with COVID-19 would develop symptoms more than 14 days after infection. This may be a public health concern in regard to the current 14-day quarantine period. Our approach requires that certain assumptions be met, which we detail below.
1) The collection of forward times depends on the follow-up time; that is, if the follow-up time is not long enough, then we would only be able to include those with a shorter incubation period in the Wuhan departure cohort. This limitation may lead to an underestimation of the incubation period. The same limitation also applies to Backer et al. (6) and Linton et al. (7). However, as explained earlier, we only included cases who left Wuhan before January 23 in this study, which leaves an average follow-up time of 25 days. Hence, given that the longest incubation period reported in Guan et al. (5) was 24 days, it is less likely that we missed patients with longer incubation periods. Note that the 24-day incubation period was reported as an outlier in Guan et al. (5).
2) We assume that the individuals included in our cohort were infected either in Wuhan or on the way from Wuhan to their destination; violation of this assumption leads to an overestimation of the incubation period. The same limitation also applies to Backer et al. (6), Linton et al. (7), and Lauer et al. (8). However, with a carefully selected cohort justified in Methods, the chance that an individual in the Wuhan departure cohort was infected outside Wuhan should be relatively small. Nonetheless, we acknowledge that this possibility exists; for example, a family member could have been uninfected at the time of departing Wuhan but then infected by other family members or outside contacts after leaving Wuhan. A sensitivity analysis was also conducted by removing from the Wuhan departure cohort all cases who left Wuhan with their families, and we found that it resulted in only a small change in the estimated distribution of the incubation period.
3) Individuals in our selected cohort were those who were infected in the early days of the outbreak. They were likely first- or second-generation cases. Our results do not apply to higher-generation cases if the virus mutates.

Fig. 1. Illustration of our cross-sectional and forward follow-up study. Backward and incubation periods are not observed, while Wuhan departure and forward time are observed.

Table 1. Estimates for the incubation periods of SARS, MERS, and COVID-19.
NA, not available.
Data collection and justification
Publicly available data were retrieved from provincial and municipal health commissions in China and the ministries of health in other countries, including 12,963 confirmed cases outside Hubei Province as of 15 February 2020. Detailed information on confirmed cases includes region, gender, age, date of symptom onset, date of confirmation, history of travel or residency in Wuhan, and date of departure from Wuhan.

Table 2. Comparison between the demographic characteristics of patients with COVID-19 in the study cohort and all publicly available cases collected as of 15 February 2020.
Fig. 2. Histogram and estimated probability density functions for the time from Wuhan departure to symptoms onset, i.e., forward time.