Inference of SARS-CoV-2 infections with RmMAP
We aim to estimate the number of infected individuals over time, $I = (I_t)$, given a series of observed COVID-19–attributed deaths $D = (D_t)$ and a known onset-to-death distribution T. We use a Poisson deconvolution model for deaths given I,

$$D_t \mid I \sim \mathrm{Poisson}\Big(\sum_{s} I_{t-s}\, p_s\Big), \qquad (1)$$

where $p_s$ is the probability that the onset-to-death time equals s days. Estimates of I maximizing the likelihood in Eq. 1 can be obtained with an expectation maximization algorithm (6), but the outcome is typically unstable (36). RmMAP overcomes this issue by adding a quadratic penalty to the log-likelihood and maximizing the penalized objective iteratively.
Scaling the final series by the inverse IFR yields the inferred number of infected individuals over time. A detailed discussion of this method, along with a sensitivity analysis and a comparison with existing methodology, is presented in the supplementary materials.
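The deconvolution step can be illustrated with a minimal sketch of the classical, unpenalized EM update for a Poisson deconvolution model. This is not the RmMAP iteration itself, which additionally includes the quadratic penalty. The function names, the toy delay distribution, and the IFR value are assumptions made for illustration only.

```python
def convolve_delay(onsets, delay_pmf):
    """Expected deaths per day: mu_t = sum_s onsets[t - s] * delay_pmf[s]."""
    n = len(onsets)
    return [
        sum(onsets[t - s] * delay_pmf[s] for s in range(min(t + 1, len(delay_pmf))))
        for t in range(n)
    ]


def em_deconvolve(deaths, delay_pmf, n_iter=200):
    """Plain EM updates (no penalty) for D_t ~ Poisson(sum_s I_{t-s} p_s)."""
    n = len(deaths)
    # q[u]: probability that an onset on day u yields a death inside the window
    q = [sum(delay_pmf[: n - u]) for u in range(n)]
    est = [max(sum(deaths) / n, 1e-9)] * n  # flat, strictly positive start
    for _ in range(n_iter):
        mu = convolve_delay(est, delay_pmf)
        updated = []
        for u in range(n):
            last = min(n, u + len(delay_pmf))
            # Back-project the ratio of observed to expected deaths
            back = sum(
                delay_pmf[t - u] * deaths[t] / mu[t]
                for t in range(u, last)
                if mu[t] > 0
            )
            updated.append(est[u] * back / q[u] if q[u] > 0 else 0.0)
        est = updated
    return est


def scale_by_inverse_ifr(fatal_onsets, ifr):
    """Turn onsets that ended in death into total infections."""
    return [x / ifr for x in fatal_onsets]


# Toy example: hypothetical delay distribution and noiseless death series
delay_pmf = [0.1, 0.3, 0.4, 0.2]
true_onsets = [0, 0, 10, 20, 40, 20, 10, 0, 0, 0]
deaths = convolve_delay(true_onsets, delay_pmf)
fatal_onsets = em_deconvolve(deaths, delay_pmf)
infections = scale_by_inverse_ifr(fatal_onsets, ifr=0.01)
```

Each update preserves the total expected number of deaths, which is a convenient sanity check. With noisy counts, running these plain updates for many iterations tends to produce the spiky estimates that motivate a penalized variant.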
Estimation of excess deaths
We used Gaussian process (GP) regression (17) to estimate excess deaths for 2020. GPs can be understood as an infinite-dimensional Bayesian regression. In the finite-dimensional case, one fits

$$y_i = x_i^\top \beta + \epsilon_i,$$

where $\epsilon_i$ are independent, identically distributed Gaussian errors, $x_i$ are covariates, and $\beta$ are coefficients sampled from a prior. Likewise, with GPs we fit

$$y_i = f(x_i) + \epsilon_i,$$

where $f$ is a function sampled from a prior over functions. GPs are appealing because the complexity of the fitted function adapts automatically to the complexity of the data and because they are computationally tractable.
Priors over f are specified through a kernel K, which encodes the correlation structure of the data, so that $K(x, x')$ is simply the “prior” covariance between $f(x)$ and $f(x')$. K depends on a finite number of unknowns θ (so $K = K_\theta$) that have to be inferred as well.
We used a GP to account for both long-term trends in mortality and seasonality. As in (17), we consider kernels that combine two components: an exponential kernel representing the long-term variation and a periodic times exponential kernel representing seasonal variation. We considered an additional source of unstructured randomness through a separate noise term. We performed Bayesian inference (Markov chain Monte Carlo) over the joint distribution of parameters and death counts for each time period of 2020, based on 2000–2019 all-cause mortality data and suitable priors for the parameters. In the supplementary materials, we comment on more specific aspects and provide an extensive evaluation of our model.
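To make the kernel structure concrete, the following is a minimal sketch of a GP baseline with a long-term component plus a seasonal component. Several choices here are assumptions rather than the model used in the paper: the two components are added, both use squared-exponential forms, the hyperparameters are fixed by hand instead of being inferred by MCMC, and the data are a standardized toy series. All function names are hypothetical.

```python
import math


def k_long(t1, t2, amp=1.0, length=5.0):
    # Squared-exponential kernel for slow, long-term variation (assumed form)
    return amp**2 * math.exp(-0.5 * ((t1 - t2) / length) ** 2)


def k_seasonal(t1, t2, amp=1.0, period=1.0, smooth=1.0, decay=10.0):
    # Periodic kernel times a squared-exponential envelope (assumed form)
    periodic = math.exp(-2.0 * math.sin(math.pi * (t1 - t2) / period) ** 2 / smooth**2)
    envelope = math.exp(-0.5 * ((t1 - t2) / decay) ** 2)
    return amp**2 * periodic * envelope


def kernel(t1, t2):
    # Assumed additive combination of the two components
    return k_long(t1, t2) + k_seasonal(t1, t2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gp_posterior_mean(t_obs, y_obs, t_new, noise_sd=0.1):
    """Posterior mean at t_new for fixed hyperparameters and iid Gaussian noise."""
    gram = [
        [kernel(a, b) + (noise_sd**2 if i == j else 0.0) for j, b in enumerate(t_obs)]
        for i, a in enumerate(t_obs)
    ]
    weights = solve(gram, y_obs)
    return [sum(kernel(t, a) * w for a, w in zip(t_obs, weights)) for t in t_new]


# Toy data: four years of monthly, standardized mortality with trend and season
t_obs = [i / 12 for i in range(48)]
y_obs = [math.sin(2 * math.pi * t) + 0.1 * t for t in t_obs]
t_new = [4 + i / 12 for i in range(12)]  # the year to be predicted
baseline = gp_posterior_mean(t_obs, y_obs, t_new)
```

In this sketch, excess deaths for a period would be the observed value minus the corresponding entry of `baseline`. A full Bayesian treatment would also propagate uncertainty in the hyperparameters and in the predicted counts.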
We deployed a hierarchical Bayesian joint model for reporting rates (and, hence, IFR) per age group (a, taking values 0 to 40, 40 to 60, 60 to 80, and 80+) and municipality m, collapsing over the temporal dimension. We infer the number of infected individuals (and, hence, IFR) based on reported cases C, positivity rates over time (t, month), municipality, and COVID-19–attributed deaths D. The main appeal of this framework is that although most of the components are not identifiable (e.g., if reporting rates and true cases are both unknown, the same observed case counts can be obtained by multiplying one by a factor and dividing the other by the same factor) (37), we can borrow from better-known quantities (e.g., rough estimates of prevalence, reporting, etc.) to enhance identification while propagating the appropriate levels of uncertainty over the parameters.
Specifically, the reporting rate is linked to the observed positivity rates (on the log scale) through a logistic-linear relation (with parameters β), and we included random effects to represent unobserved causes of reporting.
Total infections by municipality and age are modeled as a fraction of the corresponding total population (Eq. 8). An implicit assumption in Eq. 8 is the existence of an underlying municipality-specific proportion infected, so that in each age group the number of infected people is, on average, that proportion times the age-group population. We also assumed a relation for this proportion that combines a baseline proportion infected with a municipality-specific random effect.
We use additional parameters to represent the temporal spread of infections. Infections, cases, attributed deaths, and age-stratified population sizes are linked through a cascade of binomial models. A first binomial model relates infections, cases, and reporting rates. Infection fatality rates relate to infections and deaths through another binomial model, where the IFRs follow a stratified logistic-linear relation with SES and age mediated by parameters α, η, δ.
A comprehensive explanation of this hierarchical Bayesian methodology, including a discussion of its assumptions and several sensitivity analyses, appears in the supplementary materials.
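The cascade of binomial models can be illustrated with a small forward simulation for one municipality. This is a sketch under stated assumptions, not the fitted model: the logit link for the proportion infected, the roles assigned to α (intercept), η (SES slope), and δ (age-group offsets), the constant reporting rate, and all numerical values are hypothetical.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def binomial(rng, n, p):
    """Dependency-free binomial draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_cell(rng, population, prop_infected, reporting_rate, ifr):
    """One municipality-by-age cell of the cascade."""
    infections = binomial(rng, population, prop_infected)  # infections | population
    cases = binomial(rng, infections, reporting_rate)      # reported cases | infections
    deaths = binomial(rng, infections, ifr)                # deaths | infections
    return infections, cases, deaths


rng = random.Random(0)

# Hypothetical coefficients: intercept, SES slope, and age-group offsets
alpha, eta = -7.0, -0.5
delta = {"0-40": 0.0, "40-60": 2.0, "60-80": 4.0, "80+": 5.5}
ses = 0.3  # hypothetical standardized SES score of the municipality
populations = {"0-40": 5000, "40-60": 2500, "60-80": 1500, "80+": 400}

# Baseline proportion infected plus a municipality-specific random effect
prop_infected = inv_logit(-2.0 + rng.gauss(0.0, 0.3))

results = {}
for age, population in populations.items():
    ifr = inv_logit(alpha + eta * ses + delta[age])  # logistic-linear in SES and age
    results[age] = simulate_cell(rng, population, prop_infected, 0.3, ifr)
```

Inference runs this cascade in reverse: given observed cases and deaths, the posterior over infections, reporting rates, and IFRs is constrained jointly, which is where informative priors on the better-known quantities help identification.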