Model and reality

Effective reproduction number Reff

The effective reproduction number Reff is a fundamental epidemiological quantity. It describes the spread of an infectious disease by answering the question: How many people does an infected person infect on average? If Reff is greater than 1, the number of new infections is growing; if Reff is smaller than 1, it is decreasing. For example, if Reff = 2, the number of new infections doubles in every disease cycle and therefore grows exponentially.
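The doubling example can be checked with a few lines of Python. This is only a sketch under the simplifying assumption that Reff stays constant and that infections proceed in discrete disease cycles; the function name and the starting value of 100 cases are illustrative choices, not part of the model described on this page.

```python
def expected_new_infections(initial_cases, r_eff, cycles):
    """Expected new infections per disease cycle for a constant Reff."""
    return [initial_cases * r_eff ** n for n in range(cycles + 1)]


# Reff = 2: new infections double in every cycle.
growing = expected_new_infections(100, 2.0, 4)    # [100.0, 200.0, 400.0, 800.0, 1600.0]

# Reff = 0.5: new infections halve in every cycle.
shrinking = expected_new_infections(100, 0.5, 4)  # [100.0, 50.0, 25.0, 12.5, 6.25]
```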

Reff is not fixed over time, but depends on several factors. On the one hand, there are factors which we cannot actively influence (e.g., properties of the virus itself). On the other hand, it depends on factors such as how many people interact in confined spaces, which can be influenced by mitigation measures.

In order to monitor the efficacy of such mitigation measures, it is helpful to understand how Reff evolves.

Unfortunately, Reff cannot be directly assessed, since we typically do not know who infected whom. Instead, one tries to infer Reff from the series of case numbers. This procedure is based on a mathematical model, which tries to encapsulate our knowledge about the dynamics of the spread of infectious diseases.

An internationally recognized, frequently used base model is explained in the section Mathematical background. This model and the resulting estimation procedure were developed by Cori et al. (2013) and are implemented in the software package EpiEstim. This method is used to compute the estimates of Reff which are shown on the current numbers page. The estimation procedure depends on various input parameters, which we explain in detail here. The credible intervals shown on the index page are based on an extension of EpiEstim which was developed by the London School of Hygiene & Tropical Medicine (LSHTM), cf. epiforecasts.io.

Parameter τ

τ gives the number of days for which Reff is assumed to be constant.

To account for random and systematic variability in the number of new cases (e.g., on the weekend there are typically fewer cases), EpiEstim makes the assumption that Reff has remained constant over the previous τ days to obtain an estimate of Reff for the current date. This provides an estimate of Reff for the current date that can be roughly considered as an "average" of Reff over τ days.

The size of τ affects the behavior of the estimate of Reff. For example, a small τ has the advantage that changes in Reff can be identified more quickly. In contrast, if τ is larger, the estimate is more robust with respect to random fluctuations in the number of new cases. In our graphs, we present the estimates for τ=13 (this choice of parameter is, for instance, used by AGES) as well as estimates for τ=7 (as used, e.g., in Cori et al. (2013)).
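To make the role of τ concrete, the following Python sketch computes the posterior mean of Reff for a sliding window, following the gamma-Poisson formula of Cori et al. (2013). It is not the EpiEstim code itself. The serial-interval weights, the case series, and the prior parameters (shape 1, scale 5) are assumptions chosen for illustration.

```python
def infection_pressure(cases, weights, t):
    """Lambda_t: past cases weighted by the serial-interval distribution."""
    return sum(w * cases[t - s]
               for s, w in enumerate(weights, start=1) if t - s >= 0)


def posterior_mean_r(cases, weights, t, tau, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of Reff on day t, assuming Reff was constant over the
    tau days ending on day t (gamma prior, Poisson-distributed new cases)."""
    start = t - tau + 1
    if start < 1:
        raise ValueError("window reaches back before the first usable day")
    window = range(start, t + 1)
    # Gamma posterior: shape = a + sum of cases, rate = 1/b + sum of Lambda.
    shape = prior_shape + sum(cases[s] for s in window)
    rate = 1.0 / prior_scale + sum(
        infection_pressure(cases, weights, s) for s in window)
    return shape / rate


# Hypothetical inputs: weights sum to 1, cases are made-up daily counts.
weights = [0.2, 0.5, 0.3]
cases = [10, 12, 15, 19, 24, 30, 37, 46, 57, 70, 86, 105, 128, 156]

r_short = posterior_mean_r(cases, weights, t=13, tau=7)   # reacts faster
r_long = posterior_mean_r(cases, weights, t=13, tau=13)   # smoother
```

A larger τ pools more days into the same estimate, which is why it fluctuates less but reacts more slowly to a change in Reff.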

These opposing effects are best illustrated via example: One can simulate case numbers from the model on which EpiEstim is based after providing Reff as an input parameter. The following graph shows a series of virtual case numbers that has been generated accordingly. The "virtual epidemic" has three phases. In the beginning, Reff=2.2, then Reff=0.6, and ultimately Reff=1.3.

Case numbers of a simulated epidemic
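A simulation of this kind can be sketched in Python as follows. The three Reff values are taken from the text; the phase length of 14 days, the serial-interval weights, the seed cases, and the random seed are assumptions made for this sketch and are not the settings used for the graph.

```python
import random


def sample_poisson(rng, lam):
    """Poisson draw: count unit-rate exponential arrivals in [0, lam]."""
    if lam <= 0:
        return 0
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def r_eff_schedule(day, phase_length=14):
    """Three-phase Reff: 2.2, then 0.6, then 1.3 (phase length is assumed)."""
    if day < phase_length:
        return 2.2
    if day < 2 * phase_length:
        return 0.6
    return 1.3


def simulate_cases(days, weights, seed_cases, rng):
    """Renewal model: new cases on day t are Poisson with mean Reff(t) * Lambda_t."""
    cases = list(seed_cases)
    for t in range(len(cases), days):
        pressure = sum(w * cases[t - s]
                       for s, w in enumerate(weights, start=1) if t - s >= 0)
        cases.append(sample_poisson(rng, r_eff_schedule(t) * pressure))
    return cases


rng = random.Random(0)
virtual_cases = simulate_cases(42, [0.2, 0.5, 0.3], [5, 5, 5], rng)
```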

Based on this simulated number of cases, we can use EpiEstim to estimate the (actually known) value of Reff. In this scenario, we have nearly perfect conditions for the EpiEstim procedure, since all model assumptions of EpiEstim are fulfilled, except one: on 11.10 and 25.10, as well as on some days thereafter, the model assumption that the actual Reff has been constant across the last τ days is not met. The effect of this violation is clearly shown in the following figure. The graph shows the true Reff series and the corresponding estimate by EpiEstim (including credible intervals). Jumps in the actual Reff turn into "ramps" in the estimates. Furthermore, the actual value of Reff is correctly estimated after a delay of τ days. Observe that it takes longer to estimate Reff correctly when τ is large.

Estimated and actual R in a simulated epidemic
Green line: R (EpiEstim, τ=7)
Blue line: R (EpiEstim, τ=13)
Grey line: R (specified)
Comparison of pre-specified Reff and estimates obtained from EpiEstim for a simulated epidemic. (Lighter shades show 90% credible intervals.)

Time delay

In the real world, several factors contribute to a time delay in the estimation of Reff. After someone is infected, it takes several days until symptoms emerge. It takes additional time until they are tested and until the test result is digitally recorded. Hence, if one tries to attribute a case to the day when the infection actually occurred, it has to be moved backwards by several days. To account for these delays, our estimates of Reff on the main page are assigned to the date 10 days before the last date for which data has been used to compute the estimate.
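As a small illustration of this date assignment, the sketch below shifts an estimate back by the 10 days mentioned above. The function name and the example dates are hypothetical.

```python
from datetime import date, timedelta

REPORTING_DELAY_DAYS = 10  # offset stated in the text


def assigned_date(last_data_date, delay_days=REPORTING_DELAY_DAYS):
    """Date an estimate is assigned to, given the last date with data."""
    return last_data_date - timedelta(days=delay_days)


# An estimate that uses data up to 15 November is plotted at 5 November.
plotted_at = assigned_date(date(2020, 11, 15))
```

When comparing with a source that assigns estimates to the last data date, passing delay_days=0 reproduces that convention.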

Additionally, since the estimation procedure averages across τ days, one could consider shifting the assignment date by an additional τ/2 days. This would have the benefit of centering the τ-day window over the estimation date. In the last figure of the previous section, this would mean that the estimated values of Reff move closer to the actual values overall. However, this would be accompanied by the unwanted effect that the estimate of Reff starts decreasing before the true Reff drops. Therefore, this additional shift is not incorporated in our estimates of Reff.

We reiterate that the time offset has to be considered when comparing different estimation procedures. For instance, it is also common to associate the estimates with the last day for which data has been used to calculate the estimate. This would lead to significantly different plots.

Credible intervals

The 90% credible interval for Reff gives the range in which 90% of the plausible values for Reff lie (and analogously for the 50% credible interval).

EpiEstim, as well as the extended method from epiforecasts.io, uses Bayesian estimation for Reff: different possible values of Reff are weighted depending on how plausible they are given the current development of the case numbers. A more detailed explanation can be found in the Mathematical background.
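To illustrate how such an interval arises, the sketch below approximates a credible interval by sampling from a gamma posterior, which is the posterior form in the model of Cori et al. (2013). The shape and rate values are made-up numbers, and the Monte Carlo approach is chosen here only to stay within the Python standard library; it is not how EpiEstim or epiforecasts.io compute their intervals.

```python
import random


def credible_interval(shape, rate, level=0.90, draws=20000, seed=0):
    """Approximate equal-tailed credible interval of a gamma posterior."""
    rng = random.Random(seed)
    # random.gammavariate takes shape and scale, so scale = 1 / rate.
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical posteriors with the same mean (about 2.0) but different case counts.
few_cases = credible_interval(shape=561.0, rate=280.2)
many_cases = credible_interval(shape=5601.0, rate=2800.2)
```

With ten times as many cases in the window, the interval becomes considerably narrower, although none of the model assumptions has become more realistic.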

For the interpretation of such estimates and credible intervals, it is crucial to notice that only uncertainties which were considered in the model are taken into account.

EpiEstim assumes that the serial interval is known accurately and that all infected individuals are equally infectious. Furthermore, it is assumed that all infected individuals test positive and that it is known exactly on which day the infection event occurred. Additionally, Reff is considered to have been constant for a stretch of τ days. Under those model assumptions, the credible interval for the Bayes estimator for Reff is accurately inferred. For the parameters we used and the recorded infection cases, the corresponding credible interval is typically very small (i.e., single-digit percentage values for the 95% credible interval); however, the model assumptions are evidently not satisfied, and one expects that the error of the model should be much larger than the credible interval of the Bayes estimator. Therefore, the credible interval of the EpiEstim estimation procedure could give a highly unrealistic measure of the uncertainties actually present and is not displayed in our graphics.

In contrast, the implementation of the LSHTM group (epiforecasts.io) considers model uncertainties in several additional ways: variation in the length of the serial interval is accounted for, the date of the infection event is considered stochastic, and the assumption of Reff being constant over the last τ days is relaxed. The resulting credible intervals should represent the uncertainties more realistically and are included in our graphs.

We want to stress that further uncertainties exist which are not accounted for. One can assume that many infected individuals (e.g., asymptomatic carriers) are not tested. In addition, uncertainties exist as to how infectious different individuals are and to what extent the behavior of individuals affects their infectiousness (e.g., "super-spreaders"). The last point also calls into question the validity of the model assumption that the number of newly infected individuals per day is Poisson distributed (cf. the section Mathematical background).