Introduction

This vignette examines some estimates of cases of COVID-19 in England through time. These data should prove useful in the history matching of MetaWards. Daily lab-confirmed cases are available, but these will clearly underestimate the number of cases, as not nearly every case is recorded. The ONS run an infections survey that tests a large volunteer population, with the aim of estimating the number of infections in the wider population.

How much do the lab-confirmed cases underestimate new cases? We use ONS survey data to try and estimate by using ONS survey data. We aggregate the daily new cases to weekly, and interpolate in time to match the ONS survey.

NOTE: This is currently unfinished, as the “new cases” data period extends with each new report.

Conclusion

Best estimate suggest that in the period from late April to Early June, the total daily lab-confirmed cases were around 22% of true cases, but there is a lot of uncertainty in the estimates. Many “trajectories” are possible for the new cases, and you’d expect to be picking up a lower proportion of the cases early in that period and a higher number later in that period.

Link to the Rmarkdown: https://github.com/dougmcneall/covid/blob/master/docs/weekly_new_cases_data.Rmd

ONS data direct estimates of infections.

These surveys measure the number of people who have COVID-19 at a particular time via a survey. They were conducted by the Office for National Statistics weekly and released on the following days:

9th July 2020 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/england9july2020

2nd July 2020 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/2july2020

25th June 2020 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/england25june2020

18th June 2020 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/18june2020

12th June 2020 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/12june2020

5th June 2020 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/5june2020

On cumulative total infections, from the 5th June report:
“As of 24 May 2020, 6.78% (95% confidence interval: 5.21% to 8.64%) of individuals from whom blood samples were taken tested positive for antibodies to the coronavirus (COVID-19). This is based on blood test results from 885 individuals since the start of the study on 26 April 2020.”

28th May 2020
https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/28may2020

21st May 2020
https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/england21may2020

14th May 2020
https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/england14may2020

ONS cases (absolute numbers in England)

Incidence rate is new cases per week per 100 people.

NOTE: the “new cases” are measured since the start of the study on April 26th 2020, so the time period for estimating them is overlapping and extends with time.

Also check out Cambridge MRC estimates
https://www.mrc-bsu.cam.ac.uk/tackling-covid-19/nowcasting-and-forecasting-of-covid-19/
https://www.mrc-bsu.cam.ac.uk/now-casting/

library(DiceKriging)
library(tidyverse)
library(lubridate)

ONS data (hand edited from the reports)

ons <- read.csv("../data/ons_covid_infection_survey.csv", header = TRUE, na.strings = c('NA'))
#ons <- read.csv("https://raw.githubusercontent.com/dougmcneall/covid/master/data/ons_covid_infection_survey.csv", header = TRUE, na.strings = c('NA'))

These data report the daily new lab-confirmed cases of COVID-19.

dat <- read.csv('https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv')

Use only England data, to match the ONS survey

england <- dat[dat[,'Area.name']=='England', ]

init <- data.frame('Specimen.date' = as.Date(england[,'Specimen.date']), 'Daily.lab.confirmed.cases' = england[,'Daily.lab.confirmed.cases'])

weekly_new_cases <- init %>% 
  group_by(week = format(Specimen.date, '%Y-%U')) %>%
  summarise(weekly_sum = sum(Daily.lab.confirmed.cases))

# manually calculate the date axis for now
#week_ax = seq(from = min(as.Date(england[, 'Specimen.date'])), to = max(as.Date(england[, 'Specimen.date'])), by = 7)

week_ax = seq(from = min(as.Date(england[, 'Specimen.date'])), by = 7, length.out = nrow(weekly_new_cases))
ons_midpoints <- as.Date(c("2020-05-03","2020-05-10","2020-05-17","2020-05-24", "2020-05-31", "2020-06-07", "2020-06-14", "2020-06-21", "2020-06-28"))

Estimates of the total number of active infections in England (from ONS infections survey pilot)

par(las = 1, mar = c(5,7,3,2))
plot(ons_midpoints, ons$cases_upper95, type = 'n', ylim = c(0, max(ons$cases_upper95)),
     xlab = '', ylab = '', bty = 'n')
abline(h = seq(from = 0, to = 250000, by = 50000), col = 'grey', lty = 'dashed')
points(ons_midpoints, ons$cases_med, col = 'black', bg = 'black', pch = 21)
segments(x0 = ons_midpoints, y0 = ons$cases_lower95, x1 = ons_midpoints, y1 = ons$cases_upper95)
legend('bottomleft', pch = NULL, legend = 'vertical lines represent 95% CI', bty = 'n')
mtext('Active cases in England', side = 3, adj = 0.25, outer = TRUE, line = -2, cex = 1.5)

Interpolate the weekly cases from the data.

# The NA was causing problems
weekly_new_cases$weekly_sum[1] <- 0
# Gaussian process fit for the aggregated new cases timeseries.
cases.fit <- km(~., design = as.matrix(week_ax, ncol = 1), response = weekly_new_cases$weekly_sum)
## 
## optimisation start
## ------------------
## * estimation method   : MLE 
## * optimisation method : BFGS 
## * analytical gradient : used
## * trend model : ~design
## * covariance model : 
##   - type :  matern5_2 
##   - nugget : NO
##   - parameters lower bounds :  1e-10 
##   - parameters upper bounds :  350 
##   - best initial criterion value(s) :  -247.6172 
## 
## N = 1, M = 5 machine precision = 2.22045e-16
## At X0, 0 variables are exactly at the bounds
## At iterate     0  f=       247.62  |proj g|=       1.6226
## At iterate     1  f =       245.35  |proj g|=         1.175
## At iterate     2  f =       242.47  |proj g|=       0.25763
## At iterate     3  f =       242.27  |proj g|=       0.09025
## At iterate     4  f =       242.23  |proj g|=      0.015449
## At iterate     5  f =       242.23  |proj g|=     0.0012534
## At iterate     6  f =       242.23  |proj g|=    1.9927e-05
## At iterate     7  f =       242.23  |proj g|=    2.6368e-08
## 
## iterations 7
## function evaluations 8
## segments explored during Cauchy searches 7
## BFGS updates skipped 0
## active bounds at final generalized Cauchy point 0
## norm of the final projected gradient 2.63676e-08
## final function value 242.231
## 
## F = 242.231
## final  value 242.230790 
## converged
# Predict the cases at the timepoints where we have the ONS measurements.
pred <- predict.km(cases.fit, newdata = as.matrix(ons_midpoints, ncol = 1), type = 'UK')
## Warning in checkNames(X1 = X, X2 = newdata, X1.name = "the design", X2.name
## = "newdata"): newdata not named: newdata's variables are inherited from the
## design

Now calculate weekly new cases on the same time baselines as the ONS data (26th April 2020, until the end of the reporting period).

weekly_rebased <- rep(NA, nrow(ons))
chunk.days.vec <- rep(NA, nrow(ons))
chunk.weeks.vec <- rep(NA, nrow(ons))

for(i in 1:nrow(ons)){
  
init.chunk <- init[ init$Specimen.date > as.Date("2020-04-26") & init$Specimen.date < as.Date(ons[i, 'end_date']), ]

chunk.days <- nrow(init.chunk)
chunk.days.vec[i] <- chunk.days

chunk.weeks <- chunk.days/7
chunk.weeks.vec[i] <- chunk.weeks

weekly_rebased[i] <- sum(init.chunk['Daily.lab.confirmed.cases'], na.rm = TRUE) / chunk.weeks

}

proportion of new cases that we pick up over each reporting period (%, from 26th April)

print((weekly_rebased / ons[,'new_cases_med']) * 100)
## [1]       NA 34.73836 35.84787 45.29005 49.71588 54.08099 60.41983 49.82531
## [9] 99.20053

Plot new cases

par(mar = c(5,6,3,2), las = 1)
plot(week_ax, weekly_new_cases$weekly_sum,
     ylim = c(0, 120000), xlim = as.Date(c("2020-03-01", "2020-07-21")),
     xlab = '', ylab = '',
     pch = 19,
     bty = 'n',
     type = 'n')
abline(h = seq(from = 0,  to = 120000, by = 20000), col = 'lightgrey', lty = 'dashed')
points(week_ax, weekly_new_cases$weekly_sum, pch = 19)
points(as.Date(ons[,'end_date']), ons[,'new_cases_med'], col = 'red', pch = 19)
points(ons_midpoints, pred$mean, col = 'skyblue2', pch = 19)
segments(x0 = as.Date(ons[,'end_date']), y0 = ons[,'new_cases_lower95'], x1 = as.Date(ons[,'end_date']), y1 = ons[,'new_cases_upper95'], col = 'tomato2')

startdates <- c(NA, rep(as.Date("2020-04-26"),5), as.Date(ons[6:9,'end_date']))

segments(x0 = startdates,y0 = ons[, 'new_cases_med'], x1 = as.Date(ons[,'end_date']),  y1 = ons[,'new_cases_med'], col = 'red')

#segments(x0 = as.Date("2020-04-26"),y0 = weekly_rebased , x1 = as.Date(ons[,'end_date']),  y1 = weekly_rebased , col = 'skyblue2')


mtext('Weekly new cases in England', side = 3, adj = 0.06, line = -2,outer = TRUE, cex = 1.5)
legend('topleft', pch = c(19,19,19,NA), col = c('black', 'skyblue2','red', 'red'),
       legend = c('lab-confirmed cases', 'interpolated', 'ONS estimates + 95% CI', 'Horizontal lines are new cases period'),
       bty = 'n')

Daily lab-confirmed cases in England

plot(as.Date(england[, 'Specimen.date']), england[, 'Daily.lab.confirmed.cases'], pch = 20,
     ylab = 'Daily lab-confirmed cases in England', xlab = '')