This vignette examines some estimates of cases of COVID-19 in England through time. These data should prove useful in the history matching of MetaWards. Daily lab-confirmed cases are available, but these will clearly underestimate the number of cases, as not nearly every case is recorded. The ONS run an infections survey that tests a large volunteer population, with the aim of estimating the number of infections in the wider population.
How much do the lab-confirmed cases underestimate new cases? We use ONS survey data to try and estimate by using ONS survey data. We aggregate the daily new cases to weekly, and interpolate in time to match the ONS survey.
NOTE: This is currently unfinished, as the “new cases” data period extends with each new report.
Best estimate suggest that in the period from late April to Early June, the total daily lab-confirmed cases were around 22% of true cases, but there is a lot of uncertainty in the estimates. Many “trajectories” are possible for the new cases, and you’d expect to be picking up a lower proportion of the cases early in that period and a higher number later in that period.
Link to the Rmarkdown: https://github.com/dougmcneall/covid/blob/master/docs/weekly_new_cases_data.Rmd
These surveys measure the number of people who have COVID-19 at a particular time via a survey. They were conducted by the Office for National Statistics weekly and released on the following days:
On cumulative total infections, from the 5th June report:
“As of 24 May 2020, 6.78% (95% confidence interval: 5.21% to 8.64%) of individuals from whom blood samples were taken tested positive for antibodies to the coronavirus (COVID-19). This is based on blood test results from 885 individuals since the start of the study on 26 April 2020.”
Incidence rate is new cases per week per 100 people.
NOTE: the “new cases” are measured since the start of the study on April 26th 2020, so the time period for estimating them is overlapping and extends with time.
Also check out Cambridge MRC estimates
https://www.mrc-bsu.cam.ac.uk/tackling-covid-19/nowcasting-and-forecasting-of-covid-19/
https://www.mrc-bsu.cam.ac.uk/now-casting/
library(DiceKriging)
library(tidyverse)
library(lubridate)
ONS data (hand edited from the reports)
ons <- read.csv("../data/ons_covid_infection_survey.csv", header = TRUE, na.strings = c('NA'))
#ons <- read.csv("https://raw.githubusercontent.com/dougmcneall/covid/master/data/ons_covid_infection_survey.csv", header = TRUE, na.strings = c('NA'))
These data report the daily new lab-confirmed cases of COVID-19.
dat <- read.csv('https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv')
Use only England data, to match the ONS survey
england <- dat[dat[,'Area.name']=='England', ]
init <- data.frame('Specimen.date' = as.Date(england[,'Specimen.date']), 'Daily.lab.confirmed.cases' = england[,'Daily.lab.confirmed.cases'])
weekly_new_cases <- init %>%
group_by(week = format(Specimen.date, '%Y-%U')) %>%
summarise(weekly_sum = sum(Daily.lab.confirmed.cases))
# manually calculate the date axis for now
#week_ax = seq(from = min(as.Date(england[, 'Specimen.date'])), to = max(as.Date(england[, 'Specimen.date'])), by = 7)
week_ax = seq(from = min(as.Date(england[, 'Specimen.date'])), by = 7, length.out = nrow(weekly_new_cases))
ons_midpoints <- as.Date(c("2020-05-03","2020-05-10","2020-05-17","2020-05-24", "2020-05-31", "2020-06-07", "2020-06-14", "2020-06-21", "2020-06-28"))
par(las = 1, mar = c(5,7,3,2))
plot(ons_midpoints, ons$cases_upper95, type = 'n', ylim = c(0, max(ons$cases_upper95)),
xlab = '', ylab = '', bty = 'n')
abline(h = seq(from = 0, to = 250000, by = 50000), col = 'grey', lty = 'dashed')
points(ons_midpoints, ons$cases_med, col = 'black', bg = 'black', pch = 21)
segments(x0 = ons_midpoints, y0 = ons$cases_lower95, x1 = ons_midpoints, y1 = ons$cases_upper95)
legend('bottomleft', pch = NULL, legend = 'vertical lines represent 95% CI', bty = 'n')
mtext('Active cases in England', side = 3, adj = 0.25, outer = TRUE, line = -2, cex = 1.5)
Interpolate the weekly cases from the data.
# The NA was causing problems
weekly_new_cases$weekly_sum[1] <- 0
# Gaussian process fit for the aggregated new cases timeseries.
cases.fit <- km(~., design = as.matrix(week_ax, ncol = 1), response = weekly_new_cases$weekly_sum)
##
## optimisation start
## ------------------
## * estimation method : MLE
## * optimisation method : BFGS
## * analytical gradient : used
## * trend model : ~design
## * covariance model :
## - type : matern5_2
## - nugget : NO
## - parameters lower bounds : 1e-10
## - parameters upper bounds : 350
## - best initial criterion value(s) : -247.6172
##
## N = 1, M = 5 machine precision = 2.22045e-16
## At X0, 0 variables are exactly at the bounds
## At iterate 0 f= 247.62 |proj g|= 1.6226
## At iterate 1 f = 245.35 |proj g|= 1.175
## At iterate 2 f = 242.47 |proj g|= 0.25763
## At iterate 3 f = 242.27 |proj g|= 0.09025
## At iterate 4 f = 242.23 |proj g|= 0.015449
## At iterate 5 f = 242.23 |proj g|= 0.0012534
## At iterate 6 f = 242.23 |proj g|= 1.9927e-05
## At iterate 7 f = 242.23 |proj g|= 2.6368e-08
##
## iterations 7
## function evaluations 8
## segments explored during Cauchy searches 7
## BFGS updates skipped 0
## active bounds at final generalized Cauchy point 0
## norm of the final projected gradient 2.63676e-08
## final function value 242.231
##
## F = 242.231
## final value 242.230790
## converged
# Predict the cases at the timepoints where we have the ONS measurements.
pred <- predict.km(cases.fit, newdata = as.matrix(ons_midpoints, ncol = 1), type = 'UK')
## Warning in checkNames(X1 = X, X2 = newdata, X1.name = "the design", X2.name
## = "newdata"): newdata not named: newdata's variables are inherited from the
## design
Now calculate weekly new cases on the same time baselines as the ONS data (26th April 2020, until the end of the reporting period).
weekly_rebased <- rep(NA, nrow(ons))
chunk.days.vec <- rep(NA, nrow(ons))
chunk.weeks.vec <- rep(NA, nrow(ons))
for(i in 1:nrow(ons)){
init.chunk <- init[ init$Specimen.date > as.Date("2020-04-26") & init$Specimen.date < as.Date(ons[i, 'end_date']), ]
chunk.days <- nrow(init.chunk)
chunk.days.vec[i] <- chunk.days
chunk.weeks <- chunk.days/7
chunk.weeks.vec[i] <- chunk.weeks
weekly_rebased[i] <- sum(init.chunk['Daily.lab.confirmed.cases'], na.rm = TRUE) / chunk.weeks
}
print((weekly_rebased / ons[,'new_cases_med']) * 100)
## [1] NA 34.73836 35.84787 45.29005 49.71588 54.08099 60.41983 49.82531
## [9] 99.20053
par(mar = c(5,6,3,2), las = 1)
plot(week_ax, weekly_new_cases$weekly_sum,
ylim = c(0, 120000), xlim = as.Date(c("2020-03-01", "2020-07-21")),
xlab = '', ylab = '',
pch = 19,
bty = 'n',
type = 'n')
abline(h = seq(from = 0, to = 120000, by = 20000), col = 'lightgrey', lty = 'dashed')
points(week_ax, weekly_new_cases$weekly_sum, pch = 19)
points(as.Date(ons[,'end_date']), ons[,'new_cases_med'], col = 'red', pch = 19)
points(ons_midpoints, pred$mean, col = 'skyblue2', pch = 19)
segments(x0 = as.Date(ons[,'end_date']), y0 = ons[,'new_cases_lower95'], x1 = as.Date(ons[,'end_date']), y1 = ons[,'new_cases_upper95'], col = 'tomato2')
startdates <- c(NA, rep(as.Date("2020-04-26"),5), as.Date(ons[6:9,'end_date']))
segments(x0 = startdates,y0 = ons[, 'new_cases_med'], x1 = as.Date(ons[,'end_date']), y1 = ons[,'new_cases_med'], col = 'red')
#segments(x0 = as.Date("2020-04-26"),y0 = weekly_rebased , x1 = as.Date(ons[,'end_date']), y1 = weekly_rebased , col = 'skyblue2')
mtext('Weekly new cases in England', side = 3, adj = 0.06, line = -2,outer = TRUE, cex = 1.5)
legend('topleft', pch = c(19,19,19,NA), col = c('black', 'skyblue2','red', 'red'),
legend = c('lab-confirmed cases', 'interpolated', 'ONS estimates + 95% CI', 'Horizontal lines are new cases period'),
bty = 'n')
plot(as.Date(england[, 'Specimen.date']), england[, 'Daily.lab.confirmed.cases'], pch = 20,
ylab = 'Daily lab-confirmed cases in England', xlab = '')