Description
Actual behavior
The HHS COVID-19 admissions 7-day-average signal ("hhs", "confirmed_admissions_covid_1d_7dav") appears, at least at the state level, to have a couple of periods of a month or longer in which no updates were issued. The corresponding 1d signal, confirmed_admissions_covid_1d, does not have these large gaps in issue availability. The 7dav issue/as-of availability gaps do not appear to be documented on the hhs data source documentation page.
library("pipeR")
library("tidyverse")
library("delphi.epidata")
issue.data.for.1d = delphi.epidata::covidcast("hhs", "confirmed_admissions_covid_1d",
"day", "state",
delphi.epidata::epirange(12340101,34560101), "*",
issues = delphi.epidata::epirange(12340101,34560101)) %>%
delphi.epidata::fetch_tbl()
issue.data.for.7dav = delphi.epidata::covidcast("hhs", "confirmed_admissions_covid_1d_7dav",
"day", "state",
delphi.epidata::epirange(12340101,34560101), "*",
issues = delphi.epidata::epirange(12340101,34560101)) %>%
delphi.epidata::fetch_tbl()
## Large gaps in 7dav issue data:
issue.data.for.7dav %>>% distinct(issue,geo_value) %>>% count(issue) %>>% complete(issue=full_seq(issue,1L), fill=list(n=0L)) %>>% ggplot(aes(issue,n)) %>>% `+`(geom_line()) %>>% `+`(ylab("Number of states&territories with update at all for 7dav"))
## But the 1d signal does not have these large gaps, even if we only count an (issue,geo) if it contains sort-of-real-time (<7d-lag) data:
issue.data.for.1d %>>% filter(time_value>=issue-7L) %>>% distinct(issue,geo_value) %>>% count(issue) %>>% complete(issue=full_seq(issue,1L), fill=list(n=0L)) %>>% ggplot(aes(issue,n)) %>>% `+`(geom_line()) %>>% `+`(ylab("Number of states&territories with update for previous week for 1d"))
## Dates for 7dav issue gaps:
issue.data.for.7dav %>>%
distinct(issue) %>>%
arrange(issue) %>>%
transmute(gap.start = lag(issue),
gap.end = issue,
gap.length = as.integer(issue - lag(issue))) %>>%
filter(gap.length != 1L) %>>%
print(n=100L)
## gap.start gap.end gap.length
## <date> <date> <int>
## [...]
## 14 2021-01-24 2021-03-06 41
## 15 2021-03-20 2021-06-30 102
## [...]
## And within these gaps, it appears possible to calculate updates to 7-day averages; e.g., on issue 2021-01-24 for `time_value` 2021-01-23:
as.of.20210124.data.for.1d = delphi.epidata::covidcast("hhs", "confirmed_admissions_covid_1d",
"day", "state",
delphi.epidata::epirange(12340101,34560101), "*",
as_of = 20210124) %>%
delphi.epidata::fetch_tbl()
as.of.20210124.data.for.1d %>>%
filter(as.Date("2021-01-24")-7L <= time_value, time_value <= as.Date("2021-01-24")-1L, !is.na(value)) %>>%
count(geo_value) %>>%
filter(n == 7L) %>>%
nrow()
## The minimum lag for the 7dav signal is 1, so no 7dav value for time_value 2021-01-23 could have been issued on 2021-01-23 itself.
## The missing 2021-01-24 update therefore cannot be explained by the value already having been published and left unchanged.
min(issue.data.for.7dav[["lag"]])
Expected behavior
Either (a) the 7dav signal should not have such large gaps in its issues, or (b) the gaps should be documented on the hhs data source doc page and anywhere else that as-of or issue data for this signal is discussed or visualized.
Context
Forecast exploration work uses as-of and/or issue queries to do pseudoprospective analysis. The time ranges for which this analysis appeared to be possible differed depending on whether we queried the 7dav signal directly or queried the 1d signal and computed a rolling average ourselves.
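For reference, the rolling-average workaround mentioned above can be sketched roughly as follows. This is only an illustration, not necessarily how the 7dav signal is computed upstream: `trailing_7dav` is a hypothetical helper (not part of delphi.epidata), it assumes a data frame with `geo_value`, `time_value` (Date), and `value` columns like the ones returned by the queries above, and it assumes a trailing window that ends on each `time_value`.

```r
# Hypothetical helper (not part of delphi.epidata): trailing 7-day mean per geo.
# Assumes `df` has columns geo_value, time_value (Date), and value.
trailing_7dav <- function(df, window = 7L) {
  pieces <- lapply(split(df, df$geo_value), function(g) {
    # Put each geo on a complete daily grid so that missing days become NA
    grid <- data.frame(
      geo_value = g$geo_value[1],
      time_value = seq(min(g$time_value), max(g$time_value), by = "day")
    )
    g <- merge(grid, g[c("time_value", "value")], by = "time_value", all.x = TRUE)
    g <- g[order(g$time_value), ]
    # Average the current day and the (window - 1) days before it; the result
    # is NA unless all `window` days are present and non-missing
    g$value_7dav <- vapply(seq_len(nrow(g)), function(i) {
      if (i < window) NA_real_ else mean(g$value[(i - window + 1L):i])
    }, numeric(1))
    g
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy data standing in for an as-of snapshot of the 1d signal
toy <- data.frame(
  geo_value = rep(c("ca", "ny"), each = 8),
  time_value = rep(as.Date("2021-01-16") + 0:7, times = 2),
  value = c(1:8, rep(10, 8))
)
res <- trailing_7dav(toy)
res[res$time_value == as.Date("2021-01-23"), ]
```

Applied to an as-of snapshot such as `as.of.20210124.data.for.1d`, the same helper would give a 7-day average for every state with seven complete days of 1d data, which is the quantity that seems to be missing from the 7dav issues inside the gaps.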