
Fill or document historical gaps in issue/as-of availability for "hhs" "confirmed_admissions_covid_1d_7dav" not present in the 1d signal #1430

@brookslogan


Actual Behavior:

The HHS COVID-19 admissions 7-day-average signal ("hhs" "confirmed_admissions_covid_1d_7dav"), at least at the state level, appears to have a couple of time periods of a month or longer in which no updates were issued. The 1d signal ("hhs" "confirmed_admissions_covid_1d") does not have these large gaps in issue availability. These 7dav issue/as-of availability gaps do not appear to be documented on the hhs data source documentation page.

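library("pipeR")      # provides the %>>% pipe used below
library("tidyverse")  # dplyr, tidyr, ggplot2 (and the %>% pipe)
library("delphi.epidata")

## Fetch every issue of every time_value at the state level; the very wide
## epirange(12340101, 34560101) is just a way to request all available dates.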
issue.data.for.1d = delphi.epidata::covidcast("hhs", "confirmed_admissions_covid_1d",
                                              "day", "state",
                                              delphi.epidata::epirange(12340101,34560101), "*",
                                              issues = delphi.epidata::epirange(12340101,34560101)) %>%
  delphi.epidata::fetch_tbl()

issue.data.for.7dav = delphi.epidata::covidcast("hhs", "confirmed_admissions_covid_1d_7dav",
                                                "day", "state",
                                                delphi.epidata::epirange(12340101,34560101), "*",
                                                issues = delphi.epidata::epirange(12340101,34560101)) %>%
  delphi.epidata::fetch_tbl()

## Large gaps in 7dav issue data:
issue.data.for.7dav %>>%
  distinct(issue, geo_value) %>>%
  count(issue) %>>%
  complete(issue = full_seq(issue, 1L), fill = list(n = 0L)) %>>%
  ggplot(aes(issue, n)) %>>%
  `+`(geom_line()) %>>%
  `+`(ylab("Number of states & territories with update at all for 7dav"))

## But the 1d signal does not have these large gaps, even if we only count an
## (issue, geo_value) pair when it contains near-real-time data (lag <= 7 days):
issue.data.for.1d %>>%
  filter(time_value >= issue - 7L) %>>%
  distinct(issue, geo_value) %>>%
  count(issue) %>>%
  complete(issue = full_seq(issue, 1L), fill = list(n = 0L)) %>>%
  ggplot(aes(issue, n)) %>>%
  `+`(geom_line()) %>>%
  `+`(ylab("Number of states & territories with update for previous week for 1d"))

## Dates for 7dav issue gaps:
issue.data.for.7dav %>>%
  distinct(issue) %>>%
  arrange(issue) %>>%
  transmute(gap.start  = lag(issue),
            gap.end    = issue,
            gap.length = as.integer(issue - lag(issue))) %>>%
  filter(gap.length != 1L) %>>%
  print(n=100L)
##    gap.start  gap.end    gap.length
##   <date>     <date>          <int>
## [...]
## 14 2021-01-24 2021-03-06         41
## 15 2021-03-20 2021-06-30        102
## [...]
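The pipeline above finds gaps by differencing consecutive distinct issue dates. To check that logic without querying the API, here is a base-R sketch on made-up dates. `find_issue_gaps` is a hypothetical helper, not part of `delphi.epidata`, and the toy dates are illustrative only (chosen to reproduce the 41-day gap length reported above), not real HHS data.

```r
# Hypothetical helper (not part of delphi.epidata): given issue dates, report
# each pair of consecutive issues more than one day apart, mirroring the
# lag(issue) / filter(gap.length != 1L) pipeline above.
find_issue_gaps <- function(issues) {
  issues <- sort(unique(as.Date(issues)))
  if (length(issues) < 2L) {
    return(data.frame(gap.start  = as.Date(character()),
                      gap.end    = as.Date(character()),
                      gap.length = integer()))
  }
  gap.start  <- issues[-length(issues)]  # every issue except the last
  gap.end    <- issues[-1L]              # the issue that follows it
  gap.length <- as.integer(gap.end - gap.start)
  keep <- gap.length != 1L
  data.frame(gap.start  = gap.start[keep],
             gap.end    = gap.end[keep],
             gap.length = gap.length[keep])
}

# Toy example with made-up issue dates:
toy.issue.dates <- as.Date(c("2021-01-22", "2021-01-23", "2021-01-24",
                             "2021-03-06", "2021-03-07"))
gaps <- find_issue_gaps(toy.issue.dates)
print(gaps)
```

A gap.length of 1 means issues on consecutive days, so any other value marks days with no issue at all.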



## And within these gaps, it appears possible to calculate updates to 7-day averages; e.g., on issue 2021-01-24 for `time_value` 2021-01-23:

as.of.20210124.data.for.1d = delphi.epidata::covidcast("hhs", "confirmed_admissions_covid_1d",
                                                       "day", "state",
                                                       delphi.epidata::epirange(12340101,34560101), "*",
                                                       as_of = 20210124) %>%
  delphi.epidata::fetch_tbl()

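## Number of states with all 7 daily values for time_values 2021-01-17 through
## 2021-01-23 non-missing, as of 2021-01-24:
as.of.20210124.data.for.1d %>>%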
  filter(as.Date("2021-01-24")-7L <= time_value, time_value <= as.Date("2021-01-24")-1L, !is.na(value)) %>>%
  count(geo_value) %>>%
  filter(n == 7L) %>>%
  nrow()
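The count above only checks whether a complete 7-day window of 1d data existed as of 2021-01-24. The following base-R sketch shows what a 7dav for time_value 2021-01-23 could have been computed from such a window. `trailing_7dav` is hypothetical, not part of `delphi.epidata`. It assumes a data frame shaped like the `fetch_tbl()` output (`geo_value`, `time_value` as Date, `value`) with one row per geo per day, as in an as-of snapshot. It also assumes a plain unweighted mean, which may not exactly match how the published 7dav signal is computed.

```r
# Hypothetical helper (not part of delphi.epidata): for each geo_value, take
# the unweighted mean of the 1d values over the 7 days ending at target.date,
# keeping only geos with all 7 days present and non-NA (the same completeness
# check as the count(geo_value) / filter(n == 7L) step above).
# Assumes one row per (geo_value, time_value), as in an as-of snapshot.
trailing_7dav <- function(df, target.date) {
  target.date <- as.Date(target.date)
  in.window <- df$time_value >= target.date - 6L &
    df$time_value <= target.date &
    !is.na(df$value)
  window <- df[in.window, ]
  if (nrow(window) == 0L) {
    return(data.frame(geo_value = character(),
                      time_value = as.Date(character()),
                      value = numeric()))
  }
  n.days <- tapply(window$value, window$geo_value, length)  # rows per geo
  avg    <- tapply(window$value, window$geo_value, mean)    # mean per geo
  complete.geos <- names(n.days)[n.days == 7L]
  data.frame(geo_value  = complete.geos,
             time_value = rep(target.date, length(complete.geos)),
             value      = as.numeric(avg[complete.geos]),
             row.names  = NULL)
}

# Toy snapshot with made-up values: "al" is missing one day, so it is dropped.
toy <- data.frame(
  geo_value  = rep(c("ak", "al"), each = 7),
  time_value = rep(seq(as.Date("2021-01-17"), as.Date("2021-01-23"), by = "day"), 2),
  value      = c(1:7, c(10, 10, 10, NA, 10, 10, 10))
)
res <- trailing_7dav(toy, "2021-01-23")
print(res)
```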

min(issue.data.for.7dav[["lag"]])
# The min lag is 1, so no 7dav value for time_value 2021-01-23 could have existed on
# 2021-01-23. This rules out the possibility that a 7dav for 2021-01-23 was already
# present and the 2021-01-24 update simply left it unchanged.

Expected behavior

Either (a) there should not be such large gaps in 7dav data, or (b) they should be documented on the hhs data source doc page and anywhere such as-of or issue data is discussed or visualized.

Context

Forecast exploration work uses as-of and/or issue queries to do pseudoprospective analysis. The time ranges for which this analysis seemed possible differed depending on whether we queried the 7dav signal directly or queried the 1d signal and computed the rolling average ourselves.
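Taking the 1d route in pseudoprospective analysis means rebuilding each as-of snapshot from issue-level data before averaging. Here is a minimal base-R sketch of the snapshot step. It assumes issue-level rows with columns `geo_value`, `time_value`, `issue`, and `value` (as from the `issues =` query above), and it assumes as-of semantics of "latest issue on or before the as-of date". `as_of_snapshot` is a hypothetical helper, not part of `delphi.epidata`, and the toy rows are made up.

```r
# Hypothetical helper (not part of delphi.epidata): rebuild an as-of snapshot
# from issue-level rows by keeping, for each (geo_value, time_value), the row
# with the latest issue on or before as.of.
as_of_snapshot <- function(issue.df, as.of) {
  as.of <- as.Date(as.of)
  known <- issue.df[issue.df$issue <= as.of, ]
  # Order so that, within each (geo_value, time_value), the latest issue is first.
  known <- known[order(known$geo_value, known$time_value,
                       -as.numeric(known$issue)), ]
  latest <- known[!duplicated(known[c("geo_value", "time_value")]),
                  c("geo_value", "time_value", "issue", "value")]
  rownames(latest) <- NULL
  latest
}

# Toy issue-level rows (made up): the 2021-02-10 revision is after the as-of date.
toy.issue.rows <- data.frame(
  geo_value  = c("ak", "ak", "ak", "al"),
  time_value = as.Date(rep("2021-01-23", 4)),
  issue      = as.Date(c("2021-01-24", "2021-01-26", "2021-02-10", "2021-01-25")),
  value      = c(5, 6, 7, 3)
)
snap <- as_of_snapshot(toy.issue.rows, "2021-01-30")
print(snap)
```

Each snapshot can then be averaged per geo over 7-day windows. Gaps in 7dav issues would show up as as-of dates where the 1d route still yields a value.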

Labels: data quality (Missing data, weird data, broken data)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions