Incorrect values in usa-facts:confirmed_incidence_num  #1696

@dshemetov

While working on the JIT A/B tests (cmu-delphi/delphi-epidata#947), I found this discrepancy. It is very similar to #1685.

Actual Behavior

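For the state 'co', the confirmed_incidence_num values don't agree with the day-over-day differences of confirmed_cumulative_num. The two series differ on 2020-12-24 (3028 from the API vs. 3031 computed manually) and on 2020-12-25 (2659 vs. 2656).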

from epidatpy.request import EpiRange, Epidata

# Query one extra day (2020-12-20) so that diff() yields a value for 2020-12-21; [1:] drops the leading NaN.
manual_incidence = Epidata.covidcast("usa-facts", "confirmed_cumulative_num", "day", "state", EpiRange(20201220, 20201227), "co").df().value.diff()[1:]
# Incidence as served by the API for the same dates (2020-12-21 through 2020-12-27).
api_incidence    = Epidata.covidcast("usa-facts", "confirmed_incidence_num",  "day", "state", EpiRange(20201221, 20201227), "co").df().value
manual_incidence.to_numpy()
# array([2146., 2514., 2949., 3031., 2656., 1430., 1402.])
api_incidence.to_numpy()
# array([2146., 2514., 2949., 3028., 2659., 1430., 1402.])

Expected Behavior

These should match exactly. The dates line up: both series cover 2020-12-21 through 2020-12-27.
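
To make the mismatch easier to see, here is a minimal offline sketch that compares the two arrays printed above element by element. The values are copied from that output; the helper name `find_mismatches` is made up for illustration and is not part of epidatpy.

```python
# Values copied from the arrays printed above (state 'co').
dates = [
    "2020-12-21", "2020-12-22", "2020-12-23", "2020-12-24",
    "2020-12-25", "2020-12-26", "2020-12-27",
]
manual_incidence = [2146.0, 2514.0, 2949.0, 3031.0, 2656.0, 1430.0, 1402.0]
api_incidence = [2146.0, 2514.0, 2949.0, 3028.0, 2659.0, 1430.0, 1402.0]


def find_mismatches(dates, expected, actual):
    """Hypothetical helper: return (date, expected, actual, actual - expected)
    for every position where the two series differ."""
    return [
        (d, e, a, a - e)
        for d, e, a in zip(dates, expected, actual)
        if e != a
    ]


mismatches = find_mismatches(dates, manual_incidence, api_incidence)
for row in mismatches:
    print(row)
# ('2020-12-24', 3031.0, 3028.0, -3.0)
# ('2020-12-25', 2656.0, 2659.0, 3.0)

# The two offsets cancel, so the totals over the week agree.
totals_agree = sum(manual_incidence) == sum(api_incidence)
print(totals_agree)  # True
```

The offsets of -3 and +3 cancel, so the weekly totals agree even though two individual days do not. That pattern might be what one would see if adjacent days were differenced against different versions of the cumulative series, but this is only a guess at this point.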

Context

Not sure yet what the problem is; I need to explore the data more.

The rows in the above range come from more than one issue (data version; see the issue column in the full tables below), so that may be related.

Full tables for the above queries:

>>> Epidata.covidcast("usa-facts", "confirmed_cumulative_num", "day", "state", EpiRange(20201220, 20201227), "co").df()
      source                    signal geo_type geo_value time_type time_value      issue  lag     value stderr sample_size direction  missing_value  missing_stderr  missing_sample_size
0  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-20 2021-02-11   53  308890.0   None        None      None              0               5                    5
1  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-21 2021-02-11   52  311036.0   None        None      None              0               5                    5
2  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-22 2021-02-11   51  313550.0   None        None      None              0               5                    5
3  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-23 2021-02-11   50  316499.0   None        None      None              0               5                    5
4  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-24 2020-12-26    2  319530.0   None        None      None              0               5                    5
5  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-25 2020-12-31    6  322186.0   None        None      None              0               5                    5
6  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-26 2020-12-31    5  323616.0   None        None      None              0               5                    5
7  usa-facts  confirmed_cumulative_num    state        co       day 2020-12-27 2020-12-31    4  325018.0   None        None      None              0               5                    5
>>> Epidata.covidcast("usa-facts", "confirmed_incidence_num",  "day", "state", EpiRange(20201221, 20201227), "co").df()
      source                   signal geo_type geo_value time_type time_value      issue  lag   value stderr sample_size direction  missing_value  missing_stderr  missing_sample_size
0  usa-facts  confirmed_incidence_num    state        co       day 2020-12-21 2020-12-22    1  2146.0   None        None      None              0               5                    5
1  usa-facts  confirmed_incidence_num    state        co       day 2020-12-22 2020-12-25    3  2514.0   None        None      None              0               5                    5
2  usa-facts  confirmed_incidence_num    state        co       day 2020-12-23 2021-02-11   50  2949.0   None        None      None              0               5                    5
3  usa-facts  confirmed_incidence_num    state        co       day 2020-12-24 2021-02-11   49  3028.0   None        None      None              0               5                    5
4  usa-facts  confirmed_incidence_num    state        co       day 2020-12-25 2020-12-31    6  2659.0   None        None      None              0               5                    5
5  usa-facts  confirmed_incidence_num    state        co       day 2020-12-26 2020-12-31    5  1430.0   None        None      None              0               5                    5
6  usa-facts  confirmed_incidence_num    state        co       day 2020-12-27 2020-12-31    4  1402.0   None        None      None              0               5                    5
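
One way to probe the issue-date idea from the Context section, sketched offline with the issue columns copied from the two tables above. Since incidence on day t is the cumulative value on day t minus the cumulative value on day t-1, the sketch flags every day where those three rows do not all carry the same issue date. The function name `mixed_issue_days` is hypothetical, and this is only a diagnostic, not a confirmed explanation.

```python
# (time_value, issue) pairs copied from the confirmed_cumulative_num table above.
cumulative_issue = [
    ("2020-12-20", "2021-02-11"),
    ("2020-12-21", "2021-02-11"),
    ("2020-12-22", "2021-02-11"),
    ("2020-12-23", "2021-02-11"),
    ("2020-12-24", "2020-12-26"),
    ("2020-12-25", "2020-12-31"),
    ("2020-12-26", "2020-12-31"),
    ("2020-12-27", "2020-12-31"),
]
# time_value -> issue, copied from the confirmed_incidence_num table above.
incidence_issue = {
    "2020-12-21": "2020-12-22",
    "2020-12-22": "2020-12-25",
    "2020-12-23": "2021-02-11",
    "2020-12-24": "2021-02-11",
    "2020-12-25": "2020-12-31",
    "2020-12-26": "2020-12-31",
    "2020-12-27": "2020-12-31",
}


def mixed_issue_days(cumulative_issue, incidence_issue):
    """Hypothetical helper: list the days where the incidence row and the two
    cumulative rows it would be derived from do not share one issue date."""
    flagged = []
    # Pair each cumulative row with the row for the following day.
    for (_, prev_issue), (day, cum_issue) in zip(cumulative_issue, cumulative_issue[1:]):
        if len({prev_issue, cum_issue, incidence_issue[day]}) > 1:
            flagged.append(day)
    return flagged


flagged = mixed_issue_days(cumulative_issue, incidence_issue)
print(flagged)
# ['2020-12-21', '2020-12-22', '2020-12-24', '2020-12-25']
```

Both mismatching days (2020-12-24 and 2020-12-25) are flagged, but so are 2020-12-21 and 2020-12-22, where the values agree. Mixed issue dates alone therefore do not pin down the cause.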

Fuller Context

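The above is just one example from this batch of location-date pairs with the same problem (I have only checked the latest state-level data so far). For 'co', the value column matches the API result above and the value_api_jit column matches the manual diff: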

                       value  value_api_jit
geo_value time_value                       
co        2020-12-24  3028.0         3031.0
          2020-12-25  2659.0         2656.0
          2021-01-17  1501.0         1498.0
ga        2020-12-25  5123.0         5122.0
ia        2021-01-17   731.0          730.0
ky        2021-01-13  4609.0         4610.0
          2021-01-17  2349.0         2348.0
la        2020-12-24  2565.0         2569.0
          2020-12-25     3.0           -1.0
          2021-01-17  4103.0         4092.0
mt        2020-12-24   403.0          436.0
          2020-12-25     0.0          -33.0
nd        2021-01-14   242.0          243.0
          2021-01-15   221.0          220.0
oh        2021-01-17  5248.0         5247.0
va        2021-01-17  9913.0         9912.0
vt        2020-12-25     0.0           -1.0
          2021-01-17   144.0          142.0
wi        2021-01-17  1890.0         1889.0
wv        2021-01-17     0.0         -115.0
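
A short offline sketch that summarizes the table above. The rows are copied verbatim. It assumes that `value` is what the API currently serves and `value_api_jit` is the JIT-computed value, which matches the 'co' example but is otherwise an assumption.

```python
from collections import Counter

# (geo_value, time_value, value, value_api_jit), copied from the table above.
rows = [
    ("co", "2020-12-24", 3028.0, 3031.0), ("co", "2020-12-25", 2659.0, 2656.0),
    ("co", "2021-01-17", 1501.0, 1498.0), ("ga", "2020-12-25", 5123.0, 5122.0),
    ("ia", "2021-01-17", 731.0, 730.0), ("ky", "2021-01-13", 4609.0, 4610.0),
    ("ky", "2021-01-17", 2349.0, 2348.0), ("la", "2020-12-24", 2565.0, 2569.0),
    ("la", "2020-12-25", 3.0, -1.0), ("la", "2021-01-17", 4103.0, 4092.0),
    ("mt", "2020-12-24", 403.0, 436.0), ("mt", "2020-12-25", 0.0, -33.0),
    ("nd", "2021-01-14", 242.0, 243.0), ("nd", "2021-01-15", 221.0, 220.0),
    ("oh", "2021-01-17", 5248.0, 5247.0), ("va", "2021-01-17", 9913.0, 9912.0),
    ("vt", "2020-12-25", 0.0, -1.0), ("vt", "2021-01-17", 144.0, 142.0),
    ("wi", "2021-01-17", 1890.0, 1889.0), ("wv", "2021-01-17", 0.0, -115.0),
]

# How many affected states there are on each date.
per_day = Counter(day for _, day, _, _ in rows)

# Rows where the JIT value is negative.
negative_jit = [(geo, day) for geo, day, _, jit in rows if jit < 0]

# JIT minus served value, keyed by (state, date).
deltas = {(geo, day): jit - value for geo, day, value, jit in rows}

# For states listed on both 2020-12-24 and 2020-12-25, add up the two deltas.
both_days = sorted(
    {g for g, d in deltas if d == "2020-12-24"}
    & {g for g, d in deltas if d == "2020-12-25"}
)
pair_net = {
    g: deltas[(g, "2020-12-24")] + deltas[(g, "2020-12-25")] for g in both_days
}

print(dict(per_day))
print(negative_jit)
# [('la', '2020-12-25'), ('mt', '2020-12-25'), ('vt', '2020-12-25'), ('wv', '2021-01-17')]
print(pair_net)
# {'co': 0.0, 'la': 0.0, 'mt': 0.0}
```

Most of the affected rows fall on 2020-12-25 (5 rows) and 2021-01-17 (9 rows). For 'co', 'la', and 'mt', the differences on 2020-12-24 and 2020-12-25 cancel exactly, as in the 'co' example earlier.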

Labels: data quality (Missing data, weird data, broken data)
