Open
Labels
data quality: Missing data, weird data, broken data
Description
While working on the JIT A/B tests (cmu-delphi/delphi-epidata#947), I found this discrepancy. It is very similar to #1685.
Actual Behavior
The incidence values in the state 'co' don't agree with their cumulative counterparts.
from epidatpy.request import EpiRange, Epidata

# Day-over-day difference of the cumulative signal; the first entry (NaN) is dropped.
manual_incidence = Epidata.covidcast("usa-facts", "confirmed_cumulative_num", "day", "state", EpiRange(20201220, 20201227), "co").df().value.diff()[1:]
# Incidence signal as served by the API for the same dates.
api_incidence = Epidata.covidcast("usa-facts", "confirmed_incidence_num", "day", "state", EpiRange(20201221, 20201227), "co").df().value
manual_incidence.to_numpy()
# array([2146., 2514., 2949., 3031., 2656., 1430., 1402.])
api_incidence.to_numpy()
# array([2146., 2514., 2949., 3028., 2659., 1430., 1402.])
Expected behavior
These should match. The dates line up.
Context
Not sure what the problem is. Need to explore the data more.
The rows in the above range come from different issues (data versions), so that may be related.
Full tables for the above queries:
>>> Epidata.covidcast("usa-facts", "confirmed_cumulative_num", "day", "state", EpiRange(20201220, 20201227), "co").df()
      source                    signal  geo_type  geo_value  time_type  time_value       issue  lag     value  stderr  sample_size  direction  missing_value  missing_stderr  missing_sample_size
0  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-20  2021-02-11   53  308890.0    None         None       None              0               5                    5
1  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-21  2021-02-11   52  311036.0    None         None       None              0               5                    5
2  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-22  2021-02-11   51  313550.0    None         None       None              0               5                    5
3  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-23  2021-02-11   50  316499.0    None         None       None              0               5                    5
4  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-24  2020-12-26    2  319530.0    None         None       None              0               5                    5
5  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-25  2020-12-31    6  322186.0    None         None       None              0               5                    5
6  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-26  2020-12-31    5  323616.0    None         None       None              0               5                    5
7  usa-facts  confirmed_cumulative_num     state         co        day  2020-12-27  2020-12-31    4  325018.0    None         None       None              0               5                    5
>>> Epidata.covidcast("usa-facts", "confirmed_incidence_num", "day", "state", EpiRange(20201221, 20201227), "co").df()
      source                   signal  geo_type  geo_value  time_type  time_value       issue  lag   value  stderr  sample_size  direction  missing_value  missing_stderr  missing_sample_size
0  usa-facts  confirmed_incidence_num     state         co        day  2020-12-21  2020-12-22    1  2146.0    None         None       None              0               5                    5
1  usa-facts  confirmed_incidence_num     state         co        day  2020-12-22  2020-12-25    3  2514.0    None         None       None              0               5                    5
2  usa-facts  confirmed_incidence_num     state         co        day  2020-12-23  2021-02-11   50  2949.0    None         None       None              0               5                    5
3  usa-facts  confirmed_incidence_num     state         co        day  2020-12-24  2021-02-11   49  3028.0    None         None       None              0               5                    5
4  usa-facts  confirmed_incidence_num     state         co        day  2020-12-25  2020-12-31    6  2659.0    None         None       None              0               5                    5
5  usa-facts  confirmed_incidence_num     state         co        day  2020-12-26  2020-12-31    5  1430.0    None         None       None              0               5                    5
6  usa-facts  confirmed_incidence_num     state         co        day  2020-12-27  2020-12-31    4  1402.0    None         None       None              0               5                    5
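For anyone who wants to look at the mismatch without querying the API, here is a minimal offline sketch that uses only the values copied from the two tables above. The variable names are mine, and it uses plain Python instead of pandas so it runs anywhere; it is meant as a quick check, not as the way the pipeline computes incidence.

```python
# Values copied from the two tables above (geo_value "co").
cumulative = {
    "2020-12-20": 308890.0,
    "2020-12-21": 311036.0,
    "2020-12-22": 313550.0,
    "2020-12-23": 316499.0,
    "2020-12-24": 319530.0,
    "2020-12-25": 322186.0,
    "2020-12-26": 323616.0,
    "2020-12-27": 325018.0,
}
api_incidence = {
    "2020-12-21": 2146.0,
    "2020-12-22": 2514.0,
    "2020-12-23": 2949.0,
    "2020-12-24": 3028.0,
    "2020-12-25": 2659.0,
    "2020-12-26": 1430.0,
    "2020-12-27": 1402.0,
}

dates = sorted(cumulative)

# Day-over-day difference of the cumulative series, keyed by the later date.
manual_incidence = {
    later: cumulative[later] - cumulative[earlier]
    for earlier, later in zip(dates, dates[1:])
}

# Keep only the dates where the two series disagree, as (manual, api) pairs.
mismatches = {
    day: (manual_incidence[day], api_incidence[day])
    for day in dates[1:]
    if manual_incidence[day] != api_incidence[day]
}
print(mismatches)
# {'2020-12-24': (3031.0, 3028.0), '2020-12-25': (2656.0, 2659.0)}
```

Only 2020-12-24 and 2020-12-25 disagree, and those are the dates around which the issue column changes in both tables.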
Fuller Context
The above is just one example from this batch of location-date pairs with the same problem (I have only checked the latest state data so far):
                       value  value_api_jit
geo_value time_value
co        2020-12-24  3028.0         3031.0
          2020-12-25  2659.0         2656.0
          2021-01-17  1501.0         1498.0
ga        2020-12-25  5123.0         5122.0
ia        2021-01-17   731.0          730.0
ky        2021-01-13  4609.0         4610.0
          2021-01-17  2349.0         2348.0
la        2020-12-24  2565.0         2569.0
          2020-12-25     3.0           -1.0
          2021-01-17  4103.0         4092.0
mt        2020-12-24   403.0          436.0
          2020-12-25     0.0          -33.0
nd        2021-01-14   242.0          243.0
          2021-01-15   221.0          220.0
oh        2021-01-17  5248.0         5247.0
va        2021-01-17  9913.0         9912.0
vt        2020-12-25     0.0           -1.0
          2021-01-17   144.0          142.0
wi        2021-01-17  1890.0         1889.0
wv        2021-01-17     0.0         -115.0
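One pattern that might help narrow this down: where a state has mismatches on two consecutive days, the differences appear to cancel out. The sketch below checks that on the rows listed above. It assumes `value` is the current API value and `value_api_jit` is the JIT-computed value (that matches the 'co' example); the variable names are mine. It only describes a pattern in these rows and is not a diagnosis.

```python
from datetime import date, timedelta

# (geo_value, time_value, value, value_api_jit), copied from the table above.
rows = [
    ("co", "2020-12-24", 3028.0, 3031.0),
    ("co", "2020-12-25", 2659.0, 2656.0),
    ("co", "2021-01-17", 1501.0, 1498.0),
    ("ga", "2020-12-25", 5123.0, 5122.0),
    ("ia", "2021-01-17", 731.0, 730.0),
    ("ky", "2021-01-13", 4609.0, 4610.0),
    ("ky", "2021-01-17", 2349.0, 2348.0),
    ("la", "2020-12-24", 2565.0, 2569.0),
    ("la", "2020-12-25", 3.0, -1.0),
    ("la", "2021-01-17", 4103.0, 4092.0),
    ("mt", "2020-12-24", 403.0, 436.0),
    ("mt", "2020-12-25", 0.0, -33.0),
    ("nd", "2021-01-14", 242.0, 243.0),
    ("nd", "2021-01-15", 221.0, 220.0),
    ("oh", "2021-01-17", 5248.0, 5247.0),
    ("va", "2021-01-17", 9913.0, 9912.0),
    ("vt", "2020-12-25", 0.0, -1.0),
    ("vt", "2021-01-17", 144.0, 142.0),
    ("wi", "2021-01-17", 1890.0, 1889.0),
    ("wv", "2021-01-17", 0.0, -115.0),
]

# JIT value minus API value for each row.
deltas = [
    (geo, date.fromisoformat(day), value_jit - value)
    for geo, day, value, value_jit in rows
]

# Rows are already sorted by state and date, so neighbours in the list
# can be compared directly to find consecutive-day pairs within a state.
pairs = [
    (geo_a, delta_a, delta_b)
    for (geo_a, day_a, delta_a), (geo_b, day_b, delta_b) in zip(deltas, deltas[1:])
    if geo_a == geo_b and day_b - day_a == timedelta(days=1)
]
print(pairs)
# [('co', 3.0, -3.0), ('la', 4.0, -4.0), ('mt', 33.0, -33.0), ('nd', 1.0, -1.0)]
```

Offsetting differences on adjacent days would be consistent with the two signals disagreeing about a single cumulative value, not about the daily counts themselves. The rows without a neighbouring day (for example most of the 2021-01-17 entries) do not fit this pattern and would need a separate look.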