Closed
Description
Sample error:
{
"detail": [
"Pandas(geo_id='ca', val='6.6088201', se=nan, sample_size=nan, missing_val=nan, missing_se=nan, missing_sample_size=nan)",
"missing_val"
],
"file": "/common/covidcast/receiving/chng/20210926_state_smoothed_adj_outpatient_cli.csv",
"event": "invalid value for row",
"logger": "load_csv",
"level": "warning",
"timestamp": "2021-10-04T03:09:39.319281Z"
}
The file listed above was saved to /common/covidcast/archive/failed/chng with the following content:
geo_id,val,se,sample_size,missing_val,missing_se,missing_sample_size
ak,4.2654434,NA,NA,NA,NA,NA
al,2.3107292,NA,NA,NA,NA,NA
ar,1.5595885,NA,NA,NA,NA,NA
az,3.3219389,NA,NA,NA,NA,NA
ca,6.6088201,NA,NA,NA,NA,NA
co,1.5949461,NA,NA,NA,NA,NA
ct,1.2578854,NA,NA,NA,NA,NA
dc,4.2415408,NA,NA,NA,NA,NA
de,2.158752,NA,NA,NA,NA,NA
fl,1.7564632,NA,NA,NA,NA,NA
ga,3.1381255,NA,NA,NA,NA,NA
gu,0.9206689,NA,NA,NA,NA,NA
[...]
However, the S3 ArchiveDiffer cache for this file has the following content, without the three missingness columns:
geo_id,val,se,sample_size
ak,4.2654434,NA,NA
al,2.3107292,NA,NA
ar,1.5595885,NA,NA
az,3.3219389,NA,NA
ca,6.6088201,NA,NA
co,1.5949461,NA,NA
ct,1.2578854,NA,NA
dc,4.2415408,NA,NA
de,2.158752,NA,NA
fl,1.7564632,NA,NA
ga,3.1381255,NA,NA
gu,0.9206689,NA,NA
I'm not sure what happened here. Did acquisition fill the missingness columns with NA and then save the data frame to failed, or did ArchiveDiffer add them to the files left in receiving without updating the files in the S3 cache?
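To make the discrepancy concrete, here is a minimal sketch that compares the two listings quoted above. It uses only the first few rows from this issue, pasted inline, and `compare_csvs` is a hypothetical helper, not part of acquisition or ArchiveDiffer. It reads every field as a plain string so that `NA` is compared literally rather than converted to NaN.

```python
import io

import pandas as pd

# First rows of the file saved to archive/failed (copied from this issue).
FAILED_CSV = """geo_id,val,se,sample_size,missing_val,missing_se,missing_sample_size
ak,4.2654434,NA,NA,NA,NA,NA
al,2.3107292,NA,NA,NA,NA,NA
ca,6.6088201,NA,NA,NA,NA,NA
"""

# The same rows from the S3 ArchiveDiffer cache (copied from this issue).
CACHED_CSV = """geo_id,val,se,sample_size
ak,4.2654434,NA,NA
al,2.3107292,NA,NA
ca,6.6088201,NA,NA
"""


def compare_csvs(failed_text, cached_text):
    """Hypothetical helper: report extra columns and whether shared columns match."""
    # dtype=str with keep_default_na=False keeps "NA" as the literal string "NA".
    failed = pd.read_csv(io.StringIO(failed_text), dtype=str, keep_default_na=False)
    cached = pd.read_csv(io.StringIO(cached_text), dtype=str, keep_default_na=False)

    # Columns present in the failed file but absent from the cache.
    extra = [c for c in failed.columns if c not in cached.columns]
    # Columns present in both, in the failed file's order.
    shared = [c for c in failed.columns if c in cached.columns]

    # True only if the shared columns hold identical values row for row.
    same = failed[shared].equals(cached[shared])
    return extra, same


extra_columns, shared_identical = compare_csvs(FAILED_CSV, CACHED_CSV)
print("extra columns:", extra_columns)
print("shared columns identical:", shared_identical)
```

On the rows quoted here, this shows that the two files agree on `geo_id`, `val`, `se`, and `sample_size` and differ only in the three `missing_*` columns. It does not show which step added those columns.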