covidcast acquisition rejecting rows with absent or NaN missingness columns #729

@krivard

Sample error:

{
  "detail": [
    "Pandas(geo_id='ca', val='6.6088201', se=nan, sample_size=nan, missing_val=nan, missing_se=nan, missing_sample_size=nan)",
    "missing_val"
  ],
  "file": "/common/covidcast/receiving/chng/20210926_state_smoothed_adj_outpatient_cli.csv",
  "event": "invalid value for row",
  "logger": "load_csv",
  "level": "warning",
  "timestamp": "2021-10-04T03:09:39.319281Z"
}

The file listed above was saved to /common/covidcast/archive/failed/chng with the following content:

geo_id,val,se,sample_size,missing_val,missing_se,missing_sample_size
ak,4.2654434,NA,NA,NA,NA,NA
al,2.3107292,NA,NA,NA,NA,NA
ar,1.5595885,NA,NA,NA,NA,NA
az,3.3219389,NA,NA,NA,NA,NA
ca,6.6088201,NA,NA,NA,NA,NA
co,1.5949461,NA,NA,NA,NA,NA
ct,1.2578854,NA,NA,NA,NA,NA
dc,4.2415408,NA,NA,NA,NA,NA
de,2.158752,NA,NA,NA,NA,NA
fl,1.7564632,NA,NA,NA,NA,NA
ga,3.1381255,NA,NA,NA,NA,NA
gu,0.9206689,NA,NA,NA,NA,NA
[...]
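For reference, the NaN values in the logged row are consistent with how pandas parses the literal string NA by default. The snippet below is only a minimal sketch of that behavior, assuming the loader reads the file with something like `pd.read_csv` and iterates with `itertuples`; the `dtype` mapping is a guess made to match the quoted `val` in the log, not the actual acquisition code.

```python
import io
import math

import pandas as pd

# Two rows copied from the failed file above.
SAMPLE = """geo_id,val,se,sample_size,missing_val,missing_se,missing_sample_size
ak,4.2654434,NA,NA,NA,NA,NA
ca,6.6088201,NA,NA,NA,NA,NA
"""

# Assumption: geo_id and val are read as strings, as the quoted values
# in the log suggest. The real loader may configure this differently.
df = pd.read_csv(io.StringIO(SAMPLE), dtype={"geo_id": str, "val": str})

# By default pandas treats the string "NA" as a missing value, so every
# all-NA column becomes float NaN.
rows = list(df.itertuples(index=False))
ca = rows[1]
print(ca)

# A NaN missingness code cannot be converted to an integer, which is one
# plausible reason a validator would flag "missing_val".
print(math.isnan(ca.missing_val))
```

If this matches what the loader does, the NA entries in the missingness columns reach row validation as NaN rather than as integer codes.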

However, the S3 ArchiveDiffer cache for this file has the following content:

geo_id,val,se,sample_size
ak,4.2654434,NA,NA
al,2.3107292,NA,NA
ar,1.5595885,NA,NA
az,3.3219389,NA,NA
ca,6.6088201,NA,NA
co,1.5949461,NA,NA
ct,1.2578854,NA,NA
dc,4.2415408,NA,NA
de,2.158752,NA,NA
fl,1.7564632,NA,NA
ga,3.1381255,NA,NA
gu,0.9206689,NA,NA

I'm not sure what happened here. Did acquisition fill the missingness columns with NA and then save the data frame to failed, or did ArchiveDiffer add them to the files left in receiving without updating the files in the S3 cache?
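One way to narrow this down is to compare the two copies column by column. The helper below is a hypothetical sketch (`compare_csvs` is not part of the acquisition or ArchiveDiffer code): it reports which columns exist only in the failed copy and whether the shared columns are identical. It reads every cell as a raw string so that NA is compared literally rather than as NaN.

```python
import io

import pandas as pd


def compare_csvs(failed_csv: str, cached_csv: str):
    """Return (columns only in the failed copy, whether shared columns match)."""
    # dtype=str plus keep_default_na=False keeps "NA" as the literal text.
    failed = pd.read_csv(io.StringIO(failed_csv), dtype=str, keep_default_na=False)
    cached = pd.read_csv(io.StringIO(cached_csv), dtype=str, keep_default_na=False)

    extra = [c for c in failed.columns if c not in cached.columns]
    shared = [c for c in failed.columns if c in cached.columns]
    same = failed[shared].equals(cached[shared])
    return extra, same


# Two rows from each listing above, used purely as an illustration.
FAILED = """geo_id,val,se,sample_size,missing_val,missing_se,missing_sample_size
ak,4.2654434,NA,NA,NA,NA,NA
al,2.3107292,NA,NA,NA,NA,NA
"""
CACHED = """geo_id,val,se,sample_size
ak,4.2654434,NA,NA
al,2.3107292,NA,NA
"""

extra, same = compare_csvs(FAILED, CACHED)
print(extra, same)
```

If the shared columns match and only the three missingness columns differ, the extra columns were added somewhere between the cached copy and the failed copy. The comparison alone does not say which component added them.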
