@dshemetov
Contributor

Add support for the state_x -> state_y where x,y are in {code, id, name} mappings

- state_x -> state_y where x,y are in {code, id, name}
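As a rough illustration of the state_x -> state_y idea, here is a minimal sketch that converts a single value between the three representations. The `STATES` table and the `convert_state` helper are hypothetical and exist only for this example; the actual package works on data frames through `add_geocode` and `replace_geocode` and reads its crosswalk from data files.

```python
# Hypothetical three-row crosswalk, for illustration only.
STATES = [
    {"state_code": "06", "state_id": "ca", "state_name": "California"},
    {"state_code": "36", "state_id": "ny", "state_name": "New York"},
    {"state_code": "42", "state_id": "pa", "state_name": "Pennsylvania"},
]

VALID_CODES = {"state_code", "state_id", "state_name"}


def convert_state(value, from_code, new_code):
    """Map one state value from one representation to another (hypothetical helper)."""
    if from_code not in VALID_CODES or new_code not in VALID_CODES:
        raise ValueError(f"unsupported mapping: {from_code} -> {new_code}")
    # Build the lookup for the requested pair of representations.
    lookup = {row[from_code]: row[new_code] for row in STATES}
    return lookup[value]


pa_id = convert_state("Pennsylvania", "state_name", "state_id")
ca_code = convert_state("ca", "state_id", "state_code")
```

Because every pair in {code, id, name} is a one-to-one mapping, a single lookup table covers all of the conversions.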
# state_name -> state_id
new_data = gmpr.add_geocode(self.zip_data, "zip", "state_name")
new_data2 = gmpr.add_geocode(new_data, "state_name", "state_id")
assert new_data2.shape == (12, 6)
Contributor

should there also be an assert statement for new_data?

Contributor Author

Nah, new_data is just the test data for state_name to state_id, but it starts in zip form

@krivard krivard requested a review from chinandrew October 14, 2020 19:10
* update replace_geocode documentation to be clear about data columns
* add test cases for renaming columns in replace_geocode
* fix the state to state conversion dropped columns issue
@chinandrew
Contributor

not specifically related to this PR, I just missed this in the previous one: date_cols is undocumented in the replace_geocode() docstring

@dshemetov
Contributor Author

Ahhh, thanks, might as well take care of that here.

assert new_data["population"].sum() == 274963
assert new_data.shape == (5, 5)
new_data = gmpr.add_population_column(self.zip_data, "zip")
assert new_data["population"].sum() == 274902
Contributor

any reason not to keep both of these checks?

Also, if it's a 5x5 output, it may be simpler just to do a direct df comparison on values

Contributor Author

This is an old test I don't see anymore. Pull again?

Contributor

I was just asking why it was deleted

Contributor Author

Ah. I decided to move away from tests based on data-derived population counts. I figured the tests should catch whether the underlying logic or arithmetic breaks, not whether the data file changed. Am open to reasons for keeping though.
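One way to picture a test that checks the logic rather than the data file is to feed the function a small synthetic lookup whose totals are known by construction. The sketch below is an assumption for illustration: `add_population` and `FAKE_POPULATION` are hypothetical names, not the package's API, and the zip codes and counts are made up.

```python
# Hypothetical stand-in for a population lookup; not the package's API.
def add_population(rows, geo_key, population_table):
    """Return copies of rows with a 'population' field looked up by geo_key."""
    return [{**row, "population": population_table[row[geo_key]]} for row in rows]


# Synthetic fixture: made-up zip codes and counts, so the expected total
# depends only on the lookup logic, not on any real data file.
FAKE_POPULATION = {"00001": 100, "00002": 250}
rows = [{"zip": "00001", "count": 3}, {"zip": "00002", "count": 7}]

result = add_population(rows, "zip", FAKE_POPULATION)
total = sum(r["population"] for r in result)
```

With a fixture like this, the expected sum only changes if the lookup or the arithmetic changes.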

Comment on lines 423 to 437
# hrr -> nation
with pytest.raises(ValueError):
    new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr")
    new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation")

# hrr -> nation
with pytest.raises(ValueError):
    new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr")
    new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation")

# hrr -> nation
with pytest.raises(ValueError):
    new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr")
    new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation")

Contributor

duplicated tests?

Contributor Author

Oops, this is a new one 😄


# hrr -> nation
with pytest.raises(ValueError):
    new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr")
Contributor

I'm pretty sure that if the first line raises a ValueError but the second one doesn't, the test still passes, so you may want to split this into two separate with pytest.raises(...) statements

Contributor Author

Wouldn't that have been caught in the zip -> hrr test some lines before that though?

Contributor

Maybe I'm understanding this test wrong. I read it as testing that both

            new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr")

and

            new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation")

raise ValueErrors. Is that right?

Contributor Author

It's testing the second one, since we should not be mapping hrr -> nation (it's an incomplete mapping, so it's unsupported).

Contributor

Ohhh got it, didn't see the second line calls new_data. Thanks.
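The concern raised in this thread can be sketched as follows. To stay stdlib-only, the example uses unittest's `assertRaises` context manager as a stand-in for `pytest.raises`; both pass as soon as any statement inside the block raises the expected exception. The three mapping functions are hypothetical placeholders, not the real `replace_geocode`.

```python
import unittest

tc = unittest.TestCase()  # stdlib stand-in for pytest.raises in this sketch


def zip_to_hrr(data):
    """Hypothetical supported mapping."""
    return [("hrr", value) for value in data]


def broken_zip_to_hrr(data):
    """Hypothetical setup step that fails unexpectedly."""
    raise ValueError("unexpected failure in setup")


def hrr_to_nation(data):
    """Hypothetical unsupported mapping that should always raise."""
    raise ValueError("hrr -> nation is not supported")


reached = []

# Setup inside the block: the check passes even though the setup line,
# not the call under test, is what raised.
with tc.assertRaises(ValueError):
    new_data = broken_zip_to_hrr([1, 2])
    reached.append("call under test")
    hrr_to_nation(new_data)

setup_failure_was_masked = "call under test" not in reached

# Setup outside the block: only the call under test can satisfy the check,
# and a failing setup step would surface as an ordinary test error.
new_data = zip_to_hrr([1, 2])
with tc.assertRaises(ValueError):
    hrr_to_nation(new_data)
```

Keeping only the call that is expected to fail inside the block makes the intent of the test easier to read, which is the point the thread eventually settles on.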

Contributor

@chinandrew chinandrew left a comment

lgtm

new_code="state_id",
dropna=False)
# To use the original column name, reassign original column and drop new one
hosp_df[APIConfig.STATE_COL] = hosp_df["state_id"].str.upper()
Contributor Author

@chinandrew btw, this may be cdc_covidnet specific, but the state abbreviation in other indicators (like JHU) is assumed to be lower case. Are we sure this is what we want?

Contributor

For some reason cdc covidnet was uppercase, so I kept it consistent. If it's something we can change, I would definitely recommend we standardize, but I haven't looked into it.

Contributor Author

@krivard thoughts?

Contributor

Slacked Katie on this while discussing a similar issue, apparently downstream ingestion will standardize everything and accepts lower or uppercase, so this can be removed and we can just move to lowercase for the indicator code.

@krivard krivard merged commit c9fea0a into main Oct 26, 2020
@krivard krivard deleted the geoutil_state_extension branch October 29, 2020 18:10