-
Notifications
You must be signed in to change notification settings - Fork 17
Extend geocode utility to actually support the state to state mappings #310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- state_x -> state_y where x,y are in {code, id, name}
# state_name -> state_id | ||
new_data = gmpr.add_geocode(self.zip_data, "zip", "state_name") | ||
new_data2 = gmpr.add_geocode(new_data, "state_name", "state_id") | ||
assert new_data2.shape == (12, 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there also be an assert statement for new_data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, new_data is just the test data for state_name to state_id, but it starts in zip form
* update replace_geocode documentation to be clear about data columns * add test cases for renaming columns in replace_geocode * fix the state to state conversion dropped columns issue
not specifically related to this PR, I just missed this in the previous one: |
Ahhh, thanks, might as well take care of that here. |
Refactor cdc_covidnet to use geo utils
new_data = gmpr.add_population_column(self.zip_data, "zip") | ||
assert new_data["population"].sum() == 274902 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason not to keep both of these checks?
Also, if it's 5x5 output it may be simpler juts to do a direct df comparison on values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an old test I don't see anymore. Pull again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just asking why it was deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I decided to move away from tests based on data-derived population counts. I figured the tests should catch whether the underlying logic or arithmetic breaks, not whether the data file changed. Am open to reasons for keeping though.
# hrr -> nation | ||
with pytest.raises(ValueError): | ||
new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr") | ||
new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation") | ||
|
||
# hrr -> nation | ||
with pytest.raises(ValueError): | ||
new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr") | ||
new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation") | ||
|
||
# hrr -> nation | ||
with pytest.raises(ValueError): | ||
new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr") | ||
new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicated tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, this is a new one 😄
@@ -411,6 +420,23 @@ def test_add_geocode(self): | |||
) | |||
) | |||
|
|||
# hrr -> nation | |||
with pytest.raises(ValueError): | |||
new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure if the first line valueerrors but the second one doesnt, it still passes, so may want to split this into two with pytest raises....
statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that have been caught in the zip -> hrr test some lines before that though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe im understanding this test wrong. I read it as testing that both
new_data = gmpr.replace_geocode(self.zip_data, "zip", "hrr")
and
new_data2 = gmpr.replace_geocode(new_data, "hrr", "nation")
raise valueerrors. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing the second one, since we should not be mapping hrr -> nation (it's an incomplete mapping, so it's unsupported).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhh got it, didn't see the second line calls new_data. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
new_code="state_id", | ||
dropna=False) | ||
# To use the original column name, reassign original column and drop new one | ||
hosp_df[APIConfig.STATE_COL] = hosp_df["state_id"].str.upper() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinandrew btw, this may be cdc_covidnet specific, but the state abbreviation in other indicators (like JHU) is assumed to be lower case. Are we sure this is what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason cdc covidnet was uppercase, so i kept it consistent. if it's something we can change, would definitely recommend we standardize, but haven't looked into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krivard thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slacked Katie on this while discussing a similar issue, apparently downstream ingestion will standardize everything and accepts lower or uppercase, so this can be removed and we can just move to lowercase for the indicator code.
Add support for the state_x -> state_y where x,y are in {code, id, name} mappings