JHU generating bogus ".0000" geo id #254
Comments
We are also seeing a FIPS code of 80001, which I can't find in the geo coding materials:
It may be associated with this line from the JHU csv:
...which JHU seems to have completely made up?
The 840800XX codes are listed as "Out of [State]" in their UID lookup. My impression from the way they report Puerto Rico is that these are reserved for values they can't pin down to a particular FIPS. For most FIPS codes, they are either blank or zero. Update: looking through the confirmed US cases time series, the following states use that field frequently:
@krivard hm, with regard to the '.0001' geoid, since this doesn't appear in the JHU time series, it must be generated somewhere in our code. I would like to see a set difference between the geo_ids of |
Unfortunately, I don't seem to have kept copies of the bad files. I've modified the cron job I'm using for interim repairs to make backups; hopefully I'll have that for you tomorrow morning PDT. We wouldn't expect 1287.0 to show up in the CSSE time series, because those files are cumulative and 1287.0 was an incidence figure.
Here's a bundle of https://delphi.midas.cs.cmu.edu/~krivard/jhu-.0000.tgz
Thanks Katie! I found the source of the bug. It's in this line. A string selection, str[-2:], is meant to take the last two characters of a UID like 84072001, but the UID is a float at that point, so str[-2:] returns just ".0". This should already be fixed in my refactor #217, because I handle those UIDs manually elsewhere.
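For anyone reading along, here is a minimal sketch of the failure mode described above, in plain Python rather than the actual indicator code. The variable names are made up for illustration, and casting to int first is just one possible repair, not necessarily what the refactor in #217 does.

```python
# Illustrative only: the UID as it looks once it has been parsed as a float.
uid = 84072001.0

# Slicing the string form of a float picks up the trailing ".0".
buggy_suffix = str(uid)[-2:]        # str(uid) is "84072001.0"

# Casting to int before converting to str drops the decimal part.
fixed_suffix = str(int(uid))[-2:]   # str(int(uid)) is "84072001"

print(repr(buggy_suffix))
print(repr(fixed_suffix))
```

Running this shows the buggy slice yielding ".0" while the int-cast slice yields the intended two digits.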
* added the functions zip_to_state_code, zip_to_state_id (and the convert_* versions), zip_to_msa and convert_zip_to_msa
* added two functions add_geocode and replace_geocode meant to consolidate the logic in the utility and reduce the code size by a factor of 5. These functions work alongside the rest of the deprecated functions and are meant to replace e.g. zip_to_msa(df, ...) with replace_geocode(df, "zip", "msa", ...).
* renamed functions that referred to fips or county interchangeably to consistently use fips, e.g. zip_to_county to zip_to_fips
* enforced the string type on all geocodes, with zero padding as necessary
* renamed instances of stcode to state_code for clarity
* removed non-JHU UID functions for JHU conversion
* updated tests to match

Bugfixes:
* Removed .0000, 9xxx in output mappings - fixes most of #254
* Puerto Rico deaths should now be reported - fixes #179
* Generally fixes #215
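As a rough illustration of the "string type with zero padding" item above, a helper along these lines would reject a value like ".0000" instead of passing it through. This is only a sketch: normalize_fips is a hypothetical name, not a function from the refactor, and the 5-character width is an assumption based on county FIPS codes.

```python
def normalize_fips(value, width=5):
    """Hypothetical helper: coerce a FIPS-like value to a zero-padded string."""
    if isinstance(value, float):
        # NaN fails is_integer(); so does any value with a fractional part.
        if not value.is_integer():
            raise ValueError(f"not a whole-number code: {value!r}")
        value = int(value)
    text = str(value).strip()
    if not text.isdigit():
        # Catches strings such as ".0000" or "1001.0".
        raise ValueError(f"not a numeric code: {value!r}")
    return text.zfill(width)


print(normalize_fips(1001.0))   # float input, as read from a CSV
print(normalize_fips("1001"))   # string input missing its leading zero
```

Raising on malformed input, rather than padding whatever arrives, is a design choice made here so that a bad id surfaces at conversion time instead of in the ingestion log.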
JHU is generating a geo id of ".0000", which is not a valid geo id. It seems to affect some signals but not others. For example, from the ingestion log:
& the CSV files in question: