Propagate geomapper changes into JHU #297

Merged 76 commits, Oct 7, 2020

Commits
96e9406
diff uploads
vishakha1812 Sep 10, 2020
707ce49
remove return statement
vishakha1812 Sep 10, 2020
8bdadde
Merge pull request #276 from cmu-delphi/deploy-jhu
krivard Sep 11, 2020
3a8e719
add code
Sep 16, 2020
7c08761
Merge branch 'run-quidel' of https://github.com/cmu-delphi/covidcast-…
Sep 16, 2020
2c7a6e4
update unit tests
Sep 16, 2020
330cb66
switched to shallow copy
Sep 16, 2020
41ca43b
update documentation
Sep 16, 2020
7362d48
Add max_borrow_obs
Sep 16, 2020
10e1546
update documentation
Sep 16, 2020
0c3366f
Copied .github directory from www-covidcast
nmdefries Sep 16, 2020
4324431
Removed pull request template and contributing info. Added data quali…
nmdefries Sep 16, 2020
e81fe16
Recovered county level & Commented out unreleased signals
Sep 16, 2020
2740c26
Update params.json.template
vishakha1812 Sep 16, 2020
a74b944
Update params.json.template
vishakha1812 Sep 16, 2020
de16118
Update test_update_sensor.py
vishakha1812 Sep 16, 2020
a42155a
Update the title for DETAILS
jingjtang Sep 16, 2020
7627876
Mocked S3 for update_sensor
eujing Sep 16, 2020
8e6c337
Removed bug_report template. Updated other templates to have more detail
nmdefries Sep 17, 2020
98652c6
Removed Severity section from data_quality template
nmdefries Sep 17, 2020
3f64649
Merge pull request #269 from cmu-delphi/diff_emr
krivard Sep 18, 2020
0c6b4ad
Code and documentation for producing geo mapping files
krivard Sep 18, 2020
b08ed3f
Static geo mapping files
krivard Sep 18, 2020
0c6b422
Updated geo mapping/aggregation utility
krivard Sep 18, 2020
5394dcb
Merge pull request #277 from cmu-delphi/run-quidel
krivard Sep 18, 2020
e81102f
Merge pull request #279 from cmu-delphi/add-issue-templates
krivard Sep 18, 2020
528ef2e
Remove 8XXXX and 9XXYY, YY > 56 JHU FIPS codes, updated Puerto Rico
dshemetov Sep 18, 2020
b564e9f
Code review updates:
dshemetov Sep 22, 2020
149e16d
Update _delphi_utils_python/data_proc/geomap/geo_data_proc.py
dshemetov Sep 22, 2020
eafdf13
Update _delphi_utils_python/delphi_utils/geomap.py
dshemetov Sep 22, 2020
e16d51d
Replace assert with ValueError exception
dshemetov Sep 22, 2020
fde9525
Add doc string for megacounty code
dshemetov Sep 22, 2020
baf0c14
Link todo list to github issues
dshemetov Sep 22, 2020
9c8e1b3
Taking ownership in the README
dshemetov Sep 22, 2020
2c3de60
Add crosswalk sanity checks to test_geomap
dshemetov Sep 23, 2020
806db8a
Merge branch 'rf_geo_refactor' of https://github.com/cmu-delphi/covid…
dshemetov Sep 23, 2020
c26743e
Uncomment work functions
dshemetov Sep 24, 2020
5fbff78
Code review updates
dshemetov Sep 24, 2020
655f21d
String conversion check coverage
dshemetov Sep 24, 2020
7d34fb9
Two final features
dshemetov Sep 25, 2020
b88e89f
Part of previous commit
dshemetov Sep 25, 2020
9a0eb10
Final set of tests:
dshemetov Sep 25, 2020
0025cb7
template files
mariajahja Sep 28, 2020
b65d1ff
claims based hosp indicator package
mariajahja Sep 28, 2020
ad4a48c
unit tests
mariajahja Sep 28, 2020
cb2e025
review, addl doc, pylint, and minor fixes
mariajahja Sep 28, 2020
f4af917
code review fixes, change signal name
mariajahja Sep 28, 2020
4a75ba3
Release under MIT license
krivard Sep 29, 2020
9512675
Split off contribution guide; add context to README
krivard Sep 29, 2020
549840e
Add sections on branches, issues, and project boards
krivard Sep 29, 2020
58806ad
Add license information
krivard Sep 29, 2020
669d01b
Moved contribution guide to where github expects it
krivard Sep 29, 2020
98b1992
Update LICENSE with correct copyright date
krivard Sep 30, 2020
e42e566
Update README.md
RoniRos Sep 30, 2020
e3e80af
Small edits to README
ryantibs Sep 30, 2020
f06e085
Complete national in todo, add dropna=True/False tests
dshemetov Sep 30, 2020
9f0c228
Merge pull request #288 from cmu-delphi/dev/public-release
krivard Sep 30, 2020
8689a8a
Fix too long lines
dshemetov Sep 30, 2020
5f7c28c
A few comment fixes and additions, minor change of "is" to "=="
dshemetov Oct 1, 2020
9a3ffe9
Linting fixes for the tests
dshemetov Oct 1, 2020
2ddd9b1
Update EMR hosp geomapper with new changes
dshemetov Oct 1, 2020
eccc72b
Emr hosp update that should've been in previous commit
dshemetov Oct 1, 2020
9aacec5
Update the jhu indicator with geomapper changes
dshemetov Oct 1, 2020
ef426f5
smoothed_claims_covid19 -> smoothed_covid19_from_claims
mariajahja Oct 1, 2020
0cfe78b
A few minor changes:
dshemetov Oct 1, 2020
cb171dc
Merge branch 'rf_geo_refactor' of github.com:cmu-delphi/covidcast-ind…
dshemetov Oct 1, 2020
8c9a143
Remove unneeded numpy import
dshemetov Oct 1, 2020
3ab1871
Small update to README
dshemetov Oct 1, 2020
64eb227
Merge pull request #285 from cmu-delphi/dev-hosp-claims
krivard Oct 2, 2020
4e6ce63
Modify national level code support:
dshemetov Oct 6, 2020
21cb473
Add archive bypass flag to JHU
dshemetov Oct 6, 2020
78ffb9a
Merge branch 'rf_geo_refactor' of github.com:cmu-delphi/covidcast-ind…
dshemetov Oct 6, 2020
90ba3d6
Important fixes to JHU hand additions:
dshemetov Oct 7, 2020
64fb9b7
Add indicator testing notebook and geocoding utility demo notebook
dshemetov Oct 7, 2020
b830cc8
Remove trailing whitespace
krivard Oct 7, 2020
8867c14
Merge pull request #217 from cmu-delphi/rf_geo_refactor
krivard Oct 7, 2020
119 changes: 119 additions & 0 deletions .github/CONTRIBUTING.md
@@ -0,0 +1,119 @@
# Contributing to COVIDcast indicator pipelines

## Branches

* `main`

The primary/authoritative branch of this repository is called `main`, and contains up-to-date code and supporting libraries. This should be your starting point when creating a new indicator. It is protected so that only reviewed pull requests can be merged in.

* `deploy-*`

Each automated pipeline has a corresponding branch which automatically deploys to a runtime host which runs the pipeline at a designated time each day. New features and bugfixes are merged into this branch using a pull request, so that our CI system can run the lint and test cycles and make sure the package will run correctly on the runtime host. If an indicator does not have a branch named after it starting with `deploy-`, that means the indicator has not yet been automated, and has a designated human keeper who is responsible for making sure the indicator runs each day -- whether that is manually or using a scheduler like cron is the keeper's choice.

* everything else

All other branches are development branches. We don't enforce a naming policy.
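
As a rough illustration of the convention above, the sketch below sorts a branch name into one of the three roles. The function `classify_branch` and its return labels are hypothetical and not part of this repository; only the `main` and `deploy-` naming rules come from the text.

```python
def classify_branch(name: str) -> str:
    """Return the role of a branch under the naming rules described above.

    The labels returned here are illustrative, not an official vocabulary.
    """
    if name == "main":
        return "primary"
    if name.startswith("deploy-"):
        return "deployment"
    # No naming policy is enforced for anything else.
    return "development"


for branch in ("main", "deploy-jhu", "dev-my-feature-branch"):
    print(branch, "->", classify_branch(branch))
```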

## Issues

Issues are the main communication point when it comes to bugfixes, new features, or other possible changes. The repository has several issue templates that help to structure issues.

If you ensure that each issue deals with a single topic (i.e., a single new proposed data source, or a single data quality problem), we'll all be less likely to drop subordinate tasks on the floor. We also recognize that many of the people filing issues in this repository are new to large project management and not used to focusing their thoughts in this way. It's okay, we'll all learn and get better together.

Admins will assign issues to one or more people based on balancing expediency, expertise, and team robustness. It may be faster for one person to fix something, but we can reduce the risk of having too many single points of failure if two people work on it together.

## Project Boards

The Delphi Engineering team uses project boards to structure its weekly calls and track active tasks.

Immediate work is tracked on [Release Planning](https://github.com/cmu-delphi/covidcast-indicators/projects/2).

Long-term work and modeling collaborations are tracked on [Refactoring](https://github.com/cmu-delphi/covidcast-indicators/projects/3).


## General workflow for indicator creation and deployment

So, how does one go about developing a pipeline for a new data source?

**tl;dr**

1. Create your new indicator branch from `main`.
2. Build it using the appropriate template, following the guidelines in the included README.md and REVIEW.md files.
3. Make some stuff!
4. When your stuff works, push your `dev-*` branch to remote for review.
5. Consult with a platform engineer for the remaining production setup needs. They will create a branch called `deploy-*` for your indicator.
6. Initiate a pull request against this new branch.
7. Following [the source documentation template](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md), create public API documentation for the source. You can submit this as a pull request against the delphi-epidata repository.
8. If your peers like the code, the documentation is ready, and Jenkins approves, deploy your changes by merging the PR.
9. An admin will propagate your successful changes to `main`.
10. Rejoice!

### Starting out

The `main` branch should contain up-to-date code and supporting libraries. This should be your starting point when creating a new indicator.

```shell
# Hint
#
git checkout main
git checkout -b dev-my-feature-branch
```

### Creating your indicator

Create a directory for your new indicator by making a copy of `_template_r` or `_template_python`, depending on the programming language you intend to use. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators to see some examples of code layout, organization, and general approach.
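
If you would rather script the copy than do it by hand, something like the following would work. This is only a sketch: the helper `create_indicator` and the indicator name used in the comment are hypothetical, and the only names taken from this repository are the two template directories.

```python
import shutil
from pathlib import Path

# Template directory names come from the repository; the mapping keys are our own.
TEMPLATES = {"python": "_template_python", "r": "_template_r"}


def create_indicator(repo_root: Path, name: str, language: str = "python") -> Path:
    """Copy the template for `language` into a new directory called `name`."""
    if language not in TEMPLATES:
        raise ValueError(f"unknown language: {language!r}")
    target = repo_root / name
    if target.exists():
        # Refuse to overwrite an existing indicator directory.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(repo_root / TEMPLATES[language], target)
    return target


# Example (run from the repository root, with a hypothetical indicator name):
#     create_indicator(Path("."), "my_new_indicator")
```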

- Consult your peers with questions! :handshake:

Once you have something that runs locally and passes tests, set up your remote branch for eventual review and production deployment.

```shell
# Hint
#
git push -u origin dev-my-feature-branch
```

You can then draft public API documentation for people who would fetch this
data from the API. Public API documentation is kept in the delphi-epidata
repository, and there is a [template Markdown
file](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md)
that outlines the features that need to be documented. You can create a pull
request to add a new file to `docs/api/covidcast-signals/` for your source. Our
goal is to have public API documentation for the data at the same time as it
becomes available to the public.

### Setting up for review and deployment

Once you have your branch set up, you should get in touch with a platform engineer to pair up on the remaining production needs. These include:

- Creating the corresponding `deploy-*` branch in the repo.
- Adding the necessary Jenkins scripts for your indicator.
- Preparing the runtime host with any Automation configuration necessities.
- Reviewing the workflow to make sure it meets the general guidelines and will run as expected on the runtime host.

Once all the last-mile configuration is in place, you can create a pull request against the correct `deploy-*` branch to initiate the CI/CD pipeline, which will build, test, and package your indicator for deployment.

If everything looks ok, you've drafted source documentation, platform engineering has validated the last mile, and the pull request is accepted, you can merge the PR. Deployment will start automatically.

Hopefully it'll be a full on :tada:, after that :crossed_fingers:

If not, circle back and try again.

## Production overview

### Running production code

Currently, the production indicators all live and run on the venerable and perennially useful Delphi primary server (also known generically as "the runtime host").

### Delivering an indicator to the production environment

We use a branch-based git workflow coupled with [Jenkins](https://www.jenkins.io/) and [Ansible](https://www.ansible.com/) to build, test, package, and deploy each indicator individually to the runtime host.

- Jenkins dutifully manages the whole process for us by executing several "stages" in the context of a [CI/CD pipeline](https://dzone.com/articles/learn-how-to-setup-a-cicd-pipeline-from-scratch). Each stage does something unique, building on the previous stage. The stages are:
  - Environment - Sets up some environment-specific needs that the other stages depend on.
  - Build - Create the Python venv on the Jenkins host.
  - Test - Run linting and unit tests.
  - Package - Tar and gzip the built environment.
  - Deploy - Trigger an Ansible playbook to place the built package onto the runtime host, place any necessary production configuration, and adjust the runtime environment (if necessary).
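
To make the "each stage builds on the previous stage" idea concrete, here is a toy model of the stage sequence. It is not how Jenkins is configured for this repository: every function and dictionary key below is a hypothetical placeholder, and the real stages run shell commands and Ansible playbooks rather than updating a dictionary.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Stage = Tuple[str, Callable[[Context], None]]


# Placeholder stage bodies: each one reads what the previous stage produced.
def set_up_environment(ctx: Context) -> None:
    ctx["environment"] = "configured"


def build_venv(ctx: Context) -> None:
    ctx["venv"] = f"venv ({ctx['environment']})"


def run_tests(ctx: Context) -> None:
    ctx["tests"] = f"lint and unit tests passed in {ctx['venv']}"


def package(ctx: Context) -> None:
    ctx["tests"]  # raises KeyError if the Test stage has not run
    ctx["package"] = f"tar.gz of {ctx['venv']}"


def deploy(ctx: Context) -> None:
    ctx["deployed"] = ctx["package"]


STAGES: List[Stage] = [
    ("Environment", set_up_environment),
    ("Build", build_venv),
    ("Test", run_tests),
    ("Package", package),
    ("Deploy", deploy),
]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; an exception in any stage stops the run."""
    ctx: Context = {}
    completed: List[str] = []
    for name, action in stages:
        action(ctx)
        completed.append(name)
    return completed


print(run_pipeline(STAGES))
```

Because a failing stage raises, later stages never run, which mirrors why a failed lint or test cycle keeps a package from being deployed.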

There are several additional Jenkins-specific files that will need to be created for each indicator, as well as some configuration additions to the runtime host. It will be important to pair with a platform engineer to prepare the necessary production environment needs, test the workflow, validate on production, and ultimately sign off on a production release.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: true
21 changes: 21 additions & 0 deletions .github/ISSUE_TEMPLATE/data_quality_issue.md
@@ -0,0 +1,21 @@
---
name: Data quality issue
about: Missing data, weird data, broken data
title: ''
labels: 'data quality'
assignees: 'nmdefries'
---

**Actual Behavior:**

<!--Provide a description of the problem and a minimal reproducible example, if relevant. Please include the source and signal names, as well as sample observations, with geo region name, date, and data, demonstrating the problem.-->

When I...

**Expected behavior**

<!--A clear and concise description of what you expected to happen.-->

**Context**

<!--Add any context about the problem here.-->
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/source_signal_request.md
@@ -0,0 +1,19 @@
---
name: 🚀 New Source or Signal
about: Suggest incorporation of a new source or signal
title: ''
labels: 'API addition'
assignees: ''
---

<!--A clear and concise description of the source or signal you would like to add or modify, and how you imagine it working.-->

It would be great if ...

**Data details**

<!--Please link and briefly describe the proposed source. How is the raw data made available? Describe the geographic and time resolution of the raw data. Which fields should be extracted? Please describe proposed processing of the data, especially if this is a variant of an existing source or signal. -->

**Additional context**

<!--Add any other context or screenshots about the feature request here.-->
3 changes: 3 additions & 0 deletions .gitignore
@@ -116,6 +116,9 @@ venv.bak/
# mkdocs documentation
/site

# VSCode settings
*.vscode

# mypy
.mypy_cache/

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2020 The Delphi Group at Carnegie Mellon University

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
98 changes: 20 additions & 78 deletions README.md
@@ -1,93 +1,35 @@
# Covidcast Indicators

Pipeline code and supporting libraries for the **Real-time COVID-19 Indicators** used in the Delphi Group's [**COVIDcast** map](https://covidcast.cmu.edu).
[![License: MIT][mit-image]][mit-url]

## The indicators
In early April 2020, Delphi developed a uniform data schema for [a new Epidata endpoint focused on COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html). Our intent was to provide signals that would track in real-time and in fine geographic granularity all facets of the COVID-19 pandemic, aiding both nowcasting and forecasting. Delphi's long history in tracking and forecasting influenza made us uniquely situated to provide access to data streams not available anywhere else, including medical claims data, electronic medical records, lab test records, massive public surveys, and internet search trends. We also process commonly-used publicly-available data sources, both for user convenience and to provide data versioning for sources that do not track revisions themselves.

Each subdirectory contained here that is named after an indicator has specific documentation. Please review as necessary!
Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call **COVID-19 indicators**---in a common format for upload to the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html).

## General workflow for indicators creation and deployment
For client access to the API, along with a variety of other utilities, see our [R](https://cmu-delphi.github.io/covidcast/covidcastR/) and [Python](https://cmu-delphi.github.io/covidcast/covidcast-py/html/) packages.

**tl;dr**
For interactive visualizations (of a subset of the available indicators), see our [COVIDcast map](https://covidcast.cmu.edu).

1. Create your new indicator branch from `main`.
2. Build it using the appropriate template, following the guidelines in the included README.md and REVIEW.md files.
3. Make some stuff!
4. When your stuff works, push your `dev-*` branch to remote for review.
5. Consult with a platform engineer for the remaining production setup needs. They will create a branch called `deploy-*` for your indicator.
6. Initiate a pull request against this new branch.
7. Following [the source documentation template](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md), create public API documentation for the source. You can submit this as a pull request against the delphi-epidata repository.
8. If your peers like the code, the documentation is ready, and Jenkins approves, deploy your changes by merging the PR.
9. Rejoice!
## Organization

### Starting out
Utilities:
* `_delphi_utils_python` - common behaviors
* `_template_python` & `_template_r` - starting points for new data sources
* `ansible` & `jenkins` - automated testing and deployment
* `sir_complainsalot` - a Slack bot to check for missing data

The `main` branch should contain up-to-date code and supporting libraries. This should be your starting point when creating a new indicator.
Indicator pipelines: all remaining directories.

```shell
# Hint
#
git checkout main
git checkout -b dev-my-feature-branch
```
Each indicator pipeline includes its own documentation.

### Creating your indicator
* Consult README.md for directions to install, lint, test, and run the pipeline for that indicator.
* Consult REVIEW.md for the checklist to use for code reviews.
* Consult DETAILS.md (if present) for implementation details, including handling of corner cases.

Create a directory for your new indicator by making a copy of `_template_r` or `_template_python` depending on the programming language you intend to use. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators see some examples of code layout, organization, and general approach.

- Consult your peers with questions! :handshake:
## License

Once you have something that runs locally and passes tests you set up your remote branch eventual review and production deployment.
This repository is released under the **MIT License**.

```shell
# Hint
#
git push -u origin dev-my-feature-branch
```

You can then set draft public API documentation for people who would fetch this
data from the API. Public API documentation is kept in the delphi-epidata
repository, and there is a [template Markdown
file](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md)
that outlines the features that need to be documented. You can create a pull
request to add a new file to `docs/api/covidcast-signals/` for your source. Our
goal is to have public API documentation for the data at the same time as it
becomes available to the public.

### Setting up for review and deployment

Once you have your branch set up you should get in touch with a platform engineer to pair up on the remaining production needs. These include:

- Creating the corresponding `deploy-*` branch in the repo.
- Adding the necessary Jenkins scripts for your indicator.
- Preparing the runtime host with any Automation configuration necessities.
- Reviewing the workflow to make sure it meets the general guidelines and will run as expected on the runtime host.

Once all the last mile configuration is in place you can create a pull request against the correct `deploy-*` branch to initiate the CI/CD pipeline which will build, test, and package your indicator for deployment.

If everything looks ok, you've drafted source documentation, platform engineering has validated the last mile, and the pull request is accepted, you can merge the PR. Deployment will start automatically.

Hopefully it'll be a full on :tada:, after that :crossed_fingers:

If not, circle back and try again.

## Production overview

### Running production code

Currently, the production indicators all live and run on the venerable and perennially useful Delphi primary server (also known generically as "the runtime host").

- This is a virtual machine running RHEL 7.5 and living in CMU's Campus Cloud vSphere-based infrastructure environemnt.

### Delivering an indicator to the production environment

We use a branch-based git workflow coupled with [Jenkins](https://www.jenkins.io/) and [Ansible](https://www.ansible.com/) to build, test, package, and deploy each indicator individually to the runtime host.

- Jenkins dutifully manages the whole process for us by executing several "stages" in the context of a [CI/CD pipeline](https://dzone.com/articles/learn-how-to-setup-a-cicd-pipeline-from-scratch). Each stage does something unique, building on the previous stage. The stages are:
- Environment - Sets up some environment-specific needs that the other stages depend on.
- Build - Create the Python venv on the Jenkins host.
- Test - Run linting and unit tests.
- Package - Tar and gzip the built environment.
- Deploy - Trigger an Ansible playbook to place the built package onto the runtime host, place any necessary production configuration, and adjust the runtime envirnemnt (if necessary).

There are several additional Jenkins-specific files that will need to be created for each indicator, as well as some configuration additions to the runtime host. It will be important to pair with a platform engineer to prepare the necessary production environment needs, test the workflow, validate on production, and ultimately sign off on a production release.
[mit-image]: https://img.shields.io/badge/License-MIT-yellow.svg
[mit-url]: https://opensource.org/licenses/MIT
Binary file not shown.