Commit fbd87a2

Merge pull request #401 from cmu-delphi/dv-georefactor
Refactor doctor visits to use geo utils package
2 parents 47016ab + bf34663 commit fbd87a2

1,265 files changed (+1,351,148 / -35,230 lines)

.github/CONTRIBUTING.md

+119
@@ -0,0 +1,119 @@
# Contributing to COVIDcast indicator pipelines

## Branches

* `main`

  The primary/authoritative branch of this repository is called `main`, and contains up-to-date code and supporting libraries. This should be your starting point when creating a new indicator. It is protected so that only reviewed pull requests can be merged in.

* `deploy-*`

  Each automated pipeline has a corresponding branch that automatically deploys to a runtime host, which runs the pipeline at a designated time each day. New features and bugfixes are merged into this branch using a pull request, so that our CI system can run the lint and test cycles and make sure the package will run correctly on the runtime host. If an indicator does not have a branch named after it starting with `deploy-`, that means the indicator has not yet been automated, and has a designated human keeper who is responsible for making sure the indicator runs each day -- whether that is manually or using a scheduler like cron is the keeper's choice.

* everything else

  All other branches are development branches. We don't enforce a naming policy.
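
As a rough illustration of the `deploy-*` convention described above, the following Python sketch checks which indicators have a matching deployment branch and are therefore automated. The function name and the branch and indicator names are hypothetical, chosen only for this example; it is not a tool that exists in the repository.

```python
DEPLOY_PREFIX = "deploy-"


def automation_status(indicators, branches):
    """Map each indicator name to True if a `deploy-<name>` branch exists."""
    deployed = {
        branch[len(DEPLOY_PREFIX):]
        for branch in branches
        if branch.startswith(DEPLOY_PREFIX)
    }
    return {name: name in deployed for name in indicators}


# Hypothetical branch and indicator names, for illustration only.
branches = ["main", "deploy-indicator_a", "dev-my-feature-branch"]
status = automation_status(["indicator_a", "indicator_b"], branches)
print(status)
```

An indicator that maps to `False` here would, per the text above, still be run by its designated human keeper.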

## Issues

Issues are the main communication point when it comes to bugfixes, new features, or other possible changes. The repository has several issue templates that help to structure issues.

If you ensure that each issue deals with a single topic (i.e., a single new proposed data source, or a single data quality problem), we'll all be less likely to drop subordinate tasks on the floor. We also recognize that many of the people filing issues in this repository are new to large project management and not used to focusing their thoughts in this way. It's okay, we'll all learn and get better together.

Admins will assign issues to one or more people based on balancing expediency, expertise, and team robustness. It may be faster for one person to fix something, but we can reduce the risk of having too many single points of failure if two people work on it together.

## Project Boards

The Delphi Engineering team uses project boards to structure its weekly calls and track active tasks.

Immediate work is tracked on [Release Planning](https://github.com/cmu-delphi/covidcast-indicators/projects/2).

Long-term work and modeling collaborations are tracked on [Refactoring](https://github.com/cmu-delphi/covidcast-indicators/projects/3).


## General workflow for indicator creation and deployment

So, how does one go about developing a pipeline for a new data source?

**tl;dr**

1. Create your new indicator branch from `main`.
2. Build it using the appropriate template, following the guidelines in the included README.md and REVIEW.md files.
3. Make some stuff!
4. When your stuff works, push your `dev-*` branch to remote for review.
5. Consult with a platform engineer for the remaining production setup needs. They will create a branch called `deploy-*` for your indicator.
6. Initiate a pull request against this new branch.
7. Following [the source documentation template](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md), create public API documentation for the source. You can submit this as a pull request against the delphi-epidata repository.
8. If your peers like the code, the documentation is ready, and Jenkins approves, deploy your changes by merging the PR.
9. An admin will propagate your successful changes to `main`.
10. Rejoice!

### Starting out

The `main` branch should contain up-to-date code and supporting libraries. This should be your starting point when creating a new indicator.

```shell
# Hint
#
git checkout main
git checkout -b dev-my-feature-branch
```

### Creating your indicator

Create a directory for your new indicator by making a copy of `_template_r` or `_template_python`, depending on the programming language you intend to use. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators to see some examples of code layout, organization, and general approach.
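
The copy step above can be sketched in Python as follows. This is only an illustration under stated assumptions: `create_indicator` and the indicator name `my_new_indicator` are hypothetical, and the demonstration runs against a throwaway directory standing in for a checkout of the repository. Copying the template directory by hand works just as well.

```python
import shutil
import tempfile
from pathlib import Path


def create_indicator(repo_root, name, language="python"):
    """Copy `_template_<language>` to a new directory called `name`."""
    if language not in ("python", "r"):
        raise ValueError("language must be 'python' or 'r'")
    repo_root = Path(repo_root)
    template = repo_root / f"_template_{language}"
    target = repo_root / name
    # copytree raises FileExistsError if the target directory already exists,
    # so an existing indicator is never overwritten.
    shutil.copytree(template, target)
    return target


# Demonstration against a temporary directory standing in for a checkout.
with tempfile.TemporaryDirectory() as tmp:
    template_dir = Path(tmp) / "_template_python"
    template_dir.mkdir()
    (template_dir / "README.md").write_text("# Template\n")
    (template_dir / "REVIEW.md").write_text("# Review checklist\n")

    new_dir = create_indicator(tmp, "my_new_indicator")
    copied = sorted(p.name for p in new_dir.iterdir())
    print(copied)
```

Refusing to overwrite an existing directory is a deliberate choice in this sketch, since an accidental copy over a working indicator would be hard to notice.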

- Consult your peers with questions! :handshake:

Once you have something that runs locally and passes tests, set up your remote branch for eventual review and production deployment.

```shell
# Hint
#
git push -u origin dev-my-feature-branch
```

You can then draft public API documentation for people who would fetch this
data from the API. Public API documentation is kept in the delphi-epidata
repository, and there is a [template Markdown
file](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md)
that outlines the features that need to be documented. You can create a pull
request to add a new file to `docs/api/covidcast-signals/` for your source. Our
goal is to have public API documentation for the data at the same time as it
becomes available to the public.

### Setting up for review and deployment

Once you have your branch set up, you should get in touch with a platform engineer to pair up on the remaining production needs. These include:

- Creating the corresponding `deploy-*` branch in the repo.
- Adding the necessary Jenkins scripts for your indicator.
- Preparing the runtime host with any necessary automation configuration.
- Reviewing the workflow to make sure it meets the general guidelines and will run as expected on the runtime host.

Once all the last-mile configuration is in place, you can create a pull request against the correct `deploy-*` branch to initiate the CI/CD pipeline, which will build, test, and package your indicator for deployment.

If everything looks OK, you've drafted source documentation, platform engineering has validated the last mile, and the pull request is accepted, you can merge the PR. Deployment will start automatically.

Hopefully it'll be a full-on :tada: after that :crossed_fingers:

If not, circle back and try again.
102+
## Production overview
103+
104+
### Running production code
105+
106+
Currently, the production indicators all live and run on the venerable and perennially useful Delphi primary server (also known generically as "the runtime host").
107+
108+
### Delivering an indicator to the production environment
109+
110+
We use a branch-based git workflow coupled with [Jenkins](https://www.jenkins.io/) and [Ansible](https://www.ansible.com/) to build, test, package, and deploy each indicator individually to the runtime host.
111+
112+
- Jenkins dutifully manages the whole process for us by executing several "stages" in the context of a [CI/CD pipeline](https://dzone.com/articles/learn-how-to-setup-a-cicd-pipeline-from-scratch). Each stage does something unique, building on the previous stage. The stages are:
113+
- Environment - Sets up some environment-specific needs that the other stages depend on.
114+
- Build - Create the Python venv on the Jenkins host.
115+
- Test - Run linting and unit tests.
116+
- Package - Tar and gzip the built environment.
117+
- Deploy - Trigger an Ansible playbook to place the built package onto the runtime host, place any necessary production configuration, and adjust the runtime envirnemnt (if necessary).
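
To make the Package stage concrete, here is a minimal Python sketch of "tar and gzip the built environment". It is an assumption-laden illustration: the directory name `env`, the archive name `indicator.tar.gz`, and the function `package_environment` are all hypothetical, and the per-indicator package scripts that Jenkins actually runs are not shown in this excerpt.

```python
import tarfile
import tempfile
from pathlib import Path


def package_environment(env_dir, archive_path):
    """Write `env_dir` into a gzip-compressed tar archive at `archive_path`."""
    env_dir = Path(env_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the directory name, not the
        # absolute location on the build host.
        archive.add(env_dir, arcname=env_dir.name)
    return Path(archive_path)


# Demonstration with a placeholder "built environment" in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "env"
    (env_dir / "bin").mkdir(parents=True)
    (env_dir / "bin" / "activate").write_text("# placeholder\n")

    archive_path = package_environment(env_dir, Path(tmp) / "indicator.tar.gz")
    with tarfile.open(archive_path, "r:gz") as archive:
        members = sorted(archive.getnames())
    print(members)
```

Storing relative paths keeps the archive independent of where it was built, which matters when it is unpacked on a different host.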

There are several additional Jenkins-specific files that will need to be created for each indicator, as well as some configuration additions to the runtime host. It will be important to pair with a platform engineer to prepare the production environment, test the workflow, validate on production, and ultimately sign off on a production release.

.github/ISSUE_TEMPLATE/config.yml

+1
@@ -0,0 +1 @@
blank_issues_enabled: true
@@ -0,0 +1,21 @@
---
name: Data quality issue
about: Missing data, weird data, broken data
title: ''
labels: 'data quality'
assignees: 'nmdefries'
---

**Actual Behavior:**

<!--Provide a description of the problem and a minimal reproducible example, if relevant. Please include the source and signal names, as well as sample observations, with geo region name, date, and data, demonstrating the problem.-->

When I...

**Expected behavior**

<!--A clear and concise description of what you expected to happen.-->

**Context**

<!--Add any context about the problem here.-->
+30
@@ -0,0 +1,30 @@
---
name: Feature release
about: Begin the finishing work for features ready to be included in a release
title: 'Release NEW_THING'
labels: 'release'
assignees: 'benjaminysmith'
---

- [Link to issue]()
- [Link to PR]()
- Proposed release version: <!-- eg 1.12 -->

<!-- Additional information about the feature: -->


<!-- relevant for most work -->

- [ ] API [documentation](https://github.com/cmu-delphi/delphi-epidata/tree/main/docs/api) and/or [changelog](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast_changelog.md)
- [ ] API mailing list notification

<!-- relevant for new signals -->

- [ ] Statistical review (usually [correlations](https://github.com/cmu-delphi/covidcast/tree/main/docs/R-notebooks))
- [ ] Signal / source name review (usually [Roni](https://docs.google.com/document/d/10hGd4Evce4lJ4VkWaQEKFQxvmw2P4xyYGtIAWF52Sf8/edit?usp=sharing))

<!-- relevant for new map signals -->

- [ ] Visual review
- [ ] [Signal description pop-up text](https://docs.google.com/document/d/1kDqRg8EaI4WQXMaUUbbCGPlsUqEql8kgXCNt6AvMA9I/edit?usp=sharing) review
- [ ] [Map release notes](https://docs.google.com/document/d/1BpxGgIma_Lkd2kxtwEo2DBdHQ3zk6dHRz-leUIRlOIA/edit?usp=sharing)
@@ -0,0 +1,19 @@
---
name: 🚀 New Source or Signal
about: Suggest incorporation of a new source or signal
title: ''
labels: 'API addition'
assignees: ''
---

<!--A clear and concise description of the source or signal you would like to add or modify, and how you imagine it working.-->

It would be great if ...

**Data details**

<!--Please link and briefly describe the proposed source. How is the raw data made available? Describe the geographic and time resolution of the raw data. Which fields should be extracted? Please describe proposed processing of the data, especially if this is a variant of an existing source or signal. -->

**Additional context**

<!--Add any other context or screenshots about the feature request here.-->

.gitignore

+8
@@ -116,5 +116,13 @@ venv.bak/
 # mkdocs documentation
 /site
 
+# VSCode settings
+*.vscode
+
 # mypy
 .mypy_cache/
+
+# Ansible
+.retry
+.indicators-ansible-vault-pass
+indicators-ansible-vault-pass

Jenkinsfile

+83
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
#!groovy
2+
3+
// import shared library: https://github.com/cmu-delphi/jenkins-shared-library
4+
@Library('jenkins-shared-library') _
5+
6+
pipeline {
7+
8+
agent any
9+
10+
stages {
11+
12+
stage ("Environment") {
13+
when {
14+
anyOf {
15+
branch "deploy-*";
16+
changeRequest target: "deploy-*", comparator: "GLOB"
17+
}
18+
}
19+
steps {
20+
script {
21+
// Get the indicator name from the pipeline env.
22+
if ( env.CHANGE_TARGET ) {
23+
INDICATOR = env.CHANGE_TARGET.replaceAll("deploy-", "")
24+
}
25+
else if ( env.BRANCH_NAME ) {
26+
INDICATOR = env.BRANCH_NAME.replaceAll("deploy-", "")
27+
}
28+
else {
29+
INDICATOR = ""
30+
}
31+
}
32+
}
33+
}
34+
35+
stage('Build') {
36+
when {
37+
changeRequest target: "deploy-*", comparator: "GLOB"
38+
}
39+
steps {
40+
sh "jenkins/${INDICATOR}-jenkins-build.sh"
41+
}
42+
}
43+
44+
stage('Test') {
45+
when {
46+
changeRequest target: "deploy-*", comparator: "GLOB"
47+
}
48+
steps {
49+
sh "jenkins/${INDICATOR}-jenkins-test.sh"
50+
}
51+
}
52+
53+
stage('Package') {
54+
when {
55+
changeRequest target: "deploy-*", comparator: "GLOB"
56+
}
57+
steps {
58+
sh "jenkins/${INDICATOR}-jenkins-package.sh"
59+
}
60+
}
61+
62+
stage('Deploy') {
63+
when {
64+
branch "deploy-*"
65+
}
66+
steps {
67+
sh "jenkins/${INDICATOR}-jenkins-deploy.sh"
68+
}
69+
}
70+
}
71+
72+
post {
73+
always {
74+
script {
75+
/*
76+
Use slackNotifier.groovy from shared library and provide current
77+
build result as parameter.
78+
*/
79+
slackNotifier(currentBuild.currentResult)
80+
}
81+
}
82+
}
83+
}
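
The stage conditions in this Jenkinsfile can be easier to follow as plain logic. The Python sketch below models them under simplifying assumptions: each `when` condition is treated as a glob match on either the pull request target (`CHANGE_TARGET`) or the branch being built (`BRANCH_NAME`), and a pull request build is assumed to report a branch name such as `PR-401` rather than the target branch. The indicator name `my_indicator` and both function names are hypothetical.

```python
from fnmatch import fnmatchcase

DEPLOY_GLOB = "deploy-*"


def indicator_name(change_target=None, branch_name=None):
    """Mirror the Environment stage: prefer the PR target, then the branch."""
    if change_target:
        return change_target.replace("deploy-", "")
    if branch_name:
        return branch_name.replace("deploy-", "")
    return ""


def stages_to_run(change_target=None, branch_name=None):
    """Return the stage names whose `when` conditions would pass."""
    is_deploy_pr = bool(change_target) and fnmatchcase(change_target, DEPLOY_GLOB)
    on_deploy_branch = bool(branch_name) and fnmatchcase(branch_name, DEPLOY_GLOB)

    stages = []
    if is_deploy_pr or on_deploy_branch:
        stages.append("Environment")
    if is_deploy_pr:
        # Build, Test and Package only run for pull requests into deploy-*.
        stages += ["Build", "Test", "Package"]
    if on_deploy_branch:
        # Deploy only runs once the change is on the deploy-* branch itself.
        stages.append("Deploy")
    return stages


# A pull request into a hypothetical deploy branch, then the merged branch.
pr_stages = stages_to_run(change_target="deploy-my_indicator", branch_name="PR-401")
branch_stages = stages_to_run(branch_name="deploy-my_indicator")
print(pr_stages, branch_stages)
```

Read this way, a pull request exercises everything except deployment, and merging it triggers only the Environment and Deploy stages.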

LICENSE

+21
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2020 The Delphi Group at Carnegie Mellon University

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+35
@@ -0,0 +1,35 @@
# Covidcast Indicators

[![License: MIT][mit-image]][mit-url]

In early April 2020, Delphi developed a uniform data schema for [a new Epidata endpoint focused on COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html). Our intent was to provide signals that would track in real time and in fine geographic granularity all facets of the COVID-19 pandemic, aiding both nowcasting and forecasting. Delphi's long history in tracking and forecasting influenza made us uniquely situated to provide access to data streams not available anywhere else, including medical claims data, electronic medical records, lab test records, massive public surveys, and internet search trends. We also process commonly used, publicly available data sources, both for user convenience and to provide data versioning for sources that do not track revisions themselves.

Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call **COVID-19 indicators**---in a common format for upload to the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html).

For client access to the API, along with a variety of other utilities, see our [R](https://cmu-delphi.github.io/covidcast/covidcastR/) and [Python](https://cmu-delphi.github.io/covidcast/covidcast-py/html/) packages.

For interactive visualizations (of a subset of the available indicators), see our [COVIDcast map](https://covidcast.cmu.edu).

## Organization

Utilities:
* `_delphi_utils_python` - common behaviors
* `_template_python` & `_template_r` - starting points for new data sources
* `ansible` & `jenkins` - automated testing and deployment
* `sir_complainsalot` - a Slack bot to check for missing data

Indicator pipelines: all remaining directories.

Each indicator pipeline includes its own documentation.

* Consult README.md for directions to install, lint, test, and run the pipeline for that indicator.
* Consult REVIEW.md for the checklist to use for code reviews.
* Consult DETAILS.md (if present) for implementation details, including handling of corner cases.


## License

This repository is released under the **MIT License**.

[mit-image]: https://img.shields.io/badge/License-MIT-yellow.svg
[mit-url]: https://opensource.org/licenses/MIT
