Add 7-day average of each signal to Safegraph indicator #309

Merged
merged 170 commits into from
Oct 20, 2020
170 commits
b566d4c
try until pull the new data successfully
Aug 24, 2020
8db2799
used tenacity
Aug 26, 2020
c9b16a5
Add .gitignore
vishakha1812 Aug 26, 2020
f969d96
Update params.json.template
vishakha1812 Aug 26, 2020
56db650
Update run.py
vishakha1812 Aug 26, 2020
775efe1
Update setup.py
vishakha1812 Aug 26, 2020
df20f05
Update test_update_sensor.py
vishakha1812 Aug 26, 2020
9295eac
Update update_sensor.py
vishakha1812 Aug 26, 2020
786cef4
Add constants.py
vishakha1812 Aug 26, 2020
3391ff3
Add static files
vishakha1812 Aug 26, 2020
40091b7
Minor changes
vishakha1812 Aug 26, 2020
db54eb8
removed wildcard import error
vishakha1812 Aug 27, 2020
2a9a68f
update params.json.template
vishakha1812 Aug 27, 2020
319f038
Add wip_signal key in ansible
vishakha1812 Aug 27, 2020
17fb5ae
Add missing receiving directory
vishakha1812 Aug 27, 2020
8a40c1c
Merge pull request #239 from cmu-delphi/main
krivard Aug 27, 2020
dd4085b
Merge branch 'deploy-google_health' into fix_ght_pulling
vishakha1812 Aug 27, 2020
65490cc
Add requirement tenacity
Aug 28, 2020
d4b6f69
fixed invalid character
Aug 28, 2020
a24dc4d
add code
Aug 18, 2020
e833515
fixed an error in unit tests
Aug 18, 2020
5f7ffa1
removed unused files
Aug 18, 2020
870fd20
changed format of intermediate output dataframes
Aug 18, 2020
b9a788f
uncommented code for pulling raw data
jingjtang Aug 18, 2020
ad1d4d5
Added code for using geomap utils
Aug 28, 2020
130a9a7
Minor changes
vishakha1812 Sep 1, 2020
b450f42
Minor change in update_sensor.py
vishakha1812 Sep 2, 2020
a6e07b8
Update test_update_sensor.py
vishakha1812 Sep 2, 2020
008bd9b
Merge pull request #250 from cmu-delphi/wip_emr
krivard Sep 3, 2020
3553e00
Merge pull request #247 from cmu-delphi/fix_ght_pulling
krivard Sep 3, 2020
a7a8e6a
Merge pull request #263 from cmu-delphi/deploy-google_health
krivard Sep 3, 2020
55a41fd
Merge pull request #266 from cmu-delphi/deploy-safegraph
krivard Sep 8, 2020
96e9406
diff uploads
vishakha1812 Sep 10, 2020
9b48cce
[sir-complainsalot] Add grace period and dry run
krivard Sep 9, 2020
707ce49
remove return statement
vishakha1812 Sep 10, 2020
dd20817
remove old static file that is no longer used
huisaddison Sep 11, 2020
6fda1b2
hotfix for nyc boroughs
huisaddison Sep 11, 2020
d30ab47
Skip linting because of an unknown issue with (seemingly) Jenkins
korlaxxalrok Sep 11, 2020
5f3d8f0
Merge pull request #274 from cmu-delphi/test-and-delete/jhu-nyc-support
krivard Sep 11, 2020
8bdadde
Merge pull request #276 from cmu-delphi/deploy-jhu
krivard Sep 11, 2020
3a8e719
add code
Sep 16, 2020
7c08761
Merge branch 'run-quidel' of https://github.com/cmu-delphi/covidcast-…
Sep 16, 2020
2c7a6e4
update unit tests
Sep 16, 2020
330cb66
switched to shallow copy
Sep 16, 2020
41ca43b
update documentation
Sep 16, 2020
7362d48
Add max_borrow_obs
Sep 16, 2020
10e1546
update documentation
Sep 16, 2020
0c3366f
Copied .github directory from www-covidcast
nmdefries Sep 16, 2020
4324431
Removed pull request template and contributing info. Added data quali…
nmdefries Sep 16, 2020
e81fe16
Recovered county level & Commented out unreleased signals
Sep 16, 2020
2740c26
Update params.json.template
vishakha1812 Sep 16, 2020
a74b944
Update params.json.template
vishakha1812 Sep 16, 2020
de16118
Update test_update_sensor.py
vishakha1812 Sep 16, 2020
a42155a
Update the title for DETAILS
jingjtang Sep 16, 2020
7627876
Mocked S3 for update_sensor
eujing Sep 16, 2020
8e6c337
Removed bug_report template. Updated other templates to have more detail
nmdefries Sep 17, 2020
98652c6
Removed Severity section from data_quality template
nmdefries Sep 17, 2020
3f64649
Merge pull request #269 from cmu-delphi/diff_emr
krivard Sep 18, 2020
0c6b4ad
Code and documentation for producing geo mapping files
krivard Sep 18, 2020
b08ed3f
Static geo mapping files
krivard Sep 18, 2020
0c6b422
Updated geo mapping/aggregation utility
krivard Sep 18, 2020
5394dcb
Merge pull request #277 from cmu-delphi/run-quidel
krivard Sep 18, 2020
e81102f
Merge pull request #279 from cmu-delphi/add-issue-templates
krivard Sep 18, 2020
528ef2e
Remove 8XXXX and 9XXYY, YY > 56 JHU FIPS codes, updated Puerto Rico
dshemetov Sep 18, 2020
b564e9f
Code review updates:
dshemetov Sep 22, 2020
149e16d
Update _delphi_utils_python/data_proc/geomap/geo_data_proc.py
dshemetov Sep 22, 2020
eafdf13
Update _delphi_utils_python/delphi_utils/geomap.py
dshemetov Sep 22, 2020
e16d51d
Replace assert with ValueError exception
dshemetov Sep 22, 2020
fde9525
Add doc string for megacounty code
dshemetov Sep 22, 2020
baf0c14
Link todo list to github issues
dshemetov Sep 22, 2020
9c8e1b3
Taking ownership in the README
dshemetov Sep 22, 2020
2c3de60
Add crosswalk sanity checks to test_geomap
dshemetov Sep 23, 2020
806db8a
Merge branch 'rf_geo_refactor' of https://github.com/cmu-delphi/covid…
dshemetov Sep 23, 2020
c26743e
Uncomment work functions
dshemetov Sep 24, 2020
5fbff78
Code review updates
dshemetov Sep 24, 2020
655f21d
String conversion check coverage
dshemetov Sep 24, 2020
7d34fb9
Two final features
dshemetov Sep 25, 2020
b88e89f
Part of previous commit
dshemetov Sep 25, 2020
9a0eb10
Final set of tests:
dshemetov Sep 25, 2020
0025cb7
template files
mariajahja Sep 28, 2020
b65d1ff
claims based hosp indicator package
mariajahja Sep 28, 2020
ad4a48c
unit tests
mariajahja Sep 28, 2020
cb2e025
review, addl doc, pylint, and minor fixes
mariajahja Sep 28, 2020
f4af917
code review fixes, change signal name
mariajahja Sep 28, 2020
4a75ba3
Release under MIT license
krivard Sep 29, 2020
9512675
Split off contribution guide; add context to README
krivard Sep 29, 2020
549840e
Add sections on branches, issues, and project boards
krivard Sep 29, 2020
58806ad
Add license information
krivard Sep 29, 2020
669d01b
Moved contribution guide to where github expects it
krivard Sep 29, 2020
98b1992
Update LICENSE with correct copyright date
krivard Sep 30, 2020
e42e566
Update README.md
RoniRos Sep 30, 2020
e3e80af
Small edits to README
ryantibs Sep 30, 2020
f06e085
Complete national in todo, add dropna=True/False tests
dshemetov Sep 30, 2020
9f0c228
Merge pull request #288 from cmu-delphi/dev/public-release
krivard Sep 30, 2020
8689a8a
Fix too long lines
dshemetov Sep 30, 2020
5f7c28c
A few comment fixes and additions, minor change of "is" to "=="
dshemetov Oct 1, 2020
9a3ffe9
Linting fixes for the tests
dshemetov Oct 1, 2020
2ddd9b1
Update EMR hosp geomapper with new changes
dshemetov Oct 1, 2020
eccc72b
Emr hosp update that should've been in previous commit
dshemetov Oct 1, 2020
9aacec5
Update the jhu indicator with geomapper changes
dshemetov Oct 1, 2020
ef426f5
smoothed_claims_covid19 -> smoothed_covid19_from_claims
mariajahja Oct 1, 2020
0cfe78b
A few minor changes:
dshemetov Oct 1, 2020
cb171dc
Merge branch 'rf_geo_refactor' of github.com:cmu-delphi/covidcast-ind…
dshemetov Oct 1, 2020
8c9a143
Remove unneeded numpy import
dshemetov Oct 1, 2020
3ab1871
Small update to README
dshemetov Oct 1, 2020
64eb227
Merge pull request #285 from cmu-delphi/dev-hosp-claims
krivard Oct 2, 2020
4e6ce63
Modify national level code support:
dshemetov Oct 6, 2020
21cb473
Add archive bypass flag to JHU
dshemetov Oct 6, 2020
78ffb9a
Merge branch 'rf_geo_refactor' of github.com:cmu-delphi/covidcast-ind…
dshemetov Oct 6, 2020
4b8ef09
refactor archive tests to use common data
sgsmob Oct 6, 2020
530aba3
ability to run S3 archive as module
sgsmob Oct 6, 2020
4eba4e1
lint the test_archive file
sgsmob Oct 6, 2020
90ba3d6
Important fixes to JHU hand additions:
dshemetov Oct 7, 2020
7b31d71
add test for GitArchiveDiffer.run()
sgsmob Oct 7, 2020
0e0bfaf
increase comment coverage in test_run tests
sgsmob Oct 7, 2020
62b0376
increase comment coverage in test_run tests
sgsmob Oct 7, 2020
64fb9b7
Add indicator testing notebook and geocoding utility demo notebook
dshemetov Oct 7, 2020
b830cc8
Remove trailing whitespace
krivard Oct 7, 2020
8867c14
Merge pull request #217 from cmu-delphi/rf_geo_refactor
krivard Oct 7, 2020
bcda25f
Reduce df memory usage, and vectorize more
eujing Oct 7, 2020
650c12e
Merge pull request #297 from cmu-delphi/main
krivard Oct 7, 2020
4a58a16
Merge branch 'main' of github.com:cmu-delphi/covidcast-indicators int…
sgsmob Oct 8, 2020
547fab8
Merge branch 'main' of github.com:cmu-delphi/covidcast-indicators int…
sgsmob Oct 8, 2020
8d77066
Fix docs typo in _delphi_utils_python/delphi_utils/archive.py
krivard Oct 9, 2020
8678fba
Merge pull request #295 from sgsmob/main
krivard Oct 9, 2020
c687428
Merge branch 'main' of github.com:cmu-delphi/covidcast-indicators int…
sgsmob Oct 9, 2020
309fb79
Fix JHU bug that renamed fips to county in receiving
dshemetov Oct 9, 2020
93b5ba2
Merge pull request #304 from cmu-delphi/jhu_fips_county_fix
krivard Oct 9, 2020
d357d01
Merge pull request #305 from cmu-delphi/deploy-jhu
krivard Oct 9, 2020
81be4f4
update code for geomapping using utils
Oct 12, 2020
eec5f50
fix typo in documentation
huisaddison Oct 12, 2020
0c502f7
fix typo in documentation
huisaddison Oct 12, 2020
37edbf5
fix typo in documentation
huisaddison Oct 12, 2020
3a0f9a9
fix typo in documentation
huisaddison Oct 12, 2020
a2d5768
fix typo in documentation
huisaddison Oct 12, 2020
ed4c306
Merge branch 'main' of github.com:cmu-delphi/covidcast-indicators int…
sgsmob Oct 12, 2020
6a616c4
refactor safegraph.process to pave the way for multifile processing
sgsmob Oct 12, 2020
45307ad
tests for finding the file names in the past week
sgsmob Oct 12, 2020
efdf3fd
testing process_window
sgsmob Oct 13, 2020
7ed90e1
comments and formatting for pylint compliance
sgsmob Oct 13, 2020
d0151e8
docstring updates
sgsmob Oct 13, 2020
8918ada
lint compliance in test cases
sgsmob Oct 13, 2020
6b24185
move location of VALID_GEO_RESOLUTIONS
sgsmob Oct 13, 2020
cf6f4c1
Merge pull request #287 from cmu-delphi/sir-dryrun
krivard Oct 13, 2020
10d3711
file existence checking in process
sgsmob Oct 13, 2020
8604ec1
Merge branch 'main' of github.com:cmu-delphi/covidcast-indicators int…
sgsmob Oct 13, 2020
8c665c6
refactor CSV name
sgsmob Oct 13, 2020
e0ed614
add test for process
sgsmob Oct 14, 2020
8db87c5
fix line too long
sgsmob Oct 14, 2020
e6502e5
remove extraneous prints
sgsmob Oct 15, 2020
922432b
documentation on process_file wrapper
sgsmob Oct 15, 2020
4563d1f
Merge branch 'main' into safegraph_patterns
Oct 16, 2020
1633036
uncomment code for using geo utils
Oct 16, 2020
6358975
fixed errors in geo mapping functions
Oct 16, 2020
0909ca7
fixed errors in geo mapping function and updated the unit tests
Oct 16, 2020
c4808fa
Merge branch 'safegraph_patterns' of https://github.com/cmu-delphi/co…
Oct 16, 2020
424ceb4
deleted extra keyword argument in process
Oct 16, 2020
3495bee
added a dry-run mode
Oct 16, 2020
7b7934b
updated unit tests
Oct 16, 2020
986e585
fixed the dir to sample data
Oct 16, 2020
b0ae5bb
added static folder and params.json for unit tests
Oct 16, 2020
4ce4630
fix whitespacing for linter
huisaddison Oct 17, 2020
3d76de4
remove unused imports
huisaddison Oct 17, 2020
261503a
Merge pull request #225 from cmu-delphi/safegraph_patterns
krivard Oct 19, 2020
7c8e702
Merge branch 'main' of github.com:cmu-delphi/covidcast-indicators int…
sgsmob Oct 19, 2020
90a22c4
don't overwrite files
sgsmob Oct 19, 2020
5117b9d
remove unused import
sgsmob Oct 19, 2020
509f4d7
substring testing with 'in'
sgsmob Oct 19, 2020
9d6394a
update tests to process to include wip and 7d_avg signals
sgsmob Oct 19, 2020
470e95d
added wip signals to params file
sgsmob Oct 19, 2020
119 changes: 119 additions & 0 deletions .github/CONTRIBUTING.md
@@ -0,0 +1,119 @@
# Contributing to COVIDcast indicator pipelines

## Branches

* `main`

The primary/authoritative branch of this repository is called `main`, and contains up-to-date code and supporting libraries. This should be your starting point when creating a new indicator. It is protected so that only reviewed pull requests can be merged in.

* `deploy-*`

Each automated pipeline has a corresponding branch which automatically deploys to a runtime host which runs the pipeline at a designated time each day. New features and bugfixes are merged into this branch using a pull request, so that our CI system can run the lint and test cycles and make sure the package will run correctly on the runtime host. If an indicator does not have a branch named after it starting with `deploy-`, that means the indicator has not yet been automated, and has a designated human keeper who is responsible for making sure the indicator runs each day -- whether that is manually or using a scheduler like cron is the keeper's choice.

* everything else

All other branches are development branches. We don't enforce a naming policy.
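
As a rough illustration of the three branch categories above, the sketch below sorts a branch name into one of them. `classify_branch` and its labels are hypothetical and exist only for this example; they are not part of the repository or its tooling.

```shell
# Hint
#
# classify_branch is a hypothetical helper, shown for illustration only.
classify_branch() {
  case "$1" in
    main)     echo "primary" ;;      # the protected, authoritative branch
    deploy-*) echo "deploy" ;;       # one per automated pipeline
    *)        echo "development" ;;  # everything else, no naming policy
  esac
}

classify_branch main                   # prints "primary"
classify_branch deploy-jhu             # prints "deploy"
classify_branch dev-my-feature-branch  # prints "development"
```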

## Issues

Issues are the main communication point when it comes to bugfixes, new features, or other possible changes. The repository has several issue templates that help to structure issues.

If you ensure that each issue deals with a single topic (i.e., a single new proposed data source or a single data quality problem), we'll all be less likely to drop subordinate tasks on the floor. We also recognize that many of the people filing issues in this repository are new to large project management and not used to focusing their thoughts in this way. It's okay; we'll all learn and get better together.

Admins will assign issues to one or more people based on balancing expediency, expertise, and team robustness. It may be faster for one person to fix something, but we can reduce the risk of having too many single points of failure if two people work on it together.

## Project Boards

The Delphi Engineering team uses project boards to structure its weekly calls and track active tasks.

Immediate work is tracked on [Release Planning](https://github.com/cmu-delphi/covidcast-indicators/projects/2).

Long-term work and modeling collaborations are tracked on [Refactoring](https://github.com/cmu-delphi/covidcast-indicators/projects/3).


## General workflow for indicator creation and deployment

So, how does one go about developing a pipeline for a new data source?

**tl;dr**

1. Create your new indicator branch from `main`.
2. Build it using the appropriate template, following the guidelines in the included README.md and REVIEW.md files.
3. Make some stuff!
4. When your stuff works, push your `dev-*` branch to remote for review.
5. Consult with a platform engineer for the remaining production setup needs. They will create a branch called `deploy-*` for your indicator.
6. Initiate a pull request against this new branch.
7. Following [the source documentation template](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md), create public API documentation for the source. You can submit this as a pull request against the delphi-epidata repository.
8. If your peers like the code, the documentation is ready, and Jenkins approves, deploy your changes by merging the PR.
9. An admin will propagate your successful changes to `main`.
10. Rejoice!

### Starting out

The `main` branch should contain up-to-date code and supporting libraries. This should be your starting point when creating a new indicator.

```shell
# Hint
#
git checkout main
git checkout -b dev-my-feature-branch
```

### Creating your indicator

Create a directory for your new indicator by making a copy of `_template_r` or `_template_python`, depending on the programming language you intend to use. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators to see some examples of code layout, organization, and general approach.

- Consult your peers with questions! :handshake:
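
Copying the template can be sketched as follows. `new_indicator` is a hypothetical helper, not something the repository provides, and `my_indicator` in the usage note is a placeholder name. The existence checks are an assumption about what you would want; a plain `cp -r` does the same job.

```shell
# Hint
#
# new_indicator is a hypothetical helper, shown for illustration only.
# Usage: new_indicator <template_dir> <new_indicator_dir>
new_indicator() {
  template="$1"
  target="$2"
  # The template directory must exist before we can copy it.
  if [ ! -d "$template" ]; then
    echo "template not found: $template" >&2
    return 1
  fi
  # Avoid clobbering an indicator directory that is already there.
  if [ -e "$target" ]; then
    echo "refusing to overwrite existing path: $target" >&2
    return 1
  fi
  cp -r "$template" "$target"
}
```

From the repository root, you would call it as `new_indicator _template_python my_indicator`.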

Once you have something that runs locally and passes tests, set up your remote branch for eventual review and production deployment.

```shell
# Hint
#
git push -u origin dev-my-feature-branch
```

You can then draft public API documentation for people who would fetch this
data from the API. Public API documentation is kept in the delphi-epidata
repository, and there is a [template Markdown
file](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md)
that outlines the features that need to be documented. You can create a pull
request to add a new file to `docs/api/covidcast-signals/` for your source. Our
goal is to have public API documentation for the data at the same time as it
becomes available to the public.

### Setting up for review and deployment

Once you have your branch set up, you should get in touch with a platform engineer to pair up on the remaining production needs. These include:

- Creating the corresponding `deploy-*` branch in the repo.
- Adding the necessary Jenkins scripts for your indicator.
- Preparing the runtime host with any automation configuration necessities.
- Reviewing the workflow to make sure it meets the general guidelines and will run as expected on the runtime host.

Once all the last-mile configuration is in place, you can create a pull request against the correct `deploy-*` branch to initiate the CI/CD pipeline, which will build, test, and package your indicator for deployment.

If everything looks ok, you've drafted source documentation, platform engineering has validated the last mile, and the pull request is accepted, you can merge the PR. Deployment will start automatically.

Hopefully it'll be a full-on :tada: after that :crossed_fingers:

If not, circle back and try again.

## Production overview

### Running production code

Currently, the production indicators all live and run on the venerable and perennially useful Delphi primary server (also known generically as "the runtime host").

### Delivering an indicator to the production environment

We use a branch-based git workflow coupled with [Jenkins](https://www.jenkins.io/) and [Ansible](https://www.ansible.com/) to build, test, package, and deploy each indicator individually to the runtime host.

- Jenkins dutifully manages the whole process for us by executing several "stages" in the context of a [CI/CD pipeline](https://dzone.com/articles/learn-how-to-setup-a-cicd-pipeline-from-scratch). Each stage does something unique, building on the previous stage. The stages are:
  - Environment - Sets up some environment-specific needs that the other stages depend on.
  - Build - Creates the Python venv on the Jenkins host.
  - Test - Runs linting and unit tests.
  - Package - Tars and gzips the built environment.
  - Deploy - Triggers an Ansible playbook to place the built package onto the runtime host, place any necessary production configuration, and adjust the runtime environment (if necessary).
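
The way each stage builds on the previous one can be sketched as a loop that stops at the first failure. This is only an illustration of the ordering: `run_stage` and `run_pipeline` are hypothetical stand-ins, and the real stages are defined in the Jenkins-specific files, not in a script like this.

```shell
# Hypothetical stand-in for a real Jenkins stage; it only reports its name.
run_stage() {
  echo "running stage: $1"
}

# Run the stages in order and stop as soon as one of them fails,
# so that later stages never run on top of a broken earlier one.
run_pipeline() {
  for stage in Environment Build Test Package Deploy; do
    run_stage "$stage" || { echo "stage failed: $stage" >&2; return 1; }
  done
}

run_pipeline
```

If `run_stage` returned a non-zero status for, say, `Test`, the loop would end there and `Package` and `Deploy` would not run.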

There are several additional Jenkins-specific files that will need to be created for each indicator, as well as some configuration additions to the runtime host. It will be important to pair with a platform engineer to prepare the necessary production environment needs, test the workflow, validate on production, and ultimately sign off on a production release.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: true
21 changes: 21 additions & 0 deletions .github/ISSUE_TEMPLATE/data_quality_issue.md
@@ -0,0 +1,21 @@
---
name: Data quality issue
about: Missing data, weird data, broken data
title: ''
labels: 'data quality'
assignees: 'nmdefries'
---

**Actual Behavior:**

<!--Provide a description of the problem and a minimal reproducible example, if relevant. Please include the source and signal names, as well as sample observations, with geo region name, date, and data, demonstrating the problem.-->

When I...

**Expected behavior**

<!--A clear and concise description of what you expected to happen.-->

**Context**

<!--Add any context about the problem here.-->
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/source_signal_request.md
@@ -0,0 +1,19 @@
---
name: 🚀 New Source or Signal
about: Suggest incorporation of a new source or signal
title: ''
labels: 'API addition'
assignees: ''
---

<!--A clear and concise description of the source or signal you would like to add or modify, and how you imagine it working.-->

It would be great if ...

**Data details**

<!--Please link and briefly describe the proposed source. How is the raw data made available? Describe the geographic and time resolution of the raw data. Which fields should be extracted? Please describe proposed processing of the data, especially if this is a variant of an existing source or signal. -->

**Additional context**

<!--Add any other context or screenshots about the feature request here.-->
3 changes: 3 additions & 0 deletions .gitignore
@@ -116,6 +116,9 @@ venv.bak/
# mkdocs documentation
/site

# VSCode settings
*.vscode

# mypy
.mypy_cache/

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2020 The Delphi Group at Carnegie Mellon University

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
98 changes: 20 additions & 78 deletions README.md
@@ -1,93 +1,35 @@
# Covidcast Indicators

Pipeline code and supporting libraries for the **Real-time COVID-19 Indicators** used in the Delphi Group's [**COVIDcast** map](https://covidcast.cmu.edu).
[![License: MIT][mit-image]][mit-url]

## The indicators
In early April 2020, Delphi developed a uniform data schema for [a new Epidata endpoint focused on COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html). Our intent was to provide signals that would track in real-time and in fine geographic granularity all facets of the COVID-19 pandemic, aiding both nowcasting and forecasting. Delphi's long history in tracking and forecasting influenza made us uniquely situated to provide access to data streams not available anywhere else, including medical claims data, electronic medical records, lab test records, massive public surveys, and internet search trends. We also process commonly-used publicly-available data sources, both for user convenience and to provide data versioning for sources that do not track revisions themselves.

Each subdirectory contained here that is named after an indicator has specific documentation. Please review as necessary!
Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call **COVID-19 indicators**---in a common format for upload to the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html).

## General workflow for indicators creation and deployment
For client access to the API, along with a variety of other utilities, see our [R](https://cmu-delphi.github.io/covidcast/covidcastR/) and [Python](https://cmu-delphi.github.io/covidcast/covidcast-py/html/) packages.

**tl;dr**
For interactive visualizations (of a subset of the available indicators), see our [COVIDcast map](https://covidcast.cmu.edu).

1. Create your new indicator branch from `main`.
2. Build it using the appropriate template, following the guidelines in the included README.md and REVIEW.md files.
3. Make some stuff!
4. When your stuff works, push your `dev-*` branch to remote for review.
5. Consult with a platform engineer for the remaining production setup needs. They will create a branch called `deploy-*` for your indicator.
6. Initiate a pull request against this new branch.
7. Following [the source documentation template](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md), create public API documentation for the source. You can submit this as a pull request against the delphi-epidata repository.
8. If your peers like the code, the documentation is ready, and Jenkins approves, deploy your changes by merging the PR.
9. Rejoice!
## Organization

### Starting out
Utilities:
* `_delphi_utils_python` - common behaviors
* `_template_python` & `_template_r` - starting points for new data sources
* `ansible` & `jenkins` - automated testing and deployment
* `sir_complainsalot` - a Slack bot to check for missing data

The `main` branch should contain up-to-date code and supporting libraries. This should be your starting point when creating a new indicator.
Indicator pipelines: all remaining directories.

```shell
# Hint
#
git checkout main
git checkout -b dev-my-feature-branch
```
Each indicator pipeline includes its own documentation.

### Creating your indicator
* Consult README.md for directions to install, lint, test, and run the pipeline for that indicator.
* Consult REVIEW.md for the checklist to use for code reviews.
* Consult DETAILS.md (if present) for implementation details, including handling of corner cases.

Create a directory for your new indicator by making a copy of `_template_r` or `_template_python` depending on the programming language you intend to use. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators see some examples of code layout, organization, and general approach.

- Consult your peers with questions! :handshake:
## License

Once you have something that runs locally and passes tests you set up your remote branch eventual review and production deployment.
This repository is released under the **MIT License**.

```shell
# Hint
#
git push -u origin dev-my-feature-branch
```

You can then set draft public API documentation for people who would fetch this
data from the API. Public API documentation is kept in the delphi-epidata
repository, and there is a [template Markdown
file](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md)
that outlines the features that need to be documented. You can create a pull
request to add a new file to `docs/api/covidcast-signals/` for your source. Our
goal is to have public API documentation for the data at the same time as it
becomes available to the public.

### Setting up for review and deployment

Once you have your branch set up you should get in touch with a platform engineer to pair up on the remaining production needs. These include:

- Creating the corresponding `deploy-*` branch in the repo.
- Adding the necessary Jenkins scripts for your indicator.
- Preparing the runtime host with any Automation configuration necessities.
- Reviewing the workflow to make sure it meets the general guidelines and will run as expected on the runtime host.

Once all the last mile configuration is in place you can create a pull request against the correct `deploy-*` branch to initiate the CI/CD pipeline which will build, test, and package your indicator for deployment.

If everything looks ok, you've drafted source documentation, platform engineering has validated the last mile, and the pull request is accepted, you can merge the PR. Deployment will start automatically.

Hopefully it'll be a full on :tada:, after that :crossed_fingers:

If not, circle back and try again.

## Production overview

### Running production code

Currently, the production indicators all live and run on the venerable and perennially useful Delphi primary server (also known generically as "the runtime host").

- This is a virtual machine running RHEL 7.5 and living in CMU's Campus Cloud vSphere-based infrastructure environment.

### Delivering an indicator to the production environment

We use a branch-based git workflow coupled with [Jenkins](https://www.jenkins.io/) and [Ansible](https://www.ansible.com/) to build, test, package, and deploy each indicator individually to the runtime host.

- Jenkins dutifully manages the whole process for us by executing several "stages" in the context of a [CI/CD pipeline](https://dzone.com/articles/learn-how-to-setup-a-cicd-pipeline-from-scratch). Each stage does something unique, building on the previous stage. The stages are:
- Environment - Sets up some environment-specific needs that the other stages depend on.
- Build - Create the Python venv on the Jenkins host.
- Test - Run linting and unit tests.
- Package - Tar and gzip the built environment.
- Deploy - Trigger an Ansible playbook to place the built package onto the runtime host, place any necessary production configuration, and adjust the runtime environment (if necessary).

There are several additional Jenkins-specific files that will need to be created for each indicator, as well as some configuration additions to the runtime host. It will be important to pair with a platform engineer to prepare the necessary production environment needs, test the workflow, validate on production, and ultimately sign off on a production release.
[mit-image]: https://img.shields.io/badge/License-MIT-yellow.svg
[mit-url]: https://opensource.org/licenses/MIT
Binary file not shown.
Binary file not shown.
Binary file not shown.