title | parent | grand_parent |
---|---|---|
Indicator Combination | Inactive Signals | COVIDcast Main Endpoint |
- Source name: `indicator-combination`
This source provides signals which are combinations of the other sources, calculated or composed by Delphi. It is not a primary data source.
- Earliest issue available: May 20, 2020
- Number of data revisions since May 19, 2020: 1
- Date of last change: June 3, 2020
- Available for: county, msa, state (see geography coding docs)
- Time type: day (see date format docs)
- License: CC BY
These signals combine Delphi's indicators (excluding cases and deaths, but including other signals expected to be related to the underlying rate of coronavirus infection) to produce a single indicator. The goal is to provide a single map of COVID-19 activity at each geographic level each day that summarizes the other indicators, so that users can study its fluctuations over space and time without having to monitor many individual sensors.
These signals were updated daily until March 17, 2021.
- `nmf_day_doc_fbc_fbs_ght`: This signal uses a rank-1 approximation, from a nonnegative matrix factorization approach, to identify an underlying signal that best reconstructs the Doctor Visits (`smoothed_adj_cli`), Facebook Symptoms surveys (`smoothed_cli`), Facebook Symptoms in Community surveys (`smoothed_hh_cmnty_cli`), and Search Trends (`smoothed_search`) indicators. It does not include official reports (cases and deaths from the `jhu-csse` source). Higher values of the combined signal correspond to higher values of the other indicators, but the scale (units) of the combination is arbitrary. Note that the Search Trends source is not available at the county level, so county values of this signal do not use it. This signal is deprecated and is no longer updated as of March 17, 2021.
- `nmf_day_doc_fbs_ght`: This signal is calculated in the same way as `nmf_day_doc_fbc_fbs_ght`, but does not include the Symptoms in Community survey signal, which was not available at the time this signal was introduced. It also uses `smoothed_cli` from the `doctor-visits` source instead of `smoothed_adj_cli`. This signal is deprecated and is no longer updated as of May 28, 2020.
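The rank-1 nonnegative factorization described above can be illustrated with a minimal sketch. It is not Delphi's implementation: the function names, the alternating nonnegative least-squares solver, and the root-mean-square choice of column scale are all assumptions made for illustration. The input for one day is a complete, nonnegative regions-by-sensors matrix.

```python
import numpy as np

def global_column_scale(X_all):
    """Scale each sensor (last axis) by one constant shared across all days.

    X_all has shape (days, regions, sensors); missing entries are NaN.
    Using the root-mean-square of observed entries as the scale is an
    assumption made for this sketch.
    """
    X_all = np.asarray(X_all, dtype=float)
    rms = np.sqrt(np.nanmean(X_all ** 2, axis=(0, 1)))
    rms[rms == 0] = 1.0  # leave all-zero sensors unchanged
    return X_all / rms

def rank1_nmf(X, n_iter=200, eps=1e-12):
    """Approximate a nonnegative matrix X (regions x sensors) by outer(z, a).

    Alternates closed-form nonnegative least-squares updates for z and a.
    Returns z (one combined value per region) and a (one weight per sensor).
    """
    X = np.asarray(X, dtype=float)
    a = np.ones(X.shape[1])
    z = np.zeros(X.shape[0])
    for _ in range(n_iter):
        z = np.maximum(X @ a, 0.0) / max(a @ a, eps)
        a = np.maximum(X.T @ z, 0.0) / max(z @ z, eps)
    norm = np.linalg.norm(a)
    if norm == 0:
        return z, a
    # The overall scale is arbitrary; fix it by giving a unit length.
    return z * norm, a / norm
```

Because only the product of `z` and `a` is determined, the units of `z` are arbitrary, which matches the note above that the scale of the combined signal is arbitrary.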
Let
At each time
over combined indicator values
We constrain the sensor-specific transformation
where
If
Furthermore, since all sensors should be increasing in the combined indicator
value, we further constrain the entries of
Since different sensor values are on different scales, we perform global column
scaling. Before approximating the matrix
Note that one might consider local column scaling before approximating the
matrix
The matrix
To ensure that our combined indicator value has comparable scaling over time and is free from erratic jumps that are just due to missingness, we use the following imputation strategies:
- lag imputation, where if a sensor is missing for all regions on a given day, we copy all observations from the last day on which any observation was available for that sensor;
- recent imputation, where if a sensor value is missing on a given day but at least one of the past $$T$$ values is observed, we impute it with the most recent value. We limit $$T$$ to 7 days.
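The two imputation rules above can be sketched as follows. This is a minimal illustration, assuming missing values are stored as NaN and that each sensor is held as a days-by-regions array; the function names are hypothetical and not taken from Delphi's code.

```python
import numpy as np

def lag_impute(X):
    """Lag imputation for one sensor; X has shape (days, regions).

    If a day has no observation in any region, copy the whole row from the
    last day on which at least one observation was available.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    last_day = None
    for t in range(X.shape[0]):
        if np.isnan(X[t]).all():
            if last_day is not None:
                out[t] = X[last_day]
        else:
            last_day = t
    return out

def recent_impute(values, T=7):
    """Recent imputation for one sensor in one region (1-D array over days).

    A missing value is filled with the most recent observed value, provided
    that value was observed within the past T days.
    """
    values = np.asarray(values, dtype=float)
    out = values.copy()
    last_obs = None
    for t, v in enumerate(values):
        if not np.isnan(v):
            last_obs = t
        elif last_obs is not None and t - last_obs <= T:
            out[t] = values[last_obs]
    return out
```

Note that `recent_impute` only looks back at truly observed values, so a gap longer than `T` days stays missing after the first `T` filled days.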
Even with the above imputation strategies, some sensors are never available
in a given region. The result is that combined indicator values for that
region may be on a completely different scale from values in other regions
with additional observed sensors. This can only be
overcome by regularizing or pooling information across space. Note that a very
similar problem occurs when a sensor is unavailable for a very long period of
time (so long that recent imputation is inadvisable and avoided by setting
We deal with this problem by geographic imputation, where we impute values from regions that share a higher level of aggregation (e.g., the median observed score in an MSA or state), or by imputing values from megacounties (since the counties in question are missing and hence should be reflected in the rest-of-state estimate). For geographic imputation, we first look to observed values from megacounties, then to median observed values at successive levels of the geographic hierarchy (county, MSA, state, or country). We chose this imputation sequence among several options by evaluating how well each reproduced the actual observed sensor values in validation experiments.
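The fallback order described above might look like the following sketch. The function name and its inputs are hypothetical: the caller is assumed to supply the megacounty value (or NaN) and, for each level of the hierarchy in order, the sensor values observed in the regions that share that level.

```python
import numpy as np

def geo_impute(value, megacounty_value, level_values):
    """Return `value` if observed, otherwise the first available fallback.

    level_values is a list of arrays, one per level of the geographic
    hierarchy in the order they should be tried; NaN marks missing entries.
    """
    if not np.isnan(value):
        return float(value)
    # First choice: the observed megacounty value.
    if not np.isnan(megacounty_value):
        return float(megacounty_value)
    # Otherwise: the median of observed values at each level in turn.
    for vals in level_values:
        obs = np.asarray(vals, dtype=float)
        obs = obs[~np.isnan(obs)]
        if obs.size:
            return float(np.median(obs))
    return float("nan")
```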
We compute standard errors for the combined indicator using the bootstrap.
The input data sources are resampled individually, and the combined indicator
is recomputed for the resampled input. Then, the standard error is given by
taking the standard deviation of the resampled combined indicators. We take
The resampling method for each input source is as follows:
- Doctor Visits: We inject a single additional observation with value 0.5 into the calculation of the proportion, and then resample from the binomial distribution, using the "Jeffreysized" proportion and sample size $$n+1$$.
- Symptom Survey: We first inject a single additional observation with value 0.35 into the calculation of the proportion $$p$$. Then, we sample an average of $$n+1$$ independent $$\text{Binomial}(p, m)/m$$ variables, where we choose $$m$$ so the household variance $$p(1-p)/m = n\text{SE}^2$$, which is equivalent to sampling $$\text{Binomial}(p, mn) / mn$$. The prior proportion of 0.35 was chosen to match the resampling distribution with the original distribution.
- Facebook Community Survey: We resample from the binomial distribution, with the reported proportion and sample size.
- Google Health Trends: Because we do not have access to the sampling distribution, we do not resample this signal.
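As a rough illustration of the bootstrap described above, the sketch below resamples two of the inputs and takes the standard deviation of the recomputed combination. The `combine` callback, the number of replicates `B`, and the use of proportions on a 0 to 1 scale are assumptions for this example; the Symptom Survey resampler is omitted for brevity.

```python
import numpy as np

def resample_doctor_visits(p, n, rng):
    """Inject one extra observation of value 0.5, then draw a binomial
    proportion with the adjusted proportion and sample size n + 1."""
    n1 = n + 1
    p_adj = (p * n + 0.5) / n1
    return rng.binomial(n1, p_adj) / n1

def resample_community_survey(p, n, rng):
    """Draw a binomial proportion with the reported proportion and sample size."""
    return rng.binomial(n, p) / n

def bootstrap_se(combine, resamplers, B=100, seed=0):
    """Standard deviation of the combined indicator over B resampled inputs.

    `combine` maps a list of resampled sensor values to a single number;
    each resampler takes a NumPy random Generator and returns one value.
    """
    rng = np.random.default_rng(seed)
    draws = [combine([r(rng) for r in resamplers]) for _ in range(B)]
    return float(np.std(draws, ddof=1))
```

A call might pass closures over the reported values, for example `bootstrap_se(np.mean, [lambda r: resample_doctor_visits(0.1, 200, r), lambda r: resample_community_survey(0.2, 300, r)])`, where `np.mean` merely stands in for the actual combination step.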
- Earliest issue available: July 7, 2020
- Number of data revisions since May 19, 2020: 1
- Date of last change: October 12, 2020
- Available for: county, msa, hrr, state (see geography coding docs)
- Time type: day (see date format docs)
These signals combine the cases and deaths data from JHU and USA Facts. This is a straight composition: the signals below use the JHU signal data for Puerto Rico, and the USA Facts signal data everywhere else. Consult each signal's documentation for information about geographic reporting, backfill, and other limitations.
These signals were updated daily until November 18, 2021.
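The composition rule can be sketched in a few lines. This assumes, for illustration only, that each source is available as a dictionary keyed by five-digit county FIPS code, and it relies on Puerto Rico's state FIPS prefix being `72`; the function name is hypothetical.

```python
PUERTO_RICO_PREFIX = "72"  # state FIPS code for Puerto Rico

def combine_sources(jhu, usafacts):
    """Use JHU values for Puerto Rico and USA Facts values everywhere else.

    Both arguments map a five-digit county FIPS string to a signal value.
    """
    combined = {
        fips: value
        for fips, value in usafacts.items()
        if not fips.startswith(PUERTO_RICO_PREFIX)
    }
    combined.update(
        (fips, value)
        for fips, value in jhu.items()
        if fips.startswith(PUERTO_RICO_PREFIX)
    )
    return combined
```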
Signal | 7-day average signal | Description |
---|---|---|
`confirmed_cumulative_num` | | Cumulative number of confirmed COVID-19 cases. Earliest date available: 2020-02-20 |
`confirmed_cumulative_prop` | | Cumulative number of confirmed COVID-19 cases per 100,000 population. Earliest date available: 2020-02-20 |
`confirmed_incidence_num` | `confirmed_7dav_incidence_num` | Number of new confirmed COVID-19 cases, daily. Earliest date available: 2020-02-20 |
`confirmed_incidence_prop` | `confirmed_7dav_incidence_prop` | Number of new confirmed COVID-19 cases per 100,000 population, daily. Earliest date available: 2020-02-20 |
`deaths_cumulative_num` | | Cumulative number of confirmed deaths due to COVID-19. Earliest date available: 2020-02-20 |
`deaths_cumulative_prop` | | Cumulative number of confirmed deaths due to COVID-19, per 100,000 population. Earliest date available: 2020-02-20 |
`deaths_incidence_num` | `deaths_7dav_incidence_num` | Number of new confirmed deaths due to COVID-19, daily. Earliest date available: 2020-02-20 |
`deaths_incidence_prop` | `deaths_7dav_incidence_prop` | Number of new confirmed deaths due to COVID-19 per 100,000 population, daily. Earliest date available: 2020-02-20 |
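The relationships among the cumulative, incidence, 7-day average, and per-100,000 variants in the table can be illustrated with a short sketch. The trailing 7-day window and the helper names are assumptions for this example; consult the upstream source documentation for the exact smoothing used.

```python
import numpy as np

def incidence(cumulative):
    """Daily new counts as day-over-day differences of a cumulative series."""
    return np.diff(np.asarray(cumulative, dtype=float))

def trailing_mean(x, window=7):
    """Mean over each full run of `window` consecutive days."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")

def per_100k(count, population):
    """Convert a count to a rate per 100,000 population."""
    return np.asarray(count, dtype=float) / population * 100_000
```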