GitHub - cmu-delphi/DelphiRF

DelphiRF: Delphi - Revision Forecast

Delphi-RF is a nonparametric quantile regression framework designed to correct real-time data revisions in public health surveillance. By modeling revision dynamics and incorporating all available updates up to a given observation date, Delphi-RF generates distributional forecasts of finalized surveillance values for both count and fraction-based data.

Installation

You can install the package directly from GitHub using:

devtools::install_github("cmu-delphi/DelphiRF")

If you have downloaded the package to a local directory, you can install it with:

install.packages("devtools")  # Install devtools if not already installed
devtools::install("path/to/DelphiRF")  # Replace with the actual path

DelphiRF requires the quantgen R package and linear programming solvers.
Follow the installation instructions provided here: quantgen GitHub repository.

Data Preprocessing

data_preprocessing processes input data for DelphiRF, ensuring it is structured correctly for modeling. Users are required to specify:

df: A data frame containing the input data
value_col: A string indicating the column containing the observed values
refd_col: A string indicating the column containing the reference dates
lag_col: A string indicating the column containing lag values
ref_lag: The reference lag used to align the forecast response
suffixes: A character vector specifying suffixes for generated value columns (default: c(""))
lagged_term_list: An optional list of lagged terms to include in preprocessing
value_type: A string indicating the type of value (“count” or “fraction”) (default: “count”)
temporal_resol: The temporal resolution of the data, either "daily" or “weekly” (default: “daily”)
smoothed: A logical value (TRUE/FALSE) indicating whether the value column is alrealdy smoothed (default: FALSE)

For example, ma_dph is a count-based dataset containing the 7-day moving average of daily reported COVID-19 confirmed cases in MA.

library(DelphiRF)
library(ggplot2)
library(dplyr)
data(ma_dph)
tail(ma_dph)

##         test_date total_positive new_positive confirmed_case_7d_avg issue_date lag
## 257376 2022-07-14        1788698         1533              1470.286 2022-07-21   7
## 257377 2022-07-15        1790062         1364              1451.857 2022-07-21   6
## 257378 2022-07-16        1790912          850              1424.857 2022-07-21   5
## 257379 2022-07-17        1791840          928              1416.429 2022-07-21   4
## 257380 2022-07-18        1793065         1225              1329.286 2022-07-21   3
## 257381 2022-07-19        1793437          372              1122.714 2022-07-21   2

During the preprocessing step, the user would specify the following for ma_dph:

value_col <- "confirmed_case_7d_avg"
refd_col <- "test_date"
lag_col <- "lag"
ref_lag <- 14
smoothed <- TRUE

The data is preprocessed using the following steps:

# Preprocess the data for a specific location
df <- data_preprocessing(
  ma_dph, 
  value_col = "confirmed_case_7d_avg", 
  refd_col = "test_date", 
  lag_col = "lag", 
  ref_lag = 14, 
  smoothed = TRUE
)

# Define the test date range
test_dates <-  seq(from = as.Date("2022-06-15"),
                   to = as.Date("2022-06-30"),
                   by = "day")

# Filter training data
train_data <- df %>%
  filter(report_date < min(test_dates))

# Add weighting-related features to training data
# This should be done after the train_data is filtered with test_date
train_data <- add_weights_related(train_data)

# Filter test data
test_data <- df %>%
  filter(report_date %in% test_dates)

Model Training and Forecast Generation

Running DelphiRF

To perform forecasting using DelphiRF, you need to specify:

df: A data frame containing the input data.
testing_start_date: A Date object indicating the start date for testing (e.g., as.Date("2022-06-01")).

The function takes the entire revision dataset as input and automatically divides it into multiple test_lag_groups, where each group is assumed to have the same revision pattern. For each test_lag_group, the function trains the model and generates forecasts as needed using fixed hyperparameters.

# Run DelphiRF for forecasting  
results <- DelphiRF(df, as.Date("2022-06-01"))  

# Compute mean WIS score for each lag
mean_wis_by_lag <- results %>%
  group_by(lag) %>%
  summarise(mean_wis = mean(wis, na.rm = TRUE))

# Compute mean WIS score for each lag  
head(mean_wis_by_lag)  

## # A tibble: 6 × 2
##     lag mean_wis
##   <dbl>    <dbl>
## 1     0  0.0796 
## 2     1  0.0793 
## 3     2  0.0830 
## 4     3  0.0272 
## 5     4  0.0155 
## 6     5  0.00970

# Plot using ggplot2
ggplot(mean_wis_by_lag, aes(x = lag, y = mean_wis)) +
  geom_line(color = "black", linewidth = 1) +  # Line plot
  geom_point(color = "red", size = 2) +  # Add points
  labs(title = "Mean WIS Score by Lag",
       x = "Lag",
       y = "Mean WIS Score") +
  theme_minimal()  # Apply a clean theme

visual_part <- results %>%
  filter(report_date == as.Date("2022-06-30")) %>%
  filter(reference_date >= as.Date("2022-06-10"))

ggplot(visual_part) + geom_line(aes(reference_date, exp(predicted_tau0.5),col="Forecasted"), linetype="longdash") +
  geom_line(aes(reference_date, exp(log_value_7dav), col="Reported"),linetype="solid") +
  geom_line(aes(reference_date, exp(log_value_target_7dav), col="Target"),linetype="solid") +
  scale_colour_manual(name="",values=c("indianred3","gray", "black"))+
  theme_classic()+
  geom_ribbon(fill="indianred3",aes(x = reference_date,ymin=exp(predicted_tau0.25),
                                    ymax=exp(predicted_tau0.75)),alpha=0.3)+
  xlab("Reference Date") + ylab("Estimated cases") +
  ggtitle("Observed and forecasted number of cases on date 2022-06-30")

Optional Arguments

The user may optionally specify the following arguments: - taus: A numeric vector of quantiles to be estimated (default: TAUS).

test_lag_groups: A list specifying the lag groups to be tested (default: TEST_LAG_GROUPS).
smoothed_target: A logical value indicating whether the target variable is smoothed (TRUE) or not (FALSE) (default: TRUE).
lagged_term_list: An optional list of lagged terms to include in the model.
params_list: An optional list of parameters for model training (more details in the create_params_list function in model.R).
lambda: A numeric value representing the regularization parameter for the model (default: 0.1).
gamma: A numeric value representing the weight for the loss function (default: 0.1).
lag_pad: A numeric value indicating the padding for lag variables (default: 1).
temporal_resol: A string specifying the temporal resolution of the data, either "daily" or "weekly" (default: "daily").
lp_solver: A string indicating the linear programming solver to be used (default: LP_SOLVER). - training_days: A numeric value indicating the number of days to include in the training period (default: 365).
train_models: A logical value indicating whether to train models (TRUE) or not (FALSE) (default: TRUE).
make_predictions: A logical value indicating whether to generate predictions (TRUE) or not (FALSE) (default: TRUE).

Output File Naming Parameters:

These parameters define how output files will be named: - geo: A string specifying the geographical location code (default: "ma").

value_type: A string indicating the type of the target variable, either "count" or "fraction" (default: "count").
model_save_dir: A string specifying the directory path where the model will be saved (default: "./receiving").
indicator: A string representing the indicator name for the data (default: "testdata").
signal: A string specifying the signal name associated with the indicator (default: "").
geo_level: A string indicating the geographical level, such as "state" or "county" (default: "state").
signal_suffix: A string to be appended to the signal name (default: "").
training_end_date: An optional string specifying the end date for the training period (default: "").

Generate a Revision Forecast for a Specific `test_lag_group`

This function trains a forecasting model and generates forecasts as needed for a single test_lag_group using pre-filtered data. It is utilized within the DelphiRF function.

Users can directly call the revision_forecast function if they assume a consistent revision pattern in the given data. To use this function, users must specify train_data and test_data separately.

# Prepare training and test data for lag = 1  
train_data_for_lag1 <- data_filteration(1, train_data, 0)  
test_data_for_lag1 <- data_filteration(1, test_data, 0)  

# Perform revision forecasting  
result <- revision_forecast(
  train_data_for_lag1, test_data_for_lag1, TAUS,
  lambda = 0.1, gamma = 0.1,
  train_models = TRUE,
  make_prediction = TRUE
)

# Display the last few rows of the result  
tail(result[c("reference_date", "report_date", "lag", "log_value_7dav", "log_value_target_7dav", "wis")])  

##    reference_date report_date lag log_value_7dav log_value_target_7dav        wis
## 11     2022-06-24  2022-06-25   1       0.000000              7.189599 0.04887828
## 12     2022-06-25  2022-06-26   1       0.000000              7.195723 0.05724033
## 13     2022-06-26  2022-06-27   1       7.036148              7.204893 0.04950035
## 14     2022-06-27  2022-06-28   1       7.029088              7.239830 0.05721481
## 15     2022-06-28  2022-06-29   1       6.992096              7.240342 0.05309328
## 16     2022-06-29  2022-06-30   1       7.027188              7.247691 0.05278094

# Compute the mean WIS score  
mean(result$wis, na.rm = TRUE)    

## [1] 0.05371732

Visualize the forecast result:

ggplot(result) + geom_line(aes(reference_date, exp(predicted_tau0.5),col="Forecasted"), linetype="longdash") +
  geom_line(aes(reference_date, exp(log_value_7dav), col="Reported"),linetype="solid") +
  geom_line(aes(reference_date, exp(log_value_target_7dav), col="Target"),linetype="solid") +
  scale_colour_manual(name="",values=c("indianred3","gray", "black"))+
  theme_classic()+
  geom_ribbon(fill="indianred3",aes(x = reference_date,ymin=exp(predicted_tau0.25),
                                    ymax=exp(predicted_tau0.75)),alpha=0.3)+
  xlab("Reference Date") + ylab("Estimated cases") +
  ggtitle("Observed and forecasted number of cases at lag 1")

Cross-Validation for Hyperparameter Tuning

The cv_revision_forecast function performs cross-validation over a grid of hyperparameters for a single test_lag_group using pre-filtered data. It returns the optimal set of hyperparameters that maximize forecast accuracy.

# Perform cross-validation to select optimal hyperparameters  
hyperparams <- cv_revision_forecast(
  train_data, 1,
  lambda_candidates = c(0.01, 0.1, 1),
  gamma_candidates = c(0.1, 1, 10),
  lag_pad_candidates = c(0, 1, 2, 3),
  n_folds = 2
)

# Display selected hyperparameters  
hyperparams  

## $best_lambda
## [1] 0.1
## 
## $best_gamma
## [1] 0.1
## 
## $best_lag_pad
## [1] 2

# Training and Forecasting with Optimized Parameters
# Prepare training and test data using optimized lag padding  
train_data_for_lag1 <- data_filteration(1, train_data, hyperparams$best_lag_pad)  
test_data_for_lag1 <- data_filteration(1, test_data, 0)  

# Perform revision forecasting with optimized parameters  
result <- revision_forecast(
  train_data_for_lag1, test_data_for_lag1, TAUS,
  lambda = hyperparams$best_lambda, 
  gamma = hyperparams$best_gamma,
  train_models = TRUE,
  make_prediction = TRUE
)

# Compute the mean WIS score  
mean(result$wis, na.rm = TRUE)  

## [1] 0.05498269

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
R		R
README_files/figure-markdown_strict		README_files/figure-markdown_strict
cache		cache
data		data
man-roxygen		man-roxygen
man		man
receiving/testdata_count_state		receiving/testdata_count_state
tests		tests
.DS_Store		.DS_Store
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
DelphiRF.Rproj		DelphiRF.Rproj
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DelphiRF: Delphi - Revision Forecast

Installation

Data Preprocessing

Model Training and Forecast Generation

Running DelphiRF

Optional Arguments

Output File Naming Parameters:

Generate a Revision Forecast for a Specific `test_lag_group`

Cross-Validation for Hyperparameter Tuning

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

cmu-delphi/DelphiRF

Folders and files

Latest commit

History

Repository files navigation

DelphiRF: Delphi - Revision Forecast

Installation

Data Preprocessing

Model Training and Forecast Generation

Running DelphiRF

Optional Arguments

Output File Naming Parameters:

Generate a Revision Forecast for a Specific test_lag_group

Cross-Validation for Hyperparameter Tuning

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Generate a Revision Forecast for a Specific `test_lag_group`

Packages