Rolling Cross-validation for Time-series #1026

Open
justinormont opened this issue Sep 25, 2018 · 1 comment
Labels
enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@justinormont
Contributor

To handle time series (and time-dependent data in general) properly, we should implement rolling cross-validation alongside our existing CV and TrainTest modes.

We are currently merging various time-series functionality from the internal repo into this repo via #977 "Port Time Series". That PR does not include rolling cross-validation, which is used heavily in time-series tasks.

Rolling CV suits time-dependent datasets better because every fold tests on data that is newer than its training data. Standard CV assigns rows to folds without regard to time, so it leaks future data into the training set. Other names for rolling CV include walk-forward, roll-forward, rolling origin, and window CV.
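
As a rough illustration of the splitting scheme (not a proposed ML.NET API), here is a minimal Python sketch. The function name `rolling_splits` and its parameters are hypothetical, and it assumes the rows are already sorted from oldest to newest, so index order is time order. By default the training window expands with each fold; passing `max_train_size` gives a fixed-length sliding window instead.

```python
from typing import Iterator, List, Optional, Tuple


def rolling_splits(
    n_samples: int,
    n_folds: int,
    test_size: int,
    max_train_size: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs over time-ordered rows.

    Hypothetical helper for illustration only. Assumes row i is older
    than row i + 1.
    """
    # Reserve the last n_folds * test_size rows for testing.
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough rows for the requested folds")

    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        test_end = test_start + test_size

        # Expanding window by default; sliding window when capped.
        train_start = 0
        if max_train_size is not None:
            train_start = max(0, test_start - max_train_size)

        # Training rows always end strictly before the test rows begin.
        yield (
            list(range(train_start, test_start)),
            list(range(test_start, test_end)),
        )


for train, test in rolling_splits(n_samples=10, n_folds=3, test_size=2):
    print(train, test)
```

Every training index precedes every test index in each fold, which is the property that standard shuffled k-fold does not guarantee.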

Background on method:
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
https://otexts.org/fpp2/accuracy.html#time-series-cross-validation
https://stats.stackexchange.com/questions/14099/using-k-fold-cross-validation-for-time-series-model-selection
https://robjhyndman.com/hyndsight/tscv/
https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/46602
https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9

To further investigate missing time-series components, the Azure ML Forecasting Toolkit is a good package listing components needed for this task:

@justinormont added the enhancement (New feature or request) and Time Series labels on Sep 25, 2018
@codemzs
Member

codemzs commented Sep 26, 2018

Thanks, @justinormont! I too feel we should consider rolling CV as part of the time-series effort.

@codemzs mentioned this issue on Oct 2, 2018
@ganik added the P2 (Priority of the issue for triage purpose: Needs to be fixed at some point.) label on Jan 10, 2020

4 participants