--finetune_forcasting overwrites models from pre-training #258


Closed
kctezcan opened this issue May 20, 2025 · 6 comments
Labels
enhancement New feature or request

Comments

@kctezcan
Contributor

kctezcan commented May 20, 2025

Is your feature request related to a problem? Please describe.

When I do forecast fine-tuning with "--finetune_forecast", training starts from epoch0000 and overwrites all the models that were saved earlier during pre-training.

So we effectively lose the models from pre-training.

Moreover, if fine-tuning runs for more epochs than pre-training did, we even lose the last pre-training checkpoint.

Describe the solution you'd like

Either

  1. we change the name of the saved model to indicate it is finetuning, e.g. /bqcywx9m_epoch00083_mtm.chkpt, /bqcywx9m_epoch00083_ft1.chkpt
  2. or we put these into a new folder under the run_id directory. For example, if run_id=bqcywx9m, then we have
    /models/bqcywx9m/mtm_training/bqcywx9m_epoch00054.chkpt
    /models/bqcywx9m/finetuning/bqcywx9m_epoch00021.chkpt
    ...
  3. or we use different run_ids

Ideally I would prefer 2, then 1, then 3.

We could also call these stage1, stage2, etc.; a rough sketch of option 2 is given below.
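Purely for illustration, here is a minimal sketch of what option 2 could look like in the checkpoint-saving code. The helper name, arguments, and stage labels are my own and not the project's current API:

```python
from pathlib import Path

def checkpoint_path(models_dir: str, run_id: str, stage: str, epoch: int) -> Path:
    """Build a per-stage checkpoint path, as in option 2 above.

    `stage` could be e.g. "mtm_training" for pre-training and "finetuning"
    for forecast fine-tuning (or generic "stage1", "stage2", ...).
    """
    out_dir = Path(models_dir) / run_id / stage
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{run_id}_epoch{epoch:05d}.chkpt"

# models/bqcywx9m/mtm_training/bqcywx9m_epoch00054.chkpt
print(checkpoint_path("models", "bqcywx9m", "mtm_training", 54))
# models/bqcywx9m/finetuning/bqcywx9m_epoch00021.chkpt
print(checkpoint_path("models", "bqcywx9m", "finetuning", 21))
```

This way a fine-tuning run can never clobber a pre-training checkpoint with the same epoch number, since the two stages write into separate subfolders.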

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

@kctezcan kctezcan added the enhancement New feature or request label May 20, 2025
@kctezcan
Contributor Author

What do you think, @kacpnowak @clessig?

@clessig
Collaborator

clessig commented May 21, 2025

The intended solution is to have a new run_id when you fine-tune, i.e. whenever any settings in the config are changed. This should be changed to true by default.

In general, we will pre-train a model and fine-tune it in different directions. This is only possible if each "stage" has a unique run_id. It might seem more complicated than necessary, but I think it's the only solution in the longer term. The heritage can then be tracked through a list/DAG of run_ids.
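To make the idea concrete, here is a rough sketch of how such a lineage could be recorded. The structure, field names, and the extra run_ids are assumptions for illustration, not anything that exists in the repository:

```python
# Illustrative only: one possible way to track the heritage of run_ids as a small DAG.
# Every fine-tuning run gets its own run_id and points back at the run it started from.
lineage = {
    "bqcywx9m": {"stage": "pretraining", "parents": []},          # real example run_id from above
    "ft_aaaaaa": {"stage": "finetune_forecast", "parents": ["bqcywx9m"]},  # made-up run_id
    "ft_bbbbbb": {"stage": "finetune_forecast", "parents": ["bqcywx9m"]},  # made-up run_id
}

def ancestry(run_id: str) -> list[str]:
    """Walk the parent links back to the original pre-training run."""
    chain, stack = [], [run_id]
    while stack:
        current = stack.pop()
        chain.append(current)
        stack.extend(lineage[current]["parents"])
    return chain

print(ancestry("ft_aaaaaa"))  # ['ft_aaaaaa', 'bqcywx9m']
```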

@clessig clessig closed this as completed May 21, 2025
@kctezcan
Contributor Author

The intended solution is to have a new run_id when you fine-tune, i.e. whenever any settings in the config are changed. This should be changed to true by default.

In general, we will pre-train a model and fine-tune it in different directions. This is only possible if each "stage" has a unique run_id. It might seem more complicated than necessary, but I think it's the only solution in the longer term. The heritage can then be tracked through a list/DAG of run_ids.

OK, I will use new run_ids then. Thanks for the clarification.

@clessig
Collaborator

clessig commented May 21, 2025

The intended solution is to have a new run_id when you fine-tune, i.e. whenever any settings in the config are changed. This should be changed to true by default.
In general, we will pre-train a model and fine-tune it in different directions. This is only possible if each "stage" has a unique run_id. It might seem more complicated than necessary, but I think it's the only solution in the longer term. The heritage can then be tracked through a list/DAG of run_ids.

OK, I will use new run_ids then. Thanks for the clarification.

Can you open a PR to fix the default value?

@tjhunter
Collaborator

I am in the process of writing a small design document so that we agree on the semantics of the run_ids.

@kctezcan
Contributor Author

@clessig #259 Feel free to merge if OK.
