--finetune_forcasting overwrites models from pre-training #258


Closed
kctezcan opened this issue May 20, 2025 · 6 comments
Labels
enhancement New feature or request

Comments

@kctezcan
Contributor

kctezcan commented May 20, 2025

Is your feature request related to a problem? Please describe.

When I do forecast fine-tuning with "--finetune_forecast", training starts from epoch0000 and overwrites all the models that were saved earlier during pre-training.

So we effectively lose the models from pre-training.

Moreover, if fine-tuning runs for more epochs than pre-training did, we even lose the last pre-training checkpoint.

Describe the solution you'd like

Either

  1. we change the name of the saved model to indicate it is finetuning, e.g. /bqcywx9m_epoch00083_mtm.chkpt, /bqcywx9m_epoch00083_ft1.chkpt
  2. or we put these into a new folder under the run_id directory. For example, if run_id=bqcywx9m, then we have
    /models/bqcywx9m/mtm_training/bqcywx9m_epoch00054.chkpt
    /models/bqcywx9m/finetuning/bqcywx9m_epoch00021.chkpt
    ...
  3. or we use different run_ids

Ideally I would prefer 2, then 1, then 3.

We could also call these stage1, stage2, etc.; a rough sketch of option 2 is given below.
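Purely for illustration, here is a minimal sketch of what option 2 could look like in the checkpoint-saving code. The helper name, arguments, and stage labels are my own and not the project's current API:

```python
from pathlib import Path

def checkpoint_path(models_dir: str, run_id: str, stage: str, epoch: int) -> Path:
    """Build a per-stage checkpoint path, as in option 2 above.

    `stage` could be e.g. "mtm_training" for pre-training and "finetuning"
    for forecast fine-tuning (or generic "stage1", "stage2", ...).
    """
    out_dir = Path(models_dir) / run_id / stage
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{run_id}_epoch{epoch:05d}.chkpt"

# models/bqcywx9m/mtm_training/bqcywx9m_epoch00054.chkpt
print(checkpoint_path("models", "bqcywx9m", "mtm_training", 54))
# models/bqcywx9m/finetuning/bqcywx9m_epoch00021.chkpt
print(checkpoint_path("models", "bqcywx9m", "finetuning", 21))
```

This way a fine-tuning run can never clobber a pre-training checkpoint with the same epoch number, since the two stages write into separate subfolders.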

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

@kctezcan kctezcan added the enhancement New feature or request label May 20, 2025
@kctezcan
Contributor Author

What do you think, @kacpnowak @clessig?

@clessig
Collaborator

clessig commented May 21, 2025

The intended solution is to have a new run_id when you fine-tune, i.e. whenever any settings in the config are changed. This should be changed to true by default.

In general, we will pre-train a model and fine-tune it in different directions. This is only possible if each "stage" has a unique run_id. It might seem more complicated than necessary, but I think it's the only solution in the longer term. The heritage can then be tracked through a list/DAG of run_ids.
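To make the idea concrete, here is a rough sketch of how such a lineage could be recorded. The structure, field names, and the extra run_ids are assumptions for illustration, not anything that exists in the repository:

```python
# Illustrative only: one possible way to track the heritage of run_ids as a small DAG.
# Every fine-tuning run gets its own run_id and points back at the run it started from.
lineage = {
    "bqcywx9m": {"stage": "pretraining", "parents": []},          # real example run_id from above
    "ft_aaaaaa": {"stage": "finetune_forecast", "parents": ["bqcywx9m"]},  # made-up run_id
    "ft_bbbbbb": {"stage": "finetune_forecast", "parents": ["bqcywx9m"]},  # made-up run_id
}

def ancestry(run_id: str) -> list[str]:
    """Walk the parent links back to the original pre-training run."""
    chain, stack = [], [run_id]
    while stack:
        current = stack.pop()
        chain.append(current)
        stack.extend(lineage[current]["parents"])
    return chain

print(ancestry("ft_aaaaaa"))  # ['ft_aaaaaa', 'bqcywx9m']
```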

@clessig clessig closed this as completed May 21, 2025
@kctezcan
Contributor Author

The intended solution is to have a new run_id when you fine-tune, i.e. whenever any settings in the config are changed. This should be changed to true by default.

In general, we will pre-train a model and fine-tune it in different directions. This is only possible if each "stage" has a unique run_id. It might seem more complicated than necessary, but I think it's the only solution in the longer term. The heritage can then be tracked through a list/DAG of run_ids.

OK, I will use new run_ids then. Thanks for the clarification.

@clessig
Collaborator

clessig commented May 21, 2025

The intended solution is to have a new run_id when you fine-tune, i.e. whenever any settings in the config are changed. This should be changed to true by default.
In general, we will pre-train a model and fine-tune it in different directions. This is only possible if each "stage" has a unique run_id. It might seem more complicated than necessary, but I think it's the only solution in the longer term. The heritage can then be tracked through a list/DAG of run_ids.

OK, I will use new run_ids then. Thanks for the clarification.

Can you open a PR to fix the default value?

@tjhunter
Collaborator

I am in the process of writing a small design document so that we agree on the semantics of the run_ids.

@kctezcan
Contributor Author

@clessig #259 Feel free to merge if OK.
