🐛 Bug
When using multiple loggers with the Trainer, the ModelCheckpoint callback simply concatenates the experiment names and version names of both loggers to generate its dirpath. Furthermore, the checkpoints get saved in the current working directory (`default_root_dir`, which is `os.getcwd()` in the Trainer) instead of the `save_dir` specified in the loggers.

Logically, even when using multiple loggers with a Trainer we are still logging the same experiment. If the `save_dir` derived from the multiple loggers is consistent, then the checkpoints should be saved at that location instead of in the `default_root_dir`.
To Reproduce
In the BoringModel code, I added two loggers (`CSVLogger` and `TensorBoardLogger`), both with the same save directory (`'logs'`) and version (`'0'` in the example code, inferred by default). Now, add a `ModelCheckpoint` callback with no `dirpath` specified.
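A minimal sketch of this setup (the tiny module and random dataset below are illustrative stand-ins for the BoringModel used in the linked reproduction; only the logger/callback wiring matters):

```python
import torch
from torch.utils.data import DataLoader, Dataset

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import CSVLogger, TensorBoardLogger


class RandomDataset(Dataset):
    """Random features, just enough to run a training step."""

    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return torch.randn(32)


class BoringModel(pl.LightningModule):
    """Stand-in for the BoringModel from the reproduction script."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Two loggers pointing at the same save_dir; name and version are left to
# their defaults, which in this report resolve to "default" and version 0,
# so both loggers agree on the logging location.
csv_logger = CSVLogger(save_dir="logs")
tb_logger = TensorBoardLogger(save_dir="logs")

trainer = pl.Trainer(
    logger=[csv_logger, tb_logger],
    callbacks=[ModelCheckpoint()],  # no dirpath given on purpose
    max_epochs=1,
)
trainer.fit(BoringModel(), DataLoader(RandomDataset(), batch_size=8))

# After fit(), the checkpoint folder shows up under the current working
# directory (roughly ./default_default/0_0/checkpoints/) rather than under
# logs/ as configured in the loggers.
```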
Also reproduced on Colab notebook:
https://colab.research.google.com/drive/1zk77y5A9xkuAPN0Oip8pIkO_KgJvgx0H?usp=sharing
Expected behavior
The expected behaviour (for example, what happens when using a single logger) is that the checkpoints folder gets created in `os.path.join(save_dir, name, version)` as specified by the loggers.

However, if there is more than one logger, then the names and versions from all loggers are concatenated to generate the ModelCheckpoint dirpath (e.g. `default_default` instead of `default`, and `0_0` instead of `0`), and the checkpoints get saved in the current working directory (`default_root_dir`, which is `os.getcwd()` in the Trainer) instead of the `save_dir` specified in the loggers.
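A rough sketch of the two paths, using the values from the example above (directory names are illustrative, not taken from Lightning internals):

```python
import os

# Expected: rooted at the loggers' save_dir, using their shared name/version
expected = os.path.join("logs", "default", "0", "checkpoints")

# Observed with two loggers: names and versions are concatenated and the path
# is rooted at default_root_dir (os.getcwd()) instead of the loggers' save_dir
observed = os.path.join(os.getcwd(), "default_default", "0_0", "checkpoints")

print(expected)  # logs/default/0/checkpoints
print(observed)  # <cwd>/default_default/0_0/checkpoints
```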
Environment
Additional context
cc @awaelchli @edward-io @Borda @ananthsub @rohitgr7 @kamil-kaczmarek @Raalsky @Blaizzy @carmocca @ninginthecloud @jjenniferdai