ModelCheckpoint need monitor average loss #20652

Open
Johnson-yue opened this issue Mar 18, 2025 · 0 comments
Labels
feature (Is an improvement or enhancement), needs triage (Waiting to be triaged by maintainers)

Comments

Johnson-yue commented Mar 18, 2025

Description & Motivation

I know ModelCheckpoint can monitor a metric such as "train_loss" or "val_loss" and save the model when that value reaches a new minimum, checking it every "every_n_train_steps" steps. However, I want to accumulate "train_loss" (or "val_loss") over each interval between consecutive "every_n_train_steps" checks, and save the model when that interval's average loss is lower than the previous one.
Example:
every_n_train_steps = 50, monitor = "train_loss"

From step 50 to step 100, if the average "train_loss" is lower than the average over steps [0, 50], then save the model.
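A minimal, framework-agnostic sketch of the requested behavior, assuming the per-step losses are available as plain Python floats. WindowedLossTracker is a hypothetical helper, not part of Lightning: it averages the loss over each window of every_n_steps steps and reports when a window's average is the lowest seen so far, analogous to ModelCheckpoint with mode="min". One assumption to flag: it compares against the best earlier window, whereas the example above compares against the immediately preceding window; the two agree at step 100.

```python
import math


class WindowedLossTracker:
    """Hypothetical helper (not a Lightning API): averages a loss over fixed
    windows of steps and reports when a window's mean is the lowest so far."""

    def __init__(self, every_n_steps: int) -> None:
        if every_n_steps <= 0:
            raise ValueError("every_n_steps must be positive")
        self.every_n_steps = every_n_steps
        self._sum = 0.0
        self._count = 0
        self.best = math.inf      # lowest window average seen so far
        self.last_average = None  # average of the most recently closed window

    def update(self, loss: float) -> bool:
        """Record one step's loss. Returns True only when this step closes a
        window whose average beats every earlier window (i.e. "save now")."""
        self._sum += loss
        self._count += 1
        if self._count < self.every_n_steps:
            return False
        # Window complete: compute its mean, then reset the accumulator.
        self.last_average = self._sum / self._count
        self._sum = 0.0
        self._count = 0
        if self.last_average < self.best:
            self.best = self.last_average
            return True
        return False


# Demo with every_n_steps = 50: three windows averaging 1.0, 0.5 and ~0.8.
tracker = WindowedLossTracker(every_n_steps=50)
losses = [1.0] * 50 + [0.5] * 50 + [0.8] * 50
for step, loss in enumerate(losses, start=1):
    if tracker.update(loss):
        print(f"step {step}: new best window average {tracker.best:.2f}")
# prints:
# step 50: new best window average 1.00
# step 100: new best window average 0.50
```

In Lightning, a custom Callback could presumably call update() from its on_train_batch_end hook with the step's loss and, when it returns True, call trainer.save_checkpoint(path); how the loss is extracted from the hook's outputs argument depends on what training_step returns.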

Pitch

No response

Alternatives

No response

Additional context

No response

cc @lantiga @Borda

Johnson-yue added the feature (Is an improvement or enhancement) and needs triage (Waiting to be triaged by maintainers) labels Mar 18, 2025