Follow-up changes to #10575 #10957
Conversation
```diff
@@ -144,12 +144,10 @@ def closure(self, *args: Any, **kwargs: Any) -> ClosureResult:
             )

         if self._zero_grad_fn is not None:
-            with self._profiler.profile("zero_grad"):
```
If we get rid of `profile("training_step_and_backward")`, then we could remove the profiler reference from this class. Just pointing it out as a possible follow-up.

They are removed from here because each function will already profile them.
I think we could remove it. PyTorch Profiler would already track those operations at the PyTorch level.
LGTM!
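To illustrate the point about operator-level tracking, here is a small self-contained sketch (illustrative only, not Lightning code; the tiny `Linear` model and `SGD` optimizer are arbitrary) showing that the PyTorch profiler records the gradient-clearing ops from `optimizer.zero_grad()` even without an explicit `profile("zero_grad")` wrapper:

```python
# Illustrative only, not Lightning code: torch.profiler already records the
# in-place zeroing ops issued by `optimizer.zero_grad()` without an explicit
# record_function("zero_grad") context manager.
import torch
from torch.profiler import ProfilerActivity, profile

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# produce some gradients first
model(torch.randn(8, 4)).sum().backward()

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # set_to_none=False so the in-place zeroing ops show up in the trace
    optimizer.zero_grad(set_to_none=False)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```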
```diff
@@ -164,8 +164,7 @@ def optimizer_step(

     def optimizer_zero_grad(self, current_epoch: int, batch_idx: int, optimizer: Optimizer, opt_idx: int) -> None:
         """Zeros all model parameter's gradients."""
-        model_ref = self.lightning_module
-        model_ref.optimizer_zero_grad(current_epoch, batch_idx, optimizer, opt_idx)
+        self.lightning_module.optimizer_zero_grad(current_epoch, batch_idx, optimizer, opt_idx)
```
Why not `_call_LM_hook`?
Because it's already called with `_call_ttp_hook`.

No TTP subclasses this hook, so technically the optimizer loop could call `_call_LM_hook` directly and this could be removed.
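For context on the two call paths being discussed, here is a minimal standalone sketch; `ToyTrainer`, `ToyAccelerator`, and `BoringModule` are toy stand-ins invented for illustration, not the real Lightning classes, and `_call_lightning_module_hook` below only mimics the kind of method `_call_LM_hook` is shorthand for:

```python
# Toy stand-ins, not the real Lightning classes: a minimal sketch of the two
# call paths discussed above for `optimizer_zero_grad`.
from torch import nn
from torch.optim import SGD, Optimizer


class BoringModule(nn.Module):
    """Plays the role of the LightningModule that owns the hook."""

    def __init__(self) -> None:
        super().__init__()
        self.layer = nn.Linear(2, 2)

    def optimizer_zero_grad(self, current_epoch: int, batch_idx: int, optimizer: Optimizer, opt_idx: int) -> None:
        optimizer.zero_grad()


class ToyAccelerator:
    """Current indirection: the accelerator forwards to the module hook."""

    def __init__(self, lightning_module: BoringModule) -> None:
        self.lightning_module = lightning_module

    def optimizer_zero_grad(self, current_epoch: int, batch_idx: int, optimizer: Optimizer, opt_idx: int) -> None:
        self.lightning_module.optimizer_zero_grad(current_epoch, batch_idx, optimizer, opt_idx)


class ToyTrainer:
    def __init__(self, module: BoringModule) -> None:
        self.lightning_module = module
        self.accelerator = ToyAccelerator(module)

    def _call_lightning_module_hook(self, hook_name: str, *args, **kwargs):
        # dispatch a hook directly on the module, skipping the accelerator
        return getattr(self.lightning_module, hook_name)(*args, **kwargs)


module = BoringModule()
trainer = ToyTrainer(module)
opt = SGD(module.parameters(), lr=0.1)

# path kept by the diff above: optimizer loop -> accelerator -> module hook
trainer.accelerator.optimizer_zero_grad(0, 0, opt, 0)

# suggested simplification: since no plugin overrides the hook, the optimizer
# loop could call the module hook directly through the trainer
trainer._call_lightning_module_hook("optimizer_zero_grad", 0, 0, opt, 0)
```

Both calls end up in the same place; the difference is only whether the accelerator/TTP layer sits in between.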
```diff
@@ -1479,16 +1479,15 @@ def _call_callback_hooks(
         hook_name: str,
         *args: Any,
         **kwargs: Any,
-    ) -> Optional[Any]:
-        output = None
+    ) -> None:
```
Why are we returning None for callback hooks? I thought part of the motivation for #8506 was to allow returning something from callback hooks, i.e. item 4) in its description.
It was described as a limitation, but it's not something we want to support as part of this refactor. However, the refactor makes it easy to add later if we decide to.
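For reference, a rough sketch (not part of this PR) of how callback-hook outputs could be collected if support were added later; the standalone `_call_callback_hooks` function and `PrintingCallback` below are hypothetical, not the Trainer method:

```python
# Hypothetical standalone sketch, not the Trainer method: collects return
# values from a callback hook if returning values were ever supported.
from typing import Any, List, Optional


def _call_callback_hooks(callbacks: List[Any], hook_name: str, *args: Any, **kwargs: Any) -> Optional[List[Any]]:
    outputs = []
    for callback in callbacks:
        fn = getattr(callback, hook_name, None)
        if callable(fn):
            output = fn(*args, **kwargs)
            if output is not None:
                outputs.append(output)
    return outputs or None


class PrintingCallback:
    def on_train_start(self) -> str:
        return "started"


print(_call_callback_hooks([PrintingCallback()], "on_train_start"))  # ['started']
```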
What does this PR do?
Follow-up to #10575
Does your PR introduce any breaking changes? If yes, please list them.
None
cc @carmocca @awaelchli @Borda @ninginthecloud