LoggerConnector Refactor #7183
Comments
Thanks for starting this!
Any more detail around what would be simplified?
@tchaton I think the main tension is that historically the Results object (and then
If we don't address how to split these responsibilities up inside of the Results object/LightningModule, I think the simplification on the trainer side will be limited. Here's a writeup @maximsch2 @SkafteNicki and @Borda have https://docs.google.com/document/d/16HwB8QGg3khnJWmpt4UOZlTi1kG8X9EmxS5aHyC8sYo/edit
@carmocca @tchaton @awaelchli what do you think of adding properties to the lightning module for train/val/test/predict(?) metrics? I think elevating metrics to a top-level API inside the lightning module would bring a lot of benefit. Some of the pros:
Answering a few of the points...
Plan is for it to stay the same.
I don't like giving metrics a different treatment to tensors/numbers
We should flush any existing logging on running stage change.
self.log() already treats these differently because of differences for
I meant Metric state contained inside the If the metric is logged with self.log("name", metric.compute()) - we won't do the flush for them.
🚀 Feature
Motivation
Pitch
The LoggerConnector logic is opaque and hard to follow.
The EpochResultStore and HookResult add an extra layer of complexity, and the tests are possibly too sparse to catch incorrect behaviour.
One of the reasons for the complexity is the non-uniformity of the stored logged data.
Description of internal functionalities:
2. A new Result object is created when running a new hook.
Result objects are enhanced dictionaries that map each key to its value, together with the extra logged metadata and the inferred batch_size.
Result objects are stored differently for TRAIN and for TEST/VALIDATION, which makes the code complex and hard to follow.
A logged value can be either a Metric or a float/tensor, which requires extra internal checks to reduce it properly at epoch end.
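The branching described above can be illustrated with a minimal sketch. It assumes plain Python floats in place of tensors and a toy Metric-like class; `ResultSketch`, `MeanMetricSketch` and `reduce_on_epoch_end` are hypothetical names for illustration and are not Lightning's actual internals.

```python
class MeanMetricSketch:
    """Hypothetical Metric-like object: it owns its own accumulated state."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.total += value * n
        self.count += n

    def compute(self):
        return self.total / self.count


class ResultSketch(dict):
    """Hypothetical stand-in for a Result object: a dict with logged metadata."""

    def log(self, name, value, batch_size=1, **meta):
        entry = self.setdefault(
            name, {"values": [], "batch_sizes": [], "meta": meta}
        )
        entry["values"].append(value)
        entry["batch_sizes"].append(batch_size)


def reduce_on_epoch_end(result):
    """Reduce every logged key; note the type check needed per value."""
    reduced = {}
    for name, entry in result.items():
        last = entry["values"][-1]
        if hasattr(last, "compute"):
            # Metric-like value: delegate the reduction to the metric itself.
            reduced[name] = last.compute()
        else:
            # Plain number: batch-size-weighted mean over the epoch.
            weighted = sum(
                v * b for v, b in zip(entry["values"], entry["batch_sizes"])
            )
            reduced[name] = weighted / sum(entry["batch_sizes"])
    return reduced


result = ResultSketch()
result.log("loss", 1.0, batch_size=2)
result.log("loss", 4.0, batch_size=1)

acc = MeanMetricSketch()
acc.update(0.5, n=2)
acc.update(1.0, n=2)
result.log("acc", acc, batch_size=2)

epoch_values = reduce_on_epoch_end(result)
```

Every consumer of the stored values has to repeat the same `hasattr` style check, which is the kind of non-uniformity the issue points at.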
Proposition: make logged values uniform to simplify storing and reducing them.
TODOs:
Here is the pseudo code for the LoggedMetric. It will wrap both Metric and tensors and greatly simplify the way information is stored internally. It would also make fault-tolerant training simpler, as the state could be reduced and stored/reloaded as 'weighted_mean, sum(batch_sizes)'.
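A minimal sketch of what such a wrapper could look like, assuming plain Python floats in place of tensors and any object with a `compute()` method in place of a Metric. The class name `LoggedMetricSketch`, its methods, and the state keys are hypothetical and are not taken from Lightning.

```python
class LoggedMetricSketch:
    """Hypothetical uniform wrapper around a logged number or a Metric-like object."""

    def __init__(self, name):
        self.name = name
        self.weighted_sum = 0.0      # sum of value * batch_size
        self.total_batch_size = 0    # sum(batch_sizes)
        self.metric = None           # set when a Metric-like object is logged

    def update(self, value, batch_size=1):
        if hasattr(value, "compute"):
            # Metric-like objects keep their own state; only hold a reference.
            self.metric = value
        else:
            self.weighted_sum += float(value) * batch_size
            self.total_batch_size += batch_size

    def compute(self):
        # One entry point for reduction, whatever was logged.
        if self.metric is not None:
            return self.metric.compute()
        return self.weighted_sum / self.total_batch_size

    def state_dict(self):
        # Reduced state for plain numbers: (weighted_mean, sum(batch_sizes)).
        mean = (
            self.weighted_sum / self.total_batch_size
            if self.total_batch_size
            else 0.0
        )
        return {"weighted_mean": mean, "total_batch_size": self.total_batch_size}

    def load_state_dict(self, state):
        self.total_batch_size = state["total_batch_size"]
        self.weighted_sum = state["weighted_mean"] * self.total_batch_size


loss = LoggedMetricSketch("loss")
loss.update(1.0, batch_size=2)
loss.update(4.0, batch_size=1)
checkpoint = loss.state_dict()

# Resume from the reduced state and keep accumulating.
resumed = LoggedMetricSketch("loss")
resumed.load_state_dict(checkpoint)
resumed.update(2.0, batch_size=3)
```

In this sketch only the plain-number path is checkpointed; a wrapped Metric would need to save and restore its own state separately.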