[feature request] Can we decouple metrics publishing from LightningLoggerBase? #11209
Comments
@wilson100hong to better understand your request: It seems like the publisher is the [...]

Do you feel like you have to work around the existing logger APIs? If so, is that because the logger API carries bloat, which leads to a lack of clarity as a developer? If so, do you think we could improve this instead by streamlining the existing logger API? Or do you think we truly need a separate publisher interface?

For a "publisher" like TensorBoard or MLFlow, does this mean there'd need to be both publisher and logger implementations to use them? Where do you draw the lines between them? Do you have an example of how you'd configure this as an end user?

Some areas we could streamline:
This exists to check if there are multiple (key, value) pairs logged for the same step index. Looking back at the issue which drove this addition (#1173), I don't think the motivation for having a dedicated [...]

@carmocca is my understanding correct?

I personally have never used or even seen others use the [...] Nor have I seen users update them via the setter: https://github.com/PyTorchLightning/pytorch-lightning/blob/eb5b350f9a6bd27a66dfebcb00b3acb33b7bbb89/pytorch_lightning/loggers/base.py#L83-L102

Moreover, none of the logger implementations even pass through those args to the [...]

Would deprecating this provide clarity around when data is actually sent to the publishing clients inside of the logger?
Such a collection is somewhat handy for writing data, but is much less helpful when reading data, especially when individual loggers have different properties. It's not clear at all what [...]

Importantly, we offer no such LoggerCollection-equivalent for Callbacks. We treat them as a flat list. Why are loggers any different?

A simplification here would be to formally support a sequence of loggers in the trainer, drop the assumption of a single logger being used (from the trainer's POV), and simply iterate over a sequence of loggers for the write calls.
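To make the flat-list idea concrete, here is a minimal sketch of what "iterate over a sequence of loggers for the write calls" could look like. The `Logger` protocol, `ListLogger`, and `log_to_all` are hypothetical names used only for illustration; they are not part of Lightning's API, and the real trainer integration would differ.

```python
from typing import Dict, List, Optional, Protocol, Sequence, Tuple


class Logger(Protocol):
    """Hypothetical minimal write interface shared by all loggers."""

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        ...


class ListLogger:
    """Illustrative logger that just records what it receives."""

    def __init__(self) -> None:
        self.records: List[Tuple[Optional[int], Dict[str, float]]] = []

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        # Copy the dict so later mutation by the caller does not change the record.
        self.records.append((step, dict(metrics)))


def log_to_all(
    loggers: Sequence[Logger],
    metrics: Dict[str, float],
    step: Optional[int] = None,
) -> None:
    """Fan a write call out to every logger, with no wrapping collection object."""
    for logger in loggers:
        logger.log_metrics(metrics, step)


if __name__ == "__main__":
    first, second = ListLogger(), ListLogger()
    log_to_all([first, second], {"loss": 0.5}, step=3)
    print(first.records, second.records)
```

Reads stay explicit in this arrangement: a caller picks the specific logger it wants from the list instead of asking a collection for a merged property.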
Some areas we have streamlined:
As you mentioned throttling, would the same principle apply here? Throttling could also sit at the logger implementation.
Thanks for writing this up! A few questions:
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
🚀 Problem
In our production environment for Lightning, we log metrics using LightningLoggerBase loggers and publish them to multiple destinations. For example, we publish QPS to TensorBoard, databases, dashboards and monitors. Each publishing path shares the same metrics and aggregation logic, but can have different flush frequency settings to avoid DDoSing its backend.

What makes things more complex is that we have multiple Loggers recording metrics from different sources (model, framework, etc.), and some Loggers share some, but not all, of their publishing destinations. For example, a model metrics Logger and a system metrics Logger both publish to TensorBoard, but the model metrics Logger does not publish to the system monitor as the system metrics Logger does.
Motivation
The motivation comes from our use case: we need to publish the same metrics to multiple data sinks, including TensorBoard, dashboards and monitoring systems. Since those sinks share the same metrics data and the same aggregation logic, we want to use a single logger instead of LoggerCollection to simplify the code. Decoupling MetricsPublisher also lets us build arbitrary combinations of metrics publishing by reusing existing implementations.
Pitch
This decoupling isolates logging from publishing and simplifies LightningLoggerBase. It increases code reuse, since similar publishing logic and throttling control no longer need to be re-implemented across Loggers.
It does not introduce compatibility issues since it does not change existing APIs; users can still use existing logger implementations that couple logging with publishing.
Proposal
MetricsPublisher. Users implement a subclass to publish metrics to a specific data sink. MetricsPublisher also provides configurable publishing frequency control to throttle flushing and so avoid DDoSing the backend.

MetricsLogger, a subclass of LightningLoggerBase. MetricsLogger is a pure Logger that can have MetricsPublishers attached. When save() or close() is called, it iterates over all attached MetricsPublishers and invokes each one's publish().
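The original code snippets for this proposal are not reproduced here, so the following is only a rough sketch of how the two pieces might fit together. Everything in it is an assumption made for illustration: the `maybe_publish` helper, the `min_interval_s` and `clock` parameters, the `InMemoryPublisher` class, and the choice that `close()` bypasses throttling. To stay self-contained, `MetricsLogger` is written as a plain class rather than an actual LightningLoggerBase subclass.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class MetricsPublisher(ABC):
    """Hypothetical publisher: sends metrics to one data sink, with throttling."""

    def __init__(
        self,
        min_interval_s: float = 0.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_s = min_interval_s  # assumed throttling knob
        self._clock = clock
        self._last_publish: Optional[float] = None

    def maybe_publish(self, metrics: Dict[str, float], step: int, force: bool = False) -> bool:
        """Call publish() unless the previous publish was too recent."""
        now = self._clock()
        due = self._last_publish is None or now - self._last_publish >= self._min_interval_s
        if not (force or due):
            return False
        self.publish(metrics, step)
        self._last_publish = now
        return True

    @abstractmethod
    def publish(self, metrics: Dict[str, float], step: int) -> None:
        """Write metrics to the sink; implemented per destination."""


class MetricsLogger:
    """Stand-in for the proposed LightningLoggerBase subclass (kept dependency-free)."""

    def __init__(self, publishers: Sequence[MetricsPublisher]) -> None:
        self._publishers = list(publishers)
        self._latest: Dict[str, float] = {}
        self._step = 0

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        # Pure logging: only record values, never talk to a sink here.
        self._latest.update(metrics)
        if step is not None:
            self._step = step

    def save(self) -> None:
        # Each publisher decides for itself whether it is due.
        for publisher in self._publishers:
            publisher.maybe_publish(dict(self._latest), self._step)

    def close(self) -> None:
        # Assumption: a final flush ignores throttling so no data is lost.
        for publisher in self._publishers:
            publisher.maybe_publish(dict(self._latest), self._step, force=True)


class InMemoryPublisher(MetricsPublisher):
    """Illustrative sink that keeps published metrics in a list."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self.published: List[Tuple[int, Dict[str, float]]] = []

    def publish(self, metrics: Dict[str, float], step: int) -> None:
        self.published.append((step, metrics))
```

With this shape, a model metrics logger and a system metrics logger could each be given their own list of publishers, so destinations can overlap without being identical, and each destination keeps its own flush frequency.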
Alternatives
N/A. So far I haven't seen an alternative for such decoupling, but if there is one, I'm happy to discuss.
Additional context
N/A
cc @Borda @tchaton @justusschock @awaelchli @edward-io @ananthsub @rohitgr7 @kamil-kaczmarek @Raalsky @Blaizzy