
Refactor trainer._log_device_info() method and warnings #11014


Open
four4fish opened this issue Dec 9, 2021 · 4 comments
@four4fish (Contributor) commented Dec 9, 2021

Proposed refactor

Raised in discussion by @ananthsub and @justusschock in #11001 (1/n Generalize internal checks for Accelerator in Trainer - remove trainer._device_type).

Motivation

_log_device_info() in the Trainer is too verbose and its messages are not helpful:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L1630-L1661

Accelerator/device selection only happens in the accelerator_connector, so the related warnings and logging should happen in the accelerator_connector as well.

The warnings and related logic can be merged into select_accelerator_type().

Pitch

Simplify the warnings in trainer._log_device_info() to make it less verbose: remove unnecessary warnings and reduce the log level from warning to debug.

Move _log_device_info() to the accelerator_connector and call it at the end of __init__(), or merge the logic into the accelerator_connector entirely (see the sketch below).
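
To make the pitch concrete, here is a minimal sketch of the second option, assuming the connector ends up owning the logging. The class shape, attribute names, and the single debug message are illustrative assumptions, not the current pytorch_lightning internals:

```python
import logging

log = logging.getLogger(__name__)


class AcceleratorConnector:
    """Illustrative stand-in for the real accelerator_connector, not its actual API."""

    def __init__(self, accelerator=None, devices=None):
        # ... the existing accelerator/device selection logic would run here ...
        self.accelerator = accelerator
        self.devices = devices
        # Proposed: emit device info once, at the end of __init__,
        # instead of from Trainer._log_device_info().
        self._log_device_info()

    def _log_device_info(self) -> None:
        # Proposed: a single debug-level message instead of several warnings.
        log.debug("Selected accelerator: %s, devices: %s", self.accelerator, self.devices)
```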

Additional context



cc @justusschock @awaelchli @akihironitta @carmocca @edward-io @ananthsub @kaushikb11 @ninginthecloud

@kaushikb11 (Contributor)

related warning and logging should happen in accelerator_connector as well.

@four4fish It was initially part of the Accelerator Connector, but we came across a use case where _log_device_info needed to be overridden with custom logic. Hence, we introduced it as a method on the Trainer for this purpose.

@four4fish (Contributor Author)

@kaushikb11 thank you for sharing the background! Could you share more detail about the use cases? Do you mean users want to override _log_device_info()? There was no linked issue, and I couldn't see any details in the PR.
Is this still required?

@four4fish added this to the 1.6 milestone Dec 9, 2021
@four4fish self-assigned this Dec 9, 2021
@kaushikb11 (Contributor)

Sure! There are frameworks building on top of the Lightning Trainer. For instance, one introduced a new IPEX accelerator and added modifications for it; _log_device_info helps them customize the device-info logging as well.
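
For reference, a minimal sketch of that override pattern, assuming _log_device_info stays a Trainer method; the IPEXTrainer subclass and its log message are hypothetical, not an actual IPEX integration:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.utilities import rank_zero_info


class IPEXTrainer(Trainer):
    """Hypothetical downstream Trainer shipping its own (IPEX) accelerator."""

    def _log_device_info(self) -> None:
        # Replace (or extend) the default GPU/TPU/IPU messages with
        # framework-specific device information.
        rank_zero_info("Using a custom IPEX accelerator for this run")
```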

@four4fish (Contributor Author)

Interesting use case!
But it seems the user extends the Trainer class with their own Accelerator() and Plugins selection logic, and the accelerator, plugins, and other args are then passed into super().__init__(), which calls our accelerator_connector(). In that case, couldn't the device logging live in either place? Am I missing something?
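
A sketch of the extension pattern being described, to make the question concrete; CustomTrainer and the hard-coded arguments are stand-ins for whatever the downstream framework actually selects:

```python
from pytorch_lightning import Trainer


class CustomTrainer(Trainer):
    def __init__(self, **kwargs):
        # The subclass runs its own accelerator/plugins selection logic, then
        # forwards the result to the base Trainer.  super().__init__() builds
        # the accelerator_connector, so device logging could arguably live
        # there just as well as in an overridden Trainer method.
        kwargs.setdefault("accelerator", "cpu")  # stand-in for a custom accelerator
        kwargs.setdefault("plugins", None)       # stand-in for custom plugins
        super().__init__(**kwargs)
```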

@carmocca removed the `logging` label (Related to the `LoggerConnector` and `log()`) Feb 16, 2022
@carmocca modified the milestones: 1.6, future Feb 16, 2022