Reduce the import time of pytorch_lightning #12786
Comments
Thanks for the thoughtful analysis! We should definitely do this.
@carmocca Can I work on this? I am more than happy to contribute.
Sure @Atharva-Phatak, you can do one PR per import. We can also do #11826 jointly.
Jointly doing the lazy functions and the move to the files makes sense. |
@awaelchli This makes sense. I will start working on this :)
```python
@lru_cache()
def _is_x_available() -> bool:
    return _module_available("x")
```

@awaelchli Can we redefine the above function as follows? I was confused, since the above function looks like we would need to implement it for every module "x".

```python
@lru_cache()
def _is_x_available(x: str) -> bool:
    """x: name of the library"""
    return _module_available(x)
```

Please let me know.
Why not directly?

```python
@lru_cache()
def _module_available(x: str) -> bool:
    # this is the original function
    ...
```
@Borda Can you clarify a bit? :) Sorry for such a beginner question.
@Atharva-Phatak To clarify, what @Borda is saying is that the cache could be applied to the original `_module_available` function directly, instead of defining a wrapper per module. This detail and the naming aside, the important point of this issue is that we want to evaluate module availability lazily.
Oh I see, so basically I just have to decorate the original `_module_available` function using `lru_cache`.
Yes. Would it help if I made an example PR for a single import, and then you could work on the remaining ones? Appreciate the help.
@awaelchli Yes, that would be very helpful if I can have a sample PR. Then I will do it for all the modules :) Thanks for the help, and thanks for being patient with me :)
@Atharva-Phatak You can use the new
Hi, thank you! I will take a look :)
@Atharva-Phatak How's it going? If you want, I can help you with that ;)
@plutasnyy It's going well. I am just a little busy with my final exams :)
It would be interesting to repeat this analysis now that Lightning 2.0 is out.
@awaelchli, do you have a script for it to share? 🐰
@Borda The details of how to run the benchmark are in the issue description. Here are the latest results (screenshots of the Minimal and Full profiles). If they are hard to see, here are the raw files: import_minimal.log. You can download them and inspect them by zooming in in the browser.
In summary, for the bare version we are at roughly a 1 sec import time. The dominating part is the lightning_app side, where the mps import and fastapi contribute the most. For the full version with all optional dependencies installed, we are at a ~2 sec import time, and the wandb and mlflow imports stand out. There might be an opportunity to optimize there. Overall, Lightning 2.0 has already improved the situation greatly, so nice progress.
Thank you! This is super valuable to know which items we should tackle first.
I came here to log an issue with almost the exact same screenshot from tuna. It's excellent to see a bunch of smart people are already on the case. Good stuff! I'm almost certain you will know this already, but in case you didn't, you can lazy load expensive values with the module-level `__getattr__` (PEP 562).
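For reference, a minimal sketch of that pattern (PEP 562), with made-up package and attribute names:

```python
# mypackage/__init__.py
# PEP 562: a module-level __getattr__ is only invoked for attributes not found
# by the normal lookup, so the heavy import is deferred until first access.
from typing import Any


def __getattr__(name: str) -> Any:
    if name == "heavy_value":
        from mypackage._heavy import heavy_value  # slow import happens here

        return heavy_value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```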
I reran the benchmark now that we have dropped the top-level import of the app module in #18386 (screenshots of the Minimal and Full profiles, plus the raw files). In summary, the lightning.app dependency import is gone. We're good on the minimal-install benchmark, since there the torch import is dominating. Some work can be done to reduce the import overhead in the loggers module. Also, TorchMetrics is importing torchvision in some metrics, which could be avoided/delayed I guess. Overall, we are in a much better place now! 🎉
Proposed refactor
The current import time for the pytorch_lightning package on my machine is several seconds. There are some opportunities to improve this.
Motivation
High import times slow down development and debugging.
Benchmark
I benchmarked the import time in two environments: a fresh environment with only the required dependencies installed, and a full development environment with all extras.
To measure the import time, I created a simple file which only imports pytorch_lightning:
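Presumably something like this (the filename is illustrative):

```python
# bench.py -- the entire file; its only purpose is to trigger the import
import pytorch_lightning
```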
Then I used the Python interpreter's `-X importtime` option to measure the time and create a profile. Finally, I used tuna to visualize the profile.
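A sketch of how the measurement can be reproduced, assuming the `bench.py` file above (the helper script is illustrative; `-X importtime` and tuna are the actual tools):

```python
# profile_import.py -- run bench.py under -X importtime and save the profile.
import subprocess
import sys

# -X importtime makes CPython print a per-module import time tree to stderr
result = subprocess.run(
    [sys.executable, "-X", "importtime", "bench.py"],
    capture_output=True,
    text=True,
)
with open("import_profile.log", "w") as f:
    f.write(result.stderr)

# Visualize afterwards with: pip install tuna && tuna import_profile.log
```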
For the fresh environment, the total import time is <2 seconds, with the following profile (tuna screenshot).
For a full development environment, the total import time is >4 seconds (tuna screenshot).
The times vary a bit between runs. However, I have observed that the time is consistently higher in an environment with extras installed. Looking at the profiles, a large chunk of the time originates from our `pytorch_lightning.utilities.imports` module, where we evaluate some constants at import time: https://github.com/PyTorchLightning/pytorch-lightning/blob/ae3226ced96e2bc7e62f298d532aaf2290e6ef34/pytorch_lightning/utilities/imports.py#L98-L124
It looks like if a 3rd-party package is installed and takes a long time to import, this time gets added to our loading time as well, even if the package never ends up being used. This is because our `_module_available` and `_package_available` implementations attempt to import the modules to check their availability. This can be very costly.
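To illustrate the cost, here is a small self-contained sketch of the eager behavior (the `numpy` target and the helper name are just for illustration):

```python
import importlib
import time


def _module_available_eager(name: str) -> bool:
    # Simplified stand-in for the old behavior: actually import the module
    # to find out whether it is available.
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


start = time.perf_counter()
_module_available_eager("numpy")  # pays numpy's full import cost up front
print(f"availability check took {time.perf_counter() - start:.3f}s")
```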
Pitch
Evaluate the import checks lazily.
Convert the eagerly evaluated availability constants into lazily evaluated, cached functions, as sketched below.
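A minimal sketch of the conversion, following the `@lru_cache` pattern discussed in the comments above (the module name "x" is a placeholder):

```python
from functools import lru_cache

from pytorch_lightning.utilities.imports import _module_available

# Before: evaluated eagerly, as soon as pytorch_lightning is imported
_X_AVAILABLE = _module_available("x")


# After: evaluated lazily, on first call; lru_cache memoizes the result
@lru_cache()
def _is_x_available() -> bool:
    return _module_available("x")
```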
And investigate other opportunities to improve loading time given the above profile.
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @justusschock @awaelchli @rohitgr7 @Borda @akihironitta