
Reduce the import time of pytorch_lightning #12786


Closed
awaelchli opened this issue Apr 17, 2022 · 32 comments · Fixed by #13223 or #18573
Labels
good first issue Good for newcomers help wanted Open to be worked on performance refactor

Comments

@awaelchli
Contributor

awaelchli commented Apr 17, 2022

Proposed refactor

The current import time for the pytorch_lightning package on my machine is several seconds. There are some opportunities to improve this.

Motivation

High import times slow down development and debugging.

Benchmark

I benchmarked the import time in two environments:

  1. Fresh environment with pytorch lightning installed, no extras.
  2. My current environment, with many extras installed, such as loggers, horovod, etc.

To measure the import time, I created a simple file which only imports pytorch_lightning:

import pytorch_lightning as pl

Then I use the importtime package to measure the time and create a profile:

python -X importtime simple.py 2> import.log

Finally, I used tuna to visualize the profile:

tuna import.log
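For a quick wall-clock sanity check without the profiler, a one-off timing of the import also works. A minimal sketch (here "json" stands in for pytorch_lightning so the snippet runs anywhere):

```python
import importlib
import time

start = time.perf_counter()
importlib.import_module("json")  # stand-in; replace with "pytorch_lightning"
elapsed = time.perf_counter() - start
print(f"import took {elapsed:.4f} s")
```

Unlike `python -X importtime`, this only measures the total, not the per-module breakdown.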

For the fresh environment, the total import time is <2 secs with the following profile:

[screenshot: tuna import profile, fresh environment]

$ pip freeze | grep torch  # (clean-pl-env)

pytorch-lightning==1.6.1
torch==1.11.0
torchmetrics==0.7.3

For a full development environment, the total import time is >4 seconds:

[screenshot: tuna import profile, full development environment]

The times vary a bit between runs. However, I have observed that the time is consistently higher in an environment with extras installed. Looking at the profiles, a large share of the time is spent in our pytorch_lightning.utilities.imports module, where we evaluate some constants at import time:

https://github.com/PyTorchLightning/pytorch-lightning/blob/ae3226ced96e2bc7e62f298d532aaf2290e6ef34/pytorch_lightning/utilities/imports.py#L98-L124

It appears that if a third-party package is installed and takes a long time to import, that time gets added to our loading time as well, even if the package never ends up being used. This is because our _module_available and _package_available implementations attempt to import the modules to check their availability, which can be very costly.
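For context, the standard library offers a way to test availability without executing the module: importlib.util.find_spec only consults the import machinery's finders. A sketch of what a cheaper check could look like (this is not Lightning's actual implementation):

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # raised for dotted names whose parent package is missing
        return False
```

Note that find_spec still imports parent packages of a dotted name, so it is cheap but not entirely free.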

Pitch

Evaluate the import checks lazily.
Convert

_X_AVAILABLE = _module_available("x")

to

@lru_cache()
def _is_x_available() -> bool:
    return _module_available("x")

And investigate other opportunities to improve loading time given the above profile.

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.

cc @justusschock @awaelchli @rohitgr7 @Borda @akihironitta

@awaelchli awaelchli added needs triage Waiting to be triaged by maintainers performance refactor and removed needs triage Waiting to be triaged by maintainers labels Apr 17, 2022
@awaelchli awaelchli changed the title Improve import time of pytorch_lightning Reduce the import time of pytorch_lightning Apr 17, 2022
@carmocca
Contributor

Thanks for the thoughtful analysis! We should definitely do this.

@carmocca carmocca added this to the 1.7 milestone Apr 18, 2022
@Atharva-Phatak
Contributor

@carmocca Can I work on this? I am more than happy to contribute.

@carmocca
Contributor

Sure @Atharva-Phatak, you can do one PR per import.

We can also do #11826 jointly.

@Atharva-Phatak
Contributor

@carmocca Quick question: in #11826 you mentioned some libraries which need to be moved. So in my PR I should only use the mentioned libraries, right?

Also, can't I do all the libraries in one PR? Or would it be better to do one PR per import?

Please let me know

@awaelchli
Contributor Author

Jointly doing the lazy functions and the move to the files makes sense.
Doing one PR per import is maybe a bit overkill; I would group them by locality.
For example: the two DeepSpeed imports in one PR, all logger-related imports in one PR, all fairscale imports in one PR, and so on.
Thanks for the help.

@Atharva-Phatak
Contributor

@awaelchli This makes sense. I will start working on this :)

@Atharva-Phatak
Contributor

Atharva-Phatak commented Apr 21, 2022

@lru_cache()
def _is_x_available() -> bool:
    return _module_available("x")

@awaelchli Can we redefine the above function as follows? I was confused, since the above function looks like we would need to implement it for every module "x":

@lru_cache()
def _is_x_available(x: str) -> bool:
    """x: name of the library."""
    return _module_available(x)

Please let me know.

@awaelchli
Contributor Author

I don't have a strong preference here. @carmocca @Borda ?

Having the module as part of the function name makes it less error-prone to use and probably easier to refactor the current imports, as it can be renamed everywhere with a simple regex pattern.

@Borda
Member

Borda commented Apr 22, 2022

why not apply it directly?

@lru_cache()
def _module_available(x: str) -> bool:
    # this is the original function

@Atharva-Phatak
Contributor

@Borda Can you clarify a bit? :) Sorry for such a beginner question.

@awaelchli
Contributor Author

@Atharva-Phatak To clarify: what @Borda is saying is that the @lru_cache() decorator can be added to the main function that does the import check, and doesn't have to be added to each individual one as I showed in my post above.

This detail and the naming aside, the important point of this issue is that we want to evaluate module availability lazily.

@Atharva-Phatak
Contributor

Oh I see, so basically I just have to decorate the original _module_available function with @lru_cache and create a PR.
Anything else I should look out for, @awaelchli?

@awaelchli
Contributor Author

awaelchli commented Apr 24, 2022

Yes, the lru_cache can be added there. But this is not really the main topic of this issue. The real work is in converting the _X_AVAILABLE constants to functions and evaluating them lazily, i.e., at the individual module level, only when they are needed!

Would it help if I made an example PR for a single import, and then you could work on the remaining ones? Appreciate the help.

@Atharva-Phatak
Contributor

@awaelchli Yes, a sample PR would be very helpful. Then I will do it for all the modules :)

Thanks for the help and thanks for being patient with me :)

@carmocca
Contributor

@Atharva-Phatak You can use the new _RequirementAvailable class, as shown in #13035 for the CLI.

@carmocca carmocca added the good first issue Good for newcomers label May 12, 2022
@Atharva-Phatak
Contributor

Hi, thank you! I will take a look :)

@plutasnyy

@Atharva-Phatak How's it going? If you want, I can help you with that ;)

@Atharva-Phatak
Contributor

@plutasnyy It's going well. I'm just a little busy with my final exams :)

@carmocca
Contributor

It would be interesting to repeat this analysis now that lightning imports all the app, fabric, and pl packages.

@Borda
Member

Borda commented Mar 16, 2023

@awaelchli, do you have a script for it to share? 🐰

@awaelchli
Contributor Author

awaelchli commented Mar 17, 2023

@Borda The details of how to run the benchmark are in the issue description. Here are the latest results.

Minimal
Fresh environment, pip install lightning, nothing else:

[screenshot: import_minimal profile]

Full
Fresh environment, pip install lightning deepspeed wandb neptune mlflow comet-ml tensorboard jsonargparse[signatures]:

[screenshot: import_full profile]

These are just screenshots. If it is hard to see, here are the raw files:

import_minimal.log
import_full.log

You can download them and inspect them in the browser, zooming in like so:

pip install tuna
tuna import_full.log

In summary, for the bare version we are at roughly a 1-second import time. The dominating part is the lightning_app side, where the mps import and fastapi contribute the most. For the full version with all optional dependencies installed, we are at a ~2-second import time, and wandb and mlflow stand out. There might be an opportunity to optimize there. Overall, Lightning 2.0 has already improved the situation greatly, so nice progress.

@carmocca
Contributor

Thank you! This is super valuable for deciding which items we should tackle first.

@davidgilbertson
Contributor

davidgilbertson commented May 5, 2023

I came here to log an issue with almost exactly the same screenshot from tuna. It's excellent to see that a bunch of smart people are already on the case. Good stuff!

I'm almost certain you know this already, but in case you don't: you can lazy-load expensive values with the module-level __getattr__ from PEP 562 (docs)

@awaelchli
Contributor Author

I reran the benchmark now that we have dropped the top-level import of the app module in #18386.

Minimal
Fresh environment, pip install lightning, nothing else:

[screenshot: minimal-install profile]

Full
Fresh environment, pip install lightning deepspeed wandb neptune mlflow comet-ml tensorboard jsonargparse[signatures]:

[screenshot: full-install profile]

The raw files:

import-minimal.log
import.log

In summary, the lightning.app dependency imports are gone. We're in good shape on the minimal-install benchmark, since there the torch import dominates. Some work can be done to reduce the import overhead in the loggers module. Also, TorchMetrics imports torchvision in some metrics, which could probably be avoided or delayed.

Overall, we are in a much better place now! 🎉
