
[RFC] Introduce Plugin Registry #6923


Closed

SeanNaren opened this issue Apr 9, 2021 · 8 comments

@SeanNaren
Contributor

SeanNaren commented Apr 9, 2021

🚀 Feature

Introduce a registry, where plugins and users can register their own plugins to alias when creating the trainer (cc @tchaton who came up with the idea).

from pytorch_lightning.plugins import plugins_registry

print(plugins_registry.available_plugins(with_descriptions=True, with_recommendations=True))

# prints
# ddp_sharded: description ..., recommended when: ...
# deepspeed: description ..., recommended when: ...
# deepspeed_zero_3: description ..., recommended when: ...
# deepspeed_zero_3_offload: description ..., recommended when: ...
# ...

pl.Trainer(plugins='deepspeed_zero_3_offload')

Motivation

When using https://github.com/SeanNaren/minGPT I was testing a bunch of different configurations across DeepSpeed and Sharded training. I was doing it manually, by specifying the plugin and the options available:

    trainer = Trainer.from_argparse_args(
        args,
        log_every_n_steps=1,
        max_epochs=1,
        # plugins=DDPPlugin(find_unused_parameters=True), # keep these for later to run
        # plugins=DDPShardedPlugin(ddp_comm_hook=...), # keep these for later to run
        plugins=DeepSpeedPlugin(stage=2, cpu_offload=False, logging_level=logging.INFO),
        gradient_clip_val=1.0,
        checkpoint_callback=False,
        callbacks=[lr_decay, CUDACallback()],
    )

This is extremely cumbersome, and we're seeing a similar trend appear for other libraries as well, case by case. Having aliases allows users to use the trainer without having to modify the code. This is also important for cloud training-agent solutions that rely on this flexibility from repos.

To support this in code right now, I have to either A) integrate a config tool (Hydra) to instantiate the plugins, or B) add a bunch of branching statements and argparse.

Given that the goal of Lightning is to reduce boilerplate, my proposal moves this boilerplate out of user code and into Lightning via a registry.

Alternatives

Force users to use an instantiating package such as Hydra in their own repo or implement their own branching logic to select a plugin.

Points from the discussion with @carmocca. I don't have answers to all of these, and most are not important now, but they will be once this registry idea has been approved at the high-level API.

How many plugins would we provide?

Leave it up to the plugin to decide. It would be nice to have an interface on the plugin (maybe an inherited classmethod) that defines the aliases, and the argument combinations behind them, to add to the registry; see the sketch below.
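
A minimal sketch of that inherited classmethod; PluginRegistry, register(), and register_plugins() are assumed names for the proposed API, not existing Lightning code:

from typing import Any, Callable, Dict

class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[], Any]] = {}

    def register(self, alias: str, plugin_cls: type, **kwargs: Any) -> None:
        # Store a factory so the plugin is only built when the alias is used.
        self._plugins[alias] = lambda: plugin_cls(**kwargs)

    def get(self, alias: str) -> Any:
        return self._plugins[alias]()

    def available_plugins(self) -> list:
        return sorted(self._plugins)

class DeepSpeedPlugin:  # stand-in for the real plugin class
    def __init__(self, stage: int = 2, cpu_offload: bool = False) -> None:
        self.stage, self.cpu_offload = stage, cpu_offload

    @classmethod
    def register_plugins(cls, registry: PluginRegistry) -> None:
        # Each alias pins this class to one supported argument combination.
        registry.register("deepspeed", cls)
        registry.register("deepspeed_stage_3", cls, stage=3)
        registry.register("deepspeed_stage_3_offload", cls, stage=3, cpu_offload=True)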

Where would the registry be?

I think this question boils down to whether we want a general registry across PL or separate registries. The registry should live in PL and, for a first iteration, should preferably be tied to plugins. If the idea proves its value, I can see further iterations moving it to other arguments.

Should we do this for other trainer arguments?

Based on the above, I'd start with plugins first before moving on to all other arguments, primarily to ensure the idea proves its value. I can imagine a world where the registry provides easier access to certain objects that can be instantiated easily.

How would users register their own plugins?

Using the same registry, preferably with a function or context manager; a sketch follows below.
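
A hedged possibility for the context-manager form, building on the PluginRegistry sketch above (temporary_plugin is an assumed name):

from contextlib import contextmanager

@contextmanager
def temporary_plugin(registry: PluginRegistry, alias: str, plugin_cls: type, **kwargs):
    # Register the alias only for the duration of the with-block.
    registry.register(alias, plugin_cls, **kwargs)
    try:
        yield registry
    finally:
        registry._plugins.pop(alias, None)

# with temporary_plugin(plugins_registry, "my_deepspeed", DeepSpeedPlugin, stage=3):
#     trainer = pl.Trainer(plugins="my_deepspeed")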

Where would we register our plugins?

As is done in many other repos, we can add our pre-built plugins via a simple search across the plugins directory, sketched below.
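
One hedged way that search could work, assuming plugins expose the register_plugins classmethod sketched above:

import importlib
import inspect
import pkgutil

import pytorch_lightning.plugins as plugins_pkg

def register_builtin_plugins(registry: PluginRegistry) -> None:
    # Walk every module under pytorch_lightning.plugins and let each class
    # that defines register_plugins add its own aliases to the registry.
    for mod_info in pkgutil.iter_modules(plugins_pkg.__path__):
        module = importlib.import_module(f"{plugins_pkg.__name__}.{mod_info.name}")
        for _, cls in inspect.getmembers(module, inspect.isclass):
            if hasattr(cls, "register_plugins"):
                cls.register_plugins(registry)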

Will this introduce too much clutter?

To me this is a non-issue, as I prioritise reducing user-code clutter over Lightning-code clutter, and if aliases are handled by the plugins themselves, the responsibility is distributed. Currently, using custom plugins with their various configurations is impractical. If we'd like to continue reducing boilerplate, we'll need some form of registry to make it easier to enable plugins with slightly different arguments.

@SeanNaren SeanNaren added feature Is an improvement or enhancement help wanted Open to be worked on discussion In a discussion stage labels Apr 9, 2021
@SeanNaren SeanNaren self-assigned this Apr 9, 2021
@ethanwharris
Member

This seems very nice 😃 A couple of comments/questions:

  • So, the user can add a plugin to the registry plus a bunch of other plugin-specific configs. Is the use case that you maybe have a bunch of different machines where you like to use different training types (but also precisions, I guess) with their own settings?
  • What is the proposed API for adding to it? It would be neat if, in addition to adding plugins one at a time, you could have an outer-level registry (e.g. in another repo or Python file) and do e.g.:
from ... import plugins_registry as my_registry
plugins_registry.add(my_registry)
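
One possible shape for that add() call, assuming a registry that keeps an alias → factory mapping internally (the clash-handling policy here is an assumption):

from typing import Any, Callable, Dict

class PluginRegistry:
    """Sketch only: alias -> factory mapping, not Lightning's actual class."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[], Any]] = {}

    def add(self, other: "PluginRegistry") -> None:
        # Merge every alias from the external registry, refusing clashes.
        for alias, factory in other._plugins.items():
            if alias in self._plugins:
                raise ValueError(f"Plugin alias '{alias}' is already registered")
            self._plugins[alias] = factory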

@ananthsub
Contributor

ananthsub commented Apr 9, 2021

@ethanwharris agreed, this could work like fsspec's registration mechanism, which is how Lightning supports different cloud storage options
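
For reference, fsspec's mechanism looks like this (register_implementation is fsspec's real API; MyFileSystem is just a placeholder):

import fsspec
from fsspec.spec import AbstractFileSystem

class MyFileSystem(AbstractFileSystem):
    protocol = "myfs"

# Any "myfs://..." URL now resolves to MyFileSystem.
fsspec.register_implementation("myfs", MyFileSystem)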

FWIW I think this could extend to any argument in the trainer that takes either a str or an instance of a class. This could equally apply to profilers or accelerators as well

@SeanNaren
Contributor Author

I've updated the OP with more information, please take a look!

Outer registry is a cool idea and, I think, the natural extension if the plugins registry works well and proves its value. I've primarily selected plugins because, as a user, this is what I need the most. However, if we can build a general enough API, it would be sweet to add this; I'd just try to minimize the code changes required throughout Lightning if possible.

cc @carmocca I've added discussion points above.

@kaushikb11 kaushikb11 self-assigned this Apr 12, 2021
@carmocca
Contributor

Thank you so much @SeanNaren for addressing all the points in the top post.

How does this all fit with #6090? Would we just rename plugins to training_type?

What about argument order? Should both deepspeed_zero_3_offload and deepspeed_offload_zero_3 work?

Before moving forward with the implementation, I'd like us to define exactly which options would be returned by (using your top-post snippet):

plugins_registry.available_plugins()

Also, is there value in with_descriptions=True, with_recommendations=True? Wouldn't that just be duplicating docs?

@SeanNaren
Contributor Author

Thanks @carmocca!

How does this all fit with #6090? Would we just rename plugins to training_type?

Great question, I think that would be preferred. This would then become a TrainingTypeRegistry, but for the most part it would be an internal refactor, plus ensuring references to the original name exist for deprecation.

What about argument order? Should both deepspeed_zero_3_offload and deepspeed_offload_zero_3 work?

It should be up to the plugin to define the finite list of plugin configurations it supports. These should be documented as such within the plugin's own documentation. In this specific example, it would be defined as deepspeed_stage_3_offload, as this is the correct naming based on the upstream library.

Before moving forward with the implementation, I'd like that we exactly define all options that would be returned by (using your top post snippet):

Given that the current plugin list is quite small, I can outline the plugins the registry would support out of the box with PL (including FSDP). Note that this doesn't include AMP, Apex, or the precision plugins, and doesn't reflect how it would look in a few months if other plugins are added. Also note that we include the pre-defined flag combinations: these can now be defined in the registry, which cleans up having to define every plugin alias in a single Enum (as currently done in PL).

plugins_registry.available_plugins()

# ddp_sharded
# ddp_fully_sharded
# ddp_fully_sharded_auto_wrap
# deepspeed
# deepspeed_stage_2_offload
# deepspeed_stage_3
# deepspeed_stage_3_offload

# Vanilla plugins
# ddp
...

Also, is there value in with_descriptions=True, with_recommendations=True? Wouldn't that just be duplicating docs?

I agree. Including descriptions/recommendations isn't a fully matured idea, and for a first-pass iteration it would be unnecessary IMO.

@ananthsub
Contributor

@SeanNaren @kaushikb11 should we also add an entry for DDP to deal with find_unused_parameters? Then we could make it even easier for people to tap into these perf savings: https://pytorch-lightning.readthedocs.io/en/latest/benchmarking/performance.html#when-using-ddp-set-find-unused-parameters-false
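
For example, a hedged registry entry along the lines of the sketches above (register() and the alias name are assumptions, not a shipped API):

from pytorch_lightning.plugins import DDPPlugin

plugins_registry.register(
    "ddp_find_unused_parameters_false",
    DDPPlugin,
    find_unused_parameters=False,
)

# Users could then opt into the perf saving by alias alone:
# pl.Trainer(plugins="ddp_find_unused_parameters_false")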

@kaushikb11
Contributor

@ananthsub Yup, definitely! That makes sense. It would be convenient for users.

@SeanNaren
Contributor Author

This was done in #6982 :)
