[RFC] Support passing pluggable Accelerators to Trainer #10687

Closed
kaushikb11 opened this issue Nov 23, 2021 · 3 comments
Assignees: kaushikb11
Labels: accelerator, discussion (In a discussion stage), feature (Is an improvement or enhancement), priority: 1 (Medium priority task)

Comments

@kaushikb11
Contributor

kaushikb11 commented Nov 23, 2021

🚀 Feature

With the present Lightning Accelerator design, new accelerators cannot be passed to the Trainer unless they are part of the built-in Lightning accelerators. For example, the following is not possible today:

trainer = Trainer(accelerator=NewSOTAAccelerator(), devices=4)
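
At the moment, the accelerator argument only accepts Lightning's built-in accelerator flags (or instances of the built-in accelerator classes), for example:

trainer = Trainer(accelerator="gpu", devices=4)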

There is a lot of innovation happening in the space of ML accelerators, and the list will continue to grow. We should enable this functionality and make it easier for users to experiment with new accelerators using Lightning.

This proposal also aims to clean up the accelerator connector by moving hardware-specific logic into the accelerators themselves.

For example, the HPUAccelerator PR, which is still in development, adds support for Habana's Gaudi accelerator. Based on the above points, its Accelerator interface would look like this:

from typing import Any, Dict, List, Union

import torch

from pytorch_lightning.accelerators import Accelerator


class HPUAccelerator(Accelerator):
    """Accelerator for HPU devices."""

    @property
    def accelerator_type(self) -> str:
        """Accelerator type."""
        return "hpu"

    @staticmethod
    def parse_devices(devices: Union[int, str, List[int]]) -> int:
        """Parse the `devices` flag into the number of HPU devices."""
        # Include the HPU device parsing logic here
        return devices

    @staticmethod
    def auto_device_count() -> int:
        """Get the number of HPU devices when `devices="auto"`."""
        # `habana` stands in for the Habana device API in this sketch
        return habana.device_count()

    @staticmethod
    def get_parallel_devices(devices: int) -> List[torch.device]:
        """Gets parallel devices for the given number of HPU devices."""
        # Logic moved here from the accelerator connector
        return [torch.device("hpu")] * devices

    def get_device_stats(self, device: Union[str, torch.device]) -> Dict[str, Any]:
        """Gets stats for the given HPU device."""
        return {}

After defining HPUAccelerator, the user could pass it to the Trainer even though it is not one of the built-in Lightning accelerators:

trainer = Trainer(accelerator=HPUAccelerator(), devices=4, strategy=HPUPlugin())
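
The HPUPlugin strategy referenced above is not defined in this proposal; a minimal sketch, assuming it can be built on Lightning's existing SingleDevicePlugin (the base class and constructor here are assumptions for illustration, not part of this RFC), might look like:

import torch

from pytorch_lightning.plugins import SingleDevicePlugin


class HPUPlugin(SingleDevicePlugin):
    """Hypothetical single-device training plugin for a Gaudi/HPU device."""

    def __init__(self) -> None:
        # Assumption: SingleDevicePlugin takes the root torch.device to train on.
        super().__init__(torch.device("hpu"))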

cc @Borda @tchaton @rohitgr7 @akihironitta

@kaushikb11 kaushikb11 added feature Is an improvement or enhancement accelerator labels Nov 23, 2021
@kaushikb11 kaushikb11 self-assigned this Nov 23, 2021
@kaushikb11 kaushikb11 added the discussion In a discussion stage label Nov 23, 2021
@SeanNaren
Contributor

I love this! However, it's a double-edged sword; we should not detract from the importance of bringing these accelerators/strategies into core Lightning.

I.e., if we have supporters/maintainers for the Habana accelerator, we should bring it into Lightning.

@kaushikb11
Contributor Author

Yes, agreed! The Habana accelerator was just an example. The aim of the proposal is to add flexibility for users so they are not limited to our built-in options.

At the same time, we should aim to support as many accelerators/strategies as possible in core Lightning for the community.

@kaushikb11
Contributor Author

This is now supported via #12030.
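
For readers landing here later, a minimal usage sketch of the functionality added there (CustomAccelerator is a hypothetical user-defined subclass implementing hooks like the HPUAccelerator example above; the exact hook set required by the merged API may differ):

from pytorch_lightning import Trainer

trainer = Trainer(accelerator=CustomAccelerator(), devices=2)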
