IPU Integration #7735

SeanNaren · 2021-05-27T11:00:57Z

What does this PR do?

Fixes #<issue_number>

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

SeanNaren · 2021-05-27T11:01:33Z

pl_examples/ipu_examples/mnist.py

+        return acc
+
+    def validation_epoch_end(self, outputs) -> None:
+        self.log('val_acc', torch.stack(outputs).mean(), prog_bar=True)


Need to clear up why this is here and not in the step itself: the step functions are jitted and their outputs are collated from all devices, so mean averaging and similar reductions cannot happen within the step functions.
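As a rough illustration of that split, here is a minimal sketch in plain Python (no Lightning or PopTorch involved). The function names mirror the hooks in the diff, but the bodies and the sample data are assumptions made for illustration only: the step does per-batch work, and the reduction is deferred to the epoch-end hook that runs once all outputs have been gathered.

```python
from statistics import mean
from typing import List


def validation_step(predictions: List[int], targets: List[int]) -> float:
    """Per-batch work only: return this batch's accuracy, with no cross-batch reduction."""
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return correct / len(targets)


def validation_epoch_end(outputs: List[float]) -> float:
    """Runs after all step outputs have been collated, so reducing is safe here."""
    return mean(outputs)


# Hypothetical step outputs, as if collated from two devices.
outputs = [
    validation_step([1, 0, 1, 1], [1, 0, 0, 1]),  # 3 of 4 correct
    validation_step([0, 0, 1, 1], [0, 1, 0, 1]),  # 2 of 4 correct
]
val_acc = validation_epoch_end(outputs)
```

In the actual example the epoch-end hook stacks tensors and logs the mean; the sketch only shows where the reduction sits relative to the step.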

codecov · 2021-05-27T11:02:25Z

Codecov Report

Merging #7735 (61d2014) into master (41be61c) will decrease coverage by 6%.
The diff coverage is 48%.

❗ Current head 61d2014 differs from pull request most recent head d76f491. Consider uploading reports for the commit d76f491 to get more accurate results

@@           Coverage Diff           @@
##           master   #7735    +/-   ##
=======================================
- Coverage      93%     87%    -6%     
=======================================
  Files         202     205     +3     
  Lines       13121   13372   +251     
=======================================
- Hits        12154   11623   -531     
- Misses        967    1749   +782

SeanNaren · 2021-05-27T11:02:59Z

pytorch_lightning/plugins/training_type/ipu.py

+    def on_reset_train_dataloader(self, dataloader) -> Union[Iterable, DataLoader]:
+        return self.process_dataloader(dataloader)
+
+    def on_reset_val_dataloader(self, dataloader) -> Union[Iterable, DataLoader]:
+        return self.process_dataloader(dataloader)
+
+    def on_reset_test_dataloader(self, dataloader) -> Union[Iterable, DataLoader]:
+        return self.process_dataloader(dataloader)
+
+    def on_reset_predict_dataloader(self, dataloader) -> Union[Iterable, DataLoader]:
+        return self.process_dataloader(dataloader)


These need to be pulled into the base plugin in a separate PR if we're comfortable with adding these hooks. The reason they have been introduced is that process_dataloader assumes that the dataset size doesn't change. If the size does change, the progress bar size is wrong.

These hooks happen early enough in the code that the progress bar is correct. I also looked into moving process_dataloader, but this is finicky.

Instead of the multiple hooks, the method process_dataloader could simply be the one that goes into the base class.

There was some confusion here; process_dataloader already exists in the TrainingTypePlugin.
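To make the progress-bar concern concrete, here is a small sketch under stated assumptions. `BasePlugin` and `RegroupingPlugin` are hypothetical stand-ins, not Lightning classes; only the method names `process_dataloader` and `on_reset_train_dataloader` come from the diff. It shows how a plugin that changes the number of batches gives a different length before and after the hook runs.

```python
from typing import List

Batch = List[int]


class BasePlugin:
    """Hypothetical stand-in for a base training type plugin."""

    def process_dataloader(self, dataloader: List[Batch]) -> List[Batch]:
        # Default behaviour: leave the dataloader untouched.
        return dataloader

    def on_reset_train_dataloader(self, dataloader: List[Batch]) -> List[Batch]:
        # Early hook: delegate to process_dataloader, as in the diff above.
        return self.process_dataloader(dataloader)


class RegroupingPlugin(BasePlugin):
    """Hypothetical plugin whose processing merges batches, changing their count."""

    def __init__(self, factor: int) -> None:
        self.factor = factor

    def process_dataloader(self, dataloader: List[Batch]) -> List[Batch]:
        # Concatenate every `factor` consecutive batches into one larger batch.
        return [
            sum(dataloader[i:i + self.factor], [])
            for i in range(0, len(dataloader), self.factor)
        ]


raw_loader = [[0, 1], [2, 3], [4, 5], [6, 7]]
plugin = RegroupingPlugin(factor=2)
processed = plugin.on_reset_train_dataloader(raw_loader)

stale_total = len(raw_loader)   # what a progress bar sees if sized before processing
correct_total = len(processed)  # what it sees if sized after the early hook
```

If the progress bar is sized from the raw loader it reports four batches, while only two are actually iterated, which is the mismatch the early hooks avoid.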

SeanNaren · 2021-05-27T11:15:44Z

pytorch_lightning/plugins/precision/ipu_precision.py

+from pytorch_lightning.plugins.precision.precision_plugin import PrecisionPlugin
+
+
+class IPUPrecisionPlugin(PrecisionPlugin):


@awaelchli said that before adding the next boilerplate precision plugin we should refactor to have the PrecisionPlugin inside of the TrainingTypePlugin, but maybe we can allow one more if this doesn't happen in time :P

justusschock · 2021-05-28T08:18:17Z

pytorch_lightning/plugins/training_type/ipu.py

+        precision = self.lightning_module.trainer.accelerator.precision_plugin.precision
+        precision = 16 if self.half else precision
+
+        model = LightningIPUModule(self.lightning_module, precision)


We wrap the model in so many different places (for rerouting forward to the steps in DDP for example). Should we maybe do a wrapping like this all the time? So that we always reroute forward and are more consistent in our internals? (Not in this PR, just a general consideration)

cc @awaelchli @ananthsub
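For discussion, a toy sketch of what "always reroute forward" could look like. This is plain Python, not the `LightningIPUModule` from the diff or any existing Lightning wrapper; `ToyModule` and `StepRerouter` are hypothetical names, and a real wrapper would presumably subclass `torch.nn.Module` and cover more stages.

```python
from typing import List


class ToyModule:
    """Hypothetical module exposing Lightning-style step methods."""

    def __init__(self) -> None:
        self.training = True

    def training_step(self, batch: List[float]) -> float:
        return sum(batch)

    def validation_step(self, batch: List[float]) -> float:
        return max(batch)


class StepRerouter:
    """Hypothetical wrapper: calling it dispatches to the step for the current mode."""

    def __init__(self, module: ToyModule) -> None:
        self.module = module

    def __call__(self, batch: List[float]) -> float:
        # Route the "forward" call to whichever step matches the module's mode.
        if self.module.training:
            return self.module.training_step(batch)
        return self.module.validation_step(batch)


module = ToyModule()
wrapped = StepRerouter(module)

train_out = wrapped([1.0, 2.0, 3.0])  # dispatched to training_step
module.training = False
val_out = wrapped([1.0, 2.0, 3.0])    # dispatched to validation_step
```

Having one wrapper of this shape applied everywhere would mean each plugin only adds its own behaviour on top, rather than reimplementing the dispatch.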

SeanNaren · 2021-06-07T12:39:15Z

EDIT: closed and opened a new PR since a lot has changed :)

SeanNaren added 11 commits February 22, 2021 17:10

Initial changes

f75f445

Merge branch 'master' into wip/acc

a4a60c2

# Conflicts: # pytorch_lightning/trainer/connectors/accelerator_connector.py

Add broken example for now

dc9744b

Fix reference

931bb74

Merge branch 'master' into wip/acc

9b18baf

Fix format

c617f02

Code runs

522a81f

Fixes

0c00360

Merge branch 'master' into wip/acc

30c1370

Clear up files

adbdb2a

Add tests, helpers, fixes

3e733af

Small cleanups

a51f23e

SeanNaren added 11 commits June 1, 2021 12:17

Refactors based on review

be7de87

Swap to special tests

83c8a79

Add special tests

a6018e5

Add source

0e71bbe

Cleanups

6e38bd1

Add logic to attach/detach model from devices

526383f

Fixes for tests

e18039c

Fixes for tests

2e43fee

Move earlier

53d31a0

Cleanups

6241432

Add check for nvcc

d249a13

SeanNaren added 2 commits June 2, 2021 21:17

Add tests, cleanups

d08cf39

Fix errors

7469744

Lightning-AI deleted a comment from pep8speaks Jun 3, 2021

SeanNaren added 6 commits June 3, 2021 14:08

fix

f474c5b

Try condition

e178d5f

Add missing annotation

c704920

Clearer

c54a216

Clearer message

2ea1766

Fix variable

751f0ea

SeanNaren mentioned this pull request Jun 4, 2021

[IPU] Call accelerator hooks regardless if LM hook overridden 1/n #7826

Merged

11 tasks

SeanNaren changed the title ~~[WIP] Acc~~ IPU Integration Jun 4, 2021

SeanNaren added 3 commits June 7, 2021 11:40

Merge branch 'master' into wip/acc

87e4c8a

# Conflicts: # .azure-pipelines/ipu-tests.yml # pytorch_lightning/plugins/training_type/training_type_plugin.py # pytorch_lightning/trainer/data_loading.py # pytorch_lightning/trainer/trainer.py # pytorch_lightning/trainer/training_loop.py

Cleanups

61d2014

Merge branch 'master' into wip/acc

d76f491

# Conflicts: # pytorch_lightning/accelerators/accelerator.py # pytorch_lightning/plugins/training_type/training_type_plugin.py

SeanNaren closed this Jun 7, 2021
