PoC: Accelerator refactor #5743
Merged
Commits: changes from 179 of 314 commits
259c7f7
restoring the result from subprocess
awaelchli dfab52a
fix queue.get() order for results
awaelchli 6742488
add missing "block_backward_sync" context manager
awaelchli 8c89932
add missing "block_backward_sync" context manager
awaelchli 0186a0f
fix sync_batchnorm
awaelchli b2ac1f4
fix supported gpu-ids for tuple
awaelchli 07a41ce
fix clip gradients and inf recursion
awaelchli 63b7eaf
accelerator selection: added cluster_environment plugin
awaelchli f8344c5
fix torchelastic test
awaelchli 34e3c15
fix reduce early stopping decision for DDP
awaelchli 27a4cff
fix tests: callbacks, conversion to lightning optimizer
awaelchli df5ac30
fix lightning optimizer does not pickle
awaelchli dcf917a
fix setting benchmark and deterministic option
awaelchli 272f088
fix slurm amp test
awaelchli 4529476
fix prepare_data test and determine node_rank
awaelchli 5319b0f
fix retrieving last path when testing
awaelchli 3b54cfb
remove obsolete plugin argument
awaelchli 6540b87
fix test: test_trainer_config
awaelchli 6b450e1
fix torchscript tests
awaelchli 4ef539f
fix trainer.model access
awaelchli 1001ccf
move properties
awaelchli 38a1d0f
fix test_transfer_batch_hook
awaelchli 46cf7ef
fix auto_select_gpus
awaelchli 258f50e
fix omegaconf test
awaelchli a5d69b9
fix test that needs to simulate slurm ddp
awaelchli 88a7ed5
add horovod plugin
awaelchli 40daa41
fix test with named arguments
awaelchli 96fc074
clean up whitespace
awaelchli 210831a
fix datamodules test
awaelchli 98b6dd4
remove old accelerators
justusschock dfcbba6
fix naming
justusschock 348a1b0
move old plugins
justusschock 14f2f6e
move to plugins
justusschock 2f779c6
create precision subpackage
justusschock 58536f6
create training_type subpackage
justusschock ee53c90
fix all new import errors
awaelchli 894e604
fix wrong arguments order passed to test
awaelchli 2bdc836
fix LR finder
awaelchli 48b9882
Added sharded training type and amp plugin
38452b6
Move clip grad to precision plugin
173b22c
Added sharded spawn, select accelerators based on distributed_backend…
79803f6
Fix import issue, attempting to fix tests
a7c0d8f
Fix initial test
02df0ad
Reflect hook logic from master, should wrap model after move to device
d0ebcba
Optional state consolidation, since master has optimizers not wrapped
justusschock 319c3e8
change attribute for instance test
justusschock a34cd15
reset optimizers
justusschock c95b06a
legacy
Borda 9ff0c64
imports in accel
Borda 67d4e47
legacy2
Borda 577b00d
trainer imports
Borda aa4858b
fix import errors after rebase
awaelchli f81a44f
move hook to new setup location
awaelchli a285665
provide unwrapping logic
awaelchli bf78d70
fix trainer callback system
awaelchli 34947cf
added ddp2 implementation
awaelchli 49bec53
fix imports .legacy
Borda ba1c986
move plugins
Borda 45dfbb7
restore legacy
Borda 9b7326a
drop test.py from root
Borda 96bc05d
add tpu accelerator and plugins
justusschock c5994e5
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli 9e46624
fixes
awaelchli 22d2ae8
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli 901d392
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli e174b8d
fix lightning optimizer merge
awaelchli 98660de
reset bugreportmodel
awaelchli 4d95b6c
unwrapping
awaelchli b69d013
step routing forward
awaelchli cb6676d
model access
awaelchli a33d27f
unwrap
awaelchli f7486e2
opt
awaelchli 117f16d
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli 3792b72
integrate distrib_type
awaelchli ef85b81
sync changes
awaelchli 9d9a940
sync
awaelchli f017a39
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli a190a56
fixes
awaelchli 73bb607
add forgotten generators
awaelchli c8c74f3
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli ae71997
add missing logic
awaelchli d89847b
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli 0e686c3
update
awaelchli d6a43ea
import
awaelchli ceb8f75
missed imports
awaelchli fbb7c20
import fixes
awaelchli b610999
isort
awaelchli 9b79924
mv f
awaelchli 9afe54d
changelog
awaelchli 3b63e82
Merge branch 'release/1.2-dev' into ref/update-plugins
awaelchli ca8cb68
format
awaelchli 0633745
move helper to parallel plugin
awaelchli a622e0b
d
awaelchli 18c682f
Merge branch 'ref/update-plugins' into accelerator-refactor-sharted-4
awaelchli f275803
add world size
awaelchli 4ae008b
clean up
awaelchli 3b3918b
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli d4c6308
duplicate
awaelchli 7eef4a0
Merge branch 'release/1.2-dev' into accelerator-refactor-sharted-4
awaelchli 9949164
activate ddp_sharded and tpu
awaelchli 6d47357
set nvidia flags
awaelchli a6864ec
remove unused colab var
awaelchli b4b9724
use_tpu <-> on_tpu attrs
awaelchli 81001e3
make some ddp_cpu and clusterplugin tests pass
awaelchli cea000d
Ref/accelerator connector (#5742)
justusschock 933e2a1
plugins
awaelchli ad451d8
manual optimization
justusschock a30a3cf
update optimizer routing
justusschock a05b291
add rank to torchelastic
justusschock 4388e73
fix memory mixed precision
awaelchli be9d029
setstate on trainer for pickling in ddp spawn
awaelchli a90a160
add predict method
awaelchli 767bee0
add back commented accelerator code
awaelchli f771a7f
adapt test for sync_batch_norm to new plugin
awaelchli 1a3b04e
fix deprecated tests
awaelchli a1f4938
fix ddp cpu choice when no num_processes are given
awaelchli 38bc8b7
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli ce6b6de
yapf format
awaelchli 3b7c20b
skip a memory test that cannot pass anymore
awaelchli f538c75
fix pickle error in spawn plugin
awaelchli b44d82e
x
awaelchli 3820e77
avoid
awaelchli 08ae327
x
awaelchli 7d0e094
avoid tons of warnings from importing deprecated modules
awaelchli 1028011
fix cyclic import in docs build
awaelchli 11bd0d6
add support for sharded
justusschock 6bf0b60
update typing
justusschock f94082b
add sharded and sharded_spawn to distributed types
justusschock 7939b99
make unwrap model default
justusschock 9131ffb
refactor LightningShardedDataParallel similar to LightningDistributed…
justusschock ed7425c
update sharded spawn to reflect changes
justusschock 209a164
update sharded to reflect changes
justusschock 837a070
Merge 1.1.5 changes
awaelchli 136b321
fix merge
awaelchli ffcb535
fix merge
awaelchli 1edfa73
yapf isort
awaelchli a689b81
merge 1.1.6
awaelchli 330b14c
fix merge
awaelchli ef258d5
yapf isort
awaelchli c85000d
fix indentation in test
awaelchli 5f3a35e
copy over reinit scheduler implementation from dev1.2
awaelchli fa1c9b7
fix apex tracking calls with dev_debugger
awaelchli e330a11
reduce diff to dev1.2, clean up
awaelchli 994ac82
fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu
awaelchli 1a78601
sort plugin tests legacy/new
awaelchli 4b76448
fix error handling for amp on cpu
awaelchli bfd54ab
Merge branch 'release/1.2-dev' into patch117
awaelchli 0574d22
fix merge
awaelchli 6ef6637
Merge branch 'patch117' into accelerator-refactor-sharded
awaelchli 9feda39
[Feat] Resolve manual_backward (#5837)
tchaton 7bb9d9f
fix tests/accelerator tests on cpu
awaelchli 13ae1ff
[BugFix] Resolve manual optimization (#5852)
tchaton fc3b4db
Merge formatting changes from 1.2 branch
awaelchli b437642
Remove copy trainer parameters to happen earlier within the loop and …
SeanNaren 8c6aa83
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
beb980a
resovle a bug
7a0fd27
Accelerator refactor sharded rpc (#5854)
justusschock 0d0ced5
resolve bug
1f3ab76
fix assert in rpc test
awaelchli f1b1121
resolve a test
cd31fa1
fix docs compilation
awaelchli f48793e
accelerator refactor - fix for sharded parity test (#5866)
awaelchli 81ff6ea
Remove DDP2 as this does not apply
20deb46
Add missing pre optimizer hook to ensure lambda closure is called
be4d1a2
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
0ac5fc4
fix apex docstring
awaelchli 07fdd95
[accelerator][BugFix] Resolve some test for 1 gpu (#5863)
tchaton 384b791
yapf isort
awaelchli b1a84b8
resolve flake8
tchaton a157a29
fix apex doctests
awaelchli 08cfc65
fix apex doctests 2
awaelchli 7888bfd
resolve docs
tchaton b5b4243
update drone
tchaton 93ceb4c
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
tchaton d001bcf
clean env
ad47f47
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
tchaton 60bfb1a
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
tchaton 0608a41
update
f0120b5
update
bf8874e
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
baf7d7f
update
tchaton 9360aad
update
tchaton b814cdc
merge
justusschock 0d3ea37
Merge branch 'accelerator-refactor-sharded' of github.com:PytorchLigh…
justusschock f1f90c2
Fix RPC related tests, clean out old API, update for new accelerator …
SeanNaren 6d05881
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
justusschock d86fdff
Update test_remove_1-4.py
justusschock 5fbc1cf
Expose properties for tpu cores/gpus/num_gpus
aa9aea0
Add root GPU property
c35baf1
Move properties to properties.py
a9c6e21
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli 8f3947b
move tests that were previously in drone
awaelchli 50ecc4a
Fix root GPU property (#5908)
SeanNaren c7d0075
fix best model path transfer when no checkpoint callback available
awaelchli 3f61d15
Merge remote-tracking branch 'original/accelerator-refactor-sharded' …
awaelchli 061ea46
Fix setup hook order [wip] (#5858)
SeanNaren 1fe1f91
rename ddp sequential -> rpc sequential for special test
awaelchli 3683f5a
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli 1f01b81
revert
awaelchli 135c236
fix stupid merge problem
awaelchli 222653d
Use property in connector for sampler (#5913)
SeanNaren f4311cd
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
awaelchli b210dee
merge the import conflicts
awaelchli 236009e
fix spawning of processes in slurm
awaelchli aace276
[wip] Fix some bugs for TPU [skip ci] (#5878)
tchaton 68273f5
resolve some tests
ca77fa4
update
c35edfd
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
justusschock 8cacef7
fix imports
justusschock f7bbe48
update
30d9800
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
25f7f13
resolve flake8
tchaton fa28c41
update azure pipeline
tchaton 51c27e6
Merge branch 'release/1.2-dev' into accelerator-refactor-sharded
tchaton b888d68
skip a sharded test on cpu that requires a gpu
awaelchli 01ca4cd
resolve tpus
181d143
Merge branch 'master' into accelerator-refactor-sharded
justusschock 946a1e9
resolve bug
2ad1a6e
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
6e0aff0
resolve flake8
tchaton a931791
update
319d034
Merge branch 'accelerator-refactor-sharded' of https://github.com/PyT…
4117bec
updat utils
8d000f7
Merge branch 'master' into accelerator-refactor-sharded
tchaton 0b1ba67
revert permission change on files
awaelchli cc385b4
suggestions from carlos
awaelchli e9eb318
remove unrelated formatting changes
awaelchli 7c08400
remove incomplete comment
awaelchli 7c3d184
Update pytorch_lightning/accelerators/__init__.py
awaelchli 503426e
remove unrelated formatting change
awaelchli c0fbf7a
add types
awaelchli 23a9a10
warn 1.7 ddp manual backward only if ddp kwarg unset
awaelchli a70ee4a
yapf + isort
awaelchli b0621c4
pep8 unused imports
awaelchli 18bfe70
Merge branch 'master' into accelerator-refactor-sharded
awaelchli 7b0515d
fix cyclic import in docs
awaelchli d966057
Apply suggestions from code review
Borda f636d9d
typer in accelerator.py
Borda 5579ea7
typo
tchaton f5df88b
Apply suggestions from code review
Borda 233694e
formatting
Borda a47644a
update on comments
tchaton 80dacb6
update typo
tchaton 99573eb
Update pytorch_lightning/trainer/properties.py
tchaton ab859d7
update
tchaton ad5742a
suggestion from code review
awaelchli 5eaec98
suggestion from code review
awaelchli 941cf77
Merge branch 'master' into accelerator-refactor-sharded
mergify[bot] 8491c29
Merge branch 'master' into accelerator-refactor-sharded
mergify[bot] bd2b23d
Merge branch 'master' into accelerator-refactor-sharded
mergify[bot]
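Several commits in the log above ("create precision subpackage", "create training_type subpackage", "Move clip grad to precision plugin") describe the core idea of the refactor: the accelerator stops owning distribution and precision logic itself and delegates to two plugins. A minimal sketch of that composition model, with simplified stand-in class and method names (not the library's real API):

```python
# Hypothetical sketch of the plugin composition this PR introduces.
# Class and method names are simplified stand-ins, not the real API.

class PrecisionPlugin:
    """Owns numeric-precision concerns, e.g. gradient clipping
    ("Move clip grad to precision plugin" commit)."""
    def clip_gradients(self, clip_val):
        return f"clipped at {clip_val}"

class TrainingTypePlugin:
    """Owns distribution concerns, e.g. wrapping the model for DDP."""
    def setup(self, model):
        self.model = model  # a DDP plugin would wrap the model here

class Accelerator:
    """Thin coordinator that routes calls to its plugins instead of
    implementing DDP/AMP behavior itself."""
    def __init__(self, training_type_plugin, precision_plugin):
        self.training_type_plugin = training_type_plugin
        self.precision_plugin = precision_plugin

    def setup(self, model):
        self.training_type_plugin.setup(model)

    def clip_gradients(self, clip_val):
        return self.precision_plugin.clip_gradients(clip_val)

# Swapping plugins changes behavior without touching the Accelerator:
acc = Accelerator(TrainingTypePlugin(), PrecisionPlugin())
acc.setup(model="toy-model")
result = acc.clip_gradients(0.5)
```

Under this split, adding a new distribution strategy or precision backend means writing one plugin rather than a whole new accelerator subclass, which is why the old per-backend accelerator classes (see the diff below in this PR) could be removed.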
pytorch_lightning/accelerators/__init__.py
@@ -1,25 +1,4 @@
-# Copyright The PyTorch Lightning team.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from pytorch_lightning.accelerators.legacy.accelerator import Accelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.cpu_accelerator import CPUAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.ddp2_accelerator import DDP2Accelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.ddp_accelerator import DDPAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.ddp_cpu_hpc_accelerator import DDPCPUHPCAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.ddp_cpu_spawn_accelerator import DDPCPUSpawnAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.ddp_hpc_accelerator import DDPHPCAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.ddp_spawn_accelerator import DDPSpawnAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.dp_accelerator import DataParallelAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.gpu_accelerator import GPUAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.horovod_accelerator import HorovodAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.legacy.tpu_accelerator import TPUAccelerator  # noqa: F401
+from pytorch_lightning.accelerators.accelerator import Accelerator
+from pytorch_lightning.accelerators.cpu import CPUAccelerator
+from pytorch_lightning.accelerators.gpu import GPUAccelerator
+from pytorch_lightning.accelerators.tpu import TPUAccelerator
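The diff flattens the public import path: the twelve legacy per-backend accelerators are dropped and only four classes are re-exported from flat submodules. A hedged sketch of how downstream code could detect which layout is present at runtime (the helper function itself is hypothetical, not part of the PR; module paths are taken from the diff):

```python
import importlib.util

def gpu_accelerator_module():
    """Illustrative only: return the module path that provides
    GPUAccelerator, or None if pytorch_lightning is not installed.
    Paths come from this PR's diff."""
    # Guard the top-level package first so find_spec on a submodule
    # does not raise when pytorch_lightning is absent.
    if importlib.util.find_spec("pytorch_lightning") is None:
        return None
    if importlib.util.find_spec("pytorch_lightning.accelerators.gpu") is not None:
        return "pytorch_lightning.accelerators.gpu"  # flat layout after this PR
    # Fall back to the pre-refactor location removed by this diff.
    return "pytorch_lightning.accelerators.legacy.gpu_accelerator"

path = gpu_accelerator_module()
```

`importlib.util.find_spec` resolves a module without executing it (beyond importing its parent packages), which keeps the probe cheap compared with a try/except around a full import.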