Fix TPU testing and collect all tests #11098
Conversation
I think that in the past we had just a selection of files to keep it faster, but I agree that this is a more robust solution: we test all of PL and we do not need to remember to add a single file somewhere...
The TPU CI is still getting stuck / timing out. One of the newly discovered tests must be the cause of this.
Any progress on this? Did you identify the hanging test?
I will take a stab at it today. |
What does this PR do?
Fixes #13720
Addresses a range of issues with the TPU CI:
- TPU tests are now collected by searching for the `RunIf(tpu=True)` marker instead of a hardcoded selection of files. No longer do the tests get hardcoded and forgotten, leading to tests being added that never run.
- Removes the `@pl_multi_process` decorator from all tests. This decorator suppressed exceptions and assertion errors, so some tests were broken and outdated for a while and never raised errors (see the first sketch after this list).
- Introduces `RunIf(tpu=True, standalone=True)` for TPU tests as an alternative to the aforementioned `pl_multi_process` (see the usage sketch after this list). The CI now runs standalone tests similar to the GPU test suite. This is necessary, for example, when running with the single-device TPU strategy, which requires accessing the XLA device in the main process.

Note: after resolving the core issues and pushing many commits to run the CI, I have not seen any flakiness or random behavior anymore.
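To illustrate the second point, here is a minimal sketch of how running a test body in a spawned child process can hide assertion failures from pytest. This is not the actual `pl_multi_process` implementation; all names below are hypothetical:

```python
import multiprocessing as mp


def failing_test_body():
    # Stand-in for a TPU test body; the real tests would assert on actual results.
    assert False, "this failure never reaches the parent process"


def run_in_child_process(fn):
    """Hypothetical stand-in for a spawn-based decorator: run `fn` in a child process."""
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=fn)
    proc.start()
    proc.join()
    # The AssertionError only terminates the child with a non-zero exit code.
    # Unless the wrapper inspects `proc.exitcode` and re-raises, pytest never
    # sees an exception and reports the test as passing.
    print("child exit code:", proc.exitcode)


def test_silently_passes():
    run_in_child_process(failing_test_body)
```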
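And a usage sketch of the marking scheme from the first and third points. The import path is an assumption based on the test-suite layout at the time and may differ in other versions:

```python
from tests.helpers.runif import RunIf  # assumed import path


@RunIf(tpu=True)
def test_runs_with_the_regular_tpu_suite():
    # Collected automatically because of the RunIf(tpu=True) marker; no need to
    # add the file to a hardcoded list anymore.
    ...


@RunIf(tpu=True, standalone=True)
def test_single_device_tpu_strategy():
    # Standalone tests are launched in their own process, similar to the GPU
    # standalone suite, e.g. when the XLA device must be created in the main process.
    ...
```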
In a follow-up, we should update the CI to the latest torch_xla version.
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
I made sure I had fun coding 🙃
Part of #1 (it's a lie, this is just here to avoid noisy GitHub bot)
cc @Borda @tchaton @rohitgr7 @akihironitta @carmocca @kaushikb11