CI fails on windows: ci/circleci: unittest_windows_cpu_pyX.Y #6189
Comments
Currently trying to reproduce on https://github.com/pytorch/vision/pull/5009/checks?check_run_id=7006220104 |
Thanks for the update! Let's close this issue if everything is OK now |
@YosuaMichael actually, tests are still failing on #5009. Reopening. |
Ah yeah, previously I just reran the test and it seemed green, but the error came back after I updated the branch. Sorry for the false negative @vfdev-5! |
@vfdev-5 the failure is suspicious because it's on a very large model. Can you try skipping the specific test to see whether this is related to issues on the CircleCI side rather than in core? Another thing we can do to confirm that core is not the issue is to pin the nightly to the one before and rerun the job. If it still fails, we will know it's CircleCI. |
I have confirmed that skipping the big models indeed makes the CI green again. Now, in the same PR #6195, I am using the older nightly version (torch==1.13.0.dev20220621) to check whether this issue is caused by core or by CircleCI. |
^^ Never mind on this. I checked the error and it still seems to use the new torch nightly:
It seems the Windows build has a different way of getting torch core; I will look into this first. |
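One way to check which nightly a job actually picked up is to read the build date out of the version string. The snippet below is only a sketch: the helper names are hypothetical, and it assumes nightly versions follow the `X.Y.Z.devYYYYMMDD` pattern seen in this thread (for example `1.13.0.dev20220621`). In a real job the string would come from `torch.__version__`.

```python
import re
from datetime import date

# First nightly on which the failure was reported in this issue.
FIRST_FAILING = date(2022, 6, 22)


def nightly_date(version):
    """Return the build date encoded in a nightly version string, or None.

    Assumes the '.devYYYYMMDD' suffix used by the versions in this thread.
    """
    match = re.search(r"\.dev(\d{4})(\d{2})(\d{2})", version)
    if match is None:
        return None  # Not a nightly-style version string.
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def is_before_first_failing(version):
    """True if the version is a nightly built before the first failing one."""
    built = nightly_date(version)
    return built is not None and built < FIRST_FAILING
```

Printing `is_before_first_failing(torch.__version__)` near the start of the test job would show immediately whether the pin took effect.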
@YosuaMichael you should also be able to SSH into the failing CircleCI job and check directly which nightly passes (if any). |
@vfdev-5 I have never SSHed into CircleCI before; do you have any pointers on how to do it? @datumbox @vfdev-5 I found that the installation of PyTorch on Windows seems to happen here: https://github.com/pytorch/vision/blob/main/.circleci/unittest/windows/scripts/install.sh#L37. Do you have any idea how to modify it to install the older nightly version?
Another note: in https://anaconda.org/pytorch-nightly/pytorch/files it seems that it only has version |
@YosuaMichael there was a wiki page on PyTorch about that. There is a link in the CircleCI docs (https://circleci.com/docs/2.0/ssh-access-jobs). Basically, once you are logged in to CircleCI, you get options such as "restart failed job" and "restart failed job with SSH". I think we should have write access to the repo to be able to run with SSH. As for the conda installation, I would also try pip in case there is a matching version... |
@vfdev-5 thanks for the suggestion! For now I have hardcoded the version and replaced the conda install with pip, and it seems to successfully install the torch version that we want (see https://app.circleci.com/pipelines/github/pytorch/vision/18540/workflows/3765b5f9-445c-4d89-895b-b100f2f99834/jobs/1500134). |
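For anyone wanting to try the same pin, here is a rough sketch of what a pip-based install of a specific nightly could look like. The exact command used in the CI job is not shown in this thread, so everything here is an assumption: the helper name is hypothetical, and the index URL is the CPU nightly wheel index, which may not match what the job used.

```python
import sys

# Assumed CPU nightly wheel index; not taken from the CI script in this thread.
NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/cpu"


def pinned_nightly_command(version, index_url=NIGHTLY_INDEX):
    """Build a pip command that installs one specific torch nightly.

    Returns the argument list only; the caller decides whether to run it.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--pre",                      # allow pre-release (dev) versions
        f"torch=={version}",          # exact nightly pin
        "--extra-index-url", index_url,
    ]


# To actually run it (not executed here):
#   import subprocess
#   subprocess.run(pinned_nightly_command("1.13.0.dev20220621"), check=True)
```

Building the argument list separately from running it makes it easy to log the exact command in the CI output before installing.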
Now I have confirmed that the problem is not from core. |
That's the problem with very large models like that. They often cause random memory issues. If you send a PR that adds a list of such models and skips them (similar to what you have for the GPU), I'll be happy to review it. Basically we should turn off the specific test and recover our CI. |
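A skip list along these lines might look like the sketch below. It is not the change that landed in the fix: the model names are placeholders (the thread does not name the affected models), the helper is hypothetical, and restricting the skip to Windows is an assumption based on only the Windows job failing.

```python
import sys

# Hypothetical placeholder names; the thread does not list the actual models.
BIG_MODELS = frozenset({"example_huge_model_a", "example_huge_model_b"})


def should_skip(model_name, platform=sys.platform, skip_list=BIG_MODELS):
    """Return True for listed models when running on Windows.

    The Windows-only condition is an assumption; drop the platform check
    to skip the listed models everywhere.
    """
    return platform.startswith("win") and model_name in skip_list
```

In a pytest-based suite, a test could call `pytest.skip("model too large for this CI runner")` at the top whenever `should_skip(model_name)` returns True, so the skipped models still show up in the report.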
@YosuaMichael just to confirm: if we run everything locally it does not fail, right? Only CircleCI fails every time? |
On my MacBook it does not fail, but I think this is expected (on CircleCI, only the Windows job is failing, probably because of a resource problem such as memory). |
Fixed by #6197 |
Tests on Windows have started failing:
It started appearing on PyTorch core nightly 20220622
cc @pmeier @seemethere