
[DONT MERGE] PR to debug CI failures on windows #6195


Conversation

@YosuaMichael (Contributor) commented Jun 23, 2022

This PR is only meant to investigate the CI failures described in issue #6189.

  • First, we skip the big models on CPU as well, to make sure the problem is not caused by the big models [confirmed GREEN for the Windows test]
  • Second, we test the big models with the torch nightly from 20220621 -> still failing
  • @vfdev-5 tried the torch nightly from 20220618 and it is green (confirmed)
  • Using inference_mode and skipping only the jit, fx, and backprop checks doesn't work (see the sketch after this list)
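For reference, here is a minimal sketch of the two mitigations tried above: skipping the big models on CPU and running the forward pass under torch.inference_mode. The BIG_MODELS set and the test body are illustrative assumptions, not the actual torchvision test code:

import pytest
import torch
import torchvision.models as models

# Illustrative skip list; the real one lives in test/test_models.py
BIG_MODELS = {"regnet_y_128gf", "vit_h_14"}

@pytest.mark.parametrize("model_name", ["resnet18", "regnet_y_128gf"])
def test_classification_model(model_name):
    if model_name in BIG_MODELS:
        pytest.skip(f"{model_name} skipped on CPU to limit memory usage")
    model = getattr(models, model_name)(weights=None).eval()
    with torch.inference_mode():  # no autograd bookkeeping -> smaller memory footprint
        out = model(torch.rand(1, 3, 224, 224))
    assert out.shape == (1, 1000)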

In summary, it seems there has been a change in core: the 20220618 nightly works fine, but later nightlies do not. The main suspect is memory usage; we need to double-check with a memory profiler whether that is indeed the case. If so, we should raise the memory issue with core.
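A minimal sketch of such a double-check, measuring how much the process RSS grows during a single forward pass (psutil is an assumed extra dependency here, and RSS measured after the call misses transient peaks, so treat the number only as a rough signal to compare across nightlies):

import psutil
import torch
import torchvision.models as models

proc = psutil.Process()
model = models.regnet_y_128gf(weights=None).eval()

rss_before = proc.memory_info().rss
with torch.inference_mode():
    model(torch.rand(1, 3, 224, 224))
rss_after = proc.memory_info().rss

# The delta is a lower bound on peak usage; run it under both the
# 20220618 nightly and a recent one and compare.
print(f"RSS grew by ~{(rss_after - rss_before) / 2**20:.0f} MiB")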

@vfdev-5 (Collaborator) commented Jun 23, 2022

I was debugging the failing job over ssh and can confirm that recent PyTorch nightlies fail even when running just a single test. However, if I install the June 18 nightly, the single test passes:

(C:\Users\circleci\project\env) C:\Users\circleci\project>pytest -vvv test/test_models.py::test_classification_model[cpu-regnet_y_128gf]
pytest -vvv test/test_models.py::test_classification_model[cpu-regnet_y_128gf]
============================= test session starts =============================
platform win32 -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0 -- C:\Users\circleci\project\env\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\circleci\project, configfile: pytest.ini
plugins: cov-3.0.0, mock-3.7.0
collecting ... collected 1 item

test/test_models.py::test_classification_model[cpu-regnet_y_128gf] PASSED [100%]

============================= 1 passed in 30.42s ==============================

(C:\Users\circleci\project\env) C:\Users\circleci\project>pip list | grep torch
pip list | grep torch
torch              1.13.0.dev20220618+cpu
torchvision        0.14.0a0+1eae59a       c:\users\circleci\project

Testing it with the latest commits.
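For bisecting, a specific dated nightly can be pinned like this (assuming the wheel is still hosted on the nightly index; old nightlies are eventually removed):

pip install --pre torch==1.13.0.dev20220618+cpu --extra-index-url https://download.pytorch.org/whl/nightly/cpu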

@YosuaMichael (Contributor, Author) commented

(quoting @vfdev-5's comment above in full)

cc @datumbox

Given this finding, could there be some change in core that increased memory usage?

@YosuaMichael self-assigned this Jun 24, 2022
@datumbox (Contributor) commented Jun 24, 2022

@YosuaMichael Yes, it seems that way. We should raise this with Core and see if they are aware of anything that might have increased the memory requirements on Windows. This is going to be quite difficult to debug. It's worth creating an issue where you document and summarize the findings, providing references; this will help people investigate.
