CI: debug using K80 #13245
|
Both LTS and stable are failing due to the failure of this test case: |
|
This test is no longer relevant.
Since the various code paths that were tested before no longer exist, I recommend dropping the test entirely. |
|
Trying to run with CUDA 10.2, and it seems to be failing at the very same place... :( |
|
The actual failure is most likely at: |
|
I think the problem is that the K80s are shared / virtual cards |
|
@Borda Thank you very much for investigating this... Other instance types we could try are |
|
Closing as we couldn't find the cause and we chose to split the testing anyways |
What does this PR do?
It seems the PL codebase had a test that was never executed;
when run on a 4-GPU machine, it fails:
https://dev.azure.com/PytorchLightning/pytorch-lightning/_build/results?buildId=742[…]b5f-c4ba606ae534&t=ad0b8e2f-da1f-5a7c-b4de-96aa939719e3&l=3333
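One possible way a multi-GPU test ends up never being executed is a skip guard keyed on the number of visible devices: on CI agents with fewer GPUs the test is silently skipped, so a failure only surfaces once a large enough machine is used. The sketch below illustrates that mechanism only. `available_gpus`, `should_run`, `requires_gpus`, and `MultiGPUTest` are hypothetical names, not the helpers used in this repository, and the code assumes `torch` may or may not be installed.

```python
import unittest


def available_gpus() -> int:
    """Return the number of visible CUDA devices, or 0 if torch is missing."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def should_run(min_gpus: int, available: int) -> bool:
    """A test needing `min_gpus` devices runs only when enough are present."""
    return available >= min_gpus


def requires_gpus(min_gpus: int):
    """Hypothetical decorator: skip the test unless enough GPUs are visible."""
    return unittest.skipUnless(
        should_run(min_gpus, available_gpus()),
        f"requires at least {min_gpus} GPUs",
    )


class MultiGPUTest(unittest.TestCase):
    @requires_gpus(4)
    def test_four_gpu_path(self):
        # Placeholder body: a real test would exercise the 4-GPU code path.
        self.assertGreaterEqual(available_gpus(), 4)
```

On a 2-GPU agent, `should_run(4, 2)` is false and the test is reported as skipped rather than failed, which is easy to overlook in a long CI log.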
Does your PR introduce any breaking changes? If yes, please list them.
https://docs.microsoft.com/en-us/azure/virtual-machines/nc-series
cc @carmocca @akihironitta @Borda @tchaton @rohitgr7