fix testing if node has gpu support #1604
Codecov Report
@@           Coverage Diff            @@
##           master   #1604    +/-    ##
========================================
- Coverage    73.7%   73.7%   -0.1%
========================================
  Files         278     278
  Lines       10874   10854     -20
  Branches     1181    1175      -6
========================================
- Hits         8015    8000     -15
+ Misses       2516    2514      -2
+ Partials      343     340      -3
OK, it looks good on my machine.
logger.info("Node GPU support: %s", has_gpu_support)
return has_gpu_support
config = {
So I guess this image will never block when it boots.
"Tty": False,
"OpenStdin": False,
}
try:
TIP: suppressing exceptions is sometimes handy and more readable:
from contextlib import suppress
with suppress(aiodocker.exceptions.DockerError):
    await ...
    return True
return False
Did not realize this existed. Cool thing! But for this pre-new-sidecar era I will keep it for next time.
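For comparison, here is a minimal sketch of the two styles discussed in this thread. It is not the code from this PR: `ProbeError` is a hypothetical stand-in for `aiodocker.exceptions.DockerError`, and `probe` stands for whatever coroutine starts the container. Both functions are meant to return the same result.

```python
import asyncio
from contextlib import suppress


class ProbeError(Exception):
    """Hypothetical stand-in for aiodocker.exceptions.DockerError."""


async def failing_probe() -> None:
    # Simulates a container that cannot be started on this node.
    raise ProbeError("probe failed")


async def check_with_try(probe) -> bool:
    # Explicit try/except, in the style of the diff under review.
    try:
        await probe()
    except ProbeError:
        return False
    return True


async def check_with_suppress(probe) -> bool:
    # Same logic using contextlib.suppress, as proposed in the tip.
    with suppress(ProbeError):
        await probe()
        return True
    # Reached only when ProbeError was raised and suppressed.
    return False
```

Note that `suppress` only swallows the listed exception type; anything else still propagates, just as with the `try`/`except` version.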
- UI/UX improvements (#1657)
- Bump yarl from 1.4.2 to 1.5.1 in /packages/postgres-database (#1665)
- Bump ujson from 3.0.0 to 3.1.0 in /packages/service-library (#1664)
- Bump pytest-docker from 0.7.2 to 0.8.0 in /packages/service-library (#1647)
- Improving storage performance (#1659)
- Bump aiozipkin from 0.6.0 to 0.7.0 in /packages/service-library (#1642)
- Theming (#1656)
- Platform stability: (#1645)
- is1594 fix and re-activate e2e testing (#1620)
- 2 bugs fixed + Some improvements (#1634)
- Fixes default (#1640)
- Bump lodash from 4.17.15 to 4.17.19 (#1639)
- Is1585/cleanup storage (#1586)
- Fixes on publish studies handling (#1632)
- Some enhancements and bug fixes (#1608)
- Improve e2e (#1631)
- filter studies by name before deleting them (#1629)
- Maintenance/upgrades test tools (#1628)
- Bugfix/concurent opening projects (#1598)
- Bugfix/allow reading groups anonymous user (#1615)
- Bump docker from 4.2.1 to 4.2.2 in /packages/postgres-database (#1605)
- fix testing if node has gpu support (#1604)
- [bugfix] Invalidate cache before starting a study (#1602)
- Feature/fix e2e 2 (#1600)
- fix deploy not needing e2e testing since it is disabled
- reduce cardinality of metrics (#1593)
- Excudes e2e stage from include until fixed (#1595)
- Shared project concurrency (frontend) (#1591)
- Homogenize studies and services (#1569)
- [feature] UI Fine grained access - project locking and notification
- Bugfix/apiserver does not need sslheaders (#1564)
- Cleanup catalog service (#1582)
- Maintenance/cleanup api server (#1578)
- Adds support for GPU scheduling of computational services (#1553)
- Maintenance/upgrades and tooling (#1546)
- Is1570/study fails 500 (#1572)
- Bump faker from 4.1.0 to 4.1.1 in /packages/postgres-database (#1573)
- maintenance fix codecov reports (#1568)
- Manage groups, Share studies (#1512)
- Is/add notebook migration script (#1565)
- Is1269/api-server upgrade (#1475)
- added simcore_webserver_service in pytest simcore package (#1563)
- add traefik endpoint to api-gateway (#1555)
What do these changes do?
Executing docker node inspect self is not allowed on non-manager nodes in a swarm. Therefore, the alternative proposal is to try running an nvidia-smi container, which will fail if the nvidia runtime is not set as the default on the node.
@GitHK: please test on your GPU-enabled machine.
fixes #1603 (after being tested by @GitHK )
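The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: `run_container` is a hypothetical injected coroutine (in the real service it would wrap the Docker client), `ContainerRunError` is a hypothetical stand-in for the client's error type, and the `Image` and `Cmd` values are assumed. Only `Tty`, `OpenStdin`, and the log message are taken from the diff.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict

logger = logging.getLogger(__name__)


class ContainerRunError(Exception):
    """Hypothetical stand-in for the error a Docker client raises."""


# "Tty" and "OpenStdin" appear in the diff; "Image" and "Cmd" are assumptions.
PROBE_CONFIG: Dict[str, object] = {
    "Image": "nvidia/cuda:10.0-base",
    "Cmd": ["nvidia-smi"],
    "Tty": False,
    "OpenStdin": False,
}

RunContainer = Callable[[Dict[str, object]], Awaitable[None]]


async def node_has_gpu_support(run_container: RunContainer) -> bool:
    """Return True if the nvidia-smi probe container runs without error."""
    has_gpu_support = False
    try:
        # Expected to fail when the nvidia runtime is not the node default.
        await run_container(PROBE_CONFIG)
        has_gpu_support = True
    except ContainerRunError:
        pass
    logger.info("Node GPU support: %s", has_gpu_support)
    return has_gpu_support
```

Unlike docker node inspect self, this probe does not need manager privileges, since it only starts a container on the local node.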