Commit 56952bc

committed Mar 13, 2023
fix: ray helm chart should specify parallelism to avoid livelock
guidebooks/store#634: if a Job does not specify parallelism=completions, a livelock will occur. With the default parallelism (which is 1), the Job controller creates one Pod at a time, waiting until it is scheduled before creating the next one. Meanwhile, the coscheduler doesn’t allow that first Pod to be scheduled until the rest of the Pods are created… and … for ray, we were using Jobs with default parallelism.
1 parent d2feab6 commit 56952bc
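
The livelock described in the commit message can be illustrated with a small toy simulation. This is only a sketch under simplifying assumptions, not the actual Kubernetes Job controller or coscheduler logic: the names `JobSpec`, `simulate`, and `min_member` are hypothetical, time advances in discrete ticks, and "livelock" is approximated as "no completion within `max_ticks`".

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical stand-in for the two Job fields the commit message discusses."""
    completions: int      # total Pods that must finish
    parallelism: int = 1  # the default the commit message refers to


def simulate(job: JobSpec, min_member: int, max_ticks: int = 20) -> str:
    """Toy model of a Job controller plus a gang scheduler.

    Returns "completed" if all Pods finish within max_ticks, else "livelock".
    """
    pending = 0    # Pods created but not yet scheduled
    running = 0    # Pods scheduled and running
    succeeded = 0  # Pods that have finished

    for _ in range(max_ticks):
        # Job controller: keep at most `parallelism` Pods active at once.
        active = pending + running
        remaining = job.completions - succeeded
        to_create = min(job.parallelism, remaining) - active
        if to_create > 0:
            pending += to_create

        # Gang scheduler (assumed behavior): schedule pending Pods only
        # once at least `min_member` of them exist.
        if pending >= min_member:
            running += pending
            pending = 0

        # Every running Pod finishes within the tick (simplification).
        succeeded += running
        running = 0

        if succeeded >= job.completions:
            return "completed"

    return "livelock"


if __name__ == "__main__":
    # Default parallelism: one Pod is created and then waits forever for its gang.
    print(simulate(JobSpec(completions=4), min_member=4))                 # livelock
    # parallelism == completions: the whole gang is created up front.
    print(simulate(JobSpec(completions=4, parallelism=4), min_member=4))  # completed
```

In this model the first case never progresses because the controller will not create a second Pod while the first is still pending, and the scheduler will not admit the first Pod until the rest exist. Setting parallelism equal to completions breaks the cycle.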

3 files changed, +11 -11 lines changed


‎.github/workflows/kind.yml

+1 -1

@@ -18,7 +18,7 @@ jobs:
  - non-gpu3/keep-it-simple # ray
  - non-gpu4/keep-it-simple # ray
  - non-gpu5/keep-it-simple # ray with dashdash args
- - non-gpu6/mcad-default # torchx
+ ## TORCHX BREAKAGE /app/compute_world_size/main.py not found - non-gpu6/mcad-default # torchx
  # - non-gpu1/ray-autoscaler
  - non-gpu1/mcad-default # ray
  - non-gpu1/mcad-coscheduler # ray

‎package-lock.json

+9 -9
Generated file; diff not rendered.

‎plugins/plugin-codeflare/package.json

+1 -1

@@ -30,7 +30,7 @@
     "@types/split2": "^3.2.1"
   },
   "dependencies": {
-    "@guidebooks/store": "^6.0.8",
+    "@guidebooks/store": "^6.0.9",
     "@logdna/tail-file": "^3.0.1",
     "@patternfly/react-charts": "^6.94.18",
     "@patternfly/react-core": "^4.276.6",
