
Commit 989b37c

committed
Updated existing notebooks in favour of Kueue specific nb
1 parent: 446bcde

14 files changed: +98 −744 lines

Diff for: demo-notebooks/additional-demos/hf_interactive.ipynb (+10 −6)

@@ -68,10 +68,13 @@
    "id": "bc27f84c",
    "metadata": {},
    "source": [
-   "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
+   "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding Ray Cluster).\n",
    "\n",
    "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-   "The example here is a community image."
+   "The example here is a community image.\n",
+   "\n",
+   "NOTE: By default the SDK uses Kueue as its scheduling solution. \n",
+   "MCAD can be enabled over Kueue by using the `mcad=True` option in `ClusterConfiguration`"
    ]
   },
   {
@@ -89,7 +92,8 @@
    }
   ],
   "source": [
-   "# Create our cluster and submit appwrapper\n",
+   "# Create our cluster and submit\n",
+   "# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
    "cluster = Cluster(ClusterConfiguration(name='hfgputest', \n",
    "                                       namespace=\"default\",\n",
    "                                       num_workers=1,\n",
@@ -99,16 +103,16 @@
    "                                       max_memory=16, \n",
    "                                       num_gpus=4,\n",
    "                                       image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-   "                                       mcad=True,\n",
-   "                                       instascale=True, machine_types=[\"m5.xlarge\", \"p3.8xlarge\"]))"
+   "                                       # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
+   "                                       ))"
    ]
   },
   {
   "cell_type": "markdown",
   "id": "12eef53c",
   "metadata": {},
   "source": [
-   "Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
+   "Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
    ]
   },
   {
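The comment added in this hunk says the SDK will look for the local queue annotated as Kueue's default. A minimal sketch of that lookup logic, assuming queue objects shaped like Kubernetes `LocalQueue` manifests (the helper name is hypothetical; the real codeflare-sdk queries the Kubernetes API):

```python
# Hypothetical sketch of the default-LocalQueue discovery described in the
# added comment; not the SDK's actual implementation.
def find_default_local_queue(local_queues):
    """Return the name of the LocalQueue annotated as Kueue's default, or None."""
    for lq in local_queues:
        annotations = lq.get("metadata", {}).get("annotations", {})
        if annotations.get("kueue.x-k8s.io/default-queue") == "true":
            return lq["metadata"]["name"]
    return None

queues = [
    {"metadata": {"name": "team-a-queue", "annotations": {}}},
    {"metadata": {"name": "default-queue",
                  "annotations": {"kueue.x-k8s.io/default-queue": "true"}}},
]
print(find_default_local_queue(queues))  # -> default-queue
```

If no queue carries the annotation, the commented-out `local_queue` parameter in `ClusterConfiguration` is the manual fallback.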

Diff for: demo-notebooks/additional-demos/local_interactive.ipynb (+23 −137)

@@ -2,7 +2,7 @@
  "cells": [
   {
   "cell_type": "code",
-  "execution_count": 1,
+  "execution_count": null,
   "id": "9a44568b-61ef-41c7-8ad1-9a3b128f03a7",
   "metadata": {
    "tags": []
@@ -36,7 +36,10 @@
   "source": [
   "\n",
   "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-  "The example here is a community image."
+  "The example here is a community image.\n",
+  "\n",
+  "NOTE: By default the SDK uses Kueue as its scheduling solution. \n",
+  "MCAD can be enabled over Kueue by using the `mcad=True` option in `ClusterConfiguration`"
   ]
  },
  {
@@ -48,7 +51,8 @@
   },
   "outputs": [],
   "source": [
-  "# Create our cluster and submit appwrapper\n",
+  "# Create our cluster and submit\n",
+  "# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
   "namespace = \"default\"\n",
   "cluster_name = \"hfgputest-1\"\n",
   "local_interactive = True\n",
@@ -63,14 +67,13 @@
   "                  max_memory=4,\n",
   "                  num_gpus=0,\n",
   "                  image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-  "                  mcad=True,\n",
-  "                  instascale=False,\n",
-  "                  machine_types=[\"m5.xlarge\", \"p3.8xlarge\"]))"
+  "                  # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
+  "                  ))"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 3,
+  "execution_count": null,
   "id": "69968140-15e6-482f-9529-82b0cd19524b",
   "metadata": {
    "tags": []
@@ -82,21 +85,12 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 4,
+  "execution_count": null,
   "id": "e20f9982-f671-460b-8c22-3d62e101fed9",
   "metadata": {
    "tags": []
   },
-  "outputs": [ (stored stream output removed: "Waiting for requested resources to be set up..." / "Requested cluster up and running!") ],
+  "outputs": [],
   "source": [
   "cluster.wait_ready()"
   ]
@@ -125,82 +119,13 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 6,
+  "execution_count": null,
   "id": "9483bb98-33b3-4beb-9b15-163d7e76c1d7",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
-  "outputs": [ (≈70 lines of stored output removed: Ray client gRPC debug logs plus the rendered Ray dashboard HTML/SVG banner reporting Python 3.8.13, Ray 2.1.0, dashboard http://10.254.20.41:8265) ],
+  "outputs": [],
   "source": [
   "import ray\n",
   "\n",
@@ -210,7 +135,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 7,
+  "execution_count": null,
   "id": "3436eb4a-217c-4109-a3c3-309fda7e2442",
   "metadata": {},
   "outputs": [],
@@ -234,80 +159,41 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 8,
+  "execution_count": null,
   "id": "5cca1874-2be3-4631-ae48-9adfa45e3af3",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
-  "outputs": [ (stored stream output removed: Ray task-scheduling debug logs) ],
+  "outputs": [],
   "source": [
   "ref = heavy_calculation.remote(3000)"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 9,
+  "execution_count": null,
   "id": "01172c29-e8bf-41ef-8db5-eccb07906111",
   "metadata": {},
-  "outputs": [ (stored output removed: Ray client debug logs and the execute_result 1789.4644387076714) ],
+  "outputs": [],
   "source": [
   "ray.get(ref)"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 10,
+  "execution_count": null,
   "id": "9e79b547-a457-4232-b77d-19147067b972",
   "metadata": {},
-  "outputs": [ (stored stream output removed: Ray data-channel shutdown debug logs) ],
+  "outputs": [],
   "source": [
   "ray.cancel(ref)\n",
   "ray.shutdown()"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 11,
+  "execution_count": null,
   "id": "2c198f1f-68bf-43ff-a148-02b5cb000ff2",
   "metadata": {},
   "outputs": [],
@@ -340,7 +226,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.8.17"
+  "version": "3.9.18"
  },
  "vscode": {
   "interpreter": {

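Most of this file's diff simply clears stored cell outputs and resets each `execution_count` to null. That cleanup can be reproduced across notebooks with a short standard-library sketch (tools like nbstripout or `jupyter nbconvert --clear-output` do the same job):

```python
import json

def clear_notebook(nb: dict) -> dict:
    """Reset outputs and execution counts of all code cells, as this commit does."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# A minimal cell mirroring the ones stripped in this diff.
nb = {"cells": [{"cell_type": "code", "execution_count": 6,
                 "outputs": [{"output_type": "stream", "text": ["..."]}],
                 "source": ["cluster.wait_ready()"]}]}
print(json.dumps(clear_notebook(nb)["cells"][0]["execution_count"]))  # -> null
```

Python's `None` serializes to JSON `null`, which is exactly the `"execution_count": null` value the diff introduces.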
Diff for: demo-notebooks/guided-demos/0_basic_ray.ipynb (+9 −6)

@@ -45,10 +45,13 @@
   "id": "bc27f84c",
   "metadata": {},
   "source": [
-  "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
+  "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding RayCluster).\n",
   "\n",
   "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-  "The example here is a community image."
+  "The example here is a community image.\n",
+  "\n",
+  "NOTE: By default the SDK uses Kueue as its scheduling solution. \n",
+  "MCAD can be enabled over Kueue by using the `mcad=True` option in `ClusterConfiguration`"
   ]
  },
  {
@@ -58,7 +61,8 @@
   "metadata": {},
   "outputs": [],
   "source": [
-  "# Create and configure our cluster object (and appwrapper)\n",
+  "# Create and configure our cluster object\n",
+  "# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
   "cluster = Cluster(ClusterConfiguration(\n",
   "    name='raytest',\n",
   "    namespace='default',\n",
@@ -69,8 +73,7 @@
   "    max_memory=4,\n",
   "    num_gpus=0,\n",
   "    image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-  "    mcad=True,\n",
-  "    instascale=False\n",
+  "    # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
   "))"
   ]
  },
@@ -79,7 +82,7 @@
   "id": "12eef53c",
   "metadata": {},
   "source": [
-  "Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
+  "Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
   ]
  },
  {
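Under Kueue, queue assignment happens through the `kueue.x-k8s.io/queue-name` label on the submitted workload. A hedged sketch of how an explicit `local_queue` (the commented-out option above) or a discovered default could be turned into that label — the helper and its parameters are illustrative, not the SDK's actual code:

```python
# Kueue's well-known label for assigning a workload to a LocalQueue.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"

def queue_labels(local_queue=None, default_queue=None):
    """Build the Kueue queue-name label for a RayCluster-like workload.

    `local_queue` mirrors the manual ClusterConfiguration option;
    `default_queue` stands in for the annotation-discovered default.
    """
    name = local_queue or default_queue
    if name is None:
        raise ValueError("no local queue specified and no default found")
    return {KUEUE_QUEUE_LABEL: name}

print(queue_labels(default_queue="default-queue"))
# -> {'kueue.x-k8s.io/queue-name': 'default-queue'}
print(queue_labels(local_queue="team-a", default_queue="default-queue"))
# -> {'kueue.x-k8s.io/queue-name': 'team-a'}
```

An explicitly passed `local_queue` wins over the discovered default, matching the "Specify the local queue manually" comment in the diff.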

Diff for: demo-notebooks/guided-demos/1_basic_instascale.ipynb (+7 −3)

@@ -5,7 +5,9 @@
   "id": "9865ee8c",
   "metadata": {},
   "source": [
-  "In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments)."
+  "In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments).\n",
+  "\n",
+  "NOTE: The InstaScale and MCAD components are in Tech Preview"
   ]
  },
  {
@@ -45,7 +47,9 @@
   "This time, we are working in a cloud environment, and our OpenShift cluster does not have the resources needed for our desired workloads. We will use InstaScale to dynamically scale-up guaranteed resources based on our request (that will also automatically scale-down when we are finished working):\n",
   "\n",
   "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-  "The example here is a community image."
+  "The example here is a community image.\n",
+  "\n",
+  "NOTE: This specific demo requires MCAD and InstaScale to be enabled on the Cluster"
   ]
  },
  {
@@ -66,7 +70,7 @@
   "    max_memory=8,\n",
   "    num_gpus=1,\n",
   "    image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-  "    mcad=True,\n",
+  "    mcad=True, # Enable MCAD\n",
   "    instascale=True, # InstaScale now enabled, will scale OCP cluster to guarantee resource request\n",
   "    machine_types=[\"m5.xlarge\", \"g4dn.xlarge\"] # Head, worker AWS machine types desired\n",
   "))"

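Across these notebooks, the `mcad=True` flag is what switches the SDK from its Kueue default back to MCAD, wrapping the RayCluster in an AppWrapper as the earlier notebook text described. A toy dispatcher illustrating the flag's effect (hypothetical sketch; the SDK's real resource generation is more involved):

```python
def resource_kind(mcad: bool = False) -> str:
    """Kind of the top-level object submitted for the cluster.

    With mcad=True the RayCluster is wrapped in an AppWrapper and queued by
    MCAD (the path the InstaScale demo requires); otherwise a plain
    RayCluster is submitted and scheduled by Kueue.
    """
    return "AppWrapper" if mcad else "RayCluster"

print(resource_kind())           # -> RayCluster (Kueue default)
print(resource_kind(mcad=True))  # -> AppWrapper (MCAD / InstaScale path)
```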