Commit 0ce4a79

Added demo notebook for Kueue by default
Updated existing notebooks in favour of Kueue specific nb
Updated wording
1 parent bd49ef7 commit 0ce4a79

12 files changed

+174 -180 lines changed

Diff for: demo-notebooks/additional-demos/hf_interactive.ipynb

+9 -5
Original file line number | Diff line number | Diff line change
@@ -68,10 +68,12 @@
6868
"id": "bc27f84c",
6969
"metadata": {},
7070
"source": [
71-
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
71+
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding Ray Cluster).\n",
7272
"\n",
7373
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
74-
"The example here is a community image."
74+
"The example here is a community image.\n",
75+
"\n",
76+
"NOTE: By default the SDK uses Kueue as it's scheduling solution to use MCAD set the `mcad=True` option in `ClusterConfiguration`"
7577
]
7678
},
7779
{
@@ -89,7 +91,8 @@
8991
}
9092
],
9193
"source": [
92-
"# Create our cluster and submit appwrapper\n",
94+
"# Create our cluster and submit\n",
95+
"# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
9396
"cluster = Cluster(ClusterConfiguration(name='hfgputest', \n",
9497
" namespace=\"default\",\n",
9598
" num_workers=1,\n",
@@ -99,15 +102,16 @@
99102
" max_memory=16, \n",
100103
" num_gpus=4,\n",
101104
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
102-
" instascale=True, machine_types=[\"m5.xlarge\", \"p3.8xlarge\"]))"
105+
" # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
106+
" ))"
103107
]
104108
},
105109
{
106110
"cell_type": "markdown",
107111
"id": "12eef53c",
108112
"metadata": {},
109113
"source": [
110-
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
114+
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
111115
]
112116
},
113117
{
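
The new notebook comment says the SDK looks up the default local queue through the `kueue.x-k8s.io/default-queue: "true"` annotation. The sketch below illustrates that kind of lookup on plain dictionaries. It is not the SDK's actual implementation: the function name `find_default_local_queue`, the queue names, and the dictionary layout (modelled on Kubernetes object metadata) are assumptions made for illustration only.

```python
from typing import List, Optional

# Annotation key quoted in the notebook comment above.
DEFAULT_QUEUE_ANNOTATION = "kueue.x-k8s.io/default-queue"


def find_default_local_queue(local_queues: List[dict]) -> Optional[str]:
    """Return the name of the first queue annotated as default, or None.

    Hypothetical helper: `local_queues` is assumed to be a list of
    LocalQueue-like dicts with a Kubernetes-style `metadata` section.
    """
    for queue in local_queues:
        metadata = queue.get("metadata", {})
        # `annotations` may be missing or null on a real object.
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_QUEUE_ANNOTATION) == "true":
            return metadata.get("name")
    return None


# Hypothetical queues; only the second carries the default annotation.
example_queues = [
    {"metadata": {"name": "team-a-queue", "annotations": {}}},
    {
        "metadata": {
            "name": "team-b-queue",
            "annotations": {DEFAULT_QUEUE_ANNOTATION: "true"},
        }
    },
]

selected = find_default_local_queue(example_queues)
print(selected)
```

When no queue carries the annotation, the helper returns `None`, which corresponds to the case where the commented-out `local_queue="local-queue-name"` argument in the notebook would need to be set manually.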

Diff for: demo-notebooks/additional-demos/local_interactive.ipynb

+22 -136
@@ -2,7 +2,7 @@
22
"cells": [
33
{
44
"cell_type": "code",
5-
"execution_count": 1,
5+
"execution_count": null,
66
"id": "9a44568b-61ef-41c7-8ad1-9a3b128f03a7",
77
"metadata": {
88
"tags": []
@@ -36,7 +36,9 @@
3636
"source": [
3737
"\n",
3838
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
39-
"The example here is a community image."
39+
"The example here is a community image.\n",
40+
"\n",
41+
"NOTE: By default the SDK uses Kueue as it's scheduling solution to use MCAD set the `mcad=True` option in `ClusterConfiguration`"
4042
]
4143
},
4244
{
@@ -48,7 +50,8 @@
4850
},
4951
"outputs": [],
5052
"source": [
51-
"# Create our cluster and submit appwrapper\n",
53+
"# Create our cluster and submit\n",
54+
"# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
5255
"namespace = \"default\"\n",
5356
"cluster_name = \"hfgputest-1\"\n",
5457
"local_interactive = True\n",
@@ -63,13 +66,13 @@
6366
" max_memory=4,\n",
6467
" num_gpus=0,\n",
6568
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
66-
" instascale=False,\n",
67-
" machine_types=[\"m5.xlarge\", \"p3.8xlarge\"]))"
69+
" # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
70+
" ))"
6871
]
6972
},
7073
{
7174
"cell_type": "code",
72-
"execution_count": 3,
75+
"execution_count": null,
7376
"id": "69968140-15e6-482f-9529-82b0cd19524b",
7477
"metadata": {
7578
"tags": []
@@ -81,21 +84,12 @@
8184
},
8285
{
8386
"cell_type": "code",
84-
"execution_count": 4,
87+
"execution_count": null,
8588
"id": "e20f9982-f671-460b-8c22-3d62e101fed9",
8689
"metadata": {
8790
"tags": []
8891
},
89-
"outputs": [
90-
{
91-
"name": "stdout",
92-
"output_type": "stream",
93-
"text": [
94-
"Waiting for requested resources to be set up...\n",
95-
"Requested cluster up and running!\n"
96-
]
97-
}
98-
],
92+
"outputs": [],
9993
"source": [
10094
"cluster.wait_ready()"
10195
]
@@ -124,82 +118,13 @@
124118
},
125119
{
126120
"cell_type": "code",
127-
"execution_count": 6,
121+
"execution_count": null,
128122
"id": "9483bb98-33b3-4beb-9b15-163d7e76c1d7",
129123
"metadata": {
130124
"scrolled": true,
131125
"tags": []
132126
},
133-
"outputs": [
134-
{
135-
"name": "stderr",
136-
"output_type": "stream",
137-
"text": [
138-
"2023-06-27 19:14:16,088\tINFO client_builder.py:251 -- Passing the following kwargs to ray.init() on the server: logging_level\n",
139-
"2023-06-27 19:14:16,100\tDEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.IDLE\n",
140-
"2023-06-27 19:14:16,308\tDEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.CONNECTING\n",
141-
"2023-06-27 19:14:16,434\tDEBUG worker.py:378 -- client gRPC channel state change: ChannelConnectivity.READY\n",
142-
"2023-06-27 19:14:16,436\tDEBUG worker.py:807 -- Pinging server.\n",
143-
"2023-06-27 19:14:18,634\tDEBUG worker.py:640 -- Retaining 00ffffffffffffffffffffffffffffffffffffff0100000001000000\n",
144-
"2023-06-27 19:14:18,635\tDEBUG worker.py:564 -- Scheduling task get_dashboard_url 0 b'\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00'\n",
145-
"2023-06-27 19:14:18,645\tDEBUG worker.py:640 -- Retaining c8ef45ccd0112571ffffffffffffffffffffffff0100000001000000\n",
146-
"2023-06-27 19:14:19,454\tDEBUG worker.py:636 -- Releasing c8ef45ccd0112571ffffffffffffffffffffffff0100000001000000\n"
147-
]
148-
},
149-
{
150-
"data": {
151-
"text/html": [
152-
"<div>\n",
153-
" <div style=\"margin-left: 50px;display: flex;flex-direction: row;align-items: center\">\n",
154-
" <h3 style=\"color: var(--jp-ui-font-color0)\">Ray</h3>\n",
155-
" <svg version=\"1.1\" id=\"ray\" width=\"3em\" viewBox=\"0 0 144.5 144.6\" style=\"margin-left: 3em;margin-right: 3em\">\n",
156-
" <g id=\"layer-1\">\n",
157-
" <path fill=\"#00a2e9\" class=\"st0\" d=\"M97.3,77.2c-3.8-1.1-6.2,0.9-8.3,5.1c-3.5,6.8-9.9,9.9-17.4,9.6S58,88.1,54.8,81.2c-1.4-3-3-4-6.3-4.1\n",
158-
" c-5.6-0.1-9.9,0.1-13.1,6.4c-3.8,7.6-13.6,10.2-21.8,7.6C5.2,88.4-0.4,80.5,0,71.7c0.1-8.4,5.7-15.8,13.8-18.2\n",
159-
" c8.4-2.6,17.5,0.7,22.3,8c1.3,1.9,1.3,5.2,3.6,5.6c3.9,0.6,8,0.2,12,0.2c1.8,0,1.9-1.6,2.4-2.8c3.5-7.8,9.7-11.8,18-11.9\n",
160-
" c8.2-0.1,14.4,3.9,17.8,11.4c1.3,2.8,2.9,3.6,5.7,3.3c1-0.1,2,0.1,3,0c2.8-0.5,6.4,1.7,8.1-2.7s-2.3-5.5-4.1-7.5\n",
161-
" c-5.1-5.7-10.9-10.8-16.1-16.3C84,38,81.9,37.1,78,38.3C66.7,42,56.2,35.7,53,24.1C50.3,14,57.3,2.8,67.7,0.5\n",
162-
" C78.4-2,89,4.7,91.5,15.3c0.1,0.3,0.1,0.5,0.2,0.8c0.7,3.4,0.7,6.9-0.8,9.8c-1.7,3.2-0.8,5,1.5,7.2c6.7,6.5,13.3,13,19.8,19.7\n",
163-
" c1.8,1.8,3,2.1,5.5,1.2c9.1-3.4,17.9-0.6,23.4,7c4.8,6.9,4.6,16.1-0.4,22.9c-5.4,7.2-14.2,9.9-23.1,6.5c-2.3-0.9-3.5-0.6-5.1,1.1\n",
164-
" c-6.7,6.9-13.6,13.7-20.5,20.4c-1.8,1.8-2.5,3.2-1.4,5.9c3.5,8.7,0.3,18.6-7.7,23.6c-7.9,5-18.2,3.8-24.8-2.9\n",
165-
" c-6.4-6.4-7.4-16.2-2.5-24.3c4.9-7.8,14.5-11,23.1-7.8c3,1.1,4.7,0.5,6.9-1.7C91.7,98.4,98,92.3,104.2,86c1.6-1.6,4.1-2.7,2.6-6.2\n",
166-
" c-1.4-3.3-3.8-2.5-6.2-2.6C99.8,77.2,98.9,77.2,97.3,77.2z M72.1,29.7c5.5,0.1,9.9-4.3,10-9.8c0-0.1,0-0.2,0-0.3\n",
167-
" C81.8,14,77,9.8,71.5,10.2c-5,0.3-9,4.2-9.3,9.2c-0.2,5.5,4,10.1,9.5,10.3C71.8,29.7,72,29.7,72.1,29.7z M72.3,62.3\n",
168-
" c-5.4-0.1-9.9,4.2-10.1,9.7c0,0.2,0,0.3,0,0.5c0.2,5.4,4.5,9.7,9.9,10c5.1,0.1,9.9-4.7,10.1-9.8c0.2-5.5-4-10-9.5-10.3\n",
169-
" C72.6,62.3,72.4,62.3,72.3,62.3z M115,72.5c0.1,5.4,4.5,9.7,9.8,9.9c5.6-0.2,10-4.8,10-10.4c-0.2-5.4-4.6-9.7-10-9.7\n",
170-
" c-5.3-0.1-9.8,4.2-9.9,9.5C115,72.1,115,72.3,115,72.5z M19.5,62.3c-5.4,0.1-9.8,4.4-10,9.8c-0.1,5.1,5.2,10.4,10.2,10.3\n",
171-
" c5.6-0.2,10-4.9,9.8-10.5c-0.1-5.4-4.5-9.7-9.9-9.6C19.6,62.3,19.5,62.3,19.5,62.3z M71.8,134.6c5.9,0.2,10.3-3.9,10.4-9.6\n",
172-
" c0.5-5.5-3.6-10.4-9.1-10.8c-5.5-0.5-10.4,3.6-10.8,9.1c0,0.5,0,0.9,0,1.4c-0.2,5.3,4,9.8,9.3,10\n",
173-
" C71.6,134.6,71.7,134.6,71.8,134.6z\"/>\n",
174-
" </g>\n",
175-
" </svg>\n",
176-
" <table>\n",
177-
" <tr>\n",
178-
" <td style=\"text-align: left\"><b>Python version:</b></td>\n",
179-
" <td style=\"text-align: left\"><b>3.8.13</b></td>\n",
180-
" </tr>\n",
181-
" <tr>\n",
182-
" <td style=\"text-align: left\"><b>Ray version:</b></td>\n",
183-
" <td style=\"text-align: left\"><b> 2.1.0</b></td>\n",
184-
" </tr>\n",
185-
" <tr>\n",
186-
" <td style=\"text-align: left\"><b>Dashboard:</b></td>\n",
187-
" <td style=\"text-align: left\"><b><a href=\"http://10.254.20.41:8265\" target=\"_blank\">http://10.254.20.41:8265</a></b></td>\n",
188-
"</tr>\n",
189-
"\n",
190-
" </table>\n",
191-
" </div>\n",
192-
"</div>\n"
193-
],
194-
"text/plain": [
195-
"ClientContext(dashboard_url='10.254.20.41:8265', python_version='3.8.13', ray_version='2.1.0', ray_commit='23f34d948dae8de9b168667ab27e6cf940b3ae85', protocol_version='2022-10-05', _num_clients=1, _context_to_restore=<ray.util.client._ClientContext object at 0x108ca2730>)"
196-
]
197-
},
198-
"execution_count": 6,
199-
"metadata": {},
200-
"output_type": "execute_result"
201-
}
202-
],
127+
"outputs": [],
203128
"source": [
204129
"import ray\n",
205130
"\n",
@@ -209,7 +134,7 @@
209134
},
210135
{
211136
"cell_type": "code",
212-
"execution_count": 7,
137+
"execution_count": null,
213138
"id": "3436eb4a-217c-4109-a3c3-309fda7e2442",
214139
"metadata": {},
215140
"outputs": [],
@@ -233,80 +158,41 @@
233158
},
234159
{
235160
"cell_type": "code",
236-
"execution_count": 8,
161+
"execution_count": null,
237162
"id": "5cca1874-2be3-4631-ae48-9adfa45e3af3",
238163
"metadata": {
239164
"scrolled": true,
240165
"tags": []
241166
},
242-
"outputs": [
243-
{
244-
"name": "stderr",
245-
"output_type": "stream",
246-
"text": [
247-
"2023-06-27 19:14:28,222\tDEBUG worker.py:640 -- Retaining 00ffffffffffffffffffffffffffffffffffffff0100000002000000\n",
248-
"2023-06-27 19:14:28,222\tDEBUG worker.py:564 -- Scheduling task heavy_calculation 0 b'\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x01\\x00\\x00\\x00\\x02\\x00\\x00\\x00'\n"
249-
]
250-
}
251-
],
167+
"outputs": [],
252168
"source": [
253169
"ref = heavy_calculation.remote(3000)"
254170
]
255171
},
256172
{
257173
"cell_type": "code",
258-
"execution_count": 9,
174+
"execution_count": null,
259175
"id": "01172c29-e8bf-41ef-8db5-eccb07906111",
260176
"metadata": {},
261-
"outputs": [
262-
{
263-
"name": "stderr",
264-
"output_type": "stream",
265-
"text": [
266-
"2023-06-27 19:14:29,202\tDEBUG worker.py:640 -- Retaining 16310a0f0a45af5cffffffffffffffffffffffff0100000001000000\n",
267-
"2023-06-27 19:14:31,224\tDEBUG worker.py:439 -- Internal retry for get [ClientObjectRef(16310a0f0a45af5cffffffffffffffffffffffff0100000001000000)]\n"
268-
]
269-
},
270-
{
271-
"data": {
272-
"text/plain": [
273-
"1789.4644387076714"
274-
]
275-
},
276-
"execution_count": 9,
277-
"metadata": {},
278-
"output_type": "execute_result"
279-
}
280-
],
177+
"outputs": [],
281178
"source": [
282179
"ray.get(ref)"
283180
]
284181
},
285182
{
286183
"cell_type": "code",
287-
"execution_count": 10,
184+
"execution_count": null,
288185
"id": "9e79b547-a457-4232-b77d-19147067b972",
289186
"metadata": {},
290-
"outputs": [
291-
{
292-
"name": "stderr",
293-
"output_type": "stream",
294-
"text": [
295-
"2023-06-27 19:14:33,161\tDEBUG dataclient.py:287 -- Got unawaited response connection_cleanup {\n",
296-
"}\n",
297-
"\n",
298-
"2023-06-27 19:14:34,460\tDEBUG dataclient.py:278 -- Shutting down data channel.\n"
299-
]
300-
}
301-
],
187+
"outputs": [],
302188
"source": [
303189
"ray.cancel(ref)\n",
304190
"ray.shutdown()"
305191
]
306192
},
307193
{
308194
"cell_type": "code",
309-
"execution_count": 11,
195+
"execution_count": null,
310196
"id": "2c198f1f-68bf-43ff-a148-02b5cb000ff2",
311197
"metadata": {},
312198
"outputs": [],
@@ -339,7 +225,7 @@
339225
"name": "python",
340226
"nbconvert_exporter": "python",
341227
"pygments_lexer": "ipython3",
342-
"version": "3.8.17"
228+
"version": "3.9.18"
343229
},
344230
"vscode": {
345231
"interpreter": {

Diff for: demo-notebooks/guided-demos/0_basic_ray.ipynb

+8 -5
@@ -45,10 +45,12 @@
4545
"id": "bc27f84c",
4646
"metadata": {},
4747
"source": [
48-
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
48+
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding RayCluster).\n",
4949
"\n",
5050
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
51-
"The example here is a community image."
51+
"The example here is a community image.\n",
52+
"\n",
53+
"NOTE: By default the SDK uses Kueue as it's scheduling solution to use MCAD set the `mcad=True` option in `ClusterConfiguration`"
5254
]
5355
},
5456
{
@@ -58,7 +60,8 @@
5860
"metadata": {},
5961
"outputs": [],
6062
"source": [
61-
"# Create and configure our cluster object (and appwrapper)\n",
63+
"# Create and configure our cluster object\n",
64+
"# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
6265
"cluster = Cluster(ClusterConfiguration(\n",
6366
" name='raytest',\n",
6467
" namespace='default',\n",
@@ -69,7 +72,7 @@
6972
" max_memory=4,\n",
7073
" num_gpus=0,\n",
7174
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
72-
" instascale=False\n",
75+
" # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
7376
"))"
7477
]
7578
},
@@ -78,7 +81,7 @@
7881
"id": "12eef53c",
7982
"metadata": {},
8083
"source": [
81-
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
84+
"Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
8285
]
8386
},
8487
{
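
Most of the `local_interactive.ipynb` changes in this commit reset `execution_count` to `null` and empty the `outputs` lists of code cells. A minimal sketch of that cleanup, using only the standard library, is shown below. It assumes the usual notebook JSON layout (`cells`, `cell_type`, `outputs`, `execution_count`); the function name `clear_notebook_outputs` and the sample notebook are hypothetical, and this is not necessarily how the commit author produced the change.

```python
import json


def clear_notebook_outputs(notebook: dict) -> dict:
    """Empty the outputs and reset execution counts of all code cells in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # serialised as null in JSON
    return notebook


# Hypothetical two-cell notebook used only to exercise the helper.
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
        {
            "cell_type": "code",
            "execution_count": 4,
            "metadata": {},
            "outputs": [
                {"name": "stdout", "output_type": "stream", "text": ["done\n"]}
            ],
            "source": ["cluster.wait_ready()"],
        },
    ]
}

cleaned = clear_notebook_outputs(example_notebook)
print(json.dumps(cleaned["cells"][1], indent=1))
```

Markdown cells are left untouched, since they carry neither `outputs` nor `execution_count`.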

Diff for: demo-notebooks/guided-demos/1_basic_instascale.ipynb

+7 -2
@@ -5,7 +5,9 @@
55
"id": "9865ee8c",
66
"metadata": {},
77
"source": [
8-
"In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments)."
8+
"In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments).\n",
9+
"\n",
10+
"NOTE: The InstaScale and MCAD components are in Tech Preview"
911
]
1012
},
1113
{
@@ -45,7 +47,9 @@
4547
"This time, we are working in a cloud environment, and our OpenShift cluster does not have the resources needed for our desired workloads. We will use InstaScale to dynamically scale-up guaranteed resources based on our request (that will also automatically scale-down when we are finished working):\n",
4648
"\n",
4749
"NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
48-
"The example here is a community image."
50+
"The example here is a community image.\n",
51+
"\n",
52+
"NOTE: This specific demo requires MCAD and InstaScale to be enabled on the Cluster"
4953
]
5054
},
5155
{
@@ -66,6 +70,7 @@
6670
" max_memory=8,\n",
6771
" num_gpus=1,\n",
6872
" image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
73+
" mcad=True, # Enable MCAD\n",
6974
" instascale=True, # InstaScale now enabled, will scale OCP cluster to guarantee resource request\n",
7075
" machine_types=[\"m5.xlarge\", \"g4dn.xlarge\"] # Head, worker AWS machine types desired\n",
7176
"))"
