
Commit 989b37c

committed
Updated existing notebooks in favour of Kueue specific nb
1 parent: 446bcde

14 files changed: +98 −744 lines

Diff for: demo-notebooks/additional-demos/hf_interactive.ipynb (+10 −6)

@@ -68,10 +68,13 @@
    "id": "bc27f84c",
    "metadata": {},
    "source": [
-   "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
+   "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding Ray Cluster).\n",
    "\n",
    "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-   "The example here is a community image."
+   "The example here is a community image.\n",
+   "\n",
+   "NOTE: By default the SDK uses Kueue as its scheduling solution. \n",
+   "MCAD can be enabled over Kueue by using the `mcad=True` option in `ClusterConfiguration`"
    ]
   },
   {
@@ -89,7 +92,8 @@
    }
   ],
   "source": [
-   "# Create our cluster and submit appwrapper\n",
+   "# Create our cluster and submit\n",
+   "# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
    "cluster = Cluster(ClusterConfiguration(name='hfgputest', \n",
    "                                       namespace=\"default\",\n",
    "                                       num_workers=1,\n",
@@ -99,16 +103,16 @@
    "                                       max_memory=16, \n",
    "                                       num_gpus=4,\n",
    "                                       image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-   "                                       mcad=True,\n",
-   "                                       instascale=True, machine_types=[\"m5.xlarge\", \"p3.8xlarge\"]))"
+   "                                       # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
+   "                                       ))"
    ]
   },
   {
   "cell_type": "markdown",
   "id": "12eef53c",
   "metadata": {},
   "source": [
-   "Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
+   "Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
    ]
   },
   {
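The comment added in this hunk says the SDK will look for the local queue annotated as Kueue's default. A minimal sketch of that lookup logic, assuming queue objects shaped like Kubernetes `LocalQueue` manifests (the helper name is hypothetical; the real codeflare-sdk queries the Kubernetes API):

```python
# Hypothetical sketch of the default-LocalQueue discovery described in the
# added comment; not the SDK's actual implementation.
def find_default_local_queue(local_queues):
    """Return the name of the LocalQueue annotated as Kueue's default, or None."""
    for lq in local_queues:
        annotations = lq.get("metadata", {}).get("annotations", {})
        if annotations.get("kueue.x-k8s.io/default-queue") == "true":
            return lq["metadata"]["name"]
    return None

queues = [
    {"metadata": {"name": "team-a-queue", "annotations": {}}},
    {"metadata": {"name": "default-queue",
                  "annotations": {"kueue.x-k8s.io/default-queue": "true"}}},
]
print(find_default_local_queue(queues))  # -> default-queue
```

If no queue carries the annotation, the commented-out `local_queue` parameter in `ClusterConfiguration` is the manual fallback.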

Diff for: demo-notebooks/additional-demos/local_interactive.ipynb (+23 −137)

@@ -2,7 +2,7 @@
  "cells": [
   {
   "cell_type": "code",
-  "execution_count": 1,
+  "execution_count": null,
   "id": "9a44568b-61ef-41c7-8ad1-9a3b128f03a7",
   "metadata": {
    "tags": []
@@ -36,7 +36,10 @@
   "source": [
   "\n",
   "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-  "The example here is a community image."
+  "The example here is a community image.\n",
+  "\n",
+  "NOTE: By default the SDK uses Kueue as its scheduling solution. \n",
+  "MCAD can be enabled over Kueue by using the `mcad=True` option in `ClusterConfiguration`"
   ]
  },
  {
@@ -48,7 +51,8 @@
   },
   "outputs": [],
   "source": [
-  "# Create our cluster and submit appwrapper\n",
+  "# Create our cluster and submit\n",
+  "# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
   "namespace = \"default\"\n",
   "cluster_name = \"hfgputest-1\"\n",
   "local_interactive = True\n",
@@ -63,14 +67,13 @@
   "                  max_memory=4,\n",
   "                  num_gpus=0,\n",
   "                  image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-  "                  mcad=True,\n",
-  "                  instascale=False,\n",
-  "                  machine_types=[\"m5.xlarge\", \"p3.8xlarge\"]))"
+  "                  # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
+  "                  ))"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 3,
+  "execution_count": null,
   "id": "69968140-15e6-482f-9529-82b0cd19524b",
   "metadata": {
    "tags": []
@@ -82,21 +85,12 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 4,
+  "execution_count": null,
   "id": "e20f9982-f671-460b-8c22-3d62e101fed9",
   "metadata": {
    "tags": []
   },
-  "outputs": [ (stored stream output removed: "Waiting for requested resources to be set up..." / "Requested cluster up and running!") ],
+  "outputs": [],
   "source": [
   "cluster.wait_ready()"
   ]
@@ -125,82 +119,13 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 6,
+  "execution_count": null,
   "id": "9483bb98-33b3-4beb-9b15-163d7e76c1d7",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
-  "outputs": [ (≈70 lines of stored output removed: Ray client gRPC debug logs plus the rendered Ray dashboard HTML/SVG banner reporting Python 3.8.13, Ray 2.1.0, dashboard http://10.254.20.41:8265) ],
+  "outputs": [],
   "source": [
   "import ray\n",
   "\n",
@@ -210,7 +135,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 7,
+  "execution_count": null,
   "id": "3436eb4a-217c-4109-a3c3-309fda7e2442",
   "metadata": {},
   "outputs": [],
@@ -234,80 +159,41 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 8,
+  "execution_count": null,
   "id": "5cca1874-2be3-4631-ae48-9adfa45e3af3",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
-  "outputs": [ (stored stream output removed: Ray task-scheduling debug logs) ],
+  "outputs": [],
   "source": [
   "ref = heavy_calculation.remote(3000)"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 9,
+  "execution_count": null,
   "id": "01172c29-e8bf-41ef-8db5-eccb07906111",
   "metadata": {},
-  "outputs": [ (stored output removed: Ray client debug logs and the execute_result 1789.4644387076714) ],
+  "outputs": [],
   "source": [
   "ray.get(ref)"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 10,
+  "execution_count": null,
   "id": "9e79b547-a457-4232-b77d-19147067b972",
   "metadata": {},
-  "outputs": [ (stored stream output removed: Ray data-channel shutdown debug logs) ],
+  "outputs": [],
   "source": [
   "ray.cancel(ref)\n",
   "ray.shutdown()"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 11,
+  "execution_count": null,
   "id": "2c198f1f-68bf-43ff-a148-02b5cb000ff2",
   "metadata": {},
   "outputs": [],
@@ -340,7 +226,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.8.17"
+  "version": "3.9.18"
  },
  "vscode": {
   "interpreter": {

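Most of this file's diff simply clears stored cell outputs and resets each `execution_count` to null. That cleanup can be reproduced across notebooks with a short standard-library sketch (tools like nbstripout or `jupyter nbconvert --clear-output` do the same job):

```python
import json

def clear_notebook(nb: dict) -> dict:
    """Reset outputs and execution counts of all code cells, as this commit does."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# A minimal cell mirroring the ones stripped in this diff.
nb = {"cells": [{"cell_type": "code", "execution_count": 6,
                 "outputs": [{"output_type": "stream", "text": ["..."]}],
                 "source": ["cluster.wait_ready()"]}]}
print(json.dumps(clear_notebook(nb)["cells"][0]["execution_count"]))  # -> null
```

Python's `None` serializes to JSON `null`, which is exactly the `"execution_count": null` value the diff introduces.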
Diff for: demo-notebooks/guided-demos/0_basic_ray.ipynb (+9 −6)

@@ -45,10 +45,13 @@
   "id": "bc27f84c",
   "metadata": {},
   "source": [
-  "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).\n",
+  "Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding RayCluster).\n",
   "\n",
   "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-  "The example here is a community image."
+  "The example here is a community image.\n",
+  "\n",
+  "NOTE: By default the SDK uses Kueue as its scheduling solution. \n",
+  "MCAD can be enabled over Kueue by using the `mcad=True` option in `ClusterConfiguration`"
   ]
  },
  {
@@ -58,7 +61,8 @@
   "metadata": {},
   "outputs": [],
   "source": [
-  "# Create and configure our cluster object (and appwrapper)\n",
+  "# Create and configure our cluster object\n",
+  "# The SDK will try to find the name of your default local queue based on the annotation \"kueue.x-k8s.io/default-queue\": \"true\"\n",
   "cluster = Cluster(ClusterConfiguration(\n",
   "    name='raytest',\n",
   "    namespace='default',\n",
@@ -69,8 +73,7 @@
   "    max_memory=4,\n",
   "    num_gpus=0,\n",
   "    image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-  "    mcad=True,\n",
-  "    instascale=False\n",
+  "    # local_queue=\"local-queue-name\" # Specify the local queue manually\n",
   "))"
   ]
  },
@@ -79,7 +82,7 @@
   "id": "12eef53c",
   "metadata": {},
   "source": [
-  "Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster."
+  "Next, we want to bring our cluster up, so we call the `up()` function below to submit our Ray Cluster onto the queue, and begin the process of obtaining our resource cluster."
   ]
  },
  {
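Under Kueue, queue assignment happens through the `kueue.x-k8s.io/queue-name` label on the submitted workload. A hedged sketch of how an explicit `local_queue` (the commented-out option above) or a discovered default could be turned into that label — the helper and its parameters are illustrative, not the SDK's actual code:

```python
# Kueue's well-known label for assigning a workload to a LocalQueue.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"

def queue_labels(local_queue=None, default_queue=None):
    """Build the Kueue queue-name label for a RayCluster-like workload.

    `local_queue` mirrors the manual ClusterConfiguration option;
    `default_queue` stands in for the annotation-discovered default.
    """
    name = local_queue or default_queue
    if name is None:
        raise ValueError("no local queue specified and no default found")
    return {KUEUE_QUEUE_LABEL: name}

print(queue_labels(default_queue="default-queue"))
# -> {'kueue.x-k8s.io/queue-name': 'default-queue'}
print(queue_labels(local_queue="team-a", default_queue="default-queue"))
# -> {'kueue.x-k8s.io/queue-name': 'team-a'}
```

An explicitly passed `local_queue` wins over the discovered default, matching the "Specify the local queue manually" comment in the diff.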

Diff for: demo-notebooks/guided-demos/1_basic_instascale.ipynb (+7 −3)

@@ -5,7 +5,9 @@
   "id": "9865ee8c",
   "metadata": {},
   "source": [
-  "In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments)."
+  "In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments).\n",
+  "\n",
+  "NOTE: The InstaScale and MCAD components are in Tech Preview"
   ]
  },
  {
@@ -45,7 +47,9 @@
   "This time, we are working in a cloud environment, and our OpenShift cluster does not have the resources needed for our desired workloads. We will use InstaScale to dynamically scale-up guaranteed resources based on our request (that will also automatically scale-down when we are finished working):\n",
   "\n",
   "NOTE: We must specify the `image` which will be used in our RayCluster, we recommend you bring your own image which suits your purposes. \n",
-  "The example here is a community image."
+  "The example here is a community image.\n",
+  "\n",
+  "NOTE: This specific demo requires MCAD and InstaScale to be enabled on the Cluster"
   ]
  },
  {
@@ -66,7 +70,7 @@
   "    max_memory=8,\n",
   "    num_gpus=1,\n",
   "    image=\"quay.io/project-codeflare/ray:latest-py39-cu118\",\n",
-  "    mcad=True,\n",
+  "    mcad=True, # Enable MCAD\n",
   "    instascale=True, # InstaScale now enabled, will scale OCP cluster to guarantee resource request\n",
   "    machine_types=[\"m5.xlarge\", \"g4dn.xlarge\"] # Head, worker AWS machine types desired\n",
   "))"

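Across these notebooks, the `mcad=True` flag is what switches the SDK from its Kueue default back to MCAD, wrapping the RayCluster in an AppWrapper as the earlier notebook text described. A toy dispatcher illustrating the flag's effect (hypothetical sketch; the SDK's real resource generation is more involved):

```python
def resource_kind(mcad: bool = False) -> str:
    """Kind of the top-level object submitted for the cluster.

    With mcad=True the RayCluster is wrapped in an AppWrapper and queued by
    MCAD (the path the InstaScale demo requires); otherwise a plain
    RayCluster is submitted and scheduled by Kueue.
    """
    return "AppWrapper" if mcad else "RayCluster"

print(resource_kind())           # -> RayCluster (Kueue default)
print(resource_kind(mcad=True))  # -> AppWrapper (MCAD / InstaScale path)
```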