Skip to content

Use Kueue as default #470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 5, 2024
Merged

Conversation

Bobbins228
Copy link
Contributor

@Bobbins228 Bobbins228 commented Feb 26, 2024

Issue link

RHOAIENG-1058

What changes have been made

KubeRay and Kueue are now the primary resources responsible for creating Ray Clusters through the SDK.
mcad=False by default meaning Ray Clusters are created via Kube Ray and Routes/Ingresses are created seperately.

The local queue label is applied to the Ray Cluster when mcad=False.
The SDK will try to gather the default local queue's name by reading the annotation "kueue.x-k8s.io/default-queue":"true" from the local queue. If the SDK is unsuccessful it will raise an error asking for the LQ name to be specified in the ClusterConfiguration.

Verification steps

There are two ways to test this.

  1. SDK gathers the default local queue name
  2. User provides the local queue name

1: SDK gathers the default local queue name

  • Create a Local Queue with the new annotation (or add the annotation to your own LQ).
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
    namespace: ns
    name: local_queue_name
    annotations:
      "kueue.x-k8s.io/default-queue": "true"
spec:
  clusterQueue: cluster_queue_name
  • Run through a default notebook.
  • A yaml file will be generated with the kueue.x-k8s.io/queue-name: LQ_NAME label
  • Create the Ray Cluster
  • Check if a Workload was created for the Ray Cluster

2: User provides the Local Queue name

Note: This method is taken into account over the SDK gathering the Local Queue name

  • Run through a demo notebook
  • Add the parameter local_queue="LOCAL_QUEUE_NAME" to the ClusterConfiguration
  • Check the generated yaml file for the appropriate label
  • Create the Ray Cluster
  • Check if a Workload was created for the Ray Cluster

Other Changes

The status for RayClusters includes suspended now.
Altered list_all_queued and list_all_clusters to reflect this.
Added the functions above to the simple import scheme.

Note: This PR should not be merged until RHOAIENG-1056 and RHOAIENG-3270 are merged.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@Bobbins228 Bobbins228 added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2024
Copy link
Collaborator

@KPostOffice KPostOffice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A few comments

@dimakis
Copy link
Contributor

dimakis commented Feb 27, 2024

@Bobbins228 This PR needs to be carefully timed as to when it will be merged. just making sure this is thought about before someone merges it

@Bobbins228 Bobbins228 requested a review from KPostOffice March 4, 2024 15:18
@astefanutti
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 14, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 20, 2024
Copy link
Contributor

openshift-ci bot commented Mar 20, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from astefanutti. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 2, 2024
@dgrove-oss
Copy link
Collaborator

/lgtm

I'll rebase #484 ontop of this as soon as this is merged. There will be some minor merge conflicts, but should be manageable.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 2, 2024
@Bobbins228
Copy link
Contributor Author

@dgrove-oss
This PR now just depends on the two following issues and it can be merged 👍
https://issues.redhat.com/browse/RHOAIENG-1056
https://issues.redhat.com/browse/RHOAIENG-1031

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 4, 2024
@dimakis
Copy link
Contributor

dimakis commented Apr 4, 2024

@dgrove-oss This PR now just depends on the two following issues and it can be merged 👍 https://issues.redhat.com/browse/RHOAIENG-1056 https://issues.redhat.com/browse/RHOAIENG-1031

@Bobbins228 https://issues.redhat.com/browse/RHOAIENG-1031 has been merged by the looks of it so we're now just waiting for #495

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 5, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 5, 2024
@Bobbins228 Bobbins228 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 5, 2024
@dimakis dimakis requested a review from astefanutti April 5, 2024 16:04
Updated oauth test to have mcad=True

Changed .codeflare/appwrappers to .codeflare/resources

Addressed comments & added SUSPENDED status

Review changes & list_cluster functions

Updated tests and load_components

Update tests, Rebase
@astefanutti
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 5, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 5, 2024
Copy link
Contributor

openshift-ci bot commented Apr 5, 2024

New changes are detected. LGTM label has been removed.

@dimakis dimakis merged commit 017fa12 into project-codeflare:main Apr 5, 2024
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants