Skip to content

Remove creation of OAuth resources/logic and remove openshift_oauth option #480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ChristianZaccaria
Copy link
Collaborator

@ChristianZaccaria ChristianZaccaria commented Mar 12, 2024

Issue link

Issue: https://issues.redhat.com/browse/RHOAIENG-1993

What changes have been made

  • This PR removes the logic in the SDK that creates the OAuth resources.
  • Remove openshift_oauth from ClusterConfiguration as defaulting to True.
  • Added COOKIE_SECRET env for oauth proxy container. - Here
  • Removed unit tests related to the creation/deletion of OAuth resources.
  • Adjusted test cases and unit tests.
  • Verified this change works with the RayCluster Controller
  • Add verify_tls option to ClusterConfiguration: this option is set to True by default but still has the option to change for debugging. Based on this option the endpoint will be verified, if set to false, it alters if the user verifies that the certificate is trusted.

Verification steps

To test these changes:
0. In an OpenShift cluster.

  1. Checkout this PR RayCluster Controller. In the CFO repository you can run the CFO in your cluster with make run NAMESPACE=default and make install.
  2. Deploy and install KubeRay
  3. Checkout this PR and install the SDK
  4. Run through this notebook, change between verify_tls=True/False.
  5. Once cluster.up() is ran, the oauth-proxy sidecar is created by default if on an OpenShift cluster. You should find several resources created including a ServiceAccount named jobtest-oauth-proxy and a route named ray-dashboard-jobtest which were created by the RayCluster Controller as opposed by the SDK.
  6. The route should be accessible through authentication.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Copy link
Collaborator

@KPostOffice KPostOffice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one comment

@ChristianZaccaria ChristianZaccaria force-pushed the remove-oauth-logic branch 2 times, most recently from c08cac1 to 03737c8 Compare March 21, 2024 10:16
@Srihari1192 Srihari1192 self-requested a review March 21, 2024 10:48
@Srihari1192
Copy link
Contributor

@ChristianZaccaria
Ray Dashboard route is accessible successfully now, but noticed below error during ray cluster creation in Controller logs

2024-03-21T15:57:08+05:30       ERROR   Failed to update RayCluster with finalizer      {"controller": "codeflare-raycluster-controller", "controllerGroup": "ray.io", "controllerKind": "RayCluster", "RayCluster": {"name":"jobtest-1","namespace":"default"}, "namespace": "default", "name": "jobtest-1", "reconcileID": "8f0aa895-6eb3-40d7-9704-b5b9a232c76a", "requeueing": "true", "error": "Operation cannot be fulfilled on rayclusters.ray.io \"jobtest-1\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/project-codeflare/codeflare-operator/controllers.(*RayClusterReconciler).Reconcile
       /Users/hari/Development/codeflare-operator/controllers/raycluster_controller.go:109
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226
2024-03-21T15:57:08+05:30       ERROR   Reconciler error        {"controller": "codeflare-raycluster-controller", "controllerGroup": "ray.io", "controllerKind": "RayCluster", "RayCluster": {"name":"jobtest-1","namespace":"default"}, "namespace": "default", "name": "jobtest-1", "reconcileID": "8f0aa895-6eb3-40d7-9704-b5b9a232c76a", "error": "Operation cannot be fulfilled on rayclusters.ray.io \"jobtest-1\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2```

@Bobbins228
Copy link
Contributor

Bobbins228 commented Mar 21, 2024

Hi @Srihari1192,
That issue seems to be related to the Ray Cluster Controller more so than the SDK changes made here

@ChristianZaccaria
Copy link
Collaborator Author

@ChristianZaccaria Ray Dashboard route is accessible successfully now, but noticed below error during ray cluster creation in Controller logs

2024-03-21T15:57:08+05:30       ERROR   Failed to update RayCluster with finalizer      {"controller": "codeflare-raycluster-controller", "controllerGroup": "ray.io", "controllerKind": "RayCluster", "RayCluster": {"name":"jobtest-1","namespace":"default"}, "namespace": "default", "name": "jobtest-1", "reconcileID": "8f0aa895-6eb3-40d7-9704-b5b9a232c76a", "requeueing": "true", "error": "Operation cannot be fulfilled on rayclusters.ray.io \"jobtest-1\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/project-codeflare/codeflare-operator/controllers.(*RayClusterReconciler).Reconcile
       /Users/hari/Development/codeflare-operator/controllers/raycluster_controller.go:109
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226
2024-03-21T15:57:08+05:30       ERROR   Reconciler error        {"controller": "codeflare-raycluster-controller", "controllerGroup": "ray.io", "controllerKind": "RayCluster", "RayCluster": {"name":"jobtest-1","namespace":"default"}, "namespace": "default", "name": "jobtest-1", "reconcileID": "8f0aa895-6eb3-40d7-9704-b5b9a232c76a", "error": "Operation cannot be fulfilled on rayclusters.ray.io \"jobtest-1\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
       /Users/hari/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2```

cc: @KPostOffice

Copy link
Collaborator

@KPostOffice KPostOffice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 22, 2024
@KPostOffice
Copy link
Collaborator

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 28, 2024
@ChristianZaccaria
Copy link
Collaborator Author

Note: I performed a rebased

@ChristianZaccaria ChristianZaccaria changed the title Remove creation of OAuth resources/logic and add annotation Remove creation of OAuth resources/logic and remove openshift_oauth option Apr 2, 2024
@Bobbins228
Copy link
Contributor

This needs a pre-commit run --all-files run on the repo

Copy link
Contributor

@Bobbins228 Bobbins228 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 3, 2024
@Bobbins228 Bobbins228 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 3, 2024
@Bobbins228
Copy link
Contributor

/approve

Copy link
Contributor

openshift-ci bot commented Apr 3, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobbins228, KPostOffice

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Bobbins228,KPostOffice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Bobbins228 Bobbins228 merged commit 403cca6 into project-codeflare:main Apr 3, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants