Skip to content

remove: DDPJobDefinition from SDK #498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

VanillaSpoon
Copy link
Contributor

Issue link

https://issues.redhat.com/browse/RHOAIENG-5096

What changes have been made

This pr contains a removal of the DDPJobDefinition from the codeflare-sdk, as well as updated tests to reflect the change.

A follow up pr will be made to update the docs and demo notebooks upon being merged.

Verification steps

Run the sdk tests and ensure they are functioning correctly.
Run a demo notebook to ensure no breaks.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Copy link
Contributor

@Bobbins228 Bobbins228 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 5, 2024
@Srihari1192
Copy link
Contributor

@VanillaSpoon Tried running tests mnist_raycluster_sdk_oauth_test.py and demo notebooks with cluster configuration defined , App wrapper creation is fine but Ray Cluster creation is Failing..

Tried running on installing Main branch aswell , same error
@ChristianZaccaria Does we missed any cluster configuration here?

tests/e2e/mnist_raycluster_sdk_oauth_test.py::TestRayClusterSDKOauth::test_mnist_ray_cluster_sdk_auth Insecure request warnings have been disabled Written to: /opt/app-root/src/.codeflare/appwrapper/mnist.yaml AppWrapper 'mnist' has been created in the namespace: 'test-ns-8iw5r' ╭──────────────────────╮ │ 🚀 Cluster Queue │ │ Status 🚀 │ │ +-------+----------+ │ │ | Name | Status | │ │ +=======+==========+ │ │ | mnist | queueing | │ │ | | | │ │ +-------+----------+ │ ╰──────────────────────╯ Waiting for requested resources to be set up...

cluster = Cluster(ClusterConfiguration( name='jobtest', namespace='test-dw', num_workers=2, head_cpus="500m", head_memory=2, min_cpus=1, max_cpus=1, min_memory=4, max_memory=4, num_gpus=0, image="quay.io/project-codeflare/ray:latest-py39-cu118", instascale=False, write_to_file=True, ))

@ChristianZaccaria
Copy link
Collaborator

ChristianZaccaria commented Apr 5, 2024

@Srihari1192 Those clusterconfiguration look good to me, the openshift_oauth is removed as we are defaulting to it to be true. And all oauth resources are created by the RC Controller in Main. cc: @KPostOffice

@KPostOffice
Copy link
Collaborator

There are some examples in the demo notebooks that use these classes. Not sure what the best course of action is (remove notebooks?)

@KPostOffice
Copy link
Collaborator

You can remove torchx_config function in cluster.py and torchx_scheduler = "ray" as well as imports involving torchx at the top of unit_test.py and remove torchx from requirements.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 5, 2024
Copy link
Contributor

openshift-ci bot commented Apr 5, 2024

New changes are detected. LGTM label has been removed.

Copy link
Contributor

openshift-ci bot commented Apr 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from bobbins228. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dimakis dimakis merged commit 88b2518 into project-codeflare:main Apr 5, 2024
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants