RHOAIENG-11046 - Add support for AMD GPU image #68

Merged

jiripetrlik
Contributor

Issue link

What changes have been made

Verification steps

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@@ -8,4 +8,5 @@ const (
CodeFlareSDKVersion = "v0.20.2"
RayVersion = "2.35.0"
RayImage = "quay.io/modh/ray:2.35.0-py39-cu121"
RayAMDGpuImage = "quay.io/rhoai/ray:2.35.0-py39-rocm61-torch24-fa26"
Contributor

The ROCm Ray image quay.io/rhoai/ray:2.35.0-py39-rocm61-torch24-fa26 is Torch-compatible.
IMO, we should use the base Ray AMD/ROCm image "2.35.0-py39-rocm61" built with Konflux:
quay.io/modh/ray:2.35.0-py39-rocm61

@jiripetrlik
Contributor Author

jiripetrlik commented Sep 27, 2024

Hello @astefanutti, can you please help us find out which image is the right one?

@astefanutti
Collaborator

@jiripetrlik Yes, it's preferable to default to quay.io/modh/ray:2.35.0-py39-rocm61, as @ChughShilpa suggested.

@jiripetrlik
Contributor Author

@ChughShilpa @astefanutti Thank you for clarifying the image name! It should be fixed now.

Contributor

@sutaakar sutaakar left a comment

/lgtm

@@ -78,6 +79,10 @@ func GetRayImage() string {
return lookupEnvOrDefault(CodeFlareTestRayImage, RayImage)
}

func GetRayAMDGpuImage() string {
Collaborator

I'd suggest calling it GetRayROCmImage, and maybe renaming CodeFlareTestRayImage to CodeFlareTestRayCUDAImage.

openshift-ci bot commented Sep 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti, ChughShilpa, jiripetrlik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 05a5ffd into project-codeflare:main Sep 30, 2024
3 checks passed
4 participants