Testing r220 #5

Open: wants to merge 9 commits into base: main

30 changes: 3 additions & 27 deletions .github/workflows/e2e_tests.yaml
@@ -79,7 +79,7 @@ jobs:
- name: Set up specific Python version
uses: actions/setup-python@v5
with:
-python-version: '3.9'
+python-version: '3.9.18'
cache: 'pip' # caching pip dependencies

- name: Setup and start KinD cluster
@@ -90,6 +90,7 @@ jobs:
run: |
cd codeflare-operator
echo Setting up CodeFlare stack
+sed -i 's|certGeneratorImage: .*|certGeneratorImage: registry.redhat.io/ubi9@sha256:770cf07083e1c85ae69c25181a205b7cdef63c11b794c89b3b487d4670b4c328|' config/e2e/config.yaml
make setup-e2e
echo Deploying CodeFlare operator
IMG="${REGISTRY_ADDRESS}"/codeflare-operator
@@ -98,39 +99,14 @@ jobs:
kubectl wait --timeout=120s --for=condition=Available=true deployment -n openshift-operators codeflare-operator-manager
cd ..

- name: Add user to KinD
uses: ./common/github-actions/kind-add-user
with:
user-name: sdk-user

- name: Add kueue resources
run: kubectl apply --server-side -f "https://github.com/kubernetes-sigs/kueue/releases/download/v0.6.2/manifests.yaml"

- name: Configure RBAC for sdk user with limited permissions
run: |
kubectl create clusterrole list-ingresses --verb=get,list --resource=ingresses
kubectl create clusterrolebinding sdk-user-list-ingresses --clusterrole=list-ingresses --user=sdk-user
kubectl create clusterrole namespace-creator --verb=get,list,create,delete,patch --resource=namespaces
kubectl create clusterrolebinding sdk-user-namespace-creator --clusterrole=namespace-creator --user=sdk-user
kubectl create clusterrole raycluster-creator --verb=get,list,create,delete,patch --resource=rayclusters
kubectl create clusterrolebinding sdk-user-raycluster-creator --clusterrole=raycluster-creator --user=sdk-user
kubectl create clusterrole appwrapper-creator --verb=get,list,create,delete,patch --resource=appwrappers
kubectl create clusterrolebinding sdk-user-appwrapper-creator --clusterrole=appwrapper-creator --user=sdk-user
kubectl create clusterrole resourceflavor-creator --verb=get,list,create,delete --resource=resourceflavors
kubectl create clusterrolebinding sdk-user-resourceflavor-creator --clusterrole=resourceflavor-creator --user=sdk-user
kubectl create clusterrole clusterqueue-creator --verb=get,list,create,delete,patch --resource=clusterqueues
kubectl create clusterrolebinding sdk-user-clusterqueue-creator --clusterrole=clusterqueue-creator --user=sdk-user
kubectl create clusterrole localqueue-creator --verb=get,list,create,delete,patch --resource=localqueues
kubectl create clusterrolebinding sdk-user-localqueue-creator --clusterrole=localqueue-creator --user=sdk-user
kubectl create clusterrole list-secrets --verb=get,list --resource=secrets
kubectl create clusterrolebinding sdk-user-list-secrets --clusterrole=list-secrets --user=sdk-user
kubectl config use-context sdk-user

- name: Run e2e tests
run: |
export CODEFLARE_TEST_OUTPUT_DIR=${{ env.TEMP_DIR }}
echo "CODEFLARE_TEST_OUTPUT_DIR=${CODEFLARE_TEST_OUTPUT_DIR}" >> $GITHUB_ENV

python --version
set -euo pipefail
pip install poetry
poetry install --with test,docs
279 changes: 103 additions & 176 deletions poetry.lock

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions pyproject.toml
@@ -20,10 +20,10 @@ homepage = "https://github.com/project-codeflare/codeflare-sdk"
keywords = ['codeflare', 'python', 'sdk', 'client', 'batch', 'scale']

[tool.poetry.dependencies]
-python = "^3.8"
+python = "^3.9"
openshift-client = "1.0.18"
rich = "^12.5"
-ray = {version = "2.7.0", extras = ["data", "default"]}
+ray = {version = "2.20.0", extras = ["data", "default"]}
kubernetes = ">= 25.3.0, < 27"
codeflare-torchx = "0.6.0.dev2"
cryptography = "40.0.2"
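
Review note on the `python` constraint change: Poetry's caret syntax is generally read as "compatible up to the next major version", so `^3.9` corresponds to `>=3.9,<4.0` and the bump drops Python 3.8 from the supported range. A minimal sketch of that rule follows, assuming a non-zero major version and purely numeric version strings. The helper names `parse_version` and `satisfies_caret` are hypothetical and are not part of Poetry.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.9.18' into (3, 9, 18)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_caret(version, constraint):
    """Check a version against a caret constraint such as '^3.9'.

    Simplified on purpose: assumes a non-zero major version, so the
    upper bound is simply the next major release.
    """
    lower = parse_version(constraint.lstrip("^"))
    upper = (lower[0] + 1,)
    candidate = parse_version(version)
    return lower <= candidate < upper
```

Under these assumptions, `satisfies_caret("3.9.18", "^3.9")` holds for the interpreter pinned in the workflow, while `satisfies_caret("3.8.10", "^3.9")` does not.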
6 changes: 3 additions & 3 deletions src/codeflare_sdk/templates/base-template.yaml
@@ -21,7 +21,7 @@ spec:
# - kubernetes
spec:
# The version of Ray you are using. Make sure all Ray containers are running this version of Ray.
-rayVersion: '2.7.0'
+rayVersion: '2.20.0'
# If enableInTreeAutoscaling is true, the autoscaler sidecar will be added to the Ray head pod.
# Ray autoscaler integration is supported only for Ray versions >= 1.11.0
# Ray autoscaler integration is Beta with KubeRay >= 0.3.0 and Ray >= 2.0.0.
@@ -78,7 +78,7 @@ spec:
containers:
# The Ray head pod
- name: ray-head
-image: quay.io/project-codeflare/ray:latest-py39-cu118
+image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
imagePullPolicy: Always
ports:
- containerPort: 6379
@@ -161,7 +161,7 @@ spec:
spec:
containers:
- name: machine-learning # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc')
-image: quay.io/project-codeflare/ray:latest-py39-cu118
+image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
# Environment variables to set in the container. Optional.
# Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
lifecycle:
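
Review note: the template comment asks that all Ray containers run the version named in `rayVersion`, and this change pins both to 2.20.0. A small check along these lines could guard against the two drifting apart again. It is only a sketch, and it assumes the tag layout `<ray version>-py<NN>-cu<NNN>` seen in the image names in this diff. The function names are hypothetical.

```python
def ray_version_from_image(image):
    """Extract the leading version from a tag like '2.20.0-py39-cu118'.

    Returns None when the tag does not start with an x.y.z version,
    for example a floating 'latest-py39-cu118' tag.
    """
    _, _, tag = image.rpartition(":")
    candidate = tag.split("-")[0]
    parts = candidate.split(".")
    if len(parts) == 3 and all(part.isdigit() for part in parts):
        return candidate
    return None


def image_matches_ray_version(image, ray_version):
    """True when the image tag carries exactly the given Ray version."""
    return ray_version_from_image(image) == ray_version
```

With the old `latest-py39-cu118` tag the check cannot confirm a match, which is one argument for pinning the tag as done here.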
67 changes: 66 additions & 1 deletion tests/e2e/local_interactive_sdk_kind_test.py
@@ -8,8 +8,10 @@
import pytest
import ray
import math
+from time import sleep

from support import *
+import yaml


@pytest.mark.kind
@@ -45,11 +47,74 @@ def run_local_interactives(self):
max_memory=2,
num_gpus=0,
image=ray_image,
-write_to_file=True,
+write_to_file=False,
verify_tls=False,
)
)
cluster.up()
+        sleep(60)
+        api_instance = client.CustomObjectsApi()
+        rcs = api_instance.list_namespaced_custom_object(
+            group="ray.io",
+            version="v1",
+            namespace=self.namespace,
+            plural="rayclusters",
+        )
+        print("------------------ Ray Cluster ------------------")
+        for rc in rcs["items"]:
+            print(rc)
+        print("------------------ EVENTS ------------------")
+        v1 = client.CoreV1Api()
+        print(f"Events in namespace: {self.namespace}")
+        try:
+            events = v1.list_namespaced_event(namespace=self.namespace).items
+            for event in events:
+                print(
+                    f"Event: {event.metadata.name}, Reason: {event.reason}, Message: {event.message}, Timestamp: {event.last_timestamp}"
+                )
+        except client.exceptions.ApiException as e:
+            print(f"Exception when calling CoreV1Api->list_namespaced_event: {e}")
+
+        print("------------------ Workloads ------------------")
+        api_instance = client.CustomObjectsApi()
+        try:
+            workloads = api_instance.list_namespaced_custom_object(
+                "kueue.x-k8s.io", "v1beta1", self.namespace, "workloads"
+            )
+            for workload in workloads.get("items", []):
+                name = workload["metadata"]["name"]
+                status = workload.get("status", {})
+                print(workload)
+                print(
+                    f"Workload: {name}, Namespace: {self.namespace}, Status: {status}"
+                )
+        except client.exceptions.ApiException as e:
+            print(
+                f"Exception when calling CustomObjectsApi->list_namespaced_custom_object: {e}"
+            )
+        print("------------- PODS ----------------")
+        try:
+            pods = v1.list_namespaced_pod(self.namespace)
+        except client.exceptions.ApiException as e:
+            print(f"Exception when calling CoreV1Api->list_namespaced_pod: {e}")
+            exit(1)
+
+        # Loop through the pods and print the YAML of those whose names start with 'test-ray-cluster-li'
+        for pod in pods.items:
+            pod_name = pod.metadata.name
+            if pod_name.startswith("test-ray-cluster-li"):
+                print(f"YAML configuration for pod: {pod_name}")
+                try:
+                    pod_detail = v1.read_namespaced_pod(
+                        name=pod_name, namespace=self.namespace
+                    )
+                    pod_dict = pod_detail.to_dict()
+                    print("---------------------------------")
+                    print(yaml.dump(pod_dict, default_flow_style=False))
+                except client.exceptions.ApiException as e:
+                    print(
+                        f"Exception when calling CoreV1Api->read_namespaced_pod for pod {pod_name}: {e}"
+                    )
cluster.wait_ready()

generate_cert.generate_tls_cert(cluster_name, self.namespace)
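
Review note: the fixed `sleep(60)` before the diagnostic dump always costs the full minute, even when the resources appear sooner. If the delay outlives the debugging phase, a bounded poll is a common alternative. The sketch below uses only the standard library. `wait_until` is a hypothetical helper, not something the SDK or the test support module provides.

```python
import time


def wait_until(predicate, timeout=60.0, interval=2.0):
    """Poll predicate() until it returns a truthy value or the timeout elapses.

    Returns True as soon as the predicate succeeds, False on timeout.
    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the test, the predicate could wrap the existing `list_namespaced_custom_object` call and return whether any `rayclusters` items exist yet. That wiring is left out here because it depends on a live cluster.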
66 changes: 65 additions & 1 deletion tests/e2e/mnist_raycluster_sdk_aw_kind_test.py
@@ -8,8 +8,10 @@
import pytest

from support import *
+from time import sleep

# This test creates an AppWrapper containing a Ray Cluster and covers the Ray Job submission functionality on Kind Cluster
+import yaml


@pytest.mark.kind
@@ -49,7 +51,69 @@ def run_mnist_raycluster_sdk_kind(self):
)

cluster.up()

+        sleep(45)
+        api_instance = client.CustomObjectsApi()
+        rcs = api_instance.list_namespaced_custom_object(
+            group="ray.io",
+            version="v1",
+            namespace=self.namespace,
+            plural="rayclusters",
+        )
+        print("------------------ Ray Cluster ------------------")
+        for rc in rcs["items"]:
+            print(rc)
+        print("------------------ EVENTS ------------------")
+        v1 = client.CoreV1Api()
+        print(f"Events in namespace: {self.namespace}")
+        try:
+            events = v1.list_namespaced_event(namespace=self.namespace).items
+            for event in events:
+                print(
+                    f"Event: {event.metadata.name}, Reason: {event.reason}, Message: {event.message}, Timestamp: {event.last_timestamp}"
+                )
+        except client.exceptions.ApiException as e:
+            print(f"Exception when calling CoreV1Api->list_namespaced_event: {e}")
+
+        print("------------------ Workloads ------------------")
+        api_instance = client.CustomObjectsApi()
+        try:
+            workloads = api_instance.list_namespaced_custom_object(
+                "kueue.x-k8s.io", "v1beta1", self.namespace, "workloads"
+            )
+            for workload in workloads.get("items", []):
+                name = workload["metadata"]["name"]
+                status = workload.get("status", {})
+                print(workload)
+                print(
+                    f"Workload: {name}, Namespace: {self.namespace}, Status: {status}"
+                )
+        except client.exceptions.ApiException as e:
+            print(
+                f"Exception when calling CustomObjectsApi->list_namespaced_custom_object: {e}"
+            )
+        print("------------- PODS ----------------")
+        try:
+            pods = v1.list_namespaced_pod(self.namespace)
+        except client.exceptions.ApiException as e:
+            print(f"Exception when calling CoreV1Api->list_namespaced_pod: {e}")
+            exit(1)
+
+        # Loop through the pods and print the YAML of those whose names start with 'mnist'
+        for pod in pods.items:
+            pod_name = pod.metadata.name
+            if pod_name.startswith("mnist"):
+                print(f"YAML configuration for pod: {pod_name}")
+                try:
+                    pod_detail = v1.read_namespaced_pod(
+                        name=pod_name, namespace=self.namespace
+                    )
+                    pod_dict = pod_detail.to_dict()
+                    print("---------------------------------")
+                    print(yaml.dump(pod_dict, default_flow_style=False))
+                except client.exceptions.ApiException as e:
+                    print(
+                        f"Exception when calling CoreV1Api->read_namespaced_pod for pod {pod_name}: {e}"
+                    )
cluster.status()

cluster.wait_ready()
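
Review note: the event dump prints every event in the namespace, which can bury the one that explains a stuck cluster. A possible refinement is to surface `Warning` events first. The sketch below relies only on the attributes the test already reads (`metadata.name`, `reason`, `message`, `last_timestamp`) plus the event `type` field. Both function names are hypothetical.

```python
def format_event(event):
    """Render one event in the same shape the test prints."""
    return (
        f"Event: {event.metadata.name}, Reason: {event.reason}, "
        f"Message: {event.message}, Timestamp: {event.last_timestamp}"
    )


def warning_events(events):
    """Keep only events whose type is 'Warning'.

    Events without a type attribute are skipped rather than raising.
    """
    return [event for event in events if getattr(event, "type", None) == "Warning"]
```

Inside the existing `try` block this would amount to iterating over `warning_events(events)` and printing `format_event(event)` for each before the full listing.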
67 changes: 66 additions & 1 deletion tests/e2e/mnist_raycluster_sdk_kind_test.py
@@ -8,9 +8,12 @@
import pytest

from support import *
+from time import sleep

# This test creates a Ray Cluster and covers the Ray Job submission functionality on Kind Cluster

+import yaml


@pytest.mark.kind
class TestRayClusterSDKKind:
@@ -49,7 +52,69 @@ def run_mnist_raycluster_sdk_kind(self):
)

cluster.up()

+        sleep(45)
+        api_instance = client.CustomObjectsApi()
+        rcs = api_instance.list_namespaced_custom_object(
+            group="ray.io",
+            version="v1",
+            namespace=self.namespace,
+            plural="rayclusters",
+        )
+        print("------------------ Ray Cluster ------------------")
+        for rc in rcs["items"]:
+            print(rc)
+        print("------------------ EVENTS ------------------")
+        v1 = client.CoreV1Api()
+        print(f"Events in namespace: {self.namespace}")
+        try:
+            events = v1.list_namespaced_event(namespace=self.namespace).items
+            for event in events:
+                print(
+                    f"Event: {event.metadata.name}, Reason: {event.reason}, Message: {event.message}, Timestamp: {event.last_timestamp}"
+                )
+        except client.exceptions.ApiException as e:
+            print(f"Exception when calling CoreV1Api->list_namespaced_event: {e}")
+
+        print("------------------ Workloads ------------------")
+        api_instance = client.CustomObjectsApi()
+        try:
+            workloads = api_instance.list_namespaced_custom_object(
+                "kueue.x-k8s.io", "v1beta1", self.namespace, "workloads"
+            )
+            for workload in workloads.get("items", []):
+                print(workload)
+                name = workload["metadata"]["name"]
+                status = workload.get("status", {})
+                print(
+                    f"Workload: {name}, Namespace: {self.namespace}, Status: {status}"
+                )
+        except client.exceptions.ApiException as e:
+            print(
+                f"Exception when calling CustomObjectsApi->list_namespaced_custom_object: {e}"
+            )
+        print("------------- PODS ----------------")
+        try:
+            pods = v1.list_namespaced_pod(self.namespace)
+        except client.exceptions.ApiException as e:
+            print(f"Exception when calling CoreV1Api->list_namespaced_pod: {e}")
+            exit(1)
+
+        # Loop through the pods and print the YAML of those whose names start with 'mnist'
+        for pod in pods.items:
+            pod_name = pod.metadata.name
+            if pod_name.startswith("mnist"):
+                print(f"YAML configuration for pod: {pod_name}")
+                try:
+                    pod_detail = v1.read_namespaced_pod(
+                        name=pod_name, namespace=self.namespace
+                    )
+                    pod_dict = pod_detail.to_dict()
+                    print("---------------------------------")
+                    print(yaml.dump(pod_dict, default_flow_style=False))
+                except client.exceptions.ApiException as e:
+                    print(
+                        f"Exception when calling CoreV1Api->read_namespaced_pod for pod {pod_name}: {e}"
+                    )
cluster.status()

cluster.wait_ready()
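
Review note: the same diagnostic block now appears in three test files, differing only in the sleep duration and the pod-name prefix. If it is meant to stay, one option is a single helper in `tests/e2e/support.py` that takes the API clients as arguments. The sketch below is a trimmed illustration of that shape, not a drop-in replacement: `dump_namespace_diagnostics` is a hypothetical name, the workload and pod-YAML sections are left out for brevity, and `ApiException` handling is omitted so the example stays free of the `kubernetes` import.

```python
def dump_namespace_diagnostics(core_v1, custom_api, namespace, pod_prefix, emit=print):
    """Emit Ray clusters, events, and matching pod names for one namespace.

    core_v1 and custom_api are expected to behave like kubernetes.client
    CoreV1Api and CustomObjectsApi instances. They are passed in so the
    helper can be exercised with fakes. Returns the matching pod names.
    """
    rcs = custom_api.list_namespaced_custom_object(
        group="ray.io", version="v1", namespace=namespace, plural="rayclusters"
    )
    emit("------------------ Ray Cluster ------------------")
    for rc in rcs.get("items", []):
        emit(rc)

    emit("------------------ EVENTS ------------------")
    for event in core_v1.list_namespaced_event(namespace=namespace).items:
        emit(
            f"Event: {event.metadata.name}, Reason: {event.reason}, "
            f"Message: {event.message}"
        )

    emit("------------- PODS ----------------")
    matching = [
        pod.metadata.name
        for pod in core_v1.list_namespaced_pod(namespace=namespace).items
        if pod.metadata.name.startswith(pod_prefix)
    ]
    for name in matching:
        emit(f"Pod: {name}")
    return matching
```

Each test would then call it once, for example with `pod_prefix="mnist"` here and `pod_prefix="test-ray-cluster-li"` in the local interactive test.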
2 changes: 1 addition & 1 deletion tests/e2e/support.py
@@ -8,7 +8,7 @@


def get_ray_image():
-default_ray_image = "quay.io/project-codeflare/ray:latest-py39-cu118"
+default_ray_image = "quay.io/project-codeflare/ray:2.20.0-py39-cu118"
return os.getenv("RAY_IMAGE", default_ray_image)
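
Review note: `os.getenv("RAY_IMAGE", default_ray_image)` falls back to the default only when the variable is unset. If a CI job exports `RAY_IMAGE` as an empty string, the empty string is returned. Whether that matters for this workflow is an assumption on my part. If it does, a variant like the following treats empty and unset the same way. The optional `environ` parameter is added here purely so the behaviour can be checked without touching the real environment.

```python
import os

DEFAULT_RAY_IMAGE = "quay.io/project-codeflare/ray:2.20.0-py39-cu118"


def get_ray_image(environ=None):
    """Return RAY_IMAGE if set and non-empty, otherwise the pinned default."""
    env = os.environ if environ is None else environ
    return env.get("RAY_IMAGE") or DEFAULT_RAY_IMAGE
```

Called with no arguments it reads the process environment, so existing call sites would not need to change.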


6 changes: 3 additions & 3 deletions tests/test-case-bad.yaml
@@ -42,7 +42,7 @@ spec:
valueFrom:
fieldRef:
fieldPath: status.podIP
-image: quay.io/project-codeflare/ray:latest-py39-cu118
+image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
imagePullPolicy: Always
lifecycle:
preStop:
@@ -68,7 +68,7 @@ spec:
cpu: 2
memory: 8G
nvidia.com/gpu: 0
-rayVersion: 1.12.0
+rayVersion: 2.20.0
workerGroupSpecs:
- groupName: small-group-unit-test-cluster
maxReplicas: 2
@@ -90,7 +90,7 @@ spec:
valueFrom:
fieldRef:
fieldPath: status.podIP
-image: quay.io/project-codeflare/ray:latest-py39-cu118
+image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
lifecycle:
preStop:
exec: