From 18e457218098b02b55b1038b6bb974ee288cbaaf Mon Sep 17 00:00:00 2001
From: Claudia
Date: Mon, 24 Mar 2025 21:49:44 -0400
Subject: [PATCH 1/2] fine tuning demo

---
 setup.KubeConEU25/README.md | 227 +++++++++++++++++++++++++++++++++++-
 1 file changed, 226 insertions(+), 1 deletion(-)

diff --git a/setup.KubeConEU25/README.md b/setup.KubeConEU25/README.md
index b39340b..a1da532 100644
--- a/setup.KubeConEU25/README.md
+++ b/setup.KubeConEU25/README.md
@@ -692,8 +692,233 @@ In this example, `alice` uses [KubeRay](https://github.com/ray-project/kuberay) to run a job that uses [Ray](https://github.com/ray-project/ray) to fine tune a machine learning model.

This workload is adapted from [this blog post by Red Hat](https://developers.redhat.com/articles/2024/09/30/fine-tune-llama-openshift-ai), which is in turn based on [an example from the Ray documentation](https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed). The example fine-tunes Llama 3.1 with Ray, using DeepSpeed and LoRA.

<details>
Let's set up the local environment by installing Ray and the `datasets` package:

```bash
uv venv myenv --python 3.12 --seed && source myenv/bin/activate && uv pip install ray datasets
```

We are going to impersonate Alice in this example.

First, we create the PVC where we download the model and save the checkpoints from the fine-tuning job. We call this PVC `finetuning-pvc`; it is referenced by the `claimName` entries in the Ray cluster definition below, so if you use a different name, update those entries accordingly.

```bash
kubectl apply --as alice -n blue -f- << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: finetuning-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: nfs-client-pokprod
EOF
```

Now, let's create an AppWrapper version of the Ray cluster. Notice that:

- We use the container image `quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26` from Red Hat, but you can use the images from DockerHub if preferred.
- We set the number of worker replicas to `7`. Since we want the cluster to fit on a single 8-GPU node, we assign one GPU to the Ray head pod and one GPU to each of the 7 worker pods.

```bash
cd tools/appwrapper-packager/
cat << EOF > ray.yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray
spec:
  headGroupSpec:
    enableIngress: false
    rayStartParams:
      block: 'true'
      dashboard-host: 0.0.0.0
      num-gpus: '1'
      resources: '"{}"'
    serviceType: ClusterIP
    template:
      metadata: {}
      spec:
        containers:
        - env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: RAY_USE_TLS
            value: '0'
          image: 'quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26'
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - '-c'
                - ray stop
          name: ray-head
          ports:
          - containerPort: 6379
            name: gcs
            protocol: TCP
          - containerPort: 8265
            name: dashboard
            protocol: TCP
          - containerPort: 10001
            name: client
            protocol: TCP
          resources:
            limits:
              cpu: '16'
              memory: 256G
              nvidia.com/gpu: '1'
            requests:
              cpu: '16'
              memory: 128G
              nvidia.com/gpu: '1'
          volumeMounts:
          - mountPath: /model
            name: model
        volumes:
        - name: model
          persistentVolumeClaim:
            claimName: finetuning-pvc
  rayVersion: 2.35.0
  workerGroupSpecs:
  - groupName: small-group-ray
    rayStartParams:
      block: 'true'
      num-gpus: '1'
      resources: '"{}"'
    replicas: 7
    scaleStrategy: {}
    template:
      metadata: {}
      spec:
        containers:
        - env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: RAY_USE_TLS
            value: '0'
          image: 'quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26'
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - '-c'
                - ray stop
          name: machine-learning
          resources:
            limits:
              cpu: '16'
              memory: 256G
              nvidia.com/gpu: '1'
            requests:
              cpu: '16'
              memory: 128G
              nvidia.com/gpu: '1'
          volumeMounts:
          - mountPath: /model
            name: model
        volumes:
        - name: model
          persistentVolumeClaim:
            claimName: finetuning-pvc
EOF
```

Now let's use the tool to create the AppWrapper:

```bash
./awpack.py -o ray-aw.yaml -n ray-appwrapper -i ray.yaml
```

Now we can submit the job while impersonating Alice:

```bash
kubectl create -f ray-aw.yaml -n blue --as alice
```

Now that the Ray cluster is set up, we need to expose the `ray-head` service, as it is the entry point for all job submissions.
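Before doing so, it is worth checking that the AppWrapper was admitted by the queue and that the cluster pods came up. A quick sanity check, assuming the standard KubeRay pod labels and the resource names used above:

```bash
# The AppWrapper should reach the Running state once admitted.
kubectl get appwrappers -n blue --as alice

# Expect one head pod and seven worker pods, all Running
# (KubeRay labels the pods with the RayCluster name).
kubectl get pods -n blue --as alice -l ray.io/cluster=ray
```

Once the head and the seven worker pods are `Running`, we can forward the dashboard port.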
In another terminal, type:

```bash
kubectl port-forward svc/ray-head-svc 8265:8265 -n blue --as alice
```

Now we can clone the Git repository containing the fine-tuning workload:

```bash
git clone https://github.com/opendatahub-io/distributed-workloads
cd distributed-workloads/examples/ray-finetune-llm-deepspeed
```

We then create a Python program that launches the job on the Ray cluster using the Ray job submission API. Notice that:

- We set `--num-devices=8`, the total number of accelerators used by the head and the workers.
- We set `HF_HOME` to the shared PVC, so the model is downloaded once and shared among all executors.
- We set `--num-epochs=1` for a shorter run.
- We use localhost as the entry point for submitting Ray jobs, since we exposed the service earlier.

```bash
cat << EOF > finetuning.py
# Prepare the dataset using the helper from the cloned repository.
import create_dataset
create_dataset.gsm8k_qa_no_tokens_template()

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")

finetune_cmd = (
    "git clone https://github.com/opendatahub-io/distributed-workloads || true;"
    # Run the fine-tuning script.
    "python ray_finetune_llm_deepspeed.py"
    " --model-name=meta-llama/Meta-Llama-3.1-8B --lora --num-devices=8 --num-epochs=1 --ds-config=./deepspeed_configs/zero_3_offload_optim_param.json --storage-path=/model/ --batch-size-per-device=32 --eval-batch-size-per-device=32"
)

submission_id = client.submit_job(
    entrypoint=finetune_cmd,
    runtime_env={
        "env_vars": {
            "HF_HOME": "/model/ray_finetune_llm_deepspeed/cache/",
        },
        "pip": "requirements.txt",
        "working_dir": "./",
        "excludes": ["/docs/", "*.ipynb", "*.md"],
    },
)

print("Use the following command to follow this Job's logs:")
print(f"ray job logs '{submission_id}' --address http://127.0.0.1:8265 --follow")
EOF
python finetuning.py
```

The expected output looks like the following:

```bash
2025-03-24 16:37:53,029 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_21ddaa8b13d30deb.zip.
2025-03-24 16:37:53,030 INFO packaging.py:575 -- Creating a file package for local module './'.
Use the following command to follow this Job's logs:
ray job logs 'raysubmit_C6hVCvdhpmapgQB8' --address http://127.0.0.1:8265 --follow
```

We can now either follow the logs in the terminal with the `ray job logs` command, or open the Ray dashboard at `http://127.0.0.1:8265` and follow them there, thanks to the port-forward we set up earlier.

Once the job completes, the checkpoint with the fine-tuned model is saved in the folder

```
/model/meta-llama/Meta-Llama-3.1-8B/TorchTrainer_/checkpoint_
```
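The `TorchTrainer_` and `checkpoint_` directory names carry run-specific suffixes that are elided above. Since the PVC is only mounted inside the cluster, one way to locate the actual checkpoint directory is to search the storage path from the Ray head pod; a minimal sketch, assuming the standard `ray.io/node-type=head` label that KubeRay puts on the head pod:

```bash
# Find the Ray head pod (KubeRay labels it ray.io/node-type=head).
HEAD_POD=$(kubectl get pods -n blue --as alice \
  -l ray.io/node-type=head -o name | head -n 1)

# List the checkpoint directories saved under the storage path.
kubectl exec -n blue --as alice "${HEAD_POD}" -- \
  find /model/meta-llama/Meta-Llama-3.1-8B -maxdepth 3 -type d -name 'checkpoint_*'
```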
From 0a7b5beb60020c535879277b259bebb970695f34 Mon Sep 17 00:00:00 2001
From: Claudia
Date: Mon, 24 Mar 2025 22:55:03 -0400
Subject: [PATCH 2/2] correction to path

---
 setup.KubeConEU25/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.KubeConEU25/README.md b/setup.KubeConEU25/README.md
index a1da532..f0a6f57 100644
--- a/setup.KubeConEU25/README.md
+++ b/setup.KubeConEU25/README.md
@@ -919,6 +919,6 @@ We can now either follow the logs in the terminal with the `ray job logs` comman
 
 Once the job completes, the checkpoint with the fine-tuned model is saved in the folder
 
 ```
-/model/meta-llama/Meta-Llama-3.1-8B/TorchTrainer_/checkpoint_
+/model/meta-llama/Meta-Llama-3.1-8B/TorchTrainer_/TorchTrainer_/checkpoint_
 ```
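With the corrected, nested path above, the adapter checkpoint can also be pulled out of the cluster for local use, reusing the `HEAD_POD` variable from the earlier check. A sketch; `<run>` and `<n>` are hypothetical placeholders for the run-specific suffixes:

```bash
# Copy the checkpoint directory to the local machine; replace <run> and <n>
# with the actual run-specific names found via the find command above.
kubectl cp --as alice \
  "blue/${HEAD_POD#pod/}:/model/meta-llama/Meta-Llama-3.1-8B/TorchTrainer_<run>/TorchTrainer_<run>/checkpoint_<n>" \
  ./checkpoint
```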