Bitbucket GPU examples #410
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -117,6 +117,8 @@ train-and-report: | |
|
||
</tab> | ||
<tab title="Bitbucket"> | ||
<toggle> | ||
<tab title="No GPU"> | ||
|
||
```yaml | ||
pipelines: | ||
|
@@ -134,7 +136,6 @@ pipelines: | |
- step: | ||
runs-on: [self.hosted, cml.runner] | ||
image: iterativeai/cml:0-dvc2-base1 | ||
# GPU not yet supported, see https://github.com/iterative/cml/issues/1015 | ||
script: | ||
- pip install -r requirements.txt | ||
- python train.py # generate plot.png | ||
|
@@ -144,6 +145,58 @@ pipelines: | |
- cml comment create report.md | ||
``` | ||
|
||
</tab> | ||
<tab title="GPU"> | ||
|
||
Bitbucket does not support GPUs natively | ||
([cml#1015](https://github.com/iterative/cml/issues/1015), | ||
[BCLOUD-21459](https://jira.atlassian.com/browse/BCLOUD-21459)). A work-around | ||
is to directly use | ||
[TPI](https://github.com/iterative/terraform-provider-iterative) (the library | ||
which CML `runner` uses internally). TPI includes a CLI-friendly helper called | ||
LEO (launch, execute, orchestrate), used below: | ||
|
||
```yaml | ||
image: iterativeai/cml:0-dvc2-base1 | ||
pipelines: | ||
default: | ||
- step: | ||
name: Launch Runner and Train | ||
script: | ||
# Create training script | ||
- | | ||
cat <<EOF > leo-script.sh | ||
#!/bin/bash | ||
apt-get update -q && apt-get install -yq python3.9 | ||
pip3 install -r requirements.txt | ||
python train.py # generate plot.png | ||
EOF | ||
# Launch runner | ||
- | | ||
LEO_OPTIONS="--cloud=aws --region=us-west" | ||
leo_id=$(leo create $LEO_OPTIONS \ | ||
--image=nvidia \ | ||
--machine=p2.xlarge \ | ||
--disk-size=64 \ | ||
--workdir=. \ | ||
--output=. \ | ||
--environment AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \ | ||
--environment AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \ | ||
--script="$(cat ./leo-script.sh)" | ||
) | ||
# Wait for cloud training to finish | ||
leo read $LEO_OPTIONS --follow "$leo_id" | ||
sleep 45 # TODO: explain | ||
/CC @dacbd
Best to have a grace period so the workdir can finish its last sync to the cloud. In practice, I think it's probably best to have something here. For small tasks it's certainly not required.
||
# Download cloud training results & clean up cloud resources | ||
leo delete $LEO_OPTIONS --workdir=. --output=. "$leo_id" | ||
# Create CML report | ||
- cat metrics.txt >> report.md | ||
- echo '' >> report.md | ||
- cml comment create report.md | ||
``` | ||
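As a follow-up to the review thread on `sleep 45`, here is a minimal sketch of how the grace period could be made explicit and configurable. It assumes a hypothetical `SYNC_GRACE_SECONDS` variable and a hypothetical `wait_for_sync` helper; neither is part of this PR, LEO, or Bitbucket Pipelines. The 45-second default is the value already used in the diff.

```shell
#!/bin/bash
# Hypothetical helper (not part of LEO or CML): pause so the remote workdir
# can finish its final sync before `leo delete` downloads the results.
# SYNC_GRACE_SECONDS is an assumed variable name; it defaults to the 45
# seconds used in the diff.
wait_for_sync() {
  seconds="${SYNC_GRACE_SECONDS:-45}"
  echo "Waiting ${seconds}s for the workdir to finish syncing"
  sleep "$seconds"
}

# Intended placement inside the pipeline step, between the two commands
# that already appear in the diff:
#   leo read $LEO_OPTIONS --follow "$leo_id"
#   wait_for_sync
#   leo delete $LEO_OPTIONS --workdir=. --output=. "$leo_id"
```

Small jobs could then set `SYNC_GRACE_SECONDS=0` as a pipeline variable to skip the wait, which matches the comment that the pause is not required for small tasks.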
|
||
</tab> | ||
</toggle> | ||
</tab> | ||
</toggle> | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -672,54 +672,62 @@ const UseCasesSection: React.ForwardRefRenderFunction<HTMLElement> = () => ( | |||
)} | ||||
bitbucket={( | ||||
<Collapser> | ||||
<Code filename="bitbucket-pipelines.yml" repo="https://github.com/iterative/cml/issues/1015"> | ||||
<Tooltip type="dependencies"> | ||||
<div><span># GPU support coming soon, see https://github.com/iterative/cml/issues/1015</span></div> | ||||
</Tooltip> | ||||
<Code filename="bitbucket-pipelines.yml" repo="https://bitbucket.org/iterative-ai/cml-cloud-case"> | ||||
<div><span># Use LEO instead of CML to force GPU support on Bitbucket</span></div> | ||||
<div><span># (<a href="/doc/ref/runner#bitbucket">https://cml.dev/doc/ref/runner#bitbucket</a>)</span></div> | ||||
link colours don't render well, maybe better to remove?
Suggested change
|
||||
<div><span>image: iterativeai/cml:0-dvc2-base1</span></div> | ||||
<div><span>pipelines:</span></div> | ||||
<div><span> default:</span></div> | ||||
<div><span> - step:</span></div> | ||||
<div><span> name: deploy-runner</span></div> | ||||
<div><span> image: iterativeai/cml:0-dvc2-base1</span></div> | ||||
<div><span> script:</span></div> | ||||
<div><span> - |</span></div> | ||||
<Tooltip type="runner"> | ||||
<div><span> cml runner \</span></div> | ||||
<div><span> --cloud=aws \</span></div> | ||||
<div><span> --cloud-region=us-west \</span></div> | ||||
<div><span> --cloud-type=m5.2xlarge \</span></div> | ||||
<div><span> --cloud-spot \</span></div> | ||||
<div><span> --labels=cml.runner</span></div> | ||||
</Tooltip> | ||||
<div><span> - step:</span></div> | ||||
<div><span> name: run</span></div> | ||||
<Tooltip type="runner"> | ||||
<div><span> runs-on: [self.hosted, cml.runner]</span></div> | ||||
<div> <span>default:</span></div> | ||||
<div> <span>- step:</span></div> | ||||
<div> <span>name: Launch Runner and Train</span></div> | ||||
<div> <span>script:</span></div> | ||||
<div> <span>- |</span></div> | ||||
<div> <span>cat <<EOF > leo-script.sh</span></div> | ||||
<div> <span>#!/bin/bash</span></div> | ||||
<div> <span>apt-get update -q && apt-get install -yq python3.9</span></div> | ||||
<Tooltip type="dvc"> | ||||
<div> <span>dvc pull data</span></div> | ||||
</Tooltip> | ||||
<div><span> image: iterativeai/cml:0-dvc2-base1</span></div> | ||||
<div><span> script:</span></div> | ||||
<Tooltip type="dependencies"> | ||||
<div><span> - apt-get update -y</span></div> | ||||
<div><span> - apt install imagemagick -y</span></div> | ||||
<div><span> - pip install -r requirements.txt</span></div> | ||||
<div> <span>pip3 install -r requirements.txt</span></div> | ||||
<div> <span>dvc repro</span></div> | ||||
</Tooltip> | ||||
<div> <span>EOF</span></div> | ||||
<Tooltip type="runner"> | ||||
<div> <span>- |</span></div> | ||||
<div> <span>LEO_OPTIONS="--cloud=aws --region=us-west"</span></div> | ||||
<div> <span>leo_id=$(leo create $LEO_OPTIONS \</span></div> | ||||
<div> <span>--image="nvidia" \</span></div> | ||||
<div> <span>--machine="p2.xlarge" \</span></div> | ||||
<div> <span>--disk-size=64 \</span></div> | ||||
<div> <span>--workdir="." \</span></div> | ||||
<div> <span>--output="." \</span></div> | ||||
<div> <span>--environment AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \</span></div> | ||||
<div> <span>--environment AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \</span></div> | ||||
<div> <span>--script="$(cat ./leo-script.sh)"</span></div> | ||||
<div> <span>)</span></div> | ||||
<div> <span>leo read $LEO_OPTIONS --follow "$leo_id"</span></div> | ||||
<div> <span>sleep 45 # TODO: explain</span></div> | ||||
/CC @dacbd
||||
<div> <span>leo delete $LEO_OPTIONS --workdir="." --output="." \</span></div> | ||||
<div> <span>"$leo_id"</span></div> | ||||
</Tooltip> | ||||
<div><span> - git fetch --prune</span></div> | ||||
<div><span> - dvc repro</span></div> | ||||
<Tooltip type="reports"> | ||||
<div><span> - echo "# Style transfer" >> report.md</span></div> | ||||
<div><span> - git show origin/master:final_owl.png > master_owl.png</span></div> | ||||
<div><span> - convert +append final_owl.png master_owl.png out.png</span></div> | ||||
<div><span> - convert out.png -resize 75% out_shrink.png</span></div> | ||||
<div><span> - echo "### Workspace vs. Main" >> report.md</span></div> | ||||
<div><span> - cml publish out_shrink.png --md --title 'compare' >> report.md</span></div> | ||||
<div><span> - echo "## Training metrics" >> report.md</span></div> | ||||
<div><span> - dvc params diff master --show-md >> report.md</span></div> | ||||
<div><span> - echo >> report.md</span></div> | ||||
<div><span> - cml send-comment report.md</span></div> | ||||
<div> <span>- git show origin/main:image.png > image-main.png</span></div> | ||||
<div> <span>- |</span></div> | ||||
<div> <span>cat <<EOF > report.md</span></div> | ||||
<div> <span># Style transfer</span></div> | ||||
<div> <span>## Workspace vs. Main</span></div> | ||||
<div> <span> </span></div> | ||||
<div> <span>## Training metrics</span></div> | ||||
<div> <span>$(dvc params diff main --show-md)</span></div> | ||||
<div> <span>## GPU info</span></div> | ||||
<div> <span>$(cat gpu_info.txt)</span></div> | ||||
<div> <span>EOF</span></div> | ||||
<div> <span>- cml comment create report.md</span></div> | ||||
</Tooltip> | ||||
</Code> | ||||
<ExampleBox title="CML Report"> | ||||
<a target="_blank" rel="noreferrer" href="https://github.com/iterative/cml/issues/1015"> | ||||
<a target="_blank" rel="noreferrer" href="https://bitbucket.org/iterative-ai/cml-cloud-case/pull-requests/1"> | ||||
<Image src="/img/bitbucket/cloud-report.png" alt="Bitbucket Cloud report example" /> | ||||
</a> | ||||
</ExampleBox> | ||||
|
worth noting that this should be considered experimental until we decide what a Leo release looks like?