This repository was archived by the owner on Apr 23, 2025. It is now read-only.

Bitbucket GPU examples #410

Open: wants to merge 3 commits into master
9 changes: 9 additions & 0 deletions content/docs/ref/runner.md
@@ -78,6 +78,15 @@ Any [generic option](/doc/ref) in addition to:
need to write your code to save intermediate results to take advantage of
this).

### Bitbucket

- **GPU support**.

See
[the guide on self-hosted Bitbucket runners](/doc/self-hosted-runners?tab=Bitbucket-GPU)
to work around
[Bitbucket's lack of native GPU support](https://jira.atlassian.com/browse/BCLOUD-21459).

## Examples

### Using `--cloud-permission-set`
55 changes: 54 additions & 1 deletion content/docs/self-hosted-runners.md
@@ -117,6 +117,8 @@ train-and-report:

</tab>
<tab title="Bitbucket">
<toggle>
<tab title="No GPU">

```yaml
pipelines:
@@ -134,7 +136,6 @@ pipelines:
- step:
runs-on: [self.hosted, cml.runner]
image: iterativeai/cml:0-dvc2-base1
# GPU not yet supported, see https://github.com/iterative/cml/issues/1015
script:
- pip install -r requirements.txt
- python train.py # generate plot.png
@@ -144,6 +145,58 @@ pipelines:
- cml comment create report.md
```

</tab>
<tab title="GPU">

Bitbucket does not support GPUs natively
([cml#1015](https://github.com/iterative/cml/issues/1015),
[BCLOUD-21459](https://jira.atlassian.com/browse/BCLOUD-21459)). A workaround
is to use [TPI](https://github.com/iterative/terraform-provider-iterative)
directly; TPI is the library that CML `runner` uses internally. It includes a
CLI-friendly helper called LEO (launch, execute, orchestrate), which is used
below:
Comment on lines +152 to +157
Contributor

worth noting that this should be considered experimental until we decide what a Leo release looks like?


```yaml
image: iterativeai/cml:0-dvc2-base1
pipelines:
  default:
    - step:
        name: Launch Runner and Train
        script:
          # Create training script
          - |
            cat <<EOF > leo-script.sh
            #!/bin/bash
            apt-get update -q && apt-get install -yq python3.9
            pip3 install -r requirements.txt
            python train.py # generate plot.png
            EOF
          # Launch runner
          - |
            LEO_OPTIONS="--cloud=aws --region=us-west"
            leo_id=$(leo create $LEO_OPTIONS \
              --image=nvidia \
              --machine=p2.xlarge \
              --disk-size=64 \
              --workdir=. \
              --output=. \
              --environment AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
              --environment AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
              --script="$(cat ./leo-script.sh)"
            )
            # Wait for cloud training to finish
            leo read $LEO_OPTIONS --follow "$leo_id"
            # Grace period so the workdir can finish its last sync to the cloud
            # (not required for small tasks)
            sleep 45
            # Download cloud training results & clean up cloud resources
            leo delete $LEO_OPTIONS --workdir=. --output=. "$leo_id"
          # Create CML report
          - cat metrics.txt >> report.md
          - echo '![](./plot.png "Confusion Matrix")' >> report.md
          - cml comment create report.md
```
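
In the pipeline above, `leo delete` is the command that downloads results and cleans up; if the step aborts before reaching it (for example because following the logs fails), results are not downloaded and cloud resources may be left behind. The following is a minimal sketch, assuming a POSIX `sh`, that wraps the create/read/delete sequence so an `EXIT` trap always runs the cleanup. The helper `run_leo_job` and its argument layout are hypothetical (not part of LEO); it only uses the `leo` subcommands and flags that appear in the pipeline above.

```shell
#!/bin/sh
# Hypothetical helper (not part of LEO): create a LEO job, follow its logs,
# and always run `leo delete` afterwards, even if following the logs fails.
#
# Usage: run_leo_job "<common options>" "<create options>" "<script>" [grace]
# Option strings are split on whitespace (like $LEO_OPTIONS above), so
# individual values must not contain spaces.
run_leo_job() (
  # The body runs in a subshell, so the EXIT trap below fires when the
  # function returns and does not leak into the caller's shell.
  common=$1
  create=$2
  script=$3
  grace=${4:-45}  # seconds to let the workdir finish syncing before cleanup

  # Launch the job; if creation fails there is no job id to clean up
  leo_id=$(leo create $common $create --workdir=. --output=. --script="$script") || exit 1

  # Download results & clean up cloud resources on every exit path
  trap 'sleep "$grace"; leo delete $common --workdir=. --output=. "$leo_id"' EXIT

  # Follow the logs until the job finishes; its exit status becomes the
  # function's exit status (the trap does not change it)
  leo read $common --follow "$leo_id"
)

# Example call mirroring the pipeline above:
# run_leo_job "--cloud=aws --region=us-west" \
#   "--image=nvidia --machine=p2.xlarge --disk-size=64 --environment AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID --environment AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY" \
#   "$(cat ./leo-script.sh)"
```

Running the body in a subshell `( ... )` rather than braces keeps the trap scoped to a single job, so several jobs can be launched from the same step without their cleanup handlers overwriting each other.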

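`leo-script.sh` is written with an unquoted heredoc delimiter (`<<EOF`), so any `$VAR` or `$(...)` inside it is expanded by the Bitbucket step's shell when the file is written, not later on the cloud machine. Quoting the delimiter (`<<'EOF'`) copies the text verbatim and defers expansion until the generated script runs. A minimal illustration, where `MODEL_DIR` is a hypothetical variable used only to show when expansion happens:

```shell
#!/bin/sh
# MODEL_DIR is a hypothetical variable, used only to illustrate expansion timing
MODEL_DIR=/tmp/models

# Unquoted delimiter: $MODEL_DIR is substituted now, while the file is written
cat <<EOF > unquoted.sh
echo "saving to $MODEL_DIR"
EOF

# Quoted delimiter: the text is copied verbatim; $MODEL_DIR is expanded only
# when the generated script itself runs (e.g. on the cloud machine)
cat <<'EOF' > quoted.sh
echo "saving to $MODEL_DIR"
EOF

cat unquoted.sh  # prints: echo "saving to /tmp/models"
cat quoted.sh    # prints: echo "saving to $MODEL_DIR"
```

The same rule is what makes an unquoted report heredoc containing `$(dvc params diff ...)` embed the command's output at the moment the report is written.
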
</tab>
</toggle>
</tab>
</toggle>

88 changes: 48 additions & 40 deletions src/components/pages/Home/UseCasesSection/index.tsx
@@ -672,54 +672,62 @@ const UseCasesSection: React.ForwardRefRenderFunction<HTMLElement> = () => (
)}
bitbucket={(
<Collapser>
<Code filename="bitbucket-pipelines.yml" repo="https://github.com/iterative/cml/issues/1015">
<Tooltip type="dependencies">
<div><span># GPU support coming soon, see https://github.com/iterative/cml/issues/1015</span></div>
</Tooltip>
<Code filename="bitbucket-pipelines.yml" repo="https://bitbucket.org/iterative-ai/cml-cloud-case">
<div><span># Use LEO instead of CML to force GPU support on Bitbucket</span></div>
<div><span># (<a href="/doc/ref/runner#bitbucket">https://cml.dev/doc/ref/runner#bitbucket</a>)</span></div>
Contributor Author

@casperdcl, Dec 1, 2022

link colours don't render well, maybe better to remove?

Suggested change
<div><span># (<a href="/doc/ref/runner#bitbucket">https://cml.dev/doc/ref/runner#bitbucket</a>)</span></div>

<div><span>image: iterativeai/cml:0-dvc2-base1</span></div>
<div><span>pipelines:</span></div>
<div><span> default:</span></div>
<div><span> - step:</span></div>
<div><span> name: deploy-runner</span></div>
<div><span> image: iterativeai/cml:0-dvc2-base1</span></div>
<div><span> script:</span></div>
<div><span> - |</span></div>
<Tooltip type="runner">
<div><span> cml runner \</span></div>
<div><span> --cloud=aws \</span></div>
<div><span> --cloud-region=us-west \</span></div>
<div><span> --cloud-type=m5.2xlarge \</span></div>
<div><span> --cloud-spot \</span></div>
<div><span> --labels=cml.runner</span></div>
</Tooltip>
<div><span> - step:</span></div>
<div><span> name: run</span></div>
<Tooltip type="runner">
<div><span> runs-on: [self.hosted, cml.runner]</span></div>
<div> <span>default:</span></div>
<div> <span>- step:</span></div>
<div> <span>name: Launch Runner and Train</span></div>
<div> <span>script:</span></div>
<div> <span>- |</span></div>
<div> <span>cat &lt;&lt;EOF &gt; leo-script.sh</span></div>
<div> <span>#!/bin/bash</span></div>
<div> <span>apt-get update -q && apt-get install -yq python3.9</span></div>
<Tooltip type="dvc">
<div> <span>dvc pull data</span></div>
</Tooltip>
<div><span> image: iterativeai/cml:0-dvc2-base1</span></div>
<div><span> script:</span></div>
<Tooltip type="dependencies">
<div><span> - apt-get update -y</span></div>
<div><span> - apt install imagemagick -y</span></div>
<div><span> - pip install -r requirements.txt</span></div>
<div> <span>pip3 install -r requirements.txt</span></div>
<div> <span>dvc repro</span></div>
</Tooltip>
<div> <span>EOF</span></div>
<Tooltip type="runner">
<div> <span>- |</span></div>
<div> <span>LEO_OPTIONS=&quot;--cloud=aws --region=us-west&quot;</span></div>
<div> <span>leo_id=$(leo create $LEO_OPTIONS \</span></div>
<div> <span>--image=&quot;nvidia&quot; \</span></div>
<div> <span>--machine=&quot;p2.xlarge&quot; \</span></div>
<div> <span>--disk-size=64 \</span></div>
<div> <span>--workdir=&quot;.&quot; \</span></div>
<div> <span>--output=&quot;.&quot; \</span></div>
<div> <span>--environment AWS_ACCESS_KEY_ID=&quot;$AWS_ACCESS_KEY_ID&quot; \</span></div>
<div> <span>--environment AWS_SECRET_ACCESS_KEY=&quot;$AWS_SECRET_ACCESS_KEY&quot; \</span></div>
<div> <span>--script=&quot;$(cat ./leo-script.sh)&quot;</span></div>
<div> <span>)</span></div>
<div> <span>leo read $LEO_OPTIONS --follow &quot;$leo_id&quot;</span></div>
<div> <span>sleep 45 # let the workdir finish its last sync to the cloud</span></div>
Contributor Author

/CC @dacbd

<div> <span>leo delete $LEO_OPTIONS --workdir=&quot;.&quot; --output=&quot;.&quot; \</span></div>
<div> <span>&quot;$leo_id&quot;</span></div>
</Tooltip>
<div><span> - git fetch --prune</span></div>
<div><span> - dvc repro</span></div>
<Tooltip type="reports">
<div><span> - echo &quot;# Style transfer&quot; &gt;&gt; report.md</span></div>
<div><span> - git show origin/master:final_owl.png &gt; master_owl.png</span></div>
<div><span> - convert +append final_owl.png master_owl.png out.png</span></div>
<div><span> - convert out.png -resize 75% out_shrink.png</span></div>
<div><span> - echo &quot;### Workspace vs. Main&quot; &gt;&gt; report.md</span></div>
<div><span> - cml publish out_shrink.png --md --title &#x27;compare&#x27; &gt;&gt; report.md</span></div>
<div><span> - echo &quot;## Training metrics&quot; &gt;&gt; report.md</span></div>
<div><span> - dvc params diff master --show-md &gt;&gt; report.md</span></div>
<div><span> - echo &gt;&gt; report.md</span></div>
<div><span> - cml send-comment report.md</span></div>
<div> <span>- git show origin/main:image.png &gt; image-main.png</span></div>
<div> <span>- |</span></div>
<div> <span>cat &lt;&lt;EOF &gt; report.md</span></div>
<div> <span># Style transfer</span></div>
<div> <span>## Workspace vs. Main</span></div>
<div> <span>![](./image.png &quot;Workspace&quot;) ![](./image-main.png &quot;Main&quot;)</span></div>
<div> <span>## Training metrics</span></div>
<div> <span>$(dvc params diff main --show-md)</span></div>
<div> <span>## GPU info</span></div>
<div> <span>$(cat gpu_info.txt)</span></div>
<div> <span>EOF</span></div>
<div> <span>- cml comment create report.md</span></div>
</Tooltip>
</Code>
<ExampleBox title="CML Report">
<a target="_blank" rel="noreferrer" href="https://github.com/iterative/cml/issues/1015">
<a target="_blank" rel="noreferrer" href="https://bitbucket.org/iterative-ai/cml-cloud-case/pull-requests/1">
<Image src="/img/bitbucket/cloud-report.png" alt="Bitbucket Cloud report example" />
</a>
</ExampleBox>