Commit fcc25f3

docs & readme: what/why/how (#433)
1 parent 96b20b5 commit fcc25f3

6 files changed

+226
-155
lines changed

LICENSE

+1-1
@@ -186,7 +186,7 @@
186186
same "printed page" as the copyright notice for easier
187187
identification within third-party archives.
188188

189-
Copyright 2020-2021 Iterative, Inc.
189+
Copyright Iterative, Inc.
190190

191191
Licensed under the Apache License, Version 2.0 (the "License");
192192
you may not use this file except in compliance with the License.

README.md

+95-49
@@ -1,81 +1,127 @@
1-
![Terraform Provider Iterative](https://static.iterative.ai/img/cml/banner-terraform.png)
1+
![TPI](https://static.iterative.ai/img/cml/banner-tpi.svg)
22

3-
# Iterative Provider [![](https://img.shields.io/badge/-documentation-5c4ee5?logo=terraform)](https://registry.terraform.io/providers/iterative/iterative/latest/docs)
3+
# Terraform Provider Iterative (TPI)
44

5-
The Iterative Provider is a Terraform plugin that enables full lifecycle
6-
management of computing resources for machine learning pipelines, including GPUs, from your favorite cloud vendors.
5+
[![docs](https://img.shields.io/badge/-docs-5c4ee5?logo=terraform)](https://registry.terraform.io/providers/iterative/iterative/latest/docs)
6+
[![tests](https://img.shields.io/github/workflow/status/iterative/terraform-provider-iterative/Test?label=tests&logo=GitHub)](https://github.com/iterative/terraform-provider-iterative/actions/workflows/test.yml)
7+
[![Apache-2.0][licence-badge]][licence-file]
78

8-
The Iterative Provider makes it easy to:
9+
TPI is a [Terraform](https://terraform.io) plugin built with machine learning in mind. It provides full lifecycle management of computing resources (including GPUs and respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s), without needing to be a cloud expert.
910

10-
- Rapidly move local machine learning experiments to a cloud infrastructure
11-
- Take advantage of training models on spot instances without losing any progress
12-
- Unify configuration of various cloud compute providers
13-
- Automatically destroy unused cloud resources (compute instances are terminated on job completion/failure, and storage is removed when results are downloaded)
11+
- **Provision Resources**: create cloud compute (CPU, GPU, RAM) & storage resources without reading pages of documentation
12+
- **Sync & Execute**: easily sync & run local data & code in the cloud
13+
- **Low cost**: transparent auto-recovery from interrupted low-cost spot/preemptible instances
14+
- **No waste**: auto-cleanup unused resources (terminate compute instances upon job completion/failure & remove storage upon download of results)
15+
- **No lock-in**: switch between several cloud vendors with ease due to concise unified configuration
1416

15-
The Iterative Provider can provision resources with the following cloud providers and orchestrators:
17+
Supported cloud vendors include:
1618

17-
- Amazon Web Services
19+
- Amazon Web Services (AWS)
1820
- Microsoft Azure
19-
- Google Cloud Platform
20-
- Kubernetes
21+
- Google Cloud Platform (GCP)
22+
- Kubernetes (K8s)
2123

22-
## Documentation
24+
## Usage
2325

24-
See the [Getting Started](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) guide to learn how to use the Iterative Provider. More details on configuring and using the Iterative Provider are in the [documentation](https://registry.terraform.io/providers/iterative/iterative/latest/docs).
26+
### Requirements
2527

26-
## Support
28+
- [Install Terraform 1.0+](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform), e.g.:
29+
- Brew (Homebrew/macOS): `brew tap hashicorp/tap && brew install hashicorp/tap/terraform`
30+
- Choco (Chocolatey/Windows): `choco install terraform`
31+
- Conda (Anaconda): `conda install -c conda-forge terraform`
32+
- Debian (Ubuntu/Linux):
33+
```
34+
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
35+
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
36+
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
37+
sudo apt-get update && sudo apt-get install terraform
38+
```
39+
- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication)
2740
28-
Have a feature request or found a bug? Let us know via [GitHub issues](https://github.com/iterative/terraform-provider-iterative/issues). Have questions? Join our [community on Discord](https://discord.gg/bzA6uY7); we'll be happy to help you get started!
41+
### Define a Task
2942
30-
## License
43+
In a project root directory, create a file named `main.tf` with the following contents:
3144
32-
Iterative Provider is released under the [Apache 2.0 License](https://github.com/iterative/terraform-provider-iterative/blob/master/LICENSE).
45+
```hcl
46+
terraform {
47+
required_providers { iterative = { source = "iterative/iterative" } }
48+
}
49+
provider "iterative" {}
50+
resource "iterative_task" "example" {
51+
cloud = "aws" # or any of: gcp, az, k8s
52+
machine = "m" # medium. Or any of: l, xl, m+k80, xl+v100, ...
53+
spot = 0 # auto-price. Or -1 to disable, or >0 to set an hourly USD limit
54+
disk_size = 30 # GB
55+
56+
storage {
57+
workdir = "."
58+
output = "results"
59+
}
60+
script = <<-END
61+
#!/bin/bash
62+
mkdir results
63+
echo "Hello World!" > results/greeting.txt
64+
END
65+
}
66+
```
3367

34-
## Development
68+
See [the reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#argument-reference) for the full list of options for `main.tf` -- including more information on [`machine` types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type) with and without GPUs.
3569

36-
### Install Go 1.17+
70+
Run this once (in the directory containing `main.tf`) to download the `required_providers`:
3771

38-
Refer to the [official documentation](https://golang.org/doc/install) for specific instructions.
72+
```
73+
terraform init
74+
```
3975

40-
### Clone the repository
76+
### Run Task
4177

42-
```console
43-
git clone https://github.com/iterative/terraform-provider-iterative
44-
cd terraform-provider-iterative
78+
```
79+
terraform apply
4580
```
4681

47-
### Install the provider
82+
This launches a `machine` in the `cloud`, uploads `workdir`, and runs the `script`. Upon completion (or error), the `machine` is terminated.
4883

49-
Build the provider and install the resulting binary to the [local mirror directory](https://www.terraform.io/docs/cli/config/config-file.html#implied-local-mirror-directories):
84+
With spot/preemptible instances (`spot >= 0`), auto-recovery logic and persistent storage will be used to relaunch interrupted tasks.
5085

51-
```console
52-
make install
53-
```
86+
### Query Status
87+
88+
Results and logs are periodically synced to persistent cloud storage. To query this status and view logs:
5489

55-
### Create a test file
90+
```
91+
terraform refresh
92+
terraform show
93+
```
5694

57-
Create a file named `main.tf` in an empty directory with the following contents:
95+
### Stop Tasks
5896

59-
```hcl
60-
terraform {
61-
required_providers { iterative = { source = "iterative/iterative" } }
62-
}
63-
provider "iterative" {}
64-
# ... other resource blocks ...
97+
```
98+
terraform destroy
6599
```
66100

67-
**Note:** to use your local build, specify `source = "github.com/iterative/iterative"` (`source = "iterative/iterative"` will download the latest stable release instead).
101+
This terminates the `machine` (if still running), downloads `output`, and removes the persistent `disk_size` storage.
68102

69-
### Initialize the provider
103+
## Help
70104

71-
Run this command after every `make install` to use the new build:
105+
The [getting started guide](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) has some more information.
72106

73-
```console
74-
terraform init --upgrade
75-
```
107+
Feature requests and bugs can be [reported via GitHub issues](https://github.com/iterative/terraform-provider-iterative/issues), while general questions and feedback are very welcome on our active [Discord server](https://discord.gg/bzA6uY7).
76108

77-
### Test the provider
109+
## Contributing
78110

79-
```console
80-
terraform apply
81-
```
111+
To contribute, use a local copy of the repository instead of the latest stable release:
112+
113+
1. [Install Go 1.17+](https://golang.org/doc/install)
114+
2. Clone the repository & build the provider
115+
```
116+
git clone https://github.com/iterative/terraform-provider-iterative
117+
cd terraform-provider-iterative
118+
make install
119+
```
120+
3. Use `source = "github.com/iterative/iterative"` in your `main.tf` to use the local repository (`source = "iterative/iterative"` will download the latest release instead), and run `terraform init --upgrade`
121+
122+
## Copyright
123+
124+
This project and all contributions to it are distributed under [![Apache-2.0][licence-badge]][licence-file]
125+
126+
[licence-badge]: https://img.shields.io/badge/licence-Apache%202.0-blue
127+
[licence-file]: https://github.com/iterative/terraform-provider-iterative/blob/master/LICENSE
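
The `spot` argument in the README example above packs three behaviours into one number. A minimal sketch of how those values read, based only on the inline comment in `main.tf`; the function name `describe_spot` is hypothetical and is not part of TPI or Terraform:

```shell
# Illustrative helper: explain a `spot` value as documented in the README
# comment (0 = auto-price, -1 = disabled, >0 = hourly USD limit).
# Not part of TPI; the name and messages are assumptions for this sketch.
describe_spot() {
  case "$1" in
    -1) echo "spot disabled" ;;
    0)  echo "spot enabled, automatic price" ;;
    # Anything empty or containing characters other than digits and dots
    # (including other negative numbers) is rejected.
    ''|*[!0-9.]*) echo "invalid spot value: $1" >&2; return 1 ;;
    *)  echo "spot enabled, limit of $1 USD per hour" ;;
  esac
}
```

For example, `describe_spot 0.5` reports an hourly limit, while `describe_spot -1` reports that spot is disabled. Per the README, auto-recovery applies only when `spot >= 0`.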

docs/guides/authentication.md

+42
@@ -0,0 +1,42 @@
1+
---
2+
page_title: Authentication
3+
---
4+
5+
# Authentication
6+
7+
Environment variables are the only supported authentication method. They should be present when running any of the `terraform` commands. For example:
8+
9+
```bash
10+
$ export GOOGLE_APPLICATION_CREDENTIALS_DATA="$(cat service_account.json)"
11+
$ terraform apply
12+
```
13+
14+
## Amazon Web Services
15+
16+
- `AWS_ACCESS_KEY_ID` - Access key identifier.
17+
- `AWS_SECRET_ACCESS_KEY` - Secret access key.
18+
- `AWS_SESSION_TOKEN` - (Optional) Session token.
19+
20+
See the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) for more information.
21+
22+
## Microsoft Azure
23+
24+
- `AZURE_CLIENT_ID` - Client identifier.
25+
- `AZURE_CLIENT_SECRET` - Client secret.
26+
- `AZURE_SUBSCRIPTION_ID` - Subscription identifier.
27+
- `AZURE_TENANT_ID` - Tenant identifier.
28+
29+
See the [Azure documentation](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential) for more information.
30+
31+
## Google Cloud Platform
32+
33+
- `GOOGLE_APPLICATION_CREDENTIALS` - Path to (or contents of) a service account JSON key file.
34+
35+
See the [GCP documentation](https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account) for more information.
36+
37+
## Kubernetes
38+
39+
Either one of:
40+
41+
- `KUBECONFIG` - Path to a [`kubeconfig` file](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/#the-kubeconfig-environment-variable).
42+
- `KUBECONFIG_DATA` - Alternatively, the **contents** of a `kubeconfig` file.
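
Because the variables must be present for every `terraform` command, it can help to check them first. A minimal sketch, assuming the variable names listed in the authentication guide above; the function name `missing_auth_vars` and its cloud identifiers (`aws`, `az`, `gcp`, `k8s`, borrowed from the `cloud` argument) are hypothetical and not part of TPI:

```shell
# Illustrative helper: print the names of required authentication variables
# that are not exported for the given cloud. Prints nothing when all are set.
missing_auth_vars() (
  # The ( ... ) body runs in a subshell, so the variables below stay local.
  case "$1" in
    aws) required="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY" ;;
    az)  required="AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID" ;;
    # The guide lists this name; its opening example exports
    # GOOGLE_APPLICATION_CREDENTIALS_DATA, which this sketch does not check.
    gcp) required="GOOGLE_APPLICATION_CREDENTIALS" ;;
    k8s)
      # Either variable is sufficient for Kubernetes.
      if [ -z "$(printenv KUBECONFIG)" ] && [ -z "$(printenv KUBECONFIG_DATA)" ]; then
        echo "KUBECONFIG or KUBECONFIG_DATA"
      fi
      exit 0 ;;
    *) echo "unknown cloud: $1" >&2; exit 2 ;;
  esac
  missing=""
  for name in $required; do
    # printenv only sees exported variables, which is what terraform inherits.
    if [ -z "$(printenv "$name")" ]; then
      missing="$missing $name"
    fi
  done
  echo "${missing# }"
)
```

Running `missing_auth_vars aws` before `terraform apply` lists any AWS variables that still need to be exported. The optional `AWS_SESSION_TOKEN` is deliberately not checked.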

docs/guides/getting-started.md

+43-34
@@ -4,39 +4,51 @@ page_title: Getting Started
44

55
# Getting Started
66

7-
To use the Iterative Provider you will need to:
8-
9-
- [Install Terraform 1.0](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform) or greater
10-
- Create an account with your preferred cloud compute provider and expose its [authentication credentials via environment variables](https://registry.terraform.io/providers/iterative/iterative/latest/docs#authentication)
7+
## Requirements
8+
9+
- [Install Terraform 1.0+](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform), e.g.:
10+
- Brew (Homebrew/macOS): `brew tap hashicorp/tap && brew install hashicorp/tap/terraform`
11+
- Choco (Chocolatey/Windows): `choco install terraform`
12+
- Conda (Anaconda): `conda install -c conda-forge terraform`
13+
- Debian (Ubuntu/Linux):
14+
```
15+
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
16+
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
17+
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
18+
sudo apt-get update && sudo apt-get install terraform
19+
```
20+
- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables][authentication]
21+
22+
[authentication]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication
1123
1224
## Defining a Task
1325
14-
In the project root directory:
15-
16-
1. Create a directory named `shared` to store input data and output artefacts.
17-
2. Create a file named `main.tf` with the following contents:
26+
In a project root directory, create a file named `main.tf` with the following contents:
1827
1928
```hcl
2029
terraform {
2130
required_providers { iterative = { source = "iterative/iterative" } }
2231
}
2332
provider "iterative" {}
24-
resource "iterative_task" "task" {
25-
cloud = "aws" # or any of: gcp, az, k8s
26-
machine = "m"
27-
28-
workdir {
29-
input = "${path.root}/shared"
30-
output = "${path.root}/shared"
33+
resource "iterative_task" "example" {
34+
cloud = "aws" # or any of: gcp, az, k8s
35+
machine = "m" # medium. Or any of: l, xl, m+k80, xl+v100, ...
36+
spot = 0 # auto-price. Or -1 to disable, or >0 to set an hourly USD limit
37+
disk_size = 30 # GB
38+
39+
storage {
40+
workdir = "."
41+
output = "results"
3142
}
3243
script = <<-END
3344
#!/bin/bash
34-
echo "Hello World!" > greeting.txt
45+
mkdir results
46+
echo "Hello World!" > results/greeting.txt
3547
END
3648
}
3749
```
3850

39-
See [the reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task) for a full list of options -- including more information on [`machine` types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type).
51+
See [the reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#argument-reference) for the full list of options for `main.tf` -- including more information on [`machine` types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type) with and without GPUs.
4052

4153
-> **Note:** The `script` argument must begin with a valid [shebang](<https://en.wikipedia.org/wiki/Shebang_(Unix)>), and can take the form of a [heredoc string](https://www.terraform.io/docs/language/expressions/strings.html#heredoc-strings) or [a `file()` function](https://www.terraform.io/docs/language/functions/file.html) call (e.g. `file("task_run.sh")`).
4254

@@ -45,24 +57,21 @@ The project layout should look similar to this:
4557
```
4658
project/
4759
├── main.tf
48-
└── shared/
49-
└── ...
60+
└── results/
61+
└── greeting.txt (created in the cloud and downloaded locally)
5062
```
5163

52-
## Initializing Terraform
64+
## Initialise Terraform
5365

5466
```console
5567
$ terraform init
5668
```
5769

58-
This command will:
59-
60-
1. Download and install the Iterative Provider.
61-
2. Initialize Terraform in the current directory.
70+
This command will check `main.tf` and download the required TPI plugin.
6271

63-
~> **Note:** None of the subsequent commands will work without first setting some [authentication environment variables](https://registry.terraform.io/providers/iterative/iterative/latest/docs#authentication).
72+
~> **Warning:** None of the subsequent commands will work without first setting some [authentication environment variables][authentication].
6473

65-
## Launching Tasks
74+
## Run Task
6675

6776
```console
6877
$ terraform apply
@@ -71,31 +80,31 @@ $ terraform apply
7180
This command will:
7281

7382
1. Create all the required cloud resources.
74-
2. Upload the specified shared `input` working directory to the cloud.
83+
2. Upload the working directory (`workdir`) to the cloud.
7584
3. Launch the task `script`.
7685

77-
## Viewing Task Statuses
86+
With spot/preemptible instances (`spot >= 0`), auto-recovery logic and persistent storage will be used to relaunch interrupted tasks.
87+
88+
## Query Status
7889

7990
```console
8091
$ terraform refresh && terraform show
8192
```
8293

83-
This command will:
94+
These commands will:
8495

8596
1. Query the task status from the cloud.
8697
2. Display the task status.
8798

88-
## Deleting Tasks
99+
## Stop Task
89100

90101
```console
91102
$ terraform destroy
92103
```
93104

94105
This command will:
95106

96-
1. Download the specified shared working directory from the cloud.
107+
1. Download the `output` directory from the cloud.
97108
2. Delete all the cloud resources created by `terraform apply`.
98109

99-
## Viewing Task Results
100-
101-
After running `terraform destroy`, the `shared` directory should contain a file named `greeting.txt` with the text `Hello, World!`
110+
In this example, after running `terraform destroy`, the `results` directory should contain a file named `greeting.txt` with the text `Hello World!`.
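
The final check in the getting-started example can be scripted. A minimal sketch, assuming the project layout shown in that guide (`results/greeting.txt` next to `main.tf`); the function name `check_greeting` is hypothetical:

```shell
# Illustrative helper: confirm that the example task's output was downloaded.
# $1 is the project directory containing main.tf (defaults to the current one).
check_greeting() {
  file="${1:-.}/results/greeting.txt"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # Compare against the exact text written by the `echo` in the example script.
  [ "$(cat "$file")" = "Hello World!" ]
}
```

Calling `check_greeting .` from the project root after `terraform destroy` returns success only if the file exists and holds the expected text.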
