docs: iteration 3 #492

casperdcl · 2022-04-11T11:41:37Z

- ~~update README USPs~~ split into features & USPs
  - sync with docs/index.md
- update description
- add high-level diagram
- more info on cloud account creation
- update example scripts (make spot recovery use case more clear)
- fix cross-references/pluralisation
- add low-level diagram

high-level

inspired by

low-level

```mermaid
flowchart LR
subgraph tpi [what TPI manages]
direction LR
    subgraph you [what you manage]
        direction LR
        A([Personal Computer])
    end
    B[("Cloud Storage (low cost)")]
    C{{"Cloud instance scaler (zero cost)"}}
    D[["Cloud (spot) Instance"]]
    A ---> |create cloud storage| B
    A --> |create cloud instance scaler| C
    A ==> |upload script & workdir| B
    A -.-> |"offline (lunch break)"| A
    C -.-> |"(re)provision instance"| D
    D ==> |run script| D
    B <-.-> |persistent workdir cache| D
    D ==> |script end,\nshutdown instance| B
    D -.-> |outage| C
    B ==> |download output| A
end
style you fill:#FFFFFF00,stroke:#13ADC7
style tpi fill:#FFFFFF00,stroke:#FFFFFF00,stroke-width:0px
style A fill:#13ADC7,stroke:#333333,color:#000000
style B fill:#945DD5,stroke:#333333,color:#000000
style D fill:#F46737,stroke:#333333,color:#000000
style C fill:#7B61FF,stroke:#333333,color:#000000
```

jendefig · 2022-04-11T16:53:29Z

Question: Where in the Readme will the new banner image go? I see where the low-level diagram goes, but not the banner. It's not replacing the Terraform/Iterative banner, right?

casperdcl · 2022-04-11T17:08:39Z

> Where in the Readme will the new [high-level] banner image go?

I was thinking somewhere mid-way. Potentially just below the USP bullet list.

dmpetrov

It feels like we are lacking the unique competitive advantage, or "Why TPI?". Is there a way to introduce it?

casperdcl

potential 3rd point (IaC/HaC)

casperdcl · 2022-04-14T22:04:56Z

README.md

+   TPI is a CLI tool, not a running service. It requires no additional orchestrating machine (control plane/head nodes) to schedule/recover/terminate instances. Instead, TPI runs (spot) instances via cloud-native scaling groups[^scalers], taking care of recovery and termination automatically on the cloud provider's side. This design reduces management overhead & infrastructure costs. You can close your laptop while cloud tasks are running -- auto-recovery happens even if you are offline.
+2. **Unified tool for data science and software development teams**:
+   TPI provides consistent tooling for both data scientists and DevOps engineers, improving cross-team collaboration. This simplifies compute management to a single config file, and reduces time to deliver ML models into production.
+


Suggested change

3. **Reproducible, codified environments**: Store hardware requirements & pipelines in a single configuration file with the rest of your ML project code.

casperdcl · 2022-04-14T22:05:23Z

docs/index.md

+   TPI is a CLI tool, not a running service. It requires no additional orchestrating machine (control plane/head nodes) to schedule/recover/terminate instances. Instead, TPI runs (spot) instances via cloud-native scaling groups ([AWS Auto Scaling Groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html), [Azure VM Scale Sets](https://azure.microsoft.com/en-us/services/virtual-machine-scale-sets), [GCP managed instance groups](https://cloud.google.com/compute/docs/instance-groups#managed_instance_groups), and [Kubernetes Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job)), taking care of recovery and termination automatically on the cloud provider's side. This design reduces management overhead & infrastructure costs. You can close your laptop while cloud tasks are running -- auto-recovery happens even if you are offline.
+2. **Unified tool for data science and software development teams**:
+   TPI provides consistent tooling for both data scientists and DevOps engineers, improving cross-team collaboration. This simplifies compute management to a single config file, and reduces time to deliver ML models into production.
+


Suggested change

3. **Reproducible, codified environments**: Store hardware requirements & pipelines in a single configuration file with the rest of your ML project code.

dmpetrov

Looks great!

A couple of minor changes:

- It might be better to link to the CML repository and cml.dev rather than to a doc.
- Adding a 3rd item to the list was a good idea, e.g. "Reproducible, codified environments", "Extend your GitOps and CI/CD-oriented workflows", "'hardware as code' for AI/ML", or several of these.

0x2b3bfa0 · 2022-04-15T22:42:22Z

🔔 @dmpetrov & @iterative/cml, we still have some unresolved conversations that are unlikely to have a noticeable influence on the result:

Can we merge this in the current state and address them in a separate pull request?

0x2b3bfa0 · 2022-04-15T23:24:18Z

Merging as per this conversation

0x2b3bfa0 · 2022-04-16T01:11:58Z

README.md

+1. **Reduced management overhead and infrastructure cost**:
+   TPI is a CLI tool, not a running service. It requires no additional orchestrating machine (control plane/head nodes) to schedule/recover/terminate instances. Instead, TPI runs (spot) instances via cloud-native scaling groups[^scalers], taking care of recovery and termination automatically on the cloud provider's side. This design reduces management overhead & infrastructure costs. You can close your laptop while cloud tasks are running -- auto-recovery happens even if you are offline.
+2. **Unified tool for data science and software development teams**:
+   TPI provides consistent tooling for both data scientists and DevOps engineers, improving cross-team collaboration. This simplifies compute management to a single config file, and reduces time to deliver ML models into production.


> and reduces time to deliver ML models into production

Is this a direct effect of using TPI for development? Given that this tool is not intended for model serving, that assertion might be slightly misleading.

jorgeorpinel · 2022-04-27T03:22:37Z

README.md

+- **No cloud vendor lock-in**: switch between clouds with just one line thanks to unified abstraction
+- **No waste**: auto-cleanup unused resources (terminate compute instances upon task completion/failure & remove storage upon download of results), pay only for what you use


💅🏼 missing . periods in these ?

jorgeorpinel · 2022-04-27T03:23:45Z

README.md


-Supported cloud vendors include:
+Supported cloud vendors [include][auth]:


💅🏼 I'd link the other 3 words instead.

jorgeorpinel · 2022-04-27T03:25:12Z

README.md

+[gcp]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication#google-cloud-platform
+[k8s-badge]: https://img.shields.io/badge/K8s-Kubernetes-black?colorA=white&logoColor=326CE5&logo=kubernetes
+[k8s]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication#kubernetes
+[auth]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication


💅🏼 💅🏼 💅🏼 Shouldn't it be at the beginning of the link list? 😋

jorgeorpinel · 2022-04-27T03:26:55Z

README.md

+![](https://github.com/iterative/static/raw/main/img/tpi/high-level-light.png#gh-light-mode-only)
+![](https://github.com/iterative/static/raw/main/img/tpi/high-level-dark.png#gh-dark-mode-only)


> `#gh-(light|dark)-mode-only`

Woah how does that work? Just curious

jorgeorpinel · 2022-04-27T03:33:01Z

README.md

+## What's Special
+
+There are a several reasons to use TPI instead of other related solutions (custom scripts and/or cloud orchestrators):
+
+1. **Reduced management overhead and infrastructure cost**:
+   TPI is a CLI tool, not a running service. It requires no additional orchestrating machine (control plane/head nodes) to schedule/recover/terminate instances. Instead, TPI runs (spot) instances via cloud-native scaling groups[^scalers], taking care of recovery and termination automatically on the cloud provider's side. This design reduces management overhead & infrastructure costs. You can close your laptop while cloud tasks are running -- auto-recovery happens even if you are offline.


Numbering the list suggests that it is exhaustive. Is it? Otherwise maybe use bullets.

It also somewhat overlaps with the previous bullet list. Maybe they can be combined, so that the reader gets to the Usage section right after the vendors figure?

jorgeorpinel · 2022-04-27T03:34:20Z

README.md

@@ -92,14 +126,52 @@ TF_LOG_PROVIDER=INFO terraform refresh
 TF_LOG_PROVIDER=INFO terraform show
 ```

-### Stop Tasks
+### Stop Task


"End/Delete a Task"? Or "End/Delete the Task"?

jorgeorpinel · 2022-04-27T03:39:12Z

README.md

+## Future Plans
+
+TPI is a CLI tool bringing the power of bare-metal cloud to a bare-metal local laptop. We're working on more featureful and visual interfaces. We'd also like to have more native support for distributed (multi-instance) training, more data sync optimisations & options, and tighter ecosystem integration with tools such as [DVC](https://dvc.org).
+
 ## Help

 The [getting started guide](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) has some more information. In case of errors, extra debugging information is available using `TF_LOG_PROVIDER=DEBUG` instead of `INFO`.


💅🏼

Suggested change

The [getting started guide](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) has some more information. In case of errors, extra debugging information is available using `TF_LOG_PROVIDER=DEBUG` instead of `INFO`.

The [Getting Started](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) guide has some more information. In case of errors, extra debugging information is available using `TF_LOG_PROVIDER=DEBUG` instead of `INFO`.

jorgeorpinel · 2022-04-27T03:42:54Z

docs/guides/getting-started.md

- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables][authentication]
+- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables][auth]

-[authentication]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication
+[auth]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication


BTW I think the nav would be more logical with Getting Started on top, since it covers installation and then it links to the auth section.

casperdcl added 6 commits April 10, 2022 20:36

more info on cloud account creation

af6445f

update USPs

01cc68b

update example scripts

b5cefe6

low-level diagram

91e57c4

bi-directional cache

739ffea

minor shading

1df2ec1

casperdcl self-assigned this Apr 11, 2022

casperdcl changed the title ~~docs: diagrams~~ docs: iteration 3 Apr 11, 2022

casperdcl requested review from dmpetrov and a team April 11, 2022 11:42

casperdcl added documentation Markdown files resource-task iterative_task TF resource labels Apr 11, 2022

DavidGOrtega suggested changes Apr 11, 2022

DavidGOrtega suggested changes Apr 11, 2022

casperdcl commented Apr 11, 2022

udpate diagram styles

eb8e209

casperdcl commented Apr 11, 2022

casperdcl added 2 commits April 11, 2022 18:45

fix cross-refs

215d1c1

fix epochs

5db25b0

dmpetrov requested changes Apr 12, 2022

casperdcl and others added 2 commits April 12, 2022 16:15

readme: how it works

ad941c4

Apply suggestions from code review

1e2434c

Co-authored-by: DavidGOrtega <[email protected]>

dacbd reviewed Apr 12, 2022

casperdcl added 3 commits April 13, 2022 14:11

split features & USPs

64c45ec

fix example

4aeffa8

add high-level diagram

2a30b67

casperdcl commented Apr 13, 2022

minify list

ad2c377

casperdcl temporarily deployed to automatic April 14, 2022 21:45 Inactive

@jurv11 copyedits

54ab9d1

casperdcl commented Apr 14, 2022

casperdcl temporarily deployed to automatic April 14, 2022 22:08 Inactive

casperdcl temporarily deployed to automatic April 14, 2022 22:09 Inactive

casperdcl had a problem deploying to automatic April 14, 2022 22:09 Failure

casperdcl temporarily deployed to automatic April 14, 2022 22:09 Inactive

dmpetrov approved these changes Apr 15, 2022

logging typo (#512)

a1f1c8c

Let me slide this typo I found in with the docs pr

0x2b3bfa0 temporarily deployed to automatic April 15, 2022 22:05 Inactive

0x2b3bfa0 temporarily deployed to automatic April 15, 2022 22:06 Inactive

0x2b3bfa0 had a problem deploying to automatic April 15, 2022 22:06 Failure

0x2b3bfa0 temporarily deployed to automatic April 15, 2022 22:06 Inactive

0x2b3bfa0 merged commit d24f99c into master Apr 15, 2022

0x2b3bfa0 deleted the docs-iter branch April 15, 2022 23:24

0x2b3bfa0 mentioned this pull request Apr 16, 2022

docs: iteration 4 #513

Closed

6 tasks

0x2b3bfa0 reviewed Apr 16, 2022

casperdcl mentioned this pull request Apr 19, 2022

docs: task requirements 2 #363

Open

21 tasks

jorgeorpinel reviewed Apr 27, 2022
