task CML 🍬 Tear down by grepping #389


Closed
Tracked by #391
DavidGOrtega opened this issue Feb 9, 2022 · 12 comments
Assignees
Labels
documentation (Markdown files), logs, p1-important (High priority), resource-task (iterative_task TF resource), ui/ux (User interface/experience)

Comments

@DavidGOrtega
Contributor

DavidGOrtega commented Feb 9, 2022

From user feedback:

tear down the created resources one relies on a brittle condition based on grepping some text in the logs

It's far from ideal, especially with status not totally working or not working as expected (#388).

@0x2b3bfa0
Member

0x2b3bfa0 commented Feb 11, 2022

This probably belongs to a separate epic/milestone titled “using task from CI/CD systems”; not saying it's not equally important, though.

@0x2b3bfa0
Member

0x2b3bfa0 commented Feb 11, 2022

While it's still far from ideal, have you tried using terraform console or terraform show and jq instead?

Example

resource "iterative_task" "example" {
  ...
}

terraform console

terraform console <<< 'iterative_task.example.status["succeeded"]'

terraform show --json and jq

terraform show --json | jq --exit-status '
  .values.root_module.resources[] |
  select(.address == "iterative_task.example") |
  .values.status.succeeded
'
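For use in scripts, the jq filter above can be wrapped in a small shell function. This is only a minimal sketch, assuming jq is installed; the function name task_succeeded is hypothetical and not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: reads `terraform show --json` output on stdin and
# exits 0 only when the given resource has a non-null, non-false
# status.succeeded value (jq --exit-status semantics).
task_succeeded() {
  jq --exit-status --arg address "$1" '
    .values.root_module.resources[] |
    select(.address == $address) |
    .values.status.succeeded
  ' > /dev/null
}

# Usage (needs an applied Terraform state in the current directory):
#   terraform show --json | task_succeeded iterative_task.example && echo done
```

Note that jq's --exit-status only treats null and false as failure, so a status value of 0 would still exit with 0.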

@DavidGOrtega
Contributor Author

I also tried jq, but it's not very friendly. However, terraform console looks very interesting.

@DavidGOrtega DavidGOrtega self-assigned this Feb 17, 2022
@DavidGOrtega DavidGOrtega removed the awaiting-response Waiting for user feedback label Feb 17, 2022
@casperdcl
Contributor

Both console and jq would solve the "brittle" problem. Should this be solved by some docs?

@casperdcl casperdcl added documentation Markdown files logs resource-task iterative_task TF resource ui/ux User interface/experience labels Feb 17, 2022
@DavidGOrtega
Contributor Author

@iterative/cml it's definitely worth using console, and we could just convert this into a doc issue.
Tips:

  • note the shell: bash, needed because of the here-string redirection (<<<)
  • the resource name in resource "iterative_task" "train" is the one used in the check: if terraform console <<< 'iterative_task.train.status["succeeded"]'; then

Here is a proven workflow. In my workflow, the output models folder and report.md are created by train.py, using cml to produce something like live metrics.

name: train-my-model

on: [push]

jobs:
  train-tpi:
    runs-on: [ubuntu-latest]

    steps:
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0

    - uses: iterative/setup-cml@v1

    - name: tpi
      env:
        REPO_TOKEN: ${{ secrets.REPO_TOKEN }}
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
        AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
        AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
        AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}

      shell: bash
      run: |
        cat <<EOF > main.tf
        terraform {
          required_providers {
            iterative = {
              source = "iterative/iterative",
            }
          }
        }

        provider "iterative" {}

        resource "iterative_task" "train" {
          cloud     = "az"
          machine   = "Standard_D2S_v3"
          region    = "us-west"
          spot      = 0

          workdir {
            input = "."
            output = "."
          }
          
          environment = {
            EPOCHS = 1
          }

          script = <<-END
            #!/bin/bash
            sudo apt update
            sudo apt-get install -y software-properties-common build-essential python3-pip
            pip3 install -r requirements.txt
            python3 train.py
          END
        }
        EOF

        terraform init 
        terraform apply --auto-approve

        if terraform console <<< 'iterative_task.train.status["succeeded"]'; then
          echo 'Destroying...'
          terraform destroy --auto-approve
 
          cml send-github-check --token=$GITHUB_TOKEN --conclusion=success --title='CML report' report.md

          cml pr --md output/* >> report.md
          cml send-comment --update report.md
          cml send-comment --update --pr --commit-sha HEAD report.md
        else
          echo 'Creating report...'
          echo 'In progress...' > report.md
          cml send-github-check --token=$GITHUB_TOKEN --conclusion=neutral --title='CML report' report.md
        fi

@0x2b3bfa0
Member

0x2b3bfa0 commented Feb 21, 2022

note the shell: bash due to the redirection

GNU Bash version

if terraform console <<< 'iterative_task.example.status["succeeded"]'; then
  ...
fi

POSIX compliant version ™

if echo 'iterative_task.example.status["succeeded"]' | terraform console; then
  ...
fi

@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 21, 2022

Here's a tentative solution to the potential XY problem in #357.

It would be nice to have an easier way of checking if a task has either failed or succeeded, but I'm afraid it would either involve exposing a redundant attribute (e.g. completed: boolean) or finding another way of invoking task from CI/CD systems.

Succeeded + failed check

# --line-regexp: only a bare 0 means "not finished"; a total such as 10 still counts as finished
if terraform console <<< 'try(iterative_task.example.status["succeeded"], 0) + try(iterative_task.example.status["failed"], 0)' | grep --quiet --invert-match --line-regexp 0; then
  echo 'the task has either succeeded or failed, destroying the resources...'
fi

Loop-based alternative (8 bytes less)

if terraform console <<< 'sum([for status in ["succeeded", "failed"] : try(iterative_task.example.status[status], 0)])' | grep --quiet --invert-match --line-regexp 0; then
  echo 'the task has either succeeded or failed, destroying the resources...'
fi
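The same decision can also be made by comparing the printed total numerically instead of pattern-matching on the digit 0. A minimal sketch, assuming terraform console prints a single integer for the expression above; the function name task_finished is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the number printed by `terraform console` on
# stdin and exits 0 when it is a positive integer, 1 otherwise.
task_finished() {
  # Accept a final line without a trailing newline; fail on empty input.
  read -r count || [ -n "$count" ] || return 1
  case "$count" in
    '' | *[!0-9]*) return 1 ;;  # empty or non-numeric output: not finished
  esac
  [ "$count" -gt 0 ]
}

# Usage (requires an applied state in the current directory):
#   expression='sum([for status in ["succeeded", "failed"] : try(iterative_task.example.status[status], 0)])'
#   if echo "$expression" | terraform console | task_finished; then
#     terraform destroy --auto-approve
#   fi
```

Anything that is not a plain non-negative integer (for example, empty output after an evaluation error) is treated as "not finished", so the resources are left alone in that case.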

This was referenced Mar 31, 2022
@casperdcl casperdcl removed their assignment Apr 19, 2022
@casperdcl
Contributor

I don't quite grok if succeeded; then destroy; report pass; else report running; fi

Surely it should be report running; while !succeeded; do sleep; done; destroy; report pass?
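Spelled out as shell, the second (blocking) flow would look roughly like the sketch below, for comparison with the one-shot if/else form. All function names are hypothetical placeholders rather than CML or TPI commands, and the interval and retry limit are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper, not part of CML or TPI.
# poll_until CHECK INTERVAL MAX_TRIES: runs CHECK every INTERVAL seconds until
# it exits 0; gives up with exit status 1 after MAX_TRIES failed attempts.
poll_until() {
  check=$1 interval=$2 tries=$3
  while [ "$tries" -gt 0 ]; do
    if "$check"; then return 0; fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}

# Usage sketch of the blocking flow (report_running and report_pass are placeholders):
#   succeeded() { echo 'iterative_task.example.status["succeeded"]' | terraform console > /dev/null; }
#   report_running
#   poll_until succeeded 60 120 && { terraform destroy --auto-approve; report_pass; }
```

A loop like this keeps the CI job running for as long as the task does, which is the main practical difference from the one-shot check.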

@0x2b3bfa0
Member

0x2b3bfa0 commented Apr 19, 2022

Are you referring to #389 (comment)? 🤔 If so, you're probably missing a detail: task was modified (#339) to restart workflows when machines shut down. Intuitive as it gets.

@casperdcl
Contributor

casperdcl commented Apr 19, 2022

I just realised 🤢

@0x2b3bfa0

This comment was marked as off-topic.

@0x2b3bfa0
Member

No longer a valid use case; use leo read --status or propose new output formats for the standalone command-line tool.

@0x2b3bfa0 0x2b3bfa0 reopened this Oct 10, 2022
@0x2b3bfa0 0x2b3bfa0 closed this as not planned Oct 10, 2022

3 participants