
Separate input and output directory settings #395


Merged 3 commits into master from separate-input-output on Feb 21, 2022

Conversation

@0x2b3bfa0 (Member) commented Feb 15, 2022

Follow-up of #340.

Old behavior

When output was unset, "", or null, it implicitly mirrored the value of input.

resource "iterative_task" "example" {
  ···
  workdir {
    input = "/path/to/input"
    # output = "/path/to/input" is implicit
  }
}

New behavior

When output is unset, "", or null, it is disabled; it no longer implicitly takes the value of input.

resource "iterative_task" "example" {
  ···
  workdir {
    input = "/path/to/input"
    output = "/path/to/output"
  }
}
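
For completeness, a minimal sketch of the new behavior with output omitted (or set to null): the input directory is uploaded on apply, but nothing is downloaded back on destroy. The path is only a placeholder.

resource "iterative_task" "example" {
  ···
  workdir {
    input = "/path/to/input"
    # output omitted: nothing is synchronized back on destroy
  }
}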

@0x2b3bfa0 self-assigned this Feb 15, 2022
@0x2b3bfa0 requested a review from a team February 15, 2022 15:44
@0x2b3bfa0 added the p1-important (High priority) label Feb 15, 2022
@0x2b3bfa0 marked this pull request as ready for review February 15, 2022 15:46
@DavidGOrtega (Contributor)

I'm not sure about this PR. What's the goal? There are scenarios where syncing back to the current workdir is fine (CI, for example).

@0x2b3bfa0 (Member, Author) commented Feb 16, 2022

What's the goal?

Fix #340: setting workdir.output to null will not synchronize back any artifacts, as proposed.

There are scenarios where syncing back to the current workdir is fine (CI, for example)

In those scenarios, you can set workdir.input and workdir.output to the same value.
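
For example, a CI job that wants the old mirroring behavior back could point both settings at the same directory. A minimal sketch, with "." standing in for whatever the actual working directory is:

resource "iterative_task" "example" {
  ···
  workdir {
    input  = "."
    output = "."
  }
}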

@0x2b3bfa0 (Member, Author)

If I recall correctly, 🔔 @casperdcl also had a preference for the old output ||= input behavior. 🤔 Still, I believe that making input and output fully independent will be better in the long run.

🔔 @iterative/cml, any opinions/alternatives?

@DavidGOrtega (Contributor)

I see, but it's an odd mechanism for users to have to guess. We should at least document it thoroughly.
Also, it's not as interesting to me as the one I proposed in the original issue.
Reminder:

  • use a skip-output-sync prop

I consider this superior because it would allow you to change your mind later if we make this property non-destructive.
It's also more explicit.

@DavidGOrtega

This comment was marked as off-topic.

@0x2b3bfa0

This comment was marked as off-topic.

@0x2b3bfa0 (Member, Author) commented Feb 16, 2022

I believe that the behavior introduced with this pull request is cleaner and more explicit:1

  • workdir {input = null, output = null} — don't push nor pull any data
  • workdir {input = "a", output = null} — push a on apply and don't pull on destroy
  • workdir {input = null, output = "b"} — don't push on apply and pull to b on destroy
  • workdir {input = "a", output = "b"} — push a on apply and pull to b on destroy

What you propose is a bit more complex to understand, at least for me: 🤔

  • workdir {input = null, output = null} — don't push nor pull any data
  • workdir {input = "a", output = null} — push a on apply and implicitly pull to a on destroy
  • workdir {input = "a", output = "b"} — push a on apply and pull to b on destroy
  • workdir {input = "a", output = null, skip_output_sync = true} — push a on apply and don't pull on destroy

Footnotes

  1. explicit is better than implicit

@0x2b3bfa0 (Member, Author)

We should at least document it thoroughly

I wholeheartedly agree, no matter which option we choose.

I consider this superior because it would allow you to change your mind later if we make this property non-destructive

Can't we just configure input and output with ForceNew: false in the same way? 🤔

It's also more explicit

To my mind, it's quite the opposite, but I guess that it's a subjective perception in both cases.

@DavidGOrtega (Contributor) commented Feb 16, 2022

I have never contemplated the idea of not pushing input data. I do not even envision many scenarios like that.

workdir {input = "a", output = null} — push a on apply and implicitly pull to a on destroy

To my mind, this is the most common and expected case, and something that already happens in CI. If I generate artefacts during the task, I want them to be in the same place they would have been if I had run it locally.

@0x2b3bfa0 (Member, Author) commented Feb 16, 2022

I have never contemplated the idea of not pushing input data. I do not even envision many scenarios like that.

There might be some users who don't want to synchronize a local working directory but still want to retrieve the results after execution, e.g. by cloning the repository as part of the script to reduce wait times, or DVC-driven tasks.
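
A rough sketch of that kind of task, assuming the resource's script argument and a placeholder repository URL: nothing is uploaded from the local machine, and only the results are pulled back on destroy.

resource "iterative_task" "example" {
  ···
  workdir {
    # no input: nothing is uploaded from the local machine
    output = "/path/to/results"
  }
  script = <<-END
    #!/bin/bash
    # clone inside the task instead of uploading a local working directory
    git clone https://example.com/user/project .
    ./train.sh
  END
}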

@0x2b3bfa0 (Member, Author) commented Feb 16, 2022

To my mind, this is the most common and expected case, and something that already happens in CI.

You seem particularly focused on CI/CD scenarios. I wonder why. 😉

If I generate artefacts during the task, I want them to be in the same place they would have been if I had run it locally.

Are you sure? Such a behavior would overwrite local files with potentially outdated copies, as happened in #306.

@DavidGOrtega (Contributor) commented Feb 16, 2022

I believe that the behavior introduced with this pull request is cleaner and more explicit:1

To my mind that's not the case because, as I stated, I think the most expected behavior is:

push a on apply and implicitly pull to a on destroy

cloning the repository as part of the script to reduce wait times

I do not envision cloning the repo and working locally, because that kills experimentation. Should they make a change and immediately commit?
Also, when working in CI, the repository has already been cloned.

or DVC-driven tasks

Don't DVC-driven tasks usually need at least a DVC pipeline that has to be pushed?

I might be wrong, but we should probably discuss this urgently instead of relying on personal assumptions.

@0x2b3bfa0 (Member, Author)

To my mind that's not the case because, as I stated, I think the most expected behavior is: “push a on apply and implicitly pull to a on destroy”

In CI/CD systems this may hold true, but overwriting files, as warned in #395 (comment), doesn't look like the most user-friendly experience for local users.

I do not envision cloning the repo and working locally, because that kills experimentation. Should they make a change and immediately commit?

🤔 Should we even assume that users have some version control system in place?

Also, when working in CI, the repository has already been cloned

CI/CD scenarios are pretty clear and work as expected. 👍🏼

Don't DVC-driven tasks usually need at least a DVC pipeline that has to be pushed?

But probably pushed with DVC itself instead of rclone 🤷🏼‍♂️ Just thinking ahead of time.

I might be wrong, but we should probably discuss this urgently instead of relying on personal assumptions

Shall we return to this page and refine the existing proposals or suggest new ones? 🤔

@DavidGOrtega (Contributor)

🤔 Should we even assume that users have some version control system in place?

No. But I'm not expecting them to clone.

@DavidGOrtega (Contributor) commented Feb 16, 2022

overwriting files, as warned in #395 (comment), doesn't look like the most user-friendly experience for local users.

Overwriting is the most common behaviour in the majority of commands involving files. The issue here is wanting to do multiple runs in the same folder, but this is a scenario that is managed by a regular user on a daily basis, I would say.

@DavidGOrtega (Contributor)

Shall we return to this page and refine the existing proposals or suggest new ones? 🤔

We already had a behaviour in place. The issue is that introducing skip would completely alter that at-least-known, if not yet validated, behaviour.

You seem particularly focused on CI/CD scenarios. I wonder why. 😉

It's not because of the CI. It's because, as stated, I think that's the most common way. Let's imagine that the task is not running in the cloud; it's just a magical power that enhances our laptop, making it capable of training with a GPU. As I stated, the most common way to understand the output is that the artefacts are going to be in the place where the script generates them.

Also, we should always reduce the complexity of the story. We cannot say “for local do a, for CI do b”; that's more confusing than saying “for local or CI, do a”.

@0x2b3bfa0 (Member, Author)

I'm not expecting them to clone

Sounds like a valid use case? 🤔

Overwriting is the most common behaviour in the majority of commands involving files.

Overwriting is a common behavior in operations that explicitly take a list of files. It doesn't seem like the expected behavior for such a complex process, especially when the working directory contains both the user's project and the Terraform code & state files.

The issue here is wanting to do multiple runs in the same folder, but this is a scenario that is managed by a regular user on a daily basis.

What do you mean by “managed by a regular user on a daily basis”? Is this process slow enough to not require any automation? 🤔

We already had a behaviour in place. The issue is that introducing skip would completely alter that at-least-known, if not yet validated, behaviour.

Do you mean the original behavior, prior to #340?

It's not because of the CI. It's because, as stated, I think that's the most common way. Let's imagine that the task is not running in the cloud; it's just a magical power that enhances our laptop, making it capable of training with a GPU. As I stated, the most common way to understand the output is that the artefacts are going to be in the place where the script generates them.

Agreed! That's the behavior we had in place before #340, but that behavior caused #306 and overwrote local files with outdated copies.

Also, we should always reduce the complexity of the story.

Agreed!

We cannot say “for local do a, for CI do b”; that's more confusing than saying “for local or CI, do a”

Sometimes, both a and b make sense for different use cases, and overloading the API (#324, #339, #389, #390) to be “consistent” for all of these use cases might be worse than letting users be aware of the differences. 🤔

Co-authored-by: Casper da Costa-Luis <[email protected]>
@casperdcl merged commit ed322f3 into master Feb 21, 2022
@casperdcl deleted the separate-input-output branch February 21, 2022 18:24
@casperdcl added the resource-task (iterative_task TF resource) label Feb 21, 2022
Successfully merging this pull request may close these issues.

task: optionally skip downloading artefacts on destroy