Fractal default arguments for tasks #177

Closed
tcompa opened this issue Nov 4, 2022 · 13 comments
Labels
High Priority (Current Priorities & Blocking Issues)

Comments

tcompa commented Nov 4, 2022

NOTE: this is not an urgent issue, but one that we will need to tackle at some point


Currently fractal requires a set of standard arguments for each task, on top of the task-specific ones:

from pathlib import Path
from typing import Any, Dict, Optional, Sequence

def some_task(
    *,
    input_paths: Sequence[Path],  # required
    output_path: Path,            # required
    metadata: Dict[str, Any],     # required
    component: str,               # required for parallel tasks
    some_other_arg_1: str,
    some_other_arg_2: Optional[int] = None,
):
    ...

While the metadata and component arguments will likely stay, we need to re-evaluate the assumptions behind the I/O-path arguments. Briefly, some comments on how things work at the moment:

  1. Each task goes from N>=1 input paths to a single output path. Is that always the case? For the moment we never really use N>1 inputs (apart, maybe, from create_zarr_structure), but we kept the option there especially in view of multiplexing.
  2. I/O paths typically look like /some/path/*.extension. We often neglect the extension part. Is it always required? Is it sometimes redundant or wrong? (Spoiler: yes, sometimes it is irrelevant.) Can we drop it?
  3. Tasks in a workflow form a chain, where the N-th input is always the (N-1)-th output (see the sketch below). Are there use cases where this is not sufficiently flexible?
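
For concreteness, a minimal sketch of the chaining rule in point 3, assuming a hypothetical runner; run_chain and make_output_path are invented names for illustration, not Fractal code:

from pathlib import Path
from typing import Any, Callable, Dict, Sequence

def make_output_path(previous_output: Path, step_index: int) -> Path:
    # Invented naming scheme, only to make the example self-contained.
    return previous_output.parent / f"step_{step_index}"

def run_chain(
    tasks: Sequence[Callable[..., None]],
    first_input: Path,
    metadata: Dict[str, Any],
) -> Path:
    current = first_input
    for step_index, task in enumerate(tasks):
        output_path = make_output_path(current, step_index)
        task(input_paths=[current], output_path=output_path, metadata=metadata)
        current = output_path  # the N-th task reads what the (N-1)-th task wrote
    return current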

An obvious possibility is to remove the input_paths and output_path arguments from the fractal-required ones, and to shift their definition more towards the user. I'm not (yet) suggesting we do it, but we should explore this approach and assess pros/cons.

jluethi commented Nov 4, 2022

Good points, let's evaluate this! Two quick thoughts:

Tasks in a workflow form a chain, where the N-th input is always the (N-1)-th output. Are there use cases where this is not sufficiently flexible?

Let's look at this also in the setting of more flexible workflows, resubmissions and running on partial datasets. Maybe we'll need some of the flexibility we barely use for that.

An obvious possibility is to remove the input_paths and output_path arguments from the fractal-required ones, and to shift their definition more towards the user. I'm not (yet) suggesting we do it, but we should explore this approach and assess pros/cons.

I'd like to maintain the tasks as something that can be run outside of Fractal => the tasks do somehow need to get input & output paths

tcompa commented Nov 4, 2022

I'd like to maintain the tasks as something that can be run outside of Fractal => the tasks do somehow need to get input & output paths

To clarify: what I meant was not to remove the arguments, but to have them handled in the same way as the other non-fractal-specific arguments. The behavior for someone who runs a task outside Fractal would be identical; the only difference would be for someone who runs the tasks through Fractal.

tcompa commented Nov 23, 2022

Related to this part of the specs:

* [ ] S8. There are relevant warnings/barriers that stop users from rerunning individual tasks when this would lead to issues (e.g. rerunning tasks that overwrite images without rerunning the original parsing of the data)
* [ ] S9. As a user, I can control at submission whether the existing outputs will be overwritten or new outputs will be created

A possible way forward is to include a default overwrite=False argument in all tasks. In this way, the server could easily identify "dangerous" tasks and raise a warning or an error.
Note that only a few tasks would actually do something with this overwrite parameter; for the others it would simply be irrelevant.
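
A minimal sketch of the idea; the task name, its body, and the server-side check below are illustrative assumptions, not existing Fractal code:

import inspect
from pathlib import Path
from typing import Any, Dict, Sequence

def some_parallel_task(
    *,
    input_paths: Sequence[Path],
    output_path: Path,
    metadata: Dict[str, Any],
    component: str,
    overwrite: bool = False,
):
    # Refuse to overwrite existing data unless the user explicitly opted in.
    if output_path.exists() and not overwrite:
        raise FileExistsError(f"{output_path} already exists and overwrite=False")
    ...

def task_accepts_overwrite(task) -> bool:
    # The server could flag "dangerous" tasks simply by checking whether
    # their signature exposes the overwrite keyword.
    return "overwrite" in inspect.signature(task).parameters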

jluethi commented Nov 23, 2022

Yes, I think such a flag is generally a good idea for tasks. Whether a task should overwrite existing data or not is quite a top-level question, and it may influence the handling quite a bit. Some tasks already have this, like illumination correction.

To address S8:
The typical behavior is that we want to run illumination correction ONCE on the images. That first run does not need to raise any warnings. What should not be possible is to easily rerun the same illumination correction task (without having rerun the parsing).
Thus, I'm not sure whether the overwrite flag solves those issues.

@jluethi added the Priority (Important, but not the highest priority) and High Priority (Current Priorities & Blocking Issues) labels and removed the Priority (Important, but not the highest priority) label on Feb 13, 2023

tcompa commented Feb 16, 2023

Some discussion points for the upcoming call:

Tasks-side:

  1. Should we remove the pydantic layer from tasks?
  2. Known problem: extensions are often useless. I think we can simplify that part.

Server/task-integration:

  1. Known problem: some tasks require two inputs: the empty zarr structure + the actual input data (either images or another zarr). This is currently handled with some workarounds (using Dataset.metadata), which is not ideal.
  2. MIP workflows go 3D -> 2D... how can I get back to the 3D data?

Out of scope for today (I'd say):

  1. Handling point S8 above. This is related to the broad topic "can we store some status information inside the ome-ngff metadata?"

jluethi commented Feb 16, 2023

On 1.
We keep pydantic for the moment to get input validation.
But:

  1. Remove paths from everywhere
  2. Remove default values from TaskArguments => via Optional, then .dict(exclude_none=True) (see the sketch below)
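
A minimal sketch of point 2, assuming a pydantic (v1-style) arguments model; the field names are just examples taken from elsewhere in this thread:

from typing import Optional

from pydantic import BaseModel

class TaskArguments(BaseModel):
    # No task defaults are hard-coded here: fields are Optional and default to None.
    num_levels: Optional[int] = None
    coarsening_xy: Optional[int] = None

args = TaskArguments(num_levels=5)
# Dropping the unset (None) fields means the task's own signature defaults apply.
kwargs = args.dict(exclude_none=True)  # {"num_levels": 5}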

tcompa commented Feb 16, 2023

On 2:

  • Remove glob_pattern from input_paths and output_path, and include it as an additional task argument when needed (namely for the ome-zarr creation and yokogawa-to-zarr conversion); see the sketch after this list.
  • More general glob_patterns to select only a few images? Interesting, let's open another issue.
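
A rough illustration of the first bullet (signature only; everything except glob_pattern follows the argument examples earlier in this thread, and the default value is just a placeholder):

from pathlib import Path
from typing import Any, Dict, Sequence

def create_ome_zarr(
    *,
    input_paths: Sequence[Path],  # e.g. [Path("/some/path/")], without the *.png part
    output_path: Path,
    metadata: Dict[str, Any],
    glob_pattern: str = "*.png",  # the extension moves into an explicit task argument
):
    ...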

jluethi commented Feb 16, 2023

Also on 2: potentially add the glob_pattern to the metadata, to pass it between those tasks.

tcompa commented Feb 16, 2023

(reserved-keyword arguments: input_paths, output_path, metadata, component)

  1. When the task acts on a zarr array, the input_paths argument should look like ["/some/path/to/a.zarr"] (and not ["/some/path/to/*.zarr"], nor ["/some/path/to/"]).
  2. Can we relax the reserved-keyword constraint, and allow a user to specify input_paths as an argument of a WorkflowTask?
  3. Current choice: some tasks (e.g. illumination correction, or possibly ROI selection from a different (2D) zarr) may still take part of their input as separate arguments (e.g. the correction-matrix paths, or the path to the 2D zarr with the ROIs). This means there is no urgent need to automate this argument definition on the server side: they can remain separate arguments, and do not need to enter one of the reserved-keyword arguments.
  4. Should we add a Dataset.resource.info additional attribute? That could be (1) multiplexing cycle label ("monday", "tuesday", ..), or (2) something in {"3D", "3D-illumination-corrected", "2D", ...}.
    • How to update it dynamically during workflow execution (where DB is not known)
    • Related to this: how do we update metadata after a task is over? Probably we should enforce an addition-only policy, with no replacements. This is partly mitigated by points 1 and 2, at least for the problem of doing 3D->2D->3D workflows.
    • Related to this: let's make sure that the info information is transferred back to the DB at the end of execution (if it was modified)
  5. Concerning compound tasks: they are logical units of two tasks (a non-parallel preliminary task and a parallel one) that need to share some data (possibly also parameters, but Dataset.metadata is a reasonable place for those). Note that the shared data are mostly a few paths, not the actual image data. The sharing can happen on disk (possibly in a "temporary" location) or through the metadata inputs/outputs. The current preference is to use special key-value pairs in the metadata dictionary (see the sketch below), so that the workflow & task definitions remain the same and the compound task would "only" exist as an additional endpoint.
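
A minimal sketch of that metadata-based sharing; the function names, the key name, and the example path are invented for illustration:

from typing import Any, Dict, List

def compound_init_task(*, metadata: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    # The non-parallel half produces a few paths (e.g. newly created zarr
    # components) and returns them so that they can be merged into Dataset.metadata.
    shared_paths: List[str] = ["/some/path/plate.zarr/B/03/0"]  # example value
    return {"compound_shared_paths": shared_paths}  # invented key name

def compound_parallel_task(*, metadata: Dict[str, Any], component: str, **kwargs):
    # The parallel half reads the shared paths back from the metadata.
    shared_paths = metadata["compound_shared_paths"]
    ...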

jluethi commented Feb 16, 2023

@tcompa The only thing I can think of to add at the moment:
How do we pass the glob pattern (/ *.ext) between the create-ome-zarr and the yokogawa-to-ome-zarr tasks? Should it go into the metadata dictionary? => part of the discussion here: #299

tcompa commented Feb 16, 2023

Should it go into the metadata dictionary?

Yes, I would say the first implementation follows the standard behavior we use for num_levels or coarsening_xy: the new argument (glob_pattern) is added to the metadata, and the downstream task reads it from there.
Later on we can decide to include this parameter in the specific set of key-value pairs shared between the two tasks, as in #299.
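
A minimal sketch of that flow; how a task's return value ends up in the metadata, and the exact key name, are assumptions here:

from typing import Any, Dict

def create_ome_zarr(*, metadata: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    # The parsing task records the pattern in the metadata it returns,
    # just like num_levels and coarsening_xy.
    return {"glob_pattern": "*.png", "num_levels": 5, "coarsening_xy": 2}

def yokogawa_to_ome_zarr(*, metadata: Dict[str, Any], component: str, **kwargs):
    glob_pattern = metadata["glob_pattern"]  # the downstream task reads it back
    # ... use glob_pattern to select which raw images to convert ...
    ...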

jluethi commented Feb 16, 2023

Makes sense.

One thing I just realized, though: currently, to the user, the '.png' or '.tif' is not a task parameter, but something that is defined when adding a resource => can the task just read it from the resource? Or how does it go from the resource to the task? Does fractal-server combine it with the base folder string and just provide that combination?
