Parse additional yokogawa_to_zarr parameters to the command line interface #49

Closed
jluethi opened this issue May 24, 2022 · 4 comments

jluethi commented May 24, 2022

After switching away from the apply file, some important parameters are hard-coded in the fractal_cmd.py file. We should expose them on the command line interface, as users may need to change them.
Core would be:

  • dims
  • num_levels
  • ext (=> does the extension setting actually change things?)

num_levels & ext could be optional arguments with the current defaults. dims should be a required parameter, as it will differ for basically every experiment (unless we can parse the metadata file and get that info directly, see https://github.com/fractal-analytics-platform/mwe_fractal/issues/46; support for manually setting this parameter should be maintained in case there is no metadata file).
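As a rough illustration of the proposal above, here is a minimal argparse sketch (not the actual fractal_cmd.py code): dims is required, while num_levels, ext, and a delete flag keep defaults. The flag names and the ext/delete defaults are assumptions for illustration; only num_levels=5 is grounded in the workflow config shown later in this thread.

```python
# Hypothetical CLI sketch for the yokogawa_to_zarr parameters discussed above.
# Option names and defaults are illustrative, not the real fractal interface.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="yokogawa_to_zarr options")
    # dims is required: it differs for basically every experiment
    parser.add_argument(
        "--dims", nargs=2, type=int, required=True, metavar=("X", "Y"),
        help="Well grid dimensions, e.g. --dims 2 2",
    )
    # Optional arguments keeping the current defaults
    parser.add_argument("--num-levels", type=int, default=5,
                        help="Number of pyramid levels")
    parser.add_argument("--ext", default="png",
                        help="Image file extension to parse (assumed default)")
    parser.add_argument("--delete-in", action="store_true",
                        help="Delete input files after conversion")
    return parser

args = build_parser().parse_args(["--dims", "2", "2"])
```

A call like `--dims 2 2` then yields `num_levels=5` and `delete_in=False` unless overridden.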

Also, how do we handle slurm parameters like number of nodes, cores, memory etc?


jluethi commented May 24, 2022

Another optional argument should be the delete_in argument.

tcompa added a commit that referenced this issue May 30, 2022
* Isolate pyramid creation function [ref #53], and test it within maximum_intensity_projection task;
* Remove "factor" from coarsening_factor variables (BREAKING);
* Remove dims argument from create_zarr_structure;
* Put back delete_in argument in yokogawa_to_zarr [ref #49];
* Fix match between (X,Y) dimensions and (chunk_size_x,chunk_size_y), when rechunking during pyramid creation [ref #32];
* Update examples folder.

tcompa commented May 30, 2022

At the moment, workflow_apply takes as an input a JSON file which looks like

{
  "workflow_name": "uzh_1_well_2x2_sites",
  "dims": [2, 2],
  "coarsening_xy": 3,
  "coarsening_z": 1,
  "num_levels": 5
}

and where one can also add the delete_in="True" item. These configurations are specific to each workflow.
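The loading logic can be sketched as follows; this is a hypothetical helper, not the actual workflow_apply code. It treats delete_in as optional and, since the thread stores it as the string "True", normalizes it to a boolean:

```python
# Sketch of loading a per-workflow JSON config like the one above.
# Hypothetical helper; not the actual fractal code.
import json

def load_workflow_config(path):
    with open(path) as f:
        cfg = json.load(f)
    # delete_in is optional and stored as the string "True" in the thread;
    # normalize it to a real boolean, defaulting to False.
    cfg["delete_in"] = str(cfg.get("delete_in", "False")) == "True"
    return cfg
```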

The configuration for parsl/slurm is currently set globally at the fractal level, and it is written in fractal/fractal_config.py, which looks like

# Parameters of parsl.executors.HighThroughputExecutor
max_workers = 4  # This is the maximum number of workers per block

# Parameters of parsl.providers.SlurmProvider
# Note that worker_init is a command included at the beginning of
# each SLURM submission script
nodes_per_block = 1  # This implies that a block corresponds to a node
max_blocks = 15  # Maximum number of blocks (=nodes) that parsl can use
cores_per_node = 16
mem_per_node_GB = 32
partition = "main"
worker_init = "source /opt/easybuild/software/Anaconda3/2019.07/"
worker_init += "etc/profile.d/conda.sh\n"
worker_init += "conda activate fractal"

For now, the key point is the distinction between global and workflow-specific options; the current structure is still provisional.

tcompa closed this as completed May 30, 2022

jluethi commented May 30, 2022

Yes, I think we should think about that structure a bit further. Each task may require some parameters, so a single global parameter setting may not make sense when scaling this up; instead, parameters could be passed to each task individually (e.g. as command line arguments), or read from a JSON file generated per task.
Any other ideas for good structures once we have e.g. 10-20 tasks with ~3-6 parameters each?
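One possible structure for many tasks, sketched below as plain dicts: each task carries only its own parameter block, merged over shared defaults. Task names and parameter blocks here are hypothetical; the shared defaults come from the workflow JSON above.

```python
# Hypothetical per-task parameter structure: shared defaults plus a small
# per-task block, merged at lookup time. Names are illustrative.
SHARED_DEFAULTS = {"num_levels": 5, "coarsening_xy": 3, "coarsening_z": 1}

TASK_PARAMS = {
    "yokogawa_to_zarr": {"ext": "png", "delete_in": False},
    "maximum_intensity_projection": {},
}

def params_for(task_name):
    # Task-specific values override the shared defaults
    merged = dict(SHARED_DEFAULTS)
    merged.update(TASK_PARAMS.get(task_name, {}))
    return merged
```

Each of these per-task blocks could equally live in its own JSON file; the merge logic stays the same.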


jluethi commented May 30, 2022

Also, a global slurm specification is a good beginning, but we'll have to think about use cases where individual tasks have very different needs (e.g. a task that needs a GPU, or high-memory vs. low-memory tasks; potentially even the flexibility for a task to require different amounts of memory depending on the dataset).
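The per-task resource question above could be handled by layering task-specific overrides (and an optional dataset-dependent memory scaling) on top of the global fractal_config values. A minimal sketch, where the task names, override values, and the 2x memory heuristic are all illustrative assumptions:

```python
# Global defaults taken from fractal_config.py shown earlier in the thread.
GLOBAL_SLURM = {
    "cores_per_node": 16,
    "mem_per_node_GB": 32,
    "partition": "main",
    "gpus": 0,
}

# Hypothetical overrides for tasks with very different needs.
TASK_SLURM_OVERRIDES = {
    "gpu_segmentation": {"partition": "gpu", "gpus": 1},
    "high_memory_task": {"mem_per_node_GB": 128},
}

def slurm_config_for(task_name, dataset_size_gb=None):
    cfg = dict(GLOBAL_SLURM)
    cfg.update(TASK_SLURM_OVERRIDES.get(task_name, {}))
    # Optionally scale memory with dataset size (illustrative 2x heuristic)
    if dataset_size_gb is not None:
        cfg["mem_per_node_GB"] = max(cfg["mem_per_node_GB"], 2 * dataset_size_gb)
    return cfg
```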

jacopo-exact pushed a commit that referenced this issue Nov 16, 2022