Parsl jobs lost on larger workflows: parsl.executors.high_throughput.interchange.ManagerLost #51
Thanks for reporting this. I'm rerunning your example, up to the
Work on this issue has moved to the tasks repo: fractal-analytics-platform/fractal-tasks-core#72
Ah, OK. So then, on the server side, it would be important that these types of messages go to a log where the user knows to look. Was this in parsl.log?
It's not. Those logs are located in paths like
The issue of harmonizing logs is broader than this single case, so I'm closing this issue; we should come back to this topic later with more organized plans. But I think we first need to consolidate our choices on parsl executors before tackling how monitoring and logs are organized. The current choice seems to work (apart from task errors, of course), but we still need to make sure that we are happy with it. Tests (this one included) will help us decide.
I've been trying to run a large example through the current Fractal architecture and ran into this issue:
Jobs start (up to 4 jobs for the 10-well case). After some time, some jobs finish and others start. They run for a few minutes. After about 10 minutes, all jobs have stopped.
It creates the zarr structure and the ROI tables within it, but no image data is ever written to the zarr file. The server shows this error message:
I currently can't use parsl visualize, because my current installation approach doesn't make that easy. I'm a bit worried that we're losing the manager if a job gets shut down suddenly. Where would we find more relevant logs?
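One low-tech way to track down where such a message was logged is to search every log file under the run directory. The helper below is a hypothetical sketch (`find_log_matches` is not part of parsl), assuming the logs are plain-text `*.log` files somewhere under a directory such as `runinfo`; the directory name and file suffix are assumptions, so adjust `root` and `suffixes` to wherever your deployment actually writes its logs.

```python
import sys
from pathlib import Path


def find_log_matches(root, pattern="ManagerLost", suffixes=(".log",)):
    """Return (path, line_number, line) for every log line containing pattern.

    Hypothetical helper: assumes logs are plain-text files ending in one of
    `suffixes`, anywhere below `root`.
    """
    matches = []
    # Walk the whole tree in a stable order so results are reproducible.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going if a log has stray bytes.
        with path.open("r", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern in line:
                    matches.append((str(path), lineno, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Assumed default location; pass the real log directory as an argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "runinfo"
    for path, lineno, line in find_log_matches(root):
        print(f"{path}:{lineno}: {line}")
```

Running it as `python find_logs.py /path/to/logs` prints `file:line: text` entries, which makes it easy to see whether the error shows up only in the server log or also in the executor-side logs.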
This is my run script for the example: