This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Use multiple processes per DataSource in Manager #311

Closed
Tracked by #341
JackKelly opened this issue Oct 29, 2021 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@JackKelly
Member

JackKelly commented Oct 29, 2021

Some DataSources would benefit a lot from having multiple Processes per DataSource.

Maybe make n_processes configurable.
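A minimal sketch of what a configurable n_processes could look like, assuming each DataSource can create its batches independently. `DataSourceConfig`, `split_batch_indices`, `create_batches` and `run_data_source` are hypothetical names invented for illustration; they are not taken from the repository.

```python
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass


@dataclass
class DataSourceConfig:
    """Hypothetical per-DataSource settings (not the project's real config)."""

    name: str
    n_processes: int = 1  # 1 keeps the simple single-process behaviour


def split_batch_indices(n_batches: int, n_processes: int) -> list[list[int]]:
    """Deal batch indices round-robin across n_processes workers."""
    if n_processes < 1:
        raise ValueError("n_processes must be >= 1")
    return [list(range(worker, n_batches, n_processes)) for worker in range(n_processes)]


def create_batches(batch_indices: list[int]) -> list[int]:
    """Placeholder for the real work of loading data and writing batches."""
    return list(batch_indices)


def run_data_source(config: DataSourceConfig, n_batches: int) -> list[int]:
    """Create n_batches using config.n_processes worker processes."""
    chunks = split_batch_indices(n_batches, config.n_processes)
    if config.n_processes == 1:
        # Avoid the overhead of a process pool when only one worker is requested.
        return create_batches(chunks[0])
    with ProcessPoolExecutor(max_workers=config.n_processes) as pool:
        results = pool.map(create_batches, chunks)
        return sorted(idx for chunk in results for idx in chunk)
```

On platforms that start workers with spawn, a call such as `run_data_source(DataSourceConfig("satellite", n_processes=4), 18)` would need to sit under an `if __name__ == "__main__":` guard.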

@JackKelly JackKelly added the enhancement New feature or request label Oct 29, 2021
@JackKelly JackKelly moved this to Todo in Nowcasting Oct 29, 2021
This was referenced Nov 2, 2021
@peterdudfield peterdudfield moved this from Todo to In Progress in Nowcasting Nov 12, 2021
@JackKelly JackKelly self-assigned this Nov 18, 2021
@JackKelly
Member Author

I'll work on this now...

@JackKelly
Member Author

Hehe... making leonardo sweat...
image

@JackKelly
Member Author

With 4 processes per DataSource, we're getting 18 batches of satellite data in 131 seconds, which is about 7.3 seconds per batch.

@JackKelly
Member Author

With 8 processes, it's quite juddery and produces about one satellite batch every 9 seconds.

@JackKelly
Member Author

With 1 process, it's also about 8 seconds per batch!

@JackKelly
Member Author

8 seconds per batch isn't terrible: that's about 2.3 days to create 25,000 batches.

But, yeah, we probably want to make sure prepare_ml_data.py runs over the weekend.
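For reference, the arithmetic behind these figures can be checked with a short sketch. The helper names `seconds_per_batch` and `days_to_create` are made up for illustration; the inputs are the numbers quoted in this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_batch(n_batches: int, elapsed_seconds: float) -> float:
    """Average wall-clock time spent on each batch."""
    return elapsed_seconds / n_batches


def days_to_create(n_batches: int, secs_per_batch: float) -> float:
    """Total wall-clock time, in days, to create n_batches at a fixed rate."""
    return n_batches * secs_per_batch / SECONDS_PER_DAY


# 4 processes per DataSource: 18 satellite batches in 131 seconds.
print(round(seconds_per_batch(18, 131), 1))  # 7.3

# 25,000 batches at roughly 8 seconds per batch.
print(round(days_to_create(25_000, 8), 1))  # 2.3
```

At 8 seconds per batch the full run takes roughly 55.6 hours, so a job started on Friday evening would finish before Monday morning.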

@JackKelly
Member Author

And about 8 seconds per batch with 2 processes per DataSource!

OK. I think the conclusion is clear: using multiple processes per DataSource doesn't speed up satellite batch creation, probably because Dask is already using multiple processes for us. So I'll close the associated PR. This isn't such bad news, because concurrency definitely adds complexity!

Repository owner moved this from In Progress to Done in Nowcasting Nov 18, 2021