Skip to content

🎨Computation backend: refactor director-v2 internal computational scheduler to be less resource heavy #6696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sanderegg
Copy link
Member

@sanderegg sanderegg commented Nov 8, 2024

What do these changes do?

Until now

  • dv-2 internal computational scheduler is run as a background task that concurrently schedules every ongoing computational job:
    • every time a new pipeline is started, it re-schedules all the pipelines even if unnecessary,
    • every time a pipeline is requested to stop, it re-schedules all the pipelines even if unnecessary,
    • with multiple replicas this generates a load of unnecessary scheduling as all the replicas would schedule the same pipelines,

What this PR brings

  • exclusive decorator changes (redis_utils.py, test_redis_utils.py):

    • decorator typing is now fixed
    • decorator can now create the lock_key dynamically via a lock_key_builder callable (a la aiocache),
    • decorator can now get the redis client dynamically via a redis_client_builder callable (a la aiocache),
  • dv-2 scheduling is now running 1 asyncio.Task per computational pipeline:

    • each pipeline is now scheduled independently from the others every 5 seconds,
    • when a pipeline is started, only that very pipeline is scheduled independently of any other pipeline,
    • when a pipeline is requested to stop, only that very pipeline is scheduled independently of any other pipeline,
    • when a computational node inside a pipeline is complete, only the very pipeline it lives in is re-scheduled without waiting for 5 seconds,
    • when a pipeline is complete (success, cancelled or errored), the scheduling task is removed,
    • with multiple replicas of the dv-2, only 1 replica schedule a pipeline at a time, the other replicas will skip it as the scheduling method is made exclusive

What this PR does not yet bring (will come in part 2 and is not part of this PR!!!)

  • mulitple replicas of the dv-2 shall share the load of scheduling pipelines, which will probably be done via RabbitMQ

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg added the a:director-v2 issue related with the director-v2 service label Nov 8, 2024
@sanderegg sanderegg added this to the Event Horizon milestone Nov 8, 2024
@sanderegg sanderegg self-assigned this Nov 8, 2024
Copy link

codecov bot commented Nov 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.44%. Comparing base (17f486e) to head (d9df523).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (17f486e) and HEAD (d9df523). Click for more details.

HEAD has 30 uploads less than BASE
Flag BASE (17f486e) HEAD (d9df523)
unittests 31 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6696       +/-   ##
===========================================
- Coverage   87.73%   64.44%   -23.30%     
===========================================
  Files        1571      657      -914     
  Lines       63271    33103    -30168     
  Branches     2118      265     -1853     
===========================================
- Hits        55512    21333    -34179     
- Misses       7433    11709     +4276     
+ Partials      326       61      -265     
Flag Coverage Δ *Carryforward flag
integrationtests 64.85% <ø> (+<0.01%) ⬆️ Carriedforward from 17f486e
unittests 58.38% <ø> (-27.33%) ⬇️

*This pull request uses carry forward flags. Click here to find out more.

Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 77.44% <ø> (-7.84%) ⬇️
agent ∅ <ø> (∅)
api_server ∅ <ø> (∅)
autoscaling ∅ <ø> (∅)
catalog ∅ <ø> (∅)
clusters_keeper ∅ <ø> (∅)
dask_sidecar ∅ <ø> (∅)
datcore_adapter ∅ <ø> (∅)
director 58.38% <ø> (ø)
director_v2 76.42% <ø> (-14.38%) ⬇️
dynamic_scheduler ∅ <ø> (∅)
dynamic_sidecar 59.79% <ø> (-30.01%) ⬇️
efs_guardian ∅ <ø> (∅)
invitations ∅ <ø> (∅)
osparc_gateway_server 79.42% <ø> (-5.73%) ⬇️
payments ∅ <ø> (∅)
resource_usage_tracker ∅ <ø> (∅)
storage ∅ <ø> (∅)
webclient ∅ <ø> (∅)
webserver 59.79% <ø> (-28.95%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17f486e...d9df523. Read the comment docs.

@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch 2 times, most recently from 236a7d8 to b6428ee Compare November 11, 2024 14:44
@sanderegg sanderegg changed the title 🎨Computation backend: release director-v2 scheduler GIL 🎨Computation backend: director-v2 make computational scheduler share load among replicas Nov 12, 2024
@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch 2 times, most recently from ef5a649 to 050fb67 Compare November 12, 2024 16:21
@sanderegg sanderegg changed the title 🎨Computation backend: director-v2 make computational scheduler share load among replicas 🎨Computation backend: refactor director-v2 internal computational scheduler to be less resource heavy Nov 12, 2024
@sanderegg sanderegg marked this pull request as ready for review November 12, 2024 16:46
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool! thx

Copy link
Contributor

@bisgaard-itis bisgaard-itis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Just a few comments/questions from my side.

Copy link
Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch 2 times, most recently from 04a5915 to e0b03da Compare November 14, 2024 09:50
@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch from dc69b24 to d9df523 Compare November 14, 2024 10:17
Copy link

@sanderegg sanderegg enabled auto-merge (squash) November 14, 2024 10:50
@sanderegg sanderegg merged commit 0781e63 into ITISFoundation:master Nov 14, 2024
88 of 89 checks passed
@sanderegg sanderegg deleted the computation-backend/release-scheduler-gil branch November 14, 2024 11:16
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Jan 15, 2025
58 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:director-v2 issue related with the director-v2 service
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants