comp_task requests are sequentially committed for each node in the project #2043


Merged
merged 13 commits into ITISFoundation:master from upsert-comp-tasks
Dec 11, 2020

Conversation

GitHK
Contributor

@GitHK GitHK commented Dec 10, 2020

What do these changes do?

NOTE: Original PR code was scrapped as it did not address the issue

Issue explained by @sanderegg

see original below

The only service inserting there is the director-v2; the sidecar/services are only updating.
Therefore, I think the problem is the following:

webclient -> PUT /projects/{uuid}
-> webserver saves the project AND sends a POST /computations/{uuid} to the director-v2
-> director-v2 inserts the new computation by deleting and then inserting
PROBLEM: if you move a node, the webclient also triggers a PUT because the node changed. So if the user moves things, adds nodes, or modifies nodes, triggering a lot of PUTs, the director-v2 receives a string of requests (which are all async). My guess is that the director-v2 is still processing one request when another arrives.
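The suspected interleaving can be reproduced with a small asyncio simulation. The names below (`db`, `replace_computation`) are illustrative stand-ins, not the actual director-v2 code:

```python
import asyncio

# Toy stand-in for the comp_tasks table: unique node_id -> row payload
db = {}

async def replace_computation(node_id, payload):
    """Delete-then-insert, mimicking director-v2; NOT atomic across awaits."""
    db.pop(node_id, None)        # DELETE the existing row
    await asyncio.sleep(0)       # yield control: a second request can run here
    if node_id in db:            # simulate the unique-key constraint
        raise RuntimeError(f"duplicate key: {node_id}")
    db[node_id] = payload        # INSERT the new row

async def main():
    # Two concurrent PUT-triggered requests for the same node
    return await asyncio.gather(
        replace_computation("node-1", "v1"),
        replace_computation("node-1", "v2"),
        return_exceptions=True,
    )

results = asyncio.run(main())
print(results)  # the second request hits the duplicate-key error
```

Because both coroutines pass the DELETE before either reaches the INSERT, the second INSERT finds the row already present, just like the duplicate key error seen in the attached log.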

Approved implementation proposal:

see original below

  • I create something similar to the run_sequentially_in_context decorator, which will be applied to the function responsible for updating each individual node in the project
  • This will put calls to this function in a queue, which guarantees FIFO ordering, thus applying all the requests from the frontend in sequence.
  • The delete-recreate logic will continue to exist, and I will remove the upsert pattern so that future errors are detected.
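A minimal sketch of such a queue-based decorator. This is simplified (one global queue instead of one per context, and all names are illustrative, not the real run_sequentially_in_context implementation):

```python
import asyncio
import functools

def run_sequentially(func):
    """Serialize concurrent calls to an async function in FIFO order.

    Each call enqueues its arguments together with a Future; a single
    worker task drains the queue, so executions never overlap. The real
    decorator would also key the queue by a context (e.g. the project
    id) so that unrelated projects still run concurrently.
    """
    state = {"queue": None}

    async def worker(queue):
        while True:
            fut, args, kwargs = await queue.get()
            try:
                fut.set_result(await func(*args, **kwargs))
            except Exception as exc:  # propagate errors to the caller
                fut.set_exception(exc)

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        if state["queue"] is None:   # lazily start queue + worker in the running loop
            state["queue"] = asyncio.Queue()
            asyncio.ensure_future(worker(state["queue"]))
        fut = asyncio.get_running_loop().create_future()
        await state["queue"].put((fut, args, kwargs))
        return await fut

    return wrapper

events = []

@run_sequentially
async def update_node(node_id, version):
    events.append(("start", version))
    await asyncio.sleep(0.01)        # simulate the DB round-trip
    events.append(("end", version))
    return version

async def main():
    # Two "simultaneous" frontend requests for the same node
    return await asyncio.gather(update_node("n1", 1), update_node("n1", 2))

results = asyncio.run(main())
print(results)   # [1, 2]
print(events)    # executions never interleave
```

The Future lets each caller still await its own result, while the single worker guarantees the delete-then-insert of one request completes before the next begins.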

Related issue number

How to test

Checklist

  • Did you change any service's API? Then make sure to bundle document and upgrade version (make openapi-specs, git commit ... and then make version-*)
  • Unit tests for the changes exist
  • Runs in the swarm
  • Documentation reflects the changes
  • New module? Add your github username to .github/CODEOWNERS

@GitHK GitHK changed the title implementing upset pattern for director_v2 WIP: implementing upset pattern for director_v2 Dec 10, 2020
@GitHK GitHK marked this pull request as draft December 10, 2020 15:45
@codecov

codecov bot commented Dec 10, 2020

Codecov Report

Merging #2043 (e1ad6ce) into master (78b2e8c) will increase coverage by 0.0%.
The diff coverage is 87.7%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #2043   +/-   ##
======================================
  Coverage    72.5%   72.6%           
======================================
  Files         404     405    +1     
  Lines       14794   14844   +50     
  Branches     1496    1502    +6     
======================================
+ Hits        10738   10782   +44     
- Misses       3659    3666    +7     
+ Partials      397     396    -1     
Flag Coverage Δ
integrationtests 63.3% <87.7%> (+0.1%) ⬆️
unittests 66.6% <38.5%> (-0.2%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...c/simcore_service_director_v2/utils/async_utils.py 85.4% <85.4%> (ø)
..._director_v2/modules/db/repositories/comp_tasks.py 92.5% <100.0%> (+0.2%) ⬆️
...erver/src/simcore_service_webserver/director_v2.py 85.8% <0.0%> (-2.7%) ⬇️
...ce_webserver/resource_manager/garbage_collector.py 77.5% <0.0%> (-1.0%) ⬇️
.../director/src/simcore_service_director/producer.py 61.1% <0.0%> (+0.6%) ⬆️
...webserver/computation_comp_tasks_listening_task.py 90.0% <0.0%> (+3.3%) ⬆️

@GitHK GitHK changed the title WIP: implementing upset pattern for director_v2 implementing upset pattern for director_v2 Dec 10, 2020
@GitHK GitHK marked this pull request as ready for review December 10, 2020 16:35
@GitHK GitHK requested review from pcrespov and sanderegg December 10, 2020 16:35
Member

@pcrespov pcrespov left a comment


this pattern is new to me. What is it about? Avoids the director-v2 getting angry? 😤 🤣 😉

Member

@pcrespov pcrespov left a comment


Some minor optimization ;-)

insert_stmt = insert(comp_tasks).values(
    **task_db.dict(by_alias=True, exclude_unset=True)
)
Member

@pcrespov pcrespov Dec 10, 2020


MINOR optimization: export to dict only once

# ... let's upseRT (insert or update) so people do not get upset! ;-)

task_values: Dict[str, Any] = task_db.dict(by_alias=True, exclude_unset=True)
upsert_stmt = insert(comp_tasks).values(**task_values).on_conflict_do_update(
    index_elements=[comp_tasks.c.project_id, comp_tasks.c.node_id],
    set_=task_values,
)
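The same ON CONFLICT DO UPDATE semantics that SQLAlchemy's on_conflict_do_update generates can be exercised locally with the stdlib sqlite3 module (SQLite >= 3.24). The table below mirrors comp_tasks but is purely illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE comp_tasks (
           project_id TEXT NOT NULL,
           node_id    TEXT NOT NULL,
           image      TEXT,
           PRIMARY KEY (project_id, node_id)
       )"""
)

def upsert_task(project_id, node_id, image):
    # Insert, or update the existing row on a (project_id, node_id) conflict:
    # the SQL equivalent of insert(...).on_conflict_do_update(...)
    conn.execute(
        """INSERT INTO comp_tasks (project_id, node_id, image)
           VALUES (?, ?, ?)
           ON CONFLICT (project_id, node_id)
           DO UPDATE SET image = excluded.image""",
        (project_id, node_id, image),
    )

upsert_task("p1", "n1", "image:v1")
upsert_task("p1", "n1", "image:v2")   # same key: updates instead of raising
rows = conn.execute("SELECT image FROM comp_tasks").fetchall()
print(rows)  # [('image:v2',)]
```

Computing the values dict once and reusing it for both the insert and the DO UPDATE set_ clause, as suggested above, avoids serializing the model twice.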

Member

@sanderegg sanderegg left a comment


sorry but I am against that change for now.
the first line in the function deletes the comp_tasks so the problem lies somewhere else.

there should be no need to upsert here. the problem is probably more that this code gets called asynchronously several times. doing upserts here will certainly generate some undefined state in the DB.

@GitHK
Contributor Author

GitHK commented Dec 11, 2020

You are correct in your assumption; I thought the same. But for some reason another process/task is interacting with comp_tasks, causing the item to already exist between the time you delete it and recreate it, which raises a duplicate key error. This happened yesterday on master.

Have a look at the attached log (the error is still present in the deployment).

unexpected_error.log

@GitHK GitHK requested a review from sanderegg December 11, 2020 06:52
@sanderegg
Member

yes, as I said: the only one inserting there is the director-v2. the sidecar/services are only updating.
Therefore, I think the problem is the following:

  • webclient -> PUT /projects/{uuid}
  • -> webserver saves the project AND sends a POST /computations/{uuid} to the director-v2
  • -> director-v2 inserts the new computation by deleting and then inserting
    PROBLEM: if you move a node, the webclient also triggers a PUT because the node changed. So if the user moves things, adds nodes, or modifies nodes, triggering a lot of PUTs, the director-v2 receives a string of requests (which are all async). My guess is that the director-v2 is still processing one request when another arrives.

@GitHK
Contributor Author

GitHK commented Dec 11, 2020

That sounds very possible. Would something like this work here:

  • I create something similar to the run_sequentially_in_context decorator, which will be applied to the function responsible for updating each individual node in the project
  • This will put calls to this function in a queue, which guarantees FIFO ordering, thus applying all the requests from the frontend in sequence.
  • The delete-recreate logic will continue to exist, and I will remove the upsert pattern so that future errors are detected.

@sanderegg
Member

this sounds like a good proposal.

@GitHK GitHK changed the title implementing upset pattern for director_v2 comp_task requests are sequentially committed for each node in the project Dec 11, 2020
@GitHK GitHK self-assigned this Dec 11, 2020
@GitHK GitHK added bug buggy, it does not work as expected a:director-v2 issue related with the director-v2 service labels Dec 11, 2020
@GitHK GitHK added this to the Alfred_Büchi milestone Dec 11, 2020
@GitHK GitHK requested a review from pcrespov December 11, 2020 09:21
@GitHK GitHK requested review from sanderegg December 11, 2020 10:36
Member

@sanderegg sanderegg left a comment


see comment. I think something from master went away

@GitHK GitHK requested a review from sanderegg December 11, 2020 12:25
Member

@sanderegg sanderegg left a comment


all good!

@GitHK GitHK merged commit 8bcfa6b into ITISFoundation:master Dec 11, 2020
@GitHK GitHK deleted the upsert-comp-tasks branch December 11, 2020 14:23
@sanderegg sanderegg mentioned this pull request Dec 17, 2020