
Investigate issue in autoscaling where the wrong node has been removed #4344


Closed
sanderegg opened this issue Jun 12, 2023 · 1 comment · Fixed by #4410
Labels
a:autoscaling autoscaling service in simcore's stack
Comments

@sanderegg
Member

A strange case happened today.
Autoscaling may have removed the wrong node from the nodes list.
A node with IP 10.0.3.129 joined at around 7:08.
Another node with IP 10.0.3.12 joined at around 7:15.
Later, at 13:00:

  • the machine with IP 10.0.3.129 is still running in AWS but is not in the nodes list
  • the machine with IP 10.0.3.12 is no longer in AWS (it was probably terminated by the autoscaling app) but is still in the nodes list as Down and Drain

Investigate whether the autoscaling service could have mishandled this case.

Graylog logs: https://monitoring.osparc.io/graylog/search/64777e171bbbd13c0565e1e5?q=%2210.0.3.12%22+OR+%2210.0.3.129%22&rangetype=relative&from=28800
CloudTrail: https://us-east-1.console.aws.amazon.com/cloudtrail/home?region=us-east-1#/events?ResourceName=i-03e858b7431e66ced

@sanderegg sanderegg added the a:autoscaling autoscaling service in simcore's stack label Jun 12, 2023
@sanderegg sanderegg added this to the Watermelon milestone Jun 12, 2023
@sanderegg sanderegg self-assigned this Jun 12, 2023
@sanderegg
Member Author

Issue confirmed yesterday:
the machine with IP 10.0.2.15 is shown as Down and Drain, while the machine with IP 10.0.2.155 is running and is not part of the swarm.
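In both cases the affected IP is a string prefix of the other one (10.0.3.12 / 10.0.3.129, 10.0.2.15 / 10.0.2.155). One plausible hypothesis, not confirmed in this thread, is that an EC2 instance is associated with its Docker node through a prefix match on the hostname rather than an exact match. The sketch below assumes that EC2 private DNS names look like `ip-10-0-3-12.ec2.internal` and that node hostnames look like `ip-10-0-3-12`; the helper names are hypothetical and not taken from the autoscaling code. With a prefix match, the lookup for instance 10.0.3.12 returns node 10.0.3.129. That would explain the observed state: the 10.0.3.129 node is removed from the swarm while the 10.0.3.12 instance is terminated.

```python
from typing import List, Optional


def hostname_from_private_dns(private_dns_name: str) -> str:
    # Hypothetical convention: "ip-10-0-3-12.ec2.internal" -> "ip-10-0-3-12"
    return private_dns_name.split(".", 1)[0]


def find_node_prefix(node_hostnames: List[str], private_dns_name: str) -> Optional[str]:
    # Suspected faulty lookup: "ip-10-0-3-129".startswith("ip-10-0-3-12") is True,
    # so the first matching node in list order wins, which may be the wrong one.
    wanted = hostname_from_private_dns(private_dns_name)
    for hostname in node_hostnames:
        if hostname.startswith(wanted):
            return hostname
    return None


def find_node_exact(node_hostnames: List[str], private_dns_name: str) -> Optional[str]:
    # Safer lookup: only an exact hostname match associates instance and node.
    wanted = hostname_from_private_dns(private_dns_name)
    for hostname in node_hostnames:
        if hostname == wanted:
            return hostname
    return None


if __name__ == "__main__":
    # Node 10.0.3.129 joined first (7:08), node 10.0.3.12 second (7:15).
    nodes = ["ip-10-0-3-129", "ip-10-0-3-12"]
    print(find_node_prefix(nodes, "ip-10-0-3-12.ec2.internal"))  # ip-10-0-3-129
    print(find_node_exact(nodes, "ip-10-0-3-12.ec2.internal"))   # ip-10-0-3-12
```

If this hypothesis holds, the investigation would look for any `startswith`-style or substring comparison between instance and node identifiers.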

