Approximately every two weeks, we observe a large number of the following error messages in the logs:
Error attempting to get plugin server address for RPC: Failed to dial the plugin server in 10s
When this error occurs:
- The number of EC2 instances spikes to the configured maximum.
- Instances appear to be stuck: they are not processing any jobs and are also not being automatically removed.
- All jobs in GitLab become stuck with the error:
  ERROR: Preparation failed: exit status 1
Manually terminating the affected EC2 instances sometimes restores normal operation, but the problem tends to persist for a while, and during that time we have to keep terminating stuck instances by hand.
To Reproduce
The issue is intermittent and occurs roughly every two weeks. We have not yet identified a consistent way to reproduce it. However, we have observed that the RPC error perfectly matches the occurrence of the issue.
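To make the correlation between the RPC error and the incidents easier to check, a minimal sketch along the following lines could count the error per hour in an exported log file. It assumes each log line starts with an ISO 8601 timestamp followed by a space; that layout and the function name `count_rpc_errors_per_hour` are illustrative assumptions, not the runner's actual log format or tooling.

```python
from collections import Counter
from datetime import datetime

# Substring taken from the error message quoted in this report.
RPC_ERROR = "Failed to dial the plugin server"


def count_rpc_errors_per_hour(lines):
    """Count RPC dial failures per hour.

    Assumption: every relevant line begins with an ISO 8601 timestamp
    (e.g. 2024-05-01T10:15:00) followed by a space. Lines that do not
    match this hypothetical layout are skipped.
    """
    counts = Counter()
    for line in lines:
        if RPC_ERROR not in line:
            continue
        stamp, _, _ = line.partition(" ")
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no parseable timestamp at the start of the line
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Comparing the hourly counts against the times when the instance count hit the configured maximum would show whether the error precedes or merely accompanies the stuck instances.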
Expected behavior
EC2 instances should not get stuck randomly. Runners should clean up properly and continue processing jobs without requiring manual intervention.
Describe the bug
We are experiencing periodic failures with our GitLab Runner setup on AWS using the cattle-ops/gitlab-runner/aws Terraform module.