GCP VM instance not terminating after timeout #834
Comments
Tried in a different region and with a different instance type:
@eemelipa thanks a lot for the issue.
Hmm 🤔, this is the opposite of mine, #808. @eemelipa can you post a screenshot for me? After you run your pipeline, go to the GCP dashboard for the project, open the Activity tab, and take a screenshot. If it's a busy project that has other stuff going on, search for the "Logs Explorer", narrow the time range to when your pipeline ran, and look for any entries with a "severity" higher than "notice".
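For anyone who prefers to do the same check on exported log entries rather than in the Logs Explorer UI, here is a minimal sketch. It assumes the entries have been downloaded as a list of JSON objects that each carry a `severity` field, as Cloud Logging entries do; the function name `above_notice` and the sample entries are hypothetical and only illustrate the filtering.

```python
# Cloud Logging severity levels, ordered from lowest to highest.
SEVERITY_ORDER = [
    "DEFAULT", "DEBUG", "INFO", "NOTICE",
    "WARNING", "ERROR", "CRITICAL", "ALERT", "EMERGENCY",
]


def above_notice(entries, threshold="NOTICE"):
    """Return the entries whose severity is strictly higher than `threshold`.

    Entries without a severity field are treated as DEFAULT (the lowest level).
    """
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    floor = rank[threshold]
    return [
        entry for entry in entries
        if rank.get(str(entry.get("severity", "DEFAULT")).upper(), 0) > floor
    ]


if __name__ == "__main__":
    # Hypothetical sample entries, not taken from the project in this issue.
    sample = [
        {"severity": "NOTICE", "textPayload": "firewall insert"},
        {"severity": "ERROR", "textPayload": "firewall delete failed"},
        {"textPayload": "entry without a severity"},
    ]
    for entry in above_notice(sample):
        print(entry["severity"], entry["textPayload"])
```

An empty result only says that nothing above "notice" was logged in the exported window; it does not prove the teardown calls were made at all.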
@dacbd here's the screenshot ^ There weren't any warn/error logs 🤔 The bootCounter: 2 latebootreportevent seems to happen correctly at the 5 min mark, which was the idle-timeout in this case. I don't know if the logs are missing any rows, though. The credentials have the "Compute Admin" privilege.
@eemelipa there should be some firewall/network API calls. When my teardowns failed, it was these calls that failed, and so the instance remained.
Ok, I adjusted the log filtering and now the firewall inserts are visible. There are no errors/warnings, though. Any thoughts on what to look for next? I tried giving the service account owner access to the project (i.e., all privileges), but that didn't help.
An error here would have made this an easy fix 😞 There is something more complicated going on; "Compute Admin" is a sufficient role.
We might be able to get more info out of I think something like this should work:
Spent a good while fiddling manually on the VM instance and got a step forward. Looks like the problem is that the CML VM instance did not get any GCP service account: When I put that in place manually and restarted the instance, things worked! The instance got deleted after the idle-timeout. Sounds like something should fail if the instance does not have a service account. Here are a couple of options that came to my mind (obviously you guys know better what's under the hood):
So the missing service account is one problem, and we seem to also have a second problem. When I give a service account to the cml runner command, it creates the VM instance correctly with the account, but it does not give the correct Cloud API access scopes: When I gave the API scopes manually, the idle-timeout deletion worked ok. Sounds like some changes might be needed to the CML instance creation.
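Both problems described above can be checked from an instance description. The following is a minimal sketch, assuming the description is JSON shaped like the Compute Engine instance resource (for example from `gcloud compute instances describe <name> --format=json`), where `serviceAccounts` is a list of objects with `email` and `scopes`. The function name `teardown_blockers` is hypothetical, and treating the `compute` or `cloud-platform` scope as what the runner needs to delete its own instance is an assumption, not something confirmed in this thread.

```python
# OAuth scopes that allow read/write calls to the Compute Engine API.
# Assumption: one of these is what the instance needs to delete itself.
CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"
COMPUTE_RW = "https://www.googleapis.com/auth/compute"


def teardown_blockers(instance):
    """Return a list of reasons why the instance may be unable to delete itself.

    `instance` is a dict shaped like the Compute Engine instance resource.
    An empty list means neither of the two problems was detected.
    """
    accounts = instance.get("serviceAccounts") or []
    if not accounts:
        return ["no service account attached"]

    problems = []
    for account in accounts:
        scopes = set(account.get("scopes") or [])
        if not scopes & {CLOUD_PLATFORM, COMPUTE_RW}:
            email = account.get("email", "<unknown>")
            problems.append(f"{email}: no read/write Compute scope")
    return problems


if __name__ == "__main__":
    # Hypothetical instance description with a read-only scope.
    instance = {
        "serviceAccounts": [
            {
                "email": "runner@example-project.iam.gserviceaccount.com",
                "scopes": ["https://www.googleapis.com/auth/compute.readonly"],
            }
        ]
    }
    for problem in teardown_blockers(instance):
        print(problem)
```

Note that access scopes and IAM roles are separate layers: an account with "Compute Admin" can still be blocked if the instance was created with scopes that are too narrow.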
Hmm, it sounds like some documentation clarification might be required? Under the hood, The Are you saying it looks like terraform tried to use those (the This should be easy for me to reproduce, and I'll try to get it fixed soon. If you are on Discord and willing to test out a patch, I can let you know when I have something working (dabarnes on Discord).
Can you share more of the yml from gitlab-ci? I think I misunderstood some of your last reply. Can you share the permissions list or GCP managed role for the service account whose key should be set in the
edit:
Can be closed with TPI fix: iterative/terraform-provider-iterative#333
Fixed by iterative/terraform-provider-iterative#333; thanks @dacbd
Similar issue to #678
I'm starting a self-hosted runner via GitLab CI/CD on GCP:
After the timeout the VM instance is not shutting down.
journalctl --unit cml --no-pager
command shows:
The runner picks up a job correctly, and the runner deregisters itself from the GitLab project. The VM instance just does not shut down.
On Azure, a similar config worked ok and the instances were shutting down.