You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Jan 16, 2025. It is now read-only.
fix(runners): increase the log level to WARN when using the enable_job_queued_check parameter (#3046)
The readme section detailing the `enable_job_queued_check` makes it an
appealing function, but it was unclear to me that it could cause some
issues when used with ephemeral runners and matrix workflows.
It is particularly challenging to notice for organizations with a very
large number of repositories, at times with low activity the issue is
not present, but when matrix jobs are started or when multiple
repositories are triggering jobs around the same time window (such as
cron'd workflow), this issue is more likely to be faced.
For example, on the execution that allowed me to find the cause, when
starting a matrix of 9 jobs, 4 of them were actually triggering this log
message:
```
2023-03-10 15:29:09.728 INFO [scale-up:a0502c68-f9b9-5d2d-afd6-70edb7486316 index.js:135249 isJobQueued] Job not queued
```
This log message, aside from being one amongst many other `INFO`
messages, does not provide many details about the consequences.
I suggest bumping the level to `WARN` and adding more details in the
message itself.
Related discussion:
https://github.com/philips-labs/terraform-aws-github-runner/discussions/2931
Copy file name to clipboardExpand all lines: README.md
+1-1
Original file line number
Diff line number
Diff line change
@@ -324,7 +324,7 @@ You can configure runners to be ephemeral, runners will be used only for one job
324
324
325
325
- The scale down lambda is still active, and should only remove orphan instances. But there is no strict check in place. So ensure you configure the `minimum_running_time_in_minutes` to a value that is high enough to got your runner booted and connected to avoid it got terminated before executing a job.
326
326
- The messages sent from the webhook lambda to scale-up lambda are by default delayed delayed by SQS, to give available runners to option to start the job before the decision is made to scale more runners. For ephemeral runners there is no need to wait. Set `delay_webhook_event` to `0`.
327
-
- All events on the queue will lead to a new runner crated by the lambda. By setting `enable_job_queued_check` to `true` you can enforce only create a runner if the event has a correlated queued job. Setting this can avoid creating useless runners, for example whn jobs got cancelled before a runner is created. We suggest to use this in combination with a pool.
327
+
- All events on the queue will lead to a new runner created by the lambda. By setting `enable_job_queued_check` to `true` you can enforce only create a runner if the event has a correlated queued job. Setting this can avoid creating useless runners, for example when jobs got cancelled before a runner is created or if the job was already picked up by another runner. We suggest to use this in combination with a pool.
328
328
- To ensure runners are created in the same order GitHub sends the events we use by default a FIFO queue, this is mainly relevant for repo level runners. For ephemeral runners you can set `enable_fifo_build_queue` to `false`.
329
329
- Error related to scaling should be retried via SQS. You can configure `job_queue_retention_in_seconds``redrive_build_queue` to tune the behavior. We have no mechanism to avoid events will never processed, which means potential no runner could be created and the job in GitHub can time out in 6 hours.
0 commit comments