fix(runners): increase the log level to WARN when using the enable_job_queued_check parameter (#3046)

Fgerthoffert · web-flow · commit 1de73bf14c9c · 2023-03-17T10:05:43.000+01:00
The readme section detailing the `enable_job_queued_check` makes it an appealing function, but it was unclear to me that it could cause some issues when used with ephemeral runners and matrix workflows. It is particularly challenging to notice for organizations with a very large number of repositories, at times with low activity the issue is not present, but when matrix jobs are started or when multiple repositories are triggering jobs around the same time window (such as cron'd workflow), this issue is more likely to be faced. For example, on the execution that allowed me to find the cause, when starting a matrix of 9 jobs, 4 of them were actually triggering this log message: ``` 2023-03-10 15:29:09.728 INFO [scale-up:a0502c68-f9b9-5d2d-afd6-70edb7486316 index.js:135249 isJobQueued] Job not queued ``` This log message, aside from being one amongst many other `INFO` messages, does not provide many details about the consequences. I suggest bumping the level to `WARN` and adding more details in the message itself. Related discussion: https://github.com/philips-labs/terraform-aws-github-runner/discussions/2931
diff --git a/README.md b/README.md
@@ -324,7 +324,7 @@ You can configure runners to be ephemeral, runners will be used only for one job
 
 - The scale down lambda is still active, and should only remove orphan instances. But there is no strict check in place. So ensure you configure the `minimum_running_time_in_minutes` to a value that is high enough to got your runner booted and connected to avoid it got terminated before executing a job.
 - The messages sent from the webhook lambda to scale-up lambda are by default delayed delayed by SQS, to give available runners to option to start the job before the decision is made to scale more runners. For ephemeral runners there is no need to wait. Set `delay_webhook_event` to `0`.
-- All events on the queue will lead to a new runner crated by the lambda. By setting `enable_job_queued_check` to `true` you can enforce only create a runner if the event has a correlated queued job. Setting this can avoid creating useless runners, for example whn jobs got cancelled before a runner is created. We suggest to use this in combination with a pool.
+- All events on the queue will lead to a new runner created by the lambda. By setting `enable_job_queued_check` to `true` you can enforce only create a runner if the event has a correlated queued job. Setting this can avoid creating useless runners, for example when jobs got cancelled before a runner is created or if the job was already picked up by another runner. We suggest to use this in combination with a pool.
 - To ensure runners are created in the same order GitHub sends the events we use by default a FIFO queue, this is mainly relevant for repo level runners. For ephemeral runners you can set `enable_fifo_build_queue` to `false`.
 - Error related to scaling should be retried via SQS. You can configure `job_queue_retention_in_seconds` `redrive_build_queue` to tune the behavior. We have no mechanism to avoid events will never processed, which means potential no runner could be created and the job in GitHub can time out in 6 hours.
 
diff --git a/modules/runners/lambdas/runners/src/scale-runners/scale-up.ts b/modules/runners/lambdas/runners/src/scale-runners/scale-up.ts
@@ -110,7 +110,7 @@ async function isJobQueued(githubInstallationClient: Octokit, payload: ActionReq
     throw Error(`Event ${payload.eventType} is not supported`);
   }
   if (!isQueued) {
-    logger.info(`Job not queued`, LogFields.print());
+    logger.warn(`Job not queued in GitHub. A new runner instance will NOT be created for this job.`, LogFields.print());
   }
   return isQueued;
 }

Original file line number	Diff line number	Diff line change
`@@ -110,7 +110,7 @@ async function isJobQueued(githubInstallationClient: Octokit, payload: ActionReq`
`110`	`110`	throw Error(`Event ${payload.eventType} is not supported`);
`111`	`111`	`}`
`112`	`112`	`if (!isQueued) {`
`113`		- logger.info(`Job not queued`, LogFields.print());
	`113`	+ logger.warn(`Job not queued in GitHub. A new runner instance will NOT be created for this job.`, LogFields.print());
`114`	`114`	`}`
`115`	`115`	`return isQueued;`
`116`	`116`	`}`