This repository was archived by the owner on Jan 16, 2025. It is now read-only.

Commit 2f20a8b

npalm, forest-pr[bot] and philips-labs-pr|bot authored
fix!: Remove FIFO queues (#4072)
## Description

Removes FIFO queues as described in #4068.

## Breaking

This change re-creates the queues when FIFO is configured. As a result, any messages still in those queues are lost.

## Test

- [x] default example
- [x] multi runner example

---------

Co-authored-by: forest-pr|bot <forest-pr[bot]@users.noreply.github.com>
Co-authored-by: philips-labs-pr|bot <philips-labs-pr[bot]@users.noreply.github.com>
1 parent a2280f7 commit 2f20a8b


22 files changed with 22 additions and 101 deletions


README.md

2 deletions
@@ -139,7 +139,6 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
 | <a name="input_enable_cloudwatch_agent"></a> [enable\_cloudwatch\_agent](#input\_enable\_cloudwatch\_agent) | Enables the cloudwatch agent on the ec2 runner instances. The runner uses a default config that can be overridden via `cloudwatch_config`. | `bool` | `true` | no |
 | <a name="input_enable_ephemeral_runners"></a> [enable\_ephemeral\_runners](#input\_enable\_ephemeral\_runners) | Enable ephemeral runners, runners will only be used once. | `bool` | `false` | no |
 | <a name="input_enable_event_rule_binaries_syncer"></a> [enable\_event\_rule\_binaries\_syncer](#input\_enable\_event\_rule\_binaries\_syncer) | DEPRECATED: Replaced by `state_event_rule_binaries_syncer`. | `bool` | `null` | no |
-| <a name="input_enable_fifo_build_queue"></a> [enable\_fifo\_build\_queue](#input\_enable\_fifo\_build\_queue) | Enable a FIFO queue to keep the order of events received by the webhook. Recommended for repo level runners. | `bool` | `false` | no |
 | <a name="input_enable_jit_config"></a> [enable\_jit\_config](#input\_enable\_jit\_config) | Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is avaialbe. In case you upgradeing from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI. | `bool` | `null` | no |
 | <a name="input_enable_job_queued_check"></a> [enable\_job\_queued\_check](#input\_enable\_job\_queued\_check) | Only scale if the job event received by the scale up lambda is in the queued state. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | `bool` | `null` | no |
 | <a name="input_enable_managed_runner_security_group"></a> [enable\_managed\_runner\_security\_group](#input\_enable\_managed\_runner\_security\_group) | Enables creation of the default managed security group. Unmanaged security groups can be specified via `runner_additional_security_group_ids`. | `bool` | `true` | no |
@@ -225,7 +224,6 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
 | <a name="input_runners_maximum_count"></a> [runners\_maximum\_count](#input\_runners\_maximum\_count) | The maximum number of runners that will be created. | `number` | `3` | no |
 | <a name="input_runners_scale_down_lambda_memory_size"></a> [runners\_scale\_down\_lambda\_memory\_size](#input\_runners\_scale\_down\_lambda\_memory\_size) | Memory size limit in MB for scale-down lambda. | `number` | `512` | no |
 | <a name="input_runners_scale_down_lambda_timeout"></a> [runners\_scale\_down\_lambda\_timeout](#input\_runners\_scale\_down\_lambda\_timeout) | Time out for the scale down lambda in seconds. | `number` | `60` | no |
-| <a name="input_runners_scale_up_Lambda_memory_size"></a> [runners\_scale\_up\_Lambda\_memory\_size](#input\_runners\_scale\_up\_Lambda\_memory\_size) | Memory size limit in MB for scale-up lambda. | `number` | `null` | no |
 | <a name="input_runners_scale_up_lambda_memory_size"></a> [runners\_scale\_up\_lambda\_memory\_size](#input\_runners\_scale\_up\_lambda\_memory\_size) | Memory size limit in MB for scale-up lambda. | `number` | `512` | no |
 | <a name="input_runners_scale_up_lambda_timeout"></a> [runners\_scale\_up\_lambda\_timeout](#input\_runners\_scale\_up\_lambda\_timeout) | Time out for the scale up lambda in seconds. | `number` | `30` | no |
 | <a name="input_runners_ssm_housekeeper"></a> [runners\_ssm\_housekeeper](#input\_runners\_ssm\_housekeeper) | Configuration for the SSM housekeeper lambda. This lambda deletes token / JIT config from SSM.<br/><br/> `schedule_expression`: is used to configure the schedule for the lambda.<br/> `enabled`: enable or disable the lambda trigger via the EventBridge.<br/> `lambda_memory_size`: lambda memery size limit.<br/> `lambda_timeout`: timeout for the lambda in seconds.<br/> `config`: configuration for the lambda function. Token path will be read by default from the module. | <pre>object({<br/> schedule_expression = optional(string, "rate(1 day)")<br/> enabled = optional(bool, true)<br/> lambda_memory_size = optional(number, 512)<br/> lambda_timeout = optional(number, 60)<br/> config = object({<br/> tokenPath = optional(string)<br/> minimumDaysOld = optional(number, 1)<br/> dryRun = optional(bool, false)<br/> })<br/> })</pre> | <pre>{<br/> "config": {}<br/>}</pre> | no |
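The README rows for `enable_jit_config` and `enable_job_queued_check` both default to `null` and describe how that `null` is resolved from `enable_ephemeral_runners`. The following is a minimal TypeScript sketch of that documented resolution only; `RunnerFlags`, `ResolvedFlags` and `resolveFlags` are hypothetical names, and this is not the module's Terraform code.

```typescript
// Hypothetical input shape: `null` means "not set, use the documented default".
interface RunnerFlags {
  enableEphemeralRunners: boolean;
  enableJitConfig: boolean | null;
  enableJobQueuedCheck: boolean | null;
}

interface ResolvedFlags {
  jitConfig: boolean;
  jobQueuedCheck: boolean;
}

// Hypothetical helper mirroring the defaults described in the README table.
function resolveFlags(flags: RunnerFlags): ResolvedFlags {
  return {
    // JIT config: enabled for ephemeral runners, disabled otherwise, unless set explicitly.
    jitConfig: flags.enableJitConfig ?? flags.enableEphemeralRunners,
    // Queued-job check: enabled for non-ephemeral runners, disabled for ephemeral, unless set explicitly.
    jobQueuedCheck: flags.enableJobQueuedCheck ?? !flags.enableEphemeralRunners,
  };
}

// Usage: ephemeral runners with both variables left at `null`.
const ephemeralDefaults = resolveFlags({
  enableEphemeralRunners: true,
  enableJitConfig: null,
  enableJobQueuedCheck: null,
});
console.log(JSON.stringify(ephemeralDefaults)); // {"jitConfig":true,"jobQueuedCheck":false}
```

An explicit `true` or `false` always wins over the ephemeral-based default, which is what the nullish coalescing (`??`) expresses.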

docs/configuration.md

1 deletion
@@ -120,7 +120,6 @@ You can configure runners to be ephemeral, in which case runners will be used on
 - The scale down lambda is still active, and should only remove orphan instances. But there is no strict check in place. So ensure you configure the `minimum_running_time_in_minutes` to a value that is high enough to get your runner booted and connected to avoid it being terminated before executing a job.
 - The messages sent from the webhook lambda to the scale-up lambda are by default delayed by SQS, to give available runners a chance to start the job before the decision is made to scale more runners. For ephemeral runners there is no need to wait. Set `delay_webhook_event` to `0`.
 - All events in the queue will lead to a new runner created by the lambda. By setting `enable_job_queued_check` to `true` you can enforce a rule of only creating a runner if the event has a correlated queued job. Setting this can avoid creating useless runners. For example, a job getting cancelled before a runner was created or if the job was already picked up by another runner. We suggest using this in combination with a pool.
-- To ensure runners are created in the same order GitHub sends the events, by default we use a FIFO queue. This is mainly relevant for repo level runners. For ephemeral runners you can set `enable_fifo_build_queue` to `false`.
 - Errors related to scaling should be retried via SQS. You can configure `job_queue_retention_in_seconds` and `redrive_build_queue` to tune the behavior. We have no mechanism to avoid events never being processed, which means potentially no runner gets created and the job in GitHub times out in 6 hours.
 
 The example for [ephemeral runners](examples/ephemeral.md) is based on the [default example](examples/default.md). Have look at the diff to see the major configuration differences.
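The `enable_job_queued_check` bullet in the documentation can be pictured as a filter over the events in the queue. The sketch below is hypothetical: the `QueuedEvent` shape, the `eventsToScale` helper, and the idea that each event already carries the job's current status are assumptions for illustration. The status strings are a subset of GitHub's workflow job statuses.

```typescript
// Subset of GitHub workflow job statuses; a cancelled job ends up as `completed`.
type JobStatus = 'queued' | 'in_progress' | 'completed';

// Hypothetical event shape: the job's status as seen when the scale-up step runs.
interface QueuedEvent {
  jobId: number;
  currentStatus: JobStatus;
}

// Hypothetical helper: returns the events that would lead to a new runner.
function eventsToScale(events: QueuedEvent[], enableJobQueuedCheck: boolean): QueuedEvent[] {
  // Without the check, every event in the queue creates a runner.
  if (!enableJobQueuedCheck) {
    return events;
  }
  // With the check, only jobs that are still queued create a runner; jobs that were
  // cancelled or already picked up by another runner are skipped.
  return events.filter((event) => event.currentStatus === 'queued');
}

const events: QueuedEvent[] = [
  { jobId: 1, currentStatus: 'queued' },
  { jobId: 2, currentStatus: 'in_progress' }, // already picked up by another runner
  { jobId: 3, currentStatus: 'completed' }, // e.g. cancelled before a runner was created
];
console.log(JSON.stringify(eventsToScale(events, true).map((e) => e.jobId))); // [1]
console.log(JSON.stringify(eventsToScale(events, false).map((e) => e.jobId))); // [1,2,3]
```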

examples/arm64/main.tf

3 deletions
@@ -80,9 +80,6 @@ module "runners" {
   delay_webhook_event = 5
   runners_maximum_count = 1
 
-  # set up a fifo queue to remain order
-  enable_fifo_build_queue = true
-
   # override scaling down
   scale_down_schedule_expression = "cron(* * * * ? *)"
 }

examples/default/main.tf

3 deletions
@@ -84,9 +84,6 @@ module "runners" {
   delay_webhook_event = 5
   runners_maximum_count = 2
 
-  # set up a fifo queue to remain order
-  enable_fifo_build_queue = true
-
   # override scaling down
   scale_down_schedule_expression = "cron(* * * * ? *)"
 
examples/multi-runner/main.tf

1 deletion
@@ -47,7 +47,6 @@ module "runners" {
   # labelMatchers = [["self-hosted", "linux", "x64", "amazon"]]
   # exactMatch = false
   # }
-  # fifo = true
   # delay_webhook_event = 0
   # runner_config = {
   # runner_os = "linux"

lambdas/functions/control-plane/src/aws/sqs.test.ts

17 deletions
@@ -27,23 +27,6 @@ describe('Publish message to SQS', () => {
     });
   });
 
-  it('should publish message to SQS Fifo queue', async () => {
-    // setup
-    mockSQSClient.on(SendMessageCommand).resolves({
-      MessageId: '123',
-    });
-
-    // act
-    await publishMessage('test', 'https://sqs.eu-west-1.amazonaws.com/123456789/queued-builds.fifo');
-
-    // assert
-    expect(mockSQSClient).toHaveReceivedCommandWith(SendMessageCommand, {
-      QueueUrl: 'https://sqs.eu-west-1.amazonaws.com/123456789/queued-builds.fifo',
-      MessageBody: 'test',
-      MessageGroupId: '1', // Fifo queue
-    });
-  });
-
   it('should log error if queue URL not found', async () => {
     // setup
     const logErrorSpy = jest.spyOn(logger, 'error');

lambdas/functions/control-plane/src/aws/sqs.ts

1 deletion
@@ -19,7 +19,6 @@ export async function publishMessage(message: string, queueUrl: string, delayInS
     QueueUrl: queueUrl,
     MessageBody: message,
     DelaySeconds: delayInSeconds,
-    MessageGroupId: queueUrl.endsWith('.fifo') ? '1' : undefined,
   });
 
   try {
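For context on the `sqs.ts` change: SQS requires a message group ID on every message sent to a FIFO queue, and the removed line used the constant `'1'`, so all jobs shared a single group and were processed in order. The sketch below contrasts the input built before and after this commit. `SendMessageInput`, `buildInputBefore` and `buildInputAfter` are hypothetical stand-ins that mirror only the fields of the AWS SDK's `SendMessageCommandInput` visible in this hunk; no SDK call is made.

```typescript
// Hypothetical stand-in for the subset of SendMessageCommandInput used in this hunk.
interface SendMessageInput {
  QueueUrl: string;
  MessageBody: string;
  DelaySeconds?: number;
  MessageGroupId?: string;
}

// Hypothetical "before" builder: a ".fifo" URL marked a FIFO queue, which needs a group ID.
function buildInputBefore(message: string, queueUrl: string, delayInSeconds?: number): SendMessageInput {
  return {
    QueueUrl: queueUrl,
    MessageBody: message,
    DelaySeconds: delayInSeconds,
    MessageGroupId: queueUrl.endsWith('.fifo') ? '1' : undefined, // one group for all messages
  };
}

// Hypothetical "after" builder: only standard queues remain, so no group ID is set.
function buildInputAfter(message: string, queueUrl: string, delayInSeconds?: number): SendMessageInput {
  return {
    QueueUrl: queueUrl,
    MessageBody: message,
    DelaySeconds: delayInSeconds,
  };
}

const fifoUrl = 'https://sqs.eu-west-1.amazonaws.com/123456789/queued-builds.fifo';
console.log(buildInputBefore('test', fifoUrl).MessageGroupId); // 1
console.log('MessageGroupId' in buildInputAfter('test', fifoUrl)); // false
```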

lambdas/functions/control-plane/src/local.ts

1 addition, 1 deletion
@@ -28,7 +28,7 @@ const sqsEvent = {
       messageAttributes: {},
       md5OfBody: '4aef3bd70526e152e86426a0938cbec6',
       eventSource: 'aws:sqs',
-      eventSourceARN: 'arn:aws:sqs:us-west-2:916370655143:cicddev-queued-builds.fifo',
+      eventSourceARN: 'arn:aws:sqs:us-west-2:916370655143:cicddev-queued-builds',
       awsRegion: 'us-west-2',
     },
   ],

lambdas/functions/webhook/src/ConfigLoader.test.ts

5 deletions
@@ -29,7 +29,6 @@ describe('ConfigLoader Tests', () => {
       {
         id: '1',
         arn: 'arn:aws:sqs:us-east-1:123456789012:queue1',
-        fifo: false,
         matcherConfig: {
           labelMatchers: [['label1', 'label2']],
           exactMatch: true,
@@ -100,7 +99,6 @@ describe('ConfigLoader Tests', () => {
       {
         id: '1',
         arn: 'arn:aws:sqs:us-east-1:123456789012:queue1',
-        fifo: false,
         matcherConfig: {
           labelMatchers: [['label1', 'label2']],
           exactMatch: true,
@@ -131,7 +129,6 @@ describe('ConfigLoader Tests', () => {
       {
         id: '1',
         arn: 'arn:aws:sqs:us-east-1:123456789012:queue1',
-        fifo: false,
         matcherConfig: {
           labelMatchers: [['label1', 'label2']],
           exactMatch: true,
@@ -211,7 +208,6 @@ describe('ConfigLoader Tests', () => {
     const matcherConfig: RunnerMatcherConfig[] = [
       {
         arn: 'arn:aws:sqs:eu-central-1:123456:npalm-default-queued-builds',
-        fifo: true,
         id: 'https://sqs.eu-central-1.amazonaws.com/123456/npalm-default-queued-builds',
         matcherConfig: {
           exactMatch: true,
@@ -248,7 +244,6 @@ describe('ConfigLoader Tests', () => {
     const matcherConfig: RunnerMatcherConfig[] = [
       {
         arn: 'arn:aws:sqs:eu-central-1:123456:npalm-default-queued-builds',
-        fifo: true,
         id: 'https://sqs.eu-central-1.amazonaws.com/123456/npalm-default-queued-builds',
         matcherConfig: {
           exactMatch: true,

lambdas/functions/webhook/src/runners/dispatch.test.ts

2 deletions
@@ -101,7 +101,6 @@ describe('Dispatcher', () => {
         eventType: 'workflow_job',
         installationId: 0,
         queueId: runnerConfig[0].id,
-        queueFifo: false,
         repoOwnerType: 'Organization',
       });
     });
@@ -149,7 +148,6 @@ describe('Dispatcher', () => {
         eventType: 'workflow_job',
         installationId: 0,
         queueId: 'match',
-        queueFifo: false,
         repoOwnerType: 'Organization',
       });
     });

lambdas/functions/webhook/src/runners/dispatch.ts

1 deletion
@@ -44,7 +44,6 @@ async function handleWorkflowJob(
       eventType: githubEvent,
       installationId: body.installation?.id ?? 0,
       queueId: queue.id,
-      queueFifo: queue.fifo,
       repoOwnerType: body.repository.owner.type,
     });
     logger.info(`Successfully dispatched job for ${body.repository.full_name} to the queue ${queue.id}`);
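The dispatcher forwards a job to the queue whose matcher accepts the job's labels. Below is a simplified sketch of the `exactMatch` rule as it is described for `matcherConfig` in the `multi_runner_config` variable (all job labels must match when `true`, any label when `false`). `MatcherConfig` here is a local stand-in and `matchesLabels` is hypothetical; it ignores any further details of the real dispatcher, such as how label case is handled.

```typescript
// Local stand-in for the `matcherConfig` fields shown in this diff.
interface MatcherConfig {
  labelMatchers: string[][];
  exactMatch: boolean;
}

// Hypothetical matcher: true if any configured label set accepts the job's labels.
function matchesLabels(jobLabels: string[], matcher: MatcherConfig): boolean {
  return matcher.labelMatchers.some((runnerLabels) => {
    const available = new Set(runnerLabels);
    return matcher.exactMatch
      ? jobLabels.every((label) => available.has(label)) // every job label must be offered
      : jobLabels.some((label) => available.has(label)); // any single label is enough
  });
}

const matcher: MatcherConfig = {
  labelMatchers: [['self-hosted', 'linux', 'x64', 'amazon']],
  exactMatch: true,
};
console.log(matchesLabels(['self-hosted', 'linux', 'x64'], matcher)); // true
console.log(matchesLabels(['self-hosted', 'linux', 'gpu'], matcher)); // false
console.log(matchesLabels(['self-hosted', 'linux', 'gpu'], { ...matcher, exactMatch: false })); // true
```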
@@ -1,5 +1,5 @@
 import { SendMessageCommandInput } from '@aws-sdk/client-sqs';
-import { ActionRequestMessage, sendActionRequest } from '.';
+import { sendActionRequest } from '.';
 
 const mockSQS = {
   sendMessage: jest.fn(() => {
@@ -30,38 +30,16 @@ describe('Test sending message to SQS.', () => {
 
   it('no fifo queue', async () => {
     // Arrange
-    const no_fifo_message: ActionRequestMessage = {
-      ...message,
-      queueFifo: false,
-    };
     const sqsMessage: SendMessageCommandInput = {
       QueueUrl: queueUrl,
-      MessageBody: JSON.stringify(no_fifo_message),
+      MessageBody: JSON.stringify(message),
     };
 
     // Act
-    const result = sendActionRequest(no_fifo_message);
+    const result = sendActionRequest(message);
 
     // Assert
     expect(mockSQS.sendMessage).toHaveBeenCalledWith(sqsMessage);
     await expect(result).resolves.not.toThrow();
   });
-
-  it('use a fifo queue', async () => {
-    // Arrange
-    const fifo_message: ActionRequestMessage = {
-      ...message,
-      queueFifo: true,
-    };
-    const sqsMessage: SendMessageCommandInput = {
-      QueueUrl: queueUrl,
-      MessageBody: JSON.stringify(fifo_message),
-    };
-    // Act
-    const result = sendActionRequest(fifo_message);
-
-    // Assert
-    expect(mockSQS.sendMessage).toHaveBeenCalledWith({ ...sqsMessage, MessageGroupId: String(message.id) });
-    await expect(result).resolves.not.toThrow();
-  });
 });

lambdas/functions/webhook/src/sqs/index.ts

5 deletions
@@ -11,7 +11,6 @@ export interface ActionRequestMessage {
   repositoryOwner: string;
   installationId: number;
   queueId: string;
-  queueFifo: boolean;
   repoOwnerType: string;
 }
 
@@ -26,7 +25,6 @@ export interface RunnerMatcherConfig {
   matcherConfig: MatcherConfig;
   id: string;
   arn: string;
-  fifo: boolean;
 }
 
 export interface GithubWorkflowEvent {
@@ -42,9 +40,6 @@ export const sendActionRequest = async (message: ActionRequestMessage): Promise<
   };
 
   logger.debug(`sending message to SQS: ${JSON.stringify(sqsMessage)}`);
-  if (message.queueFifo) {
-    sqsMessage.MessageGroupId = String(message.id);
-  }
 
   await sqs.sendMessage(sqsMessage);
 };
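After this change, `sendActionRequest` sends the message without the FIFO branch, so the JSON body no longer carries `queueFifo` and no `MessageGroupId` is derived from `message.id`. The sketch below assumes, as the ConfigLoader test data suggests, that `queueId` holds the SQS queue URL. `ActionRequestSubset`, `SqsMessage` and `toSqsMessage` are hypothetical and include only fields visible in this diff; the type of `id` is also an assumption.

```typescript
// Hypothetical subset of ActionRequestMessage: only fields visible in this diff.
interface ActionRequestSubset {
  id: number; // assumed type; the removed code converted it with String(message.id)
  eventType: string;
  repositoryOwner: string;
  installationId: number;
  queueId: string; // assumed to be the SQS queue URL
  repoOwnerType: string;
}

interface SqsMessage {
  QueueUrl: string;
  MessageBody: string;
  MessageGroupId?: string;
}

// Hypothetical builder for the post-change shape; the QueueUrl mapping is an assumption.
function toSqsMessage(message: ActionRequestSubset): SqsMessage {
  return {
    QueueUrl: message.queueId,
    MessageBody: JSON.stringify(message), // no queueFifo field, no group ID
  };
}

const sqsMessage = toSqsMessage({
  id: 42,
  eventType: 'workflow_job',
  repositoryOwner: 'octo-org',
  installationId: 0,
  queueId: 'https://sqs.eu-central-1.amazonaws.com/123456/npalm-default-queued-builds',
  repoOwnerType: 'Organization',
});
console.log(sqsMessage.MessageGroupId === undefined); // true
console.log(Object.keys(JSON.parse(sqsMessage.MessageBody)).includes('queueFifo')); // false
```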

lambdas/functions/webhook/src/webhook/index.test.ts

1 deletion
@@ -290,7 +290,6 @@ function mockSSMResponse() {
       {
         id: '1',
         arn: 'arn:aws:sqs:us-east-1:123456789012:queue1',
-        fifo: false,
         matcherConfig: {
           labelMatchers: [['label1', 'label2']],
           exactMatch: true,

main.tf

6 additions, 10 deletions
@@ -53,13 +53,11 @@ resource "aws_sqs_queue_policy" "build_queue_policy" {
 }
 
 resource "aws_sqs_queue" "queued_builds" {
-  name                        = "${var.prefix}-queued-builds${var.enable_fifo_build_queue ? ".fifo" : ""}"
-  delay_seconds               = var.delay_webhook_event
-  visibility_timeout_seconds  = var.runners_scale_up_lambda_timeout
-  message_retention_seconds   = var.job_queue_retention_in_seconds
-  fifo_queue                  = var.enable_fifo_build_queue
-  receive_wait_time_seconds   = 0
-  content_based_deduplication = var.enable_fifo_build_queue
+  name                       = "${var.prefix}-queued-builds"
+  delay_seconds              = var.delay_webhook_event
+  visibility_timeout_seconds = var.runners_scale_up_lambda_timeout
+  message_retention_seconds  = var.job_queue_retention_in_seconds
+  receive_wait_time_seconds  = 0
   redrive_policy = var.redrive_build_queue.enabled ? jsonencode({
     deadLetterTargetArn = aws_sqs_queue.queued_builds_dlq[0].arn,
     maxReceiveCount     = var.redrive_build_queue.maxReceiveCount
@@ -80,12 +78,11 @@ resource "aws_sqs_queue_policy" "build_queue_dlq_policy" {
 
 resource "aws_sqs_queue" "queued_builds_dlq" {
   count = var.redrive_build_queue.enabled ? 1 : 0
-  name = "${var.prefix}-queued-builds_dead_letter${var.enable_fifo_build_queue ? ".fifo" : ""}"
+  name = "${var.prefix}-queued-builds_dead_letter"
 
   sqs_managed_sse_enabled = var.queue_encryption.sqs_managed_sse_enabled
   kms_master_key_id = var.queue_encryption.kms_master_key_id
   kms_data_key_reuse_period_seconds = var.queue_encryption.kms_data_key_reuse_period_seconds
-  fifo_queue = var.enable_fifo_build_queue
   tags = var.tags
 }
 
@@ -114,7 +111,6 @@ module "webhook" {
       (aws_sqs_queue.queued_builds.id) = {
         id : aws_sqs_queue.queued_builds.id
         arn : aws_sqs_queue.queued_builds.arn
-        fifo : var.enable_fifo_build_queue
         matcherConfig : {
           labelMatchers : [local.runner_labels]
           exactMatch : var.enable_runner_workflow_job_labels_check_all
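The `name` changes in `main.tf` explain why this commit is marked as breaking. The old expressions appended `.fifo` when `enable_fifo_build_queue` was `true` (SQS requires FIFO queue names to end in `.fifo`), and an existing SQS queue can neither be renamed nor switch between standard and FIFO, so Terraform destroys and re-creates it and any messages still queued are lost. The helpers below are hypothetical; their template strings mirror the Terraform expressions in this diff, and `my-prefix` is a placeholder.

```typescript
// Hypothetical helper mirroring the "before" Terraform name expression.
function queueNameBefore(prefix: string, base: string, enableFifo: boolean): string {
  return `${prefix}-${base}${enableFifo ? '.fifo' : ''}`;
}

// Hypothetical helper mirroring the "after" Terraform name expression.
function queueNameAfter(prefix: string, base: string): string {
  return `${prefix}-${base}`;
}

// A queue cannot be renamed in place, so a different name means destroy and re-create.
function isReplaced(prefix: string, base: string, enableFifo: boolean): boolean {
  return queueNameBefore(prefix, base, enableFifo) !== queueNameAfter(prefix, base);
}

for (const base of ['queued-builds', 'queued-builds_dead_letter']) {
  console.log(base, isReplaced('my-prefix', base, true), isReplaced('my-prefix', base, false));
}
// queued-builds true false
// queued-builds_dead_letter true false
```

Deployments that never enabled `enable_fifo_build_queue` keep the same names, so their queues are left in place.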

modules/multi-runner/README.md

1 addition, 1 deletion
Large diffs are not rendered by default.

modules/multi-runner/notes.md

1 addition
@@ -0,0 +1 @@
+enable_workflow_job_events_queue

modules/multi-runner/queues.tf

7 additions, 10 deletions
@@ -27,14 +27,12 @@ data "aws_iam_policy_document" "deny_unsecure_transport" {
 }
 
 resource "aws_sqs_queue" "queued_builds" {
-  for_each                    = var.multi_runner_config
-  name                        = "${var.prefix}-${each.key}-queued-builds${each.value.fifo ? ".fifo" : ""}"
-  delay_seconds               = each.value.runner_config.delay_webhook_event
-  visibility_timeout_seconds  = var.runners_scale_up_lambda_timeout
-  message_retention_seconds   = each.value.runner_config.job_queue_retention_in_seconds
-  fifo_queue                  = each.value.fifo
-  receive_wait_time_seconds   = 0
-  content_based_deduplication = each.value.fifo
+  for_each                   = var.multi_runner_config
+  name                       = "${var.prefix}-${each.key}-queued-builds"
+  delay_seconds              = each.value.runner_config.delay_webhook_event
+  visibility_timeout_seconds = var.runners_scale_up_lambda_timeout
+  message_retention_seconds  = each.value.runner_config.job_queue_retention_in_seconds
+  receive_wait_time_seconds  = 0
   redrive_policy = each.value.redrive_build_queue.enabled ? jsonencode({
     deadLetterTargetArn = aws_sqs_queue.queued_builds_dlq[each.key].arn,
     maxReceiveCount     = each.value.redrive_build_queue.maxReceiveCount
@@ -55,12 +53,11 @@ resource "aws_sqs_queue_policy" "build_queue_policy" {
 
 resource "aws_sqs_queue" "queued_builds_dlq" {
   for_each = { for config, values in var.multi_runner_config : config => values if values.redrive_build_queue.enabled }
-  name = "${var.prefix}-${each.key}-queued-builds_dead_letter${each.value.fifo ? ".fifo" : ""}"
+  name = "${var.prefix}-${each.key}-queued-builds_dead_letter"
 
   sqs_managed_sse_enabled = var.queue_encryption.sqs_managed_sse_enabled
   kms_master_key_id = var.queue_encryption.kms_master_key_id
   kms_data_key_reuse_period_seconds = var.queue_encryption.kms_data_key_reuse_period_seconds
-  fifo_queue = each.value.fifo
   tags = var.tags
 }
 
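In the multi-runner module, one build queue is created per `multi_runner_config` key, and a dead letter queue only for keys with `redrive_build_queue.enabled`, as the `for_each` filter shows. The TypeScript sketch below lists the queue names that result after this commit. `MultiRunnerEntry`, `PlannedQueues` and `plannedQueueNames` are hypothetical, the config keys and prefix are placeholders, and keys are sorted because Terraform iterates `for_each` maps in lexical key order.

```typescript
// Hypothetical subset of one multi_runner_config entry.
interface MultiRunnerEntry {
  redriveBuildQueue: { enabled: boolean; maxReceiveCount: number };
}

interface PlannedQueues {
  buildQueues: string[];
  deadLetterQueues: string[];
}

// Hypothetical helper: names of the queues the for_each blocks above would create.
function plannedQueueNames(prefix: string, config: Record<string, MultiRunnerEntry>): PlannedQueues {
  // Lexical key order, like Terraform's map iteration.
  const entries = Object.entries(config).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return {
    // One build queue per key; there is no ".fifo" suffix any more.
    buildQueues: entries.map(([key]) => `${prefix}-${key}-queued-builds`),
    // Mirrors the filter `if values.redrive_build_queue.enabled`.
    deadLetterQueues: entries
      .filter(([, entry]) => entry.redriveBuildQueue.enabled)
      .map(([key]) => `${prefix}-${key}-queued-builds_dead_letter`),
  };
}

const planned = plannedQueueNames('my-prefix', {
  'linux-x64': { redriveBuildQueue: { enabled: true, maxReceiveCount: 3 } },
  'linux-arm64': { redriveBuildQueue: { enabled: false, maxReceiveCount: 3 } },
});
console.log(JSON.stringify(planned.buildQueues));
// ["my-prefix-linux-arm64-queued-builds","my-prefix-linux-x64-queued-builds"]
console.log(JSON.stringify(planned.deadLetterQueues));
// ["my-prefix-linux-x64-queued-builds_dead_letter"]
```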
modules/multi-runner/variables.tf

2 deletions
@@ -128,7 +128,6 @@ variable "multi_runner_config" {
       exactMatch = optional(bool, false)
       priority = optional(number, 999)
     })
-    fifo = optional(bool, false)
     redrive_build_queue = optional(object({
       enabled = bool
       maxReceiveCount = number
@@ -199,7 +198,6 @@ variable "multi_runner_config" {
       exactMatch: "If set to true all labels in the workflow job must match the GitHub labels (os, architecture and `self-hosted`). When false if __any__ workflow label matches it will trigger the webhook."
       priority: "If set it defines the priority of the matcher, the matcher with the lowest priority will be evaluated first. Default is 999, allowed values 0-999."
     }
-    fifo: "Enable a FIFO queue to remain the order of events received by the webhook. Suggest to set to true for repo level runners."
     redrive_build_queue: "Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting `enabled` to false. 2. Enable by setting `enabled` to `true`, `maxReceiveCount` to a number of max retries."
   }
 EOT
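The `priority` description defines an evaluation order: lowest value first, default 999, allowed values 0-999. Below is a minimal sketch of that ordering. `PrioritizedMatcher` and `orderByPriority` are hypothetical, and keeping input order for equal priorities (JavaScript's `Array.prototype.sort` is stable) is an assumption, not documented behaviour.

```typescript
// Hypothetical matcher entry; `priority` is optional and defaults to 999.
interface PrioritizedMatcher {
  id: string;
  priority?: number;
}

const DEFAULT_PRIORITY = 999;

// Hypothetical helper: returns matchers in the order they would be evaluated.
function orderByPriority<T extends PrioritizedMatcher>(matchers: T[]): T[] {
  for (const matcher of matchers) {
    const priority = matcher.priority ?? DEFAULT_PRIORITY;
    // Documented allowed range is 0-999.
    if (priority < 0 || priority > 999) {
      throw new RangeError(`priority for ${matcher.id} must be within 0-999, got ${priority}`);
    }
  }
  // Lowest value first; the sort is stable, so equal priorities keep their input order.
  return [...matchers].sort(
    (a, b) => (a.priority ?? DEFAULT_PRIORITY) - (b.priority ?? DEFAULT_PRIORITY),
  );
}

const ordered = orderByPriority([
  { id: 'linux-x64' },
  { id: 'linux-arm64', priority: 10 },
  { id: 'windows-x64', priority: 500 },
]);
console.log(ordered.map((m) => m.id).join(', ')); // linux-arm64, windows-x64, linux-x64
```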
