Commit a2f013f

AppliNH, npalm, and github-actions[bot] authored
feat: opt-out ssm parameters for github app (#4335)
Hi! 👋

Throughout my recent experiences, we had strong requirements to **not** have secret values in the state.

To handle that, I suggest to make optional the creation of SSM parameters as part of the regular flow so it doesn't leak secrets in Terraform state.

We could alternatively use the latest [`ephemeral`](https://developer.hashicorp.com/terraform/language/resources/ephemeral) feature, but I'm not sure everyone is using Terraform 1.10+ atm since it's quite recent. I also think relying on ephemerals should be part of a new breaking release as it won't be compatible with Terraform < 1.10.

Let me know what you think of this!

---------

Co-authored-by: Niek Palm <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Niek Palm <[email protected]>
1 parent 4750ae1 commit a2f013f
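
As a rough illustration of the opt-out described in the commit message, the sketch below builds the `*_ssm` attributes of `github_app` from parameter names alone. It assumes the three parameters were already created outside Terraform (for example with the AWS CLI). The variable `aws_region`, the locals, and the `/github-runners/app/...` paths are hypothetical names chosen for this example and do not come from the commit. The parameter values are never read, so they should not end up in the state.

```hcl
# Sketch only: var.aws_region, the local names, and the parameter paths
# are hypothetical and not taken from the commit.

variable "aws_region" {
  description = "Region where the SSM parameters were created out of band."
  type        = string
}

data "aws_partition" "current" {}
data "aws_caller_identity" "current" {}

locals {
  # Names of parameters created outside Terraform.
  app_ssm_names = {
    key_base64     = "/github-runners/app/key-base64"
    id             = "/github-runners/app/id"
    webhook_secret = "/github-runners/app/webhook-secret"
  }

  # Build { name, arn } pairs from the names only. No data source reads
  # the parameter values, so no secret material is pulled into the state.
  app_ssm = {
    for key, name in local.app_ssm_names : key => {
      name = name
      arn  = "arn:${data.aws_partition.current.partition}:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter${name}"
    }
  }

  # Shape accepted by the `github_app` variable after this change.
  github_app = {
    key_base64_ssm     = local.app_ssm["key_base64"]
    id_ssm             = local.app_ssm["id"]
    webhook_secret_ssm = local.app_ssm["webhook_secret"]
  }
}
```

The resulting `local.github_app` could then be passed as `github_app = local.github_app` in the module call. Because the parameter names start with `/`, the ARN is formed by appending the name directly after `parameter`.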

29 files changed, +561 -48 lines changed

Diff for: .github/workflows/terraform.yml

+1
@@ -139,6 +139,7 @@ jobs:
139139
"ephemeral",
140140
"termination-watcher",
141141
"multi-runner",
142+
"external-managed-ssm-secrets"
142143
]
143144
defaults:
144145
run:

Diff for: README.md

+1 -1
@@ -141,7 +141,7 @@ Join our discord community via [this invite link](https://discord.gg/bxgXW8jJGh)
141141
| <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling.<br/><br/> `enable`: Enable the EventBridge feature.<br/> `accept_events`: List can be used to only allow specific events to be putted on the EventBridge. By default all events, empty list will be be interpreted as all events. | <pre>object({<br/> enable = optional(bool, true)<br/> accept_events = optional(list(string), null)<br/> })</pre> | `{}` | no |
142142
| <a name="input_ghes_ssl_verify"></a> [ghes\_ssl\_verify](#input\_ghes\_ssl\_verify) | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | `bool` | `true` | no |
143143
| <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB. However if you are using Github Enterprise Cloud with data-residency (ghe.com), set the endpoint here. Example - https://companyname.ghe.com | `string` | `null` | no |
144-
| <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). | <pre>object({<br/> key_base64 = string<br/> id = string<br/> webhook_secret = string<br/> })</pre> | n/a | yes |
144+
| <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. <br/> You can optionally create the SSM parameters yourself and provide the ARN and name here, through the `*_ssm` attributes.<br/> If you chose to provide the configuration values directly here, <br/> please ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`).<br/> Note: the provided SSM parameters arn and name have a precedence over the actual value (i.e `key_base64_ssm` has a precedence over `key_base64` etc). | <pre>object({<br/> key_base64 = optional(string)<br/> key_base64_ssm = optional(object({<br/> arn = string<br/> name = string<br/> }))<br/> id = optional(string)<br/> id_ssm = optional(object({<br/> arn = string<br/> name = string<br/> }))<br/> webhook_secret = optional(string)<br/> webhook_secret_ssm = optional(object({<br/> arn = string<br/> name = string<br/> }))<br/> })</pre> | n/a | yes |
145145
| <a name="input_idle_config"></a> [idle\_config](#input\_idle\_config) | List of time periods, defined as a cron expression, to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | <pre>list(object({<br/> cron = string<br/> timeZone = string<br/> idleCount = number<br/> evictionStrategy = optional(string, "oldest_first")<br/> }))</pre> | `[]` | no |
146146
| <a name="input_instance_allocation_strategy"></a> [instance\_allocation\_strategy](#input\_instance\_allocation\_strategy) | The allocation strategy for spot instances. AWS recommends using `price-capacity-optimized` however the AWS default is `lowest-price`. | `string` | `"lowest-price"` | no |
147147
| <a name="input_instance_max_spot_price"></a> [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no |

Diff for: docs/configuration.md

+37 -13
@@ -10,7 +10,7 @@ To be able to support a number of use-cases, the module has quite a lot of confi
1010
- Linux vs Windows. You can configure the OS types linux and win. Linux will be used by default.
1111
- Re-use vs Ephemeral. By default runners are re-used, until detected idle. Once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners only work in combination with the workflow job event. For ephemeral runners the lambda requests a JIT (just in time) configuration via the GitHub API to register the runner. [JIT configuration](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-just-in-time-runners) is limited to ephemeral runners (and currently not supported by GHES). For non-ephemeral runners, a registration token is always requested. In both cases the configuration is made available to the instance via the same SSM parameter. To disable JIT configuration for ephemeral runners set `enable_jit_config` to `false`. We also suggest using a pre-build AMI to improve the start time of jobs for ephemeral runners.
1212
- Job retry (**Beta**). By default the scale-up lambda will discard the message when it is handled. Meaning in the ephemeral use-case an instance is created. The created runner will ask GitHub for a job, no guarantee it will run the job for which it was scaling. Result could be that with small system hick-up the job is keeping waiting for a runner. Enable a pool (org runners) is one option to avoid this problem. Another option is to enable the job retry function. Which will retry the job after a delay for a configured number of times.
13-
- GitHub Cloud vs GitHub Enterprise Server (GHES). The runners support GitHub Cloud (Public GitHub - github.com), GitHub Data Residency instances (ghe.com), and GitHub Enterprise Server. For GHES, we rely on our community for support and testing. We have no capability to test GHES ourselves.
13+
- GitHub Cloud vs GitHub Enterprise Server (GHES). The runners support GitHub Cloud (Public GitHub - github.com), GitHub Data Residency instances (ghe.com), and GitHub Enterprise Server. For GHES, we rely on our community for support and testing. We have no capability to test GHES ourselves.
1414
- Spot vs on-demand. The runners use either the EC2 spot or on-demand life cycle. Runners will be created via the AWS [CreateFleet API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateFleet.html). The module (scale up lambda) will request via the CreateFleet API to create instances in one of the subnets and of the specified instance types.
1515
- ARM64 support via Graviton/Graviton2 instance-types. When using the default example or top-level module, specifying `instance_types` that match a Graviton/Graviton 2 (ARM64) architecture (e.g. a1, t4g or any 6th-gen `g` or `gd` type), you must also specify `runner_architecture = "arm64"` and the sub-modules will be automatically configured to provision with ARM64 AMIs and leverage GitHub's ARM64 action runner. See below for more details.
1616
- Disable default labels for the runners (os, architecture and `self-hosted`) can achieve by setting `runner_disable_default_labels` = true. If enabled, the runner will only have the extra labels provided in `runner_extra_labels`. In case you on own start script is used, this configuration parameter needs to be parsed via SSM.
@@ -24,17 +24,44 @@ The module uses the AWS System Manager Parameter Store to store configuration fo
2424
| `ssm_paths.root/var.prefix?/app/` | App secrets used by Lambda's |
2525
| `ssm_paths.root/var.prefix?/runners/config/<name>` | Configuration parameters used by runner start script |
2626
| `ssm_paths.root/var.prefix?/runners/tokens/<ec2-instance-id>` | Either JIT configuration (ephemeral runners) or registration tokens (non ephemeral runners) generated by the control plane (scale-up lambda), and consumed by the start script on the runner to activate / register the runner. |
27-
| `ssm_paths.root/var.prefix?/webhook/runner-matcher-config` | Runner matcher config used by webhook to decide the target for the webhook event. |
27+
| `ssm_paths.root/var.prefix?/webhook/runner-matcher-config` | Runner matcher config used by webhook to decide the target for the webhook event. |
2828

2929
Available configuration parameters:
3030

31-
| Parameter name | Description |
32-
|-------------------------------------|---------------------------------------------------------------------------------------------------|
33-
| `agent_mode` | Indicates if the agent is running in ephemeral mode or not. |
34-
| `disable_default_labels` | Indicates if the default labels for the runners (os, architecture and `self-hosted`) are disabled |
35-
| `enable_cloudwatch` | Configuration for the cloudwatch agent to stream logging. |
36-
| `run_as` | The user used for running the GitHub action runner agent. |
37-
| `token_path` | The path where tokens are stored. |
31+
| Parameter name | Description |
32+
| ------------------------ | ------------------------------------------------------------------------------------------------- |
33+
| `agent_mode` | Indicates if the agent is running in ephemeral mode or not. |
34+
| `disable_default_labels` | Indicates if the default labels for the runners (os, architecture and `self-hosted`) are disabled |
35+
| `enable_cloudwatch` | Configuration for the cloudwatch agent to stream logging. |
36+
| `run_as` | The user used for running the GitHub action runner agent. |
37+
| `token_path` | The path where tokens are stored. |
38+
39+
### Note regarding GitHub App secrets provisioning in SSM
40+
41+
SSM parameters for GitHub App secrets (`webhook_secret`, `key_base64`, `id`) can also be manually created at the SSM path of your choice.
42+
43+
If you opt for this approach, please fill the `*_ssm` attributes of the `github_app` variable as following:
44+
45+
```
46+
github_app = {
47+
key_base64_ssm = {
48+
name = "/your/path/to/ssm/parameter/key-base-64"
49+
arn = "arn:aws:ssm:::parameter/your/path/to/ssm/parameter/key-base-64"
50+
}
51+
id_ssm = {
52+
name = "/your/path/to/ssm/parameter/id"
53+
arn = "arn:aws:ssm:::parameter/your/path/to/ssm/parameter/id"
54+
}
55+
webhook_secret_ssm = {
56+
name = "/your/path/to/ssm/parameter/webhook-secret"
57+
arn = "arn:aws:ssm:::parameter/your/path/to/ssm/parameter/webhook-secret"
58+
}
59+
}
60+
```
61+
62+
Manually creating the SSM parameters that hold the configuration of your GitHub App avoids leaking critical plain text values in your terraform state and version control system. This is a recommended security practice for handling sensitive credentials.
63+
64+
You can read more [over here](../examples/external-managed-ssm-secrets/README.md).
3865

3966
## Encryption
4067

@@ -124,7 +151,6 @@ You can configure runners to be ephemeral, in which case runners will be used on
124151

125152
The example for [ephemeral runners](examples/ephemeral.md) is based on the [default example](examples/default.md). Have look at the diff to see the major configuration differences.
126153

127-
128154
## Job retry (**Beta**)
129155

130156
You can enable the job retry function to retry a job after a delay for a configured number of times. The function is disabled by default. To enable the function set `job_retry.enable` to `true`. The function will check the job status after a delay, and when the is still queued, it will create a new runner. The new runner is created in the same way as the others via the scale-up function. Hence the same configuration applies.
@@ -133,7 +159,6 @@ For checking the job status a API call is made to GitHub. Which can exhaust the
133159

134160
The option `job_retry.delay_in_seconds` is the delay before the job status is checked. The delay is increased by the factor `job_retry.delay_backoff` for each attempt. The upper bound for a delay is 900 seconds, which is the max message delay on SQS. The maximum number of attempts is configured via `job_retry.max_attempts`. The delay should be set to a higher value than the time it takes to start a runner.
135161

136-
137162
## Prebuilt Images
138163

139164
This module also allows you to run agents from a prebuilt AMI to gain faster startup times. The module provides several examples to build your own custom AMI. To remove old images, an [AMI housekeeper module](modules/public/ami-housekeeper.md) can be used. See the [AMI examples](ami-examples/index.md) for more details.
@@ -231,7 +256,7 @@ The watcher is listening for spot termination warnings and create a log message
231256
### Termination handler
232257

233258
!!! warning
234-
This feature will only work once the CloudTrail is enabled.
259+
This feature will only work once the CloudTrail is enabled.
235260

236261
The termination handler is listening for spot terminations by capture the `BidEvictedEvent` via CloudTrail. The handler will log and optionally create a metric for each termination. The intend is to enhance the logic to inform the user about the termination via the GitHub Job or Workflow run. The feature is disabled by default. The feature is enabled once the watcher is enabled, the feature can be disabled explicit by setting `instance_termination_watcher.features.enable_spot_termination_handler = false`.
237262

@@ -332,5 +357,4 @@ resource "aws_iam_role_policy" "event_rule_firehose_role" {
332357
}
333358
```
334359

335-
336360
NOTE: By default, a runner AMI update requires a re-apply of this terraform config (the runner AMI ID is looked up by a terraform data source). To avoid this, you can use `ami_id_ssm_parameter_name` to have the scale-up lambda dynamically lookup the runner AMI ID from an SSM parameter at instance launch time. Said SSM parameter is managed outside of this module (e.g. by a runner AMI build workflow).

Diff for: docs/examples/external-managed-ssm-secrets.md

+1
@@ -0,0 +1 @@
1+
--8<-- "examples/external-managed-ssm-secrets/README.md"

Diff for: docs/examples/index.md

+1
@@ -9,3 +9,4 @@ Examples are located in the [examples](https://github.com/github-aws-runners/ter
99
- _[Prebuilt Images](prebuilt.md)_: Example usages of deploying runners with a custom prebuilt image.
1010
- _[Windows](windows.md)_: Example usage of creating a runner using Windows as the OS.
1111
- _[Termination watcher](termination-watcher.md)_: Example usages of termination watcher.
12+
- _[Externally managed SSM secrets](external-managed-ssm-secrets.md)_: Example usage of externally managed SSM secrets for the GitHub App credentials.

Diff for: examples/default/README.md

+1 -1
@@ -6,7 +6,7 @@ This module shows how to create GitHub action runners. Lambda release will be do
66

77
Steps for the full setup, such as creating a GitHub app can be found in the root module's [README](https://github.com/github-aws-runners/terraform-aws-github-runner). First download the Lambda releases from GitHub. Alternatively you can build the lambdas locally with Node or Docker, there is a simple build script in `<root>/.ci/build.sh`. In the `main.tf` you can simply remove the location of the lambda zip files, the default location will work in this case.
88

9-
> The default example assumes local built lambda's available. Ensure you have built the lambda's. Alternativly you can downlowd the lambda's. The version needs to be set to a GitHub release version, see https://github.com/github-aws-runners/terraform-aws-github-runner/releases
9+
> The default example assumes local built lambda's available. Ensure you have built the lambda's. Alternatively you can download the lambda's. The version needs to be set to a GitHub release version, see https://github.com/github-aws-runners/terraform-aws-github-runner/releases
1010
1111
```bash
1212
cd ../lambdas-download
