Skip to content
This repository was archived by the owner on Jan 16, 2025. It is now read-only.

Commit 6a8e1f0

Browse files
jkruse14npalm
andauthored
feat(runners): add retry logic to default install and start script for dnf operations (#3787)
## Background --- This is a continuation of work done in #3748 There seems to be a race condition some where with the user-data script and the EC2 starting up and locking RPM. This issue is seen elsewhere, not necessarily with this repo's user-data, see the following: [Amazon Linux 2023 - issue with installing packages with cloud-init](https://repost.aws/questions/QU_tj7NQl6ReKoG53zzEqYOw/amazon-linux-2023-issue-with-installing-packages-with-cloud-init) [dnf/yum both fails while being executed on instance bootstrap on Amazon Linux 2023](https://repost.aws/questions/QUgNz4VGCFSC2TYekM-6GiDQ/dnf-yum-both-fails-while-being-executed-on-instance-bootstrap-on-amazon-linux-2023) Also, https://github.com/philips-labs/terraform-aws-github-runner/issues/3741 ## Changes Made --- Added a loop to retry if the rpm lock file is found which sleeps for 5 seconds then retries again with a total of 5 iterations. This logic is now added to the `user-data.sh` for the `upgrade-minimal` operation and installation of docker, cloudwatch-agent, and curl. ## Testing Done --- In progress. I'd like to open this up to review while testing. Co-authored-by: Niek Palm <[email protected]>
1 parent 8b843ad commit 6a8e1f0

File tree

1 file changed

+35
-4
lines changed

1 file changed

+35
-4
lines changed

modules/runners/templates/user-data.sh

+35-4
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,22 @@
11
#!/bin/bash -e
22

3+
install_with_retry() {
4+
max_attempts=5
5+
attempt_count=0
6+
success=false
7+
while [ $success = false ] && [ $attempt_count -le $max_attempts ]; do
8+
echo "Attempting $attempt_count/$max_attempts: Installing $*"
9+
dnf install -y $*
10+
if [ $? -eq 0 ]; then
11+
success=true
12+
else
13+
echo "Failed to install $1 - retrying"
14+
attempt_count=$(( attempt_count + 1 ))
15+
sleep 5
16+
fi
17+
done
18+
}
19+
320
exec > >(tee /var/log/user-data.log | logger -t user-data -s 2>/dev/console) 2>&1
421

522
# AWS suggest to create a log for debug purpose based on https://aws.amazon.com/premiumsupport/knowledge-center/ec2-linux-log-user-data/
@@ -15,15 +32,29 @@ set -x
1532

1633
${pre_install}
1734

18-
dnf upgrade-minimal -y
35+
max_attempts=5
36+
attempt_count=0
37+
success=false
38+
while [ $success = false ] && [ $attempt_count -le $max_attempts ]; do
39+
echo "Attempting $attempt_count/$max_attempts: upgrade-minimal"
40+
dnf upgrade-minimal -y
41+
if [ $? -eq 0 ]; then
42+
success=true
43+
else
44+
echo "Failed to run `dnf upgrad-minimal -y` - retrying"
45+
attempt_count=$(( attempt_count + 1 ))
46+
sleep 5
47+
fi
48+
done
1949

2050
# Install docker
21-
dnf install -y docker
51+
install_with_retry docker
52+
2253
service docker start
2354
usermod -a -G docker ec2-user
2455

25-
dnf install -y amazon-cloudwatch-agent jq git
26-
dnf install -y --allowerasing curl
56+
install_with_retry amazon-cloudwatch-agent jq git
57+
install_with_retry --allowerasing curl
2758

2859
user_name=ec2-user
2960

0 commit comments

Comments
 (0)