Add backup/restore plans #339

Merged
merged 121 commits into from
Jun 28, 2024
Changes from 100 commits
564cafe
Re-add backup/restore plans
timidri Feb 17, 2023
9e29589
first WIP version of test plan and workflow
timidri Apr 12, 2023
05c5009
remove buildevents refs
timidri Apr 12, 2023
42ee72f
copy inventory file for teardown
timidri Apr 14, 2023
af03f37
fix typo
timidri Apr 16, 2023
d134538
don't start ssh session during provisioning
timidri Apr 16, 2023
b39cec0
cleanup even if cancelled
timidri Apr 16, 2023
a0414da
wait for pause even if previous step failed
timidri Apr 16, 2023
791a2db
try to wait before tearing down
timidri Apr 16, 2023
ca063d6
add continue on error
timidri Apr 17, 2023
34311d4
another attempt
timidri Apr 17, 2023
2db76d5
removed always()
timidri Apr 19, 2023
eb86c03
don't wait in the ssh loop if previous job was cancelled
timidri Apr 19, 2023
293c5a4
continue on error
timidri Apr 19, 2023
fc5d215
now looking at correct pause file
timidri Apr 19, 2023
d3bcb83
refactor into 2 jobs
timidri Apr 24, 2023
1a9deb8
fix check wait; fix indentation
timidri Apr 24, 2023
8d50129
fix conditions
timidri Apr 24, 2023
7214920
backup test WIP
timidri Apr 28, 2023
8292db3
Merge main
timidri Nov 8, 2023
01b23db
allow for ca backup
timidri Nov 10, 2023
0d4d485
change implementation to ruby
timidri Nov 21, 2023
9c764db
formatting
timidri Nov 21, 2023
15ad2df
typo and path fixes, use cert auth for pdb export
timidri Nov 21, 2023
b12008a
update REFERENCE.md
timidri Nov 21, 2023
c7151ee
use cert auth for pdb import
timidri Nov 21, 2023
dfbf1f8
regenerate REFERENCE.md using the rake task
timidri Nov 21, 2023
b368a7a
Change params to support recovery use case
timidri Nov 24, 2023
c0d73d8
restore config from backup if primary degrated
timidri Nov 24, 2023
3f19b4c
fix quoting error
timidri Nov 24, 2023
208f809
check if backup or restore run on a non-peadm cluster
timidri Nov 24, 2023
88ab524
Add backup_type and restore_type params
timidri Nov 28, 2023
a331103
Adding test for backup restore
ragingra Nov 28, 2023
182e6f1
Adding test for backup restore
ragingra Nov 28, 2023
4f8a0a4
Merge branch 'SOLARCH-1160' of github.com:puppetlabs/puppetlabs-peadm…
timidri Nov 28, 2023
b4f87f9
Fix issue in test_restore.pp
timidri Nov 28, 2023
713a9ea
fix merge conflict
timidri Nov 28, 2023
4ad111d
Changed backup to use ip instead of hostname
ragingra Nov 28, 2023
f7764de
Merge branch 'SOLARCH-1160' of github.com:puppetlabs/puppetlabs-peadm…
ragingra Nov 28, 2023
4879fa8
Use $targets for running tasks and commands
timidri Nov 28, 2023
da2d273
Moving pause below yq
ragingra Nov 29, 2023
fdba96b
Moving .uri outside select
ragingra Nov 29, 2023
c74410c
Added inventory file to commands and moving backup file lookup inside…
ragingra Nov 29, 2023
7050b83
Fixing backup file location
ragingra Nov 29, 2023
cedb4c1
Fix ls command result
timidri Nov 29, 2023
7fee7b3
strip leading spaces from tar filename
timidri Nov 30, 2023
4e7163f
untar before using peadm config file
timidri Nov 30, 2023
dff1c23
Adding host entries to /etc/hosts
timidri Nov 30, 2023
32513a4
fix hosts file format
timidri Nov 30, 2023
ce15596
Break by removing classifier database files
timidri Nov 30, 2023
55131f1
add plan to edit inventory file
timidri Dec 1, 2023
710c2d8
merge recovery backup and restore
timidri Dec 1, 2023
bf98786
Modify inventory to include name
timidri Dec 1, 2023
3b7cedd
stop breaking classifier db
timidri Dec 1, 2023
7355630
add PE reinstall step
timidri Dec 2, 2023
a7efdc7
reinstate remove classifier db
timidri Dec 2, 2023
0a54f83
Fix calling the reinstall task
timidri Dec 2, 2023
694561f
Add continue on error for when reinstall exits with non-zero code
timidri Dec 2, 2023
074a69d
some textual fixes
timidri Dec 3, 2023
dc0c376
update docs
timidri Dec 4, 2023
7a99564
add in-code docs
timidri Dec 4, 2023
5badbb7
Merge branch 'main' into SOLARCH-1160
timidri Dec 4, 2023
11673d6
Adding PR review trigger
ragingra Dec 5, 2023
cc0bdc1
Add TODOs
timidri Dec 6, 2023
c5163b7
Edit backup_restore (first pass)
J-Hunniford Dec 7, 2023
ce347a0
Edit to be explicit about primary and db
J-Hunniford Dec 7, 2023
3435e41
another quick edit to clarify primary and db
J-Hunniford Dec 7, 2023
a79db7c
Another quick edit for clarity
J-Hunniford Dec 7, 2023
e81ab57
revert previous commit
J-Hunniford Dec 7, 2023
7d05ec2
Merge pull request #411 from puppetlabs/J-Hunniford-patch-1
timidri Dec 7, 2023
2af7768
Manually copying Jason's suggestion
timidri Dec 7, 2023
89fa277
Update backup_restore.md
J-Hunniford Dec 11, 2023
8cda117
Merge pull request #414 from puppetlabs/J-Hunniford-patch-1
ragingra Dec 12, 2023
ca990e3
Added todos
timidri Jan 8, 2024
7d36a7c
Merge branch 'SOLARCH-1160' of github.com:puppetlabs/puppetlabs-peadm…
timidri Jan 8, 2024
a925ad1
added recovery-db restore type
timidri Jan 10, 2024
5dca665
Updating workflow for db recovery
ragingra Jan 11, 2024
a01eb43
Add init_db_server and refactor
timidri Jan 29, 2024
782fd1d
fix error handling
timidri Jan 29, 2024
7b080df
document changes needed
timidri Feb 13, 2024
b0de776
Updating db recovery step
ragingra Feb 19, 2024
f9a8751
Updating db restore
ragingra Mar 6, 2024
621e3d5
revert back to running puppet installer on the db server
timidri Mar 8, 2024
32f6c5f
* add node_manager to bolt-project
timidri Mar 12, 2024
4aedd18
split recovering primary and database
timidri Mar 19, 2024
718e800
remove init_db_server call from the restore plan
timidri Mar 19, 2024
18a3590
Adding puppetrun smoke tests
ragingra Mar 24, 2024
7041b8e
Moving resinstall and init_db
ragingra Mar 25, 2024
47efb18
Changing smoke test to run on all hosts
ragingra Apr 4, 2024
35232d3
rewrite spec test
timidri Apr 4, 2024
1a29af5
Merge branch 'SOLARCH-1160' of github.com:puppetlabs/puppetlabs-peadm…
timidri Apr 4, 2024
da2662b
restore spec WIP
timidri Apr 4, 2024
e198fa4
removing obsolete files
timidri Apr 4, 2024
eec3fb7
add restore spec
timidri Apr 5, 2024
6daff4f
add peadm_config fixtures to test broken cluster restore
timidri Apr 5, 2024
8b41b24
Merge main
timidri Apr 5, 2024
8357651
Add require 'spec_helper' to sanitize_pg_pe_conf_spec.rb
timidri Apr 5, 2024
1dd6d77
Update documentation
timidri Apr 5, 2024
e49d6b6
fix rubocop checks
timidri Apr 6, 2024
57f60b7
Adding couple more spec cases
ragingra Apr 8, 2024
5c1c4af
Update documentation/backup_restore.md
timidri Apr 8, 2024
bffd086
Update documentation/backup_restore.md
timidri Apr 8, 2024
74b6e99
Update documentation/backup_restore.md
timidri Apr 8, 2024
e6458f5
Update documentation/backup_restore.md
timidri Apr 8, 2024
b50d3fe
Update documentation/backup_restore.md
timidri Apr 8, 2024
fa0d3eb
Update documentation/backup_restore.md
timidri Apr 8, 2024
83568f0
Update documentation/backup_restore.md
timidri Apr 8, 2024
ffd6f28
Update documentation/backup_restore.md
timidri Apr 8, 2024
aecb6a5
Update documentation/backup_restore.md
timidri Apr 8, 2024
334dfe5
Some suggested updates to backup-restore docs.
timidri Apr 8, 2024
10f903d
Updating backup options table
ragingra Apr 10, 2024
1c5eede
update language
timidri Apr 16, 2024
da3e07c
add info on hostname equality
timidri May 15, 2024
e29233a
Updating output message to remote quote, updating docs to include pe_…
ragingra May 21, 2024
d17a2db
Merge branch 'main' into SOLARCH-1160
ragingra May 30, 2024
bffe3f8
Fixing rubocop issues
ragingra May 30, 2024
697c1b4
Fixing expected text in spec test
ragingra May 30, 2024
e71d9e1
Removing todo comments
ragingra Jun 3, 2024
376c65e
[ITHELP-87329] Update test-backup-restore.yaml (#447)
binford2k Jun 25, 2024
9c1a20d
Update backup_restore.md (#432)
J-Hunniford Jun 26, 2024
7606aa5
Merge branch 'main' into SOLARCH-1160
ragingra Jun 27, 2024
241 changes: 241 additions & 0 deletions .github/workflows/test-backup-restore-migration.yaml
@@ -0,0 +1,241 @@
---
name: "Backup and restore test"

on:
  workflow_dispatch:
    inputs:
      image:
        description: "GCP image for test cluster"
        required: true
        default: "almalinux-cloud/almalinux-8"
      architecture:
        description: "PE architecture to test"
        required: true
        default: "standard"
      version:
        description: "PE version to install"
        required: true
        default: "2021.7.4"
      ssh-debugging:
        description: "Boolean; whether or not to pause for ssh debugging"
        required: true
        default: "false"

jobs:
  backup:
    name: "Backup: Cluster A: PE ${{ inputs.version }} ${{ inputs.architecture }} on ${{ inputs.image }}"
    runs-on: ubuntu-20.04
    env:
      BOLT_GEM: true
      BOLT_DISABLE_ANALYTICS: true
      LANG: "en_US.UTF-8"

    steps:
      - name: "Start SSH session"
        if: ${{ github.event.inputs.ssh-debugging == 'true' }}
        uses: luchihoratiu/debug-via-ssh@main
        with:
          NGROK_AUTH_TOKEN: ${{ secrets.NGROK_AUTH_TOKEN }}
          SSH_PASS: ${{ secrets.SSH_PASS }}

      - name: "Checkout Source"
        uses: actions/checkout@v2

      - name: "Activate Ruby 2.7"
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: "2.7"
          bundler-cache: true

      - name: "Print bundle environment"
        if: ${{ github.repository_owner == 'puppetlabs' }}
        run: |
          echo ::group::info:bundler
            bundle env
          echo ::endgroup::

      - name: "Provision test cluster"
        timeout-minutes: 15
        run: |
          echo ::group::prepare
            mkdir -p $HOME/.ssh
            echo 'Host *' > $HOME/.ssh/config
            echo ' ServerAliveInterval 150' >> $HOME/.ssh/config
            echo ' ServerAliveCountMax 2' >> $HOME/.ssh/config
            bundle exec rake spec_prep
          echo ::endgroup::

          echo ::group::provision
            bundle exec bolt plan run peadm_spec::provision_test_cluster \
              --modulepath spec/fixtures/modules \
              provider=provision_service \
              image=${{ inputs.image }} \
              architecture=${{ inputs.architecture }}
          echo ::endgroup::

          echo ::group::info:request
            cat request.json || true; echo
          echo ::endgroup::

          echo ::group::info:inventory
            sed -e 's/password: .*/password: "[redacted]"/' < spec/fixtures/litmus_inventory.yaml || true
          echo ::endgroup::

      # - name: Save inventory file A to an artifact
      #   uses: actions/upload-artifact@v3
      #   with:
      #     name: inventory_A
      #     path: spec/fixtures/litmus_inventory.yaml

      - name: "Install PE on test cluster"
        timeout-minutes: 120
        run: |
          bundle exec bolt plan run peadm_spec::install_test_cluster \
            --inventoryfile spec/fixtures/litmus_inventory.yaml \
            --modulepath spec/fixtures/modules \
            architecture=${{ inputs.architecture }} \
            version=${{ inputs.version }}

      - name: "Start SSH session"
        if: github.event.inputs.ssh-debugging == 'true'
        uses: luchihoratiu/debug-via-ssh@main
        with:
          NGROK_AUTH_TOKEN: ${{ secrets.NGROK_AUTH_TOKEN }}
          SSH_PASS: ${{ secrets.SSH_PASS }}

      # - name: Download artifacts
Collaborator

Why was that committed if it isn't used?

Member

This could have been moved to another branch until migration was picked up, but wasn't doing any harm for the time being.

      #   # if: always()
      #   uses: actions/download-artifact@v3
      #   with:
      #     path: spec/fixtures/

      - name: perform PE backup of cluster A
        timeout-minutes: 10
        continue-on-error: true
        run: |
          echo ::group::prepare
            mkdir -p $HOME/.ssh
            echo 'Host *' > $HOME/.ssh/config
            echo ' ServerAliveInterval 150' >> $HOME/.ssh/config
            echo ' ServerAliveCountMax 2' >> $HOME/.ssh/config
            bundle exec rake spec_prep
          echo ::endgroup::

          echo ::group::backup
            bundle exec bolt plan run peadm_spec::test_backup \
              --inventoryfile spec/fixtures/litmus_inventory.yaml \
              --modulepath spec/fixtures/modules
          echo ::endgroup::

      - name: "Wait as long as the ${HOME}/pause file is present"
        continue-on-error: true
        # if: ${{ always() && github.event.inputs.ssh-debugging == 'true' }}
        if: github.event.inputs.ssh-debugging == 'true'
        run: |
          while [ -f "${HOME}/pause" ] ; do
            echo "${HOME}/pause present, sleeping for 10 seconds..."
            sleep 10
          done
          echo "${HOME}/pause absent, continuing workflow."

      - name: "Tear down cluster A"
        if: always()
        run: |
          if [ -f spec/fixtures/litmus_inventory.yaml ]; then
            echo ::group::tear_down
              bundle exec rake 'litmus:tear_down'
            echo ::endgroup::

            echo ::group::info:request
              cat request.json || true; echo
            echo ::endgroup::
          fi

  restore:
    name: "Restore: Cluster B: PE ${{ inputs.version }} ${{ inputs.architecture }} on ${{ inputs.image }}"
    runs-on: ubuntu-20.04
    env:
      BOLT_GEM: true
      BOLT_DISABLE_ANALYTICS: true
      LANG: "en_US.UTF-8"

    steps:
      - name: "Checkout Source"
        uses: actions/checkout@v2

      - name: "Activate Ruby 2.7"
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: "2.7"
          bundler-cache: true

      - name: "Print bundle environment"
        if: ${{ github.repository_owner == 'puppetlabs' }}
        run: |
          echo ::group::info:bundler
            bundle env
          echo ::endgroup::

      - name: "Provision test cluster"
        timeout-minutes: 15
        run: |
          echo ::group::prepare
            mkdir -p $HOME/.ssh
            echo 'Host *' > $HOME/.ssh/config
            echo ' ServerAliveInterval 150' >> $HOME/.ssh/config
            echo ' ServerAliveCountMax 2' >> $HOME/.ssh/config
            bundle exec rake spec_prep
          echo ::endgroup::

          echo ::group::provision
            bundle exec bolt plan run peadm_spec::provision_test_cluster \
              --modulepath spec/fixtures/modules \
              provider=provision_service \
              image=${{ inputs.image }} \
              architecture=${{ inputs.architecture }}
          echo ::endgroup::

          echo ::group::info:request
            cat request.json || true; echo
          echo ::endgroup::

          echo ::group::info:inventory
            sed -e 's/password: .*/password: "[redacted]"/' < spec/fixtures/litmus_inventory.yaml || true
          echo ::endgroup::

      # - name: Save inventory file B to an artifact
      #   uses: actions/upload-artifact@v3
      #   with:
      #     name: inventory_B
      #     path: spec/fixtures/litmus_inventory.yaml

      - name: "Install PE on test cluster"
        timeout-minutes: 120
        run: |
          bundle exec bolt plan run peadm_spec::install_test_cluster \
            --inventoryfile spec/fixtures/litmus_inventory.yaml \
            --modulepath spec/fixtures/modules \
            architecture=${{ inputs.architecture }} \
            version=${{ inputs.version }}

      - name: Wait for backup to finish
        uses: lewagon/[email protected]
        with:
          ref: ${{ github.ref }}
          check-name: "Backup: Cluster A: PE ${{ inputs.version }} ${{ inputs.architecture }} on ${{ inputs.image }}"
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          wait-interval: 10

      - name: "Tear down cluster B"
        if: always()
        run: |
          cp spec/fixtures/inventory_B/litmus_inventory.yaml spec/fixtures/litmus_inventory.yaml || true
          if [ -f spec/fixtures/litmus_inventory.yaml ]; then
            echo ::group::tear_down
              bundle exec rake 'litmus:tear_down'
            echo ::endgroup::

            echo ::group::info:request
              cat request.json || true; echo
            echo ::endgroup::
          fi
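
Note on the pause-file wait step in the backup job: the loop polls for ${HOME}/pause with no upper bound, so a pause file that is never removed holds the runner until the job itself times out. A minimal sketch of a bounded variant follows. The helper name `wait_while_present`, its parameters, and the default interval and poll limit are hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): wait while a file exists,
# but give up after a fixed number of polls.
#   $1 = file to watch
#   $2 = seconds between polls (assumed default: 10)
#   $3 = maximum number of polls (assumed default: 60)
wait_while_present() {
  local file="$1" interval="${2:-10}" max_polls="${3:-60}"
  local polls=0
  while [ -f "$file" ]; do
    if [ "$polls" -ge "$max_polls" ]; then
      echo "$file still present after $polls polls, giving up."
      return 1
    fi
    echo "$file present, sleeping for $interval seconds..."
    sleep "$interval"
    polls=$((polls + 1))
  done
  echo "$file absent, continuing workflow."
}
```

In the workflow step this could be called as `wait_while_present "${HOME}/pause" 10 60`. Because the step already sets `continue-on-error: true`, a non-zero return would be reported without blocking the teardown step.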