Updates documentation #269

Merged
merged 2 commits into from
Jun 21, 2022
42 changes: 39 additions & 3 deletions documentation/automated_recovery.md
@@ -2,7 +2,29 @@

These instructions provide automated procedures for recovering from select failures of PE components which are managed by PEADM.

Additional manual procedures are documented in [recovery.md](recovery.md)
Manual procedures are documented in [recovery.md](recovery.md)

## Recover from failed primary Puppet server

1. Promote the replica ([official docs](https://puppet.com/docs/pe/2019.8/dr_configure.html#dr-promote-replica))
2. [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server)

## Replace missing or failed replica Puppet server

This procedure uses the following placeholder references.

* _\<primary-server-fqdn\>_ - The FQDN and certname of the primary Puppet server
* _\<replica-postgres-server-fqdn\>_ - The FQDN and certname of the PE-PostgreSQL server which resides in the same availability group as the replacement replica Puppet server
* _\<replacement-replica-fqdn\>_ - The FQDN and certname of the replacement replica Puppet server

1. Run `peadm::add_replica` plan to deploy replacement replica Puppet server
1. For Standard and Large deployments

bolt plan run peadm::add_replica primary_host=<primary-server-fqdn> replica_host=<replacement-replica-fqdn>

2. For Extra Large deployments

bolt plan run peadm::add_replica primary_host=<primary-server-fqdn> replica_host=<replacement-replica-fqdn> replica_postgresql_host=<replica-postgres-server-fqdn>
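
The only difference between the two invocations above is the extra `replica_postgresql_host` parameter for Extra Large deployments. As a minimal sketch, assuming a hypothetical wrapper named `add_replica_cmd` (not part of PEADM) and placeholder `example.com` hostnames, the choice could be scripted as a dry run that prints the command rather than executing it:

```shell
#!/bin/sh
# Hypothetical helper, not part of PEADM: prints the peadm::add_replica
# command for the given hosts instead of running it.
#   $1 = primary server FQDN
#   $2 = replacement replica FQDN
#   $3 = replica PE-PostgreSQL FQDN (optional, Extra Large deployments only)
add_replica_cmd() {
  primary="$1"
  replica="$2"
  replica_pg="${3:-}"

  cmd="bolt plan run peadm::add_replica primary_host=${primary} replica_host=${replica}"
  # Append the PostgreSQL parameter only when a third argument was supplied.
  if [ -n "$replica_pg" ]; then
    cmd="${cmd} replica_postgresql_host=${replica_pg}"
  fi
  printf '%s\n' "$cmd"
}

# Standard or Large deployment (placeholder hostnames)
add_replica_cmd primary.example.com replica.example.com

# Extra Large deployment (placeholder hostnames)
add_replica_cmd primary.example.com replica.example.com pg-b.example.com
```

Printing the command first makes it easy to review the parameters before running the plan against real infrastructure.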

## Replace failed PE-PostgreSQL server (A or B side)

@@ -22,7 +44,7 @@ Procedure:

2. Temporarily set both primary and replica server nodes so that they use the remaining healthy PE-PostgreSQL server

bolt plan run peadm::util::update_db_setting --target <primary-server-fqdn>,<replica-server-fqdn> primary_postgresql_host=<working-postgres-server-fqdn> override=true
bolt plan run peadm::util::update_db_setting --target <primary-server-fqdn>,<replica-server-fqdn> postgresql_host=<working-postgres-server-fqdn> override=true

3. Restart `pe-puppetdb.service` on Puppet server primary and replica

@@ -34,4 +56,18 @@ Procedure:

5. Run `peadm::add_database` plan to deploy replacement PE-PostgreSQL server

bolt plan run peadm::add_database -t <replacement-postgres-server-fqdn> primary_host=<primary-server-fqdn>
bolt plan run peadm::add_database -t <replacement-postgres-server-fqdn> primary_host=<primary-server-fqdn>

## Replace failed replica puppet server AND failed replica pe-postgresql server

This procedure uses the following placeholder references.

* _\<primary-server-fqdn\>_ - The FQDN and certname of the primary Puppet server
* _\<failed-replica-fqdn\>_ - The FQDN and certname of the failed replica Puppet server

1. Ensure the old replica server is forgotten.

bolt command run "/opt/puppetlabs/bin/puppet infrastructure forget <failed-replica-fqdn>" --targets <primary-server-fqdn>

2. [Replace failed PE-PostgreSQL server (A or B side)](#replace-failed-pe-postgresql-server-a-or-b-side)
3. [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server)
16 changes: 14 additions & 2 deletions plans/add_replica.pp
@@ -31,6 +31,15 @@
$replica_postgresql_target,
]))

# Get current peadm config to ensure we forget active replicas
$peadm_config = run_task('peadm::get_peadm_config', $primary_target).first.value

# Make list of all possible replicas, configured and provided
$replicas = peadm::flatten_compact([
$replica_host,
$peadm_config['params']['replica_host']
]).unique

$certdata = run_task('peadm::cert_data', $primary_target).first.value
$primary_avail_group_letter = $certdata['extensions'][peadm::oid('peadm_availability_group')]
$replica_avail_group_letter = $primary_avail_group_letter ? { 'A' => 'B', 'B' => 'A' }
@@ -40,7 +49,9 @@
$dns_alt_names = [$replica_target.peadm::certname()] + (pick($certdata['dns-alt-names'], []) - $certdata['certname'])

# This has the effect of revoking the node's certificate, if it exists
run_command("/opt/puppetlabs/bin/puppet infrastructure forget ${replica_target.peadm::certname()}", $primary_target, _catch_errors => true)
$replicas.each |$replica| {
run_command("/opt/puppetlabs/bin/puppet infrastructure forget ${replica}", $primary_target, _catch_errors => true)
}

run_plan('peadm::subplans::component_install', $replica_target,
primary_host => $primary_target,
@@ -76,7 +87,8 @@
server_a_host => $replica_avail_group_letter ? { 'A' => $replica_host, default => undef },
server_b_host => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef },
internal_compiler_a_pool_address => $replica_avail_group_letter ? { 'A' => $replica_host, default => undef },
internal_compiler_b_pool_address => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef }
internal_compiler_b_pool_address => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef },
peadm_config => $peadm_config
)

# Source the global hiera.yaml from Primary and synchronize to new Replica
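
The change to `plans/add_replica.pp` above builds a list from the provided replica and the replica recorded in the current peadm config, drops empty entries and duplicates, and runs `puppet infrastructure forget` for each one. The sketch below imitates that collect-and-deduplicate step in shell for illustration only. The function names `unique_replicas` and `forget_cmds` and the `example.com` hostnames are assumptions, and the script only prints commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print each non-empty argument once, in first-seen
# order. Assumes hostnames contain no whitespace.
unique_replicas() {
  seen=""
  for host in "$@"; do
    # Skip empty entries, e.g. when no replica is recorded in the config.
    [ -n "$host" ] || continue
    case " $seen " in
      *" $host "*) ;;  # already listed
      *) seen="${seen:+$seen }$host"
         printf '%s\n' "$host" ;;
    esac
  done
}

# Hypothetical helper: print one forget command per unique replica.
forget_cmds() {
  unique_replicas "$@" | while IFS= read -r replica; do
    printf '/opt/puppetlabs/bin/puppet infrastructure forget %s\n' "$replica"
  done
}

# Provided replica plus the replica from the existing config (placeholders).
forget_cmds new-replica.example.com old-replica.example.com new-replica.example.com
```

Forgetting both the provided and the previously configured replica covers the case where the replacement uses a different certname than the failed node.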