These instructions provide automated procedures for recovering from select failures of PE components which are managed by PEADM.
Additional manual procedures are documented in recovery.md.
The procedure for replacing a failed PE-PostgreSQL server is the same regardless of which PE-PostgreSQL server failed, and regardless of whether the replacement PE-PostgreSQL server uses the same name or a different one. This procedure uses the following placeholder references.
- <replacement-postgres-server-fqdn> - The FQDN and certname of the new server being brought in to replace the failed PE-PostgreSQL server
- <working-postgres-server-fqdn> - The FQDN and certname of the still-working PE-PostgreSQL server
- <failed-postgres-server-fqdn> - The FQDN and certname of the failed PE-PostgreSQL server
- <primary-server-fqdn> - The FQDN and certname of the primary Puppet server
- <replica-server-fqdn> - The FQDN and certname of the replica Puppet server
Procedure:
- Stop puppet.service on the Puppet server primary and replica:
  bolt task run service name=puppet.service action=stop --targets <primary-server-fqdn>,<replica-server-fqdn>
- Temporarily configure both the primary and replica servers to use the remaining healthy PE-PostgreSQL server:
  bolt plan run peadm::util::update_db_setting --targets <primary-server-fqdn>,<replica-server-fqdn> primary_postgresql_host=<working-postgres-server-fqdn>
- Restart pe-puppetdb.service on the Puppet server primary and replica:
  bolt task run service name=pe-puppetdb.service action=restart --targets <primary-server-fqdn>,<replica-server-fqdn>
- Purge the failed PE-PostgreSQL node from PuppetDB:
  bolt command run "/opt/puppetlabs/bin/puppet node purge <failed-postgres-server-fqdn>" --targets <primary-server-fqdn>
- Run the peadm::add_database plan to deploy the replacement PE-PostgreSQL server:
  bolt plan run peadm::add_database -t <replacement-postgres-server-fqdn> primary_host=<primary-server-fqdn>
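
The five steps above could be collected into a small wrapper script so that the placeholders are filled in once and the commands run in order. The following is a minimal sketch, not part of PEADM: the variable names, the `run` and `recover_postgres` helpers, the `DRY_RUN` switch, and the `example.com` FQDNs are all hypothetical. It defaults to printing the commands for review and uses `--targets` for every multi-target call.

```shell
#!/bin/sh
# Sketch only. All names and FQDNs below are hypothetical placeholders;
# override them through the environment before running.
PRIMARY="${PRIMARY:-primary.example.com}"
REPLICA="${REPLICA:-replica.example.com}"
WORKING_PG="${WORKING_PG:-pg-a.example.com}"
FAILED_PG="${FAILED_PG:-pg-b.example.com}"
REPLACEMENT_PG="${REPLACEMENT_PG:-pg-c.example.com}"
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    # Printed output is for review only: shell quoting is not preserved.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

recover_postgres() {
  servers="${PRIMARY},${REPLICA}"
  # Each step runs only if the previous one succeeded.
  run bolt task run service name=puppet.service action=stop --targets "$servers" &&
  run bolt plan run peadm::util::update_db_setting --targets "$servers" "primary_postgresql_host=${WORKING_PG}" &&
  run bolt task run service name=pe-puppetdb.service action=restart --targets "$servers" &&
  run bolt command run "/opt/puppetlabs/bin/puppet node purge ${FAILED_PG}" --targets "$PRIMARY" &&
  run bolt plan run peadm::add_database -t "$REPLACEMENT_PG" "primary_host=${PRIMARY}"
}

recover_postgres
```

Running the script as-is only prints the five commands. Once the printed commands look right for your environment, rerun it with `DRY_RUN=0` and the real FQDNs exported to execute them.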