title | excerpt | updated |
---|---|---|
Configuring Disaster Recovery with Metro | Implementing Metro for a Disaster Recovery Plan | 2024-05-13 |
This guide introduces Metro Availability, which provides an automated disaster recovery plan.
Warning
OVHcloud provides services for which you are responsible, with regard to their configuration and management. It is therefore your responsibility to ensure that they work properly.
This guide is designed to assist you as much as possible with common tasks. Nevertheless, we recommend contacting a specialist provider if you experience any difficulties or doubts when it comes to managing, using or setting up a service on a server.
- Access to the OVHcloud Control Panel
- Access to your clusters via Prism Central
- Three Nutanix clusters within the OVHcloud infrastructure. If you use the Nutanix on OVHcloud packaged service, the two clusters that take part in the disaster recovery plan need the Pro or Ultimate pack. For maximum security, the three clusters should be located at separate sites.
- You must have less than 5 ms of latency between the two replicated clusters. Please note that latency is not covered by SLAs.
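The latency requirement can be checked before going any further. The sketch below is one possible way to do it, assuming a Linux host inside the vRack and a `ping` that prints the usual `min/avg/max` summary line. The function name `latency_ok` and the address in the commented example are illustrative and are not part of the OVHcloud or Nutanix tooling.

```bash
# Succeeds when the average round-trip time reported by ping is below the
# given threshold in milliseconds (5 by default).
# Assumption: the input contains the usual "min/avg/max" summary line.
latency_ok() {
  awk -v max="${1:-5}" '
    /min\/avg\/max/ {
      split($0, halves, "= ")      # halves[2] holds "min/avg/max/... ms"
      split(halves[2], rtt, "/")   # rtt[2] is the average
      found = 1
      ok = (rtt[2] + 0 < max + 0)
    }
    END { if (found && ok) exit 0; exit 1 }'
}

# Example (run from a machine in Roubaix, pinging an address in Gravelines):
# ping -c 20 192.168.1.100 | latency_ok 5 && echo "latency within 5 ms"
```

The function fails when no summary line is found, so an unreachable peer is not reported as compliant.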
We will set up a two-way disaster recovery plan between two clusters with this hardware:
- A Nutanix cluster in Roubaix, France, with virtual machines replicated in Gravelines.
- A Nutanix cluster in Gravelines, France, with virtual machines replicated in Roubaix.
- A Nutanix cluster in Erith, England with Prism Central to serve as a witness in the disaster recovery plan.
We will only use one vRack, which will contain:
- The three Nutanix clusters.
- Load balancers.
- Additional IP addresses on the vRack.
Below is the diagram showing the three sites:
- Step 1 Configuration
- Step 1.1 Interconnection of the three clusters
- Step 1.2 Delete the Prism Central records for the Roubaix and Gravelines clusters
- Step 1.3 Register both clusters on Prism Central in Erith
- Step 1.4 Adding IP Addresses for iSCSI Connections on All Three Clusters
- Step 1.5 Creating two Storage Containers
- Step 1.6 Move virtual machines to the Storage Container
- Step 1.7 Creation of a category to be used when implementing the disaster recovery plan
- Step 1.8 Add virtual machines in categories
- Step 1.9 Setting up synchronous replications between Roubaix and Gravelines
- Step 1.10 Create Subnets for Disaster Recovery Plan
- Step 1.11 Implementation of disaster recovery plans
- Step 2 - Validate Disaster Recovery Plan
We will implement this disaster recovery plan step by step.
The cluster configuration information used in our guide is as follows:
- Roubaix cluster:
    - Server 1: CVM address `192.168.0.21`, AHV hypervisor IP address `192.168.0.1`.
    - Server 2: CVM address `192.168.0.22`, AHV hypervisor IP address `192.168.0.2`.
    - Server 3: CVM address `192.168.0.23`, AHV hypervisor IP address `192.168.0.3`.
    - Prism Element virtual address: `192.168.0.100`.
    - Prism Element iSCSI address: `192.168.0.102`.
    - Prism Central IP address: `192.168.0.101`.
    - Gateway: `192.168.3.254`.
    - Mask: `255.255.252.0`.
    - Cluster version: `6.5`.
- Gravelines cluster:
    - Server 1: CVM address `192.168.1.21`, AHV hypervisor IP address `192.168.1.1`.
    - Server 2: CVM address `192.168.1.22`, AHV hypervisor IP address `192.168.1.2`.
    - Server 3: CVM address `192.168.1.23`, AHV hypervisor IP address `192.168.1.3`.
    - Prism Element virtual address: `192.168.1.100`.
    - Prism Element iSCSI address: `192.168.1.102`.
    - Prism Central IP address: `192.168.1.101`.
    - Gateway: `192.168.3.254`.
    - Mask: `255.255.252.0`.
    - Cluster version: `6.5`.
- Erith cluster:
    - Server 1: CVM address `192.168.2.21`, AHV hypervisor IP address `192.168.2.1`.
    - Server 2: CVM address `192.168.2.22`, AHV hypervisor IP address `192.168.2.2`.
    - Server 3: CVM address `192.168.2.23`, AHV hypervisor IP address `192.168.2.3`.
    - Prism Element virtual address: `192.168.2.101`.
    - Prism Element iSCSI address: `192.168.2.102`.
    - Prism Central IP address: `192.168.2.100`.
    - Gateway: `192.168.3.254`.
    - Mask: `255.255.252.0`.
    - Cluster version: `6.5`.
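All three clusters share the gateway `192.168.3.254` because the mask `255.255.252.0` (a /22) covers the range `192.168.0.0` to `192.168.3.255`. As a quick sanity check of an addressing plan like this one, the following POSIX shell sketch tests whether two addresses fall in the same network. The helper names `ip_to_int` and `same_network` are hypothetical and only serve this illustration.

```bash
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds when both addresses belong to the same network for the given mask.
same_network() {
  a=$(ip_to_int "$1"); b=$(ip_to_int "$2"); m=$(ip_to_int "$3")
  [ $(( a & m )) -eq $(( b & m )) ]
}

# Check one address from each cluster against the shared gateway.
for ip in 192.168.0.21 192.168.1.100 192.168.2.102; do
  if same_network "$ip" 192.168.3.254 255.255.252.0; then
    echo "$ip is in the same network as the gateway"
  else
    echo "$ip is outside the gateway network"
  fi
done
```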
In addition to this guide, you can use these documents:
The first step is to interconnect the three clusters on the same OVHcloud vRack.
Follow the guide Interconnect clusters through the vRack to connect the three clusters by adding:
- The Roubaix cluster to the vRack dedicated to Gravelines.
- The Erith cluster to the vRack dedicated to Gravelines.
When you have finished configuring your vRack, you will have these elements in your vRack:
- 9 dedicated servers (3 per cluster)
- 3 public IP addresses
- 3 Load Balancers
At this stage, each of the three clusters is accessible through its own Prism Central URL.
To implement a disaster recovery plan solution with Metro Availability, a cluster witness is required to automate tasks in the event of one of the clusters becoming unavailable. The cluster witness is located on a Prism Central virtual machine.
The Erith cluster will host the Prism Central virtual machine for the three clusters, and serve as a cluster witness for the disaster recovery plan between Roubaix and Gravelines.
Connect via SSH to the Prism Element cluster in Roubaix:
ssh nutanix@private_ip_address_prism_element_Roubaix
Enter Prism Element password
Run this command to remove Prism Element from the Prism Central configuration:
ncli multicluster remove-from-multicluster external-ip-address-or-svm-ips=private_ip_address_prism_central_Roubaix \
username=admin password=pwd_pe_Roubaix force=true
This message appears when disconnecting from Prism Central.
Cluster unregistration is currently in progress. This operation may take a while.
Enter this command:
ncli cluster info
Note the Cluster UUID value, which has the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
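Rather than copying the UUID by hand, it can be pulled out of the command output. The sketch below relies only on the UUID shape given above, not on the label printed in front of it. The function name `extract_cluster_uuid` and the variable `CLUSTER_UUID` are hypothetical, and `grep -o` is assumed to be available (it is in GNU and BusyBox grep).

```bash
# Print the first UUID-shaped token found on standard input.
# Assumption: the UUID appears somewhere in the command output; the exact
# label printed in front of it is not relied upon.
extract_cluster_uuid() {
  grep -Eio '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | head -n 1
}

# Example, from the Prism Element SSH session:
# CLUSTER_UUID=$(ncli cluster info | extract_cluster_uuid)
# echo "$CLUSTER_UUID"
```

Compare the extracted value with the full command output before passing it to the cleanup script, since only the first match is returned.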
Disconnect from Prism Element and connect via SSH to the Prism Central virtual machine in Roubaix.
ssh nutanix@private_ip_address_prism_central_roubaix
Enter Prism Central password
Enter this command:
python /home/nutanix/bin/unregistration_cleanup.py cluster_uuid_prism_element_Roubaix
Log in to the Prism Element cluster in Gravelines via SSH.
ssh nutanix@private_ip_address_prism_element_Gravelines
Enter Prism Element password
Enter this command:
ncli multicluster remove-from-multicluster external-ip-address-or-svm-ips=private_ip_address_prism_central_Gravelines \
username=admin password=pwd_pe_Gravelines force=true
This message appears when disconnecting from Prism Central.
Cluster unregistration is currently in progress. This operation may take a while.
Enter this command:
ncli cluster info
Note the Cluster UUID value, which has the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
Disconnect from Prism Element and connect via SSH to the Prism Central virtual machine in Gravelines.
ssh nutanix@private_ip_address_prism_central_Gravelines
Enter Prism Central password
Enter this command:
python /home/nutanix/bin/unregistration_cleanup.py cluster_uuid_prism_element_Gravelines
Log in to the Prism Element in Roubaix via SSH:
ssh nutanix@private_ip_address_prism_element_Roubaix
Enter Prism Element password
Run this command:
ncli multicluster register-to-prism-central username=admin password=password_admin_Erith external-ip-address-or-svm-ips=private_ip_address_prism_central_Erith
This message appears:
Cluster registration is currently in progress. This operation may take a while.
Wait and enter this command:
ncli multicluster get-cluster-state
If the cluster is connected to Prism Central in Erith, you will see this information:
Registered Cluster Count: 1
Cluster Id : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Cluster Name : Prism-Central-Erith-FQDN
Is Multicluster : true
Controller VM IP Addre... : [private_ip_address_prism_central_Erith]
External or Masqueradi... :
Cluster FQDN :
Controller VM NAT IP A... :
Marked for Removal : false
Remote Connection Exists : true
Log in to Prism Element in Gravelines via SSH:
ssh nutanix@private_ip_address_prism_element_Gravelines
Enter Prism Element password from Gravelines
Run this command:
ncli multicluster register-to-prism-central username=admin password=password_admin_Erith external-ip-address-or-svm-ips=private_ip_address_prism_central_Erith
This message appears:
Cluster registration is currently in progress. This operation may take a while.
Wait and enter this command:
ncli multicluster get-cluster-state
If the cluster is connected to the Prism Central in Erith, you will see this information:
Registered Cluster Count: 1
Cluster Id : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Cluster Name : Prism-Central-Erith-FQDN
Is Multicluster : true
Controller VM IP Addre... : [private_ip_address_prism_central_Erith]
External or Masqueradi... :
Cluster FQDN :
Controller VM NAT IP A... :
Marked for Removal : false
Remote Connection Exists : true
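The two fields that matter in this output are `Remote Connection Exists` and `Marked for Removal`. As a rough sketch, the check can be scripted by piping the output of `ncli multicluster get-cluster-state` into a small filter. The function name `pc_registration_ok` is hypothetical, and the parsing assumes the `key : value` layout shown above.

```bash
# Succeeds when the state read on standard input shows an established
# connection to Prism Central and no pending removal.
# Assumption: one "key : value" pair per line, as in the output above.
pc_registration_ok() {
  awk -F':' '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "Remote Connection Exists" { connected = (val == "true") }
    key == "Marked for Removal"       { removing  = (val == "true") }
    END { if (connected && !removing) exit 0; exit 1 }'
}

# Example, from the Prism Element SSH session:
# ncli multicluster get-cluster-state | pc_registration_ok && echo "registered"
```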
From a web browser, open the URL of Prism Central in Erith. You will see the three clusters.
The Prism Central virtual machines in Gravelines and Roubaix are no longer being used. You can stop them.
In the main menu, click `VMs`{.action} in the `Compute & Storage` submenu.
Select the Prism Central virtual machines in Gravelines and Roubaix and click `Guest Shutdown`{.action} from the `Actions`{.action} menu.
From the Prism Central dashboard, click the link to the `Erith cluster`{.action}.
On the Prism Element dashboard, click the `cluster name`{.action} in the top left-hand corner.
Scroll down the window and check the IP address in `ISCSI Data Services IP`.
From the Prism Central dashboard, click the link to the `Gravelines cluster`{.action}.
On the Prism Element dashboard, click the `cluster name`{.action} in the top left-hand corner.
Scroll down the window and check the IP address in `ISCSI Data Services IP`.
From the Prism Central dashboard, click the link to the `Roubaix cluster`{.action}.
On the Prism Element dashboard, click the `cluster name`{.action} in the top left-hand corner.
Scroll down the window and check the IP address in `ISCSI Data Services IP`.
We will create two Storage Containers with the same name, one in Roubaix and the other in Gravelines.
From the Prism Element main menu, click `Storage Containers`{.action} in the `Compute & Storage`{.action} submenu.
Click `Create Storage Container`{.action}.
Type `UsedForDR` in Name, choose the Roubaix cluster in Cluster, and click `Create`{.action}.
Click `Create Storage Container`{.action}.
Type `UsedForDR` in Name, choose the Gravelines cluster in Cluster, and click `Create`{.action}.
In the list of `Storage Containers`, you will see two Storage Containers with the same name: one on the Roubaix cluster and the other on the Gravelines cluster.
We will move the virtual machine storage to the `Storage Container` we have created.
Connect via SSH to the Prism Element of the Roubaix cluster:
ssh nutanix@private_ip_address_Prism_element_Roubaix
Enter the Nutanix account password of Prism Element
Run this command for each VM to be moved to the `Storage Container`, replacing vmname with the name of the virtual machine (in our disaster recovery plan, we have two virtual machines in Roubaix, one on Windows and one on Linux).
acli vm.update_container vmname container=UsedForDR
Enter the Nutanix account password of Prism Element
Log in to the Prism Element of the Gravelines cluster via SSH:
ssh nutanix@private_ip_address_Prism_element_Gravelines
Enter the Nutanix account password of Prism Element
Run this command for each VM to be moved to the `Storage Container`, replacing vmname with the name of the virtual machine (in our disaster recovery plan, we have three virtual machines in Gravelines: one on Windows, one on Linux, and the gateway that provides Internet access).
acli vm.update_container vmname container=UsedForDR
Enter the Nutanix account password of Prism Element
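When several virtual machines need to be moved, the `acli vm.update_container` command above can be wrapped in a loop. This is a minimal sketch: the function name `move_vms_to_container`, the `ACLI` override used for dry runs, and the VM names `vm-windows` and `vm-linux` are all hypothetical.

```bash
# Move every listed VM to the given Storage Container.
# ACLI is a hypothetical override used for dry runs (for example ACLI=echo);
# it defaults to the acli command available in the Prism Element SSH session.
move_vms_to_container() {
  container=$1
  shift
  for vm in "$@"; do
    ${ACLI:-acli} vm.update_container "$vm" container="$container" || return 1
  done
}

# Dry run in a subshell: prints the commands instead of executing them.
( ACLI=echo; move_vms_to_container UsedForDR vm-windows vm-linux )
```

Removing the `ACLI=echo` override runs the real commands, and the loop stops at the first VM that fails to move.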
We will create a category with two values in Prism Central to assign the virtual machines involved in replication.
Scroll down the main menu, then click `Categories`{.action} in the `Administration`{.action} submenu.
Click `New Category`{.action}.
Type `ProtectedVM` in Name, add the `Roubaix` and `Gravelines` values in Values, and click `Save`{.action}.
The category appears in the list and is ready to use.
We will assign the two virtual machines on the Roubaix cluster to one category value and the three virtual machines on the Gravelines cluster to the other.
From the Prism Central main menu, click `VMs`{.action} in the `Compute & Storage`{.action} submenu.
Select the two virtual machines in Roubaix on the left, then on the `Actions`{.action} menu, click `Manage Categories`{.action}.
Add the category `ProtectedVM: Roubaix`, then click `Save`{.action}.
Select the three virtual machines in Gravelines on the left, then on the `Actions`{.action} menu, click `Manage Categories`{.action}.
Add the category `ProtectedVM: Gravelines`, then click `Save`{.action}.
Synchronous replication keeps both sites continuously in sync, with 0 seconds of data loss.
On the Prism Central main menu, click `Protection Policies`{.action} in the `Data Protection`{.action} submenu.
Click `Create Protection Policy`{.action}.
Type `ROUBAIX-TO-GRAVELINES` in Policy name, keep `Local AZ`, and click `Select Cluster`{.action} in Primary Location.
Choose the Roubaix cluster and click `Save`{.action}.
In the top left-hand corner next to Disaster Recovery, click `Enable`{.action}.
The system checks that everything is correct before enabling Disaster Recovery.
Click `Enable`{.action} to enable the Disaster Recovery option.
Click `Enable`{.action} again.
Your Disaster Recovery option is being activated.
Keep `Local AZ`, select the Gravelines cluster in Recovery Location, and click `Save`{.action}.
Click `+ Add Schedule`{.action}.
Choose `Synchronous`{.action} for Protection Type and `Automatic`{.action} for Failure Detection Mode. Then click `Save Schedule`{.action}.
Click `Next`{.action}.
Select the category `ProtectedVM: Roubaix`{.action} and click `Add`{.action}.
Click `Create`{.action}.
Virtual machines in Roubaix are now replicated to Gravelines. Replication becomes continuous once the first full replication has completed.
Replication can be two-way. We will now create replication from Gravelines to Roubaix.
Click `Create Protection Policy`{.action}.
Type `GRAVELINES-TO-ROUBAIX` in Policy name, keep `Local AZ`, and choose the Gravelines cluster in Primary Location. Then click `Save`{.action}.
Keep `Local AZ`, select the Roubaix cluster in Recovery Location, and click `Save`{.action}.
Click `+ Add Schedule`{.action}.
Choose `Synchronous`{.action} for Protection Type and `Automatic`{.action} for Failure Detection Mode. Then click `Save Schedule`{.action}.
Click `Next`{.action}.
Select the category `ProtectedVM: Gravelines`{.action} and click `Add`{.action}.
Click `Create`{.action}.
A second protection policy is now in place.
We will create subnets that will be used to test disaster recovery plans.
For each existing subnet, a test network is required. On the two clusters of the Disaster Recovery Plan, we have three production subnets.
- base on VLAN 0.
- infra on VLAN 1.
- production on VLAN 2.
We will therefore create 3 additional subnets on the Gravelines and Roubaix clusters with these names:
- test on VLAN 100.
- testinfra on VLAN 101.
- testproduction on VLAN 102.
Use this guide to create VLANs on your Nutanix clusters: Isolate management machines from production.
In the Prism Central `Subnets` dashboard, you will see six new subnets.
Now that the replications and subnets are in place, we will implement disaster recovery plans, which can run automatically or manually on demand, to:
- Migrate virtual machines live between the two clusters.
- Test that replication is working properly.
- Automatically restart the virtual machines that are part of the disaster recovery plan if one of the two clusters fails.
In the main menu of Prism Central, click `Recovery Plans`{.action} in the `Data Protection`{.action} submenu.
Click `Enable Disaster Recovery`{.action} on the left.
The message Disaster Recovery enabled confirms that the feature is active. Click the button on the right to close this window.
Click `Create New Recovery Plan`{.action}.
Choose this information:
- Recovery Plan Name: `Recovery VM from ROUBAIX to GRAVELINES`.
- Primary Location: `Local AZ`.
- Primary Cluster: `cluster in Roubaix`.
- Recovery Location: `Local AZ`.
- Recovery Cluster: `cluster in Gravelines`.
- Failure Execution Mode: `Automatic`.
- Execute failover after disconnectivity of: `30 seconds`.

Then click `Next`{.action}.
Click `+ Add VM(s)`{.action}.
Select both virtual machines and click `Add`{.action}.
Click `Next`{.action}.
Click `OK. Got it`{.action}.
Click `Stretch networks`{.action}.
Click `Proceed`{.action}.
Choose the VLANs that will be used during the disaster recovery plan as follows:
- Primary
    - Production: `production`
    - Test Failback: `testproduction`
- Recovery
    - Production: `production`
    - Test Failback: `testproduction`

Then click `Done`{.action}.
The Disaster Recovery Plan has been created for the Roubaix site. Click `Create Recovery Plan`{.action} to create the Gravelines Disaster Recovery Plan.
Choose this information:
- Recovery Plan Name: `Recovery VM from Gravelines to Roubaix`.
- Primary Location: `Local AZ`.
- Primary Cluster: `cluster in Gravelines`.
- Recovery Location: `Local AZ`.
- Recovery Cluster: `cluster in Roubaix`.
- Failure Execution Mode: `Automatic`.
- Execute failover after disconnectivity of: `30 seconds`.

Then click `Next`{.action}.
Click `+ Add VM(s)`{.action}.
Select the three virtual machines and click `Add`{.action}.
Click `Next`{.action}.
Click `Stretch networks`{.action}.
Click `Proceed`{.action}.
Choose this information:
- Primary
    - Production: `base`
    - Test Failback: `test`
- Recovery
    - Production: `base`
    - Test Failback: `test`

Then click `+ Add Network Mapping`{.action}.
Choose this information:
- Primary
    - Production: `infra`
    - Test Failback: `testinfra`
- Recovery
    - Production: `infra`
    - Test Failback: `testinfra`

Then click `+ Add Network Mapping`{.action}.
Choose this information:
- Primary
    - Production: `production`
    - Test Failback: `testproduction`
- Recovery
    - Production: `production`
    - Test Failback: `testproduction`

Then click `Done`{.action}.
[!primary] 3 networks have been added to this disaster recovery plan because the Gateway virtual machine uses these three networks.
Both disaster recovery plans are in production.
You can validate the disaster recovery plan via Prism Central.
Click `Recovery VM from Roubaix`{.action} to validate and test it.
Click `Validate`{.action}.
Select the Roubaix cluster for Entity Failing Over From and the Gravelines cluster for Entity Failing Over To. Then click `Proceed`{.action}.
The recovery plan has been validated. Click `Close`{.action}.
We can test the disaster recovery plan without impacting production. The test creates virtual machines with different names on the destination cluster in the VLANs created earlier.
Click `Test`{.action}.
Select the Roubaix cluster for Entity Failing Over From and the Gravelines cluster for Entity Failing Over To. Then click `Test`{.action}.
[!primary] Make sure you have the right licences if you have chosen the Nutanix on OVHcloud packaged service. You need to have signed up to the Pro or Ultimate packs for the Roubaix and Gravelines clusters.
Click `Execute Anyway`{.action}.
Go to the VM dashboard in Prism Central and you will see the test virtual machines that are created with the replicated data.
Return to your recovery plan and click `Clean-up test entities`{.action} to remove the test virtual machines.
Click `Clean Up`{.action}.
On a fully operational infrastructure, it is possible to move virtual machines from one cluster to another without any service downtime.
Go to a virtual machine in Roubaix that is part of the recovery plan. We will ping the OVHcloud DNS server 213.186.33.99.
Return to your recovery plan and click `Failover`{.action} on the `More`{.action} menu.
Choose `Planned Failover`{.action} and tick `Live Migrate Entities`{.action}.
Select the Roubaix cluster for Entity Failing Over From and the Gravelines cluster for Entity Failing Over To.
Then click `Failover`{.action}.
Type `Failover` and click `Failover`{.action}.
The live migration is in progress.
The migration was completed successfully without any service downtime.
You can go back to the virtual machine and see that the ping continues to work even if the virtual machine has been moved from one cluster to another.
After a migration, you need to reverse the direction of both the replication and the disaster recovery plan.
On the Prism Central main menu, click `Protection Policies`{.action} in the `Data Protection`{.action} submenu.
Click the protection policy named `ROUBAIX-TO-GRAVELINES`{.action}.
Click `Update`{.action}.
Position the mouse pointer below the Roubaix cluster name in Primary Location and click `Edit`{.action}.
Select the `Gravelines`{.action} cluster instead of the Roubaix cluster.
Click `Save`{.action}.
Click `Update Location`{.action}.
Position the mouse pointer below the Gravelines cluster name in Recovery Location and click `Edit`{.action}.
Select the `Roubaix`{.action} cluster instead of the Gravelines cluster.
Click `Save`{.action}.
Click `Update Location`{.action}.
Click `Next`{.action}.
Click `Update`{.action}.
Replication is now reversed. Click the button to close the protection policy.
In the main menu of Prism Central, click `Recovery Plans`{.action} in the `Data Protection`{.action} submenu.
Click `Recovery VM from ROUBAIX to GRAVELINES`{.action}.
On the `More`{.action} menu, click `Update`{.action}.
In Locations, set the Gravelines cluster as Primary Cluster and the Roubaix cluster as Recovery Cluster, then click `Next`{.action}.
Click `Proceed`{.action}.
Click `Next`{.action}.
Choose this information:
- Primary
    - Production: `production`
    - Test Failback: `testproduction`
- Recovery
    - Production: `production`
    - Test Failback: `testproduction`

Click `Done`{.action}.
[!primary] Replication and recovery plans were reversed following a migration of virtual machines from Roubaix to Gravelines.
To return to the original state, you need to perform a live migration again and reverse the replication and the disaster recovery plan once more. You can also use this part of the guide if your disaster recovery plan is triggered because a cluster is unavailable.
We will simulate a total loss of connection to Gravelines where three virtual machines are located in the disaster recovery plan (the Internet gateway and two other virtual machines).
Log in to the command line and ping the public address of the gateway.
## Ping from a remote linux console
ping xx.xx.xx.xx
Reply from xx.xx.xx.xx: bytes=32 time=21ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=21ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=23ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=20ms TTL=58
Leave the ping command to run continuously and return to Prism Central.
In the main menu, click `VMs`{.action} in the `Compute & Storage`{.action} submenu.
The three virtual machines in the disaster recovery plan are functional.
All three nodes in the Gravelines cluster will be disconnected.
[!primary] The disconnection is done by deleting the 3 nodes of the Gravelines cluster from the vRack.
Return to the console that is pinging to the gateway, and you will see a connection loss.
Reply from xx.xx.xx.xx: bytes=32 time=20ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=21ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=20ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=20ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=20ms TTL=58
Request timed out.
Request timed out.
Request timed out.
Request timed out.
In Prism Central, click `Tasks`{.action} in the top right-hand corner to display the tasks that have been launched, including Recovery plan execute.
Warning
In the event of an incident affecting an entire cluster (not enough nodes left to run, or a network outage), the virtual machines that are part of the disaster recovery plan and hosted on this cluster are started on the other cluster. The RPO (Recovery Point Objective) is 0 seconds, which means that no data is lost.
However, it takes some time for the virtual machines to restart on the other cluster. In this guide, three virtual machines are restarted on the remote cluster, which takes about 4 minutes. You can measure this time by regularly testing your disaster recovery plans.
Go back to the text console and you will see that the ping works again.
Request timed out.
Reply from xx.xx.xx.xx: bytes=32 time=20ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=19ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=18ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=18ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=19ms TTL=58
Reply from xx.xx.xx.xx: bytes=32 time=19ms TTL=58
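To put a number on the interruption, the ping output can be saved to a file and scanned for the longest run of lost probes. The sketch below assumes the log uses the `Reply from` and `Request timed out` wording shown above. The function name `longest_outage` and the file name `ping.log` are hypothetical.

```bash
# Print the longest run of consecutive lost probes found in a ping log
# read from standard input.
longest_outage() {
  awk '
    /Request timed out/ { run++; if (run > max) max = run; next }
    /Reply from/        { run = 0 }
    END { print max + 0 }'
}

# Example with a saved log:
# lost=$(longest_outage < ping.log)
# echo "longest outage: $lost consecutive lost probes"
```

Multiplying that count by the time your ping tool spends on each lost probe gives an approximate outage duration, which can be compared from one failover test to the next.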
In the virtual machine management view of Prism Central, the three virtual machines of the recovery plan now appear twice. All of them are marked as started, but only the ones restarted in Roubaix are actually running.
We will reconnect the three nodes in the vRack to return to normal mode.
After the recovery, the virtual machines on the original cluster are still visible but are turned off. You can delete them, or keep them in case problems occur on the restarted virtual machines.
You can view the history of Disaster Recovery actions in Prism Central.
Click the button in the top right-hand corner to go to the Prism Central configuration.
On the left, click `Witness`{.action}, then click `View Usage History`{.action}.
The list of events appears. Click `Close`{.action} to close it.
Interconnect clusters through the vRack
Disaster Recovery Plan for Nutanix
Asynchronous or NearSync replication through Prism Element
Advanced replication with Leap
Documentation Nutanix AHV Metro - Witness Option
If you need training or technical assistance to implement our solutions, contact your sales representative or click on this link to get a quote and ask our Professional Services experts to assist you with your specific use case.
Join our community of users on https://community.ovh.com/en/.