title | excerpt | updated |
---|---|---|
Replacing a defective disk | Find out how to identify a defective disk, and request a replacement | 2018-06-21 |
If you notice that a disk is faulty, or receive a notification email about a faulty disk, you must take the measures required to replace it as soon as possible.
This guide explains how to identify a defective disk, and how to request a disk replacement from our teams.
Warning
OVHcloud provides services for which you are responsible with regard to their configuration and management. It is therefore your responsibility to ensure that they function correctly.
This guide is designed to help you with common tasks. Nevertheless, we recommend contacting a specialist service provider or reaching out to the OVHcloud community if you encounter any difficulties. You can find more information in the Go further section of this guide.
- A dedicated server
- Administrative access (sudo) to the server via SSH
Before you do anything else, you will need to back up your data. The sole purpose of RAID, apart from RAID 0, is to protect your data against disks that become faulty. Once a disk becomes unusable, all of your data is reliant on the remaining disk (or disks) working properly.
Although it’s rare to have two disks become faulty at the same time, it’s not impossible.
We will not carry out any disk replacements without:
- Confirmation from you that you have backed up your data.
- Confirmation that you are fully aware of, and accept, the risk of data loss as a result of the disk replacement.
If you receive an email alert, or notice any signs that you might have a faulty disk, it is essential to check that all your disks are working properly. If two disks that make up part of the same RAID array seem to be faulty, we will replace the one that flags the highest number of errors as a priority.
If you have a server that uses software RAID, please refer to the software RAID guide to find the disks installed on your server.
Once you have found the access path for your disks, you can test them using the smartctl command, as follows:
smartctl -a /dev/sdX
[!primary]
Please remember to replace /dev/sdX with the access path to your disk, with sdX being the disk concerned, i.e. sda, sdb, etc.
By running this command, you can also retrieve the Serial Number of the disks that need to be replaced, so that you can give them to the technician.
Here is an example of a result that may be returned:
smartctl -a /dev/sda
>>> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.32-xxxx-grs-ipv6-64] (local build)
>>> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>>> === START OF INFORMATION SECTION ===
>>> Device Model: TOSHIBA DT01ACA050
>>> Serial Number: 5329T58NS
>>> LU WWN Device Id: 5 000039 ff6d28993
>>> Firmware Version: MS1OA750
>>> User Capacity: 500 107 862 016 bytes [500 GB]
>>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>>> Device is: Not in smartctl database [for details use: -P showall]
>>> ATA Version is: 8
>>> ATA Standard is: ATA-8-ACS revision 4
>>> Local Time is: Thu Nov 24 15:51:25 2016 CET
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
In this case, the line to look out for is as follows:
Serial Number: 5329T58NS
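If you need to pick out the serial number from a script rather than by eye, the following is a minimal sketch. The function name extract_serial is hypothetical, and it assumes the output contains a line starting with "Serial Number:" as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the serial number found in smartctl output read from stdin.
extract_serial() {
    # Match the "Serial Number:" line, take the text after the colon,
    # strip surrounding whitespace, and stop at the first match.
    awk -F: '/^Serial Number:/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

# Example usage (needs root access and a real disk):
#   smartctl -a /dev/sda | extract_serial
```

The function only reads text from standard input, so it can be tried on saved smartctl output before being used on a live server.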
If you have a server that uses hardware RAID, please refer to the hardware RAID guide, and use the appropriate procedure for your RAID controller type to find the access paths to your disks.
Once you have found the access path for your disks, you can test them using the smartctl command, as follows:
smartctl -d megaraid,N -a /dev/sdX
[!primary]
Please remember to replace /dev/sdX with the access path to your disk, with sdX being the disk concerned, i.e. sda, sdb, etc. N is the number of the disk on the RAID controller.
Warning
In some cases, the command may return the following message: /dev/sda [megaraid_disk_00][SAT]: Device open changed type from 'megaraid' to 'sat'.
In this case, you will need to replace megaraid with sat+megaraid as follows: smartctl -d sat+megaraid,N -a /dev/sdX.
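The choice between megaraid and sat+megaraid can be sketched as a small helper. The function name megaraid_device_type is hypothetical. It only inspects text you pass to it, and assumes the message appears exactly as quoted in the warning above.

```shell
#!/bin/sh
# Hypothetical helper: choose the value to pass to smartctl -d.
# $1: disk number N on the controller
# $2: output of a first attempt with `smartctl -d megaraid,N -a /dev/sdX`
megaraid_device_type() {
    case $2 in
        # The first attempt reported the type change, so switch to sat+megaraid.
        *"changed type from 'megaraid' to 'sat'"*) printf 'sat+megaraid,%s\n' "$1" ;;
        # Otherwise keep the plain megaraid type.
        *) printf 'megaraid,%s\n' "$1" ;;
    esac
}

# Example usage (needs root access and a MegaRAID controller):
#   first=$(smartctl -d megaraid,0 -a /dev/sda 2>&1)
#   smartctl -d "$(megaraid_device_type 0 "$first")" -a /dev/sda
```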
For LSI RAID cards, you can test the disks using the smartctl command, as follows:
smartctl -a /dev/sgY
You will need to specify the RAID number (/dev/sg0 = 1st RAID, /dev/sg1 = 2nd RAID, etc.).
If you have an NVMe disk, you will need to put the server into rescue mode and install the nvme-cli tool.
apt install nvme-cli
You will then need to use the nvme list command, and retrieve your disks’ serial numbers:
root@rescue:~# nvme list
>>> Node SN Model Namespace Usage Format FW Rev
>>> -------------- ------------------- --------------------- --------- ------------------------- ------------- --------
>>> /dev/nvme0n1 CVPF636600YC450RGN INTEL SSDPE2MX450G7 1 450.10 GB / 450.10 GB 512 B + 0 B MDV10253
>>> /dev/nvme1n1 CVPF6333002Y450RGN INTEL SSDPE2MX450G7 1 450.10 GB / 450.10 GB 512 B + 0 B MDV10253
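To keep only the device nodes and serial numbers from this listing, a filter along the following lines may help. The function name nvme_serials is hypothetical. It assumes the serial number is the second whitespace-separated column and contains no spaces, as in the example above, and that the input is the raw command output without the ">>>" prefix shown in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print "<node> <serial>" for each disk in `nvme list` output read from stdin.
nvme_serials() {
    # Only rows whose first field is an NVMe device node are kept,
    # which skips the header and the separator line.
    awk '$1 ~ /^\/dev\/nvme/ { print $1, $2 }'
}

# Example usage (in rescue mode, with nvme-cli installed):
#   nvme list | nvme_serials
```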
To request a disk replacement, you simply need to create a ticket through your OVHcloud Help Center. You can speed up the process by providing the information required for the tests. Below is a list of what you will need to provide:
- The serial number of the disk that needs to be replaced, as well as the serial numbers for all other disks that are working properly. To retrieve the serial number of the disk that needs to be replaced, please follow this guide. If, for any reason, you are unable to retrieve the disk’s serial number, please let us know in the ticket, and list the serial numbers of the disks that don’t need to be replaced.
As a reminder, it’s important to include the serial numbers of all the disks. They will be sent to the datacentre technician, and this will avoid any mistakes being made as the replacement operation is carried out.
- The intervention date and time. Please note that there will be a short service interruption, but you can schedule the intervention to take place anytime, 24/7.
- Confirmation that your data is backed up, and confirmation that you accept the potential risk of your data being lost.
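The items above can be gathered into a short text block to paste into the ticket. This is only an illustrative sketch: the function name ticket_summary and the field labels are hypothetical and are not a format required by OVHcloud.

```shell
#!/bin/sh
# Hypothetical helper: build a plain-text summary for the replacement ticket.
# $1: serial number of the faulty disk
# $2: requested intervention date and time
# remaining arguments: serial numbers of the disks that are working properly
ticket_summary() {
    faulty=$1
    when=$2
    shift 2
    printf 'Disk to replace (serial): %s\n' "$faulty"
    printf 'Other disks (serials): %s\n' "$*"
    printf 'Requested intervention: %s\n' "$when"
    # Only send these two lines if they are true for your server.
    printf 'Data backed up: yes\n'
    printf 'Risk of data loss accepted: yes\n'
}

# Example usage with placeholder values:
#   ticket_summary 5329T58NS '2016-11-25 10:00 CET' SERIAL2 SERIAL3
```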
[!primary]
This replacement type is only possible for dedicated servers with a RAID card.
If you are hot-swapping a disk on a server with a megaRAID card, please make the LED light flash for the disk that needs to be replaced, once the intervention has been scheduled. This will make the process easier for the teams who are working on the replacement operation.
If your server uses a megaRAID card, please use the following commands:
- To make the LED light flash:
MegaCli -PdLocate -start -physdrv[E0:S0] -a0
- To stop the LED light from flashing:
MegaCli -PdLocate -stop -physdrv[E0:S0] -a0
[!primary]
Equivalent via the storcli command:
- To make the LED light flash:
storcli /c0/e0/s0 start locate
- To stop the LED light from flashing:
storcli /c0/e0/s0 stop locate
[!primary]
Even though you’re making the disk’s LED light flash, please remember to include the disk’s serial number and slot in your support ticket.
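As a sketch of how the MegaCli commands above can be parameterised, the helper below prints the command for a given disk instead of running it, so that it can be checked first. The function name locate_cmd is hypothetical, and it assumes that E0 and S0 in the commands above stand for the enclosure ID and slot number of the disk, and that -a0 refers to the first adapter.

```shell
#!/bin/sh
# Hypothetical helper: print the MegaCli command that starts or stops the locate LED.
# $1: start or stop
# $2: enclosure ID
# $3: slot number
# $4: adapter number (optional, defaults to 0)
locate_cmd() {
    case $1 in
        start|stop) ;;
        *) echo "usage: locate_cmd start|stop ENCLOSURE SLOT [ADAPTER]" >&2; return 1 ;;
    esac
    printf 'MegaCli -PdLocate -%s -physdrv[%s:%s] -a%s\n' "$1" "$2" "$3" "${4:-0}"
}

# Example usage with placeholder values (enclosure 252, slot 0):
#   locate_cmd start 252 0
```

Printing the command rather than executing it leaves you free to review the enclosure and slot values before making any LED flash.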
If you have a server that uses hardware RAID, the RAID will rebuild itself, provided that auto-rebuild, which is enabled by default, has not been disabled. The resync process will take a few minutes, and may decrease your RAID’s read/write performance while it runs.
If you have a server that uses software RAID, we recommend that you resync your disks manually. To do this, you can refer to our software RAID guide.
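Once a manual resync has been started, its progress can be followed from the command line. The sketch below assumes that the software RAID is a Linux md (mdadm) array, which this guide does not state, and that progress is therefore reported in /proc/mdstat. The function name md_progress is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the rebuild progress lines from mdstat-formatted text read from stdin.
md_progress() {
    # Lines that mention "recovery" or "resync" carry the progress information.
    # Leading whitespace is removed; a fallback message is printed if none is found.
    awk '/recovery|resync/ { found = 1; sub(/^[ \t]+/, ""); print }
         END { if (!found) print "no rebuild in progress" }'
}

# Example usage (assumes a Linux md array):
#   md_progress < /proc/mdstat
```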
For specialized services (SEO, development, etc.), contact OVHcloud partners.
If you would like assistance using and configuring your OVHcloud solutions, please refer to our support offers.
Join our community of users.