Title: Managing hardware RAID (EN)
Excerpt: Find out how to verify the state of your hardware RAID and the health of your hard drives
Updated: 2025-03-19

Objective

On a server with a hardware RAID configuration, the RAID array is managed by a physical component called a RAID controller.

Requirements

  • a dedicated server with a hardware RAID configuration
  • administrative (sudo) access to the server via SSH

Warning

We do not recommend reconfiguring your RAID controller with MegaCli or lsiutil if you are unfamiliar with these tools, as you risk losing your data. Please make a backup before making any changes.

Instructions

Using the MegaRaid RAID controller

Step 1: Retrieve RAID information

Before checking the RAID state, confirm that the server has a MegaRaid controller:

lspci | grep -i lsi | grep -i megaraid
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

This confirms the server has a MegaRaid controller installed.
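
The same check can be scripted when several servers need to be inspected. The following is a minimal sketch, assuming the lspci output looks like the examples in this guide; detect_raid_controller is a hypothetical helper name, not part of any OVHcloud or LSI tool.

```shell
# Classify the RAID controller from `lspci` output read on stdin.
# Prints "megaraid", "lsi" or "none".
# detect_raid_controller is a hypothetical helper, not a vendor command.
detect_raid_controller() {
    awk '
        BEGIN { kind = "none" }
        tolower($0) ~ /lsi/ {
            # MegaRaid cards are also made by LSI, so test for them first.
            if (tolower($0) ~ /megaraid/) { kind = "megaraid" }
            else if (kind == "none")      { kind = "lsi" }
        }
        END { print kind }
    '
}

# Usage on a real server:
#   lspci | detect_raid_controller
```

The function only reads text, so it can be tried on saved lspci output before being run on a live server.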

To gather and list available RAID arrays, you can use the MegaCli command:

MegaCli -LDInfo -Lall -aALL    # or: storcli /c0 /vall show
Adapter 0 - Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 36.321 GB
Sector Size : 512
Mirror Data : 36.321 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: No

Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 2.727 TB
Sector Size : 512
Mirror Data : 2.727 TB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only

Exit Code: 0x00

The output shows two virtual drives, each composed of two physical hard drives, for a total of four physical disks. In this case, both virtual drives report the state "Optimal", which means the RAID is functioning correctly.

If the RAID state is "Degraded", we recommend that you also verify the state of the hard drives.
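
For monitoring purposes, the state of each virtual drive can be extracted from the MegaCli output. This is a minimal sketch, assuming the "Virtual Drive:" and "State :" lines are formatted as in the example above; megaraid_vd_states is a hypothetical helper name.

```shell
# Read `MegaCli -LDInfo -Lall -aALL` output on stdin and print one line per
# virtual drive in the form "<id> <state>".
# Returns 1 if at least one virtual drive is not "Optimal", 0 otherwise.
# megaraid_vd_states is a hypothetical helper, not a MegaCli command.
megaraid_vd_states() {
    awk '
        /^Virtual Drive:/ { vd = $3 }
        /^State[ ]*:/ {
            state = $0
            sub(/^State[ ]*:[ ]*/, "", state)   # drop the label
            sub(/[ \t\r]+$/, "", state)         # drop trailing whitespace
            print vd, state
            if (state != "Optimal") bad = 1
        }
        END { exit bad + 0 }
    '
}

# Usage on a real server:
#   MegaCli -LDInfo -Lall -aALL | megaraid_vd_states || echo "RAID needs attention"
```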

Step 2: Determine the disk's state

First, list the Device Id of each drive. smartmontools needs this value to test the drives individually:

MegaCli -PDList -aAll | grep -E 'Slot Number|Device Id|Inquiry Data|Raw|Firmware state' | sed 's/Slot/\nSlot/g'    # or: storcli /c0 /eall /sall show
 
Slot Number: 0
Device Id: 4
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL3450062J300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 1
Device Id: 5
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL345003X6300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 2
Device Id: 7
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data:       PN2234P8K2PKDYHGST HUS724030ALA640                    MF8OAA70
 
Slot Number: 3
Device Id: 6
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data:       PN2234P8JYP59YHGST HUS724030ALA640                    MF8OAA70

With smartmontools' smartctl command, we will test each hard drive like this:

smartctl -d megaraid,N -a /dev/sdX

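Replace N with the Device Id of the drive and /dev/sdX with the block device of its RAID array. In this example, /dev/sda is the first RAID array and /dev/sdb is the second.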

[!primary]

In some situations, you may receive this output:

/dev/sda [megaraid_disk_00] [SAT]: Device open changed type from 'megaraid' to 'sat'

You must then replace megaraid with sat+megaraid:

smartctl -d sat+megaraid,N -a /dev/sdX
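
To avoid typing one command per drive, the smartctl command lines can be generated from the Device Id values. This is a minimal sketch: megaraid_smartctl_commands is a hypothetical helper name, and it does not know which array a drive belongs to, so the block device is passed as an argument. It only prints the commands; it does not run them.

```shell
# Read `MegaCli -PDList -aAll` output on stdin and print one smartctl command
# per "Device Id:" line.
#   $1: block device of the RAID array, for example /dev/sda
#   $2: smartctl device type prefix, defaults to "megaraid"
#       (use "sat+megaraid" if smartctl asks for it)
# megaraid_smartctl_commands is a hypothetical helper, not a vendor command.
megaraid_smartctl_commands() {
    awk -v dev="$1" -v dtype="${2:-megaraid}" '
        /^Device Id:/ { printf "smartctl -d %s,%s -a %s\n", dtype, $3, dev }
    '
}

# Usage on a real server:
#   MegaCli -PDList -aAll | megaraid_smartctl_commands /dev/sda
```

Review the printed lines and keep only those for the drives that belong to the given array before running them.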

Warning

If one of your hard drives is showing SMART errors, you should perform a full backup of your data as soon as possible and contact our support team. Our support team will need the slot number and device ID in order to identify the faulty disk.

Step 3: Verify the health of the RAID controller

To make sure your RAID controller is working correctly, you can list all of its information with this command:

MegaCli -AdpAllInfo -aALL

The most important section of the output is the error counter:

Error Counters
                ================
Memory Correctable Errors   : 0
Memory Uncorrectable Errors : 0

If any counter is greater than zero, you should create a backup of your data and contact our support team with the full output. The support team will then schedule an intervention to replace the RAID controller.

To display only the error counters, pipe the command through grep:

MegaCli -AdpAllInfo -aALL | grep "Errors"
Memory Correctable Errors   : 0
Memory Uncorrectable Errors : 0
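
The comparison with zero can also be automated. The following is a minimal sketch, assuming the counters are printed as "Memory ... Errors : <number>" as shown above; megaraid_memory_errors is a hypothetical helper name.

```shell
# Read `MegaCli -AdpAllInfo -aALL` output on stdin, add up the correctable and
# uncorrectable memory error counters, and print the total.
# Returns 1 if the total is greater than zero, 0 otherwise.
# megaraid_memory_errors is a hypothetical helper, not a MegaCli command.
megaraid_memory_errors() {
    awk -F: '
        /Memory (Correctable|Uncorrectable) Errors/ { total += $2 + 0 }
        END { print total + 0; exit (total > 0) }
    '
}

# Usage on a real server:
#   MegaCli -AdpAllInfo -aALL | megaraid_memory_errors || echo "controller reports memory errors"
```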

Step 4: Resynchronising the RAID

If you had one or more hard drives replaced, the RAID will re-synchronise automatically. You can use this command to see which hard drives are currently rebuilding:

MegaCli -PDList -aAll | grep -E 'Slot Number|Device Id|Inquiry Data|Raw|Firmware state' | sed 's/Slot/\nSlot/g'    # or: storcli /c0 /eall /sall show
 
Slot Number: 0
Device Id: 4
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL3450062J300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 1
Device Id: 5
Raw Size: 279.460 GB [0x22eec130 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: BTWL345003X6300PGN  INTEL SSDSC2BB300G4                     D2010355
 
Slot Number: 2
Device Id: 7
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data:       PN2234P8K2PKDYHGST HUS724030ALA640                    MF8OAA70
 
Slot Number: 3
Device Id: 6
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Rebuild
Inquiry Data:       PN2234P8JYP59YHGST HUS724030ALA640                    MF8OAA70 

To monitor the progress of the rebuild operation, you can use this command:

MegaCli -PDRbld -ShowProg -PhysDrv [EncID:SlotID] -aALL    # or: storcli /c0/eEncID/sSlotID show rebuild

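Replace EncID and SlotID with the enclosure ID and slot number of the drive. The slot number appears in the output above; the enclosure ID is listed as "Enclosure Device ID" in the unfiltered output of MegaCli -PDList -aAll.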
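
To find which slot numbers to pass to the progress command, the drives in the "Rebuild" state can be filtered out of the drive list. This is a minimal sketch, assuming the "Slot Number:" and "Firmware state:" lines are formatted as in the example output; megaraid_rebuilding_slots is a hypothetical helper name.

```shell
# Read `MegaCli -PDList -aAll` output on stdin and print the slot number of
# every drive whose firmware state is "Rebuild".
# megaraid_rebuilding_slots is a hypothetical helper, not a MegaCli command.
megaraid_rebuilding_slots() {
    awk '
        /^Slot Number:/    { slot = $3 }
        /^Firmware state:/ { if ($3 == "Rebuild") print slot }
    '
}

# Usage on a real server:
#   MegaCli -PDList -aAll | megaraid_rebuilding_slots
```

With the example output from this step, the function prints 3, the slot of the drive that is rebuilding.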

Step 5a: Using CacheCade

[!primary]

CacheCade is a module from LSI that improves the random read performance of hard drives by using an SSD as a front-end caching device.

To verify the CacheCade's configuration, use the following command:

MegaCli -CfgCacheCadeDsply -a0    # or: storcli /c0 /dall show cachecade

To see which RAID array is associated with the CacheCade:

MegaCli -CfgCacheCadeDsply -a0 | grep "Associated LDs"

Step 5b: Checking the status of the backup battery unit

To receive a full list of status parameters for the BBU, use this command:

MegaCli -AdpBbuCmd -aALL

The most important value to check is whether Battery State is Optimal. If there are indications of a failing battery, create a backup of your data and provide the output of this command to our support team when creating the ticket.
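
The battery check can be reduced to a single value. The following is a minimal sketch; it assumes the output contains a line of the form "Battery State: <value>", which may vary between MegaCli versions, and megaraid_bbu_state is a hypothetical helper name.

```shell
# Read `MegaCli -AdpBbuCmd -aALL` output on stdin and print the value of the
# "Battery State" line.
# Returns 0 only if the line exists and its value is "Optimal".
# Assumption: the line looks like "Battery State: Optimal"; the exact layout
# may differ between MegaCli versions.
# megaraid_bbu_state is a hypothetical helper, not a MegaCli command.
megaraid_bbu_state() {
    awk -F: '
        /^[ \t]*Battery State[ \t]*:/ {
            state = $2
            gsub(/^[ \t]+|[ \t\r]+$/, "", state)   # trim surrounding whitespace
            print state
            found = 1
            if (state != "Optimal") bad = 1
        }
        END { exit (!found || bad) }
    '
}

# Usage on a real server:
#   MegaCli -AdpBbuCmd -aALL | megaraid_bbu_state || echo "check the BBU"
```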

Using the LSI RAID controller

Warning

This RAID controller card is deprecated and no longer available for new servers. It is gradually being replaced by MegaRaid controllers.

Step 1: Retrieve RAID information

Prior to verifying the RAID state, ensure that an LSI RAID controller card is installed with the following command:

lspci | grep -i lsi | grep -vi megaraid
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2004 PCI-Express Fusion-MPT SAS-2 [Spitfire] (rev 03)

This confirms the presence of an LSI RAID controller.

[!primary]

The grep -vi megaraid command removes MegaRaid RAID controller cards from the lspci output, as MegaRaid cards are made by LSI Corporation as well. The -i option is needed because lspci reports the name as "MegaRAID".

To gather and list available RAID arrays, you can use the lsiutil command:

Warning

Caution: the menu values (1,0 21) may differ depending on the lsiutil version. Be very careful when running this type of command.

lsiutil -p1 -a 1,0 21
 
LSI Logic MPT Configuration Utility, Version 1.63-OVH (27a4f9f54c)
 
1 MPT Port found
 
     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  ioc0              LSI Logic SAS2004 03      200      13000000     0
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 1
 
Volume 0 is DevHandle 011e, Bus 1 Target 0, Type RAID1 (Mirroring)
  Volume Name:
  Volume WWID:  0aaf504551c8efe5
  Volume State:  optimal, enabled, background init complete
  Volume Settings:  write caching disabled, auto configure hot swap enabled
  Volume draws from Hot Spare Pools:  0
  Volume Size 1906394 MB, 2 Members
  Primary is PhysDisk 1 (DevHandle 0009, Bus 0 Target 0)
  Secondary is PhysDisk 0 (DevHandle 000a, Bus 0 Target 1)
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

The example above shows one virtual drive, which is composed of two physical hard drives. In this case, the volume state is "optimal", which means the RAID is functioning correctly.

If the volume state is "degraded", we recommend that you also verify the state of the hard drives.
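
The volume state can be extracted from the lsiutil output for scripted checks. This is a minimal sketch, assuming the "Volume State:" line is formatted as in the example above; lsiutil_volume_states is a hypothetical helper name.

```shell
# Read the output of `lsiutil -p1 -a 1,0 21` on stdin and print the text that
# follows each "Volume State:" label.
# Returns 1 if at least one volume state does not start with "optimal".
# lsiutil_volume_states is a hypothetical helper, not an lsiutil option.
lsiutil_volume_states() {
    awk '
        /Volume State:/ {
            line = $0
            sub(/^.*Volume State:[ \t]*/, "", line)
            sub(/[ \t\r]+$/, "", line)
            print line
            if (line !~ /^optimal/) bad = 1
        }
        END { exit bad + 0 }
    '
}

# Usage on a real server (the menu values may differ, see the warning above):
#   lsiutil -p1 -a 1,0 21 | lsiutil_volume_states || echo "volume needs attention"
```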

[!primary]

In the case of a newly provisioned server, you may see this message: [In Progress: data scrub]. This message is not an error. It indicates an automated process run by the controller's firmware to reduce the number of uncorrectable errors as much as possible.

Step 2: Determine the disk's state

To check the state of the hard drives as seen by the RAID controller, you can use this command:

lsiutil -p1 -a 2,0 21
 
LSI Logic MPT Configuration Utility, Version 1.63-OVH (27a4f9f54c)
 
1 MPT Port found
 
     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  ioc0              LSI Logic SAS2004 03      200      13000000     0
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 2

PhysDisk 0 is DevHandle 000a, Bus 0 Target 1
  PhysDisk State:  optimal
  PhysDisk Size 1906394 MB, Inquiry Data:  ATA      HGST HUS724020AL AA70
  Path 0 is DevHandle 000a, Bus 0 Target 1, online, primary
  Path 1 is DevHandle 000a, invalid

PhysDisk 1 is DevHandle 0009, Bus 0 Target 0
  PhysDisk State:  optimal
  PhysDisk Size 1906394 MB, Inquiry Data:  ATA      HGST HUS724020AL AA70
  Path 0 is DevHandle 0009, Bus 0 Target 0, online, primary
  Path 1 is DevHandle 0009, invalid

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

In this case, both drives show the state "optimal".

Since the LSI card maps the drives to SCSI generic (sg) devices, you must identify the /dev/sgX device (X being the device number, for example /dev/sg1) that corresponds to each hard drive before testing it with smartmontools.

Here's how to list them:

cat /proc/scsi/scsi | grep Vendor
  Vendor: LSI      Model: Logical Volume   Rev: 3000
  Vendor: ATA      Model: HGST HUS724020AL Rev: AA70
  Vendor: ATA      Model: HGST HUS724020AL Rev: AA70

Each line represents an sg device. The devices are numbered in the order in which they appear here:

Vendor: LSI      Model: Logical Volume   Rev: 3000 => /dev/sg0
Vendor: ATA      Model: HGST HUS724020AL Rev: AA70 => /dev/sg1
Vendor: ATA      Model: HGST HUS724020AL Rev: AA70 => /dev/sg2

In order to list the right devices within one command, use the following:

cat /proc/scsi/scsi | grep Vendor | nl -v 0 | sed 's/^/\/dev\/sg/' | grep -v LSI | cut -d ' ' -f1,6 | sed 's/sg\ /sg/' | sed 's/\/dev\/sg.\ /\/dev\/sg/'
/dev/sg1
/dev/sg2
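
The same list can be produced with a single awk program, which avoids relying on the column padding added by nl. This is a minimal sketch under the same assumption as the command above, namely that sg devices are numbered in the order of the Vendor lines; list_sg_devices is a hypothetical helper name.

```shell
# Read /proc/scsi/scsi on stdin and print the /dev/sgN path of every device
# whose vendor is not "LSI" (the LSI entry is the logical volume itself).
# Assumes sg devices are numbered from 0 in the order of the "Vendor:" lines.
# list_sg_devices is a hypothetical helper, not a system command.
list_sg_devices() {
    awk '
        /Vendor:/ {
            if ($2 != "LSI") printf "/dev/sg%d\n", n
            n++
        }
    '
}

# Usage on a real server:
#   list_sg_devices < /proc/scsi/scsi
```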

With smartmontools' smartctl command, we will test each hard drive, as shown below:

smartctl -a /dev/sgX 

Replace X with the sg device number returned by the previous command.

Warning

If one of your hard drives is showing SMART errors, you should perform a full backup of your data as soon as possible and contact our support team.

Step 3: Resynchronise the RAID

If you had one or more hard drives replaced, the RAID will re-synchronise automatically. To see if the RAID is in re-sync and monitor the resync progression, use this command:

Warning

Caution: the menu values (3,0 21) may differ depending on the lsiutil version. Be very careful when running this type of command.

lsiutil -p1 -a 3,0 21
 
LSI Logic MPT Configuration Utility, Version 1.63-OVH (27a4f9f54c)
 
1 MPT Port found
 
     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  ioc0              LSI Logic SAS2004 03      200      13000000     0
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 3
 
Volume 0 is DevHandle 011e, Bus 1 Target 0, Type RAID1 (Mirroring)
 
Volume 0 State:  degraded, enabled, resync in progress
Resync Progress:  total blocks 624943104, blocks remaining 484024888, 77%
 
RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

Warning

The percentage value shown in the command result is NOT the completion percentage. It is the remaining percentage.
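
The completion percentage can be derived from the block counts on the same line: completed = (total blocks - blocks remaining) / total blocks. The following is a minimal sketch, assuming the "Resync Progress:" line is formatted as in the example above; lsiutil_resync_completed is a hypothetical helper name.

```shell
# Read the output of `lsiutil -p1 -a 3,0 21` on stdin and print the completed
# share of the resync, computed from the block counts on the
# "Resync Progress:" line. The percentage is truncated to an integer.
# lsiutil_resync_completed is a hypothetical helper, not an lsiutil option.
lsiutil_resync_completed() {
    awk '
        /Resync Progress:/ {
            total = 0; remaining = 0
            for (i = 2; i < NF; i++) {
                # "total blocks <n>," and "blocks remaining <n>,"
                if ($i == "blocks" && $(i - 1) == "total") total = $(i + 1) + 0
                if ($i == "remaining") remaining = $(i + 1) + 0
            }
            if (total > 0)
                printf "%d%% completed\n", (total - remaining) * 100 / total
        }
    '
}

# Usage on a real server (the menu values may differ, see the warning above):
#   lsiutil -p1 -a 3,0 21 | lsiutil_resync_completed
```

For the example line above, 624943104 total blocks with 484024888 remaining gives 22% completed, which matches the 77% remaining reported by the controller once both values are truncated.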

3Ware RAID controller

[!alert]

This RAID controller card is deprecated. We highly recommend that you contact the OVHcloud Support teams to schedule an intervention to replace the RAID controller with a MegaRaid controller, as 3ware RAID controllers have proven to be rather unstable. This type of intervention requires a reinstallation of your server. Be sure to back up your data first.

Go further

Hot Swap - Hardware RAID

Configuring MegaRAID for RAID Level 0

OVHcloud API and Storage

Managing software RAID

Hot Swap - Software RAID

Join our community of users.