Delete an Offline RAID6 virtual drive and create a new one with AVAGO storcli

Offline virtual device means it cannot be used because the missing or bad or failed disks are more than the fault tolerance it is offering. In this case, there is a RAID 6 on a AVAGO MegaRAID 3018 controller with 2 x RAID6 virtual drives with 6 disks each. One of the virtual drives misses 3 of the 6 disks in the group, so this virtual drive is in Offline state and it cannot be repaired. Three new disks are put to replace the failed disks. Here is what command to issue with the AVAGO command-line utility storcli under CentOS 7 to delete and then create a healthy new RAID 6 virtual drive:

  1. Delete the Offline virtual drive.
  2. Create a new RAID 6 virtual drive with 6 disks.
  3. Initialize the newly create virtual drive to make it consistent.

On each step, it is included additional show storcli commands to better preset what happens in reality and how the controller reflects the changes.
The initial state of the whole configuration is shown below:

[root@srv ~]# /opt/MegaRAID/storcli/storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.0709.0000.0000 Aug 14, 2018
Operating system = Linux 3.10.0-957.1.3.el7.x86_64
Controller = 0
Status = Success
Description = None

Product Name = AVAGO 3108 MegaRAID
Serial Number = FW-AC5CMJEAARBWA
SAS Address =  500304802426b600
PCI Address = 00:01:00:00
System Time = 09/20/2022, 14:09:12
Mfg. Date = 00/00/00
Controller Time = 09/20/2022, 14:09:08
FW Package Build = 24.21.0-0028
BIOS Version = 6.36.00.2_4.19.08.00_0x06180202
FW Version = 4.680.00-8290
Driver Name = megaraid_sas
Driver Version = 07.705.02.00-rh1
Current Personality = RAID-Mode 
Vendor Id = 0x1000
Device Id = 0x5D
SubVendor Id = 0x15D9
SubDevice Id = 0x809
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 1
Device Number = 0
Function Number = 0
Drive Groups = 2

TOPOLOGY :
========

----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT      Size PDC  PI SED DS3  FSpace TR 
----------------------------------------------------------------------------
 0 -   -   -        -   RAID6 OfLn  N  43.654 TB dflt N  N   dflt N      N  
 0 0   -   -        -   RAID6 Dgrd  N  43.654 TB dflt N  N   dflt N      N  
 0 0   0   -        -   DRIVE Msng  -  10.913 TB -    -  -   -    -      N  
 0 0   1   8:1      13  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 0 0   2   8:2      10  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 0 0   3   -        -   DRIVE Msng  -  10.913 TB -    -  -   -    -      N  
 0 0   4   8:4      11  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 0 0   5   -        -   DRIVE Msng  -  10.913 TB -    -  -   -    -      N  
 1 -   -   -        -   RAID6 Optl  N  43.654 TB dflt N  N   dflt N      N  
 1 0   -   -        -   RAID6 Optl  N  43.654 TB dflt N  N   dflt N      N  
 1 0   0   8:6      20  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   1   8:7      19  DRIVE Onln  N  12.732 TB dflt N  N   dflt -      N  
 1 0   2   8:8      18  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   3   8:9      15  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   4   8:10     12  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   5   8:11     14  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
----------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready

Virtual Drives = 2

VD LIST :
=======

------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC      Size Name     
------------------------------------------------------------------
0/0   RAID6 OfLn  RW     No      RAWBD -   ON  43.654 TB storage1 
1/1   RAID6 Optl  RW     Yes     RAWBD -   ON  43.654 TB storage2 
------------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Physical Drives = 12

PD LIST :
=======

---------------------------------------------------------------------------------
EID:Slt DID State DG      Size Intf Med SED PI SeSz Model                Sp Type 
---------------------------------------------------------------------------------
8:0       9 UGood -  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 D  -    
8:1      13 Onln  0  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:2      10 Onln  0  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:3      17 UGood -  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 D  -    
8:4      11 Onln  0  10.913 TB SATA HDD N   N  512B ST12000NM001G-2MV103 U  -    
8:5      16 UGood -  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 D  -    
8:6      20 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:7      19 Onln  1  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 U  -    
8:8      18 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:9      15 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:10     12 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:11     14 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
---------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Cachevault_Info :
===============

------------------------------------
Model  State   Temp Mode MfgDate    
------------------------------------
CVPM02 Optimal 28C  -    2018/01/11 
------------------------------------

The show storcli command for the first virtual drive “/c0/v0” is also possible:

[root@srv ~]# /opt/MegaRAID/storcli/storcli64 /c0/v0 show all
CLI Version = 007.0709.0000.0000 Aug 14, 2018
Operating system = Linux 3.10.0-957.1.3.el7.x86_64
Controller = 0
Status = Success
Description = None


/c0/v0 :
======

------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC      Size Name     
------------------------------------------------------------------
0/0   RAID6 OfLn  RW     No      RAWBD -   ON  43.654 TB storage1 
------------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency


PDs for VD 0 :
============

---------------------------------------------------------------------------------
EID:Slt DID State DG      Size Intf Med SED PI SeSz Model                Sp Type 
---------------------------------------------------------------------------------
8:1      13 Onln   0 10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:2      10 Onln   0 10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:4      11 Onln   0 10.913 TB SATA HDD N   N  512B ST12000NM001G-2MV103 U  -    
---------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


VD0 Properties :
==============
Strip Size = 1.0 MB
Number of Blocks = 93746888704
VD has Emulated PD = Yes
Span Depth = 1
Number of Drives Per Span = 6
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disk's Default
Encryption = None
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
OS Drive Name = N/A
Creation Date = 19-12-2018
Creation Time = 06:11:08 AM
Emulation type = default
Cachebypass size = Cachebypass-64k
Cachebypass Mode = Cachebypass Intelligent
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 600304802426b60023ac9d7c0a7a305b
SCSI Unmap = No

Keep on reading!

Surviving 3 disks failure of RAID 6 with AVAGO 3108 MegaRAID and foreign config

Whatever is the reason to end up with 3 broken hard disks in a RAID 6 setup it does not matter! What matters is to recover the data if possible and the most important thing in this situation is to find the LAST hard disk, which was marked as failed and removed from the array. Then the array gets a offline state immediately! So if the last broken hard disk might have a little light of life, probably it is easy to recover the data. The hardware controller is an additional Supermicro board – AOC-S3108L-H8iR.
What happened – third disk got failed status and a virtual device using RAID 6 setup is in offline state. In offline state the virtual device would not execute any READ or WRITE operations, because part of the data is missing and the virtual drive has no meaningfully user data.
To survive to backup the data:

  1. Power off the server. And it is better to remove the power cord, afterwards, and wait for at least a minute before plugging back the power cord back.
  2. Power on the server.
  3. When prompted for actions during initialization the AVAGO 3108 MegaRAID just continue the server loading without accepting any changes.
  4. Boot a recovery disk and using the AVAGO command-line (cli) tool dump the “events” in a file. A sample command might be:
    /opt/MegaRAID/storcli/storcli64 /c0 show events >show.event.log
    

    Assuming the Offline RAID 6 virtual drive is “/c0”. Other possible options are “/c1”, “/c2” and so on.

  5. Read from the end till start the AVAGO 3108 MegaRAID events dump and find which hard drive was marked as failed LAST, i.e. with the most latest date and time. And then there are events marking the the virtual device as Offline.
    seqNum: 0x00009a46
    Time: Mon Jun 27 01:49:54 2022
    
    Code: 0x00000072
    Class: 0
    Locale: 0x02
    Event Description: State change on PD 10(e0x08/s5) from ONLINE(18) to FAILED(11)
    Event Data:
    ===========
    Device ID: 16
    Enclosure Index: 8
    Slot Number: 5
    Previous state: 24
    New state: 17
    
    
    seqNum: 0x00009a47
    Time: Mon Jun 27 01:49:54 2022
    
    Code: 0x00000051
    Class: 0
    Locale: 0x01
    Event Description: State change on VD 00/0 from DEGRADED(2) to OFFLINE(0)
    Event Data:
    ===========
    Target Id: 0
    Previous state: 2
    New state: 0
    

    The first event of the list above logs the hard drive PD 10(e0x08/s5) gets FAILED status. And immediately after that the virtual drive VD 00/0 goes Offline, which means the last disk before the RAID 6 virtual drive stops working is the PD 10(e0x08/s5)The “/s5” from PD 10(e0x08/s5) points to the “Slot 5” hard drive.

  6. Reboot the server and when prompted the AVAGO 3108 MegaRAID BIOS Configuration Utility this time enter the utility.
  7. Make the found hard drive from the previous steps with ONLINE state. The hard drive might be in a foreign configuration or just in a bad state, so import the foreign configuration, make the drive a GOOD state and its state will immediately be ONLINE, which mean it is a part of an existing virtual drive. The virtual drive state will immediately be changed to DEGRADED (still two broken disks are out of the virtual drive). Follow the screenshots below to get the last broken disk back ONLINE and the virtual drive in an operable state – DEGRADED. If the drive is only in BAD/FAILED state, just skip the Foreign part and make the disk ONLINE (it may require first to make the disk “unconfigured-good”)
  8. Recover the data by simply copy it to another server or a healthy virtual drive. DO NOT TRY TO REMOVE data, i.e. do not use “rm”, the real state of this third broken disk is unknown and writing would probably kill it off. A good idea is to mount the filesystems on this virtual drive read-only and just rsync the data to a backup.

Here is the process of getting the third disk on “Slot 5” from a “Missing” and the “Virtual Drive 0Offline to the ONLINE state of the hard drive and a DEGRADED state of the “Virtual Drive 0“, i.e. operating.

SCREENSHOT 1) The Drive in Slot 5 is missing and the Virtual Drive 0 is in OFFLINE state

Slot 5 is the hard drive we need to recover, but it reports the hard drive is missing. Missing points out there is another configuration, so press “Ctrl+N” to change to the next page (i.e. menu), which is “PD Mgmt” – physical disk management.

main menu
slot 5 disk missing and VD offline

Keep on reading!

Add cachecade SSD to an existing virtual drive in AVAGO MegaRAID with storcli64

Increase the IO performance of big storage consisting of hard drives with the CacheCade feature with MegaRAID hardware controllers. Here are the commands to ensure adding an SSD to cache storage of hard drives and the whole output:

storcli64 /c0 add VD cachecade r0 drives=252:0 WB
storcli64 /c0/v0 set ssdcaching=on

Summary and explanation

  • The controller used is AVAGO MegaRAID SAS9361-4i
  • Three 12T hard drives and 1 SSD 1T.
  • RAID5 of 3 hard drive. The name of the virtual drive is v0.
  • The CacheCade is in RAID0. The name of the virtual drive is v1.
  • The CacheCade is in WriteBack mode. The WriteBack mode is dangerous when the CacheCade device is RAID0, because there is no redundancy, but it’s OK if the purpose of the machine is just a proxy cache. WriteBack mode leads to much more performance to gain from the cache device over slow device sotrage, but should be carefully planned when using RAID0 cache (or single) device.
  • Using storcli – the cli command line tool to manage a (LSI) MegaRAID.
  • Add CacheCade RAID0 device by the first command. It will create a virtual drive RAID0 with the name v1.
  • Ensure the caching is enabled for the virtual device it should be by explicitly setting it with the second command. Adding only a CacheCade device may not add the cache virtual device to the existing virtual devices!

Output of a real world example

Keep on reading!

storcli with multiple disks from different enclosures

Creating a Virtual device with the AVAGO storcli command-line tool under Linux. Two examples are included:

  1. All disks are from one of the enclosure. All disks are included explicitly.
  2. Disks from two enclosures are included. One controller with two enclosures.

Check out how to Install the new storcli to manage (LSI/AVAGO/Broadcom) MegaRAID controller under CentOS 7
There are 31 disks of 36 harddisk bays. 5 are missing on purpose for the examples.

The initial states of the controller and the disks.

livecd ~ # opt/MegaRAID/storcli/storcli /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.0510.0000.0000 May 4, 2018
Operating system = Linux 4.19.72-gentoo
Controller = 0
Status = Success
Description = None

Product Name = LSI 2108 MegaRAID
Serial Number = FW-ABQRCBEAARBWA
SAS Address =  5003048004015f00
PCI Address = 00:06:00:00
System Time = 07/20/2020 22:58:35
Mfg. Date = 00/00/00
Controller Time = 07/20/2020 22:58:36
FW Package Build = 12.15.0-0239
FW Version = 2.130.403-4660
BIOS Version = 3.30.02.2_4.16.08.00_0x06060A05
Driver Name = megaraid_sas
Driver Version = 07.706.03.00-rc1
Vendor Id = 0x1000
Device Id = 0x79
SubVendor Id = 0x15D9
SubDevice Id = 0x700
Host Interface = PCI-E
Device Interface = SAS-6G
Bus Number = 6
Device Number = 0
Function Number = 0
Physical Drives = 31

PD LIST :
=======

-----------------------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                   Sp Type 
-----------------------------------------------------------------------------------
12:0      0 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HUA723020ALA640 D  -    
12:1      1 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS724020ALA640    D  -    
12:2      2 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
12:3      3 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
12:4      4 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
12:5      5 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS722T2TALA604    D  -    
12:6      6 UGood -  1.817 TB SATA HDD N   N  512B TOSHIBA DT01ACA200      D  -    
12:7      7 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
12:8      8 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
12:9      9 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
12:10    10 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
12:11    11 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 D  -    
37:0     13 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:1     14 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:2     15 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:3     16 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HUA723020ALA640 U  -    
37:4     17 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:6     19 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:7     20 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS722T2TALA604    U  -    
37:8     21 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS724020ALA640    U  -    
37:10    23 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:11    24 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:13    26 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:14    27 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS724020ALA640    U  -    
37:16    29 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS724020ALA640    U  -    
37:17    30 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HUA723020ALA640 U  -    
37:19    32 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:20    33 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:21    34 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS722T2TALA604    U  -    
37:22    35 UGood -  1.817 TB SATA HDD N   N  512B Hitachi HDS723020BLA642 U  -    
37:23    36 UGood -  1.817 TB SATA HDD N   N  512B HGST HUS724020ALA640    U  -    
-----------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded

Keep on reading!