Really bad performance when going from Write-Back to Write-Through on an LSI controller

Ever wondered what impact write-through mode on an LSI controller has on a real-world streaming server? Wonder no more!

Your IO can be several times slower in write-through mode than with the controller's cache in write-back mode.

And it can happen at any moment: if you have set “No Write Cache if Bad BBU”, write-through kicks in whenever the battery of the LSI controller is charging. Of course, you can schedule the battery charging/discharging process, but in general it will happen, and it will hurt your IO performance a lot!

In simple terms, in write-through mode a write operation is reported as successful only after the controller has confirmed the write on all disks, even though the data is already in the cache.

This mode puts pressure on the disks, and write-through has a reputation as a destroyer of hard disks! You can find a lot of administrators' reports on the Internet about disks that crashed under write-through mode, sometimes several at once in one machine, which loses all your data even with a redundant RAID setup such as RAID1, RAID5, RAID6 or RAID10.

srv ~ # sudo megacli -ldinfo -lall -aall
                                     
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :system
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 13.781 TB
Sector Size         : 512
Mirror Data         : 13.781 TB
State               : Optimal
Strip Size          : 128 KB
Number Of Drives per span:2
Span Depth          : 6
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only

Exit Code: 0x00

As you can see, our default cache policy is WriteBack with “No Write Cache if Bad BBU”, yet the current policy is WriteThrough. The BBU is not bad, it is just charging!
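To catch this situation before it shows up as slow IO, the comparison between the two policy lines can be automated. The following is only a minimal sketch: check_cache_policy is a hypothetical helper name, and it assumes the output format shown above, where both policy lines start at the beginning of the line and the write policy is the first comma-separated value.

```shell
#!/bin/sh
# Sketch: report virtual drives whose current write policy differs from the default.
# check_cache_policy is a hypothetical helper; it reads the text printed by
# `megacli -ldinfo -lall -aall` on stdin and assumes the layout shown above.
check_cache_policy() {
    awk '
        # "Virtual Drive: 0 (Target Id: 0)" -> third field is the drive number
        /^Virtual Drive:/ { vd = $3 }
        # keep only the first value of the policy list (WriteBack / WriteThrough)
        /^Default Cache Policy:/ {
            sub(/^[^:]*: */, ""); split($0, d, ","); def = d[1]
        }
        /^Current Cache Policy:/ {
            sub(/^[^:]*: */, ""); split($0, c, ",")
            if (c[1] != def) { print "VD " vd ": " def " -> " c[1]; bad = 1 }
        }
        # non-zero exit status when at least one drive is degraded
        END { exit (bad ? 1 : 0) }
    '
}
```

It could be run as `megacli -ldinfo -lall -aall | check_cache_policy`, for example from cron, and it exits with a non-zero status when any virtual drive has fallen back from its default write policy.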

megacli – FW error description: The current operation is not allowed … offline or missing virtual drives

Probably everyone who has ever touched LSI controllers and the megacli tool has hit this error, or will hit it sooner or later! It is only a matter of time before you get it when creating or replacing a disk. And no, this is not just another article about this error!

FW error description: The current operation is not allowed because the controller has data in cache for offline or missing virtual drives.

You probably want to replace a failed disk: you have inserted the new one, but you cannot add it to the RAID array, or something similar. In our case the drive failed and even stopped working, and the slot showed “missing drive” status. We replaced the hard drive and the new one was in “Unconfigured(Good), Spun Up” state.

You have to use the ID of the Virtual Drive exactly as it is reported, including the leading zero if there is one, and you may even need to put it in double quotes:

root@srv ~ # megacli -DiscardPreservedCache -L"08" -a0
                                     
Adapter #0

Virtual Drive(Target ID 08): Preserved Cache Data Cleared.

Exit Code: 0x00

As you can see from “Target ID 08”, the ID is 08, NOT 8!

Here are some unsuccessful attempts (even the unquoted 08 was unsuccessful; it might be down to our cli version or shell, but it did not work!). Note the last command, which gets the preserved cache list:

root@srv ~ # megacli -DiscardPreservedCache -L8 -a0
                                     
Adapter 0: No Virtual Drive has Preserved Cache Data.

Exit Code: 0x00
root@srv ~ # megacli -DiscardPreservedCache -L08 -a0
                                     
Adapter 0: No Virtual Drive has Preserved Cache Data.

Exit Code: 0x00

root@srv ~ # megacli -GetPreservedCacheList -a0
                                     
Adapter #0

Virtual Drive(Target ID 08): Missing.

Exit Code: 0x00

It reports that there is no drive with preserved cache at the very ID you typed, but the preserved cache list then shows a drive, because 8 is different from 08.
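To avoid retyping the ID by hand, the target ID can be taken verbatim from the preserved cache list. This is only a sketch: discard_preserved_cmds is a hypothetical helper name, the adapter is assumed to be 0 as in the examples above, and the helper only prints the discard command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: build the discard command from the preserved cache list.
# discard_preserved_cmds is a hypothetical helper; it reads the output of
# `megacli -GetPreservedCacheList -a0` on stdin and prints, but does not run,
# one discard command per listed drive, with the ID exactly as reported
# (leading zero kept) and quoted as in the working example above.
discard_preserved_cmds() {
    sed -n 's/^Virtual Drive(Target ID \([0-9][0-9]*\)).*/megacli -DiscardPreservedCache -L"\1" -a0/p'
}
```

It could be used as `megacli -GetPreservedCacheList -a0 | discard_preserved_cmds`. Printing the command rather than executing it is deliberate here, because discarding preserved cache throws away data the controller is still holding.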

For further problems getting your new disk into the array, you can check our tested replacement procedure: megacli – restart a rebuild with a disk in failed state

Install the new storcli to manage (LSI/AVAGO/Broadcom) MegaRAID controller under CentOS 7

After the acquisition of LSI there was a major change to the management console utility for the MegaRAID controllers. The utility was renamed from MegaCli (MegaCli64, megacli) to storcli (storcli64).

We have new controllers like the AVAGO MegaRAID SAS-9361-4i and really old ones like the LSI 2108 MegaRAID (in fact a Supermicro AOC-USAS2LP-H8iR), and both can be managed with the new cli, even the old controller, which is more than 8 years old.
An interesting fact is that the storcli output and argument syntax are almost identical to those of a really old cli, tw_cli, the 3ware management utility. As you know, LSI bought the 3ware RAID adapter business in 2009.

AVAGO MegaRAID SAS-9361-4i with CacheCade – create a new virtual drive RAID5 with SSD caching

Here is a how-to article for creating a RAID5 device on a MegaRAID SAS-9361-4i with SSD caching. The first and really important thing is that the controller has the CacheCade feature, which must be purchased because it is a software add-on.
To have SSD caching for your virtual RAID drive (probably built from hard disk drives) with a MegaRAID controller, one possible setup is the following:

  1. LSI LSI00415 MegaRAID 9361-4i SGL
  2. LSI LSI00293

It is also advisable to add cache protection to your setup (it is extra protection and not the same thing as the battery kit): LSI LSI00418 LSICVM02.

You can also check our AVAGO MegaRaid SAS 9361-4i with CacheCade and CacheVault BIOS configuration utilities review.

Here are the steps to create a RAID5 device with SSD caching using the BIOS Configuration Utility:

STEP 1) Supermicro device initialization

Start up your server


megacli – restart a rebuild with a disk in failed state

Sometimes we need to start a rebuild with a disk in failed state when using an LSI hardware controller. But if we simply set the failed disk back to a good state, it will return to the array immediately and our filesystem will be broken for sure! It also happens that a replacement disk shows up in failed state, too.

So here are simple and tested steps for properly resetting a disk from failed state to good state and starting a rebuild. In the example below the disk in failed state is [32:1]; replace it with the proper [enclosure_id:slot_id] in your case.

  1. Change the disk from “Failed” state to “Unconfigured(BAD)”
    megacli -pdmarkmissing -physdrv[32:1] -aAll
    
  2. Prepare for removal (this command could fail, not a critical one)
    megacli -pdprprmv -physdrv[32:1] -a0
    
  3. Make the state of the disk “Unconfigured(Good), Spun Up”
    megacli -PDMakeGood -PhysDrv[32:1] -a0
    
  4. Start the rebuild (this command could fail). If it fails, continue with the next step; if not, the rebuild has been restarted successfully.
    megacli -PDRbld -Start -PhysDrv[32:1] -a0
    

    Or

    megacli -pdlocate -start -physdrv[32:1] -a0
    

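    One of the two commands will probably start the rebuild, but if both fail, continue to the next step. (Note that -pdlocate is the command that blinks the drive's locate LED, so treat the second variant as something that worked for us rather than a documented way to start a rebuild.)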

  5. Start the rebuild by first clearing the foreign configuration and then making the device a hot spare (only if the commands in step 4 failed)
    megacli -CfgForeign -Clear -aALL
    # set global hotspare
    megacli -PDHSP -Set -PhysDrv [32:1] -a0
    

* If you need to unset/remove a global hotspare:

megacli -PDHSP -Rmv -PhysDrv [32:1] -aN
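The steps above can be collected into one small script. This is a sketch under stated assumptions, not a definitive implementation: restart_rebuild and the MEGACLI variable are hypothetical names, the script defaults to a dry run that only prints the commands, and it assumes that megacli returns a non-zero exit status when a command fails. The -pdlocate alternative from step 4 is left out.

```shell
#!/bin/sh
# Sketch of the procedure above. MEGACLI is a hypothetical override: by default
# the commands are only printed (dry run); export MEGACLI=megacli to run them.
MEGACLI=${MEGACLI:-echo megacli}

# restart_rebuild is a hypothetical helper.
# $1 is the physical drive as enclosure_id:slot_id, e.g. 32:1
restart_rebuild() {
    pd=$1
    # 1. failed state -> Unconfigured(BAD)
    $MEGACLI -pdmarkmissing "-physdrv[$pd]" -aAll
    # 2. prepare for removal (may fail, not critical)
    $MEGACLI -pdprprmv "-physdrv[$pd]" -a0 || true
    # 3. Unconfigured(BAD) -> Unconfigured(Good), Spun Up
    $MEGACLI -PDMakeGood "-PhysDrv[$pd]" -a0
    # 4. try to start the rebuild; stop here if it works
    #    (assumes a non-zero exit status on failure)
    if $MEGACLI -PDRbld -Start "-PhysDrv[$pd]" -a0; then
        return 0
    fi
    # 5. fallback: clear the foreign configuration, then set a global hotspare
    $MEGACLI -CfgForeign -Clear -aALL
    $MEGACLI -PDHSP -Set -PhysDrv "[$pd]" -a0
}
```

Calling `restart_rebuild 32:1` in the default dry-run mode prints the commands of steps 1 to 4 in order, which makes it easy to check the enclosure and slot before anything touches the controller.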