Sometimes we need to start a rebuild on a disk that is in a failed state behind an LSI hardware controller. If we simply set the failed disk back to a good state, it rejoins the array immediately without a rebuild, and the filesystem will almost certainly be corrupted! It also happens that a newly inserted replacement disk shows up in a failed state, too.
So here are simple, tested steps for properly resetting a disk from a failed state to a good state and starting a rebuild. In the example below the failed disk is [32:1]; replace it with the proper [enclosure_id:slot_id] for your case.
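If you are not sure which [enclosure_id:slot_id] to use, the physical drive listing shows it. The following is a minimal sketch, assuming the usual `megacli -PDList -aALL` output with "Enclosure Device ID", "Slot Number" and "Firmware state" lines. The function name `pd_states` is only illustrative and is not part of MegaCli.

```shell
# Summarize a MegaCli physical-drive listing read from stdin
# as one "[enclosure:slot] state" line per drive.
# Assumption: the listing contains "Enclosure Device ID:",
# "Slot Number:" and "Firmware state:" lines, in that order per drive.
pd_states() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { printf "[%s:%s] %s\n", enc, slot, $2 }
    '
}

# Typical use on a host with the controller:
#   megacli -PDList -aALL | pd_states
```

A drive whose line ends in "Failed" or "Unconfigured(bad)" is the one to use in the steps below.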
- Change the disk state from “Failed” to “Unconfigured(Bad)”
megacli -pdmarkmissing -physdrv[32:1] -aAll
- Prepare the disk for removal (this command may fail; that is not critical)
megacli -pdprprmv -physdrv[32:1] -a0
- Set the state of the disk to “Unconfigured(Good), Spun Up”
megacli -PDMakeGood -PhysDrv[32:1] -a0
- Start the rebuild (this command may fail). If it fails, continue with the next step; if not, the rebuild has been restarted successfully.
megacli -PDRbld -Start -PhysDrv[32:1] -a0
megacli -pdlocate -start -physdrv[32:1] -a0
The first command normally starts the rebuild; the second only makes the drive's locate LED blink so you can find it in the enclosure. If the rebuild did not start, continue to the next step.
- Start the rebuild by first clearing the foreign configuration and then making the device a global hot spare (only if the commands above failed)
megacli -CfgForeign -Clear -aALL
#set global hotspare
megacli -PDHSP -Set -PhysDrv [32:1] -a0
* If you need to unset/remove a global hotspare (replace N with your adapter number):
megacli -PDHSP -Rmv -PhysDrv [32:1] -aN
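
The steps above can be chained in a small wrapper. This is only a sketch under stated assumptions: `MEGACLI`, `DRY_RUN`, `run` and `reset_failed_disk` are illustrative names, the wrapper prints the commands instead of running them unless `DRY_RUN=0` is set, and it treats only the exit status of `-PDRbld -Start` as the sign that the rebuild started (the `-pdlocate` call is left out).

```shell
#!/bin/sh
# Sketch of the procedure above for a single drive.
# All names here (MEGACLI, DRY_RUN, run, reset_failed_disk) are illustrative.
MEGACLI="${MEGACLI:-megacli}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$MEGACLI $*"
    else
        "$MEGACLI" "$@"
    fi
}

# Usage: reset_failed_disk ENCLOSURE:SLOT ADAPTER   (e.g. 32:1 0)
reset_failed_disk() {
    drv="$1"
    adp="$2"
    run -pdmarkmissing "-physdrv[$drv]" -aAll
    run -pdprprmv "-physdrv[$drv]" "-a$adp" || true   # may fail, not critical
    run -PDMakeGood "-PhysDrv[$drv]" "-a$adp" || return 1
    # If the rebuild starts, we are done.
    if run -PDRbld -Start "-PhysDrv[$drv]" "-a$adp"; then
        return 0
    fi
    # Fallback: clear the foreign configuration, then set a global hotspare.
    run -CfgForeign -Clear -aALL
    run -PDHSP -Set -PhysDrv "[$drv]" "-a$adp"
}
```

Review the printed commands from `reset_failed_disk 32:1 0` first, then rerun with `DRY_RUN=0`. Once a rebuild is running, its progress can be checked with `megacli -PDRbld -ShowProg -PhysDrv[32:1] -a0`.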