Remove a disk (all of its partitions) from a software RAID1 with mdadm and change the layout of the disk

The following article shows how to remove healthy partitions from software RAID1 devices in order to change the layout of the disk and then add them back to the arrays.
mdadm is the tool for managing software RAID devices under Linux and it is part of all major Linux distributions (some do not install it by default, so it may need to be installed).
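
If mdadm is missing, it can usually be installed from the distribution's repositories (the package is named "mdadm" in most of them), for example:

yum install mdadm       # CentOS/RHEL (or "dnf install mdadm" on newer releases)
apt install mdadm       # Debian/Ubuntu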

Software RAID layout

[root@srv ~]# cat /proc/mdstat 
Personalities : [raid1] 
md125 : active raid1 sda4[1] sdb3[0]
      1047552 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb2[0] sda3[1]
      32867328 blocks super 1.2 [2/2] [UU]
      
md127 : active raid1 sda2[1] sdb1[0]
      52427776 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

STEP 1) Make the partitions faulty.

A partition cannot be removed from an array while it is still active, so it must be marked faulty first.

[root@srv ~]# mdadm --fail /dev/md125 /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md125
[root@srv ~]# mdadm --fail /dev/md126 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md126
[root@srv ~]# mdadm --fail /dev/md127 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md127


And now the layout of the software RAID devices:

[root@srv ~]# cat /proc/mdstat 
Personalities : [raid1] 
md125 : active raid1 sda4[1] sdb3[0](F)
      1047552 blocks super 1.2 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb2[0](F) sda3[1]
      32867328 blocks super 1.2 [2/1] [_U]
      
md127 : active raid1 sda2[1] sdb1[0](F)
      52427776 blocks super 1.2 [2/1] [_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

“sdb1”, “sdb2” and “sdb3” are now marked faulty and can be removed.
The dmesg output contains errors of the kind:

[ 1445.230797] md/raid1:md125: Disk failure on sdb3, disabling device.
               md/raid1:md125: Operation continuing on 1 devices.
[ 1463.658032] md/raid1:md126: Disk failure on sdb2, disabling device.
               md/raid1:md126: Operation continuing on 1 devices.
[ 1470.374611] md/raid1:md127: Disk failure on sdb1, disabling device.
               md/raid1:md127: Operation continuing on 1 devices.
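
The faulty state can also be checked per array with mdadm itself, which reports the failed partition as “faulty” in the device list at the end of its output:

mdadm --detail /dev/md125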

STEP 2) Remove the partitions from the RAID1 devices.

[root@srv ~]# mdadm --remove /dev/md125 /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md125
[root@srv ~]# mdadm --remove /dev/md126 /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md126
[root@srv ~]# mdadm --remove /dev/md127 /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md127
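
Note that marking a partition faulty and removing it can also be combined in a single mdadm invocation per array, something like:

mdadm /dev/md125 --fail /dev/sdb3 --remove /dev/sdb3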

Now each RAID device consists of only one partition:

[root@srv ~]# cat /proc/mdstat 
Personalities : [raid1] 
md125 : active raid1 sda4[1]
      1047552 blocks super 1.2 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sda3[1]
      32867328 blocks super 1.2 [2/1] [_U]
      
md127 : active raid1 sda2[1]
      52427776 blocks super 1.2 [2/1] [_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

STEP 3) Change the disk layout of the removed disk.

When all of the disk's partitions are removed from the software RAID devices (and no other partitions of the disk are mounted), the disk layout can be changed, for example with the “parted” program. Here sgdisk (part of the gdisk package) is used to copy the partition layout from “sda”, but the partitions may also be created manually with “parted” (see the sketch further below).

[root@srv ~]# sgdisk /dev/sda -R /dev/sdb
The operation has completed successfully.
[root@srv ~]# sgdisk -G /dev/sdb
The operation has completed successfully.

The commands above copy the disk layout from “sda” to “sdb” and then randomize the disk and partition GUIDs (it is important that the GUIDs are unique!).
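
If the partitions are created manually with parted instead, the commands are along these lines. This is only a sketch: the start/end offsets below are placeholders and must match the partition layout of “sda”, and the first partition must carry the bios_grub flag:

parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 3MiB        # bios_grub partition (placeholder size)
parted -s /dev/sdb set 1 bios_grub on
parted -s /dev/sdb mkpart primary 3MiB 51GiB       # placeholder offsets - must match sda2
parted -s /dev/sdb mkpart primary 51GiB 83GiB      # placeholder offsets - must match sda3
parted -s /dev/sdb mkpart primary 83GiB 84GiB      # placeholder offsets - must match sda4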

STEP 4) Add the new partitions to the RAID devices to rebuild the arrays.

The rebuild of an array starts automatically as soon as the missing partition is added back to the software RAID device.

[root@srv ~]# mdadm --add /dev/md125 /dev/sdb4
mdadm: added /dev/sdb4
[root@srv ~]# mdadm --add /dev/md126 /dev/sdb3
mdadm: added /dev/sdb3
[root@srv ~]# mdadm --add /dev/md127 /dev/sdb2
mdadm: added /dev/sdb2

And the MD device information:

[root@srv ~]# cat /proc/mdstat 
Personalities : [raid1] 
md125 : active raid1 sdb4[2] sda4[1]
      1047552 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb3[2] sda3[1]
      32867328 blocks super 1.2 [2/1] [_U]
      [=>...................]  recovery =  7.7% (2546752/32867328) finish=2.3min speed=212229K/sec
      
md127 : active raid1 sdb2[2] sda2[1]
      52427776 blocks super 1.2 [2/1] [_U]
        resync=DELAYED
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

The RAID devices recover one by one. If there is more than one array to recover, one synchronizes while the others wait in the DELAYED state (see the resync=DELAYED line above).
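
The progress of the synchronization can be followed live with something like:

watch -n 5 cat /proc/mdstat
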
Wait a while for the recovery synchronization to finish and all devices become healthy:

[root@srv ~]# cat /proc/mdstat 
Personalities : [raid1] 
md125 : active raid1 sdb4[2] sda4[1]
      1047552 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb3[2] sda3[1]
      32867328 blocks super 1.2 [2/2] [UU]
      
md127 : active raid1 sdb2[2] sda2[1]
      52427776 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

Several notifications also appear in dmesg:

[ 3425.774344]  sdb: sdb1 sdb2 sdb3 sdb4
[ 3426.815466]  sdb: sdb1 sdb2 sdb3 sdb4
[ 3531.407084]  sdb: sdb1 sdb2 sdb3 sdb4
[ 3531.416109]  sdb: sdb1 sdb2 sdb3 sdb4
[ 3532.431930]  sdb: sdb1 sdb2 sdb3 sdb4
[ 4130.027439]  sdb: sdb1 sdb2 sdb3 sdb4
[ 4395.883816] device-mapper: uevent: version 1.0.3
[ 4395.883886] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
[ 4495.084070]  sdb: sdb1 sdb2 sdb3 sdb4
[ 4758.775855] md: recovery of RAID array md125
[ 4764.393607] md: md125: recovery done.
[ 4765.208595] md: recovery of RAID array md126
[ 4774.956777] md: delaying recovery of md127 until md126 has finished (they share one or more physical units)
[ 4931.000901] md: md126: recovery done.
[ 4931.005296] md: recovery of RAID array md127
[ 5194.962056] md: md127: recovery done.

Install Grub2

The whole point of changing the layout of the second disk was to add a bios_grub special partition at the beginning of it. When the first disk “sda” fails, the server will be able to boot from the second disk “sdb”.
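
Before installing the boot loader, the new layout of “sdb” can be double-checked with sgdisk or parted (the bios_grub partition shows up with type code EF02 in sgdisk and with the bios_grub flag in parted):

sgdisk -p /dev/sdb
parted /dev/sdb print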

[root@srv ~]# grub2-install /dev/sda
Installing for i386-pc platform.
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Installation finished. No error reported.
[root@srv ~]# grub2-install /dev/sdb
Installing for i386-pc platform.
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Installation finished. No error reported.

For UEFI systems, grub2-install must be run with the EFI target and options instead (see the example below).
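
A rough example for a UEFI system, assuming the EFI system partition is mounted under /boot/efi (adjust the target and the paths to the actual setup):

grub2-install --target=x86_64-efi --efi-directory=/boot/efi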
