  • Replacing failed disks in a MD RAID in Ubuntu Server

    Hello everyone! It's been a moment since my last blog update. This is a special one that I have been wanting to write, but I wanted to wait until I actually had to do it so I could show real-world examples, and boy, is this one for the record books.

    So, my secondary KVM server has a 5-disk hot-swappable chassis that I bought on NewEgg about 7 years ago. It holds 5 SATA disks, each connected from the chassis to one of the 5 SATA ports on the motherboard. This allows me to hot swap the hard drives if they ever fail, and well, two of them did about a month ago. The system is set up as a RAID-5: the first four disks are active members of the RAID and the 5th disk is a hot spare. Well, disks 4 and 5 failed together. Basically, disk 4 failed, and while disk 5 was rebuilding to take its place, it failed too. Luckily the array was still good (degraded, but still running), but now I need to replace the failed disks.

    I bought 2 new 2TB disks from NewEgg and pulled the failed disks out of the chassis. Unfortunately, the system does not automatically detect drives being installed or removed, so I had to run the following commands to rescan the SCSI hosts and get the change recognized by the system.

    sudo -i
    echo "0 0 0" >/sys/class/scsi_host/host0/scan
    echo "0 0 0" >/sys/class/scsi_host/host1/scan
    echo "0 0 0" >/sys/class/scsi_host/host2/scan
    echo "0 0 0" >/sys/class/scsi_host/host3/scan
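    The four echo lines above hard-code host0 through host3. As a rough sketch, the same rescan can be written as a loop over whatever hosts exist. The function name rescan_scsi_hosts and its two arguments are made up for this example. It defaults to the "- - -" wildcard (any channel, any target, any LUN) rather than the "0 0 0" used in this post, and it has to run as root on a real system.

```shell
#!/bin/sh
# Sketch: rescan every SCSI host under a sysfs directory.
# rescan_scsi_hosts is a hypothetical helper, not an existing tool.
#   $1 = directory holding the hostN entries (default: /sys/class/scsi_host)
#   $2 = scan pattern "channel target lun"   (default: "- - -", all wildcards)
rescan_scsi_hosts() {
    sysfs_root="${1:-/sys/class/scsi_host}"
    pattern="$2"
    [ -n "$pattern" ] || pattern="- - -"

    for scan in "$sysfs_root"/host*/scan; do
        [ -e "$scan" ] || continue          # the glob matched nothing
        printf '%s\n' "$pattern" > "$scan"  # ask this host to rescan
    done
}

# Usage (as root):
#   rescan_scsi_hosts                                  # wildcard scan on all hosts
#   rescan_scsi_hosts /sys/class/scsi_host "0 0 0"     # same values as in this post
```

    Looping over host*/scan means the commands do not need editing if the controller shows up with a different number of hosts.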

    I then listed the /dev directory to make sure that /dev/sdd and /dev/sde were no longer present, since the disks had been removed. I also checked the RAID configuration to make sure that they were no longer listed:

    mdadm -D /dev/md0
    mdadm -D /dev/md1

    Both arrays no longer listed the failed disks, so I’m ready to physically add the new disks.
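    Reading the full mdadm -D output for two arrays gets tedious, so here is a small sketch that boils it down to one line per array. The md_summary name is made up for this example. It assumes the usual "Key : Value" lines that mdadm -D prints (State, Active Devices, Failed Devices, Spare Devices) and reads them from standard input.

```shell
#!/bin/sh
# Sketch: one-line summary of `mdadm -D` output read from stdin.
# md_summary is a hypothetical helper, not part of mdadm.
md_summary() {
    awk -F ' : ' '
        {
            key = $1; val = $2
            sub(/^[ \t]+/, "", key); sub(/[ \t]+$/, "", key)   # trim the key
            sub(/[ \t]+$/, "", val)                            # trim trailing blanks
        }
        key == "State"          { state  = val }
        key == "Active Devices" { active = val }
        key == "Failed Devices" { failed = val }
        key == "Spare Devices"  { spare  = val }
        END {
            printf "state=%s active=%s failed=%s spare=%s\n", state, active, failed, spare
        }
    '
}

# Usage (as root):
#   mdadm -D /dev/md0 | md_summary
#   mdadm -D /dev/md1 | md_summary
```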

    I installed the new disks. Now I need to re-scan the bus for Linux to see the disks:

    echo "0 0 0" >/sys/class/scsi_host/host0/scan
    echo "0 0 0" >/sys/class/scsi_host/host1/scan
    echo "0 0 0" >/sys/class/scsi_host/host2/scan
    echo "0 0 0" >/sys/class/scsi_host/host3/scan

    I then listed the /dev directory and could now see the new disks, sdd and sde.

    Next I needed to make sure that they had the correct partition layout to work with my existing array. For this I used the sfdisk command to dump the partition layout of an existing member disk (sda) and then apply it to the new disks:

    sfdisk -d /dev/sda > partitions.txt
    sfdisk /dev/sdd < partitions.txt
    sfdisk /dev/sde < partitions.txt
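    Before adding the partitions to the arrays, it may be worth confirming that the copy really produced the same layout. The sketch below is one way to do that: it reduces an sfdisk -d dump to its partition lines with the device name stripped, so two disks can be compared directly. The partition_layout name is made up for this example, and the pattern assumes sdX-style device names like the ones in this post.

```shell
#!/bin/sh
# Sketch: normalize an `sfdisk -d` dump (read from stdin) so that dumps
# from two different disks can be compared line by line.
# partition_layout is a hypothetical helper, not part of util-linux.
partition_layout() {
    # Keep only the partition lines, squeeze out whitespace, and replace
    # "/dev/sda1:" with "1:" so that the disk name no longer matters.
    grep '^/dev/' \
        | sed -E 's/[[:space:]]+//g; s#^/dev/[a-z]+([0-9]+):#\1:#'
}

# Usage (as root):
#   if [ "$(sfdisk -d /dev/sda | partition_layout)" = \
#        "$(sfdisk -d /dev/sdd | partition_layout)" ]; then
#       echo "sdd matches sda"
#   fi
```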

    If I do another listing of the /dev directory I can see the new drives have the partitions. I’m now ready to add the disks back to the array:

    mdadm --add /dev/md0 /dev/sdd2
    mdadm --add /dev/md1 /dev/sdd3
    mdadm --add-spare /dev/md0 /dev/sde2
    mdadm --add-spare /dev/md1 /dev/sde3

    I then checked the status of the arrays to make sure they were rebuilding:

    mdadm -D /dev/md0
    mdadm -D /dev/md1
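    Besides mdadm -D, the kernel reports rebuild progress in /proc/mdstat. As a rough sketch, the helper below pulls out just the array name, the percentage, and the estimated finish time. The rebuild_progress name is made up for this example, and the pattern assumes the usual "recovery = 12.6% ... finish=127.5min" progress line; arrays that are idle or waiting their turn print nothing.

```shell
#!/bin/sh
# Sketch: summarize rebuild progress per array from /proc/mdstat
# (or from a saved copy passed as $1).
# rebuild_progress is a hypothetical helper, not part of mdadm.
rebuild_progress() {
    awk '
        /^md[0-9]+ *:/ { dev = $1 }              # remember the current array
        match($0, /(recovery|resync) *= *[0-9.]+%/) {
            status = substr($0, RSTART, RLENGTH)
            gsub(/ /, "", status)                # "recovery = 12.6%" -> "recovery=12.6%"
            eta = "finish=unknown"
            if (match($0, /finish=[^ ]+/)) eta = substr($0, RSTART, RLENGTH)
            print dev, status, eta
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   rebuild_progress                 # read /proc/mdstat once
#   watch -n 60 cat /proc/mdstat     # or simply watch the raw file
```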

    The system showed that it was rebuilding the arrays, and at the current rate it was going to take about a day.

    The next day I went to check the status and, lo and behold, I found that disk 5 (sde) had failed and was no longer reporting in. I had been shipped a bad disk. So I contacted NewEgg, and they sent out a replacement as soon as I sent them the failed disk. Luckily it was the hot spare, so removing it or adding it back had no impact on the system. I did run the following commands to remove the spare from the arrays and then re-scanned the bus so that the disk was fully removed from the server:

    sudo mdadm --remove /dev/md0 /dev/sde2
    sudo mdadm --remove /dev/md1 /dev/sde3
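    # "sudo echo ... > file" only works from a root shell, because the redirect is done by the calling shell; "sudo tee" does the write as root
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host0/scan
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host1/scan
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host2/scan
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host3/scan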
    sudo mdadm -D /dev/md0
    sudo mdadm -D /dev/md1

    mdadm reported that there was no longer a spare available, and the listing of the /dev directory no longer showed /dev/sde. A week later, I got my new spare from NewEgg, installed it, and ran the following:

    sudo -i
    echo "0 0 0" >/sys/class/scsi_host/host0/scan
    echo "0 0 0" >/sys/class/scsi_host/host1/scan
    echo "0 0 0" >/sys/class/scsi_host/host2/scan
    echo "0 0 0" >/sys/class/scsi_host/host3/scan
    ls /dev
    sfdisk /dev/sde < partitions.txt
    ls /dev
    mdadm --add-spare /dev/md0 /dev/sde2
    mdadm --add-spare /dev/md1 /dev/sde3
    mdadm -D /dev/md0
    mdadm -D /dev/md1

    This partitioned the new disk and added it as a hot spare for both arrays. Since it’s a hot spare, it does not need to resync.

    And there you have it: how to replace the disks in an MD RAID on Ubuntu.