{"id":298,"date":"2025-03-10T10:43:17","date_gmt":"2025-03-10T17:43:17","guid":{"rendered":"https:\/\/www.lucaswilliams.net\/?p=298"},"modified":"2025-03-10T10:43:22","modified_gmt":"2025-03-10T17:43:22","slug":"replacing-failed-disks-in-a-md-raid-in-ubuntu-server","status":"publish","type":"post","link":"https:\/\/www.lucaswilliams.net\/index.php\/2025\/03\/10\/replacing-failed-disks-in-a-md-raid-in-ubuntu-server\/","title":{"rendered":"Replacing failed disks in a MD RAID in Ubuntu Server"},"content":{"rendered":"\n<p>Hello everyone! Been a moment since my last blog update. This is a special one that I have been wanting to write, but I waited until I actually had to do it so I could show real-world examples, and boy, is this one for the record books.<\/p>\n\n\n\n<p>So, my secondary KVM server has a 5-disk hot-swappable chassis that I bought on NewEgg about 7 years ago. It holds 5 SATA disks, which connect from the chassis straight to the motherboard&#8217;s 5 SATA ports, so I can hot swap the hard drives if they ever fail. And well, two of them did about a month ago. The system is set up as RAID-5: four of the disks are active members of the array and the fifth is a hot spare. Disks 4 and 5 failed together: disk 4 failed outright, and while disk 5 (the hot spare) was being rebuilt in as the new fourth member, it failed as well. Luckily the array was still good, but now I needed to replace the failed disks. <\/p>\n\n\n\n<p>I bought 2 new 2TB disks from NewEgg and pulled the failed ones from the chassis. 
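<\/p>\n\n\n\n<p>As an aside: if a dying member hasn&#8217;t already been kicked out of the arrays automatically, it has to be failed and removed by hand before the drive is pulled. A sketch, assuming the failed member&#8217;s partitions follow the same layout as mine (sdd2 and sdd3 in md0 and md1 respectively):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mdadm --fail \/dev\/md0 \/dev\/sdd2\nsudo mdadm --remove \/dev\/md0 \/dev\/sdd2\nsudo mdadm --fail \/dev\/md1 \/dev\/sdd3\nsudo mdadm --remove \/dev\/md1 \/dev\/sdd3<\/code><\/pre>\n\n\n\n<p>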
Unfortunately, the system does not automatically detect drives being installed or removed, so I first had to run the following commands (as root, via <code>sudo -i<\/code>) to re-scan the SCSI bus and get the removal of the failed disks recognized by the system:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo -i\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host0\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host1\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host2\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host3\/scan<\/code><\/pre>\n\n\n\n<p>I then listed the <code>\/dev\/<\/code> directory to make sure that \/dev\/sdd and \/dev\/sde were no longer being seen, since they had been removed. I also checked the RAID configuration to make sure they were no longer listed:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mdadm -D \/dev\/md0\nmdadm -D \/dev\/md1<\/code><\/pre>\n\n\n\n<p>Both arrays no longer listed the failed disks, so I was ready to physically add the new disks. <\/p>\n\n\n\n<p>I installed the new disks. Now I needed to re-scan the bus for Linux to see them:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>echo \"0 0 0\" >\/sys\/class\/scsi_host\/host0\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host1\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host2\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host3\/scan<\/code><\/pre>\n\n\n\n<p>I then listed the <code>\/dev<\/code> directory and could now see the new disks, sdd and sde. <\/p>\n\n\n\n<p>Next I needed to make sure they had the correct partition layout to work with my existing arrays. For this I used the <code>sfdisk<\/code> command to dump the partition layout from a surviving disk and then apply it to the new ones:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sfdisk -d \/dev\/sda > partitions.txt\nsfdisk \/dev\/sdd &lt; partitions.txt\nsfdisk \/dev\/sde &lt; partitions.txt<\/code><\/pre>\n\n\n\n<p>Another listing of the <code>\/dev<\/code> directory showed the new drives with their partitions. 
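<\/p>\n\n\n\n<p>As a quick sanity check before touching the arrays, the new disks&#8217; partition tables can be compared side by side against the surviving member they were copied from; a sketch using <code>lsblk<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>lsblk -o NAME,SIZE,TYPE \/dev\/sda \/dev\/sdd \/dev\/sde<\/code><\/pre>\n\n\n\n<p>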
I was now ready to add the disks back to the arrays, with sdd going in as an active member and sde as the hot spare:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mdadm --add \/dev\/md0 \/dev\/sdd2\nmdadm --add \/dev\/md1 \/dev\/sdd3\nmdadm --add-spare \/dev\/md0 \/dev\/sde2\nmdadm --add-spare \/dev\/md1 \/dev\/sde3<\/code><\/pre>\n\n\n\n<p>I then checked the status of the arrays to make sure they were rebuilding:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mdadm -D \/dev\/md0\nmdadm -D \/dev\/md1<\/code><\/pre>\n\n\n\n<p>The system showed it was rebuilding the arrays, and at the current rate it was going to take about a day. <\/p>\n\n\n\n<p>The next day I went to check the status and, lo and behold, discovered that disk 5 (sde) had failed and was no longer reporting in: I had been shipped a bad disk. So I contacted NewEgg and they sent out a replacement as soon as I returned the failed one. Luckily it was the hot spare, so removing it (and later adding it back) had no impact on the arrays, but I did run the following commands to remove the spare from the arrays and then re-scan the bus so that the disk was fully removed from the server. Note that simply prefixing the <code>echo<\/code> lines with <code>sudo<\/code> would not work, because the redirection is performed by the unprivileged shell, so the writes go through <code>sudo tee<\/code> instead:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mdadm --remove \/dev\/md0 \/dev\/sde2\nsudo mdadm --remove \/dev\/md1 \/dev\/sde3\necho \"0 0 0\" | sudo tee \/sys\/class\/scsi_host\/host0\/scan\necho \"0 0 0\" | sudo tee \/sys\/class\/scsi_host\/host1\/scan\necho \"0 0 0\" | sudo tee \/sys\/class\/scsi_host\/host2\/scan\necho \"0 0 0\" | sudo tee \/sys\/class\/scsi_host\/host3\/scan\nsudo mdadm -D \/dev\/md0\nsudo mdadm -D \/dev\/md1<\/code><\/pre>\n\n\n\n<p>mdadm reported that there was no longer a spare available, and a listing of the \/dev directory no longer showed \/dev\/sde. 
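<\/p>\n\n\n\n<p>As an aside, a quicker way to watch resync progress than re-running <code>mdadm -D<\/code> is to read <code>\/proc\/mdstat<\/code>, which shows the rebuild percentage and an estimated finish time for each array:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cat \/proc\/mdstat\nwatch -n 60 cat \/proc\/mdstat<\/code><\/pre>\n\n\n\n<p>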
A week later, I received my new spare from NewEgg, installed it, and ran the following to re-scan the bus, copy the partition layout, and add the disk back as a hot spare:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo -i\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host0\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host1\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host2\/scan\necho \"0 0 0\" >\/sys\/class\/scsi_host\/host3\/scan\nls \/dev\nsfdisk \/dev\/sde &lt; partitions.txt\nls \/dev\nmdadm --add-spare \/dev\/md0 \/dev\/sde2\nmdadm --add-spare \/dev\/md1 \/dev\/sde3\nmdadm -D \/dev\/md0\nmdadm -D \/dev\/md1<\/code><\/pre>\n\n\n\n<p>This brought the disk in and added it as a hot spare for both arrays. Since it&#8217;s a hot spare, it does not need to resync. <\/p>\n\n\n\n<p>And there you have it: how to replace failed disks in an MD RAID on Ubuntu.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello everyone! Been a moment since my last blog update. This is a special one that I have been wanting to write, but I waited until I actually had to do it so I could show real-world examples, and boy, is this one for the record books. 
So, my secondary KVM server has [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37,49,116,6],"tags":[120,27,128,55,50,127,129,130,67,5],"class_list":["post-298","post","type-post","status-publish","format-standard","hentry","category-howto","category-kvm","category-storage","category-ubuntu","tag-22-04","tag-canonical","tag-hard-disk","tag-howto","tag-kvm","tag-mdadm","tag-raid","tag-raid-5","tag-server","tag-ubuntu"],"_links":{"self":[{"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/posts\/298","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/comments?post=298"}],"version-history":[{"count":1,"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/posts\/298\/revisions"}],"predecessor-version":[{"id":299,"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/posts\/298\/revisions\/299"}],"wp:attachment":[{"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/media?parent=298"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/categories?post=298"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lucaswilliams.net\/index.php\/wp-json\/wp\/v2\/tags?post=298"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}