Tag: 22.04

  • Replacing failed disks in a MD RAID in Ubuntu Server

    Hello everyone! It's been a moment since my last blog update. This is a special one that I have been wanting to write, but I wanted to wait until I actually had to do it so I could show real-world examples, and boy, is this one for the record books.

    So, my secondary KVM server has a 5-disk hot-swappable chassis that I bought on NewEgg about 7 years ago. It lets you install 5 SATA disks, and those disks connect from the chassis to the 5 SATA ports on the motherboard. This allows me to hot swap the hard drives if they ever fail, and well, two of them did about a month ago. The system is set up as a RAID-5: four of the disks are members of the array and the 5th disk is a hot spare. Well, disks 4 and 5 failed together. Basically, disk 4 failed, and while the hot spare (disk 5) was rebuilding to take its place, it failed too. Luckily the array was still good, but now I needed to replace the failed disks.
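
    Before pulling anything, it's worth double-checking which members the kernel actually considers failed. A quick check looks something like this (the device names below are from my layout, and smartctl only works if the smartmontools package is installed):

    cat /proc/mdstat                 # per-array status; failed members show (F) and a _ in the [UUU_] string
    sudo mdadm --detail /dev/md0     # detailed view listing faulty, removed and spare devices
    sudo smartctl -a /dev/sdd        # SMART health for the suspect disk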

    I bought 2 new 2TB disks from NewEgg to replace them. Unfortunately, the system does not automatically detect drives being removed or installed, so after pulling the failed disks I ran the following commands to get the change recognized by the system.

    sudo -i
    echo "0 0 0" >/sys/class/scsi_host/host0/scan
    echo "0 0 0" >/sys/class/scsi_host/host1/scan
    echo "0 0 0" >/sys/class/scsi_host/host2/scan
    echo "0 0 0" >/sys/class/scsi_host/host3/scan
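
    As an aside, the scan file also accepts wildcards, so if you don't know which host adapter a disk hangs off of, a catch-all rescan in the same root shell would look like this (my sketch, not what I originally ran):

    for h in /sys/class/scsi_host/host*/scan; do
        echo "- - -" > "$h"      # "- - -" means all channels, targets and LUNs
    done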

    I then listed the /dev/ directory to make sure that /dev/sdd and /dev/sde were no longer being seen, since they had been physically removed. I also checked the RAID configuration to make sure they were no longer listed:

    mdadm -D /dev/md0
    mdadm -D /dev/md1

    Both arrays no longer listed the failed disks, so I’m ready to physically add the new disks.
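
    In my case the failed members had already dropped out on their own. If they had still been listed as faulty, the usual approach would be to fail and remove them explicitly first, roughly like this (using my partition names as the example):

    sudo mdadm --fail /dev/md0 /dev/sdd2      # mark the member faulty if md still thinks it is active
    sudo mdadm --remove /dev/md0 /dev/sdd2    # then pull it out of the array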

    I installed the new disks. Now I need to re-scan the bus for Linux to see the disks:

    echo "0 0 0" >/sys/class/scsi_host/host0/scan
    echo "0 0 0" >/sys/class/scsi_host/host1/scan
    echo "0 0 0" >/sys/class/scsi_host/host2/scan
    echo "0 0 0" >/sys/class/scsi_host/host3/scan

    I then listed the /dev directory and I can now see the new disks, sdd and sde.

    I then need to make sure that they have the correct format and partition layout to work with my existing array. For this I used the sfdisk command to copy the partition layout from an existing array member and apply it to the new disks:

    sfdisk -d /dev/sda > partitions.txt
    sfdisk /dev/sdd < partitions.txt
    sfdisk /dev/sde < partitions.txt
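
    Before adding them to the arrays, a quick way to confirm the layout copied over correctly is to compare the new disks against the source (both tools are part of util-linux):

    lsblk /dev/sda /dev/sdd /dev/sde    # the new disks should show the same partition sizes as sda
    sfdisk -d /dev/sdd                  # dump sdd's table and eyeball it against partitions.txt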

    If I do another listing of the /dev directory, I can see that the new drives now have the partitions. I'm now ready to add the disks to the arrays:

    mdadm --add /dev/md0 /dev/sdd2
    mdadm --add /dev/md1 /dev/sdd3
    mdadm --add-spare /dev/md0 /dev/sde2
    mdadm --add-spare /dev/md1 /dev/sde3

    I then check the status of the array to make sure it is rebuilding:

    mdadm -D /dev/md0
    mdadm -D /dev/md1

    The system showed it was rebuilding the arrays, and at the current rate it was going to take about a day.
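
    While it rebuilds, the progress can be watched without re-running mdadm by hand, and if the box is otherwise idle the md rebuild speed floor can be raised. These are the standard md sysctls; the value below is just an example:

    watch -n 30 cat /proc/mdstat                    # live progress, speed and ETA
    sudo sysctl dev.raid.speed_limit_min            # current minimum rebuild speed (KB/s per device)
    sudo sysctl -w dev.raid.speed_limit_min=100000  # example: raise the floor to ~100 MB/s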

    The next day I went to check the status, and lo and behold, I found that disk 5 (sde) had failed and was no longer reporting in. I had been shipped a bad disk. So I contacted NewEgg and they sent out a replacement as soon as I returned the failed disk. Luckily it was the hot spare, so removing it and adding it back had no impact on the system, but I did run the following commands to remove the spare from the arrays and then re-scanned the bus so that the disk was fully removed from the server:

    sudo mdadm --remove /dev/md0 /dev/sde2
    sudo mdadm --remove /dev/md1 /dev/sde3
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host0/scan
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host1/scan
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host2/scan
    echo "0 0 0" | sudo tee /sys/class/scsi_host/host3/scan
    sudo mdadm -D /dev/md0
    sudo mdadm -D /dev/md1
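
    If the device node for a pulled disk ever lingers after the rescan, the kernel can also be told explicitly that it is gone; only do this for the disk that was actually removed:

    echo 1 | sudo tee /sys/block/sde/device/delete   # detach sde from the SCSI layer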

    mdadm reported that there was no longer a spare available, and the listing of the /dev directory no longer showed /dev/sde. A week later, I got my new spare from NewEgg, installed it, and ran the following:

    sudo -i
    echo "0 0 0" >/sys/class/scsi_host/host0/scan
    echo "0 0 0" >/sys/class/scsi_host/host1/scan
    echo "0 0 0" >/sys/class/scsi_host/host2/scan
    echo "0 0 0" >/sys/class/scsi_host/host3/scan
    ls /dev
    sfdisk /dev/sde < partitions.txt
    ls /dev
    mdadm --add-spare /dev/md0 /dev/sde2
    mdadm --add-spare /dev/md1 /dev/sde3
    mdadm -D /dev/md0
    mdadm -D /dev/md1

    This detected the disk and added it as a hot spare on both arrays. Since it's a hot spare, it does not need to resync.
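
    A quick way to confirm the spare registered on both arrays:

    cat /proc/mdstat                         # spares are listed with an (S) suffix
    sudo mdadm -D /dev/md0 | grep -i spare   # or check the detailed output per array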

    And there you have it, how to replace failed disks in an MD RAID on Ubuntu.

  • Growing Ubuntu LVM After Install

    Hello everyone. I hope you have all been well.

    I have a new blog entry on something I just noticed today.

    So I typically don't use LVM in my Linux virtual machines, mainly because I have had some issues in the past trying to migrate VMs from one hypervisor type to another, for example, VMware to KVM or vice versa. I have found that if I use LVM, I run into device-mapping issues, and it takes some work to get the VMs working again after converting the disk image from vmdk to qcow2 or vice versa.

    However, since I don't plan on doing that anymore (I'm sticking with KVM/QEMU for the time being), I have started using LVM again, since I like how easy it is to grow the volume if I have to in the future. While growing a disk image is fairly easy, growing a plain /dev/vda or /dev/sda partition is a little cumbersome, usually requiring me to boot the VM with a tool like PMagic or the Ubuntu install media, use gparted to resize the partition, and then reboot back into the VM after successfully growing it.

    With LVM, this is much simpler: three commands and I'm done, no reboot needed. Those commands are:

    • pvdisplay
    • lvextend
    • resize2fs

    Now, one thing I have noticed after a fresh install of Ubuntu Server 22.04.2 with LVM is that the installer doesn't use all of the disk for the root volume. I noticed this right after installing: I ran df -h and saw that my / filesystem was at 32% and only 23GB, even though I built the VM with a 50G hard drive. I then ran:

    sudo pvdisplay

    Sure enough, the physical volume was 46GB in size, with free space that the installer had never allocated to the logical volume. I then ran:

    sudo lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv

    This command extended my logical volume into all of the remaining free space. Next, I grew the file system to use the new space:

    sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv

    I then ran df -h again, and lo and behold, my / filesystem now shows 46GB and 16% used instead of 32%.
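
    For the future-growth case I mentioned above, where the virtual disk itself gets bigger rather than the installer leaving space unallocated, the flow is almost the same with two extra steps up front. A rough sketch, assuming a hypothetical qcow2 file name, a /dev/vda disk with the LVM partition on vda3, and an ext4 root (check lsblk on your own VM first):

    qemu-img resize ubuntu-vm.qcow2 +10G        # on the KVM host, with the VM shut down

    sudo growpart /dev/vda 3                    # inside the guest: grow partition 3 (growpart is in cloud-guest-utils)
    sudo pvresize /dev/vda3                     # let LVM see the larger partition
    sudo lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
    sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv   # an XFS root would use xfs_growfs / instead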

    I hope this helps anyone else!

  • Building ONIE with DUE

    Howdy everyone, been a while since I’ve had a post but this one is long overdue.

    I'm still working in networking, and every once in a while I need to update the ONIE software on a switch, or even create a KVM version for GNS3 so that I can test the latest versions of NOSes.

    Well, a lot has changed and improved since I last had to do this. ONIE now has a build environment using DUE, or Dedicated User Environment. Cumulus created it, and it is in the APT repos for Ubuntu and Debian. This makes building much easier, because trying to set up a build machine with the current procedure from OCP's GitHub repo is completely broken. It still calls for Debian 9, and most of the servers hosting its packages have been retired since Debian 9 went EOL. I tried with Debian 10, only to find packages that are no longer supported. So I found out about DUE, had some issues with that as well, but after much searching and reading I finally found a way to build ONIE images successfully and consistently.

    Just a slight caution: at the rate ONIE changes, this procedure may change again. I will either update this blog or create a new one when necessary.

    So, let's get to building!

    The first thing I did was install Docker and DUE on my Ubuntu 22.04.4 server:

    sudo apt update
    sudo apt install docker.io
    sudo usermod -aG docker $USER
    logout

    I then log back in to the server so that my new group membership takes effect, and install DUE:

    sudo apt update
    sudo apt install due

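    Before going any further, it doesn't hurt to confirm that Docker now works without sudo and that due landed on the PATH (hello-world is Docker's standard test image):

    docker run --rm hello-world   # should pull and run without sudo after the group change
    command -v due                # should print the path to the due executable
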
    I then installed the ONIE DUE environment for Debian 10. From my research this one is the most stable and worked the best for me:

    due --create --from debian:10 --description "ONIE Build Debian 10" --name onie-build-debian-10 \
    --prompt ONIE-10 --tag onie --use-template onie

    This downloads and sets up the build environment to build ONIE based on Cumulus's best practices. Once this process is complete, we get into the environment with the following command:

    due --run -i due-onie-build-debian-10:onie --dockerarg --privileged

    You are now in a Docker container running Debian 10 that has the prerequisites for building ONIE already installed. Now we need to clone the ONIE repo from GitHub and tweak a few settings to make sure the build goes smoothly.

    mkdir src
    cd src
    git clone https://github.com/opencomputeproject/onie.git

    I then update the global git config with my email address and name, so that during the build, when it pulls in other repos, it doesn't choke and tell me to set them later:

     git config --global user.email "wililupy@lucaswilliams.net"
     git config --global user.name "Lucas Williams"

    So, I am building a KVM instance of ONIE for testing in GNS3. The first thing I need to do is build the signing keys:

    cd onie/build-config/
    make signing-keys-install MACHINE=kvm_x86_64
    make -j4 MACHINE=kvm_x86_64 shim-self-sign
    make -j4 MACHINE=kvm_x86_64 shim
    make -j4 MACHINE=kvm_x86_64 shim-self-sign
    make -j4 MACHINE=kvm_x86_64 shim

    I had to run shim-self-sign a second time after the shim target to create the self-signed shims, and then run shim again to install the signed shims into the correct directory, so that the ONIE build would get past the missing shim files.

    Now we are ready to actually build the KVM ONIE image.

     make -j4 MACHINE=kvm_x86_64 all

    Now, I'm not sure if this is a bug or what, but I actually had to re-run the previous command about 10 times, because each run stopped before the build had actually completed. I would just press the up arrow to re-run the previous command, and I did this until I got the following output:

    Added to ISO image: directory '/'='/home/wililupy/src/onie/build/kvm_x86_64-r0/recovery/iso-sysroot'
    Created: /home/wililupy/src/onie/build/images/onie-updater-x86_64-kvm_x86_64-r0
    === Finished making onie-x86_64-kvm_x86_64-r0 master-06121636-dirty ===
    $
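
    In hindsight, instead of pressing the up arrow a dozen times, the retry could be scripted. A rough sketch of my own (not part of the official procedure) that re-runs make until the updater image shows up, capped at 15 attempts:

    attempts=0
    until [ -e ../build/images/onie-updater-x86_64-kvm_x86_64-r0 ] || [ "$attempts" -ge 15 ]; do
        make -j4 MACHINE=kvm_x86_64 all
        attempts=$((attempts + 1))
    done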

    I then ran ls ../build/images to verify that my recovery ISO file was there:

    $ ls ../build/images
    kvm_x86_64-r0.initrd       kvm_x86_64-r0.vmlinuz.unsigned
    kvm_x86_64-r0.initrd.sig   onie-recovery-x86_64-kvm_x86_64-r0.iso
    kvm_x86_64-r0.vmlinuz      onie-updater-x86_64-kvm_x86_64-r0
    kvm_x86_64-r0.vmlinuz.sig
    $

    I then logged out of the DUE environment, and the ISO was in my home directory at src/onie/build/images/onie-recovery-x86_64-kvm_x86_64-r0.iso. From there I uploaded it to my GNS3 server, created a new ONIE template, mapped the ISO as the CD-ROM, and created a blank qcow2 hard disk image for the recovery ISO to install ONIE onto for use in GNS3.
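
    For the blank hard disk image, something like this on the GNS3 server is all it takes (the file name and size are arbitrary):

    qemu-img create -f qcow2 onie-kvm_x86_64.qcow2 10G   # empty disk for the recovery ISO to install ONIE onto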

    One thing to note is that this procedure builds the KVM version of ONIE. To build for other platforms, just change the MACHINE= variable to whatever platform you are building for.
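
    For vendor hardware the machine definition usually lives under machine/<vendor> in the repo, in which case MACHINEROOT has to point at it as well. A hypothetical example (check the machine/ tree for the exact vendor and machine names):

    make -j4 MACHINEROOT=../machine/accton MACHINE=accton_as7712_32x all   # example vendor build; names must match the repo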

    Good luck and let me know in the comments if this worked for you.