July 2016

Boot a Rescue CD in OpenStack

The other day, a colleague of mine was greeted with an unpleasant surprise when one of his instances booted:

[Screenshot: the GRUB prompt]

For those who may be unfamiliar, this is the shell GRUB drops you into if for some reason it wasn’t able to load your kernel or its own configuration.  Further incantations indicated that the root file system (where /boot was located) was sufficiently damaged that GRUB wouldn’t even recognize it as ext4.  At this point, I figured the instance was probably toast.  But, if this were a physical piece of hardware (or even just a local virtual machine) I’d usually have a few more tricks up my sleeve before calling it quits.

What I really wanted to do was run fsck on it just to see if there was any chance it could be repaired.  But how?

(NOTE: If your system is booting okay, or at least getting to a GRUB menu, you can get to the GRUB shell prompt by pressing ‘c’.  If you don’t see the menu at all, you may have to adjust the settings in /boot/grub/grub.cfg.)

A Very Roundabout Solution

My first thought was something along the following lines (I won’t go into detail, because I didn’t actually do this and there is a better solution below):

  1. Take a snapshot of the failed instance
  2. Download the snapshot from the glance service
  3. Convert the snapshot to a raw image using qemu-img convert
  4. Expose the partitions of the raw image using loop devices via the kpartx tool
  5. Run e2fsck on the now exposed root file system, hopefully correcting the problems
  6. Remove the kpartx mappings
  7. Convert back to the source image format
  8. Upload back into glance
  9. Rebuild the instance from this image
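
For the curious, those steps might look roughly like the sketch below.  It is untested and rests on several assumptions: the unified openstack command-line client, a snapshot whose format qemu-img can detect on its own, qcow2 as the format to convert back to, and a root file system that kpartx exposes under the mapper name you pass in (loop0p1 is a typical value, not a given).  The function name and file names are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: repair an instance's root file system via a snapshot.
# Run as root on a machine with the openstack client, qemu-img, kpartx and
# e2fsprogs installed. All names below are illustrative assumptions.
repair_via_snapshot() {
    server="$1"              # name or ID of the broken instance
    part="$2"                # mapper name of the root partition, e.g. loop0p1
    snap="${server}-snap"
    fixed="${server}-fixed"

    # Steps 1-2: snapshot the instance and download the snapshot from Glance
    openstack server image create --wait --name "$snap" "$server" || return 1
    openstack image save --file snap.img "$snap" || return 1

    # Step 3: convert to raw (qemu-img detects the source format itself)
    qemu-img convert -O raw snap.img snap.raw || return 1

    # Step 4: expose the partitions under /dev/mapper
    kpartx -av snap.raw || return 1

    # Step 5: check and repair; e2fsck exits below 4 when the file system
    # is clean or was repaired, and 4 or higher when problems remain
    rc=0
    e2fsck -f "/dev/mapper/$part" || rc=$?

    # Step 6: remove the mappings whether or not the repair succeeded
    kpartx -d snap.raw
    [ "$rc" -lt 4 ] || return "$rc"

    # Steps 7-9: convert back (qcow2 assumed), upload, and rebuild
    qemu-img convert -O qcow2 snap.raw fixed.qcow2 || return 1
    openstack image create --disk-format qcow2 --container-format bare \
        --file fixed.qcow2 "$fixed" || return 1
    openstack server rebuild --image "$fixed" "$server"
}
```

It would be called as something like: repair_via_snapshot myinstance loop0p1.  Even as a sketch, it shows how many moving parts are involved.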

And it might have worked… but that’s an awful lot of shuffling around, network transfer, and disk space.  And it doesn’t mirror very well how I’d try to tackle the same problem outside of OpenStack.  As you’ve probably guessed from the title of this post, I rather preferred the idea of booting a rescue CD.

Figuring Out How to Boot an ISO

So, I started by grabbing my favorite rescue CD, SystemRescueCd, and downloading the latest version, which as of this writing was 4.8.0.  Since I could at least get to GRUB, I was hoping that I could find a way to chainload into the rescue CD.  That’s a sound idea in theory, but how do you get it there in the first place?  After all, you can’t simply create and attach a virtual CD drive to an existing instance (or, at least, I couldn’t find a way to do so).  Along the way I did discover that you can create a new instance by booting from an ISO.  While interesting, that won’t help us much here.

Then what options does that leave us?  Well, assuming you’ve set up the Cinder block storage service, it’s possible to dump the rescue CD onto a volume and boot from that.  Well… except it’s an ISO, and the mechanics of booting from a CD are different from those of booting from a hard disk.  So you can’t simply dump an ISO onto a hard disk and expect it to boot (and yes, while trying to figure this out I did try it “just in case”).

Looking around a bit for what to do, I stumbled across an interesting tool that’s part of syslinux called isohybrid. The isohybrid program basically adds an MBR to the ISO to make it bootable as a hard disk.  It’s mentioned mostly in the context of taking ISOs with isolinux and turning them into bootable USB sticks, but whether or not it’s USB is inconsequential.  The image was created as follows:
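
A minimal sketch of that step, assuming the syslinux utilities are installed and that the download is named systemrescuecd-x86-4.8.0.iso (the file names and the function name are assumptions for the example).  Since isohybrid rewrites the image in place, the sketch works on a copy:

```shell
#!/bin/sh
# Hypothetical helper: make a hard-disk-bootable copy of an isolinux ISO.
make_hybrid() {
    src="$1"     # original ISO, left untouched
    dst="$2"     # hybrid copy to create
    cp "$src" "$dst" || return 1
    # isohybrid modifies the image in place, adding an MBR
    isohybrid "$dst"
}
```

For example: make_hybrid systemrescuecd-x86-4.8.0.iso sysrescue-hybrid.iso.  Running fdisk -l on the original and on the copy is one way to compare the two.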

You can see the additional bits that have been added here.  Now that we have something we can boot, let’s walk through actually doing it.

Putting It Together and Booting the Rescue CD

First, upload the new hybrid ISO image:
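
Something along these lines should do it with the unified openstack client.  The image name, the function name, and the choice of raw as the disk format are assumptions for the example:

```shell
#!/bin/sh
# Hypothetical helper: upload the hybrid ISO to Glance as a raw disk image.
upload_rescue_image() {
    file="$1"    # hybrid ISO produced by isohybrid
    name="$2"    # image name to register in Glance
    # Disk format "raw" is an assumption: the hybrid image is treated as a
    # plain disk image to be copied byte for byte
    openstack image create --disk-format raw --container-format bare \
        --file "$file" "$name"
}
```

For example: upload_rescue_image sysrescue-hybrid.iso sysrescue-hybrid.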

then create a volume containing that image and attach it to the instance in question:
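
A sketch of those two operations, again with the unified openstack client.  The volume name, the 1 GB default size (which assumes the ISO fits), and the function name are assumptions for the example:

```shell
#!/bin/sh
# Hypothetical helper: build a volume from the rescue image and attach it.
attach_rescue_volume() {
    image="$1"               # Glance image holding the hybrid ISO
    server="$2"              # instance to rescue
    volume="${3:-rescue-cd}" # volume name (made up for this example)
    size_gb="${4:-1}"        # must be at least as large as the ISO

    openstack volume create --image "$image" --size "$size_gb" "$volume" || return 1

    # Copying the image into the volume takes a moment; poll until it is usable
    while :; do
        status=$(openstack volume show -f value -c status "$volume") || return 1
        case "$status" in
            available) break ;;
            error)     return 1 ;;
        esac
        sleep 5
    done

    openstack server add volume "$server" "$volume"
}
```

For example: attach_rescue_volume sysrescue-hybrid myinstance.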

Then boot the instance and from here we can chainload our new volume:

[Screenshot: chainloading the volume from the GRUB prompt]
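
In text form, a grub-legacy chainload typically looks like the following.  This assumes the newly attached volume shows up as the second disk, (hd1); if it appears elsewhere, adjust the device accordingly.  Under GRUB 2, the first line would be set root=(hd1) instead.

```text
rootnoverify (hd1)
chainloader +1
boot
```

Here rootnoverify selects the disk without trying to mount a file system on it, and chainloader +1 loads its first sector, which is the MBR that isohybrid added.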

Then hit enter, and…

[Screenshot: SystemRescueCd booting]

Tada! You’ve now successfully booted into the rescue CD and can do whatever dirty work you need to do.  In my case, it was a happy ending.  I was able to run e2fsck on the broken file system, and after repairing lots of problems, the file system was mountable again and the system booted successfully.
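
From the rescue CD’s shell, the check and a test mount might look like this sketch.  The device name /dev/vda1, the mount point, and the function name are assumptions; check which device actually holds the damaged file system before running anything against it.

```shell
#!/bin/sh
# Hypothetical helper, run as root from the rescue CD's shell.
check_root_fs() {
    dev="$1"                   # broken root partition, e.g. /dev/vda1 (assumed)
    mnt="${2:-/mnt/rootfs}"    # where to test-mount it afterwards

    # Force a full check; exit codes below 4 mean clean or repaired
    rc=0
    e2fsck -f "$dev" || rc=$?
    [ "$rc" -lt 4 ] || return "$rc"

    # If the repair worked, the file system should mount again
    mkdir -p "$mnt" && mount "$dev" "$mnt"
}
```

For example: check_root_fs /dev/vda1.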

Conclusions and Caveats

I can’t help but note some caveats about doing this.  In particular:

  • This is definitely not appropriate for a production environment.  If your file system is that damaged, you should restore from a backup (you did remember to take backups, right?).
  • These instructions are written around grub-legacy (old!).  The procedure is likely similar for grub-pc.

But this could come in handy in a pinch, as it did for me, or for experimenting with changes that can’t be made while a system is booted.