mdadm

Article from May 19, 2014, last modified on November 12, 2016

Setup RAID1 on Existing OS

My approach to RAID is that I just want a redundant drive in case one fails and I want to use it for storage and not as an OS. So, this setup will be for creating a RAID1 array on an existing Linux OS.

1. Find Your /dev's

First, you need to figure out what your devices are:

$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0    70G  0 disk 
├─sda1   8:1    0    50G  0 part 
└─sda2   8:2    0    20G  0 part
sdb      8:16   0   500G  0 disk
sdc      8:32   0   500G  0 disk

Mine are sdb and sdc.

2. Create Partitions on Your Drives

Next, create partition tables:

$ fdisk /dev/sdb
Command (m for help): n
# then, hit enter a bunch of times
Command (m for help): w

Then, do the same for sdc.
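
As an alternative, if you would rather not step through the prompts, the same thing can be scripted with parted (a sketch, assuming an MBR/msdos label and a single partition spanning each disk):

$ parted -s /dev/sdb mklabel msdos mkpart primary 0% 100%
$ parted -s /dev/sdc mklabel msdos mkpart primary 0% 100%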

3. Create the RAID Drive

Next, create the RAID device as /dev/md0, or whatever device name you want to call it:

$ mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/sdb1 /dev/sdc1
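
mdadm starts building the mirror in the background right away; you can watch the initial sync with:

$ cat /proc/mdstat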

4. Edit the mdadm.conf

Save the array details to mdadm.conf:

$ mdadm -Es | grep md0 >> /etc/mdadm/mdadm.conf
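
What gets appended is a single ARRAY line, something like this (the UUID and name here are placeholders; yours will differ):

ARRAY /dev/md0 metadata=1.2 UUID=a1b2c3d4:e5f6a7b8:c9d0e1f2:a3b4c5d6 name=myhost:0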

5. Update initramfs

$ update-initramfs -u

6. Create the Filesystem

$ mkfs.ext4 /dev/md0
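
Optionally, you can label the filesystem at the same time so it is easier to identify later (the label here just matches the mount point used in the next step):

$ mkfs.ext4 -L bup /dev/md0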

7. Add the Drive to fstab

Add the drive to fstab so it will auto-mount on boot:

/dev/md0 /mnt/bup ext4 defaults,nobootwait,noatime 0 2

Change "/mnt/bup" to whatever path you want the drive to be mounted at. If you are unfamiliar with fstab then the options above should be sufficient. Otherwise, you can look at the mount manual page to see what other options are there.

8. Mount the Drive

$ mount -a

Recovery

Force Mount After Failure

If a drive fails and, after you reboot the PC, the array won't mount, you can force it to run via:

$ mdadm --manage /dev/md0 --run

Ideally you should not do this and should replace the failed drive instead. However, in practice, there may be a need to keep the array running in the meantime.

Adding a New Drive

If a drive fails, power off the machine and replace the drive. How do you find which drive has failed? Since I'm using RAID1, I just boot with one drive connected and check whether this command fails:

$ mdadm -Ds /dev/md0 
mdadm: md device /dev/md0 does not appear to be active.

After replacing the drive and booting, you will need to re-partition the new drive and add it to your array. Here is an example for a RAID1 array of sdb and sdc where sdc has failed:

$ fdisk /dev/sdc

Run through the prompts: it should be "n" for new, then "p" for primary, and "1" for the partition number, then hit "enter" a bunch of times until it prompts you with "Command (m for help):" again. There you can type "p" to confirm the partition was created. Finally, type "w" and hit "enter" to save the new partition table.
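
Alternatively, since the goal is a partition layout identical to the surviving drive, you can copy the partition table from sdb onto the new sdc instead of stepping through fdisk (a sketch, assuming MBR/msdos partition tables; double-check the device names before running it):

$ sfdisk -d /dev/sdb | sfdisk /dev/sdc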

Next, add the drive to the array:

$ mdadm /dev/md0 -a /dev/sdc1

Finally, check the progress of re-syncing:

$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sdc[2] sdb[0]
      2930135360 blocks super 1.2 [2/1] [U_]
      [>....................]  recovery =  0.5% (17077248/2930135360) finish=257.4min speed=188574K/sec
      
unused devices: <none>
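
To keep an eye on the rebuild without re-running the command, you can wrap it in watch:

$ watch cat /proc/mdstat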

Testing Failure

To test your recovery process after you have set up your RAID array:

  1. add some data to your RAID drive
  2. poweroff the machine and unplug one of the drives
  3. poweron the machine and force mount:
    $ mdadm --manage /dev/md0 --run
  4. add some more data to your RAID drive
  5. poweroff the machine and plug back in the other drive
  6. poweron the machine and sync the drives (see the example after this list)
  7. poweroff the machine and unplug the other drive that you never unplugged before
  8. poweron the machine and force mount:
    $ mdadm --manage /dev/md0 --run
  9. check that all the data you added at step 4 is present, which confirms it was synced over at step 6.
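
For step 6, syncing the drives means re-adding the reconnected drive to the array and waiting for the rebuild to finish. Assuming the reconnected drive came back as /dev/sdb with partition sdb1, that would look something like:

$ mdadm /dev/md0 -a /dev/sdb1
$ cat /proc/mdstat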

Now, I need to double-check that this actually works. When I did it the first time I had a bunch of trouble and ended up running mdadm -a /dev/md0 /dev/sdc.

Monitoring

You will likely want to be notified somehow if a drive fails. mdadm has good built-in support for emailing alerts. However, you may want to check manually, or you may not be able to send email from your server (e.g. you're behind an ISP that blocks SMTP).
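
If email does work from your machine, the built-in alerting is configured by adding a MAILADDR line to /etc/mdadm/mdadm.conf (the address below is a placeholder):

MAILADDR you@example.com

You can then test that an alert actually gets delivered with:

$ mdadm --monitor --scan --test --oneshot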

To check the status run:

$ mdadm -Ds /dev/md0

If everything is normal you will see:

 State : clean

If a drive has failed you will see:

 State : clean, degraded

It will also show which drive has failed:

     Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       2       8       32        1      active sync   /dev/sdc

You can also get information from /proc/mdstat, or you can check a specific drive with "mdadm -E /dev/sdb".

From what I understand, this kind of check may need to be run periodically. There is what seems to be a good answer on Stack Exchange that shows a few tips. To check the array while it's running you can run:

$ echo check > /sys/block/md0/md/sync_action

I have not tried this, but there is an additional reference by Thomas Krenn. I've also seen this:

$ /usr/share/mdadm/checkarray --cron --all --idle --quiet

There seems to be some good information here.

The person on the Stack Exchange answer also said that they run the following command in a cron job once a month:

$ ionice -c3 tar c /dir/of/raid/filesystem > /dev/null

The author says,

It’s not a thorough check of the drive itself, but it does force the system to periodically verify that (almost) every file can be read successfully off the disk. Yes, some files are going to be read out of memory cache instead of disk. But I figure that if the file is in memory cache, then it’s successfully been read off disk recently, or is about to be written to disk, and either of those operations will also uncover drive errors.

The author goes on to say that in the three years of using a RAID array it was this command that caught a bad drive, but warns that if you have a large RAID array it will take a long time, estimating 6 hrs per terabyte.

I also have not tested this cron idea.
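
If you did want to try it, here is a sketch of a monthly crontab entry using the /mnt/bup mount point from this setup (adjust the schedule and path to taste):

0 4 1 * * ionice -c3 tar c /mnt/bup > /dev/null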

Statistics

To get statistics such as drive status, re-syncing progress, etc., you have a few options:

$ mdadm -Ds
$ mdadm -D /dev/md0
$ cat /proc/mdstat
$ mdadm --detail /dev/md0

Resyncing

To resync a drive, I *think* you do:

$ umount /dev/md0
$ mdadm --stop /dev/md0
$ mdadm --assemble --run --force --update=resync /dev/md0 /dev/sdb1 /dev/sdc1
 mdadm: /dev/md0 has been started with 2 drives.

But, it should re-sync automatically.

Reference: http://www.thomas-krenn.com/en/wiki/Mdadm_recovery_and_resync#Resync

Further Reading

And, here is a jumble of links I found while debugging that I will keep for future reference:

  • http://www.linuxquestions.org/questions/linux-newbie-8/cannot-add-replacement-drive-mdadm-not-large-enough-to-join-array-4175473258/
  • http://realtechtalk.com/mdadm_devsdb1_not_large_enough_to_join_array_solution-1290-articles
  • http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
  • http://markus.dresch.net/post/3917167813/changing-a-failed-disk-in-software-raid-1
  • http://askubuntu.com/a/240069