
Hardware RAID HDD Controllers and Linux

This is not a general disk controller guide, but rather the story of us trying to get a simple set of 5 SCSI disks working as one RAID5 array, with Linux as the OS, with reasonable write performance and stability:

In the Beginning

There was an Intel Xeon server (1012MB RAM, 1 Xeon at 2791MHz) with 5 SCSI disks and an Adaptec 2120S hardware RAID controller.

The Adaptec 2120S card was configured to present a RAID5 array of the 5 SCSI disks to the OS, and Red Hat 9.0 was installed. A simple test, writing one large file to the array (ext3), was done:

> time dd if=/dev/zero of=32gb bs=1024k count=32768
32768+0 records in
32768+0 records out
 
 real    143m50.650s
 user    0m0.170s
 sys     1m33.840s

32768/(143*60+50) = 3.7969 MB/sec. The machine was utterly unresponsive for all that time.
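The throughput figures quoted here are simply megabytes written divided by dd's elapsed time; an awk one-liner does the arithmetic (the 32768MB / 143m50.650s numbers are from the Adaptec run above):

```shell
# Throughput of the Adaptec run: 32768 MB written in 143m50.650s.
awk 'BEGIN { printf "%.2f MB/sec\n", 32768 / (143*60 + 50.650) }'
```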

Just for comparison:

A Sun V240, with s/w RAID1+0 over four disks and two SCSI buses, and
ufs+logging (the Linux test was ext3) 

bash-2.05$ time dd if=/dev/zero of=32gb bs=1024k count=32768
32768+0 records in
32768+0 records out

real    6m49.461s
user    0m0.280s
sys     4m34.120s

(80MB/s)

and no discernible change to interactive performance.

Turning on write-back caching on the Adaptec 2120S yielded about 15MB/sec, still with terrible interactive performance.

It turns out the Linux 2.4 kernel was not so good at I/O, and the 2.6 kernel has much improved disk I/O: Linux: Anticipatory I/O Scheduler
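On a 2.6 kernel the active I/O scheduler is visible in /sys/block/&lt;dev&gt;/queue/scheduler, with the current choice shown in square brackets. A small sketch that reads it (the device name sda and the sample line are assumptions; the sample is used when the sysfs file is absent, so the parsing is still visible):

```shell
#!/bin/sh
# /sys/block/sda/queue/scheduler prints something like:
#   noop [anticipatory] deadline cfq
# with the active scheduler in square brackets. Read it if present,
# otherwise demonstrate the parsing on a sample line.
line=$(cat /sys/block/sda/queue/scheduler 2>/dev/null)
[ -n "$line" ] || line="noop [anticipatory] deadline cfq"
active=$(echo "$line" | sed 's/.*\[\(.*\)\].*/\1/')
echo "active scheduler: $active"
# To change it at run time (as root):
#   echo deadline > /sys/block/sda/queue/scheduler
```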

Suse 9.1 and a 2.6 kernel

Installed Suse 9.1 on the Intel computer with the 2120S. Things looked good: the same I/O test (time dd if=/dev/zero of=32gb bs=1024k count=32768) yielded about 36MB/sec, and the interactive performance of the computer during the write test was fine.

Then we tried a multiple file write test:

#!/bin/sh
# Copy a kernel source tree to the array 40 times, repeat for 20 rounds,
# logging the start of each round.

total=1

while [ "$total" -le 20 ]
do
        date >> /root/io_test.log
        rm -rf /data1/*
        count=1
        while [ "$count" -le 40 ]
        do
                cp -r /usr/src/linux-2.6.5-7.111 /data1/thing$count
                count=$((count+1))
        done
        total=$((total+1))
done

This caused the aacraid driver to fall over very quickly, with lots of errors like this:

Sep 9 09:52:39 tcmis kernel: end_request: I/O error, dev sda, sector 476263772
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935884
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935885
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3

In summary we tried:

  • upgrade firmware -> slightly improved performance (39MB/sec), but still unstable
  • compile latest aacraid driver -> did not compile against the 2.6 kernel
  • get hold of the latest (alpha) aacraid driver code -> compiled and ran, but still unstable
  • install Windows 2000 onto the computer -> 40MB/sec write performance, interactivity was fine AND it was stable under a similar multiple file write test

You can read more about aacraid problems and Adaptec disk controllers here.

So we think the hardware is OK, and aacraid is the problem. A fair bit of googling then turned up LSI. We obtained an LSI Megaraid controller (Megaraid SCSI 320-1) for evaluation and got 39MB/sec write performance (same test). We then ran the multiple file write test for 16 hours and the machine was stable. Interactive performance during both tests was also good. We then bought a Megaraid 320-1 card AND the wee on-card battery for the write-back cache. Write performance went up to 50MB/sec with the write-back cache switched on. Yes, you can supposedly turn the write-back cache on without the battery, but performance stays the same...

We logged a support call with Adaptec and were eventually told that the aacraid driver does not work with a 2.6 kernel. I now notice that Adaptec released a new aacraid driver in December 2004 that is for Suse 9.1 (approximately 6 months after the release of Suse 9.1)... maybe it works?

Conclusion

LSI Megaraid SCSI 320-1 card works well. Adaptec 2120S card did not.

Further Work with LSI Controller

The raidmon init.d script (it goes by a different name on Suse) was not on the software CD; LSI admitted this. There was an init.d script for other Linux distros, and a quick bit of hacking produced this:

#!/bin/sh
#
# chkconfig: 2345 20 80
# description: RAIDMon is a daemon that monitors the RAID subsystem
#              and generates e-mail to root
# processname: MegaServ

# source function library
. /lib/lsb/init-functions

# Hacking by mr349 01/02/2005

case "$1" in
  start)
        megadevice="megadev0"
        rm -f /dev/$megadevice 2>/dev/null
        megamajor=`cat /proc/devices|gawk '/megadev/{print$1}' `
        mknod /dev/$megadevice c $megamajor 0 2>/dev/null
        # New check: 10-31-01: Does node exist
        if [ ! -c /dev/$megadevice ]
        then
          echo "
                Character Device Node /dev/$megadevice does not exist.
                Raid Monitor could not be started
                "
          exit 1
        fi

        echo -n 'Starting RAID Monitor:'
        startproc /usr/sbin/MegaCtrl -start > /dev/null
        sleep 1 ; MegaCtrl -disMail
        touch /var/lock/subsys/raidmon
        MegaCtrl -enChkCon
        # check consistency on a Saturday at 01:00 every 4 weeks
        MegaCtrl -cons -h01 -w4 -d6
    echo
        ;;
  stop)
        echo -n 'Stopping RAID Monitor:'
        startproc /usr/sbin/MegaCtrl -stop
        megadevice="megadev0"
        rm -f /dev/$megadevice 2>/dev/null
        rm -f /var/lock/subsys/raidmon 2>/dev/null
    echo
        ;;
  restart|reload)
        $0 stop
        $0 start
        ;;
  *)
        echo "RAID Monitor is not Started/Stopped"
        echo "Usage: raidmon {start|stop|restart}"
        exit 1
esac

exit 0

I do not like being emailed about every consistency check that the LSI controller does, so I turned that off (MegaCtrl -disMail), BUT I do want to know when there is a problem with the RAID array. My simple solution is to grep /var/log/megaserv.log every day, and email me if FAIL is found in the log file:

# check for any reported RAID array errors
30 4 * * * /root/bin/check_raid.sh

The script that is run by crontab:

#!/bin/sh

# Simple RAID check script. Relies on the consistency check scheduled by
# /etc/init.d/raidmon (01:00 on Saturdays, every 4 weeks).
# This is run by root's crontab at 04:30 every day.

if grep -iq fail /var/log/megaserv.log
        then date | mail -s "tcmis1 has a RAID array problem" XXXXroot email addressXXXX
fi

exit 0
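One possible refinement (my own suggestion, not something from the setup above): remember which FAIL lines have already been reported, and only send mail when new ones appear, so a persistent fault does not generate a mail every day. The state file location is arbitrary:

```shell
#!/bin/sh
# Variant of the check script that only mails when the set of FAIL lines
# has changed since the last run. State file path is arbitrary; replace
# "root" with your own address.
LOG=/var/log/megaserv.log
STATE=/tmp/check_raid.last
CURR=/tmp/check_raid.curr

grep -i fail "$LOG" > "$CURR" 2>/dev/null
# Mail only if there are failures AND they differ from the last report.
if [ -s "$CURR" ] && ! cmp -s "$CURR" "$STATE" 2>/dev/null; then
        date | mail -s "tcmis1 has a RAID array problem" root
fi
mv "$CURR" "$STATE"
```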

3Ware 9500 8 port SATA controller

Summary: the best I've yet tried for Linux.

RAID5 array of 5 SATA (400GB) disks (a 6th disk is a hot spare) on the 3Ware controller, in a single Xeon (2.8GHz), 1GB memory computer.

Continuous write performance is almost 50MB per second.

The CLI (Command Line Interface) for Linux is excellent: I can do whatever I like to the controller and arrays from the command line, and it is easy to script.
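For example, a cron-able health check along the same lines as the Megaraid one is only a few lines of shell. This is a sketch: the controller/unit names (/c0, /u0) and the status-line format are assumptions; run the 3Ware CLI by hand first (e.g. tw_cli /c0 show) to see your actual layout:

```shell
#!/bin/sh
# Sketch: decide whether a 3Ware status line looks healthy. The line
# format ("/c0/u0 status = OK") is an assumption; check your CLI version.
unit_ok() {
        echo "$1" | grep -q 'OK'
}

# In real use (3Ware CLI on the PATH, controller /c0, unit /u0):
#   unit_ok "$(tw_cli /c0/u0 show status)" || \
#       date | mail -s "3Ware RAID unit /c0/u0 not OK" root

# Demonstration on sample lines:
unit_ok "/c0/u0 status = OK" && echo "healthy"
unit_ok "/c0/u0 status = DEGRADED" || echo "degraded"
```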