Thursday, February 3, 2011

Should I worry about the integrity of my Linux software RAID5 after a crash or kernel panic?

I have a dual-core Intel i5 Ubuntu Server 10.04 LTS system running kernel 2.6.32-22-server #33-Ubuntu SMP with three 1TB SATA hard drives set up in a RAID5 array using Linux md devices. I have read about the RAID5 write hole and am concerned: if my Linux system locks up or kernel panics, should I assume that the integrity of my data has been compromised and restore from backup? How can I know if the data on the RAID5 array is "safe"?

EDIT: Output of mdadm --detail:

root@chef:/var/lib/vmware# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Thu May 27 04:03:01 2010
     Raid Level : raid5
     Array Size : 1953521536 (1863.02 GiB 2000.41 GB)
  Used Dev Size : 976760768 (931.51 GiB 1000.20 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Jun  7 19:12:07 2010
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 34bc9cc3:02783ea4:65f2b931:77c8854b
         Events : 0.688611

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
  • You should probably be more concerned about why your system crashed or kernel panicked.

    RAID cards these days do an extremely good job of using cache to their advantage, and this significantly reduces the likelihood of a "hole." If there were something in particular I was paranoid about, I'd set up a tripwire-like system (see link below) for detecting corruption in my key files.
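    A tripwire-like check can be sketched in a few lines of Python. This is only a minimal illustration, not Tripwire itself: the function names (sha256_of, build_manifest, compare_manifests) are hypothetical, and it assumes the manifest is saved somewhere off the array so that it cannot be damaged along with the data.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_manifests(old, new):
    """Report files whose digest changed, that disappeared, or that are new."""
    return {
        "changed": sorted(p for p in old if p in new and old[p] != new[p]),
        "missing": sorted(p for p in old if p not in new),
        "added": sorted(p for p in new if p not in old),
    }
```

    Build a manifest while the system is known to be healthy, build another after a crash, and compare the two. A "changed" entry only means the contents differ, so files that were legitimately modified in between will show up as well.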

    As for actually testing for corruption, see http://linas.org/linux/raid.html. Most of the tools listed on that website under "General System Corruption" should do the trick for 99% of corruption.
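    For a Linux md array specifically, the kernel exposes a parity consistency check through sysfs (the sync_action and mismatch_cnt files under the array's md directory). The sketch below is a hedged illustration of reading and triggering it. The helper names are hypothetical, and the directory is passed in as a parameter. It would presumably be /sys/block/md0/md for the array in the question.

```python
import os


def read_md_status(md_sysfs_dir):
    """Read the current sync action and mismatch count for an md array."""

    def read(name):
        with open(os.path.join(md_sysfs_dir, name)) as f:
            return f.read().strip()

    return {
        "sync_action": read("sync_action"),
        "mismatch_cnt": int(read("mismatch_cnt")),
    }


def request_check(md_sysfs_dir):
    """Ask md to start a read-only consistency check (needs root on a real array)."""
    with open(os.path.join(md_sysfs_dir, "sync_action"), "w") as f:
        f.write("check\n")
```

    Once sync_action returns to "idle", a mismatch_cnt of 0 suggests that data and parity agree. On RAID5 a nonzero count cannot say which block is the wrong one, so this complements file-level checks and backups rather than replacing them. Expect a full check of 1TB drives to take hours.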

    Josh : Thanks. I will be posting a second question about why the system locked up -- but I believe it was a one-time thing. The system does not have a hardware RAID card, which is why I was concerned about the RAID5 write hole.
    GruffTech : Ahhh, software RAID is another story. I don't have any negative experiences with software RAID myself. However, I don't trust it because it's at the OS level, which is susceptible to a whole other level of problems. I strongly suggest getting a hardware RAID card if the data is mission critical.
    Josh : @GruffTech: Thanks. Hardware raid is not in the budget for this server at this point, so I'm resorting to rigorous backups. That's why I asked about how much I can trust software RAID5. The link you provided was very useful. Thanks!
    From GruffTech
