On 6 March 2006, iocean sent an alert reporting a probable failure in its RAID disk system:
==============================================================
A problem has been detected on this server.
Status Summary
Reason(s) for notification:
Drives
Server:
Host : iocean
Model : RackMac3,1
Uptime : 67016 minutes
OS version : Mac OS X Server 10.4.4 (8G32)
Processor : 2 x 2000 MHz
Memory : 1024 MB
BootROM : $0005.17f1
Serial : QP41703XPNK
Memory:
Memory Slot "DIMM0/J11" : 512MB, ECC DDR SDRAM, PC3200U-30330
Memory Slot "DIMM1/J12" : 512MB, ECC DDR SDRAM, PC3200U-30330
Drives:
Drive 1 (disk2) : Normal
Drive 2 (disk1) : Normal
Drive 3 (disk0) : Warning
==============================================================
Over the few days leading up to the failure warning, a number of users reported slow database performance in some web applications and services provided by iocean (Hlab Forum, etc.). The data volume on iocean (/Volumes/iodata) is a RAID 1 (mirrored) disk set built from two of the server’s three disks, disk0 and disk2. The third disk, disk1, is a standalone volume, ioceanHD, which contains the operating system and applications. The RAID volume, iodata, is a software RAID.
iocean is under AppleCare warranty, which provides hardware support for up to three years after purchase. We ordered a new disk from Apple through Upper-Campus Tech Services, and it arrived late in the afternoon on 7 March. We replaced the failed disk with the new disk the next morning, 8 March, and tried to start rebuilding the RAID array using the Disk Utility (GUI) tool on iocean, which is normally done by dragging the new disk icon into the RAID window. However, this didn’t work, and the GUI tool gave no indication of why it failed. We next turned to the command-line tool, ‘diskutil’, to diagnose the problem, using the following command:
===========================================
iocean:~ jrw$ diskutil list
/dev/disk0
#: type name size identifier
0: Apple_partition_scheme *233.8 GB disk0
1: Apple_partition_map 31.5 KB disk0s1
2: Apple_Boot 128.0 MB disk0s2
3: Apple_RAID 233.6 GB disk0s3
/dev/disk1
#: type name size identifier
0: Apple_partition_scheme *76.7 GB disk1
1: Apple_partition_map 31.5 KB disk1s1
2: Apple_HFS ioceanHD 76.6 GB disk1s3
/dev/disk2
#: type name size identifier
0: Apple_partition_scheme *233.8 GB disk2
1: Apple_partition_map 31.5 KB disk2s1
2: Apple_Boot 512.0 KB disk2s2
3: Apple_RAID 233.8 GB disk2s3
/dev/disk3
#: type name size identifier
0: Apple_HFSX iodata *233.6 GB disk3
===========================================
This showed that the old disk, ‘disk2’ (formatted in Panther), had a slightly different partition map than the new disk, ‘disk0’. The old disk had a smaller “Apple_Boot” partition (disk2s2, 512.0 KB) than the new disk formatted with Apple Disk Utility (disk0s2, 128.0 MB), so its main data partition (disk2s3, 233.8 GB) was larger than the corresponding partition on the new disk (disk0s3, 233.6 GB). The software RAID application won’t allow RAIDs with disks of dissimilar partition maps.
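The comparison described above can also be scripted rather than read by eye. The following is only a minimal sketch: the function names slice_size and layouts_match are made up for illustration, the sample rows are copied from the listing above, and the parsing assumes that the last three fields of each row are size, unit and identifier.

```shell
#!/bin/sh
# Rows copied from the `diskutil list` output above (disk0 = new, disk2 = old).
listing='/dev/disk0
   0: Apple_partition_scheme *233.8 GB disk0
   1: Apple_partition_map 31.5 KB disk0s1
   2: Apple_Boot 128.0 MB disk0s2
   3: Apple_RAID 233.6 GB disk0s3
/dev/disk2
   0: Apple_partition_scheme *233.8 GB disk2
   1: Apple_partition_map 31.5 KB disk2s1
   2: Apple_Boot 512.0 KB disk2s2
   3: Apple_RAID 233.8 GB disk2s3'

# Hypothetical helper: read a listing on stdin and print "<size> <unit>"
# for the partition of type $1 that belongs to whole disk $2.
slice_size() {
    awk -v ptype="$1" -v disk="$2" '
        $2 == ptype && index($NF, disk "s") == 1 { print $(NF - 2), $(NF - 1) }'
}

# Hypothetical helper: compare the Apple_Boot and Apple_RAID slices of two
# disks. Prints every difference and returns non-zero if any was found.
layouts_match() {
    status=0
    for ptype in Apple_Boot Apple_RAID; do
        a=$(printf '%s\n' "$1" | slice_size "$ptype" "$2")
        b=$(printf '%s\n' "$1" | slice_size "$ptype" "$3")
        if [ "$a" != "$b" ]; then
            echo "$ptype: $2 has $a, $3 has $b"
            status=1
        fi
    done
    return $status
}

layouts_match "$listing" disk0 disk2 || echo "partition maps differ"
```

On a live system the listing would come from running diskutil list instead of the embedded sample. The check only compares the sizes as printed, which is enough to catch the mismatch seen here.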
The solution:
Create a single filesystem disk out of the new disk using the Disk Utility partitioning option (HFS+ w/Case Insensitivity and Journaling).
Copy the current filesystem running on the remaining disk of the degraded RAID set over to the newly formatted disk using Carbon Copy Cloner (or the Restore tab of Disk Utility).
Create a new unpaired mirror RAID set on the new disk, using the “enableRAID” command of the command-line ‘diskutil’ application.
Delete the old RAID array using either Disk Utility (GUI) or diskutil (command line). Do this with extreme caution, because all of the data on this disk will be erased: make sure everything on it has been copied to the new unpaired RAID array before taking this step.
Repartition the old disk (the remaining good disk in the old RAID array) so that it matches the partition map of the new disk.
Incorporate the newly partitioned old disk into the newly established RAID array, either with the “repairMirror” option of the command-line ‘diskutil’ application or by dragging the disk into the new RAID set in Disk Utility.
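For the two steps that name diskutil commands, the command lines might look roughly like the sketch below. It is a dry run that only prints the commands. The function name print_rebuild_plan and all device identifiers are hypothetical placeholders, and the argument order is an assumption based on the Tiger-era diskutil usage text, so check man diskutil on the server (including whether a whole-disk or slice identifier is expected) before running anything.

```shell
#!/bin/sh
# Dry run only: prints the diskutil command lines instead of executing them.
# All identifiers passed in are placeholders, not the real iocean devices.
print_rebuild_plan() {
    new_volume=$1   # single HFS+ volume created on the replacement disk
    raid_set=$2     # identifier of the unpaired mirror once it exists
    old_disk=$3     # surviving old disk, already repartitioned to match

    # Assumed syntax: enableRAID mirror <device>
    echo "diskutil enableRAID mirror $new_volume"
    # Assumed syntax: repairMirror <raidDisk> <newDisk>
    echo "diskutil repairMirror $raid_set $old_disk"
}

print_rebuild_plan disk0s3 disk4 disk2
```

The copy, delete and repartition steps are left out on purpose, since the text above does not name commands for them and deleting the old array is destructive.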
The RAID repair, or rebuild, is run as a background process, which means that the computer continues to function online, though with somewhat degraded performance, throughout the rebuild. All user activity should be unaffected. If the problem with the different partition maps hadn’t cropped up, the entire process outlined above could have happened without ever taking the system offline. The physical disks are “hot-swappable” (they can be removed and inserted without taking the system offline) so they can be replaced and the RAID rebuilt without a break in service.