I've been using Linux software raid (md) for a very long time - more or less since the beginning - and it's quite honestly been great to me. I've always deployed in a raid5 configuration and never gave much thought to the other levels. Recently, raid10 became available. Not raid1+0, but what is considered (by some) to be a non-standard raid10 implementation which allows an odd number of components to comprise the (usual) raid1 portion. See the Wikipedia article and 'man md' for more details.
Some notes about the hardware and setup:
- the kernel is 2.6.22.12, openSUSE "default" kernel
- the CPU is an AMD x86-64, x2 3600+ in power-saving mode (1000 MHz)
- the motherboard is an EPoX MF570SLI which uses the nVidia MCP55 SATA controller (PCIe)
- the drives are 3 different makes of SATA II, 7200 rpm
- each drive is capable of not more than 75 MiB/s (at best - the outermost tracks) and closer to 70 MiB/s on the portions of the disk involved in these tests
- each drive is partitioned, identically, into 4 partitions. This test involves the third partition, 4 GiB in size, which is 2 GiB from the start.
- the system was largely idle during the tests, although it does run other things
- the system has 1 GiB of RAM
- the raid was created with:
mdadm --create /dev/md2 --level=${level} --raid-devices=3 --spare-devices=0 --layout=${format} --chunk=256 --assume-clean --metadata=1.0 ${DEVICES}
- the deadline I/O scheduler was used on each component drive
- the stripe sizes, caches, queue sizes, and flusher parameters were left at their defaults
- the caches were dropped before each invocation of 'dd':
echo 3 > /proc/sys/vm/drop_caches
- the 'write' portion consisted of this dd invocation:
dd if=/dev/zero of=/dev/md2 bs=256K count=15000 conv=fdatasync
- the 'read' portion consisted of this dd invocation:
dd if=/dev/md2 of=/dev/null bs=256K count=15000
- I did not test any chunk size other than 256K but probably will in the future.
- I can supply the entire test script if necessary (I intend to do this in the future, after some additional refinement).
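The steps listed above can be strung together roughly as follows. This is only a dry-run sketch, not the actual test script: it prints the commands for each pass rather than running them. The `DEVICES` value is a hypothetical placeholder (the partitions are not named above), the `print_pass` helper is made up for illustration, and the `mdadm --stop` teardown between passes is an assumption. raid0 is left out because the table lists no layout for it.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for one test pass instead of executing them.
# DEVICES is a hypothetical placeholder for the three 4 GiB partitions.
DEVICES="/dev/sdX3 /dev/sdY3 /dev/sdZ3"

# print_pass LEVEL FORMAT
# Emits: create array, drop caches, write test, drop caches, read test, stop array.
print_pass() {
    level=$1
    format=$2
    echo "mdadm --create /dev/md2 --level=${level} --raid-devices=3 --spare-devices=0 --layout=${format} --chunk=256 --assume-clean --metadata=1.0 ${DEVICES}"
    echo "echo 3 > /proc/sys/vm/drop_caches"
    echo "dd if=/dev/zero of=/dev/md2 bs=256K count=15000 conv=fdatasync"
    echo "echo 3 > /proc/sys/vm/drop_caches"
    echo "dd if=/dev/md2 of=/dev/null bs=256K count=15000"
    # Assumption: the array is stopped before the next level/layout is created.
    echo "mdadm --stop /dev/md2"
}

for format in left-asymmetric left-symmetric right-asymmetric right-symmetric; do
    print_pass raid5 "$format"
done
for format in n2 o2 f2; do
    print_pass raid10 "$format"
done
```

Printing the plan first makes it easy to review before anything touches a disk; the `mdadm --create` and `dd` write steps destroy whatever is on the target partitions.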
It is also worthwhile to note that these tests involve the block layer and not the filesystem layer, and are not intended to be a my-raid-is-faster-than-your-raid test but instead a (brief) exploration into raid10 on Linux md. I chose to compare it against raid0 and raid5 as these are common and more likely to be well understood.
The Results:
The results are all in MiB/s.
NOTE: As of 2007-Dec-30 I have updated the table (but not the chart, yet) with more representative numbers. I removed some run-time noise and ran each test 3 times, taking the mean.
| level | format | Writing | Reading | Writing (Degraded) | Reading (Degraded) |
|---|---|---|---|---|---|
| raid5 | left-asymmetric | 55 | 129 | 46 | 124 |
| raid5 | left-symmetric | 54 | 123 | 50 | 122 |
| raid5 | right-asymmetric | 54 | 124 | 49 | 124 |
| raid5 | right-symmetric | 54 | 128 | 49 | 116 |
| raid10 | n2 | 103 | 95 | 103 | 104 |
| raid10 | o2 | 102 | 94 | 100 | 102 |
| raid10 | f2 | 97 | 162 | 97 | 51 |
| raid0 | - | 205 | 186 | n/a | n/a |
An alternate table (thanks to suggestions received on the Linux-RAID mailing list), with each value expressed as a multiple of 'x', where 'x' is 70 MiB/s, the speed of one component:
| level | format | Writing | Reading | Writing (Degraded) | Reading (Degraded) |
|---|---|---|---|---|---|
| raid5 | left-asymmetric | 0.8 | 1.8 | 0.7 | 1.8 |
| raid5 | left-symmetric | 0.8 | 1.8 | 0.7 | 1.7 |
| raid5 | right-asymmetric | 0.8 | 1.8 | 0.7 | 1.8 |
| raid5 | right-symmetric | 0.8 | 1.8 | 0.7 | 1.7 |
| raid10 | n2 | 1.5 | 1.4 | 1.4 | 1.4 |
| raid10 | o2 | 1.5 | 1.4 | 1.4 | 1.4 |
| raid10 | f2 | 1.4 | 2.3 | 1.4 | 0.7 |
| raid0 | - | 3.0 | 2.8 | n/a | n/a |
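The normalization behind this second table is just a division by the single-drive speed. A minimal sketch, assuming the 70 MiB/s figure stated above and rounding to one decimal place (the `normalize` helper is a made-up name for illustration); small differences in the last digit against some table entries are possible:

```shell
#!/bin/sh
# normalize MIBS [X]
# Prints a throughput in MiB/s as a multiple of one drive's speed.
# X defaults to 70 MiB/s, the per-component figure assumed in the text.
normalize() {
    awk -v mibs="$1" -v x="${2:-70}" 'BEGIN { printf "%.1f\n", mibs / x }'
}

normalize 162   # raid10,f2 read: prints 2.3
normalize 55    # raid5 left-asymmetric write: prints 0.8
```

A different baseline can be passed as the second argument, for example `normalize 162 75` to use the best-case 75 MiB/s figure instead.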
Chart:
What do these numbers tell us?
(Of course, these observations only apply for 3 drives in this configuration. Caveat, handwaving.)
- Degraded read speed on raid5 is 85-90% of non-degraded. That's pretty good.
- Degraded writing on raid5 is virtually indistinguishable from non-degraded.
- raid10 near and offset performance, reading or writing, degraded or not, is very consistent.
- raid10 far layout has awesome read performance (non-degraded) - I'd ballpark it near raid0 performance. Degraded, however, shows much worse performance. Why?
- raid10 far layout has no discernible performance difference when writing in degraded mode.
Footnotes:
- Whenever possible, always use conv=fdatasync for tests like this. I'll explain why in a future installment.
- I did not use iflag=direct (which sets O_DIRECT).
2 comments:
Suggestion: benchmark random I/O (64k here, 64k there...), not only contiguous reads/writes done on contiguous blocks.
RAID10 reads each fraction of the needed blocks from a different spindle. When you have a single mirror and read B blocks, it reads B/2 blocks from one drive and the rest from the other. In degraded mode there is a missing spindle, hence the read performance loss.
That's true. Some recent benchmarks on the linux-raid mailing list have shown that raid10,f2 outperforms even raid0 when it comes to random I/O (writing), in some cases by quite a bit.
There is also a patch that is being explored that would change the raid10,f2 algorithm to *always* use the outer tracks instead of just the "nearest" tracks of a given disk, but since each disk is constantly switching back and forth between the inner and outer tracks (as each disk is a mirror for some /other/ disk) I wonder if it will help at all.