Thursday, July 10, 2008

RAID5,6 and 10 Benchmarks on 2.6.25.5

Copyright: Copyright © 2008 Jon Nelson
Date: Jul 2008

This is an expansion of a previous post ( http://pycurious.blogspot.com/2007/12/some-raid10-performance-numbers.html ).

Since that time, I have redeployed using RAID10,f2. The redeployment went very well, but I'm not quite getting the performance I desired. More on that in another post. In the meantime, I slightly enhanced one of my benchmark scripts and decided to give it another go.

1   Hardware and Setup

  • the kernel is 2.6.25.5, openSUSE 11.0 "default" kernel for x86-64
  • the CPU is an AMD x86-64, x2 3600+ in power-saving mode (1000 MHz)
  • the motherboard is an EPoX MF570SLI which uses the nVidia MCP55 SATA controller (PCIe).
  • in contrast to an earlier test, this time there are 4 drives - 4 different makes of SATA II, 7200 rpm drives.
  • each drive is capable of not much more than 80 MB/s (at best - the outermost tracks) and, on average, more like 70 MB/s for the portions of the disk involved in these tests
  • the raids are comprised of 4x 4GB partitions, all from the first 8G of the disk.
  • the system was largely idle but is a live system
  • the system has 1 GB of RAM
  • in contrast to the earlier test, the 'cfq' scheduler was used. I forgot to change it.
  • the stripe sizes and caches, queue sizes, and flusher parameters were all left at their defaults
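
To make the test matrix concrete, here is a minimal Python sketch that enumerates the level, layout, and chunk-size combinations reported in the tables and builds an mdadm command line for each. The partition names, the /dev/md0 target, and the helper names are assumptions for illustration; this is not the actual test script.

```python
from itertools import product

# Chunk sizes (in K) and layouts as they appear in the results table.
CHUNKS_K = [64, 128, 256, 512, 1024, 2048]
LAYOUTS = {
    "raid10": ["f2", "n2", "o2"],
    "raid5": ["left-asymmetric", "left-symmetric",
              "right-asymmetric", "right-symmetric"],
    "raid6": [None],  # no layout given in the table; use mdadm's default
}

# Hypothetical partition names: one partition on each of the four drives.
PARTITIONS = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1"]


def configurations():
    """Yield every (level, layout, chunk_k) combination to be tested."""
    for level, layouts in LAYOUTS.items():
        for layout, chunk_k in product(layouts, CHUNKS_K):
            yield level, layout, chunk_k


def mdadm_create(level, layout, chunk_k, md="/dev/md0", parts=PARTITIONS):
    """Build (but do not run) an mdadm command that creates one array."""
    cmd = ["mdadm", "--create", md, f"--level={level}",
           f"--chunk={chunk_k}", f"--raid-devices={len(parts)}"]
    if layout is not None:
        cmd.append(f"--layout={layout}")
    return cmd + list(parts)
```

The functions only build argument lists, so they can be inspected safely; actually creating an array would mean passing a command to something like subprocess.run as root, which destroys any data on the listed partitions.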

2   Important Notes

  • the caches were dropped before each invocation of 'dd':

    echo 3 > /proc/sys/vm/drop_caches
    
  • the 'write' portion of the test used conv=fdatasync

  • I did not test filesystem performance. This is just about the edge capabilities of linux RAID in various configurations.

  • I did not use iflag=direct (which sets O_DIRECT)

  • I ran each test 5 times and took the mean.
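
The procedure in the notes above can be sketched roughly as follows in Python. The block size, the count, the wall-clock timing, and all helper names are assumptions made for illustration; the real script may measure things differently.

```python
import statistics
import subprocess
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches():
    """Equivalent of `echo 3 > /proc/sys/vm/drop_caches` (needs root)."""
    with open(DROP_CACHES, "w") as f:
        f.write("3\n")


def dd_write_cmd(device, block_size="1M", count=1024):
    """Streaming write; conv=fdatasync makes dd flush the data before exiting."""
    return ["dd", "if=/dev/zero", f"of={device}", f"bs={block_size}",
            f"count={count}", "conv=fdatasync"]


def dd_read_cmd(device, block_size="1M", count=1024):
    """Streaming read through the page cache (no iflag=direct)."""
    return ["dd", f"if={device}", "of=/dev/null", f"bs={block_size}",
            f"count={count}"]


def timed_mib_per_s(cmd, total_mib):
    """Drop the caches, run one dd invocation, and return MiB/s by wall clock."""
    drop_caches()
    start = time.monotonic()
    subprocess.run(cmd, check=True, stderr=subprocess.DEVNULL)
    return total_mib / (time.monotonic() - start)


def mean_of_runs(measure, runs=5):
    """Call `measure` the given number of times and return the mean result."""
    return statistics.mean(measure() for _ in range(runs))
```

A write test on a hypothetical /dev/md0 would then look like mean_of_runs(lambda: timed_mib_per_s(dd_write_cmd("/dev/md0"), 1024)). Note that this overwrites the start of the device.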

3   Questions

Initially, I just wanted to run a bunch of tests and eyeball the results. It's easy to do that and draw conclusions from the data. However, it is maybe more useful to ask yourself, "What questions can be answered?" Here are a few questions I came up with, along with my answers:

  1. What did you really test?

    Basically, I tested streaming read and write performance on a series of raid levels and formats, using different chunk sizes for each.

    I did not want to use a filesystem, which only gets in the way for this kind of test. I wasn't testing the filesystem; I was testing how different raid formats, layouts, and chunk sizes make a difference.

    A future installment may include filesystem testing as well, which I find just as important, if not more so. However, it's so much more variable that I'm not really sure much sense can be found in the noise.

  2. Why didn't you include my-favorite-raid-level?

    I only wanted to include raid levels for which there is some redundancy. I could have included raid 1+0 but my test script is not sufficiently smart for that. Perhaps I'll include that in a future installment.

  3. Can I have the source to the test program?

    Sure. I'll try to make it available if somebody asks, but it's really nothing special. Furthermore, it's my intent to refine it a bit to support filesystem testing (via bonnie++ or iozone, preferred) and so on.

  4. When using raid5, does the format matter?

    If you squint your eyes a bit, write performance was pretty close regardless of format. Read performance was more variable, but still did not vary all that much. Chunk size seemed to matter more. Left-symmetric did the best overall, however.

  5. How does the graphed performance compare with the predicted performance?

    Left to the reader to comment!

  6. Did you change the readahead settings for the test?

    No. I left them at their defaults.

  7. What are you using to generate this?

    I am using reStructuredText, combined with Pygments.

  8. Which tool do you use to make graphs?

    Google Charts (by way of pygooglechart), a bunch of shell and Python. I used flot previously.

  9. How do the individual drives perform?

The drives are:

<6>ata3.00: ATA-7: Hitachi HDT725032VLA360, V54OA52A, max UDMA/133
<6>ata4.00: ATA-8: SAMSUNG HD321KJ, CP100-10, max UDMA7
<6>ata5.00: ATA-7: ST3320620AS, 3.AAK, max UDMA/133
<6>ata6.00: ATA-8: WDC WD3200AAKS-75VYA0, 12.01B02, max UDMA/133

And their performance:

/dev/sdb:
Timing buffered disk reads:  218 MB in  3.01 seconds =  72.44 MB/sec

/dev/sdc:
Timing buffered disk reads:  234 MB in  3.00 seconds =  77.92 MB/sec

/dev/sdd:
Timing buffered disk reads:  228 MB in  3.02 seconds =  75.60 MB/sec

/dev/sde:
Timing buffered disk reads:  234 MB in  3.02 seconds =  77.57 MB/sec

  10. What difference does the scheduler make?

    As can clearly be seen on the RAID5 graphs, the IO scheduler can make a big difference. Using cfq or noop, reads start out almost a full point (1.0x the speed of a single drive) faster than with the others, and writes are 1/2 point faster.

    On the other hand, for RAID6, the scheduler doesn't seem to make much difference at all. At least for streaming reads/writes, which is all I'm testing here.

    For RAID10,n2 and RAID10,o2 the story is the same as for RAID6, but there is some impact (up to 1.0 points!) for RAID10,f2.

  11. What revisions have you made to this document?

    I re-ran the tests to include 2048K chunk sizes, and removed 128K as it wasn't very interesting and it cluttered up the graphs.

    I also re-ran the entire set of tests for the other three schedulers, noop, anticipatory, and deadline.

    I re-did the graphs using the Google Charts API (by way of pygooglechart) instead of using flot. There was nothing wrong with flot. In fact, I found the software really nice to use, but some people found the Google Charts output "prettier" and it's somewhat easier for me to use.
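
For readers who want to repeat the scheduler comparison discussed in this section, here is a minimal Python sketch of inspecting and switching the IO scheduler through sysfs. It assumes the usual /sys/block/<device>/queue/scheduler file, in which the active scheduler is shown in square brackets; the helper names are hypothetical and not part of the test script.

```python
from pathlib import Path


def scheduler_path(device):
    """Return the sysfs file that controls the IO scheduler for e.g. 'sdb'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def active_scheduler(text):
    """Pick the bracketed entry out of e.g. 'noop anticipatory deadline [cfq]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active scheduler found in {text!r}")


def set_scheduler(device, name):
    """Select a scheduler (needs root); same as echo NAME > .../queue/scheduler."""
    scheduler_path(device).write_text(name + "\n")
```

Since the scheduler is set per underlying drive, a full run would call set_scheduler for each of the four member disks before rebuilding and re-testing the arrays.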

4   Unanswered Questions

  1. While I don't have the data in this article, I did originally perform these tests on 2.6.22.18. The results were rather noisier, and in most cases a bit worse.

  2. Why aren't raid10,f2 reads getting closer to 4.0x ?

  3. What's with the strange drop in performance at 512K chunk sizes for RAID10,f2 for the deadline and noop schedulers, only to rise again at 1024K (and then drop at 2048K)?

  4. Why are raid10,o2 reads so AWFUL?

    Neil Brown was kind enough to suggest re-running with a larger chunk size, which I did.

    Read performance did, indeed, improve. Up to the 3.0 mark, in fact.

  5. Why do raid6 reads behave the way they do? I would have expected a more linear graph - the raid6 write graph is very smooth.

    From 64 to 256k chunk size, there is little change (in either direction, for reads or writes) but at 512K the reads really improve and continue to do so as the chunk size increases.

  6. What should the theoretical performance of the various raid levels and formats look like?

    For raid10,f2 I would suspect that 4.0 would be perfect (for reads), and for sustained writes something like 1.5.

    I get 1.5 like this:

    The average speed of writing a given chunk of data is the average of writing to an outer track and writing to an inner track: (70 + 35) / 2.0 = 52.5 MB/s, assuming the inner tracks are 1/2 the speed of the outer tracks.

    Theoretically we can write to 2 devices at a time, so: ((70 + 35) / 2.0) * 2.0 / 70.0 = 1.5x.

    In reality, we do a bit better than that, probably due to the fact that I'm not using the whole disk and therefore the speed of the inner tracks of the region I'm actually using is greater than would otherwise be true.
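
The estimate above can be written as a small Python function, which makes it easy to vary the assumptions. The parameter names are hypothetical; the defaults are the figures used in the text (70 MB/s outer tracks, inner tracks at half that speed, 2 devices written at a time).

```python
def f2_write_estimate(outer_mb_s=70.0, inner_fraction=0.5,
                      concurrent_devices=2, baseline_mb_s=70.0):
    """Estimated sustained raid10,f2 write speed relative to a single drive."""
    inner_mb_s = outer_mb_s * inner_fraction
    # Each chunk is written once to a fast (outer) and once to a slow (inner) region.
    average_mb_s = (outer_mb_s + inner_mb_s) / 2.0
    return average_mb_s * concurrent_devices / baseline_mb_s
```

With the defaults this gives 1.5. Raising inner_fraction models a test region confined to the faster part of the disk: f2_write_estimate(inner_fraction=0.75) gives 1.75.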

5   Tables, Charts n Graphs

The following results are expressed relative to a single drive (the baseline), with 1.0 being the speed of a single drive (about 70 MB/s). Chunk sizes are in K.

Everything
scheduler level layout chunk writing reading
cfq raid10 f2 64 1.48 3.01
cfq raid10 f2 128 1.49 3.88
cfq raid10 f2 256 1.50 3.68
cfq raid10 f2 512 1.52 3.65
cfq raid10 f2 1024 1.47 3.76
cfq raid10 f2 2048 1.52 3.73
cfq raid10 n2 64 1.78 1.89
cfq raid10 n2 128 1.85 1.87
cfq raid10 n2 256 1.82 2.00
cfq raid10 n2 512 1.84 2.15
cfq raid10 n2 1024 1.83 2.42
cfq raid10 n2 2048 1.83 2.70
cfq raid10 o2 64 1.83 1.96
cfq raid10 o2 128 1.80 1.96
cfq raid10 o2 256 1.84 1.98
cfq raid10 o2 512 1.80 1.98
cfq raid10 o2 1024 1.83 2.49
cfq raid10 o2 2048 1.80 3.13
cfq raid5 left-asymmetric 64 1.72 2.51
cfq raid5 left-asymmetric 128 1.67 2.79
cfq raid5 left-asymmetric 256 1.52 2.92
cfq raid5 left-asymmetric 512 1.31 2.76
cfq raid5 left-asymmetric 1024 1.06 3.44
cfq raid5 left-asymmetric 2048 0.56 3.25
cfq raid5 left-symmetric 64 1.74 2.71
cfq raid5 left-symmetric 128 1.73 2.76
cfq raid5 left-symmetric 256 1.55 2.97
cfq raid5 left-symmetric 512 1.34 2.88
cfq raid5 left-symmetric 1024 1.08 3.44
cfq raid5 left-symmetric 2048 0.58 3.50
cfq raid5 right-asymmetric 64 1.75 2.70
cfq raid5 right-asymmetric 128 1.61 2.88
cfq raid5 right-asymmetric 256 1.58 2.88
cfq raid5 right-asymmetric 512 1.28 2.88
cfq raid5 right-asymmetric 1024 1.04 3.25
cfq raid5 right-asymmetric 2048 0.54 3.31
cfq raid5 right-symmetric 64 1.75 2.79
cfq raid5 right-symmetric 128 1.69 2.81
cfq raid5 right-symmetric 256 1.56 2.88
cfq raid5 right-symmetric 512 1.30 2.75
cfq raid5 right-symmetric 1024 1.01 3.02
cfq raid5 right-symmetric 2048 0.49 3.24
cfq raid6   64 1.30 1.76
cfq raid6   128 1.24 1.96
cfq raid6   256 1.17 1.91
cfq raid6   512 1.04 2.70
cfq raid6   1024 0.87 2.92
cfq raid6   2048 0.60 3.31
deadline raid10 f2 64 1.78 2.63
deadline raid10 f2 256 1.82 3.80
deadline raid10 f2 512 1.72 3.32
deadline raid10 f2 1024 1.75 3.61
deadline raid10 f2 2048 1.47 3.40
deadline raid10 n2 64 1.96 1.21
deadline raid10 n2 256 1.88 1.85
deadline raid10 n2 512 1.84 2.10
deadline raid10 n2 1024 1.89 2.41
deadline raid10 n2 2048 1.84 2.59
deadline raid10 o2 64 1.80 1.94
deadline raid10 o2 256 1.82 1.96
deadline raid10 o2 512 1.73 1.94
deadline raid10 o2 1024 1.87 2.63
deadline raid10 o2 2048 1.82 3.13
deadline raid5 left-asymmetric 64 1.67 2.55
deadline raid5 left-asymmetric 256 1.43 2.84
deadline raid5 left-asymmetric 512 1.22 2.76
deadline raid5 left-asymmetric 1024 1.04 3.27
deadline raid5 left-asymmetric 2048 0.52 3.31
deadline raid5 left-symmetric 64 1.61 2.32
deadline raid5 left-symmetric 256 1.42 2.89
deadline raid5 left-symmetric 512 1.26 2.89
deadline raid5 left-symmetric 1024 1.08 3.14
deadline raid5 left-symmetric 2048 0.55 3.31
deadline raid5 right-asymmetric 64 1.68 2.15
deadline raid5 right-asymmetric 256 1.50 2.88
deadline raid5 right-asymmetric 512 1.23 2.83
deadline raid5 right-asymmetric 1024 0.97 3.44
deadline raid5 right-asymmetric 2048 0.47 3.24
deadline raid5 right-symmetric 64 1.64 2.11
deadline raid5 right-symmetric 256 1.50 2.84
deadline raid5 right-symmetric 512 1.22 2.83
deadline raid5 right-symmetric 1024 1.00 3.02
deadline raid5 right-symmetric 2048 0.43 3.19
deadline raid6   64 1.22 1.73
deadline raid6   256 1.20 1.75
deadline raid6   512 1.04 2.45
deadline raid6   1024 0.89 3.19
deadline raid6   2048 0.57 3.32
anticipatory raid10 f2 64 1.62 2.59
anticipatory raid10 f2 128 1.59 3.50
anticipatory raid10 f2 256 1.61 3.46
anticipatory raid10 f2 512 1.65 3.73
anticipatory raid10 f2 1024 1.61 3.58
anticipatory raid10 f2 2048 1.47 3.80
anticipatory raid10 n2 64 1.87 1.21
anticipatory raid10 n2 128 1.83 1.45
anticipatory raid10 n2 256 1.83 1.90
anticipatory raid10 n2 512 1.83 2.20
anticipatory raid10 n2 1024 1.82 2.45
anticipatory raid10 n2 2048 1.82 2.70
anticipatory raid10 o2 64 1.82 1.91
anticipatory raid10 o2 128 1.85 1.94
anticipatory raid10 o2 256 1.86 2.05
anticipatory raid10 o2 512 1.80 1.96
anticipatory raid10 o2 1024 1.83 2.63
anticipatory raid10 o2 2048 1.78 3.19
anticipatory raid5 left-asymmetric 64 1.62 2.42
anticipatory raid5 left-asymmetric 128 1.59 2.63
anticipatory raid5 left-asymmetric 256 1.48 2.79
anticipatory raid5 left-asymmetric 512 1.32 2.88
anticipatory raid5 left-asymmetric 1024 1.10 3.37
anticipatory raid5 left-asymmetric 2048 0.54 3.25
anticipatory raid5 left-symmetric 64 1.67 2.49
anticipatory raid5 left-symmetric 128 1.62 2.76
anticipatory raid5 left-symmetric 256 1.52 2.83
anticipatory raid5 left-symmetric 512 1.32 2.76
anticipatory raid5 left-symmetric 1024 1.10 3.32
anticipatory raid5 left-symmetric 2048 0.58 3.25
anticipatory raid5 right-asymmetric 64 1.67 2.17
anticipatory raid5 right-asymmetric 128 1.55 2.63
anticipatory raid5 right-asymmetric 256 1.48 2.76
anticipatory raid5 right-asymmetric 512 1.30 2.92
anticipatory raid5 right-asymmetric 1024 1.09 3.37
anticipatory raid5 right-asymmetric 2048 0.52 3.37
anticipatory raid5 right-symmetric 64 1.72 2.19
anticipatory raid5 right-symmetric 128 1.67 2.63
anticipatory raid5 right-symmetric 256 1.47 2.88
anticipatory raid5 right-symmetric 512 1.32 2.88
anticipatory raid5 right-symmetric 1024 1.07 3.02
anticipatory raid5 right-symmetric 2048 0.47 3.20
anticipatory raid6   64 1.26 1.75
anticipatory raid6   128 1.22 1.67
anticipatory raid6   256 1.19 1.77
anticipatory raid6   512 1.03 2.59
anticipatory raid6   1024 0.91 3.08
anticipatory raid6   2048 0.58 3.24
noop raid10 f2 64 1.40 2.71
noop raid10 f2 256 1.42 3.80
noop raid10 f2 512 1.42 3.38
noop raid10 f2 1024 1.42 3.65
noop raid10 f2 2048 1.46 3.38
noop raid10 n2 64 1.84 1.21
noop raid10 n2 256 1.83 1.90
noop raid10 n2 512 1.85 2.18
noop raid10 n2 1024 1.85 2.45
noop raid10 n2 2048 1.83 2.55
noop raid10 o2 64 1.82 1.90
noop raid10 o2 256 1.85 1.92
noop raid10 o2 512 1.80 1.97
noop raid10 o2 1024 1.62 2.63
noop raid10 o2 2048 1.78 3.13
noop raid5 left-asymmetric 64 1.75 2.63
noop raid5 left-asymmetric 256 1.62 2.92
noop raid5 left-asymmetric 512 1.37 2.92
noop raid5 left-asymmetric 1024 1.09 3.32
noop raid5 left-asymmetric 2048 0.54 3.50
noop raid5 left-symmetric 64 1.78 2.20
noop raid5 left-symmetric 256 1.62 2.88
noop raid5 left-symmetric 512 1.37 2.88
noop raid5 left-symmetric 1024 1.12 3.25
noop raid5 left-symmetric 2048 0.58 3.37
noop raid5 right-asymmetric 64 1.78 2.23
noop raid5 right-asymmetric 256 1.61 2.97
noop raid5 right-asymmetric 512 1.38 2.89
noop raid5 right-asymmetric 1024 1.04 3.30
noop raid5 right-asymmetric 2048 0.52 3.25
noop raid5 right-symmetric 64 1.78 2.29
noop raid5 right-symmetric 256 1.65 2.84
noop raid5 right-symmetric 512 1.38 2.92
noop raid5 right-symmetric 1024 1.09 3.03
noop raid5 right-symmetric 2048 0.47 3.19
noop raid6   64 1.29 1.72
noop raid6   256 1.21 1.84
noop raid6   512 1.05 2.56
noop raid6   1024 0.88 3.08
noop raid6   2048 0.61 3.31
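
For anyone who wants to slice the table differently, here is a minimal Python sketch that parses rows in the format above, finds the best read result for each raid level, and converts a ratio back to an approximate MB/s figure. The names are hypothetical, and the 70 MB/s baseline is the approximate single-drive speed quoted earlier.

```python
from collections import namedtuple

Row = namedtuple("Row", "scheduler level layout chunk_k writing reading")

BASELINE_MB_S = 70.0  # approximate speed of a single drive


def parse_row(line):
    """Parse one whitespace-separated table row into a Row."""
    fields = line.split()
    if len(fields) == 5:  # raid6 rows have an empty layout column
        fields.insert(2, "")
    scheduler, level, layout, chunk, writing, reading = fields
    return Row(scheduler, level, layout, int(chunk),
               float(writing), float(reading))


def best_read_per_level(rows):
    """Return {level: Row} holding the highest read ratio seen for each level."""
    best = {}
    for row in rows:
        if row.level not in best or row.reading > best[row.level].reading:
            best[row.level] = row
    return best


def to_mb_per_s(ratio, baseline=BASELINE_MB_S):
    """Convert a ratio from the table back to an approximate MB/s figure."""
    return ratio * baseline
```

Feeding every data line of the table through parse_row and then best_read_per_level gives one row per raid level, which can be compared side by side.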

7 comments:

Anonymous said...

You can also try:

RAID5 via Google Charts,

RAID6, and

RAID10.

NeilBrown said...

Why are raid10,o2 reads so AWFUL?

To get a read boost with o2, you need to have chunks large enough that they totally cover one or more cylinders. That way the drive can seek over skipped chunks rather than read over them.

If your chunks are exactly cylinder sized and perfectly aligned you would get close to a factor of 2. But perfect alignment is impossible with today's drives. So the best you can get is having the chunk size a little over twice the cylinder size. Then you will also skip one cylinder and sometimes two. If your chunksize is between 1 and 2 cylinders you will sometimes skip one cylinder, and so get a partial speedup. I think you are seeing that with the chunksize of 1024. Try 2048!

Anonymous said...

neilbrown: I adjusted the configuration to include 2048 (and exclude 128) and re-ran the tests. I'll be putting the results up in the next day or so!

Anonymous said...

I'd be interested in a graph where the best performing raid5, raid6, raid10 configs were stacked up against each other.

Thanks,

Leif

Anonymous said...

I was also thinking of another graph: all of the raid levels and layouts as the x-axis for just one chunk size. Probably 64, 512, and 2048K.

djh said...

broken images!

the html page is ../2008/07/.. and the img files are referred to as being in a subdirectory. But actually the image files are in a subdirectory of ../2007/08/.. so they don't appear!

Anonymous said...

Broken images fixed.