public inbox for gcc@gcc.gnu.org
* mmap versus read benchmark
@ 2000-11-14 20:03 Zack Weinberg
  2000-11-15  8:55 ` Bourne-again Superuser
  0 siblings, 1 reply; 2+ messages in thread
From: Zack Weinberg @ 2000-11-14 20:03 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 4065 bytes --]

Since the discussion on mmap versus write, I decided to redo my
benchmark of mmap versus read.  Attached to this message are two
graphs of the results on my system, and a bunch of files which
hopefully will enable you to repeat the test on yours.

Look at the graphs first.  Each has four traces: these are user and
system CPU time as measured by getrusage(2) for reading and
preprocessing simple files of varied length.  Sort of.  The test goes
like this:

1. Create a bunch of files by repeating the exact same content N
   times.  The size range is from 512 bytes up to 1 megabyte.

2. Preprocess each one 1000 times, reading in all files with read and
   discarding all output.  This happens with cpplib's file cache
   enabled, so each file is read into user memory only once.  Report
   user and system times.

3. Do it over again, reading all files with mmap.  Again, the file
   cache is enabled.

4. Do it over again with read, but disable the file cache so that each
   file is read 1000 times.

5. And finally, do it over again with mmap and the cache disabled; each
   file is mapped and unmapped 1000 times.

In each of steps 2-5 it reports user and system time consumed to
process each file.

If you look at the raw data, you find that to first order all the time
is spent in user mode doing actual work.  "Bourne-again Superuser"
<toor@dyson.jdyson.com> predicted this.  On my machine, the complete
benchmark took one hour of wall time, of which only one minute was
system activity.  So we can conclude that mmap versus read is a
first-order wash; but we want to dig a little deeper.

Therefore, I wrote a Perl script which subtracts the times with cache
enabled from the times with cache disabled, leaving only the time
consumed (in both user and kernel mode) by the actual operation of
reading or mmapping the file 999 times.  The graphs attached to this
message show results after this operation.  One is autoscaled, the
other is clipped vertically at 4 seconds so you can see what's going
on in the bottom region.

They show clearly that read is slower than mmap for large files - what
isn't obvious due to the scale, unfortunately, is that (a) it's
measurably slower all the way down to about 32k, and (b) read takes
linear time, while mmap takes constant time at least up to
128k. (Notice that the x-axis of the plot is logarithmic.)  This may
be easier to tell from the clipped plot.  

All the traces start trending upward, irregularly, after 128k, and the
constant on the sys/read trace gets a lot bigger.  This is presumably
some sort of cache issue, but I'm not sure why it happens at 128k;
this machine has 512k of secondary cache.  It might be TLB, or lack of
page coloring, or competition with program text.

Here's how you run this test on your system.  Attached are a patch, a
C source file, a perl script, and a gnuplot script.  If you don't have
gnuplot, any plotting utility will do.  Apply the patch to a current
GCC tree.  [I'm not sure if these changes should be applied officially
or not.]  Then drop the C source file, mmapbench.c, into the gcc
subdirectory.

In your build tree, make clean and rebuild libcpp with optimization.
Then compile mmapbench.c the same way.  I did

$ make clean libcpp.a mmapbench.o CC=/work/inst/bin/gcc \
	CFLAGS='-O2 -fomit-frame-pointer -march=i686'

where /work/inst/bin/gcc was built from the 20001027 CVS tree.  Now
link mmapbench:

$ gcc -o mmapbench mmapbench.o libcpp.a ../libiberty/libiberty.a

In a scratch directory, run mmapbench:

$ time ./mmapbench >bench.raw

You will need approximately six megabytes of disk space and a couple
hours of CPU time.  (Almost exactly one hour on my box - PIII 500MHz,
256MB RAM, Linux 2.2).  It might be a good idea to close down most or
all other processes running, to reduce noise from random background
activity.

Now, crunch the data and make the graphs:

$ perl mmapcrunch.pl <bench.raw >bench.cr
$ gnuplot mmapcrunch.gplot

which will give you two plots, bench.png and benchclip.png, made
exactly the same way as the ones I've attached here.

zw


[-- Attachment #2: mmapcrunch.pl --]
[-- Type: text/x-perl, Size: 1622 bytes --]

#! /usr/bin/perl -w

# We have four data sets, each consisting of N lines which read
# <size> <user> <system>, in the order: read/nopurge, mmap/nopurge,
# read/purge, mmap/purge.  Each block is separated by two blank lines.
#
# We want to subtract the nopurge times from the purge times, isolating
# the time (user and system) spent in read or mmap.  (This should include
# delays due to worse cache utilization.)

sub max { return $_[0] > $_[1] ? $_[0] : $_[1]; }

@read_nopurge = ();
@mmap_nopurge = ();
@read_purge = ();
@mmap_purge = ();

for $vec (\@read_nopurge, \@mmap_nopurge, \@read_purge, \@mmap_purge) {

    while (<>) {
	last if /^$/;
	push @$vec, $_;
    }

    die "mangled input" unless eof() || ($_ = <>) =~ /^$/;
}

die "vectors not all equal"
    unless $#read_nopurge == $#mmap_nopurge
    &&     $#mmap_nopurge == $#read_purge
    &&     $#read_purge   == $#mmap_purge;

for ($i = 0; $i <= $#read_nopurge; $i++) {
    @A = split(/ /, $read_nopurge[$i]);
    @B = split(/ /, $read_purge[$i]);

    die "mismatched sizes, index $i\n" unless $A[0] == $B[0];
    die "wrong number of fields, index $i\n" unless $#A == 2 && $#B == 2;

    printf("%.1f %.2f %.2f\n", $A[0] / 1024,
	   max($B[1] - $A[1], 0), max($B[2] - $A[2], 0));
}

print "\n\n";

for ($i = 0; $i <= $#mmap_nopurge; $i++) {
    @A = split(/ /, $mmap_nopurge[$i]);
    @B = split(/ /, $mmap_purge[$i]);

    die "mismatched sizes, index $i\n" unless $A[0] == $B[0];
    die "wrong number of fields, index $i\n" unless $#A == 2 && $#B == 2;

    printf("%.1f %.2f %.2f\n", $A[0] / 1024,
	   max($B[1] - $A[1], 0), max($B[2] - $A[2], 0));
}


* Re: mmap versus read benchmark
  2000-11-14 20:03 mmap versus read benchmark Zack Weinberg
@ 2000-11-15  8:55 ` Bourne-again Superuser
  0 siblings, 0 replies; 2+ messages in thread
From: Bourne-again Superuser @ 2000-11-15  8:55 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

Zack Weinberg said:
>
> If you look at the raw data, you find that to first order all the time
> is spent in user mode doing actual work.  "Bourne-again Superuser"
                                            ^^^^^^^^^^^^^^^^^^^^^^^^
> <toor@dyson.jdyson.com> predicted this.  On my machine, the complete
> benchmark took one hour of wall time, of which only one minute was
> system activity.  So we can conclude that mmap versus read is a
> first-order wash; but we want to dig a little deeper.
> 

Darn it -- another account where I forgot to set up my name correctly!!! :-(

John

