public inbox for gcc@gcc.gnu.org
* Re: [i386] memcmp very slow
@ 2002-10-29  8:00 Roger Sayle
From: Roger Sayle @ 2002-10-29  8:00 UTC
  To: Paolo Bonzini; +Cc: gcc, libc-alpha


Hi Paolo,

Could you describe which processor (and perhaps which version of
GCC) you're timing on?  I seem to be unable to reproduce your results,
as my numbers show your routine slightly slower on a 1.2GHz AMD
Athlon using GCC 3.3 (experimental) on i686-pc-cygwin (1.472s vs. 1.402s
for a x100 version of your test.c benchmark).  However, both are much
better than the system memcmp (1.57s).

I'd love to see a faster i386 "memcmp" in glibc, newlib and,
if possible, GCC.

My only concern with disabling x86 memcmp in the compiler is that
GCC is also used to generate x86 code for non-glibc targets, where
not inlining memcmp may adversely affect performance.

The algorithm itself looks very clever.  An implementation in
GCC's i386 backend could even make use of the known alignment
information, and then even transform "memcmp(a,b,c)" into
"-memcmp(b,a,c)" to take advantage of your algorithm's asymmetry
if a and b were known to have different alignments.
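
The operand swap relies only on the sign of the result, since the C
standard specifies nothing more than the sign of memcmp's return value.
A minimal sketch of that identity in C follows; the helper names
sign_of and memcmp_swapped are hypothetical, not anything in GCC or
glibc, and the result is normalized before negation so that negating
an arbitrary int is never needed.

```c
#include <stddef.h>
#include <string.h>

/* Reduce a memcmp-style result to -1, 0 or +1; only the sign of
   memcmp's return value is specified by the C standard. */
static int sign_of(int r)
{
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: compare with the operands swapped, then negate
   the normalized result.  The value agrees in sign with
   memcmp(a, b, n). */
static int memcmp_swapped(const void *a, const void *b, size_t n)
{
    return -sign_of(memcmp(b, a, n));
}
```

A caller that only tests the result against zero (less than, equal,
greater than) cannot tell memcmp_swapped(a, b, n) from memcmp(a, b, n),
which is what would let a backend choose the operand order freely.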

If I may make two suggestions: first, you should add another loop
to test.c's main, as the resolution of time(1) means that it's
safer to compare times in seconds rather than hundredths of a
second.  Secondly, you should compare the "user" (or CPU) times
reported by time(1) rather than the "real" (or elapsed) times.
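
Both suggestions can be sketched in C roughly as follows.  This is not
the test.c from the thread, whose contents are not shown here; the
names cmp_fn and time_compare are hypothetical.  The repetition count
is a parameter so that the total run lasts seconds, and clock() is used
because on POSIX systems it reports processor time rather than elapsed
time (some platforms differ, so treat that as an assumption).

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every memcmp variant under test. */
typedef int (*cmp_fn)(const void *, const void *, size_t);

/* Call `cmp` on the same buffers `reps` times and return the CPU
   seconds consumed, as measured by clock(). */
static double time_compare(cmp_fn cmp, const void *a, const void *b,
                           size_t n, long reps)
{
    volatile unsigned sink = 0;   /* keeps the calls from being dropped */
    clock_t start = clock();

    for (long i = 0; i < reps; i++)
        sink += (unsigned)cmp(a, b, n);

    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_compare(memcmp, a, b, n, reps) with reps raised until the
result is well above one second makes the clock's resolution a small
fraction of the measurement.  Note that going through a function
pointer measures an out-of-line call, so timing an inlined memcmp
still needs a separately compiled loop.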

Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833


* [i386] memcmp very slow
@ 2002-10-28 17:28 Bonzini
From: Bonzini @ 2002-10-28 17:28 UTC
  To: gcc; +Cc: libc-alpha

I think that the pattern that is used when inlining memcmp for the i386
should be disabled, except perhaps at -Os.  cmpsb is completely unoptimized
in the latest and not-so-latest processors (from the Pentium on).

Here are the results of timing a hand-written memcmp that I just submitted
to the libc-alpha mailing list (including call-return overheads) vs. the GCC
inline memcmp:

    utente@engineer:~/esperimenti$ gcc -g -O3 -fno-builtin test.c memcmp.S
    utente@engineer:~/esperimenti$ time ./a.out

    real    0m0.088s
    user    0m0.090s
    sys     0m0.000s

    utente@engineer:~/esperimenti$ gcc -g -O3 test.c
    utente@engineer:~/esperimenti$ time ./a.out

    real    0m0.102s
    user    0m0.100s
    sys     0m0.010s

Compared to the current glibc memcmp, the results that the inlined
memcmp gives seem to be encouraging:

    utente@engineer:~/esperimenti$ gcc -g -O3 -fno-builtin test.c
    utente@engineer:~/esperimenti$ time ./a.out

    real    0m0.108s
    user    0m0.100s
    sys     0m0.010s

but this is only because glibc has a poor implementation of memcmp for the
i386, which is also nothing more than a cmpsb.
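
To illustrate the general idea of avoiding a byte-at-a-time loop, here
is a rough C sketch.  It is not the hand-written assembly routine
discussed in this message, and the function names are hypothetical.
The first function compares one byte per step, as a repeated cmpsb
does; the second skips over equal 4-byte chunks and falls back to
bytes only to locate the first difference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One byte per step, comparable in spirit to a repeated cmpsb. */
static int memcmp_bytewise(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;

    for (; n > 0; n--, p++, q++)
        if (*p != *q)
            return *p < *q ? -1 : 1;
    return 0;
}

/* Skip equal 4-byte chunks, then let the bytewise loop find the first
   differing byte (or finish the tail).  memcpy is used for the loads
   so that unaligned buffers are handled portably. */
static int memcmp_wordwise(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;

    while (n >= sizeof(uint32_t)) {
        uint32_t x, y;
        memcpy(&x, p, sizeof x);
        memcpy(&y, q, sizeof y);
        if (x != y)
            break;              /* difference lies in this chunk */
        p += sizeof x;
        q += sizeof x;
        n -= sizeof x;
    }
    return memcmp_bytewise(p, q, n);
}
```

Whether the chunked version is actually faster depends on the
processor and compiler, so it should be timed rather than assumed.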

The handwritten assembly language memcmp implementation that I refer to is
available at http://sources.redhat.com/ml/libc-alpha/2002-10/msg00496.html

Paolo Bonzini


