* [Bug c/42621]  New: 4.4/4.5 Regression, Computed gotos on AMD 800% slower
@ 2010-01-05 11:44 fredrik dot svahn at gmail dot com
  2010-01-05 12:46 ` [Bug rtl-optimization/42621] [4.4/4.5 Regression] " rguenth at gcc dot gnu dot org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: fredrik dot svahn at gmail dot com @ 2010-01-05 11:44 UTC (permalink / raw)
  To: gcc-bugs

When a program that uses computed gotos is compiled with gcc 4.4.2, it runs
significantly slower (up to a factor of 10) than when it is compiled with e.g. gcc
4.1/4.3 with the same optimization flags (-O2 or -O3). A small dummy test
program without header file dependencies is attached.

I am compiling with a command line like "gcc -O3 test.c -o testp.4.4.2", and run
the generated executable without arguments, like "./testp.4.4.2". Generating
CPU-specific instructions, e.g. "-march=athlon64", seems to make no difference.
I have also tried with "-fno-gcse" (as recommended in the docs) to no avail.
Same results with targets x86_64 and i686 on Novell SLES 10 and Arch Linux.

Interestingly enough I do not see this problem on any Intel processor I have
tried, but I have seen the slowdown on all AMD processors I have tried (e.g.
Dual-Core AMD Opteron Processor 2216 and AMD Turion 64 X2 Mobile Technology
TL-60). In fact, the exact same two binaries resulting from compilation with
gcc 4.4.2 and gcc 4.3 for i686 which show a significant performance difference
on an AMD will not show any significant difference on an Intel Core 2 Duo
T7500.

Some observations:

1. On AMD there is a huge difference in the number of mispredicted branches
between the program compiled with gcc-4.4.2 and the program compiled with
earlier compilers. See for instance the following output from oprofile:

---
Counted RETIRED_INDIRECT_BRANCHES_MISPREDICTED events (Retired Indirect
Branches Mispredicted) with a unit mask of 0x00 (No unit mask) count 500
Counted RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS events (Retired Mispredicted
Branch Instructions) with a unit mask of 0x00 (No unit mask) count 500
Counted RETIRED_TAKEN_BRANCH_INSTRUCTIONS events (Retired taken branch
instructions) with a unit mask of 0x00 (No unit mask) count 500
RETIRED_INDIRE...|RETIRED_MISPRE...|RETIRED_TAKEN_...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
   185416 88.7799    186587 82.8723    381826 48.1913 testp.4.4.2
     5605  2.6838      6275  2.7870    157401 19.8660 testp.4.3


2. Gcc 4.3 generates the following assembler around the "eq:" label in
the attached program:

  4004c0:       48 81 fb 00 e1 f5 05    cmp    $0x5f5e100,%rbx
  4004c7:       74 21                   je     4004ea <main+0x6a>
  4004c9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  4004d0:       48 63 c5                movslq %ebp,%rax
  4004d3:       48 8b 44 c4 b0          mov    -0x50(%rsp,%rax,8),%rax
  4004d8:       ff e0                   jmpq   *%rax

gcc 4.4.2, on the other hand, generates an additional jump instruction:

  4004c0:       ff e0                   jmpq   *%rax
    ...
  4004d8:       48 81 fb 00 e1 f5 05    cmp    $0x5f5e100,%rbx
  4004df:       74 21                   je     400502 <main+0x82>
  4004e1:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  4004e8:       48 63 c5                movslq %ebp,%rax
  4004eb:       48 8b 44 c4 88          mov    -0x78(%rsp,%rax,8),%rax
  4004f0:       eb ce                   jmp    4004c0 <main+0x40>

3. I see the same behaviour with a month-old snapshot of gcc 4.5.
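
To make the difference in observation 2 easier to see at the source level, here
is a minimal sketch of the two dispatch shapes. It is not taken from the attached
program: the function names and the "limit" parameter are hypothetical, and it
assumes GNU C (labels as values). The first function ends every handler in its
own indirect jump, as in the gcc 4.3 output; the second funnels every handler
through one shared indirect jump, as in the gcc 4.4.2 output. The compiler is
free to merge or duplicate these jumps, so the source shape does not guarantee
the machine code shape; check the disassembly.

```c
/* Sketch only: names and the `limit` parameter are hypothetical.
   Requires GNU C (computed goto / labels as values). */

/* Every handler ends in its own indirect jump (gcc 4.3-like shape). */
long dispatch_replicated(int next_op, long limit)
{
    void *ops[] = { &&inc, &&eq };
    long i = 0;

    /* Guard: this sketch only terminates for next_op == 1, limit >= 1. */
    if (next_op != 1 || limit < 1)
        return -1;
    goto *ops[0];

inc:
    i++;
    goto *ops[next_op];   /* indirect jump owned by "inc" */

eq:
    if (i == limit)
        return i;
    goto *ops[0];         /* indirect jump owned by "eq" */
}

/* Every handler jumps to one shared indirect jump (gcc 4.4.2-like shape). */
long dispatch_shared(int next_op, long limit)
{
    void *ops[] = { &&inc, &&eq };
    void *target;
    long i = 0;

    if (next_op != 1 || limit < 1)
        return -1;
    target = ops[0];

dispatch:
    goto *target;         /* the only indirect jump in the function */

inc:
    i++;
    target = ops[next_op];
    goto dispatch;

eq:
    if (i == limit)
        return i;
    target = ops[0];
    goto dispatch;
}
```

Both functions compute the same result. They differ only in how many distinct
indirect-jump sites the hardware sees, which may be relevant to the
misprediction counts in observation 1.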

Examples of compilers used (I have tried a number of different builds on
different targets):

Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr --enable-shared
--enable-languages=c,c++,fortran,objc,obj-c++,ada
--enable-threads=posix --mandir=/usr/share/man
--infodir=/usr/share/info --enable-__cxa_atexit --disable-multilib
--libdir=/usr/lib --libexecdir=/usr/lib --enable-clocale=gnu
--disable-libstdcxx-pch --with-tune=generic
Thread model: posix
gcc version 4.4.2 (GCC)

Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr --enable-shared
--enable-languages=c,c++ --enable-threads=posix
--mandir=/usr/share/man --infodir=/usr/share/info
--enable-__cxa_atexit --disable-multilib --libdir=/usr/lib
--libexecdir=/usr/lib --enable-clocale=gnu --disable-libstdcxx-pch
--with-tune=generic --disable-werror --enable-checking=release
--program-suffix=-4.3 --enable-version-specific-runtime-libs
Thread model: posix
gcc version 4.3.3 (GCC)

Test program:
=============

#define VALUE 100000000

int main(int argc, char *argv[]) {
  void *ops[] = { &&inc, &&eq, &&gt, &&lt, &&gte, &&lte, &&zero,
                  &&not_implemented, &&exit };

  long i = 0;
  int next_op = argc; //unknown at compile time...
  int fail_op = 0; //inc
  goto *ops[0];   

  inc: 
    i++;
    goto *ops[next_op]; 

  eq: 
    if (!(i == VALUE)) goto handle_fail;
    return 0;     

  gt: 
    if (!(i > VALUE)) goto handle_fail;
    return 0;     

  lt: 
    if (!(i < VALUE)) goto handle_fail;
    return 0;     

  gte: 
    if (!(i >= VALUE)) goto handle_fail;
    return 0;     

  lte: 
    if (!(i <= VALUE)) goto handle_fail;
    return 0;     

  zero:
    if (!(i == 0)) goto handle_fail;
    return 0;     

  not_implemented: 
    fail_op = 8; //exit
    goto handle_fail;

  exit:
    return -1;


  handle_fail: 
    goto *ops[fail_op];
}


-- 
           Summary: 4.4/4.5 Regression, Computed gotos on AMD 800% slower
           Product: gcc
           Version: 4.4.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: fredrik dot svahn at gmail dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42621



Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2010-01-05 11:44 [Bug c/42621] New: 4.4/4.5 Regression, Computed gotos on AMD 800% slower fredrik dot svahn at gmail dot com
2010-01-05 12:46 ` [Bug rtl-optimization/42621] [4.4/4.5 Regression] " rguenth at gcc dot gnu dot org
2010-01-05 12:50 ` steven at gcc dot gnu dot org
2010-01-05 21:51 ` steven at gcc dot gnu dot org
2010-01-05 21:56 ` pinskia at gcc dot gnu dot org
2010-01-05 22:11 ` steven at gcc dot gnu dot org
2010-01-06 11:37 ` fredrik dot svahn at gmail dot com
2010-01-06 11:44 ` fredrik dot svahn at gmail dot com
2010-01-06 23:00 ` fredrik dot svahn at gmail dot com
2010-01-07 14:51 ` rguenth at gcc dot gnu dot org
2010-01-10 23:31 ` steven at gcc dot gnu dot org
2010-01-13 22:27 ` [Bug rtl-optimization/42621] [4.4 " rguenth at gcc dot gnu dot org
2010-01-18 13:14 ` carlr at freemail dot gr
2010-01-21 13:16 ` jakub at gcc dot gnu dot org
2010-04-30  9:25 ` jakub at gcc dot gnu dot org
2010-07-14 20:49 ` jyasskin at gmail dot com
