public inbox for gcc-bugs@sourceware.org
* [Bug c/42621] New: 4.4/4.5 Regression, Computed gotos on AMD 800% slower
@ 2010-01-05 11:44 fredrik dot svahn at gmail dot com
From: fredrik dot svahn at gmail dot com @ 2010-01-05 11:44 UTC
To: gcc-bugs
When a program that uses computed gotos is compiled with gcc 4.4.2, it runs
significantly slower (by up to a factor of 10) than when it is compiled with,
e.g., gcc 4.1 or 4.3 using the same optimization flags (-O2 or -O3). A small
dummy test program without header file dependencies is attached.
I compile with a command line like "gcc -O3 test.c -o testp.4.4.2" and run
the generated executable without arguments, like "./testp.4.4.2". Generating
CPU-specific instructions, e.g. with "-march=athlon64", seems to make no
difference. I have also tried "-fno-gcse" (as recommended in the docs) to no
avail. I get the same results for targets x86_64 and i686 on Novell SLES 10
and Arch Linux.
Interestingly, I do not see this problem on any Intel processor I have tried,
but I have seen the slowdown on every AMD processor I have tried (e.g.
Dual-Core AMD Opteron Processor 2216 and AMD Turion 64 X2 Mobile Technology
TL-60). In fact, the very same two i686 binaries, built with gcc 4.4.2 and
gcc 4.3, that show a significant performance difference on an AMD show no
significant difference on an Intel Core 2 Duo T7500.
Some observations:
1. On AMD there is a huge difference in the number of mispredicted branches
between the program compiled with gcc-4.4.2 and the program compiled with
earlier compilers. See for instance the following output from oprofile:
---
Counted RETIRED_INDIRECT_BRANCHES_MISPREDICTED events (Retired Indirect
Branches Mispredicted) with a unit mask of 0x00 (No unit mask) count 500
Counted RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS events (Retired Mispredicted
Branch Instructions) with a unit mask of 0x00 (No unit mask) count 500
Counted RETIRED_TAKEN_BRANCH_INSTRUCTIONS events (Retired taken branch
instructions) with a unit mask of 0x00 (No unit mask) count 500
RETIRED_INDIRE...|RETIRED_MISPRE...|RETIRED_TAKEN_...|
samples         %|samples         %|samples         %|
-----------------------------------------------------
 185416   88.7799| 186587   82.8723| 381826   48.1913| testp.4.4.2
   5605    2.6838|   6275    2.7870| 157401   19.8660| testp.4.3
2. Gcc 4.3 generates the following assembly around the "eq:" label in
the attached program:
4004c0: 48 81 fb 00 e1 f5 05 cmp $0x5f5e100,%rbx
4004c7: 74 21 je 4004ea <main+0x6a>
4004c9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4004d0: 48 63 c5 movslq %ebp,%rax
4004d3: 48 8b 44 c4 b0 mov -0x50(%rsp,%rax,8),%rax
4004d8: ff e0 jmpq *%rax
whereas gcc 4.4.2 generates an additional jump instruction: the sequence ends
in a direct jmp to a shared "jmpq *%rax" at 4004c0:
4004c0: ff e0 jmpq *%rax
...
4004d8: 48 81 fb 00 e1 f5 05 cmp $0x5f5e100,%rbx
4004df: 74 21 je 400502 <main+0x82>
4004e1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4004e8: 48 63 c5 movslq %ebp,%rax
4004eb: 48 8b 44 c4 88 mov -0x78(%rsp,%rax,8),%rax
4004f0: eb ce jmp 4004c0 <main+0x40>
3. I see the same behaviour with a month-old snapshot of gcc 4.5.
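One plausible reading of the oprofile numbers in observation 1 can be sketched with a toy model. It assumes, purely as a simplification and not as a description of any real AMD core, a predictor that remembers only the last target of each indirect jump instruction. Per loop iteration the test program makes two dispatches: "goto *ops[next_op]" (always to eq when run without arguments) and "goto *ops[fail_op]" (always to inc). With a separate indirect jump for each, both targets are stable; funnelled through one shared "jmpq *%rax", as in the gcc 4.4.2 listing, the target alternates on every execution. The names site_predictor, predict_and_update and simulate are hypothetical.

```c
/* Toy model (an assumption): each indirect-jump site predicts that it will
   jump to the same target as the last time it executed. */
typedef struct {
    int last_target;          /* -1 means "never executed" */
} site_predictor;

enum { TARGET_INC = 0, TARGET_EQ = 1 };

/* Record one execution of a jump site; return 1 on a misprediction. */
static int predict_and_update(site_predictor *site, int target) {
    int mispredicted = (site->last_target != target);
    site->last_target = target;
    return mispredicted;
}

/* Simulate `iterations` trips around the inc -> eq -> handle_fail -> inc
   loop of the test program. When `shared` is nonzero, both dispatches go
   through one indirect jump (the gcc 4.4.2 shape); otherwise each dispatch
   has its own indirect jump (the gcc 4.3 shape). */
static long simulate(long iterations, int shared) {
    site_predictor inc_site = { -1 };
    site_predictor fail_site = { -1 };
    site_predictor *after_inc = &inc_site;
    site_predictor *after_fail = shared ? &inc_site : &fail_site;
    long misses = 0;

    for (long k = 0; k < iterations; k++) {
        misses += predict_and_update(after_inc, TARGET_EQ);   /* goto *ops[next_op] */
        misses += predict_and_update(after_fail, TARGET_INC); /* goto *ops[fail_op] */
    }
    return misses;
}
```

Under this model simulate(1000, 0) returns 2 (only the two cold misses), while simulate(1000, 1) returns 2000, i.e. every shared dispatch mispredicts. Real predictors are more elaborate, so treat this only as an illustration of how a single shared indirect jump can defeat a simple per-address predictor.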
Examples of compilers used (I have tried a number of different builds on
different targets):
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr --enable-shared
--enable-languages=c,c++,fortran,objc,obj-c++,ada
--enable-threads=posix --mandir=/usr/share/man
--infodir=/usr/share/info --enable-__cxa_atexit --disable-multilib
--libdir=/usr/lib --libexecdir=/usr/lib --enable-clocale=gnu
--disable-libstdcxx-pch --with-tune=generic
Thread model: posix
gcc version 4.4.2 (GCC)
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr --enable-shared
--enable-languages=c,c++ --enable-threads=posix
--mandir=/usr/share/man --infodir=/usr/share/info
--enable-__cxa_atexit --disable-multilib --libdir=/usr/lib
--libexecdir=/usr/lib --enable-clocale=gnu --disable-libstdcxx-pch
--with-tune=generic --disable-werror --enable-checking=release
--program-suffix=-4.3 --enable-version-specific-runtime-libs
Thread model: posix
gcc version 4.3.3 (GCC)
Test program:
=============
#define VALUE 100000000

int main(int argc, char *argv[]) {
    void *ops[] = { &&inc, &&eq, &&gt, &&lt, &&gte, &&lte, &&zero,
                    &&not_implemented, &&exit };
    long i = 0;
    int next_op = argc; //unknown at compile time...
    int fail_op = 0;    //inc

    goto *ops[0];
inc:
    i++;
    goto *ops[next_op];
eq:
    if (!(i == VALUE)) goto handle_fail;
    return 0;
gt:
    if (!(i > VALUE)) goto handle_fail;
    return 0;
lt:
    if (!(i < VALUE)) goto handle_fail;
    return 0;
gte:
    if (!(i >= VALUE)) goto handle_fail;
    return 0;
lte:
    if (!(i <= VALUE)) goto handle_fail;
    return 0;
zero:
    if (!(i == 0)) goto handle_fail;
    return 0;
not_implemented:
    fail_op = 8; //exit
    goto handle_fail;
exit:
    return -1;
handle_fail:
    goto *ops[fail_op];
}
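The two listings in observation 2 differ only in where the indirect jump lives. A hedged, source-level sketch of the two shapes follows, using the same GNU C labels-as-values extension as the test program. It is a hypothetical mini-interpreter (the opcodes OP_INC, OP_CHECK, OP_HALT and the functions run_replicated and run_factored are illustrative names): in the first version every handler ends in its own "goto *", like the gcc 4.3 output; in the second every handler jumps to one shared dispatch label, like the gcc 4.4.2 output. Writing the source one way does not force the machine-code shape, since the compiler may factor or unfactor computed gotos either way.

```c
/* Hypothetical opcodes for a tiny bytecode program. */
enum { OP_INC = 0, OP_CHECK = 1, OP_HALT = 2 };

/* Shape 1: each handler ends with its own indirect jump (gcc 4.3 style). */
static long run_replicated(const int *code, long limit) {
    static void *const ops[] = { &&op_inc, &&op_check, &&op_halt };
    long acc = 0;
    int pc = 0;

    goto *ops[code[pc]];
op_inc:
    acc++;
    pc++;
    goto *ops[code[pc]];
op_check:
    pc = (acc >= limit) ? pc + 1 : 0;   /* loop back to the start until done */
    goto *ops[code[pc]];
op_halt:
    return acc;
}

/* Shape 2: every handler jumps to one shared indirect jump (gcc 4.4.2 style). */
static long run_factored(const int *code, long limit) {
    static void *const ops[] = { &&op_inc, &&op_check, &&op_halt };
    long acc = 0;
    int pc = 0;

dispatch:
    goto *ops[code[pc]];
op_inc:
    acc++;
    pc++;
    goto dispatch;
op_check:
    pc = (acc >= limit) ? pc + 1 : 0;
    goto dispatch;
op_halt:
    return acc;
}
```

For the program { OP_INC, OP_CHECK, OP_HALT } and a limit of 1000, both functions return 1000; only the control-flow shape differs.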
--
Summary: 4.4/4.5 Regression, Computed gotos on AMD 800% slower
Product: gcc
Version: 4.4.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fredrik dot svahn at gmail dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42621
* [Bug rtl-optimization/42621] [4.4 Regression] Computed gotos on AMD 800% slower
From: rguenth at gcc dot gnu dot org @ 2010-01-13 22:27 UTC
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2010-01-13 22:26 -------
Fixed for 4.5 so far.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |4.5.0
Summary|[4.4/4.5 Regression] |[4.4 Regression] Computed
|Computed gotos on AMD 800% |gotos on AMD 800% slower
|slower |
* [Bug rtl-optimization/42621] [4.4 Regression] Computed gotos on AMD 800% slower
From: carlr at freemail dot gr @ 2010-01-18 13:14 UTC
To: gcc-bugs
------- Comment #10 from carlr at freemail dot gr 2010-01-18 13:14 -------
Please note that computed gotos are factored out in
tree-cfg.c:build_gimple_cfg() because "they are a hell to deal with". This
means that they MUST be unfactored again, as promised in that comment, rather
than leaving this to another optimization step that may or may not be enabled.
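What "unfactoring" means can be sketched on a toy control-flow graph. This is a hypothetical, greatly simplified model, not GCC's actual data structures or pass: every block that ends in a direct jump to a block containing nothing but the shared indirect jump gets its own copy of that indirect jump instead. The type struct block, its fields, and the function unfactor are illustrative names. A real implementation would also have to make each copy use the jump-target value flowing in from its own predecessor, and might need to limit duplication to keep code size in check.

```c
#include <stddef.h>

/* How a basic block ends (toy model). */
enum term_kind { TERM_DIRECT, TERM_INDIRECT, TERM_RETURN };

struct block {
    enum term_kind kind;
    int target;        /* successor index; meaningful only for TERM_DIRECT */
    int only_indirect; /* nonzero if the block holds nothing but an indirect jump */
};

/* Replace each direct jump into an indirect-jump-only block with a copy of
   that indirect jump. Returns the number of blocks rewritten. The shared
   block is left in place; it may become unreachable afterwards. */
static int unfactor(struct block *blocks, size_t n) {
    int rewritten = 0;
    for (size_t i = 0; i < n; i++) {
        struct block *b = &blocks[i];
        if (b->kind == TERM_DIRECT && blocks[b->target].only_indirect) {
            b->kind = TERM_INDIRECT; /* now ends in its own "goto *x" */
            b->target = -1;
            rewritten++;
        }
    }
    return rewritten;
}
```

With one shared indirect-jump block and two handlers that jump to it, unfactor rewrites both handlers; running it a second time changes nothing.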
Also, for our product there are 97 "extra jumps" and 95 of them are long jumps,
i.e:
12be0: ff e1 jmp *%ecx
...
12dda: e9 01 fe ff ff jmp 12be0 <main_loop+0x220>
...
so this is a serious pessimisation of both speed and size :(
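A "long" jump here is the 5-byte e9 rel32 encoding (opcode plus a 32-bit displacement), as opposed to the 2-byte eb rel8 form seen in observation 2, whose 8-bit displacement only reaches -128 to +127 bytes past the end of the instruction. The sketch below decodes both forms and checks them against addresses quoted in this thread. It handles only the opcodes that appear here (eb, e9, and the conditional 74), and rel_jump_target is a hypothetical helper, not part of any library.

```c
#include <stdint.h>

/* Decode the target of an x86 relative jump located at `addr`.
   Handles only jmp rel8 (eb), je rel8 (74) and jmp rel32 (e9).
   On success stores the target and instruction length and returns 0;
   returns -1 for any other opcode. */
static int rel_jump_target(uint64_t addr, const uint8_t *insn,
                           uint64_t *target, int *length) {
    int64_t disp;

    switch (insn[0]) {
    case 0xeb: /* jmp rel8 */
    case 0x74: /* je rel8 */
        *length = 2;
        disp = insn[1] < 0x80 ? insn[1] : (int64_t)insn[1] - 0x100;
        break;
    case 0xe9: { /* jmp rel32, little-endian displacement */
        uint32_t raw = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
                       ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);
        *length = 5;
        disp = raw < 0x80000000u ? (int64_t)raw : (int64_t)raw - 0x100000000LL;
        break;
    }
    default:
        return -1;
    }
    /* Displacements are relative to the end of the instruction. */
    *target = addr + (uint64_t)*length + (uint64_t)disp;
    return 0;
}
```

Decoding e9 01 fe ff ff at 0x12dda gives 0x12be0, matching the objdump annotation above, and eb ce at 0x4004f0 gives 0x4004c0, the shared "jmpq *%rax" in the gcc 4.4.2 listing.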
--
* [Bug rtl-optimization/42621] [4.4 Regression] Computed gotos on AMD 800% slower
From: jakub at gcc dot gnu dot org @ 2010-01-21 13:16 UTC
To: gcc-bugs
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.4.3 |4.4.4
* [Bug rtl-optimization/42621] [4.4 Regression] Computed gotos on AMD 800% slower
From: jakub at gcc dot gnu dot org @ 2010-04-30 9:25 UTC
To: gcc-bugs
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.4.4 |4.4.5
* [Bug rtl-optimization/42621] [4.4 Regression] Computed gotos on AMD 800% slower
From: jyasskin at gmail dot com @ 2010-07-14 20:49 UTC
To: gcc-bugs
------- Comment #11 from jyasskin at gmail dot com 2010-07-14 20:49 -------
Is this the same bug as PR 39284?
--
jyasskin at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jyasskin at gmail dot com