public inbox for gcc-bugs@sourceware.org
* [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64)
@ 2011-08-24 21:27 oleg.smolsky at gmail dot com
2011-08-24 22:30 ` [Bug target/50182] " oleg.smolsky at gmail dot com
` (37 more replies)
0 siblings, 38 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-24 21:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
Bug #: 50182
Summary: Performance degradation from gcc 4.1 (x86_64)
Classification: Unclassified
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: oleg.smolsky@gmail.com
G++ 4.6 emits slower code than g++ 4.1 on the following set of benchmarks:
http://stlab.adobe.com/performance/
The discussion thread is here:
http://gcc.gnu.org/ml/gcc/2011-07/threads.html#00506
http://gcc.gnu.org/ml/gcc/2011-08/threads.html#00411
I digested one of the tests down to a single short case (see attachments):
http://gcc.gnu.org/ml/gcc/2011-08/msg00391.html
g++ 4.1 (1.35 sec, 1185M ops/s):
.text:0000000000400FE0 loc_400FE0:
.text:0000000000400FE0 movzx eax, ds:data8[rdx]
.text:0000000000400FE7 add rdx, 1
.text:0000000000400FEB add eax, 0Ah
.text:0000000000400FEE cmp rdx, 1F40h
.text:0000000000400FF5 lea ecx, [rax+rcx]
.text:0000000000400FF8 jnz short loc_400FE0
g++ 4.6 (2.86 sec, 563M ops/s):
.text:0000000000400D90 loc_400D90:
.text:0000000000400D90 add eax, 0Ah
.text:0000000000400D93 add al, [rdx]
.text:0000000000400D95 add rdx, 1
.text:0000000000400D99 cmp rdx, 503480h
.text:0000000000400DA0 jnz short loc_400D90
P.S. setting the component to C++. Optimizer?
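For readers without the attachment, the kernel behind both listings can be approximated as follows. This is a minimal sketch, not the attached test: the names `kSize` and `sum_constant_add` are hypothetical, while the constant 10 and the 8000-element bound are read off the 4.1 disassembly above (`add eax, 0Ah` and `cmp rdx, 1F40h`). It assumes GCC's modular narrowing to `int8_t`, which C++20 also guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical array length: 0x1F40 == 8000, the loop bound in the 4.1 listing.
constexpr std::size_t kSize = 8000;

// Accumulates (element + 10) into an 8-bit result, wrapping modulo 256.
std::int8_t sum_constant_add(const std::int8_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::int8_t result = 0;
    for (std::size_t n = 0; n < count; ++n) {
        // The operands are promoted to int for the add, then narrowed back.
        result = static_cast<std::int8_t>(result + data[n] + 10);
    }
    return result;
}
```

Because the accumulator is only 8 bits wide, the 4.6 loop can add the byte straight into `al`, whereas the 4.1 loop zero-extends the load and adds in full 32-bit registers.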
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
@ 2011-08-24 22:30 ` oleg.smolsky at gmail dot com
2011-08-25 0:14 ` xinliangli at gmail dot com
` (36 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-24 22:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #1 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-24 22:13:26 UTC ---
Created attachment 25097
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25097
The test case
This is the preprocessed source for the test discussed in the mail thread.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
2011-08-24 22:30 ` [Bug target/50182] " oleg.smolsky at gmail dot com
@ 2011-08-25 0:14 ` xinliangli at gmail dot com
2011-08-25 0:52 ` xinliangli at gmail dot com
` (35 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-08-25 0:14 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
davidxl <xinliangli at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |xinliangli at gmail dot com
--- Comment #2 from davidxl <xinliangli at gmail dot com> 2011-08-24 23:15:44 UTC ---
The problem is fixed in the trunk compiler:
1) with 4.6 compiler:
test    description             absolute   operations   ratio with
number                          time       per second   test0
0       "int8_t constant add"   3.29 sec   486.32 M     1.00
RAT_STALLS.registers = 288249 (sampling count 10001)
2) with trunk compiler:
test    description             absolute   operations   ratio with
number                          time       per second   test0
0       "int8_t constant add"   1.34 sec   1194.03 M    1.00
No partial register stalls from user functions.
Inner loop from the trunk compiler:
.L55:
movzbl 0(%rbp,%rcx), %r9d
addq $1, %rcx
cmpl %ecx, %ebx
leal 10(%r8,%r9), %r8d
jg .L55
Inner loop from the 4.6 compiler:
.L43:
addl $10, %eax
addb (%rdx), %al
addq $1, %rdx
cmpq $data8+8000, %rdx
jne .L43
RAT stalls (this is not a precise event, so the samples are attributed to an
instruction slightly after the one that causes the stall):
: 400e27: nopw 0x0(%rax,%rax,1)
127 0.0440 : 400e30: add $0xa,%eax
5869 2.0330 : 400e33: add (%rdx),%al
282125 97.7263 : 400e35: add $0x1,%rdx
: 400e39: cmp $0x404560,%rdx
: 400e40: jne 400e30 <main+0xd0>
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
2011-08-24 22:30 ` [Bug target/50182] " oleg.smolsky at gmail dot com
2011-08-25 0:14 ` xinliangli at gmail dot com
@ 2011-08-25 0:52 ` xinliangli at gmail dot com
2011-08-25 9:00 ` jakub at gcc dot gnu.org
` (34 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-08-25 0:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #3 from davidxl <xinliangli at gmail dot com> 2011-08-25 00:13:00 UTC ---
Caused by differences in FE generated code:
46:
D.6887 = (int) D.6886;
D.6888 = custom_constant_add<signed char>::do_shift (D.6887);
D.6889 = (unsigned char) D.6888;
result.8 = (unsigned char) result;
D.6891 = D.6889 + result.8;
result = (signed char) D.6891;
n = n + 1;
trunk:
D.6938 = (int) D.6937;
D.6874 = custom_constant_add<signed char>::do_shift (D.6938);
D.6939 = (int) result; <-- promoted to int
D.6940 = (int) D.6874; <-- promoted to int
D.6941 = D.6939 + D.6940;
result = (signed char) D.6941;
n = n + 1;
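The two dumps compute the same 8-bit value and differ only in the width of the addition. A minimal sketch of both forms in C++ (the function names are hypothetical, and the narrowing conversions assume GCC's modular behavior, which C++20 also guarantees):

```cpp
#include <cassert>
#include <cstdint>

// 4.6-style: both operands are narrowed to unsigned char before the add.
std::int8_t add_narrow(std::int8_t result, std::int8_t delta) {
    unsigned char a = static_cast<unsigned char>(delta);
    unsigned char b = static_cast<unsigned char>(result);
    unsigned char sum = static_cast<unsigned char>(a + b);
    return static_cast<std::int8_t>(sum);
}

// trunk-style: both operands are promoted to int, added, then narrowed once.
std::int8_t add_promoted(std::int8_t result, std::int8_t delta) {
    int a = static_cast<int>(result);
    int b = static_cast<int>(delta);
    return static_cast<std::int8_t>(a + b);
}

// Exhaustively compares the two forms over every pair of int8_t values.
bool forms_agree() {
    for (int r = -128; r <= 127; ++r) {
        for (int d = -128; d <= 127; ++d) {
            const auto rr = static_cast<std::int8_t>(r);
            const auto dd = static_cast<std::int8_t>(d);
            if (add_narrow(rr, dd) != add_promoted(rr, dd)) return false;
        }
    }
    return true;
}

// Convenience wrapper for debug builds.
inline void self_check() { assert(forms_agree()); }
```

Since both forms agree modulo 256, the difference matters only for code generation: the narrow form invites a byte-register add, the promoted form a 32-bit one.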
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (2 preceding siblings ...)
2011-08-25 0:52 ` xinliangli at gmail dot com
@ 2011-08-25 9:00 ` jakub at gcc dot gnu.org
2011-08-25 15:21 ` oleg.smolsky at gmail dot com
` (33 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-08-25 9:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-25 08:55:42 UTC ---
The bug report is incomplete: I don't see anywhere where you state which g++
options were measured, which CPU it was on, whether it is -m32 or -m64, etc.
For me, on i7-2600 CPU 4.6.0 (both Fedora 4.6.0-10 and 20110727 4.6 branch
snapshot) is actually much faster than current trunk with -O3 -m64:
4.6.* gives roughly
0 "int8_t constant add" 0.84 sec 1904.76 M 1.00
while trunk
0 "int8_t constant add" 1.26 sec 1269.84 M 1.00
4.4.* gives also
0 "int8_t constant add" 1.26 sec 1269.84 M 1.00
4.3.* gives
0 "int8_t constant add" 1.26 sec 1269.84 M 1.00
4.2.* gives
0 "int8_t constant add" 0.84 sec 1904.76 M 1.00
and 4.1.* doesn't compile, because the source has been preprocessed and STL is
dependent on the compiler version.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (3 preceding siblings ...)
2011-08-25 9:00 ` jakub at gcc dot gnu.org
@ 2011-08-25 15:21 ` oleg.smolsky at gmail dot com
2011-08-25 15:29 ` oleg.smolsky at gmail dot com
` (32 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 15:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #5 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 15:19:57 UTC ---
Created attachment 25103
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25103
The same test preprocessed with g++ 4.1
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (4 preceding siblings ...)
2011-08-25 15:21 ` oleg.smolsky at gmail dot com
@ 2011-08-25 15:29 ` oleg.smolsky at gmail dot com
2011-08-25 16:18 ` hjl.tools at gmail dot com
` (31 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 15:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #6 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 15:25:49 UTC ---
Oh, the settings and things were discussed in the mail thread... Here is the
digest:
I have compiled and run a set of C++ benchmarks on a CentOS4/64 box using the
following compilers:
a) g++4.1 that is available for this distro (GCC version 4.1.2 20071124 (Red
Hat 4.1.2-42))
b) g++4.6 that I built (stock version 4.6.1)
I built the compiler with all the default options (it just has a distinct
installation path):
../gcc-%{version}/configure --prefix=/work/tools/gcc46
--enable-languages=c,c++ --with-system-zlib --with-mpfr=/work/tools/mpfr24
--with-gmp=/work/tools/gmp --with-mpc=/work/tools/mpc
LD_LIBRARY_PATH=/work/tools/mpfr/lib24:/work/tools/gmp/lib:/work/tools/mpc/lib
Tests were compiled with -O2 and -O3, I later added -march=native to 4.6
builds.
The processor is Intel quad core something:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Genuine Intel(R) CPU @ 2.40GHz
stepping : 4
cpu MHz : 2393.943
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor
ds_cpl tm2 cx16 xtpr lahf_lm
bogomips : 4793.09
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (5 preceding siblings ...)
2011-08-25 15:29 ` oleg.smolsky at gmail dot com
@ 2011-08-25 16:18 ` hjl.tools at gmail dot com
2011-08-25 16:26 ` xinliangli at gmail dot com
` (30 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-25 16:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-25 15:58:08 UTC ---
(In reply to comment #6)
>
> The processor is Intel quad core something:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 15
> model name : Genuine Intel(R) CPU @ 2.40GHz
> stepping : 4
Are you using an engineering sample? It doesn't look
like a production processor.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (6 preceding siblings ...)
2011-08-25 16:18 ` hjl.tools at gmail dot com
@ 2011-08-25 16:26 ` xinliangli at gmail dot com
2011-08-25 16:49 ` oleg.smolsky at gmail dot com
` (29 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-08-25 16:26 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #8 from davidxl <xinliangli at gmail dot com> 2011-08-25 16:17:10 UTC ---
The difference between gcc46 and gcc47 can be reproduced using -O2 -m64.
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (7 preceding siblings ...)
2011-08-25 16:26 ` xinliangli at gmail dot com
@ 2011-08-25 16:49 ` oleg.smolsky at gmail dot com
2011-08-25 22:48 ` oleg.smolsky at gmail dot com
` (28 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 16:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #9 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 16:26:05 UTC ---
AFAIK it's a production processor, a couple of years old. From x86info:
Family: 6 Model: 15 Stepping: 4 Type: 0 Brand: 0
CPU Model: Core 2 Duo E6600 Original OEM
Feature flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh
ds acpi mmx fxsr sse sse2 ss ht tm pbe sse3 monitor ds-cpl vmx tm2 ssse3 cx16
xTPR
Extended feature flags:
SYSCALL xd em64t lahf_lm
Cache info
L1 Instruction cache: 32KB, 8-way associative. 64 byte line size.
L1 Data cache: 32KB, 8-way associative. 64 byte line size.
L3 unified cache: 4MB, 16-way associative. 64 byte line size.
TLB info
Instruction TLB: 4x 4MB page entries, or 8x 2MB pages entries, 4-way assoc..
Instruction TLB: 4K pages, 4-way associative, 128 entries.
Data TLB: 4MB pages, 4-way associative, 32 entries
L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
Data TLB: 4K pages, 4-way associative, 256 entries.
Data TLB: 4MB pages, 4-way associative, 32 entries
64 byte prefetching.
L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
Data TLB: 4K pages, 4-way associative, 256 entries.
The physical package supports 4 logical processors
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (8 preceding siblings ...)
2011-08-25 16:49 ` oleg.smolsky at gmail dot com
@ 2011-08-25 22:48 ` oleg.smolsky at gmail dot com
2011-08-26 7:12 ` oleg.smolsky at gmail dot com
` (27 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 22:48 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #10 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 22:08:49 UTC ---
BTW, the uint16_t test also got slower for the very same reason. Here is the
inner-most loop generated by g++4.6:
.text:0000000000400DA0 loc_400DA0:
.text:0000000000400DA0 add eax, 0Ah
.text:0000000000400DA3 add ax, [rdx]
.text:0000000000400DA6 add rdx, 2
.text:0000000000400DAA cmp rdx, 5092E0h
.text:0000000000400DB1 jnz short loc_400DA0
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (9 preceding siblings ...)
2011-08-25 22:48 ` oleg.smolsky at gmail dot com
@ 2011-08-26 7:12 ` oleg.smolsky at gmail dot com
2011-08-30 20:37 ` matt at use dot net
` (26 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-26 7:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #11 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-26 00:48:02 UTC ---
Also, I have just built the same suite with GCC version 4.7 that came from
ftp://gcc.gnu.org/pub/gcc/snapshots/4.7-20110820/gcc-4.7-20110820.tar.bz2 and
the performance degradation remains:
gcc41:
0 "int8_t constant add" 1.35 sec 1185.19 M 1.00
gcc47:
0 "int8_t constant add" 2.37 sec 675.11 M 1.00
Note, these are original unmodified tests, not my digested derivatives
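The throughput column follows directly from the elapsed time. A small sketch of that arithmetic, assuming the test performs a fixed 1.6e9 element operations per run (a figure inferred here from 1.35 sec x 1185.19 M/sec, not taken from the benchmark source; the constant and function names are hypothetical):

```cpp
#include <cassert>

// Assumed total element operations per run (inferred from the reported figures).
constexpr double kTotalOps = 1.6e9;

// Millions of operations per second for a given elapsed time in seconds.
double mops_per_second(double seconds) {
    assert(seconds > 0.0);
    return kTotalOps / seconds / 1e6;
}

// How many times slower the new build is than the baseline.
double slowdown(double baseline_seconds, double new_seconds) {
    assert(baseline_seconds > 0.0);
    return new_seconds / baseline_seconds;
}
```

Under that assumption, the gcc41 and gcc47 times quoted above reproduce the reported rates, and `slowdown(1.35, 2.37)` puts the gcc47 build at roughly 1.76 times the gcc41 run time.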
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (10 preceding siblings ...)
2011-08-26 7:12 ` oleg.smolsky at gmail dot com
@ 2011-08-30 20:37 ` matt at use dot net
2011-09-15 16:57 ` oleg at smolsky dot net
` (25 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: matt at use dot net @ 2011-08-30 20:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
Matt Hargett <matt at use dot net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |matt at use dot net
--- Comment #12 from Matt Hargett <matt at use dot net> 2011-08-30 20:30:15 UTC ---
Can you determine which release introduced the regression?
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (11 preceding siblings ...)
2011-08-30 20:37 ` matt at use dot net
@ 2011-09-15 16:57 ` oleg at smolsky dot net
2011-09-15 17:39 ` xinliangli at gmail dot com
` (24 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-09-15 16:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #13 from oleg at smolsky dot net 2011-09-15 16:53:26 UTC ---
David, it looks like we are seeing different things with v4.7... See my
comment 11 - I am still observing the slowdown. Do you have access to
v4.1 and v4.6? Could you try reproducing my test please?
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (12 preceding siblings ...)
2011-09-15 16:57 ` oleg at smolsky dot net
@ 2011-09-15 17:39 ` xinliangli at gmail dot com
2011-10-21 23:02 ` xinliangli at gmail dot com
` (23 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-09-15 17:39 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #14 from davidxl <xinliangli at gmail dot com> 2011-09-15 17:28:10 UTC ---
(In reply to comment #13)
> David, it looks like we are seeing different things with v4.7... See my
> comment 11 - I am still observing the slowdown. Do you have access to
> v4.1 and v4.6? Could you try reproducing my test please?
Sorry for the delay -- I am pretty swamped these days (till mid October). I
will try to look at the problem more then.
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (13 preceding siblings ...)
2011-09-15 17:39 ` xinliangli at gmail dot com
@ 2011-10-21 23:02 ` xinliangli at gmail dot com
2011-10-24 18:28 ` oleg at smolsky dot net
` (22 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-21 23:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #15 from davidxl <xinliangli at gmail dot com> 2011-10-21 23:02:16 UTC ---
(In reply to comment #14)
> (In reply to comment #13)
> > David, it looks like we are seeing different things with v4.7... See my
> > comment 11 - I am still observing the slowdown. Do you have access to
> > v4.1 and v4.6? Could you try reproducing my test please?
>
> Sorry for the delay -- I am pretty swamped these days (till mid October). I
> will try to look at the problem more then.
>
> David
I still cannot reproduce the problem with the trunk compiler:
rv=4282167296
test    description             absolute   operations   ratio with
number                          time       per second   test0
0       "int8_t constant add"   1.09 sec   1467.89 M    1.00
Total absolute time for int8_t constant folding: 1.09 sec
Can you attach the output of -v, the assembly file generated with
-fverbose-asm -dA, and the optimized dump file generated with
-fdump-tree-optimized-blocks, all using the trunk compiler?
thanks,
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (14 preceding siblings ...)
2011-10-21 23:02 ` xinliangli at gmail dot com
@ 2011-10-24 18:28 ` oleg at smolsky dot net
2011-10-24 18:28 ` oleg at smolsky dot net
` (21 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 18:28 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #17 from oleg at smolsky dot net 2011-10-24 18:27:31 UTC ---
Created attachment 25595
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25595
test.cpp.144t.optimized
--- Comment #18 from oleg at smolsky dot net 2011-10-24 18:27:31 UTC ---
Created attachment 25596
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25596
test.s
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (15 preceding siblings ...)
2011-10-24 18:28 ` oleg at smolsky dot net
@ 2011-10-24 18:28 ` oleg at smolsky dot net
2011-10-24 18:33 ` oleg at smolsky dot net
` (20 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 18:28 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #16 from oleg at smolsky dot net 2011-10-24 18:27:28 UTC ---
$ /work/tools/gcc47/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/work/tools/gcc47/bin/g++
COLLECT_LTO_WRAPPER=/work/tools/gcc47/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.7/configure --prefix=/work/tools/gcc47
--enable-languages=c,c++ --with-system-zlib
--with-mpfr=/work/tools/mpfr24 --with-gmp=/work/tools/gmp
--with-mpc=/work/tools/mpc
LD_LIBRARY_PATH=/work/tools/mpfr/lib24:/work/tools/gmp/lib:/work/tools/mpc/lib
Thread model: posix
gcc version 4.7.0 20111001 (experimental) (GCC)
The test case, test.cpp was compiled with this command:
/work/tools/gcc47/bin/g++ -I. -g -O3 -static-libstdc++ -static-libgcc
-march=native test.cpp -o test
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (16 preceding siblings ...)
2011-10-24 18:28 ` oleg at smolsky dot net
@ 2011-10-24 18:33 ` oleg at smolsky dot net
2011-10-24 19:34 ` xinliangli at gmail dot com
` (19 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 18:33 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #19 from oleg at smolsky dot net 2011-10-24 18:33:23 UTC ---
Also note that Bugzilla has quietly replaced an older attachment,
test.cpp, with a new one without adding a comment...
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (17 preceding siblings ...)
2011-10-24 18:33 ` oleg at smolsky dot net
@ 2011-10-24 19:34 ` xinliangli at gmail dot com
2011-10-24 19:50 ` oleg at smolsky dot net
` (18 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 19:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #20 from davidxl <xinliangli at gmail dot com> 2011-10-24 19:33:18 UTC ---
The test.cpp attached seems to be the same as the old version.
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (18 preceding siblings ...)
2011-10-24 19:34 ` xinliangli at gmail dot com
@ 2011-10-24 19:50 ` oleg at smolsky dot net
2011-10-24 19:59 ` xinliangli at gmail dot com
` (17 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 19:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #21 from oleg at smolsky dot net 2011-10-24 19:48:57 UTC ---
OK, just in case, here is my current test.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (19 preceding siblings ...)
2011-10-24 19:50 ` oleg at smolsky dot net
@ 2011-10-24 19:59 ` xinliangli at gmail dot com
2011-10-24 21:12 ` oleg at smolsky dot net
` (16 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 19:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #22 from davidxl <xinliangli at gmail dot com> 2011-10-24 19:58:23 UTC ---
(In reply to comment #21)
> OK, just in case, here is my current test.
Preprocessed test case? I saw the main assembly difference that can explain the
performance diff, but want to make sure it is not due to your new source change
(I saw some print statements added).
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (20 preceding siblings ...)
2011-10-24 19:59 ` xinliangli at gmail dot com
@ 2011-10-24 21:12 ` oleg at smolsky dot net
2011-10-24 23:00 ` xinliangli at gmail dot com
` (15 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 21:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #23 from oleg at smolsky dot net 2011-10-24 21:11:21 UTC ---
Here is the source preprocessed for gcc47. The test exhibits the
slowdown mentioned in comment 11.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (21 preceding siblings ...)
2011-10-24 21:12 ` oleg at smolsky dot net
@ 2011-10-24 23:00 ` xinliangli at gmail dot com
2011-10-24 23:03 ` xinliangli at gmail dot com
` (14 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 23:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #24 from davidxl <xinliangli at gmail dot com> 2011-10-24 23:00:22 UTC ---
(In reply to comment #23)
> Here is the source preprocessed for gcc47. The test exhibits the
> slowdown mentioned in comment 11.
The problem can be reproduced with a simplified test case -- basically
depending on how the result value from the inner loop is used in the outer loop
(related to casting), the inner loop code is quite different - in the slow
case, two redundant sign extensions and a move instruction are generated.
# the fast version
gcc -O3 -DFAST_VER bug.cpp
./a.out
rv=4282167296
test    description             absolute   operations   ratio with
number                          time       per second   test0
0       "int8_t constant add"   1.05 sec   1523.81 M    1.00
Total absolute time for int8_t constant folding: 1.05 sec
# the slow version:
gcc -O3 bug.cpp
./a.out
rv=4282167296
test    description             absolute   operations   ratio with
number                          time       per second   test0
0       "int8_t constant add"   1.57 sec   1019.11 M    1.00
Total absolute time for int8_t constant folding: 1.57 sec
# however, when disabling inlining of check_shifted_sum_1 in the slow case, the
runtime is recovered:
gcc -O3 -DNOINLINE bug.cpp
./a.out
rv=4282167296
test    description             absolute   operations   ratio with
number                          time       per second   test0
0       "int8_t constant add"   1.05 sec   1523.81 M    1.00
Total absolute time for int8_t constant folding: 1.05 sec
The inner loop body in the fast case:
.L60:
movzbl 0(%rbp,%rcx), %r9d
addq $1, %rcx
cmpl %ecx, %ebx
leal 10(%r8,%r9), %r8d
# SUCC: 4 [91.0%] (dfs_back,can_fallthru) 5 [9.0%] (fallthru,can_fallthru,loop_exit)
jg .L60
while for the slow case:
.L60:
movzbl (%r12,%rcx), %eax
movsbl %r8b, %r8d
addq $1, %rcx
leal 10(%rax), %r9d
movsbl %r9b, %r9d
addl %r8d, %r9d
cmpl %ecx, %ebp
movl %r9d, %r8d
# SUCC: 4 [91.0%] (dfs_back,can_fallthru) 5 [9.0%] (fallthru,can_fallthru,loop_exit)
jg .L60
The relevant source change:
#ifdef NOINLINE
#define INL __attribute__((noinline))
#else
#define INL inline
#endif

template <typename T, typename T2, typename Shifter>
INL void check_shifted_sum_1(T2 result) {
    T temp = (T)SIZE * Shifter::do_shift((T)init_value);
    if (!tolerance_equal<T>((T&)result,temp))
        printf("test %i failed\n", current_test);
}

#ifdef FAST_VER
#define TYPE u_int32_t
#else
#define TYPE int8_t
#endif

template <typename T, typename Shifter>
__attribute__((noinline)) u_int32_t test_constant(T* first, int count, const char *label)
{
    int i;
    u_int32_t rv = 0;
    start_timer();
    for (i = 0; i < iterations; ++i) {
        T result = 0;
        for (int n = 0; n < count; ++n) {
            result += Shifter::do_shift( first[n] );
        }
        rv += result;
        check_shifted_sum_1<T, TYPE, Shifter>(result);
    }
    record_result( timer(), label );
    return rv;
}
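The excerpt above depends on helpers from the full test (`SIZE`, `init_value`, `tolerance_equal`, the timer functions). A self-contained sketch of the same structure, useful for checking that the fast and slow variants compute identical values, might look like the following. Everything in it is an assumption made for illustration: the `_sketch` names, the `do_shift` body (input + 10), the equality check standing in for `tolerance_equal`, and the use of a value conversion where the original reinterprets `result` through `(T&)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the benchmark's shifter; the body is assumed.
template <typename T>
struct constant_add_sketch {
    static T do_shift(T input) { return static_cast<T>(input + 10); }
};

// T2 is the type the result is passed as: uint32_t mirrors FAST_VER,
// int8_t mirrors the slow variant.
template <typename T, typename T2, typename Shifter>
bool check_sum_sketch(T2 result, std::size_t size, T init) {
    T expected = static_cast<T>(static_cast<T>(size) * Shifter::do_shift(init));
    return static_cast<T>(result) == expected;
}

struct run_result_sketch {
    std::uint32_t rv;        // running sum of the per-iteration results
    bool all_checks_passed;  // true if every iteration matched the expected sum
};

template <typename T, typename T2, typename Shifter>
run_result_sketch run_sketch(const T* first, int count, int iterations, T init) {
    assert(first != nullptr && count >= 0 && iterations >= 0);
    run_result_sketch out{0, true};
    for (int i = 0; i < iterations; ++i) {
        T result = 0;
        for (int n = 0; n < count; ++n) {
            result = static_cast<T>(result + Shifter::do_shift(first[n]));
        }
        // A negative 8-bit result wraps modulo 2^32 when added to rv.
        out.rv += static_cast<std::uint32_t>(result);
        if (!check_sum_sketch<T, T2, Shifter>(static_cast<T2>(result),
                                              static_cast<std::size_t>(count), init)) {
            out.all_checks_passed = false;
        }
    }
    return out;
}
```

With 8000 elements all equal to 1, each iteration yields -64 as an `int8_t`, so 200000 iterations would give 2^32 - 12800000 = 4282167296, which is consistent with the `rv` printed in the runs above. Both instantiations return the same values; the performance difference comes from how the compiler treats the call site after inlining.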
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (22 preceding siblings ...)
2011-10-24 23:00 ` xinliangli at gmail dot com
@ 2011-10-24 23:03 ` xinliangli at gmail dot com
2012-01-10 18:07 ` oleg at smolsky dot net
` (13 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 23:03 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #25 from davidxl <xinliangli at gmail dot com> 2011-10-24 23:02:14 UTC ---
Created attachment 25600
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25600
test case for 47
Note that with gcc46, the result is even slower -- it has the RAT stall problem
which is fixed in 47.
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (23 preceding siblings ...)
2011-10-24 23:03 ` xinliangli at gmail dot com
@ 2012-01-10 18:07 ` oleg at smolsky dot net
2012-01-11 9:43 ` rguenth at gcc dot gnu.org
` (12 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-01-10 18:07 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #26 from oleg at smolsky dot net 2012-01-10 18:06:28 UTC ---
Could someone toggle the state and assign a milestone, please?
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
` (24 preceding siblings ...)
2012-01-10 18:07 ` oleg at smolsky dot net
@ 2012-01-11 9:43 ` rguenth at gcc dot gnu.org
2012-01-11 17:27 ` xinliangli at gmail dot com
` (11 subsequent siblings)
37 siblings, 0 replies; 39+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-01-11 9:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-01-11
Ever Confirmed|0 |1
--- Comment #27 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-11 09:41:25 UTC ---
Confirmed. Can somebody summarize please and point to the relevant short
testcase that shows the regression (is there only one kind of problem? this
seems to be a benchmark suite). A short testcase is preprocessed and
at most a few hundred lines.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: xinliangli at gmail dot com @ 2012-01-11 17:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #28 from davidxl <xinliangli at gmail dot com> 2012-01-11 17:26:46 UTC ---
See comment 24 for a shorter test case.
Summary:
1) the regression reported by Oleg in gcc4_6 and earlier versions is due to a
difference in FE code generation, which leads the backend to generate code
that causes a partial register stall.
2) the RAT stall problem is fixed in gcc4_7
3) however, in 4_7 there is a different problem -- redundant sign-extension and
move instructions are generated. This could be due to limitations in the RTL
forward propagation and combine passes when dealing with multiple downward uses.
David
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: oleg at smolsky dot net @ 2012-03-02 0:56 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #29 from oleg at smolsky dot net 2012-03-02 00:54:53 UTC ---
Is it possible to target this to 4.7? These optimization issues result
in measurably slower code...
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: jakub at gcc dot gnu.org @ 2012-03-02 8:08 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #30 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 08:07:15 UTC ---
Created attachment 26809
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26809
pr50182.C
Even the reduced testcase is orders of magnitude longer than what would be
desirable for analysis. I've tried to reduce it to just the templates that are
actually needed (and it can be measured just with time); does this reflect the
slowdowns you are seeing? The next step in reducing would be to remove all the
template mess, instantiate it by hand, and perhaps also inline by hand. There
is no reason why we shouldn't have just one loop with all the statements
in it. On this reduced testcase on an Intel i7-2600 CPU with -O3,
-DFAST_VER/-DNOINLINE don't seem to make any difference, but 4.6 is measurably
faster than 4.7.
In any case, this is way too late for 4.7.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: oleg at smolsky dot net @ 2012-03-02 8:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #31 from oleg at smolsky dot net 2012-03-02 08:21:41 UTC ---
I don't think there is a need to actually check the result in this
benchmarkable fragment, so that will reduce the code a little. The only
difficulty I was hitting was fooling/forcing the compiler into not
discarding the intermediate result and actually performing every calculation
and iteration :)
Let me try to digest this further. I'll also get you a result from our
production compiler (v4.1, which emits the fastest code)
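A rough sketch of one common way to keep the optimizer from discarding or
hoisting the work, for illustration only. The names sum_plus_ten, escape and
run_benchmark are hypothetical and do not come from the attached testcase, and
the empty asm statement is a GCC/Clang extension rather than standard C++:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the benchmark kernel: add 10 to each byte and
// accumulate in a signed char (the narrowing conversion wraps on GCC).
static signed char sum_plus_ten(const signed char* data, std::size_t n) {
    signed char result = 0;
    for (std::size_t i = 0; i < n; ++i)
        result = static_cast<signed char>(result + data[i] + 10);
    return result;
}

// GCC/Clang extension (an assumption about the toolchain): an empty asm that
// takes the pointer as input and clobbers memory, so the compiler can no
// longer assume the buffer is unchanged between outer iterations.
static inline void escape(const void* p) {
    __asm__ __volatile__("" : : "r"(p) : "memory");
}

// Every per-iteration result feeds the returned value, so none of them is
// dead; the barrier discourages computing the inner sum only once.
unsigned run_benchmark(const signed char* data, std::size_t n, int iterations) {
    assert(data != nullptr);
    unsigned rv = 0;
    for (int i = 0; i < iterations; ++i) {
        escape(data);
        rv += sum_plus_ten(data, n);
    }
    return rv;
}
```

Printing the returned value, as the "rv=" line of a benchmark does, keeps the
whole chain observable; whether a given compiler still simplifies the loop has
to be checked in the generated assembly.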
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: jakub at gcc dot gnu.org @ 2012-03-02 8:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 08:28:34 UTC ---
For me, 4.1 is as fast as 4.6 on my CPU with the reduced testcase I've
attached (it is not clear whether it correctly models what the original
benchmark did), and the trunk regressed with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176072
Before that the inner loop looked like:
.L12:
        addl    $10, %edx
        addb    0(%rbp,%rcx), %dl
        addq    $1, %rcx
        cmpl    %ecx, %ebx
        jg      .L12
and now it looks like:
.L12:
        movzbl  0(%rbp,%rdx), %r8d
        addq    $1, %rdx
        cmpl    %edx, %ebx
        leal    10(%rcx,%r8), %ecx
        jg      .L12
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: jakub at gcc dot gnu.org @ 2012-03-02 9:14 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 09:13:52 UTC ---
After Jason's patch (which needs to be kept, it was a wrong-code bugfix), we
get out of the FE the addition in int type, while previously it was in unsigned
char type. I.e.
int D.2177;
signed char D.2138;
T D.2178;
T D.2179;
T D.2180;
signed char result;
D.2138 = custom_constant_add<signed char>::do_shift (D.2177);
D.2178 = (T) result;
D.2179 = (T) D.2138;
D.2180 = D.2178 + D.2179;
result = (signed char) D.2180;
where T used to be unsigned char before and now is int.
And no GIMPLE optimization pass manages to narrow the addition operation
(together with the previous sign extensions and following demotion) to an
unsigned char operation (signed char would be wrong, because of the possible
overflow). I bet such narrowing in these cases could even help the vectorizer,
which if it were to vectorize this or similar loops (it doesn't in this case),
would do the promotions/demotions needlessly.
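The claim that the addition can be narrowed to unsigned char without changing
the result can be checked exhaustively. The sketch below is only an
illustration: the function names are made up, and it relies on the
out-of-range conversion to signed char wrapping modulo 256, which GCC
guarantees but which is implementation-defined before C++20:

```cpp
#include <cassert>

// Addition as described for the new FE output: promote both operands to
// int, add, then demote the result to signed char.
static signed char add_via_int(signed char a, signed char b) {
    int wide = static_cast<int>(a) + static_cast<int>(b);
    return static_cast<signed char>(wide);
}

// Narrowed form: the sum is truncated to unsigned char, which models an
// 8-bit add that wraps modulo 256 (C++ itself still promotes the operands
// to int before adding), and is then converted to signed char.
static signed char add_via_uchar(signed char a, signed char b) {
    unsigned char narrow = static_cast<unsigned char>(
        static_cast<unsigned char>(a) + static_cast<unsigned char>(b));
    return static_cast<signed char>(narrow);
}

// Compare both forms over all 256 x 256 operand pairs.
static void check_narrowing() {
    for (int a = -128; a <= 127; ++a) {
        for (int b = -128; b <= 127; ++b) {
            signed char sa = static_cast<signed char>(a);
            signed char sb = static_cast<signed char>(b);
            assert(add_via_int(sa, sb) == add_via_uchar(sa, sb));
        }
    }
}
```

Doing the same addition directly in signed char would not be a valid
narrowing, since signed overflow is undefined; that is why the narrowed
operation has to be unsigned.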
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: oleg at smolsky dot net @ 2012-03-03 2:20 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #34 from oleg at smolsky dot net 2012-03-03 02:19:21 UTC ---
OK, here are some benchmark numbers for the test compiled verbatim with
g++41/g++463 -O2:
$ time ./test41
rv=4243767296
real 0m6.063s
user 0m6.058s
sys 0m0.001s
$ time ./test46
rv=4243767296
real 0m11.425s
user 0m11.415s
sys 0m0.003s
$ time ./test46-fast # (i.e. built with -DFAST_VER)
rv=4243767296
real 0m11.389s
user 0m11.383s
sys 0m0.003s
Let me see how the sample can be digested further down...
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: oleg at smolsky dot net @ 2012-03-03 2:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #35 from oleg at smolsky dot net 2012-03-03 02:45:15 UTC ---
Here is a smaller version. BTW, I've noticed another regression in
optimization in v4.1 when using a const global...
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: oleg at smolsky dot net @ 2012-03-03 3:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #36 from oleg at smolsky dot net 2012-03-03 02:59:11 UTC ---
Here is the code emitted by g++ 4.6.3 for smaller_test.cpp (attached to
the bug)
unsigned int test_constant<> proc near
        mov     r9d, cs:iterations
        xor     r8d, r8d
        xor     eax, eax
        test    r9d, r9d
        jle     short locret_400552
        db      66h, 66h, 66h
        nop
        db      66h, 66h
        nop
loc_400528:
        xor     ecx, ecx
        xor     edx, edx
        test    esi, esi
        jle     short loc_40054E
loc_400530:
        add     edx, 0Ah
        add     dl, [rdi+rcx]
        add     rcx, 1
        cmp     esi, ecx
        jg      short loc_400530
        movsx   edx, dl
loc_400541:
        add     r8d, 1
        add     eax, edx
        cmp     r8d, r9d
        jnz     short loc_400528
        rep retn
loc_40054E:
        xor     edx, edx
        jmp     short loc_400541
locret_400552:
        rep retn
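For readers without the attachment, the listing corresponds roughly to source
of the following shape. This is a hypothetical reconstruction inferred from
the assembly and from the names quoted elsewhere in this bug
(custom_constant_add, do_shift, iterations); the real smaller_test.cpp may
differ, and the initial value of iterations here is arbitrary:

```cpp
#include <cassert>

// Assumed global; the asm only shows that a global named iterations is read.
int iterations = 1000;

// Assumed shape of the shifter: adds the constant 10 (the "add edx, 0Ah").
template <typename T>
struct custom_constant_add {
    static T do_shift(T input) { return static_cast<T>(input + 10); }
};

template <typename T, typename Shifter>
unsigned test_constant(const T* first, int count) {
    unsigned rv = 0;
    for (int i = 0; i < iterations; ++i) {   // outer loop at loc_400528
        T result = 0;
        for (int n = 0; n < count; ++n)      // inner loop at loc_400530
            result = static_cast<T>(result + Shifter::do_shift(first[n]));
        rv += result;                        // movsx edx, dl / add eax, edx
    }
    return rv;
}
```

With T = signed char the inner loop accumulates in the low byte of edx
(add dl, [rdi+rcx]) right after a full-width add to edx, which is the partial
register write pattern discussed in this bug.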
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: oleg at smolsky dot net @ 2012-03-06 16:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #37 from oleg at smolsky dot net 2012-03-06 16:34:27 UTC ---
Hey Jakub, is this smaller example digestible?
http://gcc.gnu.org/bugzilla/attachment.cgi?id=26814
The asm output is straightforward, but I obviously have no clue about
how complex the corresponding compiler's internal state is...
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: jakub at gcc dot gnu.org @ 2012-03-06 17:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #38 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-06 17:26:24 UTC ---
Sorry, can't reproduce any performance degradation between 4.1 and 4.6
on the http://gcc.gnu.org/bugzilla/attachment.cgi?id=26814 testcase (-O3 -m64,
default -mtune=generic):
on i7-2600 4.1 user time is 0m3.833s, 4.6 0m3.411s and 4.7 0m5.102s,
on AMD Barcelona 4.1 user time is 0m8.798s, 4.6 0m5.875s and 4.7 0m5.855s.
* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
From: oleg at smolsky dot net @ 2012-03-06 19:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #39 from oleg at smolsky dot net 2012-03-06 19:39:03 UTC ---
Hmm... funky. I can reproduce the issue on a newer Intel machine:
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
stepping : 6
cpu MHz : 2327.445
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
....
$ time ./test41
real 0m6.270s
user 0m6.268s
sys 0m0.000s
$ time ./test44
real 0m5.524s
user 0m5.523s
sys 0m0.000s
$ time ./test46
real 0m11.721s
user 0m11.718s
sys 0m0.001s
P.S. the middle one is made using g++ (GCC) 4.4.5 20110214 (Red Hat
4.4.5-6). The rest are original binaries made a couple of days ago.