public inbox for gcc-bugs@sourceware.org
* [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64)
@ 2011-08-24 21:27 oleg.smolsky at gmail dot com
  2011-08-24 22:30 ` [Bug target/50182] " oleg.smolsky at gmail dot com
                   ` (37 more replies)
  0 siblings, 38 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-24 21:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

             Bug #: 50182
           Summary: Performance degradation from gcc 4.1 (x86_64)
    Classification: Unclassified
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: oleg.smolsky@gmail.com


G++ 4.6 emits slower code based on the following set of benchmarks:
    http://stlab.adobe.com/performance/ 

The discussion thread is here:
    http://gcc.gnu.org/ml/gcc/2011-07/threads.html#00506
    http://gcc.gnu.org/ml/gcc/2011-08/threads.html#00411

I digested one of the tests down to a single short case (see attachments):
    http://gcc.gnu.org/ml/gcc/2011-08/msg00391.html



g++ 4.1 (1.35 sec, 1185M ops/s):

.text:0000000000400FE0 loc_400FE0:
.text:0000000000400FE0                 movzx   eax, ds:data8[rdx]
.text:0000000000400FE7                 add     rdx, 1
.text:0000000000400FEB                 add     eax, 0Ah
.text:0000000000400FEE                 cmp     rdx, 1F40h
.text:0000000000400FF5                 lea     ecx, [rax+rcx]
.text:0000000000400FF8                 jnz     short loc_400FE0

g++ 4.6 (2.86 sec, 563M ops/s):

.text:0000000000400D90 loc_400D90:
.text:0000000000400D90                 add     eax, 0Ah
.text:0000000000400D93                 add     al, [rdx]
.text:0000000000400D95                 add     rdx, 1
.text:0000000000400D99                 cmp     rdx, 503480h
.text:0000000000400DA0                 jnz     short loc_400D90

P.S. setting the component to C++. Optimizer?
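
For readers without the attachment, the two listings above can be read roughly as the two C++ functions below. This is a reconstruction from the disassembly only (8000 bytes, constant 10), not the benchmark source, and the names sum_narrow and sum_wide are hypothetical. It is meant to show why the 4.1 form is legal: accumulating in a 32-bit register and truncating once at the end gives the same 8-bit result as accumulating in an 8-bit register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Accumulate in an 8-bit variable, roughly what the 4.6 listing does with
// `add al, [rdx]`. Unsigned arithmetic keeps the wraparound well defined.
std::uint8_t sum_narrow(const std::uint8_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::uint8_t acc = 0;
    for (std::size_t i = 0; i < count; ++i)
        acc = static_cast<std::uint8_t>(acc + data[i] + 10);
    return acc;
}

// Accumulate in a 32-bit variable and truncate once at the end, roughly what
// the 4.1 listing does with `movzx` followed by a 32-bit `lea`.
std::uint8_t sum_wide(const std::uint8_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < count; ++i)
        acc += data[i] + 10u;
    return static_cast<std::uint8_t>(acc);
}
```

Both functions return the same value for any input, so the difference between the two compilers is in instruction selection, not in semantics.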


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
@ 2011-08-24 22:30 ` oleg.smolsky at gmail dot com
  2011-08-25  0:14 ` xinliangli at gmail dot com
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-24 22:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #1 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-24 22:13:26 UTC ---
Created attachment 25097
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25097
The test case

This is the preprocessed source for the test discussed in the mail thread.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
  2011-08-24 22:30 ` [Bug target/50182] " oleg.smolsky at gmail dot com
@ 2011-08-25  0:14 ` xinliangli at gmail dot com
  2011-08-25  0:52 ` xinliangli at gmail dot com
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-08-25  0:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

davidxl <xinliangli at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xinliangli at gmail dot com

--- Comment #2 from davidxl <xinliangli at gmail dot com> 2011-08-24 23:15:44 UTC ---
The problem is fixed in trunk compiler:

1) with 4.6 compiler:


test         description   absolute   operations   ratio with
number                     time       per second   test0

 0 "int8_t constant add"   3.29 sec   486.32 M     1.00


RAT_STALLS.registers = 288249 (sampling count 10001)


2) with trunk compiler:


test         description   absolute   operations   ratio with
number                     time       per second   test0

 0 "int8_t constant add"   1.34 sec   1194.03 M     1.00

No partial register stalls from user functions.


Inner loop from trunk compiler:

.L55:
    movzbl    0(%rbp,%rcx), %r9d
    addq    $1, %rcx
    cmpl    %ecx, %ebx
    leal    10(%r8,%r9), %r8d
    jg    .L55


Inner loop from 46 compiler:

.L43:
    addl    $10, %eax
    addb    (%rdx), %al
    addq    $1, %rdx
    cmpq    $data8+8000, %rdx
    jne    .L43


RAT stalls (not precise event so the instruction causing stalls is a little
off)
               :  400e27:    nopw   0x0(%rax,%rax,1)
   127  0.0440 :  400e30:    add    $0xa,%eax
  5869  2.0330 :  400e33:    add    (%rdx),%al
282125 97.7263 :  400e35:    add    $0x1,%rdx
               :  400e39:    cmp    $0x404560,%rdx
               :  400e40:    jne    400e30 <main+0xd0>


David
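
A quick sanity check on the two tables above, as a minimal sketch: multiplying "absolute time" by "operations per second" should give the same amount of work for both compilers if the runs are comparable. The helper names below are hypothetical, and the figure of about 1600 M operations is inferred from the reported rows, not taken from the benchmark source.

```cpp
#include <cassert>
#include <cmath>

// Work implied by one table row: seconds * (millions of operations per second).
double implied_mops(double seconds, double mops_per_sec) {
    return seconds * mops_per_sec;
}

// True if two rows describe the same amount of work, within a tolerance
// given in millions of operations.
bool same_work(double s1, double r1, double s2, double r2, double tol_mops = 1.0) {
    return std::fabs(implied_mops(s1, r1) - implied_mops(s2, r2)) < tol_mops;
}

// Speedup of run B over run A, computed from their absolute times.
double speedup(double seconds_a, double seconds_b) {
    assert(seconds_b > 0.0);
    return seconds_a / seconds_b;
}
```

With the rows above, same_work(3.29, 486.32, 1.34, 1194.03) holds, and speedup(3.29, 1.34) comes out at roughly 2.46, so the trunk build does the same work in well under half the time.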



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
  2011-08-24 22:30 ` [Bug target/50182] " oleg.smolsky at gmail dot com
  2011-08-25  0:14 ` xinliangli at gmail dot com
@ 2011-08-25  0:52 ` xinliangli at gmail dot com
  2011-08-25  9:00 ` jakub at gcc dot gnu.org
                   ` (34 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-08-25  0:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #3 from davidxl <xinliangli at gmail dot com> 2011-08-25 00:13:00 UTC ---
Caused by differences in FE generated code:

46:


        D.6887 = (int) D.6886;
        D.6888 = custom_constant_add<signed char>::do_shift (D.6887);
        D.6889 = (unsigned char) D.6888;
        result.8 = (unsigned char) result;
        D.6891 = D.6889 + result.8;
        result = (signed char) D.6891;
        n = n + 1;


trunk:


        D.6938 = (int) D.6937;
        D.6874 = custom_constant_add<signed char>::do_shift (D.6938);
        D.6939 = (int) result;               <-- promoted to int
        D.6940 = (int) D.6874;               <---promoted to int
        D.6941 = D.6939 + D.6940;
        result = (signed char) D.6941;
        n = n + 1;
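
The two dumps above can be paraphrased in C++ as below. This is only a sketch: add_narrow and add_promoted are hypothetical names, and the narrowing conversions back to signed char are implementation-defined before C++20 (GCC wraps modulo 256). The point is that both forms produce the same 8-bit result and differ only in the width of the intermediate add.

```cpp
#include <cassert>
#include <cstdint>

// 4.6-style: both operands are narrowed to unsigned char and added in 8 bits.
std::int8_t add_narrow(std::int8_t result, std::int8_t shifted) {
    unsigned char a = static_cast<unsigned char>(shifted);
    unsigned char b = static_cast<unsigned char>(result);
    unsigned char sum = static_cast<unsigned char>(a + b);
    return static_cast<std::int8_t>(sum);
}

// trunk-style: both operands are promoted to int, added, then truncated.
std::int8_t add_promoted(std::int8_t result, std::int8_t shifted) {
    int a = static_cast<int>(result);
    int b = static_cast<int>(shifted);
    return static_cast<std::int8_t>(a + b);
}

// Exhaustively compare the two forms over every pair of 8-bit inputs.
void check_all_pairs() {
    for (int r = -128; r <= 127; ++r)
        for (int s = -128; s <= 127; ++s)
            assert(add_narrow(static_cast<std::int8_t>(r), static_cast<std::int8_t>(s)) ==
                   add_promoted(static_cast<std::int8_t>(r), static_cast<std::int8_t>(s)));
}
```

Calling check_all_pairs() passes on a two's-complement target, which is consistent with the front ends differing in the shape of the generated code and not in the result.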



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (2 preceding siblings ...)
  2011-08-25  0:52 ` xinliangli at gmail dot com
@ 2011-08-25  9:00 ` jakub at gcc dot gnu.org
  2011-08-25 15:21 ` oleg.smolsky at gmail dot com
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-08-25  9:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-25 08:55:42 UTC ---
The bug report is incomplete: I don't see anywhere where you state which g++
options were measured, which CPU it was on, whether it is -m32 or -m64, etc.
For me, on i7-2600 CPU 4.6.0 (both Fedora 4.6.0-10 and 20110727 4.6 branch
snapshot) is actually much faster than current trunk with -O3 -m64:
4.6.* gives roughly
 0 "int8_t constant add"   0.84 sec   1904.76 M     1.00
while trunk
 0 "int8_t constant add"   1.26 sec   1269.84 M     1.00
4.4.* gives also
 0 "int8_t constant add"   1.26 sec   1269.84 M     1.00
4.3.* gives
 0 "int8_t constant add"   1.26 sec   1269.84 M     1.00
4.2.* gives
 0 "int8_t constant add"   0.84 sec   1904.76 M     1.00
and 4.1.* doesn't compile, because the source has been preprocessed and STL is
dependent on the compiler version.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (3 preceding siblings ...)
  2011-08-25  9:00 ` jakub at gcc dot gnu.org
@ 2011-08-25 15:21 ` oleg.smolsky at gmail dot com
  2011-08-25 15:29 ` oleg.smolsky at gmail dot com
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 15:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #5 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 15:19:57 UTC ---
Created attachment 25103
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25103
The same test preprocessed with g++ 4.1



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (4 preceding siblings ...)
  2011-08-25 15:21 ` oleg.smolsky at gmail dot com
@ 2011-08-25 15:29 ` oleg.smolsky at gmail dot com
  2011-08-25 16:18 ` hjl.tools at gmail dot com
                   ` (31 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 15:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #6 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 15:25:49 UTC ---
Oh, the settings and things were discussed in the mail thread... Here is the
digest:

I have compiled and run a set of C++ benchmarks on a CentOS4/64 box using the
following compilers:
 a) g++4.1 that is available for this distro (GCC version 4.1.2 20071124 (Red
Hat 4.1.2-42))
 b) g++4.6 that I built (stock version 4.6.1)

I built the compiler with all the default options (it just has a distinct
installation path):
 ../gcc-%{version}/configure --prefix=/work/tools/gcc46
--enable-languages=c,c++ --with-system-zlib --with-mpfr=/work/tools/mpfr24
--with-gmp=/work/tools/gmp --with-mpc=/work/tools/mpc
LD_LIBRARY_PATH=/work/tools/mpfr/lib24:/work/tools/gmp/lib:/work/tools/mpc/lib

Tests were compiled with -O2 and -O3, I later added -march=native to 4.6
builds.

The processor is Intel quad core something:

processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 15
model name    : Genuine Intel(R) CPU                  @ 2.40GHz
stepping    : 4
cpu MHz        : 2393.943
cache size    : 4096 KB
physical id    : 0
siblings    : 4
core id        : 0
cpu cores    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 10
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor
ds_cpl tm2 cx16 xtpr lahf_lm
bogomips    : 4793.09
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (5 preceding siblings ...)
  2011-08-25 15:29 ` oleg.smolsky at gmail dot com
@ 2011-08-25 16:18 ` hjl.tools at gmail dot com
  2011-08-25 16:26 ` xinliangli at gmail dot com
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-25 16:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-25 15:58:08 UTC ---
(In reply to comment #6)
>
> The processor is Intel quad core something:
> 
> processor    : 0
> vendor_id    : GenuineIntel
> cpu family    : 6
> model        : 15
> model name    : Genuine Intel(R) CPU                  @ 2.40GHz
> stepping    : 4

Are you using an engineering sample? It doesn't look
like a production processor.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (6 preceding siblings ...)
  2011-08-25 16:18 ` hjl.tools at gmail dot com
@ 2011-08-25 16:26 ` xinliangli at gmail dot com
  2011-08-25 16:49 ` oleg.smolsky at gmail dot com
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-08-25 16:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #8 from davidxl <xinliangli at gmail dot com> 2011-08-25 16:17:10 UTC ---
gcc46 and gcc47 difference can be reproduced using -O2 -m64.

David



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (7 preceding siblings ...)
  2011-08-25 16:26 ` xinliangli at gmail dot com
@ 2011-08-25 16:49 ` oleg.smolsky at gmail dot com
  2011-08-25 22:48 ` oleg.smolsky at gmail dot com
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 16:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #9 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 16:26:05 UTC ---
AFAIK it's a production processor, a couple of years old. From x86info:

Family: 6 Model: 15 Stepping: 4 Type: 0 Brand: 0
CPU Model: Core 2 Duo E6600 Original OEM
Feature flags:
 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh
ds acpi mmx fxsr sse sse2 ss ht tm pbe sse3 monitor ds-cpl vmx tm2 ssse3 cx16
xTPR
Extended feature flags:
 SYSCALL xd em64t lahf_lm
Cache info
 L1 Instruction cache: 32KB, 8-way associative. 64 byte line size.
 L1 Data cache: 32KB, 8-way associative. 64 byte line size.
 L3 unified cache: 4MB, 16-way associative. 64 byte line size.
TLB info
 Instruction TLB: 4x 4MB page entries, or 8x 2MB pages entries, 4-way assoc..
 Instruction TLB: 4K pages, 4-way associative, 128 entries.
 Data TLB: 4MB pages, 4-way associative, 32 entries
 L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
 L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
 Data TLB: 4K pages, 4-way associative, 256 entries.
 Data TLB: 4MB pages, 4-way associative, 32 entries
 64 byte prefetching.
 L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
 L0 Data TLB: 4MB pages, 4-way set associative, 16 entries
 Data TLB: 4K pages, 4-way associative, 256 entries.
The physical package supports 4 logical processors



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (8 preceding siblings ...)
  2011-08-25 16:49 ` oleg.smolsky at gmail dot com
@ 2011-08-25 22:48 ` oleg.smolsky at gmail dot com
  2011-08-26  7:12 ` oleg.smolsky at gmail dot com
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-25 22:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #10 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 22:08:49 UTC ---
BTW, the uint16_t test also got slower for the very same reason. Here is the
inner-most loop generated by g++4.6:

.text:0000000000400DA0 loc_400DA0:
.text:0000000000400DA0                 add     eax, 0Ah
.text:0000000000400DA3                 add     ax, [rdx]
.text:0000000000400DA6                 add     rdx, 2
.text:0000000000400DAA                 cmp     rdx, 5092E0h
.text:0000000000400DB1                 jnz     short loc_400DA0



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (9 preceding siblings ...)
  2011-08-25 22:48 ` oleg.smolsky at gmail dot com
@ 2011-08-26  7:12 ` oleg.smolsky at gmail dot com
  2011-08-30 20:37 ` matt at use dot net
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg.smolsky at gmail dot com @ 2011-08-26  7:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #11 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-26 00:48:02 UTC ---
Also, I have just built the same suite with GCC version 4.7 that came from
ftp://gcc.gnu.org/pub/gcc/snapshots/4.7-20110820/gcc-4.7-20110820.tar.bz2 and
the performance degradation remains:

gcc41:
0                 "int8_t constant add"   1.35 sec   1185.19 M     1.00

gcc47:
0                 "int8_t constant add"   2.37 sec   675.11 M     1.00

Note, these are the original unmodified tests, not my digested derivatives.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (10 preceding siblings ...)
  2011-08-26  7:12 ` oleg.smolsky at gmail dot com
@ 2011-08-30 20:37 ` matt at use dot net
  2011-09-15 16:57 ` oleg at smolsky dot net
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: matt at use dot net @ 2011-08-30 20:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

Matt Hargett <matt at use dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matt at use dot net

--- Comment #12 from Matt Hargett <matt at use dot net> 2011-08-30 20:30:15 UTC ---
Can you determine which release introduced the regression?



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (11 preceding siblings ...)
  2011-08-30 20:37 ` matt at use dot net
@ 2011-09-15 16:57 ` oleg at smolsky dot net
  2011-09-15 17:39 ` xinliangli at gmail dot com
                   ` (24 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-09-15 16:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #13 from oleg at smolsky dot net 2011-09-15 16:53:26 UTC ---
David, it looks like we are seeing different things with v4.7... See my 
comment 11 - I am still observing the slowdown. Do you have access to 
v4.1 and v4.6? Could you try reproducing my test please?



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (12 preceding siblings ...)
  2011-09-15 16:57 ` oleg at smolsky dot net
@ 2011-09-15 17:39 ` xinliangli at gmail dot com
  2011-10-21 23:02 ` xinliangli at gmail dot com
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-09-15 17:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #14 from davidxl <xinliangli at gmail dot com> 2011-09-15 17:28:10 UTC ---
(In reply to comment #13)
> David, it looks like we are seeing different things with v4.7... See my 
> comment 11 - I am still observing the slowdown. Do you have access to 
> v4.1 and v4.6? Could you try reproducing my test please?

Sorry for the delay -- I am pretty swamped these days (till mid October). I
will try to look at the problem more then.

David



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (13 preceding siblings ...)
  2011-09-15 17:39 ` xinliangli at gmail dot com
@ 2011-10-21 23:02 ` xinliangli at gmail dot com
  2011-10-24 18:28 ` oleg at smolsky dot net
                   ` (22 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-21 23:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #15 from davidxl <xinliangli at gmail dot com> 2011-10-21 23:02:16 UTC ---
(In reply to comment #14)
> (In reply to comment #13)
> > David, it looks like we are seeing different things with v4.7... See my 
> > comment 11 - I am still observing the slowdown. Do you have access to 
> > v4.1 and v4.6? Could you try reproducing my test please?
> 
> Sorry for the delay -- I am pretty swamped these days (till mid October). I
> will try to look at the problem more then.
> 
> David


I still cannot reproduce the problem with the trunk compiler:


rv=4282167296

test         description   absolute   operations   ratio with
number                     time       per second   test0

 0 "int8_t constant add"   1.09 sec   1467.89 M     1.00

Total absolute time for int8_t constant folding: 1.09 sec


Can you attach the output of -v and the assembly file with -fverbose-asm -dA
and the optimized dump file with option -fdump-tree-optimized-blocks using
trunk compiler?

thanks,

David



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (14 preceding siblings ...)
  2011-10-21 23:02 ` xinliangli at gmail dot com
@ 2011-10-24 18:28 ` oleg at smolsky dot net
  2011-10-24 18:28 ` oleg at smolsky dot net
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 18:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #17 from oleg at smolsky dot net 2011-10-24 18:27:31 UTC ---
Created attachment 25595
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25595
test.cpp.144t.optimized

--- Comment #18 from oleg at smolsky dot net 2011-10-24 18:27:31 UTC ---
Created attachment 25596
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25596
test.s



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (15 preceding siblings ...)
  2011-10-24 18:28 ` oleg at smolsky dot net
@ 2011-10-24 18:28 ` oleg at smolsky dot net
  2011-10-24 18:33 ` oleg at smolsky dot net
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 18:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #16 from oleg at smolsky dot net 2011-10-24 18:27:28 UTC ---
$ /work/tools/gcc47/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/work/tools/gcc47/bin/g++
COLLECT_LTO_WRAPPER=/work/tools/gcc47/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.7/configure --prefix=/work/tools/gcc47 
--enable-languages=c,c++ --with-system-zlib 
--with-mpfr=/work/tools/mpfr24 --with-gmp=/work/tools/gmp 
--with-mpc=/work/tools/mpc 
LD_LIBRARY_PATH=/work/tools/mpfr/lib24:/work/tools/gmp/lib:/work/tools/mpc/lib
Thread model: posix
gcc version 4.7.0 20111001 (experimental) (GCC)

The test case, test.cpp was compiled with this command:
/work/tools/gcc47/bin/g++  -I. -g -O3 -static-libstdc++ -static-libgcc 
-march=native    test.cpp   -o test



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (16 preceding siblings ...)
  2011-10-24 18:28 ` oleg at smolsky dot net
@ 2011-10-24 18:33 ` oleg at smolsky dot net
  2011-10-24 19:34 ` xinliangli at gmail dot com
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 18:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #19 from oleg at smolsky dot net 2011-10-24 18:33:23 UTC ---
Also note that Bugzilla has quietly replaced an older attachment, 
test.cpp, with a new one without adding a comment...



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (17 preceding siblings ...)
  2011-10-24 18:33 ` oleg at smolsky dot net
@ 2011-10-24 19:34 ` xinliangli at gmail dot com
  2011-10-24 19:50 ` oleg at smolsky dot net
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 19:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #20 from davidxl <xinliangli at gmail dot com> 2011-10-24 19:33:18 UTC ---
The test.cpp attached seems to be the same as the old version.

David



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (18 preceding siblings ...)
  2011-10-24 19:34 ` xinliangli at gmail dot com
@ 2011-10-24 19:50 ` oleg at smolsky dot net
  2011-10-24 19:59 ` xinliangli at gmail dot com
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 19:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #21 from oleg at smolsky dot net 2011-10-24 19:48:57 UTC ---
OK, just in case, here is my current test.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (19 preceding siblings ...)
  2011-10-24 19:50 ` oleg at smolsky dot net
@ 2011-10-24 19:59 ` xinliangli at gmail dot com
  2011-10-24 21:12 ` oleg at smolsky dot net
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 19:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #22 from davidxl <xinliangli at gmail dot com> 2011-10-24 19:58:23 UTC ---
(In reply to comment #21)
> OK, just in case, here is my current test.

Preprocessed test case? I saw the main assembly difference that can explain the
performance diff, but want to make sure it is not due to your new source change
(I saw some print statements added).

David



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (20 preceding siblings ...)
  2011-10-24 19:59 ` xinliangli at gmail dot com
@ 2011-10-24 21:12 ` oleg at smolsky dot net
  2011-10-24 23:00 ` xinliangli at gmail dot com
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2011-10-24 21:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #23 from oleg at smolsky dot net 2011-10-24 21:11:21 UTC ---
Here is the source preprocessed for gcc47. The test exhibits the 
slowdown mentioned in comment 11.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (21 preceding siblings ...)
  2011-10-24 21:12 ` oleg at smolsky dot net
@ 2011-10-24 23:00 ` xinliangli at gmail dot com
  2011-10-24 23:03 ` xinliangli at gmail dot com
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 23:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #24 from davidxl <xinliangli at gmail dot com> 2011-10-24 23:00:22 UTC ---
(In reply to comment #23)
> Here is the source preprocessed for gcc47. The test exhibits the 
> slowdown mentioned in comment 11.


The problem can be reproduced with a simplified test case. Depending on how
the result value from the inner loop is used in the outer loop (related to
casting), the inner loop code is quite different: in the slow case, two
redundant sign extensions and a move instruction are generated.

# the fast version
gcc -O3 -DFAST_VER bug.cpp

./a.out 
rv=4282167296

test         description   absolute   operations   ratio with
number                     time       per second   test0

 0 "int8_t constant add"   1.05 sec   1523.81 M     1.00

Total absolute time for int8_t constant folding: 1.05 sec

# the slow version:

gcc -O3 bug.cpp
./a.out 
rv=4282167296

test         description   absolute   operations   ratio with
number                     time       per second   test0

 0 "int8_t constant add"   1.57 sec   1019.11 M     1.00

Total absolute time for int8_t constant folding: 1.57 sec


# however, when disabling inlining of check_shifted_sum_1 in the slow case, the
runtime is recovered:

gcc -O3 -DNOINLINE bug.cpp

./a.out 
rv=4282167296

test         description   absolute   operations   ratio with
number                     time       per second   test0

 0 "int8_t constant add"   1.05 sec   1523.81 M     1.00

Total absolute time for int8_t constant folding: 1.05 sec



The inner loop body in the faster case:

.L60:
    movzbl    0(%rbp,%rcx), %r9d
    addq    $1, %rcx
    cmpl    %ecx, %ebx
    leal    10(%r8,%r9), %r8d
# SUCC: 4 [91.0%]  (dfs_back,can_fallthru) 5 [9.0%]  (fallthru,can_fallthru,loop_exit)
    jg    .L60

while for the slow case:

.L60:
    movzbl    (%r12,%rcx), %eax
    movsbl    %r8b, %r8d
    addq    $1, %rcx
    leal    10(%rax), %r9d
    movsbl    %r9b, %r9d
    addl    %r8d, %r9d
    cmpl    %ecx, %ebp
    movl    %r9d, %r8d
# SUCC: 4 [91.0%]  (dfs_back,can_fallthru) 5 [9.0%]  (fallthru,can_fallthru,loop_exit)
    jg    .L60


The relevant source change:

#ifdef NOINLINE
#define INL __attribute__((noinline))
#else
#define INL inline
#endif

template <typename T, typename T2, typename Shifter>
INL void check_shifted_sum_1(T2 result) {
 T temp = (T)SIZE * Shifter::do_shift((T)init_value);
 if (!tolerance_equal<T>((T&)result,temp))
  printf("test %i failed\n", current_test);
}

#ifdef FAST_VER
#define TYPE u_int32_t
#else
#define TYPE int8_t
#endif


template <typename T, typename Shifter>
__attribute__((noinline)) u_int32_t test_constant(T* first, int count,
                                                  const char *label)
{
    int i;
    u_int32_t rv = 0;

    start_timer();

    for (i = 0; i < iterations; ++i) {
        T result = 0;
        for (int n = 0; n < count; ++n) {
            result += Shifter::do_shift( first[n] );
        }
        rv += result;
        check_shifted_sum_1<T, TYPE, Shifter>(result);
    }

    record_result( timer(), label );
    return rv;
}



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (22 preceding siblings ...)
  2011-10-24 23:00 ` xinliangli at gmail dot com
@ 2011-10-24 23:03 ` xinliangli at gmail dot com
  2012-01-10 18:07 ` oleg at smolsky dot net
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2011-10-24 23:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #25 from davidxl <xinliangli at gmail dot com> 2011-10-24 23:02:14 UTC ---
Created attachment 25600
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25600
test case for 47



Note that with gcc46, the result is even slower -- it has the RAT stall problem
which is fixed in 47.


David



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (23 preceding siblings ...)
  2011-10-24 23:03 ` xinliangli at gmail dot com
@ 2012-01-10 18:07 ` oleg at smolsky dot net
  2012-01-11  9:43 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-01-10 18:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #26 from oleg at smolsky dot net 2012-01-10 18:06:28 UTC ---
Could someone toggle the state and assign a milestone, please?



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (24 preceding siblings ...)
  2012-01-10 18:07 ` oleg at smolsky dot net
@ 2012-01-11  9:43 ` rguenth at gcc dot gnu.org
  2012-01-11 17:27 ` xinliangli at gmail dot com
                   ` (11 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-01-11  9:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-01-11
     Ever Confirmed|0                           |1

--- Comment #27 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-11 09:41:25 UTC ---
Confirmed.  Can somebody summarize please and point to the relevant short
testcase that shows the regression (is there only one kind of problem?  this
seems to be a benchmark suite).  A short testcase is preprocessed and
at most a few hundred lines.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (25 preceding siblings ...)
  2012-01-11  9:43 ` rguenth at gcc dot gnu.org
@ 2012-01-11 17:27 ` xinliangli at gmail dot com
  2012-03-02  0:56 ` oleg at smolsky dot net
                   ` (10 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: xinliangli at gmail dot com @ 2012-01-11 17:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #28 from davidxl <xinliangli at gmail dot com> 2012-01-11 17:26:46 UTC ---
See comment 24 for shorter test case.

Summary:

1) the regression reported by Oleg in gcc4_6 and earlier versions is due to a
difference in FE code generation, which leads the backend to generate code
that causes a partial register stall.
2) the RAT stall problem is fixed in gcc4_7 
3) however, in 4_7 there is a different problem: redundant sign-extension and
move instructions are generated. This could be due to a limitation of the RTL
forward propagation and combine passes in dealing with multiple downward uses.

David



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (26 preceding siblings ...)
  2012-01-11 17:27 ` xinliangli at gmail dot com
@ 2012-03-02  0:56 ` oleg at smolsky dot net
  2012-03-02  8:08 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-03-02  0:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #29 from oleg at smolsky dot net 2012-03-02 00:54:53 UTC ---
Is it possible to target this to 4.7? These optimization issues result 
in measurably slower code...



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (27 preceding siblings ...)
  2012-03-02  0:56 ` oleg at smolsky dot net
@ 2012-03-02  8:08 ` jakub at gcc dot gnu.org
  2012-03-02  8:23 ` oleg at smolsky dot net
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-03-02  8:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #30 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 08:07:15 UTC ---
Created attachment 26809
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26809
pr50182.C

Even the reduced testcase is orders of magnitude longer than what would be
desirable for analysis. I've tried to reduce it to just the templates that are
actually needed (so that it can be measured with plain time); does this reflect
the slowdowns you are seeing?  The next step in reducing would be to remove all
the template mess, instantiate it by hand, and perhaps also inline by hand.
There is no reason why we shouldn't have just one loop with all the statements
in it.  On this reduced testcase, on an Intel i7-2600 CPU with -O3, the
-DFAST_VER/-DNOINLINE options don't seem to make any difference, but 4.6 is
measurably faster than 4.7.

In any case, this is way too late for 4.7.
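
For illustration only, a hand-instantiated, template-free form of the kernel
could look like the sketch below. It is not the attached pr50182.C. It assumes
that Shifter::do_shift adds the constant 10 (as the add of 10 in the assembly
listings suggests) and that T is int8_t; the name test_constant_add and its
parameter list are made up for this sketch.

```cpp
#include <cstdint>

// Hand-instantiated sketch of test_constant<int8_t, custom_constant_add<int8_t>>.
// Assumption: do_shift(x) is "x + 10"; the timer and label plumbing is dropped
// so the binary can simply be measured with time(1).
__attribute__((noinline))
uint32_t test_constant_add(const int8_t* first, int count, int iterations) {
    uint32_t rv = 0;
    for (int i = 0; i < iterations; ++i) {
        int8_t result = 0;
        for (int n = 0; n < count; ++n) {
            // The addition happens in int after integral promotion and is
            // converted back to int8_t on assignment (wraps modulo 256 with
            // GCC; guaranteed from C++20 on).
            result += static_cast<int8_t>(first[n] + 10);
        }
        rv += result;  // sign-extended, then accumulated modulo 2^32
    }
    return rv;
}
```

A driver would fill an int8_t array, call test_constant_add(data, count,
iterations) and print the returned value so that the computation stays
observable.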



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (28 preceding siblings ...)
  2012-03-02  8:08 ` jakub at gcc dot gnu.org
@ 2012-03-02  8:23 ` oleg at smolsky dot net
  2012-03-02  8:29 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-03-02  8:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #31 from oleg at smolsky dot net 2012-03-02 08:21:41 UTC ---
I don't think there is a need to actually check the result in this 
benchmarkable fragment, so that will reduce the code a little. The only 
thing that I was hitting is about fooling/forcing the compiler not to 
discard the intermediate result and actually perform every calculation 
and iteration :)

Let me try to digest this further. I'll also get you a result from our 
production compiler (v4.1, which emits the fastest code).
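
One common way to keep a compiler from discarding or hoisting such a
computation, shown here only as a sketch and not as what the benchmark suite
actually does, is an empty GCC-style asm statement with a "memory" clobber. It
emits no instructions but makes the compiler assume the buffer may have
changed, so the inner sum has to be recomputed on every outer iteration. The
names clobber_memory and repeat_sum are hypothetical.

```cpp
// GCC/Clang extension: an empty asm that takes the pointer as input and
// clobbers memory. The compiler must assume the pointed-to data changed.
static inline void clobber_memory(const void* p) {
    asm volatile("" : : "r"(p) : "memory");
}

// Sums `count` bytes modulo 256, `iterations` times, and returns the total so
// that every partial result feeds an observable value.
unsigned repeat_sum(const unsigned char* data, int count, int iterations) {
    unsigned total = 0;
    for (int i = 0; i < iterations; ++i) {
        clobber_memory(data);        // forbid reuse of the previous pass
        unsigned char partial = 0;
        for (int n = 0; n < count; ++n)
            partial += data[n];      // wraps modulo 256
        total += partial;
    }
    return total;
}
```

Returning the accumulated total (and printing it in the caller) plays the same
role as the rv value printed by the test programs in this thread.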



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (29 preceding siblings ...)
  2012-03-02  8:23 ` oleg at smolsky dot net
@ 2012-03-02  8:29 ` jakub at gcc dot gnu.org
  2012-03-02  9:14 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-03-02  8:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 08:28:34 UTC ---
For me, 4.1 is as fast as 4.6 on my CPU with the reduced testcase I've
attached (it is not clear whether it models what the original benchmark did),
and the trunk regressed with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176072
Before that the inner loop looked like:
.L12:
        addl    $10, %edx
        addb    0(%rbp,%rcx), %dl
        addq    $1, %rcx
        cmpl    %ecx, %ebx
        jg      .L12
and now it looks like:
.L12:
        movzbl  0(%rbp,%rdx), %r8d
        addq    $1, %rdx
        cmpl    %edx, %ebx
        leal    10(%rcx,%r8), %ecx
        jg      .L12



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (30 preceding siblings ...)
  2012-03-02  8:29 ` jakub at gcc dot gnu.org
@ 2012-03-02  9:14 ` jakub at gcc dot gnu.org
  2012-03-03  2:20 ` oleg at smolsky dot net
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-03-02  9:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 09:13:52 UTC ---
After Jason's patch (which needs to be kept, it was a wrong-code bugfix), we
get out of the FE the addition in int type, while previously it was in unsigned
char type.  I.e.

  int D.2177;
  signed char D.2138;
  T D.2178;
  T D.2179;
  T D.2180;
  signed char result;
        D.2138 = custom_constant_add<signed char>::do_shift (D.2177);
        D.2178 = (T) result;
        D.2179 = (T) D.2138;
        D.2180 = D.2178 + D.2179;
        result = (signed char) D.2180;
where T used to be unsigned char before and now is int.
And no GIMPLE optimization pass manages to narrow the addition operation
(together with the previous sign extensions and following demotion) to an
unsigned char operation (signed char would be wrong, because of the possible
overflow).  I bet such narrowing in these cases could even help the vectorizer,
which if it were to vectorize this or similar loops (it doesn't in this case),
would do the promotions/demotions needlessly.
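
To make the narrowing argument concrete, here is a minimal sketch (plain C++,
not GCC internals; the function names are made up for illustration). It checks
exhaustively that adding two signed char values in int and converting the sum
back yields the same value as adding them modulo 256 in unsigned char. It
assumes that converting an out-of-range value to int8_t wraps modulo 256, which
GCC documents and C++20 guarantees.

```cpp
#include <cstdint>

// The shape the front end now emits: promote both operands to int, add,
// convert the sum back to signed char.
static int8_t add_via_int(int8_t a, int8_t b) {
    int wide = static_cast<int>(a) + static_cast<int>(b);
    return static_cast<int8_t>(wide);  // wraps modulo 256 (assumption, see above)
}

// The narrowed shape: add in unsigned char, where wraparound is well defined,
// then reinterpret the 8-bit result as signed char.
static int8_t add_via_uchar(int8_t a, int8_t b) {
    uint8_t narrow = static_cast<uint8_t>(static_cast<uint8_t>(a) +
                                          static_cast<uint8_t>(b));
    return static_cast<int8_t>(narrow);
}

// Counts operand pairs for which the two forms disagree.
int count_mismatches() {
    int mismatches = 0;
    for (int a = -128; a <= 127; ++a)
        for (int b = -128; b <= 127; ++b)
            if (add_via_int(static_cast<int8_t>(a), static_cast<int8_t>(b)) !=
                add_via_uchar(static_cast<int8_t>(a), static_cast<int8_t>(b)))
                ++mismatches;
    return mismatches;
}
```

Doing the narrow addition directly in signed char is not an option for the
compiler because signed overflow is undefined, which is why the unsigned type
is the safe target for such a transformation.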



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (31 preceding siblings ...)
  2012-03-02  9:14 ` jakub at gcc dot gnu.org
@ 2012-03-03  2:20 ` oleg at smolsky dot net
  2012-03-03  2:47 ` oleg at smolsky dot net
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-03-03  2:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #34 from oleg at smolsky dot net 2012-03-03 02:19:21 UTC ---
OK, here are some benchmark numbers for the test compiled verbatim with 
g++41/g++463 -O2:

$ time ./test41
rv=4243767296

real    0m6.063s
user    0m6.058s
sys     0m0.001s

$ time ./test46
rv=4243767296

real    0m11.425s
user    0m11.415s
sys     0m0.003s

$ time ./test46-fast     # (i.e. built with -DFAST_VER)
rv=4243767296

real    0m11.389s
user    0m11.383s
sys     0m0.003s

Let me see how the sample can be digested further down...



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (32 preceding siblings ...)
  2012-03-03  2:20 ` oleg at smolsky dot net
@ 2012-03-03  2:47 ` oleg at smolsky dot net
  2012-03-03  3:00 ` oleg at smolsky dot net
                   ` (3 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-03-03  2:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #35 from oleg at smolsky dot net 2012-03-03 02:45:15 UTC ---
Here is a smaller version. BTW, I've noticed another regression in 
optimization in v4.1 when using a const global...



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (33 preceding siblings ...)
  2012-03-03  2:47 ` oleg at smolsky dot net
@ 2012-03-03  3:00 ` oleg at smolsky dot net
  2012-03-06 16:34 ` oleg at smolsky dot net
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-03-03  3:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #36 from oleg at smolsky dot net 2012-03-03 02:59:11 UTC ---
Here is the code emitted by g++ 4.6.3 for smaller_test.cpp (attached to 
the bug)

  unsigned int test_constant<> proc near
                 mov     r9d, cs:iterations
                 xor     r8d, r8d
                 xor     eax, eax
                 test    r9d, r9d
                 jle     short locret_400552
                 db      66h, 66h, 66h
                 nop
                 db      66h, 66h
                 nop

loc_400528:
                 xor     ecx, ecx
                 xor     edx, edx
                 test    esi, esi
                 jle     short loc_40054E

loc_400530:
                 add     edx, 0Ah
                 add     dl, [rdi+rcx]
                 add     rcx, 1
                 cmp     esi, ecx
                 jg      short loc_400530
                 movsx   edx, dl

loc_400541:
                 add     r8d, 1
                 add     eax, edx
                 cmp     r8d, r9d
                 jnz     short loc_400528
                 rep retn

loc_40054E:
                 xor     edx, edx
                 jmp     short loc_400541

locret_400552:
                 rep retn



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (34 preceding siblings ...)
  2012-03-03  3:00 ` oleg at smolsky dot net
@ 2012-03-06 16:34 ` oleg at smolsky dot net
  2012-03-06 17:27 ` jakub at gcc dot gnu.org
  2012-03-06 19:40 ` oleg at smolsky dot net
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-03-06 16:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #37 from oleg at smolsky dot net 2012-03-06 16:34:27 UTC ---
Hey Jakub, is this smaller example digestible?
     http://gcc.gnu.org/bugzilla/attachment.cgi?id=26814

The asm output is straightforward, but I obviously have no clue about 
how complex the corresponding compiler's internal state is...



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (35 preceding siblings ...)
  2012-03-06 16:34 ` oleg at smolsky dot net
@ 2012-03-06 17:27 ` jakub at gcc dot gnu.org
  2012-03-06 19:40 ` oleg at smolsky dot net
  37 siblings, 0 replies; 39+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-03-06 17:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #38 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-06 17:26:24 UTC ---
Sorry, can't reproduce any performance degradation between 4.1 and 4.6
on the http://gcc.gnu.org/bugzilla/attachment.cgi?id=26814 testcase (-O3 -m64,
default -mtune=generic):
on i7-2600 4.1 user time is 0m3.833s, 4.6 0m3.411s and 4.7 0m5.102s,
on AMD Barcelona 4.1 user time is 0m8.798s, 4.6 0m5.875s and 4.7 0m5.855s.



* [Bug target/50182] Performance degradation from gcc 4.1 (x86_64)
  2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
                   ` (36 preceding siblings ...)
  2012-03-06 17:27 ` jakub at gcc dot gnu.org
@ 2012-03-06 19:40 ` oleg at smolsky dot net
  37 siblings, 0 replies; 39+ messages in thread
From: oleg at smolsky dot net @ 2012-03-06 19:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #39 from oleg at smolsky dot net 2012-03-06 19:39:03 UTC ---
Hmm... funky. I can reproduce the issue on a newer Intel machine:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           L5410  @ 2.33GHz
stepping        : 6
cpu MHz         : 2327.445
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
....

$ time ./test41
     real    0m6.270s
     user    0m6.268s
     sys     0m0.000s

$ time ./test44
     real    0m5.524s
     user    0m5.523s
     sys     0m0.000s

$ time ./test46
     real    0m11.721s
     user    0m11.718s
     sys     0m0.001s

P.S. the middle one is made using g++ (GCC) 4.4.5 20110214 (Red Hat 
4.4.5-6). The rest are original binaries made a couple of days ago.



Thread overview: 39+ messages
2011-08-24 21:27 [Bug c++/50182] New: Performance degradation from gcc 4.1 (x86_64) oleg.smolsky at gmail dot com
2011-08-24 22:30 ` [Bug target/50182] " oleg.smolsky at gmail dot com
2011-08-25  0:14 ` xinliangli at gmail dot com
2011-08-25  0:52 ` xinliangli at gmail dot com
2011-08-25  9:00 ` jakub at gcc dot gnu.org
2011-08-25 15:21 ` oleg.smolsky at gmail dot com
2011-08-25 15:29 ` oleg.smolsky at gmail dot com
2011-08-25 16:18 ` hjl.tools at gmail dot com
2011-08-25 16:26 ` xinliangli at gmail dot com
2011-08-25 16:49 ` oleg.smolsky at gmail dot com
2011-08-25 22:48 ` oleg.smolsky at gmail dot com
2011-08-26  7:12 ` oleg.smolsky at gmail dot com
2011-08-30 20:37 ` matt at use dot net
2011-09-15 16:57 ` oleg at smolsky dot net
2011-09-15 17:39 ` xinliangli at gmail dot com
2011-10-21 23:02 ` xinliangli at gmail dot com
2011-10-24 18:28 ` oleg at smolsky dot net
2011-10-24 18:28 ` oleg at smolsky dot net
2011-10-24 18:33 ` oleg at smolsky dot net
2011-10-24 19:34 ` xinliangli at gmail dot com
2011-10-24 19:50 ` oleg at smolsky dot net
2011-10-24 19:59 ` xinliangli at gmail dot com
2011-10-24 21:12 ` oleg at smolsky dot net
2011-10-24 23:00 ` xinliangli at gmail dot com
2011-10-24 23:03 ` xinliangli at gmail dot com
2012-01-10 18:07 ` oleg at smolsky dot net
2012-01-11  9:43 ` rguenth at gcc dot gnu.org
2012-01-11 17:27 ` xinliangli at gmail dot com
2012-03-02  0:56 ` oleg at smolsky dot net
2012-03-02  8:08 ` jakub at gcc dot gnu.org
2012-03-02  8:23 ` oleg at smolsky dot net
2012-03-02  8:29 ` jakub at gcc dot gnu.org
2012-03-02  9:14 ` jakub at gcc dot gnu.org
2012-03-03  2:20 ` oleg at smolsky dot net
2012-03-03  2:47 ` oleg at smolsky dot net
2012-03-03  3:00 ` oleg at smolsky dot net
2012-03-06 16:34 ` oleg at smolsky dot net
2012-03-06 17:27 ` jakub at gcc dot gnu.org
2012-03-06 19:40 ` oleg at smolsky dot net
