public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
@ 2006-06-04 19:53 ` jsm28 at gcc dot gnu dot org
  2006-06-04 19:58 ` pinskia at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2006-06-04 19:53 UTC
  To: gcc-bugs



-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Inlining related regression |[4.0/4.1/4.2 Regression]
                   |for gcc-4.x                 |Inlining related regression
                   |                            |for gcc-4.x
   Target Milestone|---                         |4.1.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
  2006-06-04 19:53 ` [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x jsm28 at gcc dot gnu dot org
@ 2006-06-04 19:58 ` pinskia at gcc dot gnu dot org
  2006-07-05 17:45 ` mmitchel at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-06-04 19:58 UTC
  To: gcc-bugs



------- Comment #6 from pinskia at gcc dot gnu dot org  2006-06-04 19:58 -------
This was a P2 before P3 became the default.

(In reply to comment #4)
>   first$current$current$current.506 = first$current$current$current.506 + 8B;
>   D.34505 = D.34505 + first$current$current$current->value;

If we swapped those two statements at the tree level, the out-of-SSA pass
would not have produced an extra assignment.
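
To make the ordering point concrete, here is a minimal C++ sketch. It is not the testcase from this bug; `Node`, `sum_increment_first` and `sum_load_first` are hypothetical names chosen for illustration. In the first function the pointer is advanced before the load, so the old and the new pointer value are live at the same time, which is the situation where going out of SSA needs a copy. In the second function the load comes first. A current compiler may well generate identical code for both; the sketch only shows the two statement orders at source level.

```cpp
struct Node {
    double value;
};

// Increment first, then load through the OLD pointer value.
// Both `first` and `next` are live across the load.
double sum_increment_first(const Node* first, const Node* last) {
    double sum = 0.0;
    while (first != last) {
        const Node* next = first + 1;  // corresponds to "ptr = ptr + 8B"
        sum += first->value;           // still needs the old pointer
        first = next;
    }
    return sum;
}

// Load first, then increment. Only one pointer value is live at a time.
double sum_load_first(const Node* first, const Node* last) {
    double sum = 0.0;
    while (first != last) {
        sum += first->value;
        ++first;
    }
    return sum;
}
```

Both functions compute the same sum; only the order of the increment and the load differs.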


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
  2006-06-04 19:53 ` [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x jsm28 at gcc dot gnu dot org
  2006-06-04 19:58 ` pinskia at gcc dot gnu dot org
@ 2006-07-05 17:45 ` mmitchel at gcc dot gnu dot org
  2006-08-28  5:47 ` pinskia at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-07-05 17:45 UTC
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2006-07-05 17:45 ` mmitchel at gcc dot gnu dot org
@ 2006-08-28  5:47 ` pinskia at gcc dot gnu dot org
  2007-02-14  9:07 ` [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 " mmitchel at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-08-28  5:47 UTC
  To: gcc-bugs



------- Comment #7 from pinskia at gcc dot gnu dot org  2006-08-28 05:47 -------
HUH:
  # D.34332_4 = PHI <D.34332_139(7), D.34332_13(6)>;
  # first$current$current$current_3 = PHI <first$current$current$current_98(7), first$current$current$current_11(6)>;
  # first$current$current$current_282 = PHI <first$current$current$current_98(7), first$current$current$current_11(6)>;
<L10>:;
  first$current$current$current_98 = first$current$current$current_282 + 8B;
  tmp$current$current_113 = first$current$current$current_3 + 8B;
  tmp$current_122 = tmp$current$current_113 - 8B;
  y_134 = tmp$current_122;
  D.34330_138 = y_134->value;
  D.34332_139 = D.34332_4 + D.34330_138;
  if (last$current$current$current_12 != first$current$current$current_98) goto <L10>; else goto <L12>;


Isn't _3 the same as _282?  Why don't we eliminate it?  (There is no way to
avoid creating it in the first place with this testcase, as it is not really
created by any pass.)  I think if we eliminate that, this should be fixed.
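
As a rough source-level picture of the dump above (a sketch only; `Elem`, `sum_duplicate_cursors` and `sum_single_cursor` are hypothetical names, and the real testcase reaches this shape through inlined iterator wrappers), the two PHI nodes behave like two loop-carried pointers that receive the same value on entry and on the back edge. Merging them leaves one pointer and removes the add-then-subtract of 8 bytes.

```cpp
struct Elem {
    double value;
};

// Two cursors that are always equal, mirroring _282 and _3 in the dump.
double sum_duplicate_cursors(const Elem* first, const Elem* last) {
    if (first == last) return 0.0;
    const Elem* a = first;  // plays the role of _282
    const Elem* b = first;  // plays the role of _3
    double sum = 0.0;
    do {
        const Elem* next = a + 1;  // _98  = _282 + 8B
        const Elem* tmp = b + 1;   // _113 = _3 + 8B
        const Elem* y = tmp - 1;   // _122 = _113 - 8B
        sum += y->value;
        a = next;  // both PHIs take _98 on the back edge
        b = next;
    } while (a != last);
    return sum;
}

// The same loop after treating the two cursors as one.
double sum_single_cursor(const Elem* first, const Elem* last) {
    if (first == last) return 0.0;
    const Elem* p = first;
    double sum = 0.0;
    do {
        sum += p->value;
        ++p;
    } while (p != last);
    return sum;
}
```

The two functions return the same result for any valid range, which is the equivalence the question about _3 and _282 relies on.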


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2006-08-28  5:47 ` pinskia at gcc dot gnu dot org
@ 2007-02-14  9:07 ` mmitchel at gcc dot gnu dot org
  2007-11-22 16:11 ` jakub at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-02-14  9:07 UTC
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.1.2                       |4.1.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2007-02-14  9:07 ` [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 " mmitchel at gcc dot gnu dot org
@ 2007-11-22 16:11 ` jakub at gcc dot gnu dot org
  2007-11-22 16:41 ` jakub at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-11-22 16:11 UTC
  To: gcc-bugs



------- Comment #8 from jakub at gcc dot gnu dot org  2007-11-22 16:11 -------
On the trunk there is no difference between -O2 and -O2 -finline-functions
(the latter is perhaps 1% better); both are as bad as 4.1/4.2 with -O2
-finline-functions.  Compiling with -O2 -fno-inline-small-functions gives the
speed back.  This holds on both x86_64-linux and i686-linux.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2007-11-22 16:11 ` jakub at gcc dot gnu dot org
@ 2007-11-22 16:41 ` jakub at gcc dot gnu dot org
  2007-11-22 17:04 ` jakub at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-11-22 16:41 UTC
  To: gcc-bugs



------- Comment #9 from jakub at gcc dot gnu dot org  2007-11-22 16:41 -------
On x86_64-linux -m64 with -O2, gcc doesn't hoist movabsq insns out of the
loops; hoisting them gives some performance back:

time ./pr23305-slow

real    0m4.028s
user    0m4.023s
sys     0m0.003s
time ./pr23305-slow2

real    0m3.436s
user    0m3.434s
sys     0m0.001s

when I hoist it by hand in assembly:

--- pr23305-slow.s      2007-11-22 17:14:09.000000000 +0100
+++ pr23305-slow2.s     2007-11-22 17:31:31.000000000 +0100
@@ -222,16 +222,16 @@ _Z13s000005a_testv:
 .LVL2:
 .LBB329:
 .LBB330:
        .loc 1 28697 0
        cmpq    %rax, %rdx
        je      .L13
+       movabsq $4613937818241073152, %r8
        .p2align 4,,10
        .p2align 3
 .L14:
-       movabsq $4613937818241073152, %r8
        movq    %r8, (%rax)
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L14
 .L13:
 .LBE330:
@@ -242,17 +242,17 @@ _Z13s000005a_testv:
 .LVL3:
 .LBB326:
 .LBB327:
        .loc 1 28697 0
        cmpq    %rax, %rdx
        je      .L15
+       movabsq $4613937818241073152, %rdi
        .p2align 4,,10
        .p2align 3
 .L16:
 .LBE327:
-       movabsq $4613937818241073152, %rdi
        movq    %rdi, (%rax)
 .LBB328:
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L16
 .L15:

but still the -O2 -fno-inline-small-functions version is much faster:

time ./pr23305-fast

real    0m1.591s
user    0m1.588s
sys     0m0.001s
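
For readers who do not read x86_64 assembly, the hand edit above can be pictured in C++ roughly as follows. This is a sketch under assumptions: `bits_of` and `fill_hoisted` are hypothetical names, and the sketch assumes IEEE 754 binary64 doubles. Under that assumption the immediate 4613937818241073152 (0x4008000000000000) is the bit pattern of the double 3.0, so each loop stores 3.0 into every element of a range. Whether a compiler materializes the constant inside or outside the loop is its own decision; the source below only shows the hoisted form.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes 64-bit doubles");

// Return the raw bit pattern of a double.
std::uint64_t bits_of(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Store the 64-bit pattern into every element of [begin, end).
// The constant is set up once before the loop, like the hoisted movabsq.
void fill_hoisted(double* begin, double* end) {
    const std::uint64_t pattern = 4613937818241073152ULL;  // bits of 3.0
    for (double* p = begin; p != end; ++p) {
        std::memcpy(p, &pattern, sizeof pattern);  // like "movq %r8, (%rax)"
    }
}
```

Calling `fill_hoisted(buf, buf + n)` on a `double` array leaves every element equal to 3.0 on such a platform.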


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2007-11-22 16:41 ` jakub at gcc dot gnu dot org
@ 2007-11-22 17:04 ` jakub at gcc dot gnu dot org
  2008-02-05 13:32 ` hubicka at gcc dot gnu dot org
  2008-02-06 15:10 ` [Bug target/23305] [4.0/4.1/4.2/4.3 Regression] x87 load hoisting problem hubicka at gcc dot gnu dot org
  9 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-11-22 17:04 UTC
  To: gcc-bugs



------- Comment #10 from jakub at gcc dot gnu dot org  2007-11-22 17:04 -------
The remaining difference is a register allocation issue:

time ./pr23305-vanilla; time ./pr23305-fixed

real    0m4.030s
user    0m4.028s
sys     0m0.002s

real    0m1.593s
user    0m1.592s
sys     0m0.001s

with hand-edited changes:

--- pr23305-vanilla.s   2007-11-22 17:57:15.000000000 +0100
+++ pr23305-fixed.s     2007-11-22 17:57:56.000000000 +0100
@@ -95,49 +95,49 @@ _Z13s000005a_testv:
        subq    $24, %rsp
 .LCFI1:
        movq    _ZL3dpe(%rip), %rdx
        movq    _ZL3dpb(%rip), %rax
        cmpq    %rax, %rdx
        je      .L13
+       movabsq $4613937818241073152, %r8
        .p2align 4,,10
        .p2align 3
 .L14:
-       movabsq $4613937818241073152, %r8
        movq    %r8, (%rax)
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L14
 .L13:
        movq    _ZL3Dpe(%rip), %rdx
        movq    _ZL3Dpb(%rip), %rax
        cmpq    %rax, %rdx
        je      .L15
+       movabsq $4613937818241073152, %rdi
        .p2align 4,,10
        .p2align 3
 .L16:
-       movabsq $4613937818241073152, %rdi
        movq    %rdi, (%rax)
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L16
 .L15:
        movq    _ZL5rrDPe(%rip), %rdx
        movq    _ZL5rrDPb(%rip), %rax
        movsd   _ZL1D(%rip), %xmm0
        cmpq    %rdx, %rax
        movsd   %xmm0, 8(%rsp)
        je      .L18
+       movsd   8(%rsp), %xmm0
        .p2align 4,,10
        .p2align 3
 .L24:
-       movsd   8(%rsp), %xmm0
        addsd   (%rax), %xmm0
        addq    $8, %rax
        cmpq    %rax, %rdx
-       movsd   %xmm0, 8(%rsp)
        jne     .L24
+       movsd   %xmm0, 8(%rsp)
 .L18:
        movsd   8(%rsp), %xmm0
        ucomisd .LC2(%rip), %xmm0
        jp      .L23
        jne     .L23
        addq    $24, %rsp

In lreg dump we have:

(code_label:HI 98 35 97 7 24 "" [1 uses])
(note:HI 97 98 45 7 [bb 7] NOTE_INSN_BASIC_BLOCK)
(insn:HI 45 97 46 7 pr23305.ii:28564 (set (reg/v:DF 64 [ result ])
        (plus:DF (reg/v:DF 64 [ result ])
            (mem/s:DF (reg:DI 58 [ ivtmp.254 ]) [29 <variable>.value+0 S8 A8]))) 680 {*fop_df_comm_sse} (nil))
(insn:HI 46 45 48 7 pr23305.ii:28564 (parallel [
            (set (reg:DI 58 [ ivtmp.254 ])
                (plus:DI (reg:DI 58 [ ivtmp.254 ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))  
        ]) 244 {*adddi_1_rex64} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn:HI 48 46 49 7 pr23305.ii:28673 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/f:DI 60 [ last$current$current$current ])
            (reg:DI 58 [ ivtmp.254 ]))) 2 {cmpdi_1_insn_rex64} (nil))
(jump_insn:HI 49 48 50 7 pr23305.ii:28673 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref:DI 98)
            (pc))) 579 {*jcc_1} (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
            (nil))))

and
Register 64 pref SSE_FIRST_REG, else SSE_REGS
Register 64 used 5 times across 23 insns; set 2 times; user var; crosses 3 calls; pref SSE_FIRST_REG, else SSE_REGS.

Yet global alloc puts it into 8(%rsp), which is certainly fine, except in the
tight loop.
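
The cost of keeping the accumulator in a stack slot can be imitated at source level. The following is only an illustration, not what the compiler did: `sum_accumulator_in_memory` and `sum_accumulator_in_register` are hypothetical names, and `volatile` is used here purely as a device to force a load and a store of the accumulator on every iteration, mimicking the `movsd 8(%rsp), %xmm0` / `movsd %xmm0, 8(%rsp)` pair inside .L24.

```cpp
// Accumulator lives in memory: one load and one store per iteration,
// similar in shape to the vanilla loop that goes through 8(%rsp).
double sum_accumulator_in_memory(const double* begin, const double* end,
                                 double init) {
    volatile double slot = init;  // stands in for the stack slot
    for (const double* p = begin; p != end; ++p) {
        slot = slot + *p;  // load slot, add, store slot
    }
    return slot;
}

// Accumulator is an ordinary local: a compiler is free to keep it in a
// register for the whole loop, as in the hand-fixed assembly.
double sum_accumulator_in_register(const double* begin, const double* end,
                                   double init) {
    double acc = init;
    for (const double* p = begin; p != end; ++p) {
        acc += *p;
    }
    return acc;
}
```

Both functions add the elements in the same order and so return the same value; timing them on a large array is one hedged way to get a feel for the store-to-load dependency that the vanilla loop carries.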


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2007-11-22 17:04 ` jakub at gcc dot gnu dot org
@ 2008-02-05 13:32 ` hubicka at gcc dot gnu dot org
  2008-02-06 15:10 ` [Bug target/23305] [4.0/4.1/4.2/4.3 Regression] x87 load hoisting problem hubicka at gcc dot gnu dot org
  9 siblings, 0 replies; 10+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-05 13:32 UTC
  To: gcc-bugs



------- Comment #11 from hubicka at gcc dot gnu dot org  2008-02-05 13:31 -------
This testcase is still slower: 4.4s with -O2 and 3.6s with -O2
-fno-inline-small-functions (on i386).  I wondered whether the patch that
counts the frequency of calls crossed helped here.  My slowdown is smaller
than what Jakub reported, so perhaps it did partially, but we still have a
regression here.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305



* [Bug target/23305] [4.0/4.1/4.2/4.3 Regression] x87 load hoisting problem
       [not found] <bug-23305-10914@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2008-02-05 13:32 ` hubicka at gcc dot gnu dot org
@ 2008-02-06 15:10 ` hubicka at gcc dot gnu dot org
  9 siblings, 0 replies; 10+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-06 15:10 UTC
  To: gcc-bugs



------- Comment #12 from hubicka at gcc dot gnu dot org  2008-02-06 15:10 -------
Looks like the last remaining problem is the missed loop invariant motion due
to the STACK_REGS hack, as in the case of pr23322:

hubicka@occam:/aux/hubicka/trunk-write/buidl2/gcc$ time ./a.out-nostackregs-hack

real    0m3.637s
user    0m3.588s
sys     0m0.008s
hubicka@occam:/aux/hubicka/trunk-write/buidl2/gcc$ time ./a.out-mainline

real    0m4.627s
user    0m4.484s
sys     0m0.016s
hubicka@occam:/aux/hubicka/trunk-write/buidl2/gcc$ time ./a.out-gcc-3.4

real    0m4.229s
user    0m3.876s
sys     0m0.004s

Does someone have 2.95 around to double check that it didn't perform
significantly better than 3.4?


*** This bug has been marked as a duplicate of 23322 ***


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
          Component|tree-optimization           |target
           Keywords|ra                          |
         Resolution|                            |DUPLICATE
            Summary|[4.0/4.1/4.2/4.3 Regression]|[4.0/4.1/4.2/4.3 Regression]
                   |Inlining related regression |x87 load hoisting problem
                   |for gcc-4.x                 |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305


