public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
From: jsm28 at gcc dot gnu dot org @ 2006-06-04 19:53 UTC (permalink / raw)
To: gcc-bugs
--
jsm28 at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Inlining related regression |[4.0/4.1/4.2 Regression]
                   |for gcc-4.x                 |Inlining related regression
                   |                            |for gcc-4.x
   Target Milestone|---                         |4.1.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23305
* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
From: pinskia at gcc dot gnu dot org @ 2006-06-04 19:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from pinskia at gcc dot gnu dot org 2006-06-04 19:58 -------
This was a P2 before P3 became the default.
(In reply to comment #4)
> first$current$current$current.506 = first$current$current$current.506 + 8B;
> D.34505 = D.34505 + first$current$current$current->value;
If we swapped those two statements at the tree level, out-of-SSA would
not have produced an extra assignment.
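The ordering effect can be illustrated at the source level. The C sketch below is only an analogy under stated assumptions, not the testcase from this PR: `struct node`, `sum_increment_first` and `sum_load_first` are hypothetical names. If the pointer is advanced before the load that still needs its old value, a copy of the old value has to stay live; loading first makes that copy unnecessary.

```c
/* Hypothetical element type standing in for the testcase's value wrapper. */
struct node {
    double value;
};

/* Increment first: the old pointer has to be kept in a separate variable
   so that the load can still use it. This models the extra assignment. */
double sum_increment_first(const struct node *first, const struct node *last)
{
    double sum = 0.0;
    while (first != last) {
        const struct node *old = first; /* extra copy of the old pointer */
        first = first + 1;
        sum = sum + old->value;
    }
    return sum;
}

/* Load first: one variable serves both statements, so no copy is needed. */
double sum_load_first(const struct node *first, const struct node *last)
{
    double sum = 0.0;
    while (first != last) {
        sum = sum + first->value;
        first = first + 1;
    }
    return sum;
}
```

Both functions compute the same sum; only the number of live pointer values inside the loop differs.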
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P2 |P3
* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
From: mmitchel at gcc dot gnu dot org @ 2006-07-05 17:45 UTC (permalink / raw)
To: gcc-bugs
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
* [Bug tree-optimization/23305] [4.0/4.1/4.2 Regression] Inlining related regression for gcc-4.x
From: pinskia at gcc dot gnu dot org @ 2006-08-28 5:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from pinskia at gcc dot gnu dot org 2006-08-28 05:47 -------
HUH:
# D.34332_4 = PHI <D.34332_139(7), D.34332_13(6)>;
# first$current$current$current_3 = PHI <first$current$current$current_98(7), first$current$current$current_11(6)>;
# first$current$current$current_282 = PHI <first$current$current$current_98(7), first$current$current$current_11(6)>;
<L10>:;
first$current$current$current_98 = first$current$current$current_282 + 8B;
tmp$current$current_113 = first$current$current$current_3 + 8B;
tmp$current_122 = tmp$current$current_113 - 8B;
y_134 = tmp$current_122;
D.34330_138 = y_134->value;
D.34332_139 = D.34332_4 + D.34330_138;
if (last$current$current$current_12 != first$current$current$current_98) goto <L10>; else goto <L12>;
Isn't _3 the same as _282? Why don't we eliminate it? (There is no way to
avoid creating it in the first place with this testcase, as it is not really
created by any pass.) I think if we eliminate that, this should be fixed.
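As a rough illustration of the elimination suggested here, the following C sketch finds PHI nodes in one block whose incoming arguments are identical edge by edge. It is a simplified model under assumptions, not GCC code: `struct phi`, `MAX_PHI_ARGS`, `same_phi_args` and `find_duplicate_phis` are hypothetical, and each SSA name is reduced to its version number.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PHI_ARGS 4 /* hypothetical limit, enough for this sketch */

/* Simplified PHI node: one SSA version number per incoming edge. */
struct phi {
    int result;             /* version of the SSA name being defined */
    int nargs;              /* number of incoming edges */
    int args[MAX_PHI_ARGS]; /* argument version for each edge, in edge order */
};

/* Two PHIs are duplicates if they take the same argument on every edge. */
static bool same_phi_args(const struct phi *a, const struct phi *b)
{
    if (a->nargs != b->nargs)
        return false;
    for (int i = 0; i < a->nargs; i++)
        if (a->args[i] != b->args[i])
            return false;
    return true;
}

/* For each PHI i, store in leader[i] the index of the earliest PHI with
   identical arguments. leader[i] == i means the PHI has no earlier duplicate;
   otherwise uses of phis[i].result could be rewritten to the leader's result. */
void find_duplicate_phis(const struct phi *phis, size_t n, size_t *leader)
{
    for (size_t i = 0; i < n; i++) {
        leader[i] = i;
        for (size_t j = 0; j < i; j++) {
            if (same_phi_args(&phis[j], &phis[i])) {
                leader[i] = j;
                break;
            }
        }
    }
}
```

Fed the three PHIs from the dump above, this model reports _282 as a duplicate of _3, since both take _98 on edge 7 and _11 on edge 6.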
* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
From: mmitchel at gcc dot gnu dot org @ 2007-02-14 9:07 UTC (permalink / raw)
To: gcc-bugs
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.1.2 |4.1.3
* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
From: jakub at gcc dot gnu dot org @ 2007-11-22 16:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from jakub at gcc dot gnu dot org 2007-11-22 16:11 -------
On the trunk there is no difference between -O2 and -O2 -finline-functions
(the latter is perhaps 1% better), both are as bad as 4.1/4.2 with -O2
-finline-functions. Compiling with -O2 -fno-inline-small-functions gives the
speed back. Both x86_64-linux and i686-linux.
* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
From: jakub at gcc dot gnu dot org @ 2007-11-22 16:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from jakub at gcc dot gnu dot org 2007-11-22 16:41 -------
On x86_64-linux -m64 with -O2, gcc doesn't hoist the movabsq insns out of the
loops; hoisting them gives some performance back:
time ./pr23305-slow
real 0m4.028s
user 0m4.023s
sys 0m0.003s
time ./pr23305-slow2
real 0m3.436s
user 0m3.434s
sys 0m0.001s
when I hoist it by hand in assembly:
--- pr23305-slow.s 2007-11-22 17:14:09.000000000 +0100
+++ pr23305-slow2.s 2007-11-22 17:31:31.000000000 +0100
@@ -222,16 +222,16 @@ _Z13s000005a_testv:
.LVL2:
.LBB329:
.LBB330:
.loc 1 28697 0
cmpq %rax, %rdx
je .L13
+ movabsq $4613937818241073152, %r8
.p2align 4,,10
.p2align 3
.L14:
- movabsq $4613937818241073152, %r8
movq %r8, (%rax)
addq $8, %rax
cmpq %rax, %rdx
jne .L14
.L13:
.LBE330:
@@ -242,17 +242,17 @@ _Z13s000005a_testv:
.LVL3:
.LBB326:
.LBB327:
.loc 1 28697 0
cmpq %rax, %rdx
je .L15
+ movabsq $4613937818241073152, %rdi
.p2align 4,,10
.p2align 3
.L16:
.LBE327:
- movabsq $4613937818241073152, %rdi
movq %rdi, (%rax)
.LBB328:
addq $8, %rax
cmpq %rax, %rdx
jne .L16
.L15:
but still the -O2 -fno-inline-small-functions version is much faster:
time ./pr23305-fast
real 0m1.591s
user 0m1.588s
sys 0m0.001s
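For reference, the immediate 4613937818241073152 in the movabsq insns above is 0x4008000000000000, which is the IEEE 754 binary64 bit pattern of 3.0. The C sketch below mirrors the hand-hoisted loop: the constant is materialized once before the loop and only stored inside it. `FILL_BITS`, `bits_to_double` and `fill_hoisted` are hypothetical names, and the sketch assumes that `double` is an IEEE 754 binary64 type.

```c
#include <stdint.h>
#include <string.h>

/* Immediate loaded by the movabsq insns in the listing. */
#define FILL_BITS UINT64_C(4613937818241073152)

/* Reinterpret a 64-bit pattern as a double (assumes IEEE 754 binary64). */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Fill [first, last) with the constant. The constant is set up once before
   the loop, as in the hand-edited assembly; the loop body only stores it. */
void fill_hoisted(double *first, double *last)
{
    const uint64_t bits = FILL_BITS; /* loop-invariant, set up outside */
    for (; first != last; ++first)
        memcpy(first, &bits, sizeof bits);
}
```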
* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
From: jakub at gcc dot gnu dot org @ 2007-11-22 17:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from jakub at gcc dot gnu dot org 2007-11-22 17:04 -------
The remaining difference is a register allocation issue:
time ./pr23305-vanilla; time ./pr23305-fixed
real 0m4.030s
user 0m4.028s
sys 0m0.002s
real 0m1.593s
user 0m1.592s
sys 0m0.001s
with hand-edited changes:
--- pr23305-vanilla.s 2007-11-22 17:57:15.000000000 +0100
+++ pr23305-fixed.s 2007-11-22 17:57:56.000000000 +0100
@@ -95,49 +95,49 @@ _Z13s000005a_testv:
subq $24, %rsp
.LCFI1:
movq _ZL3dpe(%rip), %rdx
movq _ZL3dpb(%rip), %rax
cmpq %rax, %rdx
je .L13
+ movabsq $4613937818241073152, %r8
.p2align 4,,10
.p2align 3
.L14:
- movabsq $4613937818241073152, %r8
movq %r8, (%rax)
addq $8, %rax
cmpq %rax, %rdx
jne .L14
.L13:
movq _ZL3Dpe(%rip), %rdx
movq _ZL3Dpb(%rip), %rax
cmpq %rax, %rdx
je .L15
+ movabsq $4613937818241073152, %rdi
.p2align 4,,10
.p2align 3
.L16:
- movabsq $4613937818241073152, %rdi
movq %rdi, (%rax)
addq $8, %rax
cmpq %rax, %rdx
jne .L16
.L15:
movq _ZL5rrDPe(%rip), %rdx
movq _ZL5rrDPb(%rip), %rax
movsd _ZL1D(%rip), %xmm0
cmpq %rdx, %rax
movsd %xmm0, 8(%rsp)
je .L18
+ movsd 8(%rsp), %xmm0
.p2align 4,,10
.p2align 3
.L24:
- movsd 8(%rsp), %xmm0
addsd (%rax), %xmm0
addq $8, %rax
cmpq %rax, %rdx
- movsd %xmm0, 8(%rsp)
jne .L24
+ movsd %xmm0, 8(%rsp)
.L18:
movsd 8(%rsp), %xmm0
ucomisd .LC2(%rip), %xmm0
jp .L23
jne .L23
addq $24, %rsp
In lreg dump we have:
(code_label:HI 98 35 97 7 24 "" [1 uses])
(note:HI 97 98 45 7 [bb 7] NOTE_INSN_BASIC_BLOCK)
(insn:HI 45 97 46 7 pr23305.ii:28564 (set (reg/v:DF 64 [ result ])
(plus:DF (reg/v:DF 64 [ result ])
(mem/s:DF (reg:DI 58 [ ivtmp.254 ]) [29 <variable>.value+0 S8
A8]))) 680 {*fop_df_comm_sse} (nil))
(insn:HI 46 45 48 7 pr23305.ii:28564 (parallel [
(set (reg:DI 58 [ ivtmp.254 ])
(plus:DI (reg:DI 58 [ ivtmp.254 ])
(const_int 8 [0x8])))
(clobber (reg:CC 17 flags))
]) 244 {*adddi_1_rex64} (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn:HI 48 46 49 7 pr23305.ii:28673 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/f:DI 60 [ last$current$current$current ])
(reg:DI 58 [ ivtmp.254 ]))) 2 {cmpdi_1_insn_rex64} (nil))
(jump_insn:HI 49 48 50 7 pr23305.ii:28673 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0x0]))
(label_ref:DI 98)
(pc))) 579 {*jcc_1} (expr_list:REG_DEAD (reg:CCZ 17 flags)
(expr_list:REG_BR_PROB (const_int 9100 [0x238c])
(nil))))
and
Register 64 pref SSE_FIRST_REG, else SSE_REGS
Register 64 used 5 times across 23 insns; set 2 times; user var; crosses 3 calls; pref SSE_FIRST_REG, else SSE_REGS.
Yet global alloc puts it into 8(%rsp), which is certainly fine, except in the
tight loop.
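The difference between the two listings can be modelled in C. This is a sketch under assumptions, not the PR's testcase: `sum_via_stack_slot` and `sum_in_register` are hypothetical names, and `volatile` is used only to force a memory load and store of the accumulator on every iteration, imitating the 8(%rsp) traffic in the vanilla assembly.

```c
/* Models the vanilla code: the accumulator lives in memory, so every
   iteration reloads it and stores it back. volatile forces that traffic. */
double sum_via_stack_slot(double init, const double *first, const double *last)
{
    volatile double slot = init;
    for (; first != last; ++first)
        slot = slot + *first;
    return slot;
}

/* Models the hand-fixed code: the accumulator can stay in a register for
   the whole loop and is only written out after the loop ends. */
double sum_in_register(double init, const double *first, const double *last)
{
    double acc = init;
    for (; first != last; ++first)
        acc = acc + *first;
    return acc;
}
```

Both functions return the same value; the first one carries a store-to-load dependency through memory in every iteration, which is the kind of cost the timings above point at.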
--
jakub at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu dot org
* [Bug tree-optimization/23305] [4.0/4.1/4.2/4.3 Regression] Inlining related regression for gcc-4.x
From: hubicka at gcc dot gnu dot org @ 2008-02-05 13:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from hubicka at gcc dot gnu dot org 2008-02-05 13:31 -------
This testcase is still slower: 4.4s with -O2 versus 3.6s with -O2
-fno-inline-small-functions (on i386). I wondered if the patch counting
the frequency of calls crossed helped here. My slowdown is smaller than what
Jakub reported, so perhaps it partially did, but we still have a
regression here.
Honza
* [Bug target/23305] [4.0/4.1/4.2/4.3 Regression] x87 load hoisting problem
From: hubicka at gcc dot gnu dot org @ 2008-02-06 15:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from hubicka at gcc dot gnu dot org 2008-02-06 15:10 -------
Looks like the last remaining problem is the missed loop invariant motion due
to the STACK_REGS hack, as in the case of pr23322.
hubicka@occam:/aux/hubicka/trunk-write/buidl2/gcc$ time ./a.out-nostackregs-hack
real 0m3.637s
user 0m3.588s
sys 0m0.008s
hubicka@occam:/aux/hubicka/trunk-write/buidl2/gcc$ time ./a.out-mainline
real 0m4.627s
user 0m4.484s
sys 0m0.016s
hubicka@occam:/aux/hubicka/trunk-write/buidl2/gcc$ time ./a.out-gcc-3.4
real 0m4.229s
user 0m3.876s
sys 0m0.004s
Does someone have 2.95 around to double check that it didn't perform
significantly better than 3.4?
*** This bug has been marked as a duplicate of 23322 ***
--
hubicka at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
          Component|tree-optimization           |target
           Keywords|ra                          |
         Resolution|                            |DUPLICATE
            Summary|[4.0/4.1/4.2/4.3 Regression]|[4.0/4.1/4.2/4.3 Regression]
                   |Inlining related regression |x87 load hoisting problem
                   |for gcc-4.x                 |