* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
@ 2008-09-04 9:50 ` rguenth at gcc dot gnu dot org
2008-09-04 16:04 ` hjl dot tools at gmail dot com
` (35 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-04 9:50 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization, ra
Target Milestone|--- |4.4.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
2008-09-04 9:50 ` [Bug middle-end/37364] " rguenth at gcc dot gnu dot org
@ 2008-09-04 16:04 ` hjl dot tools at gmail dot com
2008-09-04 16:14 ` hjl dot tools at gmail dot com
` (34 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 16:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from hjl dot tools at gmail dot com 2008-09-04 16:02 -------
"-O2 -march=core2 -fno-ira -fno-regmove" generates
movq x(%rip), %mm0
paddd y(%rip), %mm0
movq %mm0, -8(%rsp)
movq -8(%rsp), %rax
It seems that regmove isn't effective for IRA.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
2008-09-04 9:50 ` [Bug middle-end/37364] " rguenth at gcc dot gnu dot org
2008-09-04 16:04 ` hjl dot tools at gmail dot com
@ 2008-09-04 16:14 ` hjl dot tools at gmail dot com
2008-09-04 17:44 ` hjl dot tools at gmail dot com
` (33 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 16:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from hjl dot tools at gmail dot com 2008-09-04 16:13 -------
(In reply to comment #1)
> "-O2 -march=core2 -fno-ira -fno-regmove" generates
>
> movq x(%rip), %mm0
> paddd y(%rip), %mm0
> movq %mm0, -8(%rsp)
> movq -8(%rsp), %rax
>
> It seems that regmove isn't effective for IRA.
>
regmove is turned off for IRA. Revert the regmove.c change in
revision 139590
* regmove.c (regmove_optimize): Don't do replacement of output for
IRA.
fixes this regression. Vladimir, IRA should either deal with replacement
of output or let regmove handle it.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (2 preceding siblings ...)
2008-09-04 16:14 ` hjl dot tools at gmail dot com
@ 2008-09-04 17:44 ` hjl dot tools at gmail dot com
2008-09-04 17:55 ` hjl dot tools at gmail dot com
` (32 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 17:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from hjl dot tools at gmail dot com 2008-09-04 17:43 -------
The problem may be in IRA_COVER_CLASSES. -mtune=core2 turns on
TARGET_INTER_UNIT_MOVES, which means move between mmx and 64bit
integer registers is cheaper than load/store. But IRA doesn't
handle it properly.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (3 preceding siblings ...)
2008-09-04 17:44 ` hjl dot tools at gmail dot com
@ 2008-09-04 17:55 ` hjl dot tools at gmail dot com
2008-09-04 18:40 ` [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code hjl dot tools at gmail dot com
` (31 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 17:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from hjl dot tools at gmail dot com 2008-09-04 17:54 -------
(In reply to comment #3)
> The problem may be in IRA_COVER_CLASSES. -mtune=core2 turns on
> TARGET_INTER_UNIT_MOVES, which means move between mmx and 64bit
> integer registers is cheaper than load/store. But IRA doesn't
> handle it properly.
>
Is there a way to tell IRA that moving between 2 register
classes is cheaper than load/store?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (4 preceding siblings ...)
2008-09-04 17:55 ` hjl dot tools at gmail dot com
@ 2008-09-04 18:40 ` hjl dot tools at gmail dot com
2008-09-04 19:26 ` vmakarov at redhat dot com
` (30 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 18:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from hjl dot tools at gmail dot com 2008-09-04 18:39 -------
Just for the record, move between MMX and GPR isn't a major issue
since we prefer SSE anyway. If it is the only inter class register
move problem for IRA, I am OK to close it as WONTFIX.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (5 preceding siblings ...)
2008-09-04 18:40 ` [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code hjl dot tools at gmail dot com
@ 2008-09-04 19:26 ` vmakarov at redhat dot com
2008-09-04 19:46 ` hjl dot tools at gmail dot com
` (29 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: vmakarov at redhat dot com @ 2008-09-04 19:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from vmakarov at redhat dot com 2008-09-04 19:25 -------
First of all, I've check the generated code on Core2 and I found it is not
slower than using movd.
IRA assigns hard registers calculating their costs. It the memory is
cheaper, it assigns memory. The first decision point to use memory or hard
register is made in ira-costs.c. The second decision point is ira-color
because the cost can changed dynamically (e.g. cheap hard register are not
available).
In this case IRA decides to use memory instead of MMX reg in ira-cost.c
a0(r61,l0) costs: AREG:25000,25000 DREG:25000,25000 CREG:25000,25000
BREG:25000,25000 SIREG:25000,25000 DIREG:25000,25000 AD_REGS:25000,25000
Q_REGS:25000,25000 NON_Q_REGS:25000,25000 LEGACY_REGS:25000,25000
GENERAL_REGS:25000,25000 SSE_FIRST_REG:42000,42000 SSE_REGS:42000,42000
MMX_REGS:25000,25000 MEM:23000
The memory is cheaper than MMX_REG therefore r61 gets NO_REGS as cover class
which means using memory.
The reason for this is in insn
(insn:HI 14 8 20 2
/home/cygnus/vmakarov/build/ira-merge-branch/gcc/gcc/testsuite/gcc.target/i386/pr34256.c:12
(set (reg/i:DI 0 ax)
(subreg:DI (reg:V2SI 61) 0)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD
(reg:V2SI 61)
(nil)))
which has the following description
(define_insn "*movdi_1_rex64"
[(set (match_operand:DI 0 "nonimmediate_operand"
"=r,r ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")
(match_operand:DI 1 "general_operand"
"Z ,rem,i,re,n ,C ,*y,*Ym,*y,r ,m ,C ,*x,*Yi,*x,r ,m ,*Ym,*x"))]
"TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
Please, look at the 8th alternatives ?r *Ym which corresponds to GENERAL_REGS
MMX_REGS. ? makes the alternative expensive. * is even worse because it
excludes the alternative from register cost calculation.
The old register allocator would behave the same way if regmove did not
coalesced r61 with another pseudo and the result pseudo had not MMX cost
cheaper than memory.
There are several ways to fix the problem:
o ignore * and ? in the cost calculation
o fix the pattern
o run regmove as the old RA
o make failure expected
The first solution would result in huge performance regression for practically
any program.
I can not say about the 2nd solution because I am not the port maintainer.
The third solution is bad because in general case IRA does coalescing more
smart on the fly besides it could make RA even slower.
So I'd prefer the last solution.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (6 preceding siblings ...)
2008-09-04 19:26 ` vmakarov at redhat dot com
@ 2008-09-04 19:46 ` hjl dot tools at gmail dot com
2008-09-04 20:14 ` rguenth at gcc dot gnu dot org
` (28 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 19:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from hjl dot tools at gmail dot com 2008-09-04 19:45 -------
(In reply to comment #6)
> First of all, I've check the generated code on Core2 and I found it is not
> slower than using movd.
I think MMX may be slow anyway.
> The reason for this is in insn
>
> (insn:HI 14 8 20 2
> /home/cygnus/vmakarov/build/ira-merge-branch/gcc/gcc/testsuite/gcc.target/i386/pr34256.c:12
> (set (reg/i:DI 0 ax)
> (subreg:DI (reg:V2SI 61) 0)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD
> (reg:V2SI 61)
> (nil)))
>
> which has the following description
>
> (define_insn "*movdi_1_rex64"
> [(set (match_operand:DI 0 "nonimmediate_operand"
> "=r,r ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")
> (match_operand:DI 1 "general_operand"
> "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r ,m ,C ,*x,*Yi,*x,r ,m ,*Ym,*x"))]
> "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
>
> Please, look at the 8th alternatives ?r *Ym which corresponds to GENERAL_REGS
> MMX_REGS. ? makes the alternative expensive. * is even worse because it
> excludes the alternative from register cost calculation.
You are right.
> The old register allocator would behave the same way if regmove did not
> coalesced r61 with another pseudo and the result pseudo had not MMX cost
> cheaper than memory.
>
> There are several ways to fix the problem:
>
> o ignore * and ? in the cost calculation
> o fix the pattern
> o run regmove as the old RA
> o make failure expected
>
> So I'd prefer the last solution.
I agree, given that it only affects MMX. Uros, what do you think?
Should we mark gcc.target/i386/pr34256.c as XFAIL?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (7 preceding siblings ...)
2008-09-04 19:46 ` hjl dot tools at gmail dot com
@ 2008-09-04 20:14 ` rguenth at gcc dot gnu dot org
2008-09-04 20:18 ` hjl dot tools at gmail dot com
` (27 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-04 20:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from rguenth at gcc dot gnu dot org 2008-09-04 20:12 -------
If we decide the test is bogus we should remove it, not xfail it.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (8 preceding siblings ...)
2008-09-04 20:14 ` rguenth at gcc dot gnu dot org
@ 2008-09-04 20:18 ` hjl dot tools at gmail dot com
2008-09-04 20:32 ` hjl dot tools at gmail dot com
` (26 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 20:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from hjl dot tools at gmail dot com 2008-09-04 20:17 -------
I am concerned about those "*Yi"/"*Ym" and "r" pairs:
[hjl@gnu-6 i386]$ grep "\*Yi" *.md
i386.md: "=r,m ,*y,*y,?rm,?*y,*x,*x,?r ,m ,?*Yi,*x")
i386.md: "g ,ri,C ,*y,*y ,rm ,C ,*x,*Yi,*x,r ,m "))]
i386.md: "=r,r ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r
,m,?*Yi,*x,?*x,?*Ym")
i386.md: "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r ,m ,C ,*x,*Yi,*x,r ,m
,*Ym,*x"))]
i386.md: [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,?r,?o,?*Ym,?*y,?*Yi,*Y2")
i386.md: [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,o,?*Ym,?*y,?*Yi,*Y2")
[hjl@gnu-6 i386]$ grep "\*Ym" *.md
i386.md: "=r,r ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r
,m,?*Yi,*x,?*x,?*Ym")
i386.md: "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r ,m ,C ,*x,*Yi,*x,r ,m
,*Ym,*x"))]
i386.md: "=f,m,f,r ,m ,x,x,x ,m,!*y,!m,!*y,?Yi,?r,!*Ym,!r")
i386.md: "fm,f,G,rmF,Fr,C,x,xm,x,m ,*y,*y ,r ,Yi,r ,*Ym"))]
i386.md: [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,?r,?o,?*Ym,?*y,?*Yi,*Y2")
i386.md: [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,o,?*Ym,?*y,?*Yi,*Y2")
[hjl@gnu-6 i386]$
IRA is much more sensitive to "*" constraint. We can ignore "*Ym"/"r" pairs.
But should we remove * from "*Yi"/"r" pairs?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (9 preceding siblings ...)
2008-09-04 20:18 ` hjl dot tools at gmail dot com
@ 2008-09-04 20:32 ` hjl dot tools at gmail dot com
2008-09-06 15:58 ` [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass ubizjak at gmail dot com
` (25 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 20:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from hjl dot tools at gmail dot com 2008-09-04 20:30 -------
Created an attachment (id=16224)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16224&action=view)
A patch
I am testing this patch
2008-09-04 H.J. Lu <hongjiu.lu@intel.com>
PR target/37364
* config/i386/i386.md (*movsi_1): Remove * from *Yi.
(*movdi_1_rex64): Remove * from pairs of r and *Ym/*Yi.
(zero_extendsidi2_32): Likewise.
(zero_extendsidi2_rex64): Likewise.
(*movsf_1): Remove * from *Ym.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (10 preceding siblings ...)
2008-09-04 20:32 ` hjl dot tools at gmail dot com
@ 2008-09-06 15:58 ` ubizjak at gmail dot com
2008-09-06 16:26 ` ubizjak at gmail dot com
` (24 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 15:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from ubizjak at gmail dot com 2008-09-06 15:57 -------
(In reply to comment #9)
> I am concerned about those "*Yi"/"*Ym" and "r" pairs:
> IRA is much more sensitive to "*" constraint. We can ignore "*Ym"/"r" pairs.
> But should we remove * from "*Yi"/"r" pairs?
This approach was used to keep register allocator away from moving value from
integer ("r") register to non-integer register that can otherwise hold SImode
or DImode value. This happened when there was a shortage of "r" registers.
This delicate balance was achieved by using "?" and "*" for MMX and SSE
registers in various move patterns. I'm afraid that removing these decorations
could (in the worst case) lead to MMX instructions that lock out x87 registers.
IMO, running a simple regmove pass avoids this situation, since this pass will
just connect two already used registers together. Due to "*" decoration,
allocator won't allocate the register in the move pattern, no matter how hard
the register pressure is.
So, due to this, I vote to bring back regmove pass to fix this issue
independently of RA. Maybe this pass can even be enhanced a bit to fix PR
19389?
--
ubizjak at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2008-09-06 15:57:14
date| |
Summary|[4.4 Regression] IRA |[4.4 Regression] IRA
|generates inefficient code |generates inefficient code
| |due to missing regmove pass
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (11 preceding siblings ...)
2008-09-06 15:58 ` [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass ubizjak at gmail dot com
@ 2008-09-06 16:26 ` ubizjak at gmail dot com
2008-09-06 16:50 ` hjl dot tools at gmail dot com
` (23 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 16:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from ubizjak at gmail dot com 2008-09-06 16:25 -------
(In reply to comment #11)
> of RA. Maybe this pass can even be enhanced a bit to fix PR 19389?
Eh, PR 19398.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (12 preceding siblings ...)
2008-09-06 16:26 ` ubizjak at gmail dot com
@ 2008-09-06 16:50 ` hjl dot tools at gmail dot com
2008-09-09 16:02 ` hjl dot tools at gmail dot com
` (22 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-06 16:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from hjl dot tools at gmail dot com 2008-09-06 16:49 -------
Vladimir, I will re-enable regmove on ira-merge branch to see its
impact on compile time as well as SPEC CPU 2K/2006.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (13 preceding siblings ...)
2008-09-06 16:50 ` hjl dot tools at gmail dot com
@ 2008-09-09 16:02 ` hjl dot tools at gmail dot com
2008-09-11 13:47 ` bonzini at gnu dot org
` (21 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-09 16:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from hjl dot tools at gmail dot com 2008-09-09 16:01 -------
I re-enabled the regmove pass on ira-merge branch at revision 140065
and ran SPEC CPU 2K/2006 with -O2 -msse2 -mfpmath=sse -ffast-math for
both 32bit and 64bit on Intel Core 2. Here are the performance impacts
of the regmove pass:
32bit:
164.gzip -0.597213%
175.vpr 0.42898%
176.gcc 2.66576%
181.mcf -0.913938%
186.crafty -0.514019%
197.parser 0.185644%
252.eon 1.10181%
253.perlbmk -2.17957%
254.gap -0.993521%
255.vortex 0.209644%
256.bzip2 -0.294551%
300.twolf 0.131363%
SPECint_base2000 -0.0744109%
168.wupwise 0.45045%
171.swim 0.653595%
172.mgrid 4.23729%
173.applu 1.31265%
177.mesa -2.31596%
178.galgel -0.410633%
179.art 0.504994%
183.equake -0.744153%
187.facerec 4.28781%
188.ammp 1.28707%
189.lucas 13.7548%
191.fma3d -0.53528%
200.sixtrack 1.49551%
301.apsi 0.27088%
SPECfp_base2000 1.66845%
400.perlbench 1.52284%
401.bzip2 -1.36986%
403.gcc 0.497512%
429.mcf 0%
445.gobmk -1.08696%
456.hmmer 0%
458.sjeng 0%
462.libquantum -0.490196%
464.h264ref -20.1581%
471.omnetpp 0.689655%
473.astar 0.826446%
483.xalancbmk 0%
SPECint(R)_base2006 -2.1978%
410.bwaves 0%
416.gamess 1.37931%
433.milc 0%
434.zeusmp -3.0303%
435.gromacs -6.19469%
436.cactusADM -8.57143%
437.leslie3d 0%
444.namd 1.5625%
447.dealII -0.446429%
450.soplex 0%
453.povray 1.26582%
454.calculix 1.59884%
459.GemsFDTD 1.76991%
465.tonto -0.793651%
470.lbm -5.64103%
481.wrf 0.917431%
482.sphinx3 4.69484%
SPECfp(R)_base2006 -1.41844%
64bit:
164.gzip -0.124611%
175.vpr -0.145208%
176.gcc 0.172831%
181.mcf 0.143833%
186.crafty -0.934579%
197.parser 0.122549%
252.eon -0.465839%
253.perlbmk 0.565348%
254.gap -0.158983%
255.vortex 0.0648298%
256.bzip2 0.0863931%
300.twolf 0.346566%
SPECint_base2000 -0.0274812%
68.wupwise 1.79766%
171.swim 0.389105%
172.mgrid -1.01574%
173.applu 4.83871%
177.mesa -0.634026%
178.galgel 0.239704%
179.art 0.71311%
183.equake 1.14286%
187.facerec 2.05011%
188.ammp -1.99076%
189.lucas 2.137%
191.fma3d -0.20816%
200.sixtrack 0.303951%
301.apsi 0.0847099%
SPECfp_base2000 0.690346%
400.perlbench -0.440529%
401.bzip2 -0.564972%
403.gcc -0.5%
429.mcf -0.510204%
445.gobmk -1.05263%
456.hmmer 2.7027%
458.sjeng 1.5544%
462.libquantum 0.45045%
464.h264ref 0.374532%
471.omnetpp -0.70922%
473.astar 0%
483.xalancbmk 0.507614%
SPECint(R)_base2006 0%
410.bwaves 0%
416.gamess -0.568182%
433.milc 1.45985%
434.zeusmp 0%
435.gromacs 3.22581%
436.cactusADM -6.66667%
437.leslie3d 0%
444.namd 0%
447.dealII 1.13636%
450.soplex 0.546448%
453.povray -0.900901%
454.calculix 1.98598%
459.GemsFDTD 2.27273%
465.tonto 0%
470.lbm -2.06186%
481.wrf 0.8%
482.sphinx3 0.881057%
SPECfp(R)_base2006 0%
At -O2, the impact of the regmove pass is mixed. However,
for -mtune=core2 is used, the results could be different.
See PR 37437.
--
hjl dot tools at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingO| |37437
nThis| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (14 preceding siblings ...)
2008-09-09 16:02 ` hjl dot tools at gmail dot com
@ 2008-09-11 13:47 ` bonzini at gnu dot org
2008-09-11 17:34 ` ubizjak at gmail dot com
` (20 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2008-09-11 13:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from bonzini at gnu dot org 2008-09-11 13:46 -------
Uros, does your comment #11 apply also to SSE registers (Yi), which could
alleviate for PR 37437, or only to MMX (Ym)?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (15 preceding siblings ...)
2008-09-11 13:47 ` bonzini at gnu dot org
@ 2008-09-11 17:34 ` ubizjak at gmail dot com
2008-10-22 3:19 ` mmitchel at gcc dot gnu dot org
` (19 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-11 17:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from ubizjak at gmail dot com 2008-09-11 17:33 -------
(In reply to comment #15)
> Uros, does your comment #11 apply also to SSE registers (Yi), which could
> alleviate for PR 37437, or only to MMX (Ym)?
Comment #11 is also true for Yi SSE registers (for TARGET_INTER_UNIT_MOVES
targets, i.e. -march=core2). Under integer register shortage, allocator will
start to allocate SSE registers, and reload will later generate various
xmm->reg moves to satisfy subsequent instruction constraints.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (16 preceding siblings ...)
2008-09-11 17:34 ` ubizjak at gmail dot com
@ 2008-10-22 3:19 ` mmitchel at gcc dot gnu dot org
2008-10-23 8:44 ` Joey dot ye at intel dot com
` (18 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2008-10-22 3:19 UTC (permalink / raw)
To: gcc-bugs
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (17 preceding siblings ...)
2008-10-22 3:19 ` mmitchel at gcc dot gnu dot org
@ 2008-10-23 8:44 ` Joey dot ye at intel dot com
2008-10-24 8:37 ` Joey dot ye at intel dot com
` (17 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-23 8:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from Joey dot ye at intel dot com 2008-10-23 08:42 -------
CPU2006/454.calculix has about 10% regression with IRA + core2 + fpmath=sse on
Core2 ix86:
IRA IRA_core2 NO_IRA_core2
454.calculix 1.00 0.90 1.01
Revision: trunk 140514
Options in detail:
IRA= -m32 -O2 -mssse3 -mfpmath=sse
IRA_core2= $IRA -mtune=core2
NO_IRA_core2= $IRA_core2 -fno-ira
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (18 preceding siblings ...)
2008-10-23 8:44 ` Joey dot ye at intel dot com
@ 2008-10-24 8:37 ` Joey dot ye at intel dot com
2008-10-24 10:12 ` bonzini at gnu dot org
` (16 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-24 8:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from Joey dot ye at intel dot com 2008-10-24 08:36 -------
Created an attachment (id=16536)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16536&action=view)
Reduced performance case from cpu2006/454.calculix
50% regression with IRA core2 on trunk revsion 140514 and 141335
$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20081024 (experimental) [trunk revision 141335] (GCC)
$ gcc -m32 -O2 -mssse3 -mfpmath=sse 36.c
$ time -p ./a.out
real 7.97
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -o core2.exe 36.c
$ time -p ./core2.exe
real 12.27
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -fno-ira -o no-ira.exe 36.c
$ time -p ./no-ira.exe
real 8.03
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (19 preceding siblings ...)
2008-10-24 8:37 ` Joey dot ye at intel dot com
@ 2008-10-24 10:12 ` bonzini at gnu dot org
2008-10-24 10:17 ` bonzini at gnu dot org
` (15 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2008-10-24 10:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from bonzini at gnu dot org 2008-10-24 10:11 -------
Left = old, right = IRA.
It seems to me that the better register allocation of IRA gives the
post-regalloc scheduling pass much less freedom.
Intel guys, could you run SPEC with -O2 -fschedule-insns and -O2, both of them
using IRA?
.L4:
movsd (%esi,%eax,8), %xmm3 movl 12(%ebp), %edx
movsd (%ebx,%eax,8), %xmm4 movsd (%edx,%eax,8),
%xmm7
movsd (%ecx,%eax,8), %xmm6 movl -44(%ebp), %edx
movl 12(%ebp), %edx movsd %xmm7,
-40(%ebp)
movsd (%edx,%eax,8), %xmm1 movsd (%edx,%eax,8),
%xmm7
movl 16(%ebp), %edx movsd %xmm7,
-56(%ebp)
movapd %xmm1, %xmm0 movsd -40(%ebp),
%xmm7
movsd (%edx,%eax,8), %xmm2 mulsd (%ebx,%eax,8),
%xmm7
mulsd %xmm3, %xmm0 addsd %xmm7, %xmm6
movl 20(%ebp), %edx movsd -40(%ebp),
%xmm7
addsd -80(%ebp), %xmm0 mulsd (%esi,%eax,8),
%xmm7
movsd (%edx,%eax,8), %xmm5 addsd %xmm7, %xmm5
movsd %xmm0, -80(%ebp) movsd -40(%ebp),
%xmm7
incl %eax mulsd (%edi,%eax,8),
%xmm7
movapd %xmm1, %xmm0 addsd %xmm7, %xmm4
cmpl %eax, %edi movsd -56(%ebp),
%xmm7
mulsd %xmm4, %xmm0 mulsd (%ebx,%eax,8),
%xmm7
mulsd %xmm6, %xmm1 addsd %xmm7, %xmm3
addsd -72(%ebp), %xmm0 movsd -56(%ebp),
%xmm7
addsd -64(%ebp), %xmm1 mulsd (%esi,%eax,8),
%xmm7
movsd %xmm0, -72(%ebp) addsd %xmm7, %xmm2
movsd %xmm1, -64(%ebp) movsd -56(%ebp),
%xmm7
movapd %xmm2, %xmm0 mulsd (%edi,%eax,8),
%xmm7
mulsd %xmm3, %xmm0 addsd %xmm7, %xmm1
mulsd %xmm5, %xmm3 movsd (%ecx,%eax,8),
%xmm7
addsd -56(%ebp), %xmm0 mulsd (%ebx,%eax,8),
%xmm7
addsd -32(%ebp), %xmm3 addsd -32(%ebp),
%xmm7
movsd %xmm0, -56(%ebp) movsd %xmm7,
-32(%ebp)
movsd %xmm3, -32(%ebp) movsd (%ecx,%eax,8),
%xmm7
movapd %xmm2, %xmm0 mulsd (%esi,%eax,8),
%xmm7
mulsd %xmm6, %xmm2 addsd -24(%ebp),
%xmm7
mulsd %xmm4, %xmm0 movsd %xmm7,
-24(%ebp)
addsd -40(%ebp), %xmm2 movsd (%ecx,%eax,8),
%xmm7
mulsd %xmm5, %xmm4 mulsd (%edi,%eax,8),
%xmm7
addsd -48(%ebp), %xmm0 incl %eax
addsd -24(%ebp), %xmm4 addsd %xmm7, %xmm0
mulsd %xmm6, %xmm5 cmpl %eax, 8(%ebp)
movsd %xmm0, -48(%ebp) jg .L4
movsd %xmm2, -40(%ebp)
movsd %xmm4, -24(%ebp)
addsd %xmm5, %xmm7
jg .L4
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (20 preceding siblings ...)
2008-10-24 10:12 ` bonzini at gnu dot org
@ 2008-10-24 10:17 ` bonzini at gnu dot org
2008-10-25 4:15 ` Joey dot ye at intel dot com
` (14 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2008-10-24 10:17 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from bonzini at gnu dot org 2008-10-24 10:16 -------
Why doesn't bugzilla have a preview? Run the previous comment through
sed -e '/, *$/\!b; N; y/\n/ /'
to get the side-by-side outputs.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (21 preceding siblings ...)
2008-10-24 10:17 ` bonzini at gnu dot org
@ 2008-10-25 4:15 ` Joey dot ye at intel dot com
2008-10-28 1:13 ` hjl dot tools at gmail dot com
` (13 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-25 4:15 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from Joey dot ye at intel dot com 2008-10-25 04:14 -------
To me scheduler is irrelevant here. GCC has no core2 pipeline description so
the instruction scheduling doesn't looks optimized. But for OOO processor like
core2, IMHO scheduling shouldn't make that much difference. Also core2 + no-ira
doesn't hurt, which means core2 scheduling is not the root cause.
Instead old code uses different register for loading, but IRA code always uses
xmm7 as load target. Need to figure out two questions:
1. why instructions from core2+ira runs slower than ira?
2. why core2+ira generate so different code as non-core2?
Scheduler dump for core2:
;; insn code bb dep prio cost reservation
;; ---- ---- -- --- ---- ---- -----------
;; 108 47 4 0 0 0 nothing : 70 109 43
;; 43 102 4 1 0 0 nothing : 70 51 117 114 67 109
;; 109 47 4 2 0 0 nothing : 70 44
;; 44 102 4 1 0 0 nothing : 70 57 55 59 67
;; 45 102 4 0 0 0 nothing : 70 65 67 112 110
;; 46 102 4 0 0 0 nothing : 70 55 49 67 61
;; 110 102 4 1 0 0 nothing : 70 65 61
;; 61 720 4 2 0 0 nothing : 70 55 62
;; 62 720 4 1 0 0 nothing : 70 47 111
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (22 preceding siblings ...)
2008-10-25 4:15 ` Joey dot ye at intel dot com
@ 2008-10-28 1:13 ` hjl dot tools at gmail dot com
2008-10-28 1:21 ` Joey dot ye at intel dot com
` (12 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-10-28 1:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from hjl dot tools at gmail dot com 2008-10-28 01:12 -------
Created an attachment (id=16571)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view)
A patch to re-enable regmove
After applying this patch to re-enable regmove, I got
[hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32
[hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o ira -m32
[hjl@gnu-6 gcc]$ time ./ira
real 0m7.307s
user 0m7.305s
sys 0m0.001s
[hjl@gnu-6 gcc]$ time ./noira
real 0m7.478s
user 0m7.474s
sys 0m0.002s
[hjl@gnu-6 gcc]$
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (23 preceding siblings ...)
2008-10-28 1:13 ` hjl dot tools at gmail dot com
@ 2008-10-28 1:21 ` Joey dot ye at intel dot com
2008-10-28 1:37 ` hjl dot tools at gmail dot com
` (11 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-28 1:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from Joey dot ye at intel dot com 2008-10-28 01:19 -------
(In reply to comment #22)
> Created an attachment (id=16571)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view) [edit]
> A patch to re-enable regmove
> After applying this patch to re-enable regmove, I got
> [hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32
HJ, is your foo.c the case attached in comment #18?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (24 preceding siblings ...)
2008-10-28 1:21 ` Joey dot ye at intel dot com
@ 2008-10-28 1:37 ` hjl dot tools at gmail dot com
2008-11-04 19:30 ` hjl dot tools at gmail dot com
` (10 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-10-28 1:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from hjl dot tools at gmail dot com 2008-10-28 01:36 -------
(In reply to comment #23)
> (In reply to comment #22)
> > Created an attachment (id=16571)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view) [edit]
> > A patch to re-enable regmove
> > After applying this patch to re-enable regmove, I got
> > [hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32
> HJ, is your foo.c the case attached in comment #18?
>
Yes.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (25 preceding siblings ...)
2008-10-28 1:37 ` hjl dot tools at gmail dot com
@ 2008-11-04 19:30 ` hjl dot tools at gmail dot com
2008-11-30 20:46 ` steven at gcc dot gnu dot org
` (9 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-11-04 19:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from hjl dot tools at gmail dot com 2008-11-04 19:28 -------
This regression isn't specific to -mtune=core2. I saw 3% regression
with -O3 on Intel64. Enable the full regmove pass fixes the regression.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (26 preceding siblings ...)
2008-11-04 19:30 ` hjl dot tools at gmail dot com
@ 2008-11-30 20:46 ` steven at gcc dot gnu dot org
2008-11-30 20:53 ` hjl dot tools at gmail dot com
` (8 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-11-30 20:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from steven at gcc dot gnu dot org 2008-11-30 20:45 -------
Resurrecting regmove is not an option. Time is better spent on figuring out
what regmove does, that makes a difference, and see if IRA can be taught to do
the same.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (27 preceding siblings ...)
2008-11-30 20:46 ` steven at gcc dot gnu dot org
@ 2008-11-30 20:53 ` hjl dot tools at gmail dot com
2008-11-30 21:20 ` steven at gcc dot gnu dot org
` (7 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-11-30 20:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from hjl dot tools at gmail dot com 2008-11-30 20:52 -------
(In reply to comment #26)
> Resurrecting regmove is not an option. Time is better spent on figuring out
> what regmove does, that makes a difference, and see if IRA can be taught to do
> the same.
>
x86 has
(define_insn "*movdi_1_rex64"
[(set (match_operand:DI 0 "nonimmediate_operand"
"=r,r ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")
(match_operand:DI 1 "general_operand"
"Z ,rem,i,re,n ,C ,*y,*Ym,*y,r ,m ,C ,*x,*Yi,*x,r ,m ,*Ym,*x"))]
"TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
The alternative with * is available for regmove, but not for
IRA/reload. I don't know how you can resolve it without a different
pass.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (28 preceding siblings ...)
2008-11-30 20:53 ` hjl dot tools at gmail dot com
@ 2008-11-30 21:20 ` steven at gcc dot gnu dot org
2008-11-30 21:34 ` steven at gcc dot gnu dot org
` (6 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-11-30 21:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from steven at gcc dot gnu dot org 2008-11-30 21:18 -------
You're not explaining what regmove does. What transformation do these
alternatives with "*" enable regmove to do?
I'm not saying that a separate pass is not an option. Perhaps a regmove-like
pass is necessary. In fact it probably is, I think even Vlad already
acknowledged so. But not regmove as-we-know-it, which is a friggin' mess that
ought to go away.
What we should figure out IMHO, is which transformation(s) it is (are) that
regmove does, which are helpful here. Maybe we can add a simple pre-regalloc
pass that does these transformations, or make it part of an existing pass,
using clean infrastructure (cfg, df) instead of the ad-hoc mess of regmove.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (29 preceding siblings ...)
2008-11-30 21:20 ` steven at gcc dot gnu dot org
@ 2008-11-30 21:34 ` steven at gcc dot gnu dot org
2008-12-01 18:27 ` ubizjak at gmail dot com
` (5 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-11-30 21:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #29 from steven at gcc dot gnu dot org 2008-11-30 21:32 -------
The insns 8 in comment #0 show the regmove transformation that matters here:
With regmove disabled::
(insn:HI 8 7 14 2 ../../include/mmintrin.h:300 (set (reg:V2SI 61)
(plus:V2SI (reg:V2SI 63 [ x ])
(mem/c/i:V2SI (symbol_ref:DI ("y") <var_decl 0x7f66abfb5c80 y>) [2
y
+0 S8 A64]))) 992 {*mmx_addv2si3} (expr_list:REG_DEAD (reg:V2SI 63 [ x ])
(nil)))
vs. with regmove enabled:
(insn:HI 8 7 14 2 ../../include/mmintrin.h:300 (set (reg:V2SI 63 [ x ])
(plus:V2SI (reg:V2SI 63 [ x ])
(mem/c/i:V2SI (symbol_ref:DI ("y") <var_decl 0x7fd36e03cc80 y>) [2
y+0 S8 A64]))) 992 {*mmx_addv2si3} (nil))
This is a "textbook" example of a regmove transformation (where the use of the
word "textbook" is an insult to all textbooks ever written, since nothing in
regmove is "textbook", but oh well...). It's turned a 3-address instruction
into a 2-address instruction, and this is precisely the transformation that
regmove was originally written for.
In IRA, I would've expected reg 61 and reg 63 to be coalesced to give the same
result.
Vlad already commented on this in comment #6. It's clear from his description
why IRA did not coalesce reg 61 and reg 63.
Other compilers do this kind of transformation via reverse copy propagation.
GCC could perhaps add something like that too, when it transforms a 3-address
insn to a 2-address insn.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (30 preceding siblings ...)
2008-11-30 21:34 ` steven at gcc dot gnu dot org
@ 2008-12-01 18:27 ` ubizjak at gmail dot com
2008-12-21 18:06 ` hjl dot tools at gmail dot com
` (4 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-12-01 18:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #30 from ubizjak at gmail dot com 2008-12-01 18:26 -------
(In reply to comment #29)
> Other compilers do this kind of transformation via reverse copy propagation.
> GCC could perhaps add something like that too, when it transforms a 3-address
> insn to a 2-address insn.
Will this also solve PR 19398, where we have:
fstps -4(%ebp)
(*) movss -4(%ebp), %xmm0
(*) cvttss2si %xmm0, %eax
leave
Note, that we could simply have:
cvttss2si -4(%ebp), %eax
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (31 preceding siblings ...)
2008-12-01 18:27 ` ubizjak at gmail dot com
@ 2008-12-21 18:06 ` hjl dot tools at gmail dot com
2009-01-29 17:13 ` hjl dot tools at gmail dot com
` (3 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-12-21 18:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #31 from hjl dot tools at gmail dot com 2008-12-21 18:04 -------
*** Bug 38601 has been marked as a duplicate of this bug. ***
--
hjl dot tools at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tkoenig at gcc dot gnu dot
| |org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (32 preceding siblings ...)
2008-12-21 18:06 ` hjl dot tools at gmail dot com
@ 2009-01-29 17:13 ` hjl dot tools at gmail dot com
2009-01-29 17:57 ` hjl dot tools at gmail dot com
` (2 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-01-29 17:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #32 from hjl dot tools at gmail dot com 2009-01-29 17:13 -------
This has been fixed by revision 143757:
http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01410.html
--
hjl dot tools at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (33 preceding siblings ...)
2009-01-29 17:13 ` hjl dot tools at gmail dot com
@ 2009-01-29 17:57 ` hjl dot tools at gmail dot com
2009-02-04 7:57 ` bonzini at gnu dot org
2009-02-04 7:58 ` bonzini at gnu dot org
36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-01-29 17:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #33 from hjl dot tools at gmail dot com 2009-01-29 17:57 -------
Reopen since revision 143757 isn't supposed to fix it.
--
hjl dot tools at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (34 preceding siblings ...)
2009-01-29 17:57 ` hjl dot tools at gmail dot com
@ 2009-02-04 7:57 ` bonzini at gnu dot org
2009-02-04 7:58 ` bonzini at gnu dot org
36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2009-02-04 7:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #34 from bonzini at gnu dot org 2009-02-04 07:57 -------
> Reopen since revision 143757 isn't supposed to fix it.
Who cares if 143757 "isn't supposed to fix it", it *is* fixed:
movq x(%rip), %mm0
paddd y(%rip), %mm0
movd %mm0, %rax
ret
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
2008-09-04 5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
` (35 preceding siblings ...)
2009-02-04 7:57 ` bonzini at gnu dot org
@ 2009-02-04 7:58 ` bonzini at gnu dot org
36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2009-02-04 7:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #35 from bonzini at gnu dot org 2009-02-04 07:57 -------
(and all bugs depending on this one are also fixed)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
^ permalink raw reply [flat|nested] 38+ messages in thread