[Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug middle-end/37364]  New: [4.4 Regression] IRA generates ineffient code
@ 2008-09-04  5:45 hjl dot tools at gmail dot com
  2008-09-04  9:50 ` [Bug middle-end/37364] " rguenth at gcc dot gnu dot org
                   ` (36 more replies)
  0 siblings, 37 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04  5:45 UTC (permalink / raw)
  To: gcc-bugs

bash-3.2$ cat /tmp/x.c
#include <mmintrin.h>
extern __m64 x, y;
unsigned long long  foo(__m64 m) {
  return _mm_cvtm64_si64(_mm_add_pi32(x, y));
}
bash-3.2$ ./xgcc -B./ -march=core2 -S -O2 /tmp/x.c
bash-3.2$ cat x.s
        .file   "x.c"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB129:
        .cfi_startproc
        movq    x(%rip), %mm0
        paddd   y(%rip), %mm0
        movq    %mm0, -8(%rsp)
        movq    -8(%rsp), %rax
        ret
        .cfi_endproc
.LFE129:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.4.0 20080903 (experimental) [trunk revision
139952]"
        .section        .note.GNU-stack,"",@progbits
bash-3.2$ ./xgcc -B./ -march=core2 -S -O2 /tmp/x.c -fno-ira
bash-3.2$ cat x.s
        .file   "x.c"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB129:
        .cfi_startproc
        movq    x(%rip), %mm0
        paddd   y(%rip), %mm0
        movd    %mm0, %rax
        ret
        .cfi_endproc
.LFE129:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.4.0 20080903 (experimental) [trunk revision
139952]"
        .section        .note.GNU-stack,"",@progbits
bash-3.2$ 

The problem is IRA turns

(insn 8 7 14 2 ../../include/mmintrin.h:300 (set (reg:V2SI 61)
        (plus:V2SI (reg:V2SI 63 [ x ])
            (mem/c/i:V2SI (symbol_ref:DI ("y") <var_decl 0x7fd36e03cc80 y>) [2 
y+0 S8 A64]))) 992 {*mmx_addv2si3} (expr_list:REG_DEAD (reg:V2SI 63 [ x ])
        (nil)))

(insn 14 8 20 2 /tmp/x.c:12 (set (reg/i:DI 0 ax)
        (subreg:DI (reg:V2SI 61) 0)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD
(re
g:V2SI 61)
        (nil)))

into

(insn 26 8 14 2 ../../include/mmintrin.h:300 (set (mem/c:V2SI (plus:DI
(reg/f:DI
 7 sp)
                (const_int -8 [0xfffffffffffffff8])) [3 S8 A64])
        (reg:V2SI 29 mm0 [orig:63 x ] [63])) 946 {*movv2si_internal_rex64}
(nil)
)

(insn:HI 14 26 20 2 /tmp/x.c:12 (set (reg/i:DI 0 ax)
        (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -8 [0xfffffffffffffff8])) [3 S8 A64])) 89
{*movdi_1_r
ex64} (nil))

while the old RA generates

(insn:HI 14 8 20 2 /tmp/x.c:12 (set (reg/i:DI 0 ax)
        (reg:DI 29 mm0 [orig:63 x ] [63])) 89 {*movdi_1_rex64} (nil))

The outputs from regmove pass are different. With IRA, we got

(insn:HI 8 7 14 2 ../../include/mmintrin.h:300 (set (reg:V2SI 61)
        (plus:V2SI (reg:V2SI 63 [ x ])
            (mem/c/i:V2SI (symbol_ref:DI ("y") <var_decl 0x7f66abfb5c80 y>) [2
y
+0 S8 A64]))) 992 {*mmx_addv2si3} (expr_list:REG_DEAD (reg:V2SI 63 [ x ])
        (nil)))

(insn:HI 14 8 20 2 /tmp/x.c:12 (set (reg/i:DI 0 ax)
        (subreg:DI (reg:V2SI 61) 0)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD
(re
g:V2SI 61)
        (nil)))

Without IRA, we got
(insn:HI 8 7 14 2 ../../include/mmintrin.h:300 (set (reg:V2SI 63 [ x ])
        (plus:V2SI (reg:V2SI 63 [ x ])
            (mem/c/i:V2SI (symbol_ref:DI ("y") <var_decl 0x7fd36e03cc80 y>) [2 
y+0 S8 A64]))) 992 {*mmx_addv2si3} (nil))

(insn:HI 14 8 20 2 /tmp/x.c:12 (set (reg/i:DI 0 ax)
        (subreg:DI (reg:V2SI 63 [ x ]) 0)) 89 {*movdi_1_rex64} (expr_list:REG_D
EAD (reg:V2SI 63 [ x ])
        (nil)))

Does it have any impact on code generation?


-- 
           Summary: [4.4 Regression] IRA generates ineffient code
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: hjl dot tools at gmail dot com
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
@ 2008-09-04  9:50 ` rguenth at gcc dot gnu dot org
  2008-09-04 16:04 ` hjl dot tools at gmail dot com
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-04  9:50 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
   Target Milestone|---                         |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
  2008-09-04  9:50 ` [Bug middle-end/37364] " rguenth at gcc dot gnu dot org
@ 2008-09-04 16:04 ` hjl dot tools at gmail dot com
  2008-09-04 16:14 ` hjl dot tools at gmail dot com
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 16:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from hjl dot tools at gmail dot com  2008-09-04 16:02 -------
"-O2 -march=core2 -fno-ira -fno-regmove" generates

        movq    x(%rip), %mm0
        paddd   y(%rip), %mm0
        movq    %mm0, -8(%rsp)
        movq    -8(%rsp), %rax

It seems that regmove isn't effective for IRA.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
  2008-09-04  9:50 ` [Bug middle-end/37364] " rguenth at gcc dot gnu dot org
  2008-09-04 16:04 ` hjl dot tools at gmail dot com
@ 2008-09-04 16:14 ` hjl dot tools at gmail dot com
  2008-09-04 17:44 ` hjl dot tools at gmail dot com
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 16:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from hjl dot tools at gmail dot com  2008-09-04 16:13 -------
(In reply to comment #1)
> "-O2 -march=core2 -fno-ira -fno-regmove" generates
> 
>         movq    x(%rip), %mm0
>         paddd   y(%rip), %mm0
>         movq    %mm0, -8(%rsp)
>         movq    -8(%rsp), %rax
> 
> It seems that regmove isn't effective for IRA.
> 

regmove is turned off for IRA. Revert the regmove.c change in
revision 139590

        * regmove.c (regmove_optimize): Don't do replacement of output for
        IRA.

fixes this regression. Vladimir, IRA should either deal with replacement
of output or let regmove handle it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (2 preceding siblings ...)
  2008-09-04 16:14 ` hjl dot tools at gmail dot com
@ 2008-09-04 17:44 ` hjl dot tools at gmail dot com
  2008-09-04 17:55 ` hjl dot tools at gmail dot com
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 17:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from hjl dot tools at gmail dot com  2008-09-04 17:43 -------
The problem may be in IRA_COVER_CLASSES. -mtune=core2 turns on
TARGET_INTER_UNIT_MOVES, which means move between mmx and 64bit
integer registers is cheaper than load/store. But IRA doesn't
handle it properly.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates ineffient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (3 preceding siblings ...)
  2008-09-04 17:44 ` hjl dot tools at gmail dot com
@ 2008-09-04 17:55 ` hjl dot tools at gmail dot com
  2008-09-04 18:40 ` [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code hjl dot tools at gmail dot com
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 17:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from hjl dot tools at gmail dot com  2008-09-04 17:54 -------
(In reply to comment #3)
> The problem may be in IRA_COVER_CLASSES. -mtune=core2 turns on
> TARGET_INTER_UNIT_MOVES, which means move between mmx and 64bit
> integer registers is cheaper than load/store. But IRA doesn't
> handle it properly.
> 

Is there a way to tell IRA that moving between 2 register
classes is cheaper than load/store?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (4 preceding siblings ...)
  2008-09-04 17:55 ` hjl dot tools at gmail dot com
@ 2008-09-04 18:40 ` hjl dot tools at gmail dot com
  2008-09-04 19:26 ` vmakarov at redhat dot com
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 18:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from hjl dot tools at gmail dot com  2008-09-04 18:39 -------
Just for the record, move between MMX and GPR isn't a major issue
since we prefer SSE anyway. If it is the only inter class register
move problem for IRA, I am OK to close it as WONTFIX.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (5 preceding siblings ...)
  2008-09-04 18:40 ` [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code hjl dot tools at gmail dot com
@ 2008-09-04 19:26 ` vmakarov at redhat dot com
  2008-09-04 19:46 ` hjl dot tools at gmail dot com
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: vmakarov at redhat dot com @ 2008-09-04 19:26 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #6 from vmakarov at redhat dot com  2008-09-04 19:25 -------
  First of all, I've check the generated code on Core2 and I found it is not
slower than using movd.

  IRA assigns hard registers calculating their costs.  It the memory is
cheaper, it assigns memory.  The first decision point to use memory or hard
register is made in ira-costs.c.  The second decision point is ira-color
because the cost can changed dynamically (e.g. cheap hard register are not
available).

  In this case IRA decides to use memory instead of MMX reg in ira-cost.c

  a0(r61,l0) costs: AREG:25000,25000 DREG:25000,25000 CREG:25000,25000
BREG:25000,25000 SIREG:25000,25000 DIREG:25000,25000 AD_REGS:25000,25000
Q_REGS:25000,25000 NON_Q_REGS:25000,25000 LEGACY_REGS:25000,25000
GENERAL_REGS:25000,25000 SSE_FIRST_REG:42000,42000 SSE_REGS:42000,42000
MMX_REGS:25000,25000 MEM:23000

  The memory is cheaper than MMX_REG therefore r61 gets NO_REGS as cover class
which means using memory.

  The reason for this is in insn

(insn:HI 14 8 20 2
/home/cygnus/vmakarov/build/ira-merge-branch/gcc/gcc/testsuite/gcc.target/i386/pr34256.c:12
(set (reg/i:DI 0 ax)
        (subreg:DI (reg:V2SI 61) 0)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD
(reg:V2SI 61)
        (nil)))

which has the following description

(define_insn "*movdi_1_rex64"
  [(set (match_operand:DI 0 "nonimmediate_operand"
          "=r,r  ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")
        (match_operand:DI 1 "general_operand"
          "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r   ,m  ,C ,*x,*Yi,*x,r  ,m ,*Ym,*x"))]
  "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

Please, look at the 8th alternatives ?r *Ym which corresponds to GENERAL_REGS
MMX_REGS.  ? makes the alternative expensive. * is even worse because it
excludes the alternative from register cost calculation.

  The old register allocator would behave the same way if regmove did not
coalesced r61 with another pseudo and the result pseudo had not MMX cost
cheaper than memory.

  There are several ways to fix the problem:

o ignore * and ? in the cost calculation
o fix the pattern
o run regmove as the old RA
o make failure expected

The first solution would result in huge performance regression for practically
any program.

I can not say about the 2nd solution because I am not the port maintainer.

The third solution is bad because in general case IRA does coalescing more
smart on the fly besides it could make RA even slower.

So I'd prefer the last solution.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (6 preceding siblings ...)
  2008-09-04 19:26 ` vmakarov at redhat dot com
@ 2008-09-04 19:46 ` hjl dot tools at gmail dot com
  2008-09-04 20:14 ` rguenth at gcc dot gnu dot org
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 19:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from hjl dot tools at gmail dot com  2008-09-04 19:45 -------
(In reply to comment #6)
>   First of all, I've check the generated code on Core2 and I found it is not
> slower than using movd.

I think MMX may be slow anyway.

>   The reason for this is in insn
> 
> (insn:HI 14 8 20 2
> /home/cygnus/vmakarov/build/ira-merge-branch/gcc/gcc/testsuite/gcc.target/i386/pr34256.c:12
> (set (reg/i:DI 0 ax)
>         (subreg:DI (reg:V2SI 61) 0)) 89 {*movdi_1_rex64} (expr_list:REG_DEAD
> (reg:V2SI 61)
>         (nil)))
> 
> which has the following description
> 
> (define_insn "*movdi_1_rex64"
>   [(set (match_operand:DI 0 "nonimmediate_operand"
>           "=r,r  ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")
>         (match_operand:DI 1 "general_operand"
>           "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r   ,m  ,C ,*x,*Yi,*x,r  ,m ,*Ym,*x"))]
>   "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> 
> Please, look at the 8th alternatives ?r *Ym which corresponds to GENERAL_REGS
> MMX_REGS.  ? makes the alternative expensive. * is even worse because it
> excludes the alternative from register cost calculation.

You are right.

>   The old register allocator would behave the same way if regmove did not
> coalesced r61 with another pseudo and the result pseudo had not MMX cost
> cheaper than memory.
> 
>   There are several ways to fix the problem:
> 
> o ignore * and ? in the cost calculation
> o fix the pattern
> o run regmove as the old RA
> o make failure expected
> 
> So I'd prefer the last solution.

I agree, given that it only affects MMX. Uros, what do you think?
Should we mark gcc.target/i386/pr34256.c as XFAIL?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (7 preceding siblings ...)
  2008-09-04 19:46 ` hjl dot tools at gmail dot com
@ 2008-09-04 20:14 ` rguenth at gcc dot gnu dot org
  2008-09-04 20:18 ` hjl dot tools at gmail dot com
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-04 20:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2008-09-04 20:12 -------
If we decide the test is bogus we should remove it, not xfail it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (8 preceding siblings ...)
  2008-09-04 20:14 ` rguenth at gcc dot gnu dot org
@ 2008-09-04 20:18 ` hjl dot tools at gmail dot com
  2008-09-04 20:32 ` hjl dot tools at gmail dot com
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 20:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from hjl dot tools at gmail dot com  2008-09-04 20:17 -------
I am concerned about those "*Yi"/"*Ym" and "r" pairs:

[hjl@gnu-6 i386]$ grep "\*Yi" *.md
i386.md:                        "=r,m ,*y,*y,?rm,?*y,*x,*x,?r ,m ,?*Yi,*x")
i386.md:                        "g ,ri,C ,*y,*y ,rm ,C ,*x,*Yi,*x,r   ,m "))]
i386.md:          "=r,r  ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r
,m,?*Yi,*x,?*x,?*Ym")
i386.md:          "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r   ,m  ,C ,*x,*Yi,*x,r  ,m
,*Ym,*x"))]
i386.md:  [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,?r,?o,?*Ym,?*y,?*Yi,*Y2")
i386.md:  [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,o,?*Ym,?*y,?*Yi,*Y2")
[hjl@gnu-6 i386]$ grep "\*Ym" *.md 
i386.md:          "=r,r  ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r
,m,?*Yi,*x,?*x,?*Ym")
i386.md:          "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r   ,m  ,C ,*x,*Yi,*x,r  ,m
,*Ym,*x"))]
i386.md:          "=f,m,f,r  ,m ,x,x,x ,m,!*y,!m,!*y,?Yi,?r,!*Ym,!r")
i386.md:          "fm,f,G,rmF,Fr,C,x,xm,x,m  ,*y,*y ,r  ,Yi,r   ,*Ym"))]
i386.md:  [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,?r,?o,?*Ym,?*y,?*Yi,*Y2")
i386.md:  [(set (match_operand:DI 0 "nonimmediate_operand"
"=r,o,?*Ym,?*y,?*Yi,*Y2")
[hjl@gnu-6 i386]$ 

IRA is much more sensitive to "*" constraint. We can ignore "*Ym"/"r" pairs.
But should we remove * from "*Yi"/"r" pairs?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (9 preceding siblings ...)
  2008-09-04 20:18 ` hjl dot tools at gmail dot com
@ 2008-09-04 20:32 ` hjl dot tools at gmail dot com
  2008-09-06 15:58 ` [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass ubizjak at gmail dot com
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-04 20:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from hjl dot tools at gmail dot com  2008-09-04 20:30 -------
Created an attachment (id=16224)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16224&action=view)
A patch

I am testing this patch

2008-09-04  H.J. Lu  <hongjiu.lu@intel.com>

        PR target/37364
        * config/i386/i386.md (*movsi_1): Remove * from *Yi.
        (*movdi_1_rex64): Remove * from pairs of r and *Ym/*Yi.
        (zero_extendsidi2_32): Likewise.
        (zero_extendsidi2_rex64): Likewise.
        (*movsf_1): Remove * from *Ym.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (10 preceding siblings ...)
  2008-09-04 20:32 ` hjl dot tools at gmail dot com
@ 2008-09-06 15:58 ` ubizjak at gmail dot com
  2008-09-06 16:26 ` ubizjak at gmail dot com
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 15:58 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #11 from ubizjak at gmail dot com  2008-09-06 15:57 -------
(In reply to comment #9)
> I am concerned about those "*Yi"/"*Ym" and "r" pairs:

> IRA is much more sensitive to "*" constraint. We can ignore "*Ym"/"r" pairs.
> But should we remove * from "*Yi"/"r" pairs?

This approach was used to keep register allocator away from moving value from
integer ("r") register to non-integer register that can otherwise hold SImode
or DImode value. This happened when there was a shortage of "r" registers.

This delicate balance was achieved by using "?" and "*" for MMX and SSE
registers in various move patterns. I'm afraid that removing these decorations
could (in the worst case) lead to MMX instructions that lock out x87 registers.

IMO, running a simple regmove pass avoids this situation, since this pass will
just connect two already used registers together. Due to "*" decoration,
allocator won't allocate the register in the move pattern, no matter how hard
the register pressure is.

So, due to this, I vote to bring back regmove pass to fix this issue
independently of RA. Maybe this pass can even be enhanced a bit to fix PR
19389?

-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-09-06 15:57:14
               date|                            |
            Summary|[4.4 Regression] IRA        |[4.4 Regression] IRA
                   |generates inefficient code  |generates inefficient code
                   |                            |due to missing regmove pass

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (11 preceding siblings ...)
  2008-09-06 15:58 ` [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass ubizjak at gmail dot com
@ 2008-09-06 16:26 ` ubizjak at gmail dot com
  2008-09-06 16:50 ` hjl dot tools at gmail dot com
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 16:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from ubizjak at gmail dot com  2008-09-06 16:25 -------
(In reply to comment #11)

> of RA. Maybe this pass can even be enhanced a bit to fix PR 19389?

Eh, PR 19398.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (12 preceding siblings ...)
  2008-09-06 16:26 ` ubizjak at gmail dot com
@ 2008-09-06 16:50 ` hjl dot tools at gmail dot com
  2008-09-09 16:02 ` hjl dot tools at gmail dot com
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-06 16:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from hjl dot tools at gmail dot com  2008-09-06 16:49 -------
Vladimir, I will re-enable regmove on ira-merge branch to see its
impact on compile time as well as SPEC CPU 2K/2006.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (13 preceding siblings ...)
  2008-09-06 16:50 ` hjl dot tools at gmail dot com
@ 2008-09-09 16:02 ` hjl dot tools at gmail dot com
  2008-09-11 13:47 ` bonzini at gnu dot org
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-09-09 16:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from hjl dot tools at gmail dot com  2008-09-09 16:01 -------
I re-enabled the regmove pass on ira-merge branch at revision 140065
and ran SPEC CPU 2K/2006 with -O2 -msse2 -mfpmath=sse -ffast-math for
both 32bit and 64bit on Intel Core 2. Here are the performance impacts
of the regmove pass:

32bit:

164.gzip                         -0.597213%
175.vpr                          0.42898%
176.gcc                          2.66576%
181.mcf                          -0.913938%
186.crafty                       -0.514019%
197.parser                       0.185644%
252.eon                          1.10181%
253.perlbmk                      -2.17957%
254.gap                          -0.993521%
255.vortex                       0.209644%
256.bzip2                        -0.294551%
300.twolf                        0.131363%
SPECint_base2000                         -0.0744109%

168.wupwise                      0.45045%
171.swim                         0.653595%
172.mgrid                        4.23729%
173.applu                        1.31265%
177.mesa                         -2.31596%
178.galgel                       -0.410633%
179.art                          0.504994%
183.equake                       -0.744153%
187.facerec                      4.28781%
188.ammp                         1.28707%
189.lucas                        13.7548%
191.fma3d                        -0.53528%
200.sixtrack                     1.49551%
301.apsi                         0.27088%
SPECfp_base2000                          1.66845%

400.perlbench                    1.52284%
401.bzip2                        -1.36986%
403.gcc                          0.497512%
429.mcf                          0%
445.gobmk                        -1.08696%
456.hmmer                        0%
458.sjeng                        0%
462.libquantum                   -0.490196%
464.h264ref                      -20.1581%
471.omnetpp                      0.689655%
473.astar                        0.826446%
483.xalancbmk                    0%
SPECint(R)_base2006                      -2.1978%

410.bwaves                       0%
416.gamess                       1.37931%
433.milc                         0%
434.zeusmp                       -3.0303%
435.gromacs                      -6.19469%
436.cactusADM                    -8.57143%
437.leslie3d                     0%
444.namd                         1.5625%
447.dealII                       -0.446429%
450.soplex                       0%
453.povray                       1.26582%
454.calculix                     1.59884%
459.GemsFDTD                     1.76991%
465.tonto                        -0.793651%
470.lbm                          -5.64103%
481.wrf                          0.917431%
482.sphinx3                      4.69484%
SPECfp(R)_base2006                       -1.41844%

64bit:

164.gzip                         -0.124611%
175.vpr                          -0.145208%
176.gcc                          0.172831%
181.mcf                          0.143833%
186.crafty                       -0.934579%
197.parser                       0.122549%
252.eon                          -0.465839%
253.perlbmk                      0.565348%
254.gap                          -0.158983%
255.vortex                       0.0648298%
256.bzip2                        0.0863931%
300.twolf                        0.346566%
SPECint_base2000                         -0.0274812%

68.wupwise                       1.79766%
171.swim                         0.389105%
172.mgrid                        -1.01574%
173.applu                        4.83871%
177.mesa                         -0.634026%
178.galgel                       0.239704%
179.art                          0.71311%
183.equake                       1.14286%
187.facerec                      2.05011%
188.ammp                         -1.99076%
189.lucas                        2.137%
191.fma3d                        -0.20816%
200.sixtrack                     0.303951%
301.apsi                         0.0847099%
SPECfp_base2000                          0.690346%

400.perlbench                    -0.440529%
401.bzip2                        -0.564972%
403.gcc                          -0.5%
429.mcf                          -0.510204%
445.gobmk                        -1.05263%
456.hmmer                        2.7027%
458.sjeng                        1.5544%
462.libquantum                   0.45045%
464.h264ref                      0.374532%
471.omnetpp                      -0.70922%
473.astar                        0%
483.xalancbmk                    0.507614%
SPECint(R)_base2006                      0%

410.bwaves                       0%
416.gamess                       -0.568182%
433.milc                         1.45985%
434.zeusmp                       0%
435.gromacs                      3.22581%
436.cactusADM                    -6.66667%
437.leslie3d                     0%
444.namd                         0%
447.dealII                       1.13636%
450.soplex                       0.546448%
453.povray                       -0.900901%
454.calculix                     1.98598%
459.GemsFDTD                     2.27273%
465.tonto                        0%
470.lbm                          -2.06186%
481.wrf                          0.8%
482.sphinx3                      0.881057%
SPECfp(R)_base2006                       0%

At -O2, the impact of the regmove pass is mixed. However,
for -mtune=core2 is used, the results could be different.
See PR 37437.


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |37437
              nThis|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (14 preceding siblings ...)
  2008-09-09 16:02 ` hjl dot tools at gmail dot com
@ 2008-09-11 13:47 ` bonzini at gnu dot org
  2008-09-11 17:34 ` ubizjak at gmail dot com
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2008-09-11 13:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from bonzini at gnu dot org  2008-09-11 13:46 -------
Uros, does your comment #11 apply also to SSE registers (Yi), which could
alleviate for PR 37437, or only to MMX (Ym)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (15 preceding siblings ...)
  2008-09-11 13:47 ` bonzini at gnu dot org
@ 2008-09-11 17:34 ` ubizjak at gmail dot com
  2008-10-22  3:19 ` mmitchel at gcc dot gnu dot org
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-11 17:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from ubizjak at gmail dot com  2008-09-11 17:33 -------
(In reply to comment #15)
> Uros, does your comment #11 apply also to SSE registers (Yi), which could
> alleviate for PR 37437, or only to MMX (Ym)?

Comment #11 is also true for Yi SSE registers (for TARGET_INTER_UNIT_MOVES
targets, i.e. -march=core2). Under integer register shortage, allocator will
start to allocate SSE registers, and reload will later generate various
xmm->reg moves to satisfy subsequent instruction constraints.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (16 preceding siblings ...)
  2008-09-11 17:34 ` ubizjak at gmail dot com
@ 2008-10-22  3:19 ` mmitchel at gcc dot gnu dot org
  2008-10-23  8:44 ` Joey dot ye at intel dot com
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2008-10-22  3:19 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (17 preceding siblings ...)
  2008-10-22  3:19 ` mmitchel at gcc dot gnu dot org
@ 2008-10-23  8:44 ` Joey dot ye at intel dot com
  2008-10-24  8:37 ` Joey dot ye at intel dot com
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-23  8:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from Joey dot ye at intel dot com  2008-10-23 08:42 -------
CPU2006/454.calculix has about 10% regression with IRA + core2 + fpmath=sse on
Core2 ix86:
                 IRA    IRA_core2   NO_IRA_core2
454.calculix     1.00   0.90        1.01

Revision: trunk 140514

Options in detail:
IRA= -m32 -O2 -mssse3 -mfpmath=sse
IRA_core2= $IRA -mtune=core2
NO_IRA_core2= $IRA_core2 -fno-ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (18 preceding siblings ...)
  2008-10-23  8:44 ` Joey dot ye at intel dot com
@ 2008-10-24  8:37 ` Joey dot ye at intel dot com
  2008-10-24 10:12 ` bonzini at gnu dot org
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-24  8:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from Joey dot ye at intel dot com  2008-10-24 08:36 -------
Created an attachment (id=16536)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16536&action=view)
Reduced performance case from cpu2006/454.calculix

50% regression with IRA core2 on trunk revsion 140514 and 141335

$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap
--enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20081024 (experimental) [trunk revision 141335] (GCC) 
$ gcc -m32 -O2 -mssse3 -mfpmath=sse 36.c
$ time -p ./a.out
real 7.97
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -o core2.exe 36.c
$ time -p ./core2.exe
real 12.27
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -fno-ira -o no-ira.exe 36.c
$ time -p ./no-ira.exe
real 8.03


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (19 preceding siblings ...)
  2008-10-24  8:37 ` Joey dot ye at intel dot com
@ 2008-10-24 10:12 ` bonzini at gnu dot org
  2008-10-24 10:17 ` bonzini at gnu dot org
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2008-10-24 10:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from bonzini at gnu dot org  2008-10-24 10:11 -------
Left = old, right = IRA.

It seems to me that the better register allocation of IRA gives the
post-regalloc scheduling pass much less freedom.

Intel guys, could you run SPEC with -O2 -fschedule-insns and -O2, both of them
using IRA?

.L4:
        movsd   (%esi,%eax,8), %xmm3                    movl    12(%ebp), %edx
        movsd   (%ebx,%eax,8), %xmm4                    movsd   (%edx,%eax,8),
%xmm7
        movsd   (%ecx,%eax,8), %xmm6                    movl    -44(%ebp), %edx
        movl    12(%ebp), %edx                          movsd   %xmm7,
-40(%ebp)
        movsd   (%edx,%eax,8), %xmm1                    movsd   (%edx,%eax,8),
%xmm7
        movl    16(%ebp), %edx                          movsd   %xmm7,
-56(%ebp)
        movapd  %xmm1, %xmm0                            movsd   -40(%ebp),
%xmm7
        movsd   (%edx,%eax,8), %xmm2                    mulsd   (%ebx,%eax,8),
%xmm7
        mulsd   %xmm3, %xmm0                            addsd   %xmm7, %xmm6
        movl    20(%ebp), %edx                          movsd   -40(%ebp),
%xmm7
        addsd   -80(%ebp), %xmm0                        mulsd   (%esi,%eax,8),
%xmm7
        movsd   (%edx,%eax,8), %xmm5                    addsd   %xmm7, %xmm5
        movsd   %xmm0, -80(%ebp)                        movsd   -40(%ebp),
%xmm7
        incl    %eax                                    mulsd   (%edi,%eax,8),
%xmm7
        movapd  %xmm1, %xmm0                            addsd   %xmm7, %xmm4
        cmpl    %eax, %edi                              movsd   -56(%ebp),
%xmm7
        mulsd   %xmm4, %xmm0                            mulsd   (%ebx,%eax,8),
%xmm7
        mulsd   %xmm6, %xmm1                            addsd   %xmm7, %xmm3
        addsd   -72(%ebp), %xmm0                        movsd   -56(%ebp),
%xmm7
        addsd   -64(%ebp), %xmm1                        mulsd   (%esi,%eax,8),
%xmm7
        movsd   %xmm0, -72(%ebp)                        addsd   %xmm7, %xmm2
        movsd   %xmm1, -64(%ebp)                        movsd   -56(%ebp),
%xmm7
        movapd  %xmm2, %xmm0                            mulsd   (%edi,%eax,8),
%xmm7
        mulsd   %xmm3, %xmm0                            addsd   %xmm7, %xmm1
        mulsd   %xmm5, %xmm3                            movsd   (%ecx,%eax,8),
%xmm7
        addsd   -56(%ebp), %xmm0                        mulsd   (%ebx,%eax,8),
%xmm7
        addsd   -32(%ebp), %xmm3                        addsd   -32(%ebp),
%xmm7
        movsd   %xmm0, -56(%ebp)                        movsd   %xmm7,
-32(%ebp)
        movsd   %xmm3, -32(%ebp)                        movsd   (%ecx,%eax,8),
%xmm7
        movapd  %xmm2, %xmm0                            mulsd   (%esi,%eax,8),
%xmm7
        mulsd   %xmm6, %xmm2                            addsd   -24(%ebp),
%xmm7
        mulsd   %xmm4, %xmm0                            movsd   %xmm7,
-24(%ebp)
        addsd   -40(%ebp), %xmm2                        movsd   (%ecx,%eax,8),
%xmm7
        mulsd   %xmm5, %xmm4                            mulsd   (%edi,%eax,8),
%xmm7
        addsd   -48(%ebp), %xmm0                        incl    %eax
        addsd   -24(%ebp), %xmm4                        addsd   %xmm7, %xmm0
        mulsd   %xmm6, %xmm5                            cmpl    %eax, 8(%ebp)
        movsd   %xmm0, -48(%ebp)                        jg      .L4
        movsd   %xmm2, -40(%ebp)
        movsd   %xmm4, -24(%ebp)
        addsd   %xmm5, %xmm7
        jg      .L4


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (20 preceding siblings ...)
  2008-10-24 10:12 ` bonzini at gnu dot org
@ 2008-10-24 10:17 ` bonzini at gnu dot org
  2008-10-25  4:15 ` Joey dot ye at intel dot com
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2008-10-24 10:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from bonzini at gnu dot org  2008-10-24 10:16 -------
Why doesn't bugzilla have a preview?  Run the previous comment through

   sed -e '/, *$/\!b; N; y/\n/ /'

to get the side-by-side outputs.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (21 preceding siblings ...)
  2008-10-24 10:17 ` bonzini at gnu dot org
@ 2008-10-25  4:15 ` Joey dot ye at intel dot com
  2008-10-28  1:13 ` hjl dot tools at gmail dot com
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-25  4:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from Joey dot ye at intel dot com  2008-10-25 04:14 -------
To me scheduler is irrelevant here. GCC has no core2 pipeline description so
the instruction scheduling doesn't looks optimized. But for OOO processor like
core2, IMHO scheduling shouldn't make that much difference. Also core2 + no-ira
doesn't hurt, which means core2 scheduling is not the root cause.

Instead old code uses different register for loading, but IRA code always uses
xmm7 as load target. Need to figure out two questions:
1. why instructions from core2+ira runs slower than ira?
2. why core2+ira generate so different code as non-core2?

Scheduler dump for core2:
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;      108    47     4     0     0     0   nothing     : 70 109 43
;;       43   102     4     1     0     0   nothing     : 70 51 117 114 67 109
;;      109    47     4     2     0     0   nothing     : 70 44
;;       44   102     4     1     0     0   nothing     : 70 57 55 59 67
;;       45   102     4     0     0     0   nothing     : 70 65 67 112 110
;;       46   102     4     0     0     0   nothing     : 70 55 49 67 61
;;      110   102     4     1     0     0   nothing     : 70 65 61
;;       61   720     4     2     0     0   nothing     : 70 55 62
;;       62   720     4     1     0     0   nothing     : 70 47 111


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (22 preceding siblings ...)
  2008-10-25  4:15 ` Joey dot ye at intel dot com
@ 2008-10-28  1:13 ` hjl dot tools at gmail dot com
  2008-10-28  1:21 ` Joey dot ye at intel dot com
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-10-28  1:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from hjl dot tools at gmail dot com  2008-10-28 01:12 -------
Created an attachment (id=16571)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view)
A patch to re-enable regmove

After applying this patch to re-enable regmove, I got

[hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32
[hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o ira -m32
[hjl@gnu-6 gcc]$ time ./ira

real    0m7.307s
user    0m7.305s
sys     0m0.001s
[hjl@gnu-6 gcc]$ time ./noira

real    0m7.478s
user    0m7.474s
sys     0m0.002s
[hjl@gnu-6 gcc]$


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (23 preceding siblings ...)
  2008-10-28  1:13 ` hjl dot tools at gmail dot com
@ 2008-10-28  1:21 ` Joey dot ye at intel dot com
  2008-10-28  1:37 ` hjl dot tools at gmail dot com
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Joey dot ye at intel dot com @ 2008-10-28  1:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from Joey dot ye at intel dot com  2008-10-28 01:19 -------
(In reply to comment #22)
> Created an attachment (id=16571)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view) [edit]
> A patch to re-enable regmove
> After applying this patch to re-enable regmove, I got
> [hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32
HJ, is your foo.c the case attached in comment #18?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (24 preceding siblings ...)
  2008-10-28  1:21 ` Joey dot ye at intel dot com
@ 2008-10-28  1:37 ` hjl dot tools at gmail dot com
  2008-11-04 19:30 ` hjl dot tools at gmail dot com
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-10-28  1:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from hjl dot tools at gmail dot com  2008-10-28 01:36 -------
(In reply to comment #23)
> (In reply to comment #22)
> > Created an attachment (id=16571)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view) [edit]
> > A patch to re-enable regmove
> > After applying this patch to re-enable regmove, I got
> > [hjl@gnu-6 gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32
> HJ, is your foo.c the case attached in comment #18?
> 

Yes.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (25 preceding siblings ...)
  2008-10-28  1:37 ` hjl dot tools at gmail dot com
@ 2008-11-04 19:30 ` hjl dot tools at gmail dot com
  2008-11-30 20:46 ` steven at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-11-04 19:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from hjl dot tools at gmail dot com  2008-11-04 19:28 -------
This regression isn't specific to -mtune=core2. I saw 3% regression
with -O3 on Intel64. Enable the full regmove pass fixes the regression.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (26 preceding siblings ...)
  2008-11-04 19:30 ` hjl dot tools at gmail dot com
@ 2008-11-30 20:46 ` steven at gcc dot gnu dot org
  2008-11-30 20:53 ` hjl dot tools at gmail dot com
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-11-30 20:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from steven at gcc dot gnu dot org  2008-11-30 20:45 -------
Resurrecting regmove is not an option. Time is better spent on figuring out
what regmove does, that makes a difference, and see if IRA can be taught to do
the same.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (27 preceding siblings ...)
  2008-11-30 20:46 ` steven at gcc dot gnu dot org
@ 2008-11-30 20:53 ` hjl dot tools at gmail dot com
  2008-11-30 21:20 ` steven at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-11-30 20:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from hjl dot tools at gmail dot com  2008-11-30 20:52 -------
(In reply to comment #26)
> Resurrecting regmove is not an option. Time is better spent on figuring out
> what regmove does, that makes a difference, and see if IRA can be taught to do
> the same.
> 

x86 has

(define_insn "*movdi_1_rex64"
  [(set (match_operand:DI 0 "nonimmediate_operand"
          "=r,r  ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")
        (match_operand:DI 1 "general_operand"
          "Z ,rem,i,re,n ,C ,*y,*Ym,*y,r   ,m  ,C ,*x,*Yi,*x,r  ,m ,*Ym,*x"))]
  "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

The alternative with * is available for regmove, but not for
IRA/reload. I don't know how you can resolve it without a different
pass.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (28 preceding siblings ...)
  2008-11-30 20:53 ` hjl dot tools at gmail dot com
@ 2008-11-30 21:20 ` steven at gcc dot gnu dot org
  2008-11-30 21:34 ` steven at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-11-30 21:20 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #28 from steven at gcc dot gnu dot org  2008-11-30 21:18 -------
You're not explaining what regmove does. What transformation do these
alternatives with "*" enable regmove to do?

I'm not saying that a separate pass is not an option. Perhaps a regmove-like
pass is necessary.  In fact it probably is, I think even Vlad already
acknowledged so. But not regmove as-we-know-it, which is a friggin' mess that
ought to go away.

What we should figure out IMHO, is which transformation(s) it is (are) that
regmove does, which are helpful here.  Maybe we can add a simple pre-regalloc
pass that does these transformations, or make it part of an existing pass,
using clean infrastructure (cfg, df) instead of the ad-hoc mess of regmove.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (29 preceding siblings ...)
  2008-11-30 21:20 ` steven at gcc dot gnu dot org
@ 2008-11-30 21:34 ` steven at gcc dot gnu dot org
  2008-12-01 18:27 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-11-30 21:34 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #29 from steven at gcc dot gnu dot org  2008-11-30 21:32 -------
The insns 8 in comment #0 show the regmove transformation that matters here:

With regmove disabled::
(insn:HI 8 7 14 2 ../../include/mmintrin.h:300 (set (reg:V2SI 61)
        (plus:V2SI (reg:V2SI 63 [ x ])
            (mem/c/i:V2SI (symbol_ref:DI ("y") <var_decl 0x7f66abfb5c80 y>) [2
y
+0 S8 A64]))) 992 {*mmx_addv2si3} (expr_list:REG_DEAD (reg:V2SI 63 [ x ])
        (nil)))

vs. with regmove enabled:
(insn:HI 8 7 14 2 ../../include/mmintrin.h:300 (set (reg:V2SI 63 [ x ])
        (plus:V2SI (reg:V2SI 63 [ x ])
            (mem/c/i:V2SI (symbol_ref:DI ("y") <var_decl 0x7fd36e03cc80 y>) [2 
y+0 S8 A64]))) 992 {*mmx_addv2si3} (nil))

This is a "textbook" example of a regmove transformation (where the use of the
word "textbook" is an insult to all textbooks ever written, since nothing in
regmove is "textbook", but oh well...).  It's turned a 3-address instruction
into a 2-address instruction, and this is precisely the transformation that
regmove was originally written for.

In IRA, I would've expected reg 61 and reg 63 to be coalesced to give the same
result.

Vlad already commented on this in comment #6.  It's clear from his description
why IRA did not coalesce reg 61 and reg 63.  

Other compilers do this kind of transformation via reverse copy propagation. 
GCC could perhaps add something like that too, when it transforms a 3-address
insn to a 2-address insn.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (30 preceding siblings ...)
  2008-11-30 21:34 ` steven at gcc dot gnu dot org
@ 2008-12-01 18:27 ` ubizjak at gmail dot com
  2008-12-21 18:06 ` hjl dot tools at gmail dot com
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: ubizjak at gmail dot com @ 2008-12-01 18:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #30 from ubizjak at gmail dot com  2008-12-01 18:26 -------
(In reply to comment #29)

> Other compilers do this kind of transformation via reverse copy propagation. 
> GCC could perhaps add something like that too, when it transforms a 3-address
> insn to a 2-address insn.

Will this also solve PR 19398, where we have:

        fstps  -4(%ebp)
(*)     movss  -4(%ebp), %xmm0
(*)     cvttss2si       %xmm0, %eax
        leave

Note, that we could simply have:

        cvttss2si       -4(%ebp), %eax


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (31 preceding siblings ...)
  2008-12-01 18:27 ` ubizjak at gmail dot com
@ 2008-12-21 18:06 ` hjl dot tools at gmail dot com
  2009-01-29 17:13 ` hjl dot tools at gmail dot com
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-12-21 18:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #31 from hjl dot tools at gmail dot com  2008-12-21 18:04 -------
*** Bug 38601 has been marked as a duplicate of this bug. ***


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (32 preceding siblings ...)
  2008-12-21 18:06 ` hjl dot tools at gmail dot com
@ 2009-01-29 17:13 ` hjl dot tools at gmail dot com
  2009-01-29 17:57 ` hjl dot tools at gmail dot com
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-01-29 17:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #32 from hjl dot tools at gmail dot com  2009-01-29 17:13 -------
This has been fixed by revision 143757:

http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01410.html


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (33 preceding siblings ...)
  2009-01-29 17:13 ` hjl dot tools at gmail dot com
@ 2009-01-29 17:57 ` hjl dot tools at gmail dot com
  2009-02-04  7:57 ` bonzini at gnu dot org
  2009-02-04  7:58 ` bonzini at gnu dot org
  36 siblings, 0 replies; 38+ messages in thread
From: hjl dot tools at gmail dot com @ 2009-01-29 17:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #33 from hjl dot tools at gmail dot com  2009-01-29 17:57 -------
Reopen since revision 143757 isn't supposed to fix it.


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (34 preceding siblings ...)
  2009-01-29 17:57 ` hjl dot tools at gmail dot com
@ 2009-02-04  7:57 ` bonzini at gnu dot org
  2009-02-04  7:58 ` bonzini at gnu dot org
  36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2009-02-04  7:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #34 from bonzini at gnu dot org  2009-02-04 07:57 -------
> Reopen since revision 143757 isn't supposed to fix it.

Who cares if 143757 "isn't supposed to fix it", it *is* fixed:

        movq    x(%rip), %mm0
        paddd   y(%rip), %mm0
        movd    %mm0, %rax
        ret


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
  2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
                   ` (35 preceding siblings ...)
  2009-02-04  7:57 ` bonzini at gnu dot org
@ 2009-02-04  7:58 ` bonzini at gnu dot org
  36 siblings, 0 replies; 38+ messages in thread
From: bonzini at gnu dot org @ 2009-02-04  7:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #35 from bonzini at gnu dot org  2009-02-04 07:57 -------
(and all bugs depending on this one are also fixed)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2009-02-04  7:58 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-04  5:45 [Bug middle-end/37364] New: [4.4 Regression] IRA generates ineffient code hjl dot tools at gmail dot com
2008-09-04  9:50 ` [Bug middle-end/37364] " rguenth at gcc dot gnu dot org
2008-09-04 16:04 ` hjl dot tools at gmail dot com
2008-09-04 16:14 ` hjl dot tools at gmail dot com
2008-09-04 17:44 ` hjl dot tools at gmail dot com
2008-09-04 17:55 ` hjl dot tools at gmail dot com
2008-09-04 18:40 ` [Bug middle-end/37364] [4.4 Regression] IRA generates inefficient code hjl dot tools at gmail dot com
2008-09-04 19:26 ` vmakarov at redhat dot com
2008-09-04 19:46 ` hjl dot tools at gmail dot com
2008-09-04 20:14 ` rguenth at gcc dot gnu dot org
2008-09-04 20:18 ` hjl dot tools at gmail dot com
2008-09-04 20:32 ` hjl dot tools at gmail dot com
2008-09-06 15:58 ` [Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass ubizjak at gmail dot com
2008-09-06 16:26 ` ubizjak at gmail dot com
2008-09-06 16:50 ` hjl dot tools at gmail dot com
2008-09-09 16:02 ` hjl dot tools at gmail dot com
2008-09-11 13:47 ` bonzini at gnu dot org
2008-09-11 17:34 ` ubizjak at gmail dot com
2008-10-22  3:19 ` mmitchel at gcc dot gnu dot org
2008-10-23  8:44 ` Joey dot ye at intel dot com
2008-10-24  8:37 ` Joey dot ye at intel dot com
2008-10-24 10:12 ` bonzini at gnu dot org
2008-10-24 10:17 ` bonzini at gnu dot org
2008-10-25  4:15 ` Joey dot ye at intel dot com
2008-10-28  1:13 ` hjl dot tools at gmail dot com
2008-10-28  1:21 ` Joey dot ye at intel dot com
2008-10-28  1:37 ` hjl dot tools at gmail dot com
2008-11-04 19:30 ` hjl dot tools at gmail dot com
2008-11-30 20:46 ` steven at gcc dot gnu dot org
2008-11-30 20:53 ` hjl dot tools at gmail dot com
2008-11-30 21:20 ` steven at gcc dot gnu dot org
2008-11-30 21:34 ` steven at gcc dot gnu dot org
2008-12-01 18:27 ` ubizjak at gmail dot com
2008-12-21 18:06 ` hjl dot tools at gmail dot com
2009-01-29 17:13 ` hjl dot tools at gmail dot com
2009-01-29 17:57 ` hjl dot tools at gmail dot com
2009-02-04  7:57 ` bonzini at gnu dot org
2009-02-04  7:58 ` bonzini at gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).