* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates registers in very non-optimal way kirill.yukhin at intel dot com
@ 2011-08-17 13:53 ` kirill.yukhin at intel dot com
2011-08-17 17:19 ` vmakarov at redhat dot com
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: kirill.yukhin at intel dot com @ 2011-08-17 13:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #1 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-08-17 13:41:58 UTC ---
Created attachment 25033
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25033
Testcase
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: vmakarov at redhat dot com @ 2011-08-17 17:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #2 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-17 17:16:11 UTC ---
I guess something is wrong with hard register preferencing for multi-register
pseudos in ira-color.c::ira_assign. I believe it works fine for one-register
pseudos. I'll look at this. Thanks for reporting.
By the way, your patch is wrong. There should be TARGET_64BIT in define_split
instead of !TARGET_64BIT.
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-17 18:58 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-17 18:43:41 UTC ---
(In reply to comment #2)
> I guess something is wrong with hard register preferencing for multi-register
> pseudos in ira-color.c::ira_assign. I believe it works fine for one-register
> pseudos. I'll look at this. Thanks for reporting.
One problem is that IRA uses the RCX/RBX pair, instead of R8/R9, for TImode.
Since RBX is callee-saved, we have to save and restore it.
> By the way, your patch is wrong. There should be TARGET_64BIT in define_split
> instead of !TARGET_64BIT.
It has been fixed. Thanks.
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-17 19:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-17 19:16:40 UTC ---
Created attachment 25038
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25038
A patch
This patch generates:
movq %rdi, %rdx
mulx %rsi, %r10, %r9
addq $3, %r9
adcq $0, %r10
movq %r9, k2(%rip)
movq %r9, %rax
movq %r10, k2+8(%rip)
movq %r10, %rdx
ret
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-17 19:32 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-17 19:22:51 UTC ---
(In reply to comment #2)
> I guess something is wrong with hard register preferencing for multi-register
> pseudos in ira-color.c::ira_assign. I believe it works fine for one-register
> pseudos. I'll look at this. Thanks for reporting.
>
Does IRA prefer caller-saved registers over callee-saved registers?
For multi-register pseudos, one of the hard registers may be callee-saved.
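To make the question concrete, here is a minimal sketch in C of the cost
being discussed: how many callee-saved hard registers a multi-register pseudo
would newly force the function to save. It is not GCC's code. The register
numbering, the callee_saved table and the name count_newly_saved_regs are
assumptions made for illustration, and only a subset of the x86-64 registers
is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified register file: this is NOT GCC's hard
   register numbering.  Consecutive entries model the pairs that a
   two-register (TImode) value could occupy.  */
enum hard_reg { AX, DX, CX, BX, SI, DI, R8, R9, R10, R11, NUM_REGS };

/* Callee-saved registers within this subset under the System V
   x86-64 ABI: only RBX.  The others listed are call-clobbered.  */
static const bool callee_saved[NUM_REGS] = { [BX] = true };

/* Return how many registers in [first, first + nregs) are
   callee-saved and not already used by the function, i.e. how many
   extra save/restore pairs this location would cost.  */
static int
count_newly_saved_regs (int first, int nregs,
                        const bool already_used[NUM_REGS])
{
  assert (first >= 0 && nregs > 0 && first + nregs <= NUM_REGS);
  int n = 0;
  for (int i = first; i < first + nregs; i++)
    if (callee_saved[i] && !already_used[i])
      n++;
  return n;
}
```

With nothing used yet, count_newly_saved_regs (CX, 2, used) is 1 because the
pair covers RBX, while count_newly_saved_regs (R8, 2, used) is 0. The cost has
to be computed over every register of the pair, not only the first one.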
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: vmakarov at redhat dot com @ 2011-08-17 22:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #6 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-17 22:21:13 UTC ---
(In reply to comment #4)
> Created attachment 25038 [details]
> A patch
>
> This patch generates:
>
> movq %rdi, %rdx
> mulx %rsi, %r10, %r9
> addq $3, %r9
> adcq $0, %r10
> movq %r9, k2(%rip)
> movq %r9, %rax
> movq %r10, k2+8(%rip)
> movq %r10, %rdx
> ret
I don't think it is a good patch. Changing the register allocation order makes
the allocator prefer the new x86-64 registers, which results in longer insns
and bigger code for many programs.
I am working on a patch to fix it in IRA. I found a typo which is the reason
for this behaviour. I think it will be ready tomorrow.
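The code-size concern can be illustrated with the REX prefix rule. Below is a
small sketch, assuming the usual x86-64 hardware register encodings 0 to 15
(8 to 15 being r8 to r15) and a plain legacy-encoded instruction with two
register operands. The function name rex_prefix_bytes is made up for this
example, and VEX-encoded instructions such as MULX are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the number of REX prefix bytes (0 or 1) needed by a
   legacy-encoded instruction with two register operands.
   REG and RM are hardware encodings 0..15 (8..15 are r8..r15).
   OP64 is true for a 64-bit operand size.  Byte-register special
   cases and memory operands are ignored in this sketch.  */
static int
rex_prefix_bytes (bool op64, int reg, int rm)
{
  assert (reg >= 0 && reg < 16 && rm >= 0 && rm < 16);
  bool need_w = op64;      /* REX.W selects the 64-bit operand size.  */
  bool need_r = reg >= 8;  /* REX.R extends the ModRM reg field.  */
  bool need_b = rm >= 8;   /* REX.B extends the ModRM r/m field.  */
  return (need_w || need_r || need_b) ? 1 : 0;
}
```

So a 32-bit operation such as addl %ecx, %eax needs no prefix, while
addl %r8d, %eax needs one extra byte. For instructions like addq the prefix is
present either way, so the size penalty shows up mainly in 32-bit code
sequences.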
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-18 14:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-18 14:44:03 UTC ---
Another problem is
[hjl@gnu-6 pr50107]$ cat udi.i
extern unsigned long long k2;
unsigned long long test_mul_64 (unsigned long a, unsigned long b)
{
  k2 = (unsigned long long) a * b;
  k2 += 3;
  return k2;
}
[hjl@gnu-6 pr50107]$ make udi.s PIC=-m32
/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o udi.s -O2 -mbmi2 -m32
udi.i
[hjl@gnu-6 pr50107]$ cat udi.s
.file "udi.i"
.text
.p2align 4,,15
.globl test_mul_64
.type test_mul_64, @function
test_mul_64:
.LFB0:
.cfi_startproc
pushl %ebx
.cfi_def_cfa_offset 8
.cfi_offset 3, -8
movl 8(%esp), %edx
mulx 12(%esp), %ecx, %ebx
movl %ecx, %eax
movl %ebx, %edx
addl $3, %eax
adcl $0, %edx
movl %eax, k2
movl %edx, k2+4
popl %ebx
.cfi_restore 3
.cfi_def_cfa_offset 4
ret
.cfi_endproc
.LFE0:
.size test_mul_64, .-test_mul_64
.ident "GCC: (GNU) 4.7.0 20110817 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$
EDX is the input of MULX and is dead after MULX. IRA should allocate
EAX/EDX for the output of mulx.
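For readers unfamiliar with MULX, a rough C model of the 32-bit form may help
explain the expectation. This is a sketch of the instruction's documented
behaviour (one source read implicitly from EDX, two explicit destinations),
not compiler code. The names mulx32_model and mul_add3_model are invented
here.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 32-bit MULX: multiply the implicit EDX input by SRC and
   write the low and high halves of the 64-bit product to *LO and
   *HI.  Both inputs are read before either output is written, so
   *HI may be the same location that held the EDX input.  */
static void
mulx32_model (uint32_t edx_in, uint32_t src, uint32_t *lo, uint32_t *hi)
{
  uint64_t product = (uint64_t) edx_in * src;
  *lo = (uint32_t) product;
  *hi = (uint32_t) (product >> 32);
}

/* The test case's computation on top of the model: multiply, then
   add 3 to the 64-bit result (the addl/adcl pair in the listing).  */
static uint64_t
mul_add3_model (uint32_t a, uint32_t b)
{
  uint32_t eax, edx = a;              /* movl a, %edx */
  mulx32_model (edx, b, &eax, &edx);  /* mulx b, %eax, %edx */
  uint64_t k2 = ((uint64_t) edx << 32) | eax;
  return k2 + 3;
}
```

With the product delivered directly in EAX/EDX, the two movl copies and the
push/pop of EBX in the listing above would not be needed.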
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: vmakarov at gcc dot gnu.org @ 2011-08-18 15:03 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #8 from Vladimir Makarov <vmakarov at gcc dot gnu.org> 2011-08-18 14:56:46 UTC ---
Author: vmakarov
Date: Thu Aug 18 14:56:36 2011
New Revision: 177865
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177865
Log:
2011-08-17 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/50107
* ira-int.h (ira_hard_reg_not_in_set_p): Remove.
(ira_hard_reg_in_set_p): New.
* ira-color.c (calculate_saved_nregs): New.
(assign_hard_reg): Use it. Set up allocated_hard_reg_p for all
hard regs.
(allocno_reload_assign, fast_allocation): Use
ira_hard_reg_set_intersection_p instead of
ira_hard_reg_not_in_set_p.
* ira.c (setup_reg_renumber): Use
ira_hard_reg_set_intersection_p instead of
ira_hard_reg_not_in_set_p.
(setup_allocno_assignment_flags, calculate_allocation_cost): Use
ira_hard_reg_in_set_p instead of ira_hard_reg_not_in_set_p.
* ira-costs.c (ira_tune_allocno_costs): Use
ira_hard_reg_set_intersection_p instead of
ira_hard_reg_not_in_set_p.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira-color.c
trunk/gcc/ira-costs.c
trunk/gcc/ira-int.h
trunk/gcc/ira.c
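The ChangeLog only names the predicates. As a rough illustration of the
distinction those names suggest, here is a sketch using a plain bitmask as the
hard register set. The _sketch functions, their signatures and the bitmask
representation are assumptions for this example and are not the definitions
in ira-int.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i set means hard register i is in the set.  */
typedef uint64_t reg_set;

/* Mask covering the NREGS consecutive registers starting at FIRST.  */
static reg_set
span_mask (int first, int nregs)
{
  assert (first >= 0 && nregs > 0 && first + nregs <= 64);
  reg_set ones = nregs == 64 ? ~(reg_set) 0 : (((reg_set) 1 << nregs) - 1);
  return ones << first;
}

/* True if ANY register of the span is in SET.  */
static bool
set_intersection_p_sketch (int first, int nregs, reg_set set)
{
  return (span_mask (first, nregs) & set) != 0;
}

/* True if ALL registers of the span are in SET.  */
static bool
in_set_p_sketch (int first, int nregs, reg_set set)
{
  reg_set m = span_mask (first, nregs);
  return (m & set) == m;
}
```

For a one-register value the two predicates agree. They differ only for
multi-register values that straddle the set boundary, for example a pair whose
second register alone is in the set.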
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-18 15:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2011-08-18
Ever Confirmed|0 |1
--- Comment #9 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-18 15:23:59 UTC ---
With revision 177865 + MULX change, I got
[hjl@gnu-6 pr50107]$ cat uti-2.i
unsigned __int128 test_mul_64 (unsigned long long a, unsigned long long b)
{
  return (unsigned __int128) a * b;
}
[hjl@gnu-6 pr50107]$ make uti-2.s
/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o uti-2.s -O2 -mbmi2
uti-2.i
[hjl@gnu-6 pr50107]$ cat uti-2.s
.file "uti-2.i"
.text
.p2align 4,,15
.globl test_mul_64
.type test_mul_64, @function
test_mul_64:
.LFB0:
.cfi_startproc
movq %rdi, %rdx
mulx %rsi, %rax, %rsi
movq %rsi, %rdx
ret
.cfi_endproc
.LFE0:
.size test_mul_64, .-test_mul_64
.ident "GCC: (GNU) 4.7.0 20110818 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$
I would expect
movq %rdi, %rdx
mulx %rsi, %rax, %rdx
ret
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: vmakarov at redhat dot com @ 2011-08-18 18:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #10 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-18 18:24:42 UTC ---
(In reply to comment #9)
> With revision 177865 + MULX change, I got
>
> [hjl@gnu-6 pr50107]$ cat uti-2.i
> unsigned __int128 test_mul_64 (unsigned long long a, unsigned long long b)
> {
>   return (unsigned __int128) a * b;
> }
> [hjl@gnu-6 pr50107]$ make uti-2.s
> /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o uti-2.s -O2 -mbmi2
> uti-2.i
> [hjl@gnu-6 pr50107]$ cat uti-2.s
> .file "uti-2.i"
> .text
> .p2align 4,,15
> .globl test_mul_64
> .type test_mul_64, @function
> test_mul_64:
> .LFB0:
> .cfi_startproc
> movq %rdi, %rdx
> mulx %rsi, %rax, %rsi
> movq %rsi, %rdx
> ret
> .cfi_endproc
> .LFE0:
> .size test_mul_64, .-test_mul_64
> .ident "GCC: (GNU) 4.7.0 20110818 (experimental)"
> .section .note.GNU-stack,"",@progbits
> [hjl@gnu-6 pr50107]$
>
> I would expect
>
> movq %rdi, %rdx
> mulx %rsi, %rax, %rdx
> ret
I think it is a reload problem. IRA assigns dx to pseudo 71 (an insn output)
but reload then spills it.
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-18 19:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-18 18:31:37 UTC ---
(In reply to comment #10)
> > movq %rdi, %rdx
> > mulx %rsi, %rax, %rsi
> > movq %rsi, %rdx
> > ret
> > .cfi_endproc
> > .LFE0:
> > .size test_mul_64, .-test_mul_64
> > .ident "GCC: (GNU) 4.7.0 20110818 (experimental)"
> > .section .note.GNU-stack,"",@progbits
> > [hjl@gnu-6 pr50107]$
> >
> > I would expect
> >
> > movq %rdi, %rdx
> > mulx %rsi, %rax, %rdx
> > ret
>
> I think it is a reload problem. IRA assigns dx to pseudo 71 (an insn output)
> but reload then spills it.
uti-2.i.188r.asmcons has
(insn 11 4 24 2 (parallel [
(set (reg:DI 72)
(mult:DI (reg/v:DI 64 [ b ])
(reg/v:DI 63 [ a ])))
(set (reg:DI 73 [+8 ])
(truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 64 [ b ]))
(zero_extend:TI (reg/v:DI 63 [ a ])))
(const_int 64 [0x40]))))
]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
(expr_list:REG_DEAD (reg/v:DI 64 [ b ])
(expr_list:REG_DEAD (reg/v:DI 63 [ a ])
(nil))))
uti-2.i.191r.ira generates:
(insn 11 28 25 2 (parallel [
(set (reg:DI 0 ax [72])
(mult:DI (reg/v:DI 4 si [orig:64 b ] [64])
(reg:DI 1 dx)))
(set (reg:DI 4 si [orig:73+8 ] [73])
(truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 4 si [orig:64 b ] [64]))
(zero_extend:TI (reg:DI 1 dx)))
(const_int 64 [0x40]))))
]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
(nil))
Why does IRA/reload choose SI for pseudo 73?
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-19 6:05 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-19 01:12:56 UTC ---
I changed MULX to
(define_insn "bmi2_umul<mode><dwi>3_1"
[(set (match_operand:<DWI> 0 "register_operand" "=r")
(mult:<DWI>
(zero_extend:<DWI>
(match_operand:DWIH 1 "register_operand" "d"))
(zero_extend:<DWI>
(match_operand:DWIH 2 "nonimmediate_operand" "rm"))))]
"TARGET_BMI2"
{
if (<MODE>mode == DImode)
return "mulx\t{%2, %q0, %N0|%N0, %q0, %2}";
else
return "mulx\t{%2, %k0, %K0|%K0, %k0, %2}";
}
[(set_attr "type" "imul")
(set_attr "prefix" "vex")
(set_attr "mode" "<MODE>")])
Now I got
[hjl@gnu-6 pr50107]$ cat udi-2.i
unsigned long long test_mul_64 (unsigned long a, unsigned long b)
{
  return (unsigned long long) a * b;
}
[hjl@gnu-6 pr50107]$ /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -O2 -mbmi2 -dp -m32
udi-2.i
[hjl@gnu-6 pr50107]$ cat udi-2.s
.file "udi-2.i"
.text
.p2align 4,,15
.globl test_mul_64
.type test_mul_64, @function
test_mul_64:
.LFB0:
.cfi_startproc
movl 8(%esp), %edx # 20 *movsi_internal/1 [length = 4]
mulx 4(%esp), %eax, %edx # 9 bmi2_umulsidi3_1 [length = 7]
ret # 25 return_internal [length = 1]
.cfi_endproc
.LFE0:
.size test_mul_64, .-test_mul_64
.ident "GCC: (GNU) 4.7.0 20110818 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$ cat uti-2.i
unsigned __int128 test_mul_64 (unsigned long long a, unsigned long long b)
{
  return (unsigned __int128) a * b;
}
[hjl@gnu-6 pr50107]$ /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -O2 -mbmi2 -dp uti-2.i
[hjl@gnu-6 pr50107]$ cat uti-2.s
.file "uti-2.i"
.text
.p2align 4,,15
.globl test_mul_64
.type test_mul_64, @function
test_mul_64:
.LFB0:
.cfi_startproc
movq %rsi, %rdx # 24 *movdi_internal_rex64/2 [length = 3]
mulx %rdi, %rsi, %rdi # 11 bmi2_umulditi3_1 [length = 5]
movq %rsi, %rax # 25 *movdi_internal_rex64/2 [length = 3]
movq %rdi, %rdx # 26 *movdi_internal_rex64/2 [length = 3]
ret # 29 return_internal [length = 1]
.cfi_endproc
.LFE0:
.size test_mul_64, .-test_mul_64
.ident "GCC: (GNU) 4.7.0 20110818 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$
Why don't we generate
mulx %rdi, %rax, %rdx
for 64bit?
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: hjl.tools at gmail dot com @ 2011-08-19 16:14 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-19 16:05:58 UTC ---
We start with
(insn 11 4 16 2 (set (reg:TI 65)
(mult:TI (zero_extend:TI (reg/v:DI 64 [ b ]))
(zero_extend:TI (reg/v:DI 63 [ a ])))) uti-2.i:3 339
{bmi2_umulditi3_1}
(expr_list:REG_DEAD (reg/v:DI 64 [ b ])
(expr_list:REG_DEAD (reg/v:DI 63 [ a ])
(nil))))
(insn 16 11 19 2 (set (reg/i:TI 0 ax)
(reg:TI 65)) uti-2.i:4 60 {*movti_internal_rex64}
(expr_list:REG_DEAD (reg:TI 65)
(nil)))
and IRA generates:
(insn 24 4 11 2 (set (reg:DI 1 dx)
(reg/v:DI 4 si [orig:64 b ] [64])) uti-2.i:3 62 {*movdi_internal_rex64}
(nil))
(insn 11 24 16 2 (set (reg:TI 4 si [65])
(mult:TI (zero_extend:TI (reg:DI 1 dx))
(zero_extend:TI (reg/v:DI 5 di [orig:63 a ] [63])))) uti-2.i:3 339
{bmi2_umulditi3_1}
(nil))
(insn 16 11 19 2 (set (reg/i:TI 0 ax)
(reg:TI 4 si [65])) uti-2.i:4 60 {*movti_internal_rex64}
(nil))
(insn 19 16 22 2 (use (reg/i:TI 0 ax)) uti-2.i:4 -1
(nil))
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: vmakarov at redhat dot com @ 2011-08-19 16:24 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
--- Comment #14 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-19 16:12:48 UTC ---
(In reply to comment #11)
> (In reply to comment #10)
> > > movq %rdi, %rdx
> > > mulx %rsi, %rax, %rsi
> > > movq %rsi, %rdx
> > > ret
> > > .cfi_endproc
> > > .LFE0:
> > > .size test_mul_64, .-test_mul_64
> > > .ident "GCC: (GNU) 4.7.0 20110818 (experimental)"
> > > .section .note.GNU-stack,"",@progbits
> > > [hjl@gnu-6 pr50107]$
> > >
> > > I would expect
> > >
> > > movq %rdi, %rdx
> > > mulx %rsi, %rax, %rdx
> > > ret
> >
> > I think it is a reload problem. IRA assigns dx to pseudo 71 (an insn output)
> > but reload then spills it.
>
> uti-2.i.188r.asmcons has
>
> (insn 11 4 24 2 (parallel [
> (set (reg:DI 72)
> (mult:DI (reg/v:DI 64 [ b ])
> (reg/v:DI 63 [ a ])))
> (set (reg:DI 73 [+8 ])
> (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 64 [ b ]))
> (zero_extend:TI (reg/v:DI 63 [ a ])))
> (const_int 64 [0x40]))))
> ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
> (expr_list:REG_DEAD (reg/v:DI 64 [ b ])
> (expr_list:REG_DEAD (reg/v:DI 63 [ a ])
> (nil))))
>
> uti-2.i.191r.ira generates:
>
> (insn 11 28 25 2 (parallel [
> (set (reg:DI 0 ax [72])
> (mult:DI (reg/v:DI 4 si [orig:64 b ] [64])
> (reg:DI 1 dx)))
> (set (reg:DI 4 si [orig:73+8 ] [73])
> (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 4 si [orig:64 b ] [64]))
> (zero_extend:TI (reg:DI 1 dx)))
> (const_int 64 [0x40]))))
> ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
> (nil))
>
> Why does IRA/reload choose SI for pseudo 73?
IRA assigns dx to pseudo 73. Then the reload pass needs dx for pseudo 63, so
it spills pseudo 73 and reassigns it, this time to si. Reload spills pseudo 73
because it believes that any pseudo that lives through the insn, dies in it,
or is set in it (pseudo 73 is set) conflicts with the necessary reload.
Of course it is not really necessary to spill pseudo 73, but teaching the
reload pass that is a big, error-prone project. I would not recommend starting
it. I myself am not interested in working on the reload pass. Instead I prefer
to work on LRA (local RA), which is a reload pass replacement.
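The difference between what reload assumes and what would be sufficient can be
sketched as two predicates. This is only a model of the reasoning given above,
not reload's actual logic. The enum, the function names and the treatment of
earlyclobber outputs are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* How a pseudo relates to the insn that needs an input reload.  */
enum pseudo_use
{
  UNRELATED,      /* not live across, read or written by the insn */
  LIVES_THROUGH,  /* live both before and after the insn */
  DIES_IN,        /* last read by the insn */
  SET_IN          /* written by the insn */
};

/* Conservative rule described above: any pseudo that lives through,
   dies in, or is set in the insn conflicts with the reload.  */
static bool
conflicts_conservative (enum pseudo_use use)
{
  return use != UNRELATED;
}

/* A more precise rule (an assumption, not existing code): an output
   that is not earlyclobber is written only after the inputs have
   been read, so it could share the input reload's hard register.  */
static bool
conflicts_precise (enum pseudo_use use, bool earlyclobber)
{
  if (use == SET_IN)
    return earlyclobber;
  return use != UNRELATED;
}
```

Under the conservative rule pseudo 73, which is set in the insn, loses dx to
the reload of pseudo 63. Under the more precise rule it would keep dx.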
* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
From: pinskia at gcc dot gnu.org @ 2021-12-26 22:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.8.0
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
All of the register allocation issues referenced in this bug report were fixed
in GCC 4.8 and above as far as I can test.