public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way
@ 2011-08-17 13:42 kirill.yukhin at intel dot com
  2011-08-17 13:53 ` [Bug rtl-optimization/50107] " kirill.yukhin at intel dot com
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: kirill.yukhin at intel dot com @ 2011-08-17 13:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

             Bug #: 50107
           Summary: [IRA, i386] allocates regiters in very non-optimal way
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: kirill.yukhin@intel.com


Created attachment 25032
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25032
Patch, enabling MULX insn

Hi,
I am working on enabling the new MULX instruction in GCC.
It relaxes the generic unsigned multiply in two ways: no flags are clobbered, and
(the main one) the destination may be any two GPRs.

Patch is attached along with testcase.

The problem is that this relaxation leads to useless spills/fills.
The command line is:
./build-x86_64-linux/gcc/xgcc -B./build-x86_64-linux/gcc test.c -S -Ofast
Here is assembly with MULX:
test_mul_64:
.LFB0:
        movq    %rdi, %rdx
        pushq   %rbx              <--------
        mulx    %rsi, %rbx, %rcx
        addq    $3, %rcx
        adcq    $0, %rbx
        movq    %rcx, %rax
        movq    %rcx, k2(%rip)
        movq    %rbx, %rdx        <--------
        movq    %rbx, k2+8(%rip)
        popq    %rbx              <--------
        ret

You can see that if we replace the rbx usage with rdx, the instructions marked
with arrows will disappear.

Maybe the problem is connected with my definition of MULX?
But it looks to me like an IRA misoptimization.

BTW, r8, r9, etc. are caller-saved, so we could just use them without saving
anything to the stack. Why doesn't IRA do that?

Thanks, K
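
For readers unfamiliar with MULX, here is a minimal C sketch of the value it
computes: the full 128-bit unsigned product of two 64-bit inputs, delivered as
separate low and high halves. The names u128_parts and widening_mul_u64 are
illustrative only and are not taken from the attached patch or testcase;
unsigned __int128 is a GCC extension available on 64-bit targets.

```c
#include <stdint.h>

/* The two 64-bit halves of a 128-bit product (hypothetical helper type).  */
typedef struct {
    uint64_t lo;  /* bits 0..63 of the product */
    uint64_t hi;  /* bits 64..127 of the product */
} u128_parts;

/* Widening unsigned multiply: the same value MULX produces, written
   portably so the compiler is free to choose the instruction.  */
static u128_parts widening_mul_u64(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    u128_parts r = { (uint64_t)p, (uint64_t)(p >> 64) };
    return r;
}
```

Because the two halves are independent results, nothing in the computation
itself ties them to a particular register pair; that freedom is what the
relaxed pattern is meant to expose to the allocator.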



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
@ 2011-08-17 13:53 ` kirill.yukhin at intel dot com
  2011-08-17 17:19 ` vmakarov at redhat dot com
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: kirill.yukhin at intel dot com @ 2011-08-17 13:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #1 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-08-17 13:41:58 UTC ---
Created attachment 25033
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25033
Testcase



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
  2011-08-17 13:53 ` [Bug rtl-optimization/50107] " kirill.yukhin at intel dot com
@ 2011-08-17 17:19 ` vmakarov at redhat dot com
  2011-08-17 18:58 ` hjl.tools at gmail dot com
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: vmakarov at redhat dot com @ 2011-08-17 17:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #2 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-17 17:16:11 UTC ---
I guess something is wrong with hard register preferencing for multi-register
pseudos in ira-color.c::ira_assign.  I believe it works fine for one-register
pseudos.  I'll look at this.  Thanks for reporting.

By the way, your patch is wrong.  There should be TARGET_64BIT in define_split
instead of !TARGET_64BIT.



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
  2011-08-17 13:53 ` [Bug rtl-optimization/50107] " kirill.yukhin at intel dot com
  2011-08-17 17:19 ` vmakarov at redhat dot com
@ 2011-08-17 18:58 ` hjl.tools at gmail dot com
  2011-08-17 19:18 ` hjl.tools at gmail dot com
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-17 18:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-17 18:43:41 UTC ---
(In reply to comment #2)
> I guess something is wrong with hard register preferencing for multi-register
> pseudos in ira-color.c::ira_assign.  I believe it works fine for one-register
> pseudos.  I'll look at this.  Thanks for reporting.

One problem is that IRA uses the RCX/RBX pair, instead of R8/R9, for TImode.
Since RBX is callee-saved, we have to save and restore it.

> By the way, your patch is wrong.  There should be TARGET_64BIT in define_split
> instead of !TARGET_64BIT.

It has been fixed.  Thanks.



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (2 preceding siblings ...)
  2011-08-17 18:58 ` hjl.tools at gmail dot com
@ 2011-08-17 19:18 ` hjl.tools at gmail dot com
  2011-08-17 19:32 ` hjl.tools at gmail dot com
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-17 19:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-17 19:16:40 UTC ---
Created attachment 25038
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25038
A patch

This patch generates:

    movq    %rdi, %rdx
    mulx    %rsi, %r10, %r9
    addq    $3, %r9
    adcq    $0, %r10
    movq    %r9, k2(%rip)
    movq    %r9, %rax
    movq    %r10, k2+8(%rip)
    movq    %r10, %rdx
    ret



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (3 preceding siblings ...)
  2011-08-17 19:18 ` hjl.tools at gmail dot com
@ 2011-08-17 19:32 ` hjl.tools at gmail dot com
  2011-08-17 22:29 ` vmakarov at redhat dot com
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-17 19:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-17 19:22:51 UTC ---
(In reply to comment #2)
> I guess something is wrong with hard register preferencing for multi-register
> pseudos in ira-color.c::ira_assign.  I believe it works fine for one-register
> pseudos.  I'll look at this.  Thanks for reporting.
> 

Does IRA prefer caller-saved registers over callee-saved registers?
For multi-register pseudos, one of the hard registers may be callee-saved.



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (4 preceding siblings ...)
  2011-08-17 19:32 ` hjl.tools at gmail dot com
@ 2011-08-17 22:29 ` vmakarov at redhat dot com
  2011-08-18 14:55 ` hjl.tools at gmail dot com
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: vmakarov at redhat dot com @ 2011-08-17 22:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #6 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-17 22:21:13 UTC ---
(In reply to comment #4)
> Created attachment 25038 [details]
> A patch
> 
> This patch generates:
> 
>     movq    %rdi, %rdx
>     mulx    %rsi, %r10, %r9
>     addq    $3, %r9
>     adcq    $0, %r10
>     movq    %r9, k2(%rip)
>     movq    %r9, %rax
>     movq    %r10, k2+8(%rip)
>     movq    %r10, %rdx
>     ret

I don't think it is a good patch (changing register allocation order) because
it prefers new x86-64 registers and results in longer insns and bigger code for
many programs.

I am working on a patch to fix it in IRA.  I found a typo which is a reason for
such behaviour.  I think it will be ready tomorrow.
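
The code-size concern can be illustrated with a small sketch. On x86-64, r8 to
r15 are reachable only through a REX prefix byte, so a 32-bit operation that
would otherwise need no prefix grows by one byte; a 64-bit operation already
carries a REX prefix, so it does not grow. The helper mov_rr_length below is a
hypothetical illustration covering only register-to-register MOV (opcode plus
ModRM byte); it is not GCC code.

```c
/* Hardware register numbers: 0-7 are the legacy registers (eax..edi),
   8-15 are r8..r15 and need a REX prefix to be encoded.  */

/* Length in bytes of a register-to-register MOV: opcode + ModRM,
   plus one REX byte when the operand size is 64 bits or when either
   register number is 8 or higher.  */
static int mov_rr_length(int src, int dst, int is_64bit)
{
    int needs_rex = is_64bit || src >= 8 || dst >= 8;
    return 2 + (needs_rex ? 1 : 0);
}
```

Under this model "movl %eax, %ecx" takes 2 bytes while "movl %r8d, %r9d" takes
3, whereas "movq %rax, %rcx" and "movq %r8, %r9" both take 3. The penalty
therefore falls on 32-bit (and narrower) code, which is common in many
programs.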



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (5 preceding siblings ...)
  2011-08-17 22:29 ` vmakarov at redhat dot com
@ 2011-08-18 14:55 ` hjl.tools at gmail dot com
  2011-08-18 15:03 ` vmakarov at gcc dot gnu.org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-18 14:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-18 14:44:03 UTC ---
Another problem is

[hjl@gnu-6 pr50107]$ cat udi.i 
extern unsigned long long k2;

unsigned long long test_mul_64 (unsigned long a, unsigned long b)
{
  k2 = (unsigned long long) a * b;
  k2+=3;
  return k2;
}
[hjl@gnu-6 pr50107]$ make udi.s PIC=-m32
/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o udi.s -O2 -mbmi2 -m32
udi.i
[hjl@gnu-6 pr50107]$ cat udi.s
    .file    "udi.i"
    .text
    .p2align 4,,15
    .globl    test_mul_64
    .type    test_mul_64, @function
test_mul_64:
.LFB0:
    .cfi_startproc
    pushl    %ebx
    .cfi_def_cfa_offset 8
    .cfi_offset 3, -8
    movl    8(%esp), %edx
    mulx    12(%esp), %ecx, %ebx
    movl    %ecx, %eax
    movl    %ebx, %edx
    addl    $3, %eax
    adcl    $0, %edx
    movl    %eax, k2
    movl    %edx, k2+4
    popl    %ebx
    .cfi_restore 3
    .cfi_def_cfa_offset 4
    ret
    .cfi_endproc
.LFE0:
    .size    test_mul_64, .-test_mul_64
    .ident    "GCC: (GNU) 4.7.0 20110817 (experimental)"
    .section    .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$ 

EDX is the input of MULX and is dead after MULX.  IRA should allocate
EAX/EDX for the output of mulx.



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (6 preceding siblings ...)
  2011-08-18 14:55 ` hjl.tools at gmail dot com
@ 2011-08-18 15:03 ` vmakarov at gcc dot gnu.org
  2011-08-18 15:31 ` hjl.tools at gmail dot com
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2011-08-18 15:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #8 from Vladimir Makarov <vmakarov at gcc dot gnu.org> 2011-08-18 14:56:46 UTC ---
Author: vmakarov
Date: Thu Aug 18 14:56:36 2011
New Revision: 177865

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177865
Log:
2011-08-17  Vladimir Makarov  <vmakarov@redhat.com>

    PR rtl-optimization/50107
    * ira-int.h (ira_hard_reg_not_in_set_p): Remove.
    (ira_hard_reg_in_set_p): New.

    * ira-color.c (calculate_saved_nregs): New.
    (assign_hard_reg): Use it.  Set up allocated_hard_reg_p for all
    hard regs.
    (allocno_reload_assign, fast_allocation): Use
    ira_hard_reg_set_intersection_p instead of
    ira_hard_reg_not_in_set_p.

    * ira.c (setup_reg_renumber): Use
    ira_hard_reg_set_intersection_p instead of
    ira_hard_reg_not_in_set_p.
    (setup_allocno_assignment_flags, calculate_allocation_cost): Use
    ira_hard_reg_in_set_p instead of ira_hard_reg_not_in_set_p.

    * ira-costs.c (ira_tune_allocno_costs): Use
    ira_hard_reg_set_intersection_p instead of
    ira_hard_reg_not_in_set_p.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-color.c
    trunk/gcc/ira-costs.c
    trunk/gcc/ira-int.h
    trunk/gcc/ira.c



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (7 preceding siblings ...)
  2011-08-18 15:03 ` vmakarov at gcc dot gnu.org
@ 2011-08-18 15:31 ` hjl.tools at gmail dot com
  2011-08-18 18:29 ` vmakarov at redhat dot com
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-18 15:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-08-18
     Ever Confirmed|0                           |1

--- Comment #9 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-18 15:23:59 UTC ---
With revision 177865 + MULX change, I got

[hjl@gnu-6 pr50107]$ cat uti-2.i
unsigned __int128 test_mul_64 (unsigned long long a, unsigned long long b)
{
  return (unsigned __int128) a*b;
}
[hjl@gnu-6 pr50107]$ make uti-2.s
/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o uti-2.s -O2 -mbmi2 
uti-2.i
[hjl@gnu-6 pr50107]$ cat uti-2.s
    .file    "uti-2.i"
    .text
    .p2align 4,,15
    .globl    test_mul_64
    .type    test_mul_64, @function
test_mul_64:
.LFB0:
    .cfi_startproc
    movq    %rdi, %rdx
    mulx    %rsi, %rax, %rsi
    movq    %rsi, %rdx
    ret
    .cfi_endproc
.LFE0:
    .size    test_mul_64, .-test_mul_64
    .ident    "GCC: (GNU) 4.7.0 20110818 (experimental)"
    .section    .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$ 

I would expect

    movq    %rdi, %rdx
    mulx    %rsi, %rax, %rdx
    ret



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (8 preceding siblings ...)
  2011-08-18 15:31 ` hjl.tools at gmail dot com
@ 2011-08-18 18:29 ` vmakarov at redhat dot com
  2011-08-18 19:02 ` hjl.tools at gmail dot com
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: vmakarov at redhat dot com @ 2011-08-18 18:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #10 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-18 18:24:42 UTC ---
(In reply to comment #9)
> With revision 177865 + MULX change, I got
> 
> [hjl@gnu-6 pr50107]$ cat uti-2.i
> unsigned __int128 test_mul_64 (unsigned long long a, unsigned long long b)
> {
>   return (unsigned __int128) a*b;
> }
> [hjl@gnu-6 pr50107]$ make uti-2.s
> /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o uti-2.s -O2 -mbmi2 
> uti-2.i
> [hjl@gnu-6 pr50107]$ cat uti-2.s
>     .file    "uti-2.i"
>     .text
>     .p2align 4,,15
>     .globl    test_mul_64
>     .type    test_mul_64, @function
> test_mul_64:
> .LFB0:
>     .cfi_startproc
>     movq    %rdi, %rdx
>     mulx    %rsi, %rax, %rsi
>     movq    %rsi, %rdx
>     ret
>     .cfi_endproc
> .LFE0:
>     .size    test_mul_64, .-test_mul_64
>     .ident    "GCC: (GNU) 4.7.0 20110818 (experimental)"
>     .section    .note.GNU-stack,"",@progbits
> [hjl@gnu-6 pr50107]$ 
> 
> I would expect
> 
>     movq    %rdi, %rdx
>     mulx    %rsi, %rax, %rdx
>     ret

I think it is a reload problem.  IRA assigns dx to pseudo 71 (an insn output)
but reload then spills it.



* [Bug rtl-optimization/50107] [IRA, i386] allocates regiters in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (9 preceding siblings ...)
  2011-08-18 18:29 ` vmakarov at redhat dot com
@ 2011-08-18 19:02 ` hjl.tools at gmail dot com
  2011-08-19  6:05 ` [Bug rtl-optimization/50107] [IRA, i386] allocates registers " hjl.tools at gmail dot com
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-18 19:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-18 18:31:37 UTC ---
(In reply to comment #10)
> >     movq    %rdi, %rdx
> >     mulx    %rsi, %rax, %rsi
> >     movq    %rsi, %rdx
> >     ret
> >     .cfi_endproc
> > .LFE0:
> >     .size    test_mul_64, .-test_mul_64
> >     .ident    "GCC: (GNU) 4.7.0 20110818 (experimental)"
> >     .section    .note.GNU-stack,"",@progbits
> > [hjl@gnu-6 pr50107]$ 
> > 
> > I would expect
> > 
> >     movq    %rdi, %rdx
> >     mulx    %rsi, %rax, %rdx
> >     ret
> 
> I think it is a reload problem.  IRA assigns dx to pseudo 71 (an insn output)
> but reload then spills it.

uti-2.i.188r.asmcons has

(insn 11 4 24 2 (parallel [
            (set (reg:DI 72)
                (mult:DI (reg/v:DI 64 [ b ])
                    (reg/v:DI 63 [ a ])))
            (set (reg:DI 73 [+8 ])
                (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 64 [ b ]))
                            (zero_extend:TI (reg/v:DI 63 [ a ])))
                        (const_int 64 [0x40]))))
        ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
     (expr_list:REG_DEAD (reg/v:DI 64 [ b ])
        (expr_list:REG_DEAD (reg/v:DI 63 [ a ])
            (nil))))

uti-2.i.191r.ira generates:

(insn 11 28 25 2 (parallel [
            (set (reg:DI 0 ax [72])
                (mult:DI (reg/v:DI 4 si [orig:64 b ] [64])
                    (reg:DI 1 dx)))
            (set (reg:DI 4 si [orig:73+8 ] [73])
                (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 4 si [orig:64 b ] [64]))
                            (zero_extend:TI (reg:DI 1 dx)))
                        (const_int 64 [0x40]))))
        ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
     (nil))

Why does IRA/reload choose SI for pseudo 73?



* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (10 preceding siblings ...)
  2011-08-18 19:02 ` hjl.tools at gmail dot com
@ 2011-08-19  6:05 ` hjl.tools at gmail dot com
  2011-08-19 16:14 ` hjl.tools at gmail dot com
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-19  6:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-19 01:12:56 UTC ---
I changed MULX to

(define_insn "bmi2_umul<mode><dwi>3_1"
  [(set (match_operand:<DWI> 0 "register_operand" "=r") 
        (mult:<DWI>
          (zero_extend:<DWI>
            (match_operand:DWIH 1 "register_operand" "d")) 
          (zero_extend:<DWI>
            (match_operand:DWIH 2 "nonimmediate_operand" "rm"))))]
  "TARGET_BMI2"
{
  if (<MODE>mode == DImode)
    return "mulx\t{%2, %q0, %N0|%N0, %q0, %2}";
  else
    return "mulx\t{%2, %k0, %K0|%K0, %k0, %2}";
}
  [(set_attr "type" "imul")
   (set_attr "prefix" "vex")
   (set_attr "mode" "<MODE>")])

Now I got

[hjl@gnu-6 pr50107]$ cat udi-2.i
unsigned long long test_mul_64 (unsigned long a, unsigned long b)
{
  return (unsigned long long) a * b;
}
[hjl@gnu-6 pr50107]$ /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S  -O2 -mbmi2 -dp -m32
udi-2.i
[hjl@gnu-6 pr50107]$ cat udi-2.s
    .file    "udi-2.i"
    .text
    .p2align 4,,15
    .globl    test_mul_64
    .type    test_mul_64, @function
test_mul_64:
.LFB0:
    .cfi_startproc
    movl    8(%esp), %edx    # 20    *movsi_internal/1    [length = 4]
    mulx    4(%esp), %eax, %edx    # 9    bmi2_umulsidi3_1    [length = 7]
    ret    # 25    return_internal    [length = 1]
    .cfi_endproc
.LFE0:
    .size    test_mul_64, .-test_mul_64
    .ident    "GCC: (GNU) 4.7.0 20110818 (experimental)"
    .section    .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$ cat uti-2.i
unsigned __int128 test_mul_64 (unsigned long long a, unsigned long long b)
{
  return (unsigned __int128) a*b;
}
[hjl@gnu-6 pr50107]$ /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S  -O2 -mbmi2 -dp uti-2.i
[hjl@gnu-6 pr50107]$ cat uti-2.s
    .file    "uti-2.i"
    .text
    .p2align 4,,15
    .globl    test_mul_64
    .type    test_mul_64, @function
test_mul_64:
.LFB0:
    .cfi_startproc
    movq    %rsi, %rdx    # 24    *movdi_internal_rex64/2    [length = 3]
    mulx    %rdi, %rsi, %rdi    # 11    bmi2_umulditi3_1    [length = 5]
    movq    %rsi, %rax    # 25    *movdi_internal_rex64/2    [length = 3]
    movq    %rdi, %rdx    # 26    *movdi_internal_rex64/2    [length = 3]
    ret    # 29    return_internal    [length = 1]
    .cfi_endproc
.LFE0:
    .size    test_mul_64, .-test_mul_64
    .ident    "GCC: (GNU) 4.7.0 20110818 (experimental)"
    .section    .note.GNU-stack,"",@progbits
[hjl@gnu-6 pr50107]$ 

Why don't we generate

mulx    %rdi, %rax, %rdx

for 64bit?



* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (11 preceding siblings ...)
  2011-08-19  6:05 ` [Bug rtl-optimization/50107] [IRA, i386] allocates registers " hjl.tools at gmail dot com
@ 2011-08-19 16:14 ` hjl.tools at gmail dot com
  2011-08-19 16:24 ` vmakarov at redhat dot com
  2021-12-26 22:18 ` pinskia at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2011-08-19 16:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> 2011-08-19 16:05:58 UTC ---
We start with

(insn 11 4 16 2 (set (reg:TI 65) 
        (mult:TI (zero_extend:TI (reg/v:DI 64 [ b ])) 
            (zero_extend:TI (reg/v:DI 63 [ a ])))) uti-2.i:3 339 {bmi2_umulditi3_1}
     (expr_list:REG_DEAD (reg/v:DI 64 [ b ])
        (expr_list:REG_DEAD (reg/v:DI 63 [ a ])
            (nil))))

(insn 16 11 19 2 (set (reg/i:TI 0 ax) 
        (reg:TI 65)) uti-2.i:4 60 {*movti_internal_rex64}
     (expr_list:REG_DEAD (reg:TI 65) 
        (nil)))

and IRA generates:

(insn 24 4 11 2 (set (reg:DI 1 dx)
        (reg/v:DI 4 si [orig:64 b ] [64])) uti-2.i:3 62 {*movdi_internal_rex64}
     (nil))

(insn 11 24 16 2 (set (reg:TI 4 si [65])
        (mult:TI (zero_extend:TI (reg:DI 1 dx))
            (zero_extend:TI (reg/v:DI 5 di [orig:63 a ] [63])))) uti-2.i:3 339 {bmi2_umulditi3_1}
     (nil))

(insn 16 11 19 2 (set (reg/i:TI 0 ax) 
        (reg:TI 4 si [65])) uti-2.i:4 60 {*movti_internal_rex64}
     (nil))

(insn 19 16 22 2 (use (reg/i:TI 0 ax)) uti-2.i:4 -1
     (nil))



* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (12 preceding siblings ...)
  2011-08-19 16:14 ` hjl.tools at gmail dot com
@ 2011-08-19 16:24 ` vmakarov at redhat dot com
  2021-12-26 22:18 ` pinskia at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: vmakarov at redhat dot com @ 2011-08-19 16:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #14 from Vladimir Makarov <vmakarov at redhat dot com> 2011-08-19 16:12:48 UTC ---
(In reply to comment #11)
> (In reply to comment #10)
> > >     movq    %rdi, %rdx
> > >     mulx    %rsi, %rax, %rsi
> > >     movq    %rsi, %rdx
> > >     ret
> > >     .cfi_endproc
> > > .LFE0:
> > >     .size    test_mul_64, .-test_mul_64
> > >     .ident    "GCC: (GNU) 4.7.0 20110818 (experimental)"
> > >     .section    .note.GNU-stack,"",@progbits
> > > [hjl@gnu-6 pr50107]$ 
> > > 
> > > I would expect
> > > 
> > >     movq    %rdi, %rdx
> > >     mulx    %rsi, %rax, %rdx
> > >     ret
> > 
> > I think it is a reload problem.  IRA assigns dx to pseudo 71 (an insn output)
> > but reload then spills it.
> 
> uti-2.i.188r.asmcons has
> 
> (insn 11 4 24 2 (parallel [
>             (set (reg:DI 72)
>                 (mult:DI (reg/v:DI 64 [ b ])
>                     (reg/v:DI 63 [ a ])))
>             (set (reg:DI 73 [+8 ])
>                 (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 64 [ b ]))
>                             (zero_extend:TI (reg/v:DI 63 [ a ])))
>                         (const_int 64 [0x40]))))
>         ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
>      (expr_list:REG_DEAD (reg/v:DI 64 [ b ])
>         (expr_list:REG_DEAD (reg/v:DI 63 [ a ])
>             (nil))))
> 
> uti-2.i.191r.ira generates:
> 
> (insn 11 28 25 2 (parallel [
>             (set (reg:DI 0 ax [72])
>                 (mult:DI (reg/v:DI 4 si [orig:64 b ] [64])
>                     (reg:DI 1 dx)))
>             (set (reg:DI 4 si [orig:73+8 ] [73])
>                 (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 4 si [orig:64 b ] [64]))
>                             (zero_extend:TI (reg:DI 1 dx)))
>                         (const_int 64 [0x40]))))
>         ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
>      (nil))
> 
> Why does IRA/reload choose SI for pseudo 73?

IRA assigns dx to pseudo 73.  Then the reload pass needs dx for pseudo 63, so
it spills 73 and reassigns it to si.  The reload pass spills pseudo 73 because
it assumes that any pseudo that lives through the insn, dies in it, or is set
in it (pseudo 73 is set) conflicts with the necessary reload.

Of course it is not really necessary to spill pseudo 73, but teaching the
reload pass that is a big, error-prone project.  I would not recommend
starting it.

I myself am not interested in working on the reload pass.  Instead I prefer to
work on LRA (local RA), which is a reload pass replacement.
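
The difference between the conservative rule described above and a finer one
can be modelled in a few lines. This is only a toy model under stated
assumptions, not reload's actual code: pseudo_info, conservative_conflict, and
precise_conflict are hypothetical names, and the finer rule assumes the reload
is for an input operand and that a plain (non-earlyclobber) output is written
only after all inputs have been read.

```c
#include <stdbool.h>

/* How one pseudo relates to the insn that needs a reload.  */
struct pseudo_info {
    bool lives_through;  /* live both before and after the insn */
    bool dies;           /* its last use is in the insn */
    bool set;            /* written by the insn */
    bool earlyclobber;   /* written before the inputs are consumed */
};

/* Rule described above: any of the three relations counts as a
   conflict with the hard register needed for the reload.  */
static bool conservative_conflict(const struct pseudo_info *p)
{
    return p->lives_through || p->dies || p->set;
}

/* Hypothetical finer rule for an input reload: a pseudo that is only
   set by the insn holds no live value while the inputs are read,
   unless it is an earlyclobber output.  */
static bool precise_conflict(const struct pseudo_info *p)
{
    return p->lives_through || p->dies || (p->set && p->earlyclobber);
}
```

For a pseudo like 73 in this report (set by the insn and not live before it),
the conservative rule reports a conflict with the dx reload for pseudo 63,
while the finer rule in this model would not.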



* [Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
  2011-08-17 13:42 [Bug rtl-optimization/50107] New: [IRA, i386] allocates regiters in very non-optimal way kirill.yukhin at intel dot com
                   ` (13 preceding siblings ...)
  2011-08-19 16:24 ` vmakarov at redhat dot com
@ 2021-12-26 22:18 ` pinskia at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-26 22:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.8.0
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
All of the register allocation issues referenced in this bug report were fixed
in GCC 4.8 and above as far as I can test.
