[Bug rtl-optimization/17236] inefficient code for long long multiply on x86

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
@ 2007-02-02  0:17 ` roger at eyesopen dot com
  2007-11-23 20:48 ` steven at gcc dot gnu dot org
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: roger at eyesopen dot com @ 2007-02-02  0:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from roger at eyesopen dot com  2007-02-02 00:17 -------
It looks like Ian's recent subreg lowering pass patch has improved code
generation on this testcase.  Previously, we'd spill three integer registers to
the stack for "LLM", we're now down to two.  [A significant improvement from
the five we spilled when this bug was reported]

Before:

LLM:    subl    $12, %esp
        movl    %ebx, (%esp)
        movl    28(%esp), %edx
        movl    20(%esp), %ebx
        movl    16(%esp), %ecx
        movl    24(%esp), %eax
        movl    %esi, 4(%esp)
        movl    %edx, %esi
        movl    %edi, 8(%esp)
        movl    %ebx, %edi
        movl    (%esp), %ebx
        imull   %ecx, %esi
        imull   %eax, %edi
        mull    %ecx
        addl    %edi, %esi
        movl    8(%esp), %edi
        leal    (%esi,%edx), %edx
        movl    4(%esp), %esi
        addl    $12, %esp
        ret

After:

LLM:    subl    $8, %esp
        movl    %ebx, (%esp)
        movl    20(%esp), %eax
        movl    %esi, 4(%esp)
        movl    24(%esp), %ecx
        movl    12(%esp), %esi
        movl    16(%esp), %ebx
        imull   %esi, %ecx
        imull   %eax, %ebx
        mull    %esi
        movl    4(%esp), %esi
        addl    %ebx, %ecx
        movl    (%esp), %ebx
        addl    $8, %esp
        leal    (%ecx,%edx), %edx
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
  2007-02-02  0:17 ` [Bug rtl-optimization/17236] inefficient code for long long multiply on x86 roger at eyesopen dot com
@ 2007-11-23 20:48 ` steven at gcc dot gnu dot org
  2007-12-18  7:43 ` bonzini at gnu dot org
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: steven at gcc dot gnu dot org @ 2007-11-23 20:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from steven at gcc dot gnu dot org  2007-11-23 20:48 -------
*** Bug 6585 has been marked as a duplicate of this bug. ***


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bruno at clisp dot org
Bug 17236 depends on bug 6585, which changed state.

Bug 6585 Summary: Redundant store/load instruction pairs on ix86
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
  2007-02-02  0:17 ` [Bug rtl-optimization/17236] inefficient code for long long multiply on x86 roger at eyesopen dot com
  2007-11-23 20:48 ` steven at gcc dot gnu dot org
@ 2007-12-18  7:43 ` bonzini at gnu dot org
  2007-12-18  7:53 ` bonzini at gnu dot org
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-18  7:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from bonzini at gnu dot org  2007-12-18 07:43 -------
The generated code has changed a lot recently, though it still uses two spills:

        pushl   %esi
        pushl   %ebx
        movl    12(%esp), %ebx         ; load alow
        movl    20(%esp), %esi         ; load blow
        movl    24(%esp), %ecx         ; load bhigh
        imull   %ebx, %ecx             ; bhigh = bhigh * alow
        movl    16(%esp), %eax         ; load ahigh
        imull   %esi, %eax             ; ahigh = ahigh * blow
        addl    %eax, %ecx
        movl    %esi, %eax             ; need to move a low part in %eax
        mull    %ebx
        leal    (%ecx,%edx), %esi      ; what the heck, a simple addl could do!
        movl    %esi, %edx
        popl    %ebx
        popl    %esi

The important thing is that the adds are one before, and one after the
multiply.      And the RTL before reload is very nice!

   28 r64:SI=[argp:SI]
   30 r66:SI=[argp:SI+0x8]
    7 {r62:SI=[argp:SI+0xc]*r64:SI;clobber flags:CC;}
    8 {r63:SI=[argp:SI+0x4]*r66:SI;clobber flags:CC;}
    9 {r62:SI=r62:SI+r63:SI;clobber flags:CC;}
   10 {r61:DI=zero_extend(r66:SI)*zero_extend(r64:SI);clobber flags:CC;}
   12 {r61:DI#4=r62:SI+r61:DI#4;clobber flags:CC;}

That could be basically

   mov (blow), %eax
   mov (alow), %ecx
   imul (bhigh), %eax, %ebx  ;needs reload of (bh) in %ebx
   imul (ahigh), %ecx, %edx  ;needs reload of (ah) in %edx
   addl %edx, %ebx
   mul  %ecx
   addl %ebx, %edx

So, if we are content with a single spill, there is now mainly a register
allocation problem.  For zero spills, we would need rematerialization of either
alow or blow, as in:

   mov (blow), %eax
   imul (bhigh), %eax, %ecx  ;needs reload of (bh) in %ecx
   imul (ahigh), (alow), %edx  ;needs reload of (ah) in %edx
   addl %edx, %ecx
   mul  (al)
   addl %ebx, %edx

So, for now let's aim at having one spill.  Then the main problem is that gcc
should try to keep one of alow or blow in %eax, in order to reuse it for the
mull.  Then it could do with a single spill, like this:

        pushl   %ebx
        movl    12(%esp), %eax         ; load alow
        movl    20(%esp), %ecx         ; load blow
        movl    24(%esp), %ebx         ; load bhigh, input reload
        imull   %eax, %ebx
        movl    16(%esp), %edx         ; load ahigh, input reload
        imull   %ecx, %edx
        addl    %edx, %ebx
        mul     %ecx
        addl    %ebx, %edx
        popl    %ebx


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2007-12-18  7:43 ` bonzini at gnu dot org
@ 2007-12-18  7:53 ` bonzini at gnu dot org
  2007-12-18  8:05 ` bonzini at gnu dot org
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-18  7:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from bonzini at gnu dot org  2007-12-18 07:53 -------
For -mregparm=2 the code is this:

        pushl   %ebx
        movl    8(%esp), %ebx
        movl    12(%esp), %ecx
        imull   %ebx, %edx
        imull   %eax, %ecx
        addl    %edx, %ecx
        mull    %ebx
        popl    %ebx
        leal    (%ecx,%edx), %edx
        ret

It is better only for luck: the low part of "a" is already in %eax, so that
regclass decides for the "right" preference.

  Register 64 pref AREG, else GENERAL_REGS

Paolo


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2007-12-18  7:53 ` bonzini at gnu dot org
@ 2007-12-18  8:05 ` bonzini at gnu dot org
  2007-12-18  8:10 ` ebotcazou at gcc dot gnu dot org
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-18  8:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from bonzini at gnu dot org  2007-12-18 08:05 -------
Uros mentioned offlist that he wanted to hijack fwprop to always propagate
stack slots into instructions.

It would be a relatively useful piece of infrastructure to have a flag in MEMs
that marks on-stack MEMs, because other MEMs definitely must not be propagated
blindly.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2007-12-18  8:05 ` bonzini at gnu dot org
@ 2007-12-18  8:10 ` ebotcazou at gcc dot gnu dot org
  2007-12-18 13:48 ` ubizjak at gmail dot com
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: ebotcazou at gcc dot gnu dot org @ 2007-12-18  8:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from ebotcazou at gcc dot gnu dot org  2007-12-18 08:10 -------
> It would be a relatively useful piece of infrastructure to have a flag in MEMs
> that marks on-stack MEMs, because other MEMs definitely must not be propagated
> blindly.

Depending on your needs, MEM_NOTRAP_P could be a good approximation.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2007-12-18  8:10 ` ebotcazou at gcc dot gnu dot org
@ 2007-12-18 13:48 ` ubizjak at gmail dot com
  2007-12-18 16:02 ` bonzini at gnu dot org
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-18 13:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from ubizjak at gmail dot com  2007-12-18 13:47 -------
Generated code for a similar example is just plain stupid:

--cut here--
int test(long long a, long long b)
{
        return a * b;
}
--cut here--

gcc -O3:

test:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    %esi, 4(%esp)
        movl    16(%ebp), %esi
        movl    %ebx, (%esp)
        movl    8(%ebp), %ebx
        movl    %esi, %eax
        movl    4(%esp), %esi
        mull    %ebx
        movl    (%esp), %ebx
        movl    %ebp, %esp
        popl    %ebp
        ret

gcc-4.0 is a little less creative and creates:

test:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    8(%ebp), %eax
        mull    16(%ebp)
        popl    %ebx
        popl    %esi
        popl    %edi
        leave
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2007-12-18 13:48 ` ubizjak at gmail dot com
@ 2007-12-18 16:02 ` bonzini at gnu dot org
  2007-12-18 16:39 ` jakub at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-18 16:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from bonzini at gnu dot org  2007-12-18 16:01 -------
The problem in comment #11 is that GCC generates a widening multiply, and
cannot remove the DImode operations until after register allocation (!).  While
the root cause is a deficiency in RTL-level DCE, I suggest filing a separate
bug, because it could also be fixed on the tree-level.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2007-12-18 16:02 ` bonzini at gnu dot org
@ 2007-12-18 16:39 ` jakub at gcc dot gnu dot org
  2007-12-18 16:51 ` bonzini at gnu dot org
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: jakub at gcc dot gnu dot org @ 2007-12-18 16:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jakub at gcc dot gnu dot org  2007-12-18 16:39 -------
I think tree level does the right thing, TER fixes this up and expand_expr
is called with
return (int) (b * a)
Later on expand_expr is called with
 <mult_expr 0x2aaaae9032c0
    type <integer_type 0x2aaaae937840 long long int DI
        size <integer_cst 0x2aaaae927c60 constant invariant 64>
        unit size <integer_cst 0x2aaaae927c90 constant invariant 8>
        align 64 symtab 0 alias set 2 canonical type 0x2aaaae937840 precision
64 min <integer_cst 0x2aaaae927bd0 -9223372036854775808> max <integer_cst
0x2aaaae927c00 9223372036854775807>>

    arg 0 <parm_decl 0x2aaaae92d2d0 b type <integer_type 0x2aaaae937840 long
long int>
        used DI file pr17236.c line 2 col 30 size <integer_cst 0x2aaaae927c60
64> unit size <integer_cst 0x2aaaae927c90 8>
        align 64 context <function_decl 0x2aaaae9e8a90 test> initial
<integer_type 0x2aaaae937840 long long int>
        (reg/v:DI 60 [ b ]) arg-type <integer_type 0x2aaaae937840 long long
int>
        incoming-rtl (mem/c/i:DI (plus:SI (reg/f:SI 53 virtual-incoming-args)
        (const_int 8 [0x8])) [2 b+0 S8 A32])>
    arg 1 <parm_decl 0x2aaaae92d240 a type <integer_type 0x2aaaae937840 long
long int>
        used DI file pr17236.c line 2 col 17 size <integer_cst 0x2aaaae927c60
64> unit size <integer_cst 0x2aaaae927c90 8>
        align 64 context <function_decl 0x2aaaae9e8a90 test> initial
<integer_type 0x2aaaae937840 long long int>
        (reg/v:DI 59 [ a ]) arg-type <integer_type 0x2aaaae937840 long long
int>
        incoming-rtl (mem/c/i:DI (reg/f:SI 53 virtual-incoming-args) [2 a+0 S8
A32]) chain <parm_decl 0x2aaaae92d2d0 b>>>
and tmode SImode, still enough info to choose a better multiply.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2007-12-18 16:39 ` jakub at gcc dot gnu dot org
@ 2007-12-18 16:51 ` bonzini at gnu dot org
  2007-12-18 18:20 ` ubizjak at gmail dot com
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-18 16:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from bonzini at gnu dot org  2007-12-18 16:50 -------
The bug with 64*64->32 multiplication is now PR34522.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2007-12-18 16:51 ` bonzini at gnu dot org
@ 2007-12-18 18:20 ` ubizjak at gmail dot com
  2007-12-18 18:33 ` ubizjak at gmail dot com
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-18 18:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from ubizjak at gmail dot com  2007-12-18 18:20 -------
(In reply to comment #7)

>         mull    %ebx
>         leal    (%ecx,%edx), %esi   ; what the heck, a simple addl could do!
>         movl    %esi, %edx

Something disturbs RA to emit two DImode moves:

(insn:HI 10 36 37 2 m.c:2 (parallel [
            (set (reg:DI 0 ax)
                (mult:DI (zero_extend:DI (reg:SI 0 ax))
                    (zero_extend:DI (reg:SI 3 bx [orig:64 a ] [64]))))
            (clobber (reg:CC 17 flags))
        ]) 304 {*umulsidi3_insn} (nil))

(insn 37 10 11 2 m.c:2 (set (reg:DI 3 bx [61])
        (reg:DI 0 ax)) 88 {*movdi_2} (nil))

(note:HI 11 37 12 2 NOTE_INSN_DELETED)

(insn:HI 12 11 18 2 m.c:2 (parallel [
            (set (reg:SI 4 si [+4 ])
                (plus:SI (reg:SI 2 cx [62])
                    (reg:SI 4 si [+4 ])))
            (clobber (reg:CC 17 flags))
        ]) 249 {*addsi_1} (nil))

(insn:HI 18 12 24 2 m.c:4 (set (reg/i:DI 0 ax [ <result> ])
        (reg:DI 3 bx [61])) 88 {*movdi_2} (nil))

(insn 24 18 33 2 m.c:4 (use (reg/i:DI 0 ax [ <result> ])) -1 (nil))

Note two moves [(insn 36) and (insn 37)] around (insn 12).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2007-12-18 18:20 ` ubizjak at gmail dot com
@ 2007-12-18 18:33 ` ubizjak at gmail dot com
  2007-12-19  9:49 ` bonzini at gnu dot org
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-18 18:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from ubizjak at gmail dot com  2007-12-18 18:33 -------
(In reply to comment #15)

> Note two moves [(insn 36) and (insn 37)] around (insn 12).

Bah. This is the correct sequence, around (insn 10) that seems to be the root
of all problems:

(insn:HI 9 8 36 2 m.c:2 (parallel [
            (set (reg:SI 2 cx [62])
                (plus:SI (reg:SI 2 cx [62])
                    (reg:SI 0 ax [63])))
            (clobber (reg:CC 17 flags))
        ]) 249 {*addsi_1} (nil))

(insn 36 9 10 2 m.c:2 (set (reg:SI 0 ax)
        (reg:SI 4 si [orig:66 b ] [66])) 47 {*movsi_1} (nil))

(insn:HI 10 36 37 2 m.c:2 (parallel [
            (set (reg:DI 0 ax)
                (mult:DI (zero_extend:DI (reg:SI 0 ax))
                    (zero_extend:DI (reg:SI 3 bx [orig:64 a ] [64]))))
            (clobber (reg:CC 17 flags))
        ]) 304 {*umulsidi3_insn} (nil))

(insn 37 10 11 2 m.c:2 (set (reg:DI 3 bx [61])
        (reg:DI 0 ax)) 88 {*movdi_2} (nil))


DImode AX as found in (insn 10) could simply be propagated up and down RTL
stream as SImode destination of (insn 8) and SImode source of (insn 12). For
some reason, RA is afraid to allocate registers in mixed modes.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2007-12-18 18:33 ` ubizjak at gmail dot com
@ 2007-12-19  9:49 ` bonzini at gnu dot org
  2007-12-19 12:11 ` ubizjak at gmail dot com
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-19  9:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from bonzini at gnu dot org  2007-12-19 09:49 -------
With this patch, GCC gets the preferences right, but it does not affect code
generation.

Index: regclass.c
===================================================================
--- regclass.c  (revision 130928)
+++ regclass.c  (working copy)
@@ -1651,9 +1651,15 @@ record_reg_classes (int n_alts, int n_op
                          [(unsigned char) reg_pref[REGNO (op)].prefclass]
                          [(int) classes[i]]);

-                 if (REGNO (ops[i]) != REGNO (ops[j])
-                     && ! find_reg_note (insn, REG_DEAD, op))
-                   alt_cost += 2;
+                 if (REGNO (ops[i]) != REGNO (ops[j]))
+                   {
+                     /* If the pseudo dies, tying it to the duplicate
+                        operand can be advantageous.  */
+                     if (find_reg_note (insn, REG_DEAD, op))
+                       pp->cost[classes[j]]--;
+                     else
+                       alt_cost += 2;
+                   }

                  /* This is in place of ordinary cost computation
                     for this operand, so skip to the end of the


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2007-12-19  9:49 ` bonzini at gnu dot org
@ 2007-12-19 12:11 ` ubizjak at gmail dot com
  2007-12-19 12:14 ` bonzini at gnu dot org
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-19 12:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from ubizjak at gmail dot com  2007-12-19 12:11 -------
Another baby step can be performed by:

Index: optabs.c
===================================================================
--- optabs.c    (revision 131053)
+++ optabs.c    (working copy)
@@ -1245,7 +1245,7 @@ expand_doubleword_mult (enum machine_mod
        return NULL_RTX;
     }

-  adjust = expand_binop (word_mode, smul_optab, op0_high, op1_low,
+  adjust = expand_binop (word_mode, smul_optab, op1_low, op0_high,
                         NULL_RTX, 0, OPTAB_DIRECT);
   if (!adjust)
     return NULL_RTX;
@@ -1274,7 +1274,7 @@ expand_doubleword_mult (enum machine_mod
        return NULL_RTX;
     }

-  temp = expand_binop (word_mode, smul_optab, op1_high, op0_low,
+  temp = expand_binop (word_mode, smul_optab, op0_low, op1_high,
                       NULL_RTX, 0, OPTAB_DIRECT);
   if (!temp)
     return NULL_RTX;

This patch generates (note mem operands to imull):

test:
        subl    $8, %esp
        movl    %ebx, (%esp)
        movl    12(%esp), %ebx
        movl    %esi, 4(%esp)
        movl    20(%esp), %esi
        movl    %ebx, %ecx
        movl    %esi, %eax
        imull   24(%esp), %ecx
        imull   16(%esp), %eax
        addl    %eax, %ecx
        movl    %esi, %eax
        mull    %ebx
        movl    (%esp), %ebx
        leal    (%ecx,%edx), %esi
        movl    %esi, %edx
        movl    4(%esp), %esi
        addl    $8, %esp
        ret

but since mulsi3 is defined as:

(define_insn "*mulsi3_1"
  [(set (match_operand:SI 0 "register_operand" "=r,r,r")
        (mult:SI (match_operand:SI 1 "nonimmediate_operand" "%rm,rm,0")
                 (match_operand:SI 2 "general_operand" "K,i,mr")))
   (clobber (reg:CC FLAGS_REG))]

the question is, why didn't RA perform operand reversal by itself. Currently it
converts _both_ SI multiplies from:

(insn:HI 8 7 9 2 m.c:14 (parallel [
            (set (reg:SI 63)
                (mult:SI (mem/c/i:SI (plus:SI (reg/f:SI 16 argp)
                            (const_int 4 [0x4])) [2 a+4 S4 A32])
                    (reg:SI 66 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) 214 {*mulsi3_1} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

into

--cut here--
(insn 35 7 8 2 m.c:14 (set (reg:SI 0 ax [63])
        (mem/c/i:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 16 [0x10])) [2 a+4 S4 A32])) 41 {*movsi_1} (nil))

(insn:HI 8 35 9 2 m.c:14 (parallel [
            (set (reg:SI 0 ax [63])
                (mult:SI (reg:SI 0 ax [63])
                    (reg:SI 4 si [orig:66 b ] [66])))
            (clobber (reg:CC 17 flags))
        ]) 214 {*mulsi3_1} (nil))
--cut here--


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2007-12-19 12:11 ` ubizjak at gmail dot com
@ 2007-12-19 12:14 ` bonzini at gnu dot org
  2007-12-19 12:32 ` bonzini at gnu dot org
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-19 12:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from bonzini at gnu dot org  2007-12-19 12:13 -------
Created an attachment (id=14792)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14792&action=view)
patch to almost fix the bug

With this patch:
1) local-alloc first tries to allocate registers that go into small classes
2) regclass tries to use the same class for an output and an input that dies
3) i386 modes_tiable_p allows tying SImode with DImode (but not vice versa)

I get now this:
        pushl   %esi
        pushl   %ebx
        movl    12(%esp), %ecx
        movl    20(%esp), %eax
        movl    24(%esp), %ebx
        imull   %ecx, %ebx
        movl    16(%esp), %esi
        imull   %eax, %esi
        addl    %esi, %ebx
        mull    %ecx
        leal    (%ebx,%edx), %edx
        popl    %ebx
        popl    %esi

which has no moves.  It still spills two registers because it is not able to
use %edx (I'll look into it later).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2007-12-19 12:14 ` bonzini at gnu dot org
@ 2007-12-19 12:32 ` bonzini at gnu dot org
  2007-12-19 12:43 ` bonzini at gnu dot org
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-19 12:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from bonzini at gnu dot org  2007-12-19 12:32 -------
There is a big catch-22.  When GCC ties one of regs 64/66 with reg 61, it
enlarges reg 61's live range to cover the live range of the tied range.  When
it does this, it basically locks up %edx for the whole live range of reg 61. 
:-(


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2007-12-19 12:32 ` bonzini at gnu dot org
@ 2007-12-19 12:43 ` bonzini at gnu dot org
  2007-12-19 13:12 ` ubizjak at gmail dot com
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-19 12:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from bonzini at gnu dot org  2007-12-19 12:43 -------
Created an attachment (id=14793)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14793&action=view)
two hunks only from the previous patch

Indeed, if I only use the regclass.c and local-alloc.c hunks, I get only one
spill!

        pushl   %ebx
        movl    8(%esp), %edx
        movl    16(%esp), %eax
        movl    20(%esp), %ecx
        imull   %edx, %ecx
        movl    12(%esp), %ebx
        imull   %eax, %ebx
        addl    %ebx, %ecx
        mull    %edx
        leal    (%ecx,%edx), %edx
        popl    %ebx

and no reg-reg move.

With Uros' patch it does use memory imull's, but also adds reg-reg moves.  The
next step should be to figure out why (with my patch and without his) reload
feels the need to put the memory operands from insns 7 and 8 into regs.


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #14792|0                           |1
        is obsolete|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2007-12-19 12:43 ` bonzini at gnu dot org
@ 2007-12-19 13:12 ` ubizjak at gmail dot com
  2007-12-19 13:36 ` bonzini at gnu dot org
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2007-12-19 13:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from ubizjak at gmail dot com  2007-12-19 13:11 -------
(In reply to comment #21)

> Indeed, if I only use the regclass.c and local-alloc.c hunks, I get only one
> spill!
> 
>         pushl   %ebx
>         movl    8(%esp), %edx
>         movl    16(%esp), %eax
>         movl    20(%esp), %ecx
>         imull   %edx, %ecx
>         movl    12(%esp), %ebx
>         imull   %eax, %ebx
>         addl    %ebx, %ecx
>         mull    %edx
>         leal    (%ecx,%edx), %edx
>         popl    %ebx
> 
> and no reg-reg move.

Excellent!

I think that we should fwprop stack arguments into the insns to further
optimize the code. According to pentopt.pdf, there is no penalty at all (at
least for P4+) when one of the insn operands is a memory reference. This would
relieve register pressure considerably.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2007-12-19 13:12 ` ubizjak at gmail dot com
@ 2007-12-19 13:36 ` bonzini at gnu dot org
  2007-12-19 13:49 ` steven at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-19 13:36 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #23 from bonzini at gnu dot org  2007-12-19 13:36 -------
Created an attachment (id=14794)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14794&action=view)
teach combine that reg = op(reg, mem) is better

Since combine operates on the whole pattern, it can be taught the trick that
swap_commutative_operands_with_target does in optabs.c.  This is the "right
way" to do the hack Uros proposed in comment #18.

Note that optabs.c does not get it because it creates reg=subreg*subreg.  Then
CSE creates SImode memory loads from the subreg, and combine creates
reg=mem*reg.  With this patch, combine creates reg=reg*mem instead.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2007-12-19 13:36 ` bonzini at gnu dot org
@ 2007-12-19 13:49 ` steven at gcc dot gnu dot org
  2007-12-19 13:50 ` bonzini at gnu dot org
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: steven at gcc dot gnu dot org @ 2007-12-19 13:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from steven at gcc dot gnu dot org  2007-12-19 13:48 -------
The patch in comment #23 might even be suitable for GCC 4.3 ...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2007-12-19 13:49 ` steven at gcc dot gnu dot org
@ 2007-12-19 13:50 ` bonzini at gnu dot org
  2007-12-19 13:54 ` bonzini at gnu dot org
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-19 13:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from bonzini at gnu dot org  2007-12-19 13:50 -------
Note that "fwprop" is not an exact term, because there *is* a memory load in
each multiplication, and propagating a second memory operand will create an
invalid insn.  You may try to add a split from reg=op(mem1, mem2) to
reg=mem1;reg=reg*mem2, but anyway, it won't help propagating into the mull,
since it is not LOG_LINKS-related to the load.

The problem is that we end up with having to reload *anyway*.  If we had a way
to express the relationship between operands 0 and 1 of the multiplication
*before reload*, probably the RTL optimizations could do much more.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2007-12-19 13:50 ` bonzini at gnu dot org
@ 2007-12-19 13:54 ` bonzini at gnu dot org
  2007-12-20 13:53 ` bonzini at gnu dot org
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-19 13:54 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from bonzini at gnu dot org  2007-12-19 13:53 -------
I'm starting a SPEC run on the overall patch


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2007-12-19 13:54 ` bonzini at gnu dot org
@ 2007-12-20 13:53 ` bonzini at gnu dot org
  2007-12-20 14:16 ` bonzini at gnu dot org
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-20 13:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from bonzini at gnu dot org  2007-12-20 13:53 -------
I screwed up so I have to rerun most of SPECfp2000, but the results seem a
wash.  Anybody can fire the patch I'll attach soon on a wide range of machines?


-- 
Bug 17236 depends on bug 6585, which changed state.

Bug 6585 Summary: Redundant store/load instruction pairs on ix86
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|RESOLVED                    |ASSIGNED
         Resolution|DUPLICATE                   |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2007-12-20 13:53 ` bonzini at gnu dot org
@ 2007-12-20 14:16 ` bonzini at gnu dot org
  2008-03-07  8:27 ` bonzini at gnu dot org
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2007-12-20 14:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #28 from bonzini at gnu dot org  2007-12-20 14:15 -------
Created an attachment (id=14800)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14800&action=view)
combined patch


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #14793|0                           |1
        is obsolete|                            |
  Attachment #14794|0                           |1
        is obsolete|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2007-12-20 14:16 ` bonzini at gnu dot org
@ 2008-03-07  8:27 ` bonzini at gnu dot org
  2008-03-07  8:42 ` bonzini at gnu dot org
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2008-03-07  8:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #29 from bonzini at gnu dot org  2008-03-07 08:26 -------
ira branch produces the same code as with my patch.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2008-03-07  8:27 ` bonzini at gnu dot org
@ 2008-03-07  8:42 ` bonzini at gnu dot org
  2008-09-06 16:12 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 33+ messages in thread
From: bonzini at gnu dot org @ 2008-03-07  8:42 UTC (permalink / raw)
  To: gcc-bugs



-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |SUSPENDED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (25 preceding siblings ...)
  2008-03-07  8:42 ` bonzini at gnu dot org
@ 2008-09-06 16:12 ` ubizjak at gmail dot com
  2008-09-06 16:19 ` ubizjak at gmail dot com
  2008-09-06 16:20 ` ubizjak at gmail dot com
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 16:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #30 from ubizjak at gmail dot com  2008-09-06 16:11 -------
Current mainline (4.4.0 20080906) produces:

        pushl   %ebx
        movl    8(%ebp), %eax
        movl    16(%ebp), %edx
        movl    20(%ebp), %ecx
        movl    12(%ebp), %ebx
        imull   %eax, %ecx
        imull   %edx, %ebx
        mull    %edx
        addl    %ebx, %ecx
        popl    %ebx
        leal    (%ecx,%edx), %edx

So, fixed.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|SUSPENDED                   |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (26 preceding siblings ...)
  2008-09-06 16:12 ` ubizjak at gmail dot com
@ 2008-09-06 16:19 ` ubizjak at gmail dot com
  2008-09-06 16:20 ` ubizjak at gmail dot com
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 16:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #31 from ubizjak at gmail dot com  2008-09-06 16:18 -------
*** Bug 6585 has been marked as a duplicate of this bug. ***


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|6585                        |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
       [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
                   ` (27 preceding siblings ...)
  2008-09-06 16:19 ` ubizjak at gmail dot com
@ 2008-09-06 16:20 ` ubizjak at gmail dot com
  28 siblings, 0 replies; 33+ messages in thread
From: ubizjak at gmail dot com @ 2008-09-06 16:20 UTC (permalink / raw)
  To: gcc-bugs



-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] New: inefficient code for long long multiply on x86
@ 2004-08-30  5:50 dann at godzilla dot ics dot uci dot edu
  2004-10-06 23:27 ` [Bug rtl-optimization/17236] " giovannibajo at libero dot it
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2004-08-30  5:50 UTC (permalink / raw)
  To: gcc-bugs

Compiling:
unsigned long long LLM(unsigned long long arg1, unsigned long long arg2)
{
  return arg1 *  arg2;
}

with gcc -O2 -fomit-frame-pointer generates:

        pushl   %ebp
        pushl   %edi
        pushl   %ebx
        pushl   %edi
        pushl   %edi
        movl    32(%esp), %ecx
        movl    24(%esp), %eax
        movl    36(%esp), %ebx
        movl    24(%esp), %edi
        mull    %ecx
        imull   28(%esp), %ecx
        imull   %ebx, %edi
        movl    %edx, %ebp
        movl    %eax, (%esp)
        addl    %edi, %ebp
        movl    (%esp), %eax
        leal    (%ebp,%ecx), %ecx
        movl    %ecx, 4(%esp)
        movl    4(%esp), %edx
        popl    %ecx
        popl    %ebx
        popl    %ebx
        popl    %edi
        popl    %ebp
        ret

ICC -O2 generates much smaller code:

        movl      8(%esp), %ecx                                 #516.18
        imull     12(%esp), %ecx                                #516.18
        movl      16(%esp), %eax                                #516.18
        imull     4(%esp), %eax                                 #516.18
        addl      %eax, %ecx                                    #516.18
        movl      4(%esp), %eax                                 #516.18
        mull      12(%esp)                                      #516.18
        addl      %ecx, %edx                                    #516.18
        ret                                                     #516.18

-- 
           Summary: inefficient code for long long multiply on x86
           Product: gcc
           Version: 3.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
  2004-08-30  5:50 [Bug rtl-optimization/17236] New: " dann at godzilla dot ics dot uci dot edu
@ 2004-10-06 23:27 ` giovannibajo at libero dot it
  2005-03-14 18:24 ` cvs-commit at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: giovannibajo at libero dot it @ 2004-10-06 23:27 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2004-10-06 23:27 -------
Confirmed. The code is know a little bit better:

LLM:
     pushl      %ebp
     pushl      %edi
     pushl      %ebx
     subl       $8, %esp
     movl       32(%esp), %ecx
     movl       36(%esp), %ebx
     movl       24(%esp), %eax
     mull       %ecx
     movl       %eax, (%esp)
     movl       24(%esp), %edi
     imull      %ebx, %edi
     movl       %edx, %ebp
     addl       %edi, %ebp
     imull      28(%esp), %ecx
     leal       (%ebp,%ecx), %ecx
     movl       %ecx, 4(%esp)
     movl       (%esp), %eax
     movl       4(%esp), %edx
     addl       $8, %esp
     popl       %ebx
     popl       %edi
     popl       %ebp
     ret
     .size      LLM, .-LLM
     .ident     "GCC: (GNU) 4.0.0 20041003 (experimental)"




On 2.95, we used to emit:

LLM:
     pushl %ebp
     pushl %edi
     pushl %esi
     pushl %ebx
     movl 20(%esp),%ecx
     movl 24(%esp),%ebx
     movl 28(%esp),%eax
     movl 32(%esp),%esi
     mull %ecx
     movl %eax,%edi
     movl 28(%esp),%eax
     imull %ecx,%esi
     imull %ebx,%eax
     movl %edx,%ebp
     addl %esi,%ebp
     addl %eax,%ebp
     movl %edi,%eax
     movl %ebp,%edx
     popl %ebx
     popl %esi
     popl %edi
     popl %ebp
     ret
.Lfe1:
     .size       LLM,.Lfe1-LLM
     .ident     "GCC: (GNU) 2.95.3 20010315 (release)"



Not a regression, and ICC does so much better.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
           Keywords|                            |missed-optimization
      Known to fail|                            |2.95.3 3.0.4 3.3.3 3.4.1
                   |                            |4.0.0
   Last reconfirmed|0000-00-00 00:00:00         |2004-10-06 23:27:47
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
  2004-08-30  5:50 [Bug rtl-optimization/17236] New: " dann at godzilla dot ics dot uci dot edu
  2004-10-06 23:27 ` [Bug rtl-optimization/17236] " giovannibajo at libero dot it
@ 2005-03-14 18:24 ` cvs-commit at gcc dot gnu dot org
  2005-03-15 10:05 ` giovannibajo at libero dot it
  2005-03-15 10:07 ` giovannibajo at libero dot it
  3 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2005-03-14 18:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2005-03-14 18:24 -------
Subject: Bug 17236

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	sayle@gcc.gnu.org	2005-03-14 18:24:15

Modified files:
	gcc            : ChangeLog optabs.c 

Log message:
	PR rtl-optimization/17236
	* optabs.c (expand_doubleword_mult): New helper function split out
	from expand_binop.  Permute the order in which instructions are
	emitted to minimize the number of simultaneously live registers.
	(expand_binop): Call expand_doubleword_mult to synthesize a double
	word multiplication.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7846&r2=2.7847
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&r1=1.260&r2=1.261



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
  2004-08-30  5:50 [Bug rtl-optimization/17236] New: " dann at godzilla dot ics dot uci dot edu
  2004-10-06 23:27 ` [Bug rtl-optimization/17236] " giovannibajo at libero dot it
  2005-03-14 18:24 ` cvs-commit at gcc dot gnu dot org
@ 2005-03-15 10:05 ` giovannibajo at libero dot it
  2005-03-15 10:07 ` giovannibajo at libero dot it
  3 siblings, 0 replies; 33+ messages in thread
From: giovannibajo at libero dot it @ 2005-03-15 10:05 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2005-03-15 10:04 -------
Roger explains what else needs to be done here:
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01386.html

Right now, after his patch, mainline generates this code:

        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    16(%esp), %eax
        movl    20(%esp), %edx
        movl    24(%esp), %ecx
        movl    28(%esp), %ebx
        movl    %edx, %esi
        imull   %ecx, %esi
        movl    %ebx, %edi
        imull   %eax, %edi
        addl    %edi, %esi
        mull    %ecx
        leal    (%esi,%edx), %edx
        popl    %ebx
        popl    %esi
        popl    %edi
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug rtl-optimization/17236] inefficient code for long long multiply on x86
  2004-08-30  5:50 [Bug rtl-optimization/17236] New: " dann at godzilla dot ics dot uci dot edu
                   ` (2 preceding siblings ...)
  2005-03-15 10:05 ` giovannibajo at libero dot it
@ 2005-03-15 10:07 ` giovannibajo at libero dot it
  3 siblings, 0 replies; 33+ messages in thread
From: giovannibajo at libero dot it @ 2005-03-15 10:07 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2005-03-15 10:07 -------
Uros did some additional comments:
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01427.html

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |19398


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17236


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2008-09-06 16:20 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-17236-1008@http.gcc.gnu.org/bugzilla/>
2007-02-02  0:17 ` [Bug rtl-optimization/17236] inefficient code for long long multiply on x86 roger at eyesopen dot com
2007-11-23 20:48 ` steven at gcc dot gnu dot org
2007-12-18  7:43 ` bonzini at gnu dot org
2007-12-18  7:53 ` bonzini at gnu dot org
2007-12-18  8:05 ` bonzini at gnu dot org
2007-12-18  8:10 ` ebotcazou at gcc dot gnu dot org
2007-12-18 13:48 ` ubizjak at gmail dot com
2007-12-18 16:02 ` bonzini at gnu dot org
2007-12-18 16:39 ` jakub at gcc dot gnu dot org
2007-12-18 16:51 ` bonzini at gnu dot org
2007-12-18 18:20 ` ubizjak at gmail dot com
2007-12-18 18:33 ` ubizjak at gmail dot com
2007-12-19  9:49 ` bonzini at gnu dot org
2007-12-19 12:11 ` ubizjak at gmail dot com
2007-12-19 12:14 ` bonzini at gnu dot org
2007-12-19 12:32 ` bonzini at gnu dot org
2007-12-19 12:43 ` bonzini at gnu dot org
2007-12-19 13:12 ` ubizjak at gmail dot com
2007-12-19 13:36 ` bonzini at gnu dot org
2007-12-19 13:49 ` steven at gcc dot gnu dot org
2007-12-19 13:50 ` bonzini at gnu dot org
2007-12-19 13:54 ` bonzini at gnu dot org
2007-12-20 13:53 ` bonzini at gnu dot org
2007-12-20 14:16 ` bonzini at gnu dot org
2008-03-07  8:27 ` bonzini at gnu dot org
2008-03-07  8:42 ` bonzini at gnu dot org
2008-09-06 16:12 ` ubizjak at gmail dot com
2008-09-06 16:19 ` ubizjak at gmail dot com
2008-09-06 16:20 ` ubizjak at gmail dot com
2004-08-30  5:50 [Bug rtl-optimization/17236] New: " dann at godzilla dot ics dot uci dot edu
2004-10-06 23:27 ` [Bug rtl-optimization/17236] " giovannibajo at libero dot it
2005-03-14 18:24 ` cvs-commit at gcc dot gnu dot org
2005-03-15 10:05 ` giovannibajo at libero dot it
2005-03-15 10:07 ` giovannibajo at libero dot it

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).