public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: jules at gcc dot gnu.org @ 2011-09-20 20:54 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

jules at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |jules at gcc dot gnu.org
         Resolution|FIXED                       |

--- Comment #9 from jules at gcc dot gnu.org 2011-09-20 19:03:43 UTC ---
This appears to have regressed on mainline. I now get the following assembly
output for the test case added by Maxim:

longfunc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        stmfd   sp!, {r4, r5}
        umull   r4, r5, r0, r2
        mul     r3, r0, r3
        mla     r1, r2, r1, r3
        mov     r0, r4
        add     r1, r1, r5
        ldmfd   sp!, {r4, r5}
        bx      lr



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: ktkachov at gcc dot gnu.org @ 2013-05-29  9:55 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
                 CC|                            |ktkachov at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #10 from ktkachov at gcc dot gnu.org ---
(In reply to jules from comment #9)
> This appears to have regressed on mainline. I now get the following assembly
> output for the test case added by Maxim:
> 
> longfunc:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         stmfd   sp!, {r4, r5}
>         umull   r4, r5, r0, r2
>         mul     r3, r0, r3
>         mla     r1, r2, r1, r3
>         mov     r0, r4
>         add     r1, r1, r5
>         ldmfd   sp!, {r4, r5}
>         bx      lr

Current trunk (r199375) gives the following, so I think this can be closed:

longfunc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mul     r3, r0, r3
        mla     r3, r2, r1, r3
        umull   r0, r1, r0, r2
        add     r1, r3, r1
        bx      lr



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: bernd.edlinger at hotmail dot de @ 2014-02-14  7:44 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

Bernd Edlinger <bernd.edlinger at hotmail dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bernd.edlinger at hotmail dot de

--- Comment #11 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
The test case fails on current trunk:

longfunc:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    mul    r3, r0, r3
    push    {r4, r5}
    umull    r4, r5, r0, r2
    mla    r1, r2, r1, r3
    mov    r0, r4
    add    r5, r5, r1
    mov    r1, r5
    pop    {r4, r5}
    bx    lr
    .size    longfunc, .-longfunc
    .ident    "GCC: (GNU) 4.9.0 20140209 (experimental)"



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: bernd.edlinger at hotmail dot de @ 2014-02-14  7:47 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

--- Comment #12 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
$ gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/ed/gnu/arm-linux-gnueabihf/libexec/gcc/armv7l-unknown-linux-gnueabihf/4.9.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-4.9-20140209/configure
--prefix=/home/ed/gnu/arm-linux-gnueabihf
--enable-languages=c,c++,objc,obj-c++,fortran,ada,go --with-arch=armv7-a
--with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard
Thread model: posix
gcc version 4.9.0 20140209 (experimental) (GCC)



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: ktkachov at gcc dot gnu.org @ 2014-11-17 16:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #13 from ktkachov at gcc dot gnu.org ---
I still see this regression, but only for some -mcpu options.
For example, for -mcpu=cortex-a15 we get:
        mul     r3, r0, r3
        strd    r4, [sp, #-8]!
        umull   r4, r5, r0, r2
        mla     r1, r2, r1, r3
        mov     r0, r4
        add     r5, r1, r5
        mov     r1, r5
        ldrd    r4, [sp]
        add     sp, sp, #8

whereas for cortex-a7 we get:
        mul     r3, r0, r3
        mla     r3, r2, r1, r3
        umull   r0, r1, r0, r2
        add     r1, r3, r1


I think the problem here is reload.
If I look at the postreload dump, for the 'bad' RTL I see:
r0(SI) := r0(SI)
r3(SI) := r0(SI) * r3(SI)
r4(DI) := r0(SI) * r2(SI) // with zero extension
r1(SI) := r2(SI) * r1(SI) + r3(SI)
r5(SI) := r1(SI) + r5(SI)
r0(DI) := r4(DI)

whereas for the good one I see:
r0(SI) := r0(SI)
r3(SI) := r0(SI) * r3(SI)
r3(SI) := r2(SI) * r1(SI) + r3(SI)
r0(DI) := r0(SI) * r2(SI) // with zero extension
r1(SI) := r3(SI) + r1(SI)
r0(DI) := r0(DI)

In the good one the final insn is eliminated as dead, whereas in the bad one
the final DImode move is split into two SImode moves.

Sched1 changed the order of the mult and the mult-accumulate, but it's the
register allocator that causes the bad codegen.



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: ktkachov at gcc dot gnu.org @ 2015-02-12 14:40 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #14 from ktkachov at gcc dot gnu.org ---
Vlad, do you have any insight on this? The difference in scheduling is only the
order of a mult and an add, but the register allocation looks like the
underlying cause.



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: ktkachov at gcc dot gnu.org @ 2015-03-26 16:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|4.2.1                       |5.0

--- Comment #15 from ktkachov at gcc dot gnu.org ---
Updating version, as this still affects trunk.



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
  2010-01-01 17:33 [Bug c/42575] New: arm-eabi-gcc 4.2.1 " sliao at google dot com
From: mkuvyrkov at gcc dot gnu dot org @ 2010-08-18 10:43 UTC
  To: gcc-bugs



------- Comment #8 from mkuvyrkov at gcc dot gnu dot org  2010-08-18 10:43 -------
Bernd did all the heavy lifting for this patch.  The above patch fixes the last
piece of the problem -- extra move when compiling for ARMv7-A.


-- 

mkuvyrkov at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: mkuvyrkov at gcc dot gnu dot org @ 2010-08-18 10:34 UTC
  To: gcc-bugs



------- Comment #7 from mkuvyrkov at gcc dot gnu dot org  2010-08-18 10:34 -------
Subject: Bug 42575

Author: mkuvyrkov
Date: Wed Aug 18 10:34:02 2010
New Revision: 163334

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163334
Log:
        gcc/
        PR rtl-optimization/42575
        * optabs.c (expand_doubleword_mult): Generate new pseudos to shorten
        live ranges.

        gcc/testsuite/
        PR rtl-optimization/42575
        * gcc.target/pr42575.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr42575.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/optabs.c
    trunk/gcc/testsuite/ChangeLog


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: bernds at gcc dot gnu dot org @ 2010-07-29 12:40 UTC
  To: gcc-bugs



------- Comment #6 from bernds at gcc dot gnu dot org  2010-07-29 12:40 -------
Subject: Bug 42575

Author: bernds
Date: Thu Jul 29 12:39:57 2010
New Revision: 162678

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162678
Log:
        PR rtl-optimization/42575
        * dce.c (word_dce_process_block): Renamed from byte_dce_process_block.
        Argument AU removed.  All callers changed.  Ignore artificial refs.
        Use return value of df_word_lr_simulate_defs to decide whether an insn
        is necessary.
        (fast_dce): Rename arg to WORD_LEVEL.
        (run_word_dce): Renamed from rest_of_handle_fast_byte_dce.  No longer
        static.
        (pass_fast_rtl_byte_dce): Delete.
        * dce.h (run_word_dce): Declare.
        * df-core.c (df_print_word_regset): Renamed from df_print_byteregset.
        All callers changed.  Simplify code to only deal with two-word regs.
        * df.h (DF_WORD_LR): Renamed from DF_BYTE_LR.
        (DF_WORD_LR_BB_INFO): Renamed from DF_BYTE_LR_BB_INFO.
        (DF_WORD_LR_IN): Renamed from DF_BYTE_LR_IN.
        (DF_WORD_LR_OUT): Renamed from DF_BYTE_LR_OUT.
        (struct df_word_lr_bb_info): Renamed from df_byte_lr_bb_info.
        (df_word_lr_mark_ref): Declare.
        (df_word_lr_add_problem, df_word_lr_mark_ref, df_word_lr_simulate_defs,
        df_word_lr_simulate_uses): Declare or rename from byte variants.
        (df_byte_lr_simulate_artificial_refs_at_top,
        df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
        df_byte_lr_get_regno_len, df_compute_accessed_bytes): Delete
        declarations.
        (df_word_lr_get_bb_info): Rename from df_byte_lr_get_bb_info.
        (enum df_mm): Delete.
        * df-byte-scan.c: Delete file.
        * df-problems.c (df_word_lr_problem_data): Renamed from
        df_byte_lr_problem_data, all members deleted except for
        WORD_LR_BITMAPS, which is renamed from BYTE_LR_BITMAPS.  Uses changed.
        (df_word_lr_expand_bitmap, df_byte_lr_simulate_artificial_refs_at_top,
        df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
        df_byte_lr_get_regno_len, df_byte_lr_check_regs,
        df_byte_lr_confluence_0): Delete functions.
        (df_word_lr_free_bb_info): Renamed from df_byte_lr_free_bb_info; all
        callers changed.
        (df_word_lr_alloc): Renamed from df_byte_lr_alloc; all callers changed.
        Don't initialize members that were deleted, don't try to discover data
        about registers.  Ignore hard regs.
        (df_word_lr_reset): Renamed from df_byte_lr_reset; all callers changed.
        (df_word_lr_mark_ref): New function.
        (df_word_lr_bb_local_compute): Renamed from
        df_byte_bb_lr_local_compute; all callers changed.  Use
        df_word_lr_mark_ref.  Assert that artificial refs don't include
        pseudos.  Ignore hard registers.
        (df_word_lr_local_compute): Renamed from df_byte_lr_local_compute.
        Assert that exit block uses don't contain pseudos.
        (df_word_lr_init): Renamed from df_byte_lr_init; all callers changed.
        (df_word_lr_confluence_n): Renamed from df_byte_lr_confluence_n; all
        callers changed.  Ignore hard regs.
        (df_word_lr_transfer_function): Renamed from
        df_byte_lr_transfer_function; all callers changed.
        (df_word_lr_free): Renamed from df_byte_lr_free; all callers changed.
        (df_word_lr_top_dump): Renamed from df_byte_lr_top_dump; all callers
        changed.
        (df_word_lr_bottom_dump): Renamed from df_byte_lr_bottom_dump; all
        callers changed.
        (problem_WORD_LR): Renamed from problem_BYTE_LR; uses changed;
        confluence operator 0 set to NULL.
        (df_word_lr_add_problem): Renamed from df_byte_lr_add_problem; all
        callers changed.
        (df_word_lr_simulate_defs): Renamed from df_byte_lr_simulate_defs.
        Return bool, true if bitmap changed or insn otherwise necessary.
        All callers changed.  Simplify using df_word_lr_mark_ref.
        (df_word_lr_simulate_uses): Renamed from df_byte_lr_simulate_uses;
        all callers changed.  Simplify using df_word_lr_mark_ref.
        * lower-subreg.c: Include "dce.h"
        (decompose_multiword_subregs): Call run_word_dce if df available.
        * Makefile.in (lower-subreg.o): Adjust dependencies.
        (df-byte-scan.o): Delete.
        * timevar.def (TV_DF_WORD_LR): Renamed from TV_DF_BYTE_LR.

Removed:
    trunk/gcc/df-byte-scan.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/dce.c
    trunk/gcc/dce.h
    trunk/gcc/df-core.c
    trunk/gcc/df-problems.c
    trunk/gcc/df.h
    trunk/gcc/lower-subreg.c
    trunk/gcc/timevar.def


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: drow at gcc dot gnu dot org @ 2010-02-22 21:06 UTC
  To: gcc-bugs



------- Comment #5 from drow at gcc dot gnu dot org  2010-02-22 21:06 -------
(In reply to comment #3)
> * What is the purpose of insn 12 here?  It looks to me like this is dead code,
> since r5 is restored in insn 38 (although, not knowing ARM so well, I may be
> wrong).

I couldn't figure this out either.  Where did it come from? Was it created so
late that we never DCE'd it, or does something bizarre claim to depend on its
value?

> Note how the sched1 pass has switched the two insns around. The register
> allocator now decides to use two new registers here, because r0 and r3 are both
> live. After RA, sched2 switches insn 9 and insn 10 again, and r2 and r3 become
> available in insn 10 -- but this is too late.
> 
> Question for the ARM maintainer now is: Why does sched1 want to swap insns 9
> and 10, when sched2 wants to swap them back again?

I'm guessing, but presumably we want to separate the mul from the mla because
they're dependent; the umull isn't.  But I don't know what would swap them back
again, and that's probably the crux.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: steven at gcc dot gnu dot org @ 2010-02-08 10:52 UTC
  To: gcc-bugs



------- Comment #4 from steven at gcc dot gnu dot org  2010-02-08 10:51 -------
Add an ARM guy to the CC:


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: steven at gcc dot gnu dot org @ 2010-02-08 10:47 UTC
  To: gcc-bugs



------- Comment #3 from steven at gcc dot gnu dot org  2010-02-08 10:47 -------
Trunk today produces this (with -dAP hacked to print slim RTL):

        .file   "t.c"
        .text
        .align  2
        .global longfunc
        .type   longfunc, %function
longfunc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        @ basic block 2
@    8 ip:SI=r2:SI*r1:SI
@      REG_DEAD: r1:SI
        mul     ip, r2, r1      @ 8     *arm_mulsi3/2   [length = 4]
@   35 {[--sp:SI]=unspec[r4:SI] 2;use r5:SI;}
@      REG_DEAD: r5:SI
@      REG_DEAD: r4:SI
@      REG_FRAME_RELATED_EXPR: sequence
        stmfd   sp!, {r4, r5}   @ 35    *push_multi     [length = 4]
@    9 r1:SI=r0:SI*r3:SI+ip:SI
@      REG_DEAD: ip:SI
@      REG_DEAD: r3:SI
@      REG_DEAD: r0:SI
        mla     r1, r0, r3, ip  @ 9     *mulsi3addsi/2  [length = 4]
@   10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@      REG_DEAD: r2:SI
        umull   r4, r5, r2, r0  @ 10    *umulsidi3_nov6 [length = 4]
@   11 r1:SI=r1:SI+r5:SI
@      REG_DEAD: r5:SI
        add     r1, r1, r5      @ 11    *arm_addsi3/1   [length = 4]
@   12 r5:SI=r1:SI
        mov     r5, r1  @ 12    *arm_movsi_insn/1       [length = 4]
@   31 r0:SI=r4:SI
        mov     r0, r4  @ 31    *arm_movsi_insn/1       [length = 4]
@   38 unspec/v{return;}
        ldmfd   sp!, {r4, r5}
        bx      lr
        .size   longfunc, .-longfunc
        .ident  "GCC: (GNU) 4.5.0 20100208 (experimental) [trunk revision 156595]"

Questions for those who know ARM:

* What is the purpose of insn 12 here?  It looks to me like this is dead code,
since r5 is restored in insn 38 (although, not knowing ARM so well, I may be
wrong).


* After combine we have these two insns:

    9 r138:SI=r142:SI*r3:SI+r139:SI
      REG_DEAD: r3:SI
      REG_DEAD: r139:SI
   10 r137:DI=zero_extend(r144:SI)*zero_extend(r142:SI)
      REG_DEAD: r144:SI
      REG_DEAD: r142:SI

which translate to the mla insn and to the umull insn that uses r4 and r5:

@   10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@      REG_DEAD: r2:SI
        umull   r4, r5, r2, r0  @ 10    *umulsidi3_nov6 [length = 4]
@    9 r1:SI=r0:SI*r3:SI+ip:SI
@      REG_DEAD: ip:SI
@      REG_DEAD: r3:SI
@      REG_DEAD: r0:SI
        mla     r1, r0, r3, ip  @ 9     *mulsi3addsi/2  [length = 4]

Note how the sched1 pass has switched the two insns around. The register
allocator now decides to use two new registers here, because r0 and r3 are both
live. After RA, sched2 switches insn 9 and insn 10 again, and r2 and r3 become
available in insn 10 -- but this is too late.

Question for the ARM maintainer now is: Why does sched1 want to swap insns 9
and 10, when sched2 wants to swap them back again?

(Note, btw, how wrong the REG_DEAD notes are: they say r0 dies in insn 9, yet
insn 10 still uses it, because the sched2 pass fails to update the notes when it
moves insn 9 before insn 10. But that's a separate issue...)


* If I compile with -fno-schedule-insns, I still don't get the optimal code:

        mul     ip, r2, r1
        str     r4, [sp, #-4]!
        mla     r1, r0, r3, ip
        umull   r3, r4, r2, r0
        add     r1, r1, r4
        mov     r4, r1
        mov     r0, r3
        ldmfd   sp!, {r4}
        bx      lr

This time the compiler chooses to use r3:DI in the umull, instead of r2:DI (that
is, r2 and r3). I am guessing this may be a target REG_ALLOC_ORDER issue, where
r3 comes before r2. That's another thing for a target maintainer to look into.
If IRA selected r2:DI, you would also lose the save/restore of r4 and get
the perfect code of comment #2.


So two issues:
1. Why does the sched1 pass schedule insn 10 before insn 9?
2. With -fno-schedule-insns, why does IRA prefer (r3,r4) over (r2,r3)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575



* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: ramana at gcc dot gnu dot org @ 2010-01-04 10:54 UTC
  To: gcc-bugs



------- Comment #2 from ramana at gcc dot gnu dot org  2010-01-04 10:54 -------
Confirmed. With trunk I get:

longfunc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mul     r1, r2, r1
        mla     r1, r0, r3, r1
        stmfd   sp!, {r4, r5}
        umull   r4, r5, r2, r0
        add     r1, r1, r5
        mov     r0, r4
        mov     r5, r1
        ldmfd   sp!, {r4, r5}
        bx      lr

r4 and r5 need not be used here; r2 and r3 would do instead, i.e.:
        mul     r1, r2, r1
        mla     r1, r0, r3, r1
        umull   r2, r3, r2, r0
        add     r1, r1, r3
        mov     r0, r2
        bx      lr


-- 

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|target                      |rtl-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization, ra
   Last reconfirmed|0000-00-00 00:00:00         |2010-01-04 10:54:28
               date|                            |
            Summary|arm-eabi-gcc 4.2.1 64-bit   |arm-eabi-gcc 64-bit multiply
                   |multiply weirdness          |weirdness


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

