Return-Path: <gcc-bugs-return-307922-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 1486 invoked by alias); 8 Feb 2010 10:47:21 -0000
Received: (qmail 1346 invoked by uid 48); 8 Feb 2010 10:47:10 -0000
Date: Mon, 08 Feb 2010 10:47:00 -0000
Message-ID: <20100208104710.1345.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References: <bug-42575-17572@http.gcc.gnu.org/bugzilla/>
Subject: [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
In-Reply-To: <bug-42575-17572@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-02/txt/msg00674.txt.bz2


------- Comment #3 from steven at gcc dot gnu dot org  2010-02-08 10:47 -------
Trunk today produces this (with -dAP hacked to print slim RTL):

        .file   "t.c"
        .text
        .align  2
        .global longfunc
        .type   longfunc, %function
longfunc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        @ basic block 2
@    8 ip:SI=r2:SI*r1:SI
@      REG_DEAD: r1:SI
        mul     ip, r2, r1      @ 8     *arm_mulsi3/2   [length = 4]
@   35 {[--sp:SI]=unspec[r4:SI] 2;use r5:SI;}
@      REG_DEAD: r5:SI
@      REG_DEAD: r4:SI
@      REG_FRAME_RELATED_EXPR: sequence
        stmfd   sp!, {r4, r5}   @ 35    *push_multi     [length = 4]
@    9 r1:SI=r0:SI*r3:SI+ip:SI
@      REG_DEAD: ip:SI
@      REG_DEAD: r3:SI
@      REG_DEAD: r0:SI
        mla     r1, r0, r3, ip  @ 9     *mulsi3addsi/2  [length = 4]
@   10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@      REG_DEAD: r2:SI
        umull   r4, r5, r2, r0  @ 10    *umulsidi3_nov6 [length = 4]
@   11 r1:SI=r1:SI+r5:SI
@      REG_DEAD: r5:SI
        add     r1, r1, r5      @ 11    *arm_addsi3/1   [length = 4]
@   12 r5:SI=r1:SI
        mov     r5, r1  @ 12    *arm_movsi_insn/1       [length = 4]
@   31 r0:SI=r4:SI
        mov     r0, r4  @ 31    *arm_movsi_insn/1       [length = 4]
@   38 unspec/v{return;}
        ldmfd   sp!, {r4, r5}
        bx      lr
        .size   longfunc, .-longfunc
        .ident  "GCC: (GNU) 4.5.0 20100208 (experimental) [trunk revision 156595]"
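For readers less familiar with ARM, the mul/mla/umull/add sequence above is the usual schoolbook split of a 64x64->64 multiply into 32-bit pieces. The source t.c is not shown in this comment. The sketch below assumes that `longfunc` takes two 64-bit arguments (a in r0:r1, b in r2:r3 under the AAPCS, low word first) and returns their product, and it mirrors each insn in C. The name `mul64_by_parts` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper mirroring insns 8-11 above: a 64x64->64 multiply
   built only from 32-bit operations (assumes a in r0:r1, b in r2:r3). */
uint64_t mul64_by_parts(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* insn 8, mul:   ip = r2 * r1, i.e. b_lo * a_hi (low 32 bits kept) */
    uint32_t ip = b_lo * a_hi;
    /* insn 9, mla:   r1 = r0 * r3 + ip, i.e. a_lo * b_hi + ip */
    uint32_t hi = a_lo * b_hi + ip;
    /* insn 10, umull: full 64-bit product of the two low words */
    uint64_t lo_prod = (uint64_t)b_lo * a_lo;
    /* insn 11, add:  fold the upper half of that product into hi */
    hi += (uint32_t)(lo_prod >> 32);

    /* Result returned as r0 = low word, r1 = high word. */
    return ((uint64_t)hi << 32) | (uint32_t)lo_prod;
}
```

The a_hi * b_hi term is never computed because it only affects bits 64 and up. The same sequence serves signed `long long`, since the low 64 bits of a product do not depend on signedness.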

Questions for those who know ARM:

* What is the purpose of insn 12 here?  It looks to me like this is dead code,
since r5 is restored in insn 38 (although, not knowing ARM so well, I may be
wrong).


* After combine we have these two insns:

    9 r138:SI=r142:SI*r3:SI+r139:SI
      REG_DEAD: r3:SI
      REG_DEAD: r139:SI
   10 r137:DI=zero_extend(r144:SI)*zero_extend(r142:SI)
      REG_DEAD: r144:SI
      REG_DEAD: r142:SI

which translate to the mla insn and to the umull insn that uses r4 and r5:

@   10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@      REG_DEAD: r2:SI
        umull   r4, r5, r2, r0  @ 10    *umulsidi3_nov6 [length = 4]
@    9 r1:SI=r0:SI*r3:SI+ip:SI
@      REG_DEAD: ip:SI
@      REG_DEAD: r3:SI
@      REG_DEAD: r0:SI
        mla     r1, r0, r3, ip  @ 9     *mulsi3addsi/2  [length = 4]

Note how the sched1 pass has swapped the two insns. The register allocator
now picks two new registers (r4 and r5) for the umull result, because r0 and
r3 are both still live at insn 10 (insn 9 reads them afterwards). After RA,
sched2 swaps insns 9 and 10 back, so r2 and r3 become available at insn 10,
but by then it is too late.
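To see why the swap matters, here is a toy liveness count over the two after-combine insns. It is a sketch, not how IRA measures pressure; the bit encoding and names such as `max_pressure` are made up for illustration. It assumes r137 and r138 are live out of the pair, since later insns add r138 to the high word of r137 and return the low word.

```c
#include <stdint.h>

/* Toy liveness model (hypothetical, not GCC code): each value from the
   after-combine RTL gets one bit; the DImode result r137 takes two. */
enum { R3 = 1 << 0, R139 = 1 << 1, R142 = 1 << 2, R144 = 1 << 3,
       R138 = 1 << 4, R137_LO = 1 << 5, R137_HI = 1 << 6 };

struct insn { uint32_t use, def; };

/* insn 9:  r138 = r142 * r3 + r139 */
static const struct insn INSN9  = { R142 | R3 | R139, R138 };
/* insn 10: r137:DI = zero_extend(r144) * zero_extend(r142) */
static const struct insn INSN10 = { R144 | R142, R137_LO | R137_HI };

static int popcount32(uint32_t x)
{
    int n = 0;
    for (; x; x &= x - 1)   /* clear the lowest set bit each round */
        n++;
    return n;
}

/* Walk the sequence backwards from the live-out set and return the largest
   number of values that occupy registers right after some insn writes. */
int max_pressure(const struct insn *seq, int n, uint32_t live_out)
{
    uint32_t live = live_out;
    int max = 0;
    for (int i = n - 1; i >= 0; i--) {
        int p = popcount32(live | seq[i].def);
        if (p > max)
            max = p;
        live = (live & ~seq[i].def) | seq[i].use;   /* live before insn i */
    }
    return max;
}
```

With insn 9 first, `max_pressure` returns 3 for the live-out set `R137_LO | R137_HI | R138`; with insn 10 first it returns 5, because r142, r3 and r139 all stay live across the umull while its two-word result is written.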

Question for the ARM maintainer now is: Why does sched1 want to swap insns 9
and 10, when sched2 wants to swap them back again?

(Note, btw, that the REG_DEAD notes are wrong: insn 9 marks r0 as dead, yet
insn 10 still uses r0, because the sched2 pass does not update the notes when
it moves insn 9 before insn 10. But that's a separate issue...)
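A stale note of this kind is easy to detect mechanically. The toy checker below is hypothetical and works on hand-written use/def/dead masks rather than on real RTL; it flags a REG_DEAD note when a later insn reads the register before anything redefines it.

```c
#include <stdint.h>

/* Hypothetical toy checker: per insn, the hard registers it reads, writes,
   and the ones its REG_DEAD notes claim die there (bit n = rn, ip = r12). */
struct note_insn { int uid; uint32_t use, def, dead; };

/* Return the uid of the first insn whose REG_DEAD note is contradicted by a
   later read of that register (before any redefinition), or -1 if none. */
int first_stale_dead_note(const struct note_insn *seq, int n)
{
    for (int i = 0; i < n; i++) {
        uint32_t dead = seq[i].dead;
        for (int j = i + 1; j < n && dead; j++) {
            if (seq[j].use & dead)
                return seq[i].uid;   /* "dead" register is read again */
            dead &= ~seq[j].def;     /* a redefinition starts a new life */
        }
    }
    return -1;
}
```

Fed the final order from the listing at the top (insn 9 with REG_DEAD r0, r3 and ip, then insn 10 reading r2 and r0), it reports insn 9; fed the sched1 order (insn 10 first), it reports nothing.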


* If I compile with -fno-schedule-insns, I still don't get the optimal code:

        mul     ip, r2, r1
        str     r4, [sp, #-4]!
        mla     r1, r0, r3, ip
        umull   r3, r4, r2, r0
        add     r1, r1, r4
        mov     r4, r1
        mov     r0, r3
        ldmfd   sp!, {r4}
        bx      lr

This time the compiler chooses to use r3:DI (that is, r3 and r4) in the umull
instead of r2:DI (that is, r2 and r3). I am guessing this may be a target
REG_ALLOC_ORDER issue, where r3 comes before r2. That's another thing for a
target maintainer to look into. If IRA selected r2:DI, the save/restore of r4
would also go away, giving the perfect code of comment #2.
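The REG_ALLOC_ORDER guess can be illustrated with a deliberately simplified model. This is not what IRA does (IRA weighs costs, conflicts and callee-saved save/restore overhead). It is a hypothetical first-fit scan over an allocation order, assuming, as the comment implies, that r2, r3 and r4 are all free for the umull result.

```c
/* Hypothetical toy model, not IRA: scan the allocation order and take the
   first register r such that both r and r+1 are free, since a DImode value
   occupies two consecutive hard registers. Bit n of free_mask = rn. */
int pick_di_pair(const int *alloc_order, int n, unsigned free_mask)
{
    for (int i = 0; i < n; i++) {
        int r = alloc_order[i];
        if (r < 31 && ((free_mask >> r) & 1u) && ((free_mask >> (r + 1)) & 1u))
            return r;
    }
    return -1;   /* no free consecutive pair in this order */
}
```

With an order in which r3 precedes r2, the scan returns 3 (r3:r4, which forces a save/restore of the callee-saved r4); with r2 first it returns 2 (r2:r3).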


So two issues:
1. Why does the sched1 pass schedule insn 10 before insn 9?
2. With -fno-schedule-insns, why does IRA prefer (r3,r4) over (r2,r3)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575