From: "steven at gcc dot gnu dot org"
To: gcc-bugs@gcc.gnu.org
Date: Mon, 08 Feb 2010 10:47:00 -0000
Subject: [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness

------- Comment #3 from steven at gcc dot gnu dot org  2010-02-08 10:47 -------
Trunk today produces this (with -dAP hacked to print slim RTL):

        .file   "t.c"
        .text
        .align  2
        .global longfunc
        .type   longfunc, %function
    longfunc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        @ basic block 2
        @ 8 ip:SI=r2:SI*r1:SI
        @ REG_DEAD: r1:SI
        mul     ip, r2, r1          @ 8  *arm_mulsi3/2       [length = 4]
        @ 35 {[--sp:SI]=unspec[r4:SI] 2;use r5:SI;}
        @ REG_DEAD: r5:SI
        @ REG_DEAD: r4:SI
        @ REG_FRAME_RELATED_EXPR: sequence
        stmfd   sp!, {r4, r5}       @ 35 *push_multi         [length = 4]
        @ 9 r1:SI=r0:SI*r3:SI+ip:SI
        @ REG_DEAD: ip:SI
        @ REG_DEAD: r3:SI
        @ REG_DEAD: r0:SI
        mla     r1, r0, r3, ip      @ 9  *mulsi3addsi/2      [length = 4]
        @ 10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
        @ REG_DEAD: r2:SI
        umull   r4, r5, r2, r0      @ 10 *umulsidi3_nov6     [length = 4]
        @ 11 r1:SI=r1:SI+r5:SI
        @ REG_DEAD: r5:SI
        add     r1, r1, r5          @ 11 *arm_addsi3/1       [length = 4]
        @ 12 r5:SI=r1:SI
        mov     r5, r1              @ 12 *arm_movsi_insn/1   [length = 4]
        @ 31 r0:SI=r4:SI
        mov     r0, r4              @ 31 *arm_movsi_insn/1   [length = 4]
        @ 38 unspec/v{return;}
        ldmfd   sp!, {r4, r5}
        bx      lr
        .size   longfunc, .-longfunc
        .ident  "GCC: (GNU) 4.5.0 20100208 (experimental) [trunk revision 156595]"

Questions for those who know ARM:

* What is the purpose of insn 12 here?
  It looks to me like this is dead code, since r5 is restored in insn 38
  (although, not knowing ARM so well, I may be wrong).

* After combine we have these two insns:

     9 r138:SI=r142:SI*r3:SI+r139:SI
       REG_DEAD: r3:SI
       REG_DEAD: r139:SI
    10 r137:DI=zero_extend(r144:SI)*zero_extend(r142:SI)
       REG_DEAD: r144:SI
       REG_DEAD: r142:SI

  which translate to the mla insn and to the umull insn that uses r4 and r5:

        @ 10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
        @ REG_DEAD: r2:SI
        umull   r4, r5, r2, r0      @ 10 *umulsidi3_nov6     [length = 4]
        @ 9 r1:SI=r0:SI*r3:SI+ip:SI
        @ REG_DEAD: ip:SI
        @ REG_DEAD: r3:SI
        @ REG_DEAD: r0:SI
        mla     r1, r0, r3, ip      @ 9  *mulsi3addsi/2      [length = 4]

  Note how the sched1 pass has switched the two insns around. The register
  allocator now decides to use two new registers (r4 and r5) here, because
  r0 and r3 are both live. After RA, sched2 switches insns 9 and 10 back
  again, and r2 and r3 become available in insn 10 -- but by then it is too
  late.

  Question for the ARM maintainer now is: why does sched1 want to swap
  insns 9 and 10, when sched2 wants to swap them back again?

  (Note, btw, how wrong the REG_DEAD notes are: r0 is marked as dying in
  insn 9 but is still used in insn 10, because the sched2 pass fails to
  update the notes when it moves insn 9 before insn 10. But that's a
  separate issue...)

* If I compile with -fno-schedule-insns, I still don't get the optimal code:

        mul     ip, r2, r1
        str     r4, [sp, #-4]!
        mla     r1, r0, r3, ip
        umull   r3, r4, r2, r0
        add     r1, r1, r4
        mov     r4, r1
        mov     r0, r3
        ldmfd   sp!, {r4}
        bx      lr

  This time the compiler chooses r3:DI (r3 and r4) for the umull result,
  instead of r2:DI (that is, r2 and r3). I am guessing this may be a target
  REG_ALLOC_ORDER issue, where r3 comes before r2. That's another thing for
  a target maintainer to look into. If IRA selected r2:DI, the save/restore
  of r4 would also go away and you would get the perfect code of comment #2.

So two issues:

  1. Why does the sched1 pass schedule insn 10 before insn 9?
  2. With -fno-schedule-insns, why does IRA prefer (r3,r4) over (r2,r3)?
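For readers less familiar with ARM, here is a rough C model of what the mul/mla/umull/add sequence in the listings computes. It assumes (t.c is not quoted here) that longfunc is a plain 64-bit `long long` multiply, and that, per the AAPCS on little-endian ARM, the first argument arrives in r0 (low word) / r1 (high word) and the second in r2 (low) / r3 (high). The helper name mul64_parts is hypothetical; it only models the arithmetic, not anything GCC generates internally.

```c
#include <stdint.h>

/* Hypothetical helper modelling the emitted sequence.
   Assumption: a is passed in r0 (low) / r1 (high), b in r2 (low) / r3 (high). */
uint64_t mul64_parts(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32); /* r0, r1 */
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32); /* r2, r3 */

    uint32_t cross = b_lo * a_hi;             /* mul   ip, r2, r1     (low 32 bits only) */
    uint32_t hi = a_lo * b_hi + cross;        /* mla   r1, r0, r3, ip                    */
    uint64_t lo_prod = (uint64_t)b_lo * a_lo; /* umull r4, r5, r2, r0 (full 64 bits)     */
    hi += (uint32_t)(lo_prod >> 32);          /* add   r1, r1, r5                        */

    /* The result goes back in r0 (low word) and r1 (high word). */
    return ((uint64_t)hi << 32) | (uint32_t)lo_prod;
}
```

Only the low 32 bits of the two cross products can reach the high word of a 64-bit result, which is why one mul plus one mla is enough there; the widening umull is needed only for the low-word product.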
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575