From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 3627 invoked by alias); 15 Jun 2011 12:02:48 -0000
Received: (qmail 3618 invoked by uid 22791); 15 Jun 2011 12:02:47 -0000
X-SWARE-Spam-Status: No, hits=-2.7 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,TW_DM
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 15 Jun 2011 12:02:31 +0000
From: "philb at gnu dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/49421] New: [arm] suboptimal choice of working regs
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: philb at gnu dot org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Wed, 15 Jun 2011 12:02:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-06/txt/msg01324.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49421

           Summary: [arm] suboptimal choice of working regs
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: philb@gnu.org

If a leaf function requires one more working register than can be
accommodated in the call-clobbered set, gcc currently tends to push r4 and
use that next. However, in the specific case of a leaf function, it would be
better to push lr and use that as the working register, since then the return
can be done with a single pop.

Consider the made-up example:

int f(int *a, int *b, int *c, int *d)
{
  int i;
  for (i = 0; i < 4; i++)
    if (a[i] || b[i] || c[i] || d[i])
      return 1;
  return 0;
}

which compiles (-march=armv6 -mtune=arm1136jf-s -O2) to:

f:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov     ip, #0
        str     r4, [sp, #-4]!
.L3:
        ldr     r4, [r0, ip]
        cmp     r4, #0
        bne     .L7
        ldr     r4, [r1, ip]
        cmp     r4, #0
        bne     .L7
        ldr     r4, [r2, ip]
        cmp     r4, #0
        bne     .L7
        ldr     r4, [r3, ip]
        add     ip, ip, #4
        cmp     r4, #0
        bne     .L7
        cmp     ip, #16
        bne     .L3
        mov     r0, r4
.L2:
        ldmfd   sp!, {r4}
        bx      lr
.L7:
        mov     r0, #1
        b       .L2

If lr had been pushed instead of r4, the return could have simply been
"pop {pc}", replacing the current "ldmfd sp!, {r4}" / "bx lr" pair.

Also, since this is arm11, it is no more expensive to push two words than one.
If the compiler had stacked both r4 and lr, it would have freed up an extra
register for the loop, which would probably have allowed the loads to be
scheduled better.
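
For illustration, here is a hand-written sketch of what the first suggestion might look like for f. This is not compiler output and has not been benchmarked. It assumes ARM (not Thumb) state, where lr is usable as an ordinary working register once its value has been saved, and an ARMv5T or later core, so that loading pc from the stack interworks correctly. The label names simply mirror the listing above.

```asm
f:
        push    {lr}              @ save the return address; lr is now free as scratch
        mov     ip, #0            @ ip = byte offset into the four arrays
.L3:
        ldr     lr, [r0, ip]      @ a[i]
        cmp     lr, #0
        bne     .L7
        ldr     lr, [r1, ip]      @ b[i]
        cmp     lr, #0
        bne     .L7
        ldr     lr, [r2, ip]      @ c[i]
        cmp     lr, #0
        bne     .L7
        ldr     lr, [r3, ip]      @ d[i]
        add     ip, ip, #4
        cmp     lr, #0
        bne     .L7
        cmp     ip, #16           @ four ints processed?
        bne     .L3
        mov     r0, #0            @ every element was zero
        pop     {pc}              @ return: saved lr goes straight into pc
.L7:
        mov     r0, #1            @ found a non-zero element
        pop     {pc}
```

Compared with the listing above, each exit path ends in a single pop rather than a load of r4 followed by "bx lr", and the branch back to a shared epilogue at .L2 is no longer needed.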