From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 13307 invoked by alias); 12 Jul 2012 10:58:06 -0000
Received: (qmail 13299 invoked by uid 22791); 12 Jul 2012 10:58:06 -0000
X-SWARE-Spam-Status: No, hits=-3.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,TW_DR
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 12 Jul 2012 10:57:53 +0000
From: "gregpsmith at live dot co.uk"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/53938] New: ARM target generates sub-optimal code (extra instructions) on load from memory
Date: Thu, 12 Jul 2012 10:58:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: gregpsmith at live dot co.uk
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-07/txt/msg00941.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

             Bug #: 53938
           Summary: ARM target generates sub-optimal code (extra
                    instructions) on load from memory
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: gregpsmith@live.co.uk

Created attachment 27781
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27781
Example C source code

We are targeting an embedded device and we do a lot of work accessing an FPGA (but this applies just as well to memory access).
It has annoyed me for years that GCC emits unnecessary code, wasting memory and cycles, when reading 8-bit and 16-bit values. The attached source shows opportunities to generate better code. When compiled with:

    gcc -c -O3 -mcpu=arm946e-s codegen.c

it compiles to (I have added comments):

    mov   r2, #0xE0000000        // base address of the device
    ldrb  r1, [r2]               // load an unsigned byte, 0 extend
    ldrb  r12, [r2]              // load signed byte - WHY NOT ldrsb?
    and   r1, r1, #0xFF          // WHAT IS THIS FOR?
    ldrh  r3, [r2]               // load unsigned short
    tst   r1, #0x80              // if (i & 0x80)
    movne r1, #0                 //   i = 0
    lsl   r12, r12, #24          // sign extend j (but could be avoided)
    tst   r3, #0x80              // if (k & 0x80)
    ldrh  r0, [r2]               // load signed short - WHY NOT ldrsh?
    movne r3, #0                 //   k = 0
    add   r1, r1, r12, asr #24   // add sign extended
    add   r3, r1, r3
    lsl   r0, r0, #16            // sign extend l
    add   r0, r3, r0, asr #16
    bx    lr

There are two issues:

1) There is a completely redundant and r1, r1, #0xFF. The preceding ldrb has already zero-extended the byte into r1, so the mask changes nothing. This does not occur when loading the unsigned short (which is why I have the similar code for loading an unsigned short).

2) There is unnecessary sign extension taking place. Each signed value is loaded with an unsigned load and then sign-extended with a shift pair (lsl, then asr). ARM has allowed signed loads of 8-bit and 16-bit values since v4. Spotting this has to be opportunistic as there are offset restrictions.

Ideally the code would look like:

    mov   r2, #0xE0000000        // base address of the device
    ldrb  r1, [r2]               // load an unsigned byte, 0 extend
    ldrsb r12, [r2]              // load signed byte
    ldrh  r3, [r2]               // load unsigned short
    tst   r1, #0x80              // if (i & 0x80)
    movne r1, #0                 //   i = 0
    tst   r3, #0x80              // if (k & 0x80)
    ldrsh r0, [r2]               // load signed short, extend to 32-bits
    movne r3, #0                 //   k = 0
    add   r1, r1, r12            // add sign extended
    add   r3, r1, r3
    add   r0, r3, r0
    bx    lr
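
For readers without the attachment to hand, the C below is a rough sketch of source that would plausibly give the four loads, two tests and three additions seen in the listings above. It is a reconstruction from the assembly comments and not the attached codegen.c. The function name sum_reads, the use of the <stdint.h> types and the pointer parameter are all assumptions; the listing reads from the fixed address 0xE0000000, but a parameter is used here so the function can also be run on a host.

```c
#include <stdint.h>

/*
 * Hypothetical reconstruction, NOT the attached codegen.c.
 * The names i, j, k, l follow the comments in the assembly listing;
 * everything else (function name, types, parameter) is assumed.
 * The volatile qualifier forces four separate loads, as in the listing.
 */
int sum_reads(const volatile void *base)
{
    uint8_t  i = *(const volatile uint8_t  *)base;  /* unsigned byte: ldrb          */
    int8_t   j = *(const volatile int8_t   *)base;  /* signed byte: ldrsb would do  */
    uint16_t k = *(const volatile uint16_t *)base;  /* unsigned short: ldrh         */
    int16_t  l = *(const volatile int16_t  *)base;  /* signed short: ldrsh would do */

    if (i & 0x80)
        i = 0;
    if (k & 0x80)
        k = 0;

    /* All four operands are promoted to int before the additions. */
    return i + j + k + l;
}
```

On the target this would be called as sum_reads((const volatile void *)0xE0000000u); on a host it can be pointed at any suitably aligned 16-bit object to check the arithmetic.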