Date: Fri, 25 Dec 2009 10:12:00 -0000
From: "sliao at google dot com"
To: gcc-bugs@gcc.gnu.org
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug rtl-optimization/42499] New: Bad register allocation in multiplication code by constant

It seems that GCC 4.2.1 generates better code than GCC 4.4.0 in this case.

The following code (extracted from Android's Dalvik_java_lang_System_currentTimeMillis in native/java_lang_System.c):

    // compilation options: -march=armv5te -mthumb -Os
    struct timeval {
        long tv_sec;
        long tv_usec;
    };
    extern void get_time(struct timeval*);
    void test(long long *res) {
        struct timeval tv;
        get_time(&tv);
        *res = tv.tv_sec * 1000LL + tv.tv_usec / 1000;
    }

is compiled by gcc-4.4.0 in a sub-optimal way: it takes 110 bytes (vs. 74 bytes when compiled by gcc-4.2.1). The assembly output shows that gcc-4.4.0 spills some registers to the stack because the code that multiplies by 1000LL uses more registers than it needs (that is, more than it uses when compiled by gcc-4.2.1). The multiplication code is similar, but gcc 4.4 emits several additional MOVs that can easily be eliminated.

This bug can be demonstrated more easily with tv_sec multiplied by 10 and the tv_usec / 1000 term removed.
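For reference, a minimal sketch of what that reduced case might look like in C. The names test_mul10, mul10_shift_add and check_mul10, as well as the get_time stub and its fixed value, are illustrative assumptions and not part of the original source. The second function spells out the shift-and-add decomposition x * 10 == ((x << 2) + x) << 1, which is how both listings compute the 64-bit product.

```c
#include <assert.h>
#include <stdint.h>

struct timeval {
    long tv_sec;
    long tv_usec;
};

/* Hypothetical stub: the report only declares get_time() as extern.
   The fixed value is an arbitrary assumption so the example can run. */
void get_time(struct timeval *tv)
{
    tv->tv_sec = 123456789L;
    tv->tv_usec = 0;
}

/* Reduced test case: multiply tv_sec by 10, tv_usec term removed.
   (Renamed from the report's test(); the new name is illustrative.) */
void test_mul10(long long *res)
{
    struct timeval tv;
    get_time(&tv);
    *res = tv.tv_sec * 10LL;
}

/* Shift-and-add model of the multiplication: x * 10 == ((x << 2) + x) << 1.
   Unsigned arithmetic is used so that shifting negative values is well defined. */
long long mul10_shift_add(long x)
{
    uint64_t wide = (uint64_t)(int64_t)x; /* sign-extend to 64 bits (the asr #31 step) */
    uint64_t x5 = (wide << 2) + wide;     /* x * 5: the lsl #2, add, adc steps */
    return (long long)(x5 << 1);          /* x * 10: the final lsl #1 across both words */
}

/* Compare the shift-and-add model against a plain multiplication. */
void check_mul10(void)
{
    long samples[] = { 0L, 1L, -1L, 123456789L, -123456789L };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(mul10_shift_add(samples[i]) == samples[i] * 10LL);
}
```

Compiling test_mul10 with the options from the report (-march=armv5te -mthumb -Os) should give code comparable to the listings; the exact output depends on the compiler version.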
gcc 4.2.1:

    push {r4, r5, lr}
    sub sp, sp, #12
    mov r5, r0
    mov r0, sp
    bl get_time
    ldr r2, [sp]
    add sp, sp, #12
    @ sp needed for prologue
    asr r4, r2, #31
    mov r3, r2
    lsr r0, r2, #30
    lsl r2, r4, #2
    orr r2, r2, r0
    lsl r1, r3, #2
    add r1, r1, r3
    adc r2, r2, r4
    lsr r0, r1, #31
    lsl r4, r2, #1
    orr r4, r4, r0
    lsl r3, r1, #1
    str r3, [r5]
    str r4, [r5, #4]
    pop {r4, r5, pc}

gcc 4.4.0:

    push {r4, r5, r6, r7, lr}   // note that gcc 4.2.1 uses only {r4, r5, lr}
    sub sp, sp, #12
    mov r4, r0
    mov r0, sp
    bl get_time
    ldr r6, [sp]
    add sp, sp, #12
    @ sp needed for prologue
    mov r0, r6
    asr r6, r6, #31
    lsr r7, r0, #30
    lsl r3, r6, #2
    orr r3, r3, r7
    mov r1, r6                  // not needed actually, r6 can be used directly
    lsl r2, r0, #2
    add r0, r0, r2
    adc r1, r1, r3
    lsr r2, r0, #31
    lsl r3, r1, #1
    orr r3, r3, r2
    lsl r0, r0, #1
    str r0, [r4]
    str r3, [r4, #4]
    pop {r4, r5, r6, r7, pc}

-- 
           Summary: Bad register allocation in multiplication code by constant
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sliao at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499