From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 5098 invoked by alias); 26 May 2010 05:12:38 -0000
Received: (qmail 4795 invoked by uid 48); 26 May 2010 05:12:27 -0000
Date: Wed, 26 May 2010 05:12:00 -0000
Subject: [Bug regression/44281] New: Global Register variable pessimisation and regression
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "adam at consulting dot net dot nz"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-05/txt/msg02899.txt.bz2

I am aware developers WONTFIX GCC being a pessimising compiler with respect to
some global register variable issues:

GCC is copying registers for no good reason whatsoever. Below is a very simple
example where gcc 3.3.6 does a better job of optimising the code. Unnecessary
copying of registers may also occur with local register variables.

#include <stdint.h>

register uint64_t global_flag_stack __asm__("rbx");

void push_flag_into_global_reg_var(uint64_t a, uint64_t b) {
  uint64_t flag = (a==b);
  global_flag_stack <<= 8;
  global_flag_stack  |= flag;
}

uint64_t push_flag_into_local_var(uint64_t a,
                                  uint64_t b,
                                  uint64_t local_flag_stack) {
  uint64_t flag = (a==b);
  local_flag_stack <<= 8;
  return local_flag_stack | flag;
}

int main() {
}

gcc-3.3 (GCC) 3.3.6 (Debian 1:3.3.6-15):
$ gcc-3.3 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400478 <push_flag_into_global_reg_var>:
  400478:       31 c0                   xor    eax,eax
  40047a:       48 39 f7                cmp    rdi,rsi
  40047d:       0f 94 c0                sete   al
  400480:       48 c1 e3 08             shl    rbx,0x8
  400484:       48 09 c3                or     rbx,rax
  400487:       c3                      ret

0000000000400488 <push_flag_into_local_var>:
  400488:       31 c0                   xor    eax,eax
  40048a:       48 39 f7                cmp    rdi,rsi
  40048d:       0f 94 c0                sete   al
  400490:       48 c1 e2 08             shl    rdx,0x8
  400494:       48 09 d0                or     rax,rdx
  400497:       c3                      ret
...

gcc-4.1 (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-29):
$ gcc-4.1 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400448 <push_flag_into_global_reg_var>:
  400448:       48 89 da                mov    rdx,rbx
  40044b:       31 c0                   xor    eax,eax
  40044d:       48 c1 e2 08             shl    rdx,0x8
  400451:       48 39 f7                cmp    rdi,rsi
  400454:       0f 94 c0                sete   al
  400457:       48 89 d3                mov    rbx,rdx
  40045a:       48 09 c3                or     rbx,rax
  40045d:       c3                      ret

000000000040045e <push_flag_into_local_var>:
  40045e:       48 c1 e2 08             shl    rdx,0x8
  400462:       31 c0                   xor    eax,eax
  400464:       48 39 f7                cmp    rdi,rsi
  400467:       0f 94 c0                sete   al
  40046a:       48 09 d0                or     rax,rdx
  40046d:       c3                      ret
...

gcc-4.5 (Debian 4.5.0-1) 4.5.0:
$ gcc-4.5 -Os flags.c && objdump -d -m i386:x86-64:intel a.out|less
...
0000000000400494 <push_flag_into_global_reg_var>:
  400494:       31 d2                   xor    edx,edx
  400496:       48 39 f7                cmp    rdi,rsi
  400499:       48 89 d8                mov    rax,rbx
  40049c:       0f 94 c2                sete   dl
  40049f:       48 c1 e0 08             shl    rax,0x8
  4004a3:       48 89 d3                mov    rbx,rdx
  4004a6:       48 09 c3                or     rbx,rax
  4004a9:       c3                      ret

00000000004004aa <push_flag_into_local_var>:
  4004aa:       48 89 d0                mov    rax,rdx
  4004ad:       31 d2                   xor    edx,edx
  4004af:       48 c1 e0 08             shl    rax,0x8
  4004b3:       48 39 f7                cmp    rdi,rsi
  4004b6:       0f 94 c2                sete   dl
  4004b9:       48 09 d0                or     rax,rdx
  4004bc:       c3                      ret
...

The object code that current GCC is generating is embarrassing compared with
GCC 3.3.6.

Is it also necessary to increase the code footprint of push_flag_into_local_var
when optimising for size (-Os) when compared to gcc 3.3.6 and 4.1.3?


-- 
           Summary: Global Register variable pessimisation and regression
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: adam at consulting dot net dot nz


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281
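
For readers unfamiliar with the idiom in the test case: both functions treat a
64-bit word as a stack of one-byte flags, shifting it left by 8 bits and
storing the comparison result (0 or 1) in the low byte. The following is a
minimal, portable sketch of that behaviour, not code from the report. It uses
only the local-variable form, so it does not depend on global register
variables or on x86-64. The names push_flag, top_flag, pop_flag and
flag_stack_demo are hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same operation as push_flag_into_local_var in the report: shift the
   stack left by one byte and store the comparison result (0 or 1) in
   the low byte. */
static uint64_t push_flag(uint64_t a, uint64_t b, uint64_t flag_stack)
{
    uint64_t flag = (a == b);
    flag_stack <<= 8;
    return flag_stack | flag;
}

/* Hypothetical helper (not in the report): read the most recently
   pushed flag without removing it. */
static uint64_t top_flag(uint64_t flag_stack)
{
    return flag_stack & 0xff;
}

/* Hypothetical helper (not in the report): discard the most recently
   pushed flag. */
static uint64_t pop_flag(uint64_t flag_stack)
{
    return flag_stack >> 8;
}

/* Hypothetical driver: push three comparison results onto an empty
   stack and return the packed word. */
static uint64_t flag_stack_demo(void)
{
    uint64_t s = 0;
    s = push_flag(1, 1, s);  /* equal:     s == 0x000001 */
    s = push_flag(1, 2, s);  /* not equal: s == 0x000100 */
    s = push_flag(3, 3, s);  /* equal:     s == 0x010001 */
    return s;
}
```

Calling flag_stack_demo() from main returns 0x010001: the newest flag is in
the low byte and the oldest flag is in the highest occupied byte. A 64-bit
word holds at most eight flags this way; a ninth push shifts the oldest one
out.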