From: "xinliangli at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/53090] New: suboptimal ivopt
Date: Mon, 23 Apr 2012 17:36:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53090

             Bug #: 53090
           Summary: suboptimal ivopt
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: xinliangli@gmail.com

When the attached benchmark is compiled with trunk gcc, the code generated for the hot memory-swap loop (line 60) is very inefficient. Both icc and llvm use two induction variables (IVs) and generate a tight 9-instruction loop, but gcc decides to use 3 IVs, and its loop-exit test code is weird and inefficient; it ends up producing an 11-instruction loop.
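To make the IV-count comparison concrete, here is a minimal C sketch of the two loop shapes. It rests on several assumptions that are not in the report: the truncated loop header is taken to be "i < j; i++, j--" (consistent with the incq/decq pair in the llvm loop quoted below), the benchmark's Aint is replaced by int, and gcc's .L18 loop is rendered by hand on the reading that %r8d holds k, so that j is recomputed as k - i for every exit test. The function names swap_two_ivs, swap_three_ivs and same_result are hypothetical; this is an interpretation of the assembly, not gcc's actual transformation.

```c
#include <assert.h>
#include <string.h>

/* Swap macro from the report, with the benchmark's Aint assumed to be int. */
#define XCH(x,y) { int t_mp; t_mp=(x); (x)=(y); (y)=t_mp; }

/* Two-IV shape (what icc and llvm are reported to produce).
   ASSUMPTION: the truncated loop header is "i < j; i++, j--". */
void swap_two_ivs(int *perm, int k)
{
    int i, j;
    for (i = 1, j = k - 1; i < j; i++, j--)
        XCH(perm[i], perm[j]);
}

/* Hand-written reading of gcc's .L18 loop, not actual gcc output.
   ASSUMPTION: %r8d holds k, so j is rebuilt as k - i for every exit test. */
void swap_three_ivs(int *perm, int k)
{
    int i = 1;                  /* IV 3: counter, like %ecx */
    if (!(i < k - i))           /* same entry test as i < j with j = k - 1 */
        return;
    int *p = &perm[1];          /* IV 1: left pointer, like %rdx */
    int *q = &perm[k - 1];      /* IV 2: right pointer, like %rsi */
    do {
        int t = *p;
        *p = *q;
        *q = t;
        p++;                    /* addq $4, %rdx */
        q--;                    /* subq $4, %rsi */
        i++;                    /* addl $1, %ecx */
    } while (i < k - i);        /* movl %r8d,%edi; subl %ecx,%edi; cmpl; jl */
}

/* Runs both shapes on identical copies of 0..15 and reports whether
   they leave the array in the same state (requires 0 <= k <= 16). */
int same_result(int k)
{
    int a[16], b[16];
    assert(k >= 0 && k <= 16);
    for (int n = 0; n < 16; n++)
        a[n] = b[n] = n;
    swap_two_ivs(a, k);
    swap_three_ivs(b, k);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions perform the same swaps (reversing perm[1..k-1]); the second only mirrors the extra bookkeeping the report points at in gcc's loop, namely a third IV and a subtraction feeding the exit compare on every iteration.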
#define XCH(x,y) { Aint t_mp; t_mp=(x); (x)=(y); (y)=t_mp; }
for( i=1, j=k-1 ; i
This Inner Loop Header: Depth=3
        movl    (%rbx,%rdi,4), %ebp
        movl    (%rbx,%rsi,4), %eax
        movl    %eax, (%rbx,%rdi,4)
        movl    %ebp, (%rbx,%rsi,4)
        decq    %rsi
        incq    %rdi
        cmpl    %edx, %edi
        leal    -1(%rdx), %edx
        jl      .LBB0_11

The gcc version:

.L18:
        movl    (%rdx), %edi
        addl    $1, %ecx
        movl    (%rsi), %eax
        movl    %eax, (%rdx)
        addq    $4, %rdx
        movl    %edi, (%rsi)
        movl    %r8d, %edi
        subq    $4, %rsi
        subl    %ecx, %edi
        cmpl    %edi, %ecx
        jl      .L18

However, gcc does the right thing on the extracted test case:

#define XCH(x,y) { int t_mp; t_mp=(x); (x)=(y); (y)=t_mp; }
void foo (int *perm, int k)
{
  int i,j;
  for( i=1, j=k-1 ; i