From: "peter at cordes dot ca"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/67606] New: Missing optimization: load possible return value before early-out test
Date: Thu, 17 Sep 2015 06:08:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67606

            Bug ID: 67606
           Summary: Missing optimization: load possible return value
                    before early-out test
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

Sorry if this is a duplicate; I couldn't think of any good search terms to
describe this.

In a function that might return a constant if a loop condition is false for
the first iteration, gcc generates a separate basic block for the early-out
return when it doesn't need to.

int f(int a[], int length) {
  int count=0;
  for (int i = 0 ; i < length ; i++)
    if (a[i] > 5)
      count++;

  return count;
}

x86 gcc 5.2 -O3 -fno-tree-vectorize compiles it to:

f(int*, int):
        testl   %esi, %esi      # length
        jle     .L4     #,
        leal    -1(%rsi), %eax  #, D.2373
        leaq    4(%rdi,%rax,4), %rcx    #, D.2371
        xorl    %eax, %eax      # count
.L3:
        xorl    %edx, %edx      # D.2370
        cmpl    $5, (%rdi)      #, MEM[base: _29, offset: 0B]
        setg    %dl     #, D.2370
        addq    $4, %rdi        #, ivtmp.7
        addl    %edx, %eax      # D.2370, count
        cmpq    %rdi, %rcx      # ivtmp.7, D.2371
        jne     .L3     #,
        rep ret
.L4:
        xorl    %eax, %eax      # count
        ret

(This is actually g++ output, since I used godbolt.org, but gcc 4.9.2 looked
similar.)

A better sequence would be:

f(int*, int):
        xorl    %eax, %eax      # count
        testl   %esi, %esi      # length
        jle     .L4     #,
        ... unchanged ...
        cmpq    %rdi, %rcx      # ivtmp.7, D.2371
        jne     .L3     #,
.L4:
        rep ret

In this case, the ret was already a rep ret, so this change saves ~3
instruction bytes (or more usually none, because of padding) at zero speed
cost in either the taken or non-taken early-out case.

Also, this looks like a separate bug, but what the crap is gcc doing with the
two lea instructions? Defending against integer overflow when calculating a
one-past-the-end loop boundary pointer? It doesn't do that when f() takes a
size_t arg. (In this case, nothing depends on the lea results soon enough for
it to matter. They're dependent anyway, and the first one is one of the first
4 insns.)

        leal    -1(%rsi), %eax  #, D.2373
        leaq    4(%rdi,%rax,4), %rcx    #, D.2371
        xorl    %eax, %eax      # count

Unless I'm missing something, leaq (%rdi, %rsi, 4), %rcx should be just as
safe. Is something adding the subtract-1-before-scaling, correct-after-scaling
thing for a reason? Is it something that doesn't realize it will be
implemented with lea, which avoids any problems? It's an extra instruction,
and 4(...) stores the offset as a 32bit constant in the insn encoding, not a
sign-extended 8bit value.