From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-497374-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 110713 invoked by alias); 17 Sep 2015 06:08:39 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 107394 invoked by uid 48); 17 Sep 2015 06:08:34 -0000
From: "peter at cordes dot ca" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/67606] New: Missing optimization: load possible return value before early-out test
Date: Thu, 17 Sep 2015 06:08:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Version: 5.2.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: peter at cordes dot ca
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-67606-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-09/txt/msg01352.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67606

            Bug ID: 67606
           Summary: Missing optimization: load possible return value
                    before early-out test
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

Sry if this is a duplicate; I couldn't think of any good search terms to
describe this.


In a function that might return a constant if a loop condition is false for the
first iteration, gcc generates a separate basic block for the early-out return
when it doesn't need to.

int f(int a[], int length) {
  int count=0;
  for (int i = 0 ; i < length ; i++)
    if (a[i] > 5)
      count++;

  return count;
}

x86 gcc 5.2 -O3 -fno-tree-vectorize compiles it to:

f(int*, int):
        testl   %esi, %esi      # length
        jle     .L4     #,
        leal    -1(%rsi), %eax  #, D.2373
        leaq    4(%rdi,%rax,4), %rcx    #, D.2371
        xorl    %eax, %eax      # count
.L3:
        xorl    %edx, %edx      # D.2370
        cmpl    $5, (%rdi)      #, MEM[base: _29, offset: 0B]
        setg    %dl     #, D.2370
        addq    $4, %rdi        #, ivtmp.7
        addl    %edx, %eax      # D.2370, count
        cmpq    %rdi, %rcx      # ivtmp.7, D.2371
        jne     .L3     #,
        rep ret
.L4:
        xorl    %eax, %eax      # count
        ret

(actually g++, since I used godbolt.org, but gcc 4.9.2 looked similar)


A better sequence would be:

f(int*, int):
        xorl    %eax, %eax      # count
        testl   %esi, %esi      # length
        jle     .L4     #,

... unchanged ...

        cmpq    %rdi, %rcx      # ivtmp.7, D.2371
        jne     .L3     #,
.L4:
        rep ret


In this case, the ret was already a rep ret, so this change saves ~3
instruction bytes, (or more usually, none, because of padding) at zero speed
cost in either the taken or non-taken early-out case.


Also, looks like a separate bug, but what the crap is gcc doing with the two
lea instructions?  Defending against integer overflow when calculating a
one-past-the end loop boundary pointer?  It doesn't do that when f() takes a
size_t arg.    (In this case, nothing depends on the lea results soon enough
for it to matter.  They're dependent, anyway, and the first one is one of the
first 4 insns.

leal    -1(%rsi), %eax  #, D.2373
        leaq    4(%rdi,%rax,4), %rcx    #, D.2371
        xorl    %eax, %eax      # count

Unless I'm missing something, leaq (%rdi, %rsi, 4), %rcx  should be just as
safe.  Is something adding the -1-before-scaling, correct after scaling thing
for a reason?  Is it something that doesn't realize it will be implemented with
lea, which avoids any problems?  It's an extra instruction, and 4(...) stores
the offset as a 32bit constant in the insn encoding, not a sign-extended 8bit
value.