From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-477196-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 31572 invoked by alias); 13 Feb 2015 18:04:23 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 31536 invoked by uid 48); 13 Feb 2015 18:04:20 -0000
From: "basile at opensource dot dyc.edu" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug regression/64812] [4.9 regression] x86 LibreOffice Build failure: undefined reference to acquire
Date: Fri, 13 Feb 2015 18:04:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: regression
X-Bugzilla-Version: 4.9.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: basile at opensource dot dyc.edu
X-Bugzilla-Status: WAITING
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-64812-4-2aksGWnuiR@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-64812-4@http.gcc.gnu.org/bugzilla/>
References: <bug-64812-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg01529.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64812

Anthony G. Basile <basile at opensource dot dyc.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |basile at opensource dot dyc.edu
--- Comment #4 from Anthony G. Basile <basile at opensource dot dyc.edu> ---
(In reply to Luke from comment #0)
> LibreOffice 4.2 or newer fails to build in a clean build environment on
> Linux x86-32 with gcc 4.9.0, 4.9.1, and 4.9.2. However, the build succeeds
> on an otherwise identical x86-64 system, and it also succeeds with gcc 4.8.2
> (on x86-32 and x86-64). The build also succeeds with clang 3.4 (on x86-32
> and x86-64).
>

We are seeing this in Gentoo on x86_64 with gcc-4.9.  See

https://bugs.gentoo.org/show_bug.cgi?id=538348

(In reply to Timo Teräs from comment #3)
> As a workaround removing -fvisibility-inlines-hidden from the build flags,
> makes things work in the libreoffice.

What confuses me is that some of our users are seeing this bug and others
are not.  This leads me to think it may be a C++ ABI mismatch, because we
allow our users to build their systems with any recent version of gcc (and
the C++ ABI emitted by 4.8 is not compatible with that emitted by 4.9).
However, the fact that removing -fvisibility-inlines-hidden fixes this argues
against my suspicion.  So why would some of our users hit this and others not?
>>From gcc-bugs-return-477197-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Fri Feb 13 18:22:59 2015
Return-Path: <gcc-bugs-return-477197-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 26529 invoked by alias); 13 Feb 2015 18:22:59 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 26508 invoked by uid 48); 13 Feb 2015 18:22:54 -0000
From: "linux at horizon dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/65056] New: Missed optimization (x86): Redundant test/compare after arithmetic operations
Date: Fri, 13 Feb 2015 18:22:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: linux at horizon dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_file_loc bug_status bug_severity priority component assigned_to reporter cf_gcchost cf_gcctarget
Message-ID: <bug-65056-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg01530.txt.bz2
Content-length: 5610

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65056

            Bug ID: 65056
           Summary: Missed optimization (x86): Redundant test/compare
                    after arithmetic operations
           Product: gcc
           Version: 5.0
               URL: http://marc.info/?l=linux-kernel&m=142373514630907
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linux at horizon dot com
              Host: i586-linux-gnu
            Target: x86_64-*-*, i?86-*-*

The following code seems to miss some really obvious optimizations on
x86, both in -m32 and -m64.  Gcc is generating separate test & compare
instructions for conditions which are already available in the condition
codes set by the preceding arithmetic operations.

Bugs #30455 and #31799 are similar, but the problems there are caused by a
memory destination, which isn't the case here.

This happens with -O2, -O3 and -Os and with various models specified, so it
doesn't appear to be some obscure model-specific optimization.

Versions tested:
* gcc (Debian 4.9.2-10) 4.9.2
* gcc-5 (Debian 5-20150205-1) 5.0.0 20150205 (experimental) [trunk revision 220455]
* gcc (Debian 20150211-1) 5.0.0 20150211 (experimental) [trunk revision 220605]

#include <stddef.h>

#define BITS_PER_LONG (8 * sizeof(long))
#define DIV_ROUND_UP(x,n) (((x) + (n) - 1) / (n))
#define LAST_WORD_MASK(x) (~0ul >> (-(x) & (BITS_PER_LONG - 1)))

/*
 * __fls: find last set bit in word
 * @word: The word to search
 *
 * Undefined if no set bit exists, so code should check against 0 first.
 */
static inline unsigned long __fls(unsigned long word)
{
    asm("bsr %1,%0"
        : "=r" (word)
        : "rm" (word));
    return word;
}

size_t find_last_bit(const unsigned long *addr, size_t size)
{
    size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
    unsigned long mask = LAST_WORD_MASK(size);

    while (idx--) {
        unsigned long val = addr[idx] & mask;
        if (val)
            return idx * BITS_PER_LONG + __fls(val);
        mask = ~0ul;
    }
    return size;
}

The code generated is (-m32 is equivalent):

    .file    "flb.c"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB0:
    .text
.LHOTB0:
    .p2align 4,,15
    .globl    find_last_bit
    .type    find_last_bit, @function
find_last_bit:
.LFB1:
    .cfi_startproc
    movl    %esi, %ecx
    leaq    63(%rsi), %rdx
    movq    $-1, %r8
    negl    %ecx
    shrq    %cl, %r8
    shrq    $6, %rdx
    movq    %r8, %rcx
    jmp    .L2
    .p2align 4,,10
    .p2align 3
.L4:
    andq    (%rdi,%rdx,8), %rcx
    movq    %rcx, %r8
    movq    $-1, %rcx
    testq    %r8, %r8
    jne    .L8
.L2:
    subq    $1, %rdx
    cmpq    $-1, %rdx
    jne    .L4
    movq    %rsi, %rax
    ret
    .p2align 4,,10
    .p2align 3
.L8:
    salq    $6, %rdx
#APP
# 15 "flb.c" 1
    bsr %r8,%r8
# 0 "" 2
#NO_APP
    leaq    (%rdx,%r8), %rax
    ret
    .cfi_endproc
.LFE1:
    .size    find_last_bit, .-find_last_bit
    .section    .text.unlikely
.LCOLDE0:
    .text
.LHOTE0:
    .ident    "GCC: (Debian 20150211-1) 5.0.0 20150211 (experimental) [trunk revision 220605]"
    .section    .note.GNU-stack,"",@progbits

In the loop at .L4, there's a completely unnecessary "movq %rcx, %r8;
testq %r8, %r8", when the jne could go right after the andq (and the
code at .L8 changed to expect the masked value in %rcx rather than %r8).

At .L2, it's even more ridiculous.  The subq generates a borrow if the value
wraps to -1.  Why is that not just "subq $1, %rdx; jnc .L4"?

A smarter compiler would notice that %rdx must have its top 6 bits clear
and thus "decq %rdx; jns .L4" would also be legal.  (For non-x86 weenies,
the "dec" instructions do not modify the carry flag, originally so they
could be used for loop control in multi-word arithmetic.  This partial flags
update makes them slower than "subq $1" on many processors, so which is used
depends on the model flags.)
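
To make the two claims above concrete, here is a minimal C sketch that models
the three loop-exit tests as plain functions and checks that they agree for
every index the loop can actually start from.  The function names are
hypothetical and exist only for this illustration; nothing here is GCC's own
code, and it says nothing about what the RTL passes would need to prove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Condition as currently emitted: subq $1, then cmpq $-1 / jne. */
static int continue_cmp(uint64_t idx)    { return idx - 1 != UINT64_MAX; }

/* Borrow form: subq $1 sets CF exactly when idx < 1, so jnc continues. */
static int continue_borrow(uint64_t idx) { return !(idx < 1); }

/* Sign form: continue while bit 63 of idx - 1 is clear (jns). */
static int continue_sign(uint64_t idx)   { return ((idx - 1) >> 63) == 0; }

/* Word count as computed at function entry: (size + 63) >> 6. */
static uint64_t word_count(uint64_t size) { return (size + 63) >> 6; }

/* Check a few sizes, including the extremes; returns the number checked. */
static unsigned check_loop_conditions(void)
{
    const uint64_t sizes[] = { 0, 1, 63, 64, 65, 1000,
                               UINT64_MAX - 63, UINT64_MAX };
    unsigned checked = 0;

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        uint64_t idx = word_count(sizes[i]);

        assert((idx >> 58) == 0);   /* top 6 bits are always clear */
        assert(continue_cmp(idx) == continue_borrow(idx));
        assert(continue_cmp(idx) == continue_sign(idx));
        checked++;
    }
    return checked;
}
```

The borrow form matches the compare form for any index at all, while the sign
form relies on the range restriction: for an index such as UINT64_MAX,
continue_cmp() says "keep going" but continue_sign() says "stop".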

I tried reorganizing the source to encourage the first optimization:

size_t find_last_bit2(const unsigned long *addr, size_t size)
{
    unsigned long val = LAST_WORD_MASK(size);
    size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);

    while (idx--) {
        val &= addr[idx];
        if (val)
            return idx * BITS_PER_LONG + __fls(val);
        val = ~0ul;
    }
    return size;
}

... but the generated code is identical.


This version:

size_t find_last_bit3(const unsigned long *addr, size_t size)
{
    if (size) {
        unsigned long val = LAST_WORD_MASK(size);
        size_t idx = (size-1) / BITS_PER_LONG;

        do {
            val &= addr[idx];
            if (val)
                return idx * BITS_PER_LONG + __fls(val);
            val = ~0ul;
        } while (idx--);
    }
    return size;
}

makes the first optimization, and is at least clever about the second, but it
still uses three instructions rather than two for an absolutely bog-standard
decrement loop:

find_last_bit3:
.LFB3:
    .cfi_startproc
    xorl    %eax, %eax
    testq    %rsi, %rsi
    je    .L17
    movl    %esi, %ecx
    leaq    -1(%rsi), %rax
    movq    $-1, %rdx
    negl    %ecx
    shrq    %cl, %rdx
    shrq    $6, %rax
    jmp    .L19
    .p2align 4,,10
    .p2align 3
.L18:
    subq    $1, %rax
    movq    $-1, %rdx
    cmpq    %rdx, %rax
    je    .L23
.L19:
    andq    (%rdi,%rax,8), %rdx
    je    .L18
    salq    $6, %rax
#APP
# 15 "flb.c" 1
    bsr %rdx,%rdx
# 0 "" 2
#NO_APP
    addq    %rdx, %rax
    ret
    .p2align 4,,10
    .p2align 3
.L23:
    movq    %rsi, %rax
.L17:
    rep ret
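
Since the variants above are meant to be interchangeable, a small harness can
confirm that the do-while restructuring does not change results.  The sketch
below is an assumption-laden stand-in, not the code that was compiled for this
report: it replaces the bsr-based __fls() with a portable loop, and the names
fls_portable, find_last_bit_dowhile, find_last_bit_naive and check_variants
are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(long))

/* Portable stand-in for the bsr-based __fls(); word must be nonzero. */
static unsigned long fls_portable(unsigned long word)
{
    unsigned long pos = 0;

    while (word >>= 1)
        pos++;
    return pos;
}

/* Same shape as find_last_bit3 in the report. */
static size_t find_last_bit_dowhile(const unsigned long *addr, size_t size)
{
    if (size) {
        unsigned long val = ~0ul >> (-size & (BITS_PER_LONG - 1));
        size_t idx = (size - 1) / BITS_PER_LONG;

        do {
            val &= addr[idx];
            if (val)
                return idx * BITS_PER_LONG + fls_portable(val);
            val = ~0ul;
        } while (idx--);
    }
    return size;
}

/* Reference: test one bit at a time, from the top down. */
static size_t find_last_bit_naive(const unsigned long *addr, size_t size)
{
    for (size_t bit = size; bit-- > 0; )
        if (addr[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG) & 1)
            return bit;
    return size;
}

/* Compare both on a few patterns for every size up to three words;
   returns the number of cases compared. */
static unsigned check_variants(void)
{
    static const unsigned long patterns[][3] = {
        { 0, 0, 0 }, { 1, 0, 0 }, { 0, 1, 0 }, { ~0ul, ~0ul, ~0ul },
        { 0, 0, 1ul << (BITS_PER_LONG - 1) }, { 0x10, 0, 0x8 },
    };
    unsigned checked = 0;

    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
        for (size_t size = 0; size <= 3 * BITS_PER_LONG; size++) {
            assert(find_last_bit_dowhile(patterns[p], size) ==
                   find_last_bit_naive(patterns[p], size));
            checked++;
        }
    return checked;
}
```

Calling check_variants() from a test main() exercises every size from 0 up to
three words, including the cases where the last word is only partially
covered by the mask.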