public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero
@ 2020-05-05 15:38 steinar+gcc at gunderson dot no
  2020-05-05 15:39 ` [Bug tree-optimization/94956] " steinar+gcc at gunderson dot no
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: steinar+gcc at gunderson dot no @ 2020-05-05 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

            Bug ID: 94956
           Summary: Unable to remove impossible ffs() test for zero
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

Hi,

Given GCC 10 x86-64, and this code:

#include <stdint.h>
#include <string.h>

int foo(uint32_t x) {
        if (x == 0) __builtin_unreachable();
        return ffs(x) - 1;
}

I get this assembler:

atum17:~> gcc-10 -O2 -c test.c 
atum17:~> objdump --disassemble test.o

The cmovne test is rather expensive for me due to high instruction latency,and
I can never have zero in my situation. (It costs ~10% in a much larger graph
algorithm.) I'm unable to get GCC to understand that it doesn't need it, save
for using an explicit asm statement.

By contrast, Clang 10 gets this right:

atum17:~> clang-10 -O2 -c test.c        
atum17:~> objdump --disassemble test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   0f bc c7                bsf    %edi,%eax
   3:   c3                      retq   

Is it possible to get access to the raw instruction by some clever means? :-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
@ 2020-05-05 15:39 ` steinar+gcc at gunderson dot no
  2020-05-05 17:21 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: steinar+gcc at gunderson dot no @ 2020-05-05 15:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

--- Comment #1 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
Sorry, truncated the assembler. GCC's is:

atum17:~> objdump --disassemble test.o                          

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   0f bc ff                bsf    %edi,%edi
   3:   b8 ff ff ff ff          mov    $0xffffffff,%eax
   8:   0f 45 c7                cmovne %edi,%eax
   b:   c3                      retq   
atum17:~>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
  2020-05-05 15:39 ` [Bug tree-optimization/94956] " steinar+gcc at gunderson dot no
@ 2020-05-05 17:21 ` jakub at gcc dot gnu.org
  2020-05-05 18:21 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-05-05 17:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
VRP knows that the argument must be non-zero:
  # RANGE ~[0, 0]
  x.0_1 = (int) x_3(D);
  # RANGE [1, 32] NONZERO 63
  _2 = __builtin_ffs (x.0_1);
  # RANGE [0, 31] NONZERO 31
  _4 = _2 + -1;
  return _4;
in *.optimized dump, but the comparison and conditional move is emitted by the
ffssi2 i386.md expander.  Unfortunately, from what that expander sees there is
no way to find out the argument is non-zero.
So, I wonder if for these optabs where whether the argument must be nonzero
might matter (anything but ffs?  I mean, clz, ctz have undefined values for
zero) we shouldn't add further optab, like ffsnonzero, where the code which
still can query VRP info of the argument could be used and use the other optab
in expand_direct_optab_fn.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
  2020-05-05 15:39 ` [Bug tree-optimization/94956] " steinar+gcc at gunderson dot no
  2020-05-05 17:21 ` jakub at gcc dot gnu.org
@ 2020-05-05 18:21 ` jakub at gcc dot gnu.org
  2020-05-06  6:58 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-05-05 18:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Another option would be to fold FFS (x) for x known non-zero into CTZ (x) + 1
in match.pd.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
                   ` (2 preceding siblings ...)
  2020-05-05 18:21 ` jakub at gcc dot gnu.org
@ 2020-05-06  6:58 ` rguenth at gcc dot gnu.org
  2020-05-08  7:34 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-06  6:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-05-06

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
                   ` (3 preceding siblings ...)
  2020-05-06  6:58 ` rguenth at gcc dot gnu.org
@ 2020-05-08  7:34 ` cvs-commit at gcc dot gnu.org
  2020-05-08  7:51 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-05-08  7:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:df569f7da567af4996821dc0a1871eec79957d04

commit r11-194-gdf569f7da567af4996821dc0a1871eec79957d04
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri May 8 09:33:55 2020 +0200

    match.pd: Optimize ffs of known non-zero arg into ctz + 1 [PR94956]

    The ffs expanders on several targets (x86, ia64, aarch64 at least)
    emit a conditional move or similar code to handle the case when the
    argument is 0, which makes the code longer.
    If we know from VRP that the argument will not be zero, we can (if the
    target has also an ctz expander) just use ctz which is undefined at zero
    and thus the expander doesn't need to deal with that.

    2020-05-08  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/94956
            * match.pd (FFS): Optimize __builtin_ffs* of non-zero argument into
            __builtin_ctz* + 1 if direct IFN_CTZ is supported.

            * gcc.target/i386/pr94956.c: New test.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
                   ` (4 preceding siblings ...)
  2020-05-08  7:34 ` cvs-commit at gcc dot gnu.org
@ 2020-05-08  7:51 ` jakub at gcc dot gnu.org
  2020-05-08  7:53 ` steinar+gcc at gunderson dot no
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-05-08  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 11+.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
                   ` (5 preceding siblings ...)
  2020-05-08  7:51 ` jakub at gcc dot gnu.org
@ 2020-05-08  7:53 ` steinar+gcc at gunderson dot no
  2021-06-20  9:50 ` steinar+gcc at gunderson dot no
  2021-08-03  2:09 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: steinar+gcc at gunderson dot no @ 2020-05-08  7:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

--- Comment #6 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
Oh nice! I'll be sure to benchmark when I can get my hands on appropriate
binaries (I typically use gcc-snapshot from Debian).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
                   ` (6 preceding siblings ...)
  2020-05-08  7:53 ` steinar+gcc at gunderson dot no
@ 2021-06-20  9:50 ` steinar+gcc at gunderson dot no
  2021-08-03  2:09 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-20  9:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

--- Comment #7 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
To wrap this up, confirming that GCC 11 does well on my benchmark:

BM_Chain20            54529 iterations      18781 ns/iter   GCC 10, asm bsfq
BM_Chain20            44584 iterations      22509 ns/iter   GCC 10, ffsll()
BM_Chain20            49753 iterations      20216 ns/iter   GCC 11, asm bsfq
BM_Chain20            53346 iterations      18816 ns/iter   GCC 11, ffsll()
BM_Chain20            64926 iterations      15747 ns/iter   Clang 12, asm bsfq
BM_Chain20            71208 iterations      14374 ns/iter   Clang 12, ffsll()

So basically for 11+, the ffsll() statement does better than the bsfq
statement, whereas it used to do markedly worse.

Clang does even better, but I can live with that. :-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
  2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
                   ` (7 preceding siblings ...)
  2021-06-20  9:50 ` steinar+gcc at gunderson dot no
@ 2021-08-03  2:09 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-03  2:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fuz at fuz dot su

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 68294 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-08-03  2:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
2020-05-05 15:39 ` [Bug tree-optimization/94956] " steinar+gcc at gunderson dot no
2020-05-05 17:21 ` jakub at gcc dot gnu.org
2020-05-05 18:21 ` jakub at gcc dot gnu.org
2020-05-06  6:58 ` rguenth at gcc dot gnu.org
2020-05-08  7:34 ` cvs-commit at gcc dot gnu.org
2020-05-08  7:51 ` jakub at gcc dot gnu.org
2020-05-08  7:53 ` steinar+gcc at gunderson dot no
2021-06-20  9:50 ` steinar+gcc at gunderson dot no
2021-08-03  2:09 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).