* [Bug target/94650] New: Missed x86-64 peephole optimization: x >= large power of two
@ 2020-04-18 18:28 pascal_cuoq at hotmail dot com
  2020-04-20  7:05 ` [Bug target/94650] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: pascal_cuoq at hotmail dot com @ 2020-04-18 18:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650

            Bug ID: 94650
           Summary: Missed x86-64 peephole optimization: x >= large power
                    of two
           Product: gcc
           Version: 9.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pascal_cuoq at hotmail dot com
  Target Milestone: ---

Consider the three functions check, test0 and test1:

(Compiler Explorer link: https://gcc.godbolt.org/z/Sh4GpR )

#include <string.h>

#define LARGE_POWER_OF_TWO (1UL << 40)

int check(unsigned long m)
{
    return m >= LARGE_POWER_OF_TWO;
}

void g(int);

void test0(unsigned long m)
{
    if (m >= LARGE_POWER_OF_TWO) g(0);
}

void test1(unsigned long m)
{
    if (m >= LARGE_POWER_OF_TWO) g(m);
}

At least in the case of check and test0, the optimal way to compare m to
1UL << 40 is to shift m right by 40 bits and compare the result to 0. This is
the code generated for these functions by Clang 10:

check:                                  # @check
        xorl    %eax, %eax
        shrq    $40, %rdi
        setne   %al
        retq
test0:                                  # @test0
        shrq    $40, %rdi
        je      .LBB1_1
        xorl    %edi, %edi
        jmp     g                       # TAILCALL
.LBB1_1:
        retq
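
At the source level, the rewrite corresponds to the identity that, for
unsigned m, the test m >= (1UL << 40) is equivalent to (m >> 40) != 0, since
the shift leaves exactly the bits of m at positions 40 and above. A minimal
sketch of check written in that form (check_shift is a name invented here for
illustration, not part of the testcase):

int check_shift(unsigned long m)
{
    /* Nonzero exactly when some bit of m at position 40 or above is set,
       i.e. exactly when m >= (1UL << 40). */
    return (m >> 40) != 0;
}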

In contrast, GCC 9.3 uses a 64-bit constant that must be loaded into a
register with movabsq:

check:
        movabsq $1099511627775, %rax
        cmpq    %rax, %rdi
        seta    %al
        movzbl  %al, %eax
        ret
test0:
        movabsq $1099511627775, %rax
        cmpq    %rax, %rdi
        ja      .L5
        ret
.L5:
        xorl    %edi, %edi
        jmp     g


In the case of the function test1, the comparison is between these two
versions, because the shift is destructive and m must be preserved so that it
can be passed to g:

Clang 10:
test1:                                  # @test1
        movq    %rdi, %rax
        shrq    $40, %rax
        je      .LBB2_1
        jmp     g                       # TAILCALL
.LBB2_1:
        retq

GCC 9.3:
test1:
        movabsq $1099511627775, %rax
        cmpq    %rax, %rdi
        ja      .L8
        ret
.L8:
        jmp     g
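
For reference, a sketch of the shift-based test1 at the source level, with the
copy made explicit (test1_shift is an illustrative name, not part of the
testcase):

void test1_shift(unsigned long m)
{
    unsigned long top = m >> 40;   /* shift a copy; m itself is preserved */
    if (top != 0)
        g(m);                      /* original m is still available here */
}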

It is less obvious which approach is better in the case of the function test1,
but generally speaking the shift approach should still be faster. The
register-register move can be free on Skylake (in the sense of not needing any
execution port), whereas movabsq requires an execution port and is also a
10-byte instruction. (Code size favors the shift too: by my count, the
movq + shrq pair is 3 + 4 = 7 bytes, against 10 + 3 = 13 bytes for
movabsq + cmpq.)

