public inbox for gcc-bugs@sourceware.org
* [Bug target/94650] New: Missed x86-64 peephole optimization: x >= large power of two
@ 2020-04-18 18:28 pascal_cuoq at hotmail dot com
2020-04-20 7:05 ` [Bug target/94650] " rguenth at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: pascal_cuoq at hotmail dot com @ 2020-04-18 18:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650
Bug ID: 94650
Summary: Missed x86-64 peephole optimization: x >= large power
of two
Product: gcc
Version: 9.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pascal_cuoq at hotmail dot com
Target Milestone: ---
Consider the three functions check, test0 and test1:
(Compiler Explorer link: https://gcc.godbolt.org/z/Sh4GpR )
#include <string.h>
#define LARGE_POWER_OF_TWO (1UL << 40)
int check(unsigned long m)
{
return m >= LARGE_POWER_OF_TWO;
}
void g(int);
void test0(unsigned long m)
{
if (m >= LARGE_POWER_OF_TWO) g(0);
}
void test1(unsigned long m)
{
if (m >= LARGE_POWER_OF_TWO) g(m);
}
At least in the case of check and test0, the optimal way to compare m to 1<<40
is to shift m by 40 and compare the result to 0. This is the code generated for
these functions by Clang 10:
check: # @check
xorl %eax, %eax
shrq $40, %rdi
setne %al
retq
test0: # @test0
shrq $40, %rdi
je .LBB1_1
xorl %edi, %edi
jmp g # TAILCALL
.LBB1_1:
retq
In contrast, GCC 9.3 uses a 64-bit constant that needs to be loaded in a
register with movabsq:
check:
movabsq $1099511627775, %rax
cmpq %rax, %rdi
seta %al
movzbl %al, %eax
ret
test0:
movabsq $1099511627775, %rax
cmpq %rax, %rdi
ja .L5
ret
.L5:
xorl %edi, %edi
jmp g
In the case of the function test1, the comparison is between these two versions,
because the shift is destructive and m must be copied first:
Clang 10:
test1: # @test1
movq %rdi, %rax
shrq $40, %rax
je .LBB2_1
jmp g # TAILCALL
.LBB2_1:
retq
GCC 9.3:
test1:
movabsq $1099511627775, %rax
cmpq %rax, %rdi
ja .L8
ret
.L8:
jmp g
It is less obvious which approach is better in the case of the function test1,
but generally speaking the shift approach should still be faster. The
register-register move can be free on Skylake (in the sense of not needing any
execution port), whereas movabsq requires an execution port and is also a
10-byte instruction!
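The equivalence behind the report can be checked in plain C. The sketch below is an illustration only: the helper names check_cmp, check_shr and check_ja are made up for this example, and uint64_t stands in for unsigned long on x86-64. It shows that the source-level comparison, the shift form Clang emits, and the "above 2^40 - 1" form GCC 9.3 emits all compute the same predicate.

```c
#include <assert.h>
#include <stdint.h>

#define LARGE_POWER_OF_TWO (UINT64_C(1) << 40)

/* Comparison as written in the report. */
static int check_cmp(uint64_t m) { return m >= LARGE_POWER_OF_TWO; }

/* Shift form: any set bit at position 40 or above means m >= 2^40. */
static int check_shr(uint64_t m) { return (m >> 40) != 0; }

/* Form matching GCC 9.3's cmpq/ja: unsigned "above" against 2^40 - 1. */
static int check_ja(uint64_t m) { return m > LARGE_POWER_OF_TWO - 1; }

/* Hypothetical helper: returns 1 when all three forms agree for m. */
static int forms_agree(uint64_t m)
{
    int expected = check_cmp(m);
    return check_shr(m) == expected && check_ja(m) == expected;
}
```

The interesting inputs are the ones around the boundary (2^40 - 1, 2^40, 2^40 + 1) and the extremes 0 and UINT64_MAX; forms_agree should return 1 for each of them.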
From: rguenth at gcc dot gnu.org @ 2020-04-20 7:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Keywords| |missed-optimization
Last reconfirmed| |2020-04-20
Status|UNCONFIRMED |NEW
Target| |x86_64-*-*
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.
From: ubizjak at gmail dot com @ 2020-04-20 16:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650
Uroš Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
Status|NEW |ASSIGNED
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 48315
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48315&action=edit
Prototype patch
Using this patch, the following asm is created (-O2):
--cut here--
check:
xorl %eax, %eax
shrq $40, %rdi
setne %al
ret
test0:
shrq $40, %rdi
jne .L5
ret
.L5:
xorl %edi, %edi
jmp g
test1:
movq %rdi, %rax
shrq $40, %rax
jne .L8
ret
.L8:
jmp g
--cut here--
From: cvs-commit at gcc dot gnu.org @ 2020-05-04 11:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650
--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:
https://gcc.gnu.org/g:8ea03e9016cbca5a7ee2b4befa4d5c32467b0982
commit r11-37-g8ea03e9016cbca5a7ee2b4befa4d5c32467b0982
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Mon May 4 13:49:14 2020 +0200
i386: Use SHR to compare with large power-of-two constants [PR94650]
Convert unsigned compares where
m >= LARGE_POWER_OF_TWO
and LARGE_POWER_OF_TWO represents an immediate where bit 33+ is set to use
a SHR instruction and compare the result to 0. This avoids loading a
large immediate with MOVABS insn.
movabsq $1099511627775, %rax
cmpq %rax, %rdi
ja .L5
gets converted to:
shrq $40, %rdi
jne .L5
PR target/94650
* config/i386/predicates.md (shr_comparison_operator): New
predicate.
* config/i386/i386.md (compare->shr splitter): New splitters.
testsuite/ChangeLog:
PR target/94650
* gcc.target/i386/pr94650.c: New test.
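As a rough illustration of when the conversion described in the commit message pays off, the following sketch decides whether a constant C in "m >= C" is a candidate and, if so, which shift count to use. It is not GCC's shr_comparison_operator predicate: shr_count_for is a hypothetical helper, and the cutoff is an assumption based on cmpq accepting only a sign-extended 32-bit immediate, so that C - 1 must fall outside that range before movabsq becomes necessary.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the GCC sources.
   Returns k when c == 2^k and c - 1 does not fit in a sign-extended
   32-bit immediate (so "cmpq $c-1" would need a movabsq first);
   returns -1 when the SHR form offers no benefit under that assumption. */
static int shr_count_for(uint64_t c)
{
    if (c == 0 || (c & (c - 1)) != 0)
        return -1;                      /* not a power of two */
    if (c - 1 <= (uint64_t)INT32_MAX)
        return -1;                      /* c - 1 fits as an imm32 */

    int k = 0;
    while ((c >> k) != 1)               /* locate the single set bit */
        k++;
    return k;
}
```

Under these assumptions, shr_count_for(1UL << 40) yields 40, matching the shrq $40 in the commit message, while a constant such as 1UL << 31 is rejected because 2^31 - 1 still fits in a 32-bit immediate.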
From: ubizjak at gmail dot com @ 2020-05-04 11:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94650
Uroš Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |11.0
--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
Implemented for gcc-11.