* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
@ 2020-05-05 15:39 ` steinar+gcc at gunderson dot no
2020-05-05 17:21 ` jakub at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: steinar+gcc at gunderson dot no @ 2020-05-05 15:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
--- Comment #1 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
Sorry, truncated the assembler. GCC's is:
atum17:~> objdump --disassemble test.o
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 0f bc ff bsf %edi,%edi
3: b8 ff ff ff ff mov $0xffffffff,%eax
8: 0f 45 c7 cmovne %edi,%eax
b: c3 retq
atum17:~>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
2020-05-05 15:39 ` [Bug tree-optimization/94956] " steinar+gcc at gunderson dot no
@ 2020-05-05 17:21 ` jakub at gcc dot gnu.org
2020-05-05 18:21 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-05-05 17:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org,
| |uros at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
VRP knows that the argument must be non-zero:
# RANGE ~[0, 0]
x.0_1 = (int) x_3(D);
# RANGE [1, 32] NONZERO 63
_2 = __builtin_ffs (x.0_1);
# RANGE [0, 31] NONZERO 31
_4 = _2 + -1;
return _4;
in *.optimized dump, but the comparison and conditional move is emitted by the
ffssi2 i386.md expander. Unfortunately, from what that expander sees there is
no way to find out the argument is non-zero.
So, I wonder if for these optabs where whether the argument must be nonzero
might matter (anything but ffs? I mean, clz, ctz have undefined values for
zero) we shouldn't add further optab, like ffsnonzero, where the code which
still can query VRP info of the argument could be used and use the other optab
in expand_direct_optab_fn.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
2020-05-05 15:39 ` [Bug tree-optimization/94956] " steinar+gcc at gunderson dot no
2020-05-05 17:21 ` jakub at gcc dot gnu.org
@ 2020-05-05 18:21 ` jakub at gcc dot gnu.org
2020-05-06 6:58 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-05-05 18:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Another option would be to fold FFS (x) for x known non-zero into CTZ (x) + 1
in match.pd.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
` (2 preceding siblings ...)
2020-05-05 18:21 ` jakub at gcc dot gnu.org
@ 2020-05-06 6:58 ` rguenth at gcc dot gnu.org
2020-05-08 7:34 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-06 6:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Keywords| |missed-optimization
Status|UNCONFIRMED |NEW
Last reconfirmed| |2020-05-06
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
` (3 preceding siblings ...)
2020-05-06 6:58 ` rguenth at gcc dot gnu.org
@ 2020-05-08 7:34 ` cvs-commit at gcc dot gnu.org
2020-05-08 7:51 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-05-08 7:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:df569f7da567af4996821dc0a1871eec79957d04
commit r11-194-gdf569f7da567af4996821dc0a1871eec79957d04
Author: Jakub Jelinek <jakub@redhat.com>
Date: Fri May 8 09:33:55 2020 +0200
match.pd: Optimize ffs of known non-zero arg into ctz + 1 [PR94956]
The ffs expanders on several targets (x86, ia64, aarch64 at least)
emit a conditional move or similar code to handle the case when the
argument is 0, which makes the code longer.
If we know from VRP that the argument will not be zero, we can (if the
target has also an ctz expander) just use ctz which is undefined at zero
and thus the expander doesn't need to deal with that.
2020-05-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94956
* match.pd (FFS): Optimize __builtin_ffs* of non-zero argument into
__builtin_ctz* + 1 if direct IFN_CTZ is supported.
* gcc.target/i386/pr94956.c: New test.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
` (4 preceding siblings ...)
2020-05-08 7:34 ` cvs-commit at gcc dot gnu.org
@ 2020-05-08 7:51 ` jakub at gcc dot gnu.org
2020-05-08 7:53 ` steinar+gcc at gunderson dot no
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-05-08 7:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 11+.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
` (5 preceding siblings ...)
2020-05-08 7:51 ` jakub at gcc dot gnu.org
@ 2020-05-08 7:53 ` steinar+gcc at gunderson dot no
2021-06-20 9:50 ` steinar+gcc at gunderson dot no
2021-08-03 2:09 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: steinar+gcc at gunderson dot no @ 2020-05-08 7:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
--- Comment #6 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
Oh nice! I'll be sure to benchmark when I can get my hands on appropriate
binaries (I typically use gcc-snapshot from Debian).
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
` (6 preceding siblings ...)
2020-05-08 7:53 ` steinar+gcc at gunderson dot no
@ 2021-06-20 9:50 ` steinar+gcc at gunderson dot no
2021-08-03 2:09 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-20 9:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
--- Comment #7 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
To wrap this up, confirming that GCC 11 does well on my benchmark:
BM_Chain20 54529 iterations 18781 ns/iter GCC 10, asm bsfq
BM_Chain20 44584 iterations 22509 ns/iter GCC 10, ffsll()
BM_Chain20 49753 iterations 20216 ns/iter GCC 11, asm bsfq
BM_Chain20 53346 iterations 18816 ns/iter GCC 11, ffsll()
BM_Chain20 64926 iterations 15747 ns/iter Clang 12, asm bsfq
BM_Chain20 71208 iterations 14374 ns/iter Clang 12, ffsll()
So basically for 11+, the ffsll() statement does better than the bsfq
statement, whereas it used to do markedly worse.
Clang does even better, but I can live with that. :-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/94956] Unable to remove impossible ffs() test for zero
2020-05-05 15:38 [Bug tree-optimization/94956] New: Unable to remove impossible ffs() test for zero steinar+gcc at gunderson dot no
` (7 preceding siblings ...)
2021-06-20 9:50 ` steinar+gcc at gunderson dot no
@ 2021-08-03 2:09 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-03 2:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94956
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |fuz at fuz dot su
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 68294 has been marked as a duplicate of this bug. ***
^ permalink raw reply [flat|nested] 10+ messages in thread