public inbox for gcc-bugs@sourceware.org
* [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd
@ 2020-10-20 14:44 christophe.leroy at csgroup dot eu
2020-10-20 16:25 ` [Bug c/97503] " jakub at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 14:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
Bug ID: 97503
Summary: Suboptimal use of cntlzw and cntlzd
Product: gcc
Version: 10.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.leroy at csgroup dot eu
Target Milestone: ---
int f(int x)
{
return x ? __builtin_clz(x) : 32;
}
Is built as
00000000 <f>:
0: 2c 03 00 00 cmpwi r3,0
4: 40 82 00 0c bne 10 <f+0x10>
8: 38 60 00 20 li r3,32
c: 4e 80 00 20 blr
10: 7c 63 00 34 cntlzw r3,r3
14: 4e 80 00 20 blr
I would expect
00000000 <f>:
0: 7c 63 00 34 cntlzw r3,r3
4: 4e 80 00 20 blr
Because cntlzw (Count Leading Zeros Word) is documented in the PowerPC
instruction set as returning 0 to 32 inclusive.
The same applies to the 64-bit version:
long f(long x)
{
return x ? __builtin_clzll(x) : 64;
}
0000000000000000 <.f>:
0: 2c 23 00 00 cmpdi r3,0
4: 41 82 00 0c beq 10 <.f+0x10>
8: 7c 63 00 74 cntlzd r3,r3
c: 4e 80 00 20 blr
10: 38 60 00 40 li r3,64
14: 4e 80 00 20 blr
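The semantics the report relies on can be sketched in portable C. This is only an illustration: the function names are hypothetical, it assumes a 32-bit unsigned int, and it checks the guarded builtin against a plain loop whose result at zero is 32, the value the report cites for cntlzw.

```c
#include <assert.h>

/* Reference count of leading zero bits in a 32-bit value. The loop yields 32
   for an input of 0, matching the 0..32 range cited for cntlzw. */
int clz32_reference(unsigned int x)
{
    int n = 0;
    for (unsigned int bit = 1u << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* The pattern from the report: the builtin is guarded because
   __builtin_clz(0) is undefined at the C level. */
int clz32_guarded(unsigned int x)
{
    return x ? __builtin_clz(x) : 32;
}

/* Compare both versions on a few boundary values. */
void check_clz32(void)
{
    const unsigned int samples[] = {
        0u, 1u, 2u, 0x80u, 0x7fffffffu, 0x80000000u, 0xffffffffu
    };
    for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(clz32_guarded(samples[i]) == clz32_reference(samples[i]));
}
```

If the instruction already returns 32 for a zero input, the guard adds nothing at the machine level, which is why the compare and branch look redundant.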
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug c/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
@ 2020-10-20 16:25 ` jakub at gcc dot gnu.org
2020-10-20 16:49 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 16:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49409
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49409&action=edit
gcc11-pr97503.patch
While that is something that can be, and often is, done during ifcvt, I think
for various architectures we can do it at the GIMPLE level too, as done in this
patch (untested so far).
* [Bug c/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
2020-10-20 16:25 ` [Bug c/97503] " jakub at gcc dot gnu.org
@ 2020-10-20 16:49 ` jakub at gcc dot gnu.org
2020-10-21 8:54 ` [Bug target/97503] " cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 16:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
On the RTL side, there is simplify_cond_clz_ctz that should simplify it and
noce_try_ifelse_collapse that should be matching it (it does on x86 with -mbmi
-mlzcnt).
* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
2020-10-20 16:25 ` [Bug c/97503] " jakub at gcc dot gnu.org
2020-10-20 16:49 ` jakub at gcc dot gnu.org
@ 2020-10-21 8:54 ` cvs-commit at gcc dot gnu.org
2023-11-09 5:59 ` lh_mouse at 126 dot com
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-21 8:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:5244b4af5e47bc98a2a9cf36f048981583a1b163
commit r11-4183-g5244b4af5e47bc98a2a9cf36f048981583a1b163
Author: Jakub Jelinek <jakub@redhat.com>
Date: Wed Oct 21 10:51:33 2020 +0200
phiopt: Optimize x ? __builtin_clz (x) : 32 in GIMPLE [PR97503]
While we have at the RTL level noce_try_ifelse_collapse combined with
simplify_cond_clz_ctz, that optimization doesn't always trigger because
e.g. on powerpc there is a define_insn to compare a reg against zero and
copy that register to another one and so we end up with a different pseudo
in the simplify_cond_clz_ctz test and punt.
For targets that define C?Z_DEFINED_VALUE_AT_ZERO to 2 for certain modes,
we can optimize it already in phiopt though, just need to ensure that
we transform the __builtin_c?z* calls into .C?Z ifns because my recent
VRP changes codified that the builtin calls are always undefined at zero,
while ifns honor C?Z_DEFINED_VALUE_AT_ZERO equal to 2.
And, in phiopt we already have popcount handling that does pretty much the
same thing, except for always using a zero value rather than the one set
by C?Z_DEFINED_VALUE_AT_ZERO.
So, this patch extends that function to handle not just popcount, but also
clz and ctz.
2020-10-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97503
* tree-ssa-phiopt.c: Include internal-fn.h.
(cond_removal_in_popcount_pattern): Rename to ...
(cond_removal_in_popcount_clz_ctz_pattern): ... this. Handle not just
popcount, but also clz and ctz if it has C?Z_DEFINED_VALUE_AT_ZERO 2.
* gcc.dg/tree-ssa/pr97503.c: New test.
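The three source shapes the commit message describes can be written out as a small C sketch. The function names are hypothetical and this is not the content of gcc.dg/tree-ssa/pr97503.c. The code only demonstrates the source-level behaviour; whether the conditional is actually removed depends on the target defining C?Z_DEFINED_VALUE_AT_ZERO to 2 with a matching value for the mode, as stated above.

```c
#include <assert.h>

/* popcount form that phiopt already handled: the value at zero is always 0. */
int popcount_or_zero(unsigned int x)
{
    return x ? __builtin_popcount(x) : 0;
}

/* clz form from the bug report: the value at zero is the bit width
   (32 here, assuming a 32-bit unsigned int). */
int clz_or_32(unsigned int x)
{
    return x ? __builtin_clz(x) : 32;
}

/* ctz analogue of the same pattern. */
int ctz_or_32(unsigned int x)
{
    return x ? __builtin_ctz(x) : 32;
}

/* Source-level behaviour that must be preserved whether or not the
   compiler drops the conditional. */
void check_zero_guard_patterns(void)
{
    assert(popcount_or_zero(0u) == 0);
    assert(popcount_or_zero(0xF0u) == 4);
    assert(clz_or_32(0u) == 32);
    assert(clz_or_32(1u) == 31);
    assert(ctz_or_32(0u) == 32);
    assert(ctz_or_32(8u) == 3);
}
```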
* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
` (2 preceding siblings ...)
2020-10-21 8:54 ` [Bug target/97503] " cvs-commit at gcc dot gnu.org
@ 2023-11-09 5:59 ` lh_mouse at 126 dot com
2023-11-09 6:05 ` lh_mouse at 126 dot com
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2023-11-09 5:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
LIU Hao <lh_mouse at 126 dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lh_mouse at 126 dot com
--- Comment #4 from LIU Hao <lh_mouse at 126 dot com> ---
Are there any reasons why this was not done for 64?
(https://gcc.godbolt.org/z/7vddPdxaP)
```
using int32_t = int;
using int64_t = long long;
using uint32_t = unsigned int;
using uint64_t = unsigned long long;
void
xlzcnt32(int32_t& val)
{
val = val ? (__builtin_clz(val) & 31) : 32;
}
void
xlzcnt64(int64_t& val)
{
val = val ? (__builtin_clzll(val) & 63) : 64;
}
```
results in
```
xlzcnt32(int&):
xor eax, eax
lzcnt eax, DWORD PTR [rdi]
mov DWORD PTR [rdi], eax
ret
xlzcnt64(long long&):
mov rdx, QWORD PTR [rdi]
xor eax, eax
lzcnt rax, rdx
test rdx, rdx
mov edx, 64
cmove rax, rdx
mov QWORD PTR [rdi], rax
ret
```
* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
` (3 preceding siblings ...)
2023-11-09 5:59 ` lh_mouse at 126 dot com
@ 2023-11-09 6:05 ` lh_mouse at 126 dot com
2023-11-09 16:51 ` ubizjak at gmail dot com
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2023-11-09 6:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
--- Comment #5 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to LIU Hao from comment #4)
> lzcnt rax, rdx
> test rdx, rdx
> mov edx, 64
> cmove rax, rdx
There is actually another missed optimization here. LZCNT sets CF if the source
operand is zero, so the TEST instruction is totally unnecessary. We can do
this:
```
...
xor eax, eax
lzcnt rax, rdx
mov edx, 64 # or something else, whatever
cmovb eax, edx
```
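The idea of selecting on the flag rather than re-testing the source can be illustrated with a software model in C. This is only a sketch: `lzcnt_result`, `lzcnt64_model` and `clz64_or` are hypothetical names, the carry field merely models CF, and nothing here is a real intrinsic or a statement about what GCC emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit LZCNT: the count plus the carry flag. */
struct lzcnt_result {
    uint64_t count;  /* number of leading zero bits, 64 for a zero source */
    bool carry;      /* models CF: set when the source operand is zero */
};

struct lzcnt_result lzcnt64_model(uint64_t src)
{
    struct lzcnt_result r;
    r.carry = (src == 0);
    /* __builtin_clzll is undefined for 0, so the model guards it. */
    r.count = src ? (uint64_t)__builtin_clzll(src) : 64;
    return r;
}

/* Select on the modelled carry flag instead of testing src again,
   mirroring the cmovb suggestion above. */
uint64_t clz64_or(uint64_t src, uint64_t value_at_zero)
{
    struct lzcnt_result r = lzcnt64_model(src);
    return r.carry ? value_at_zero : r.count;
}
```

For example, `clz64_or(0, 64)` yields 64 and `clz64_or(1, 64)` yields 63. The second argument stands in for the "or something else" constant in the assembly.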
* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
` (4 preceding siblings ...)
2023-11-09 6:05 ` lh_mouse at 126 dot com
@ 2023-11-09 16:51 ` ubizjak at gmail dot com
2023-11-09 16:53 ` ubizjak at gmail dot com
2023-11-27 8:14 ` lh_mouse at 126 dot com
7 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-09 16:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to LIU Hao from comment #4)
> Are there any reasons why this was not done for 64?
> (https://gcc.godbolt.org/z/7vddPdxaP)
There is zero-extension from the result of __builtin_clzll that confuses
optimizers.
* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
` (5 preceding siblings ...)
2023-11-09 16:51 ` ubizjak at gmail dot com
@ 2023-11-09 16:53 ` ubizjak at gmail dot com
2023-11-27 8:14 ` lh_mouse at 126 dot com
7 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-09 16:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to LIU Hao from comment #4)
> > Are there any reasons why this was not done for 64?
> > (https://gcc.godbolt.org/z/7vddPdxaP)
>
> There is zero-extension from the result of __builtin_clzll that confuses
> optimizers.
Actually, sign-extension, but the result is never sign-extended.
* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
` (6 preceding siblings ...)
2023-11-09 16:53 ` ubizjak at gmail dot com
@ 2023-11-27 8:14 ` lh_mouse at 126 dot com
7 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2023-11-27 8:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
--- Comment #8 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to Uroš Bizjak from comment #7)
> Actually, sign-extension, but the result is never sign-extended.
Yes, but it should be a no-op, right?
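The point about the extension can be checked with a short C sketch. It assumes only that the value being widened lies in 0..64, the range the clz patterns in this thread can produce; the helper names are hypothetical. For such non-negative values, sign extension and zero extension of the int result give the same 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Widen an int by sign extension, which is what the C conversion
   from int to int64_t does. */
int64_t widen_signed(int r)
{
    return (int64_t)r;
}

/* Widen the same 32 bits by zero extension. */
int64_t widen_unsigned(int r)
{
    return (int64_t)(uint32_t)r;
}

void check_extension_is_noop(void)
{
    /* Every possible count 0..64 widens identically either way. */
    for (int r = 0; r <= 64; r++)
        assert(widen_signed(r) == widen_unsigned(r));
    /* For contrast, a negative value widens differently. */
    assert(widen_signed(-1) != widen_unsigned(-1));
}
```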
Thread overview: 9+ messages
2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
2020-10-20 16:25 ` [Bug c/97503] " jakub at gcc dot gnu.org
2020-10-20 16:49 ` jakub at gcc dot gnu.org
2020-10-21 8:54 ` [Bug target/97503] " cvs-commit at gcc dot gnu.org
2023-11-09 5:59 ` lh_mouse at 126 dot com
2023-11-09 6:05 ` lh_mouse at 126 dot com
2023-11-09 16:51 ` ubizjak at gmail dot com
2023-11-09 16:53 ` ubizjak at gmail dot com
2023-11-27 8:14 ` lh_mouse at 126 dot com