public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/106471] New: Strange code generation for __builtin_ctzl()
@ 2022-07-28 23:37 torvalds@linux-foundation.org
2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-28 23:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
Bug ID: 106471
Summary: Strange code generation for __builtin_ctzl()
Product: gcc
Version: 12.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: torvalds@linux-foundation.org
Target Milestone: ---
So this actually started out as a clang issue report about bad inline asm input
argument behavior at
https://github.com/llvm/llvm-project/issues/56789
but as part of that I was looking at __builtin_ctzl() and while gcc DTRT for
the inline asm version we use in the kernel, the builtin acts very oddly
indeed.
IOW, this code:
unsigned long test(unsigned long arg)
{
return __builtin_ctzl(arg);
}
generates this odd result with 'gcc -O2 -S':
xorl %eax, %eax
rep bsfq %rdi, %rax
cltq
ret
where the xor and the cltq both just confuse me.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug c/106471] Strange code generation for __builtin_ctzl()
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
@ 2022-07-28 23:39 ` torvalds@linux-foundation.org
2022-07-28 23:48 ` [Bug target/106471] " pinskia at gcc dot gnu.org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-28 23:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #1 from Linus Torvalds <torvalds@linux-foundation.org> ---
Created attachment 53379
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53379&action=edit
Silly test-case as an attachment too
I expected just
rep bsfq %rdi, %rax
ret
from this, but that's not what gcc generates.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] Strange code generation for __builtin_ctzl()
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
@ 2022-07-28 23:48 ` pinskia at gcc dot gnu.org
2022-07-28 23:56 ` pinskia at gcc dot gnu.org
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-28 23:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|c |target
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The xor is needed because of an errata in some Intel cores.
There is a different bug already recording the issue with cltq (and should be
fixed soon or was already committed, there is a patch).
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] Strange code generation for __builtin_ctzl()
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
2022-07-28 23:48 ` [Bug target/106471] " pinskia at gcc dot gnu.org
@ 2022-07-28 23:56 ` pinskia at gcc dot gnu.org
2022-07-29 0:01 ` pinskia at gcc dot gnu.org
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-28 23:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The xor is due to X86_TUNE_AVOID_FALSE_DEP_FOR_BMI setting:
/* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
for bit-manipulation instructions. */
DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
m_SANDYBRIDGE | m_CORE_AVX2 | m_TREMONT | m_ALDERLAKE | m_LUJIAZUI
| m_GENERIC)
See PR 62011 for more details on that one.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] Strange code generation for __builtin_ctzl()
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
` (2 preceding siblings ...)
2022-07-28 23:56 ` pinskia at gcc dot gnu.org
@ 2022-07-29 0:01 ` pinskia at gcc dot gnu.org
2022-07-29 0:08 ` torvalds@linux-foundation.org
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-29 0:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598930.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] Strange code generation for __builtin_ctzl()
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
` (3 preceding siblings ...)
2022-07-29 0:01 ` pinskia at gcc dot gnu.org
@ 2022-07-29 0:08 ` torvalds@linux-foundation.org
2022-07-29 0:16 ` torvalds@linux-foundation.org
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-29 0:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #5 from Linus Torvalds <torvalds@linux-foundation.org> ---
(In reply to Andrew Pinski from comment #2)
> The xor is needed because of an errata in some Intel cores.
The only errata I'm aware of is that tzcnt can act as tzcnt even when cpuid
doesn't enumerate it (so it would be expected to act as just bsf). Errata 010
for Gen 8/9 cores.
And yes, that's an errata, but the xor doesn't really help.
Sure, the xor means that on old machines, where 'rep' is ignored, and tzcnt
will always act as bsf, the result register is now going to be zero if the
input is zero.
But that's
(a) not what tzcnt does (it sets the result to 64 when the input is zero)
(b) not what __builtin_ctzl() is documented to do anyway
In particular, wrt (b), the documentation already states
"If x is 0, the result is undefined"
which is exactly the old legacy 'bsf' behavior.
And the errata I'm aware of is that 'rep bsf' acts as tzcnt (ie "write 64 to
destination instead of leave unmodified"), so even with the xor you actually
get undefined behavior (0 or 64 depending on CPU).
So both (a) and (b) argue for that xor being wrong.
Now, of course, there may be some other errata that I'm not aware of. Can
somebody point to it?
(And yes, on old CPUs that don't have tzcnt at all the added xor will break a
false dependency and maybe help performance, but should gcc really care about
old CPUs like that? Particularly when it eats a register and makes it
impossible to have the same source and destination register?)
> There is a different bug already recording the issue with cltq (and should
> be fixed soon or was already committed, there is a patch).
Ok, thanks.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] Strange code generation for __builtin_ctzl()
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
` (4 preceding siblings ...)
2022-07-29 0:08 ` torvalds@linux-foundation.org
@ 2022-07-29 0:16 ` torvalds@linux-foundation.org
2022-07-29 7:58 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-29 0:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #6 from Linus Torvalds <torvalds@linux-foundation.org> ---
Ahh, crossed comments.
(In reply to Andrew Pinski from comment #3)
> The xor is due to X86_TUNE_AVOID_FALSE_DEP_FOR_BMI setting:
>
> /* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
> for bit-manipulation instructions. */
Ahh, ok, so it is indeed for that false dependency for old cpus.
Intel claims that's only for POPCNT on more recent CPU's.
But yes, the same thing definitely happened for BSF for much older cores (ie
pre-TZCNT).
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] Strange code generation for __builtin_ctzl()
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
` (5 preceding siblings ...)
2022-07-29 0:16 ` torvalds@linux-foundation.org
@ 2022-07-29 7:58 ` rguenth at gcc dot gnu.org
2023-05-18 23:30 ` [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now pinskia at gcc dot gnu.org
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-29 7:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86_64-*-*
Keywords| |missed-optimization
CC| |sayle at gcc dot gnu.org
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Roger is working on the sign-/zero-extension issue. The false dependence thing
would be for adjusting generic tuning for more modern CPUs by default, I'm not
sure what's the oldest families still affected here.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
` (6 preceding siblings ...)
2022-07-29 7:58 ` rguenth at gcc dot gnu.org
@ 2023-05-18 23:30 ` pinskia at gcc dot gnu.org
2023-05-19 1:52 ` [Bug target/106471] -march=generic " crazylht at gmail dot com
2023-05-19 1:53 ` crazylht at gmail dot com
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-18 23:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2023-05-18
Status|UNCONFIRMED |NEW
Summary|Strange code generation for |-mpcu=generic might want to
|__builtin_ctzl() |remove
| |X86_TUNE_AVOID_FALSE_DEP_FO
| |R_BMI now
Ever confirmed|0 |1
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. I will let someone else make the decision on when this happens.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] -march=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
` (7 preceding siblings ...)
2023-05-18 23:30 ` [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now pinskia at gcc dot gnu.org
@ 2023-05-19 1:52 ` crazylht at gmail dot com
2023-05-19 1:53 ` crazylht at gmail dot com
9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2023-05-19 1:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
lzcnt/tzcnt has been fixed since skylake, popcnt has been fixed since icelake.
At least for icelake and later intel Core processors, the errata tune is not
needed.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/106471] -march=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
` (8 preceding siblings ...)
2023-05-19 1:52 ` [Bug target/106471] -march=generic " crazylht at gmail dot com
@ 2023-05-19 1:53 ` crazylht at gmail dot com
9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2023-05-19 1:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
The tune is added in PR62011
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-05-19 1:53 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
2022-07-28 23:48 ` [Bug target/106471] " pinskia at gcc dot gnu.org
2022-07-28 23:56 ` pinskia at gcc dot gnu.org
2022-07-29 0:01 ` pinskia at gcc dot gnu.org
2022-07-29 0:08 ` torvalds@linux-foundation.org
2022-07-29 0:16 ` torvalds@linux-foundation.org
2022-07-29 7:58 ` rguenth at gcc dot gnu.org
2023-05-18 23:30 ` [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now pinskia at gcc dot gnu.org
2023-05-19 1:52 ` [Bug target/106471] -march=generic " crazylht at gmail dot com
2023-05-19 1:53 ` crazylht at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).