public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/106471] New: Strange code generation for __builtin_ctzl()
@ 2022-07-28 23:37 torvalds@linux-foundation.org
  2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-28 23:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

            Bug ID: 106471
           Summary: Strange code generation for __builtin_ctzl()
           Product: gcc
           Version: 12.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: torvalds@linux-foundation.org
  Target Milestone: ---

So this actually started out as a clang issue report about bad inline asm input
argument behavior at

   https://github.com/llvm/llvm-project/issues/56789

but as part of that I was looking at __builtin_ctzl() and while gcc DTRT for
the inline asm version we use in the kernel, the builtin acts very oddly
indeed.

IOW, this code:

        unsigned long test(unsigned long arg)
        {
                return __builtin_ctzl(arg);
        }

generates this odd result with 'gcc -O2 -S':

        xorl    %eax, %eax
        rep bsfq        %rdi, %rax
        cltq
        ret

where the xor and the cltq both just confuse me.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug c/106471] Strange code generation for __builtin_ctzl()
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
@ 2022-07-28 23:39 ` torvalds@linux-foundation.org
  2022-07-28 23:48 ` [Bug target/106471] " pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-28 23:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

--- Comment #1 from Linus Torvalds <torvalds@linux-foundation.org> ---
Created attachment 53379
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53379&action=edit
Silly test-case as an attachment too

I expected just

        rep bsfq %rdi, %rax
        ret

from this, but that's not what gcc generates.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] Strange code generation for __builtin_ctzl()
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
  2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
@ 2022-07-28 23:48 ` pinskia at gcc dot gnu.org
  2022-07-28 23:56 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-28 23:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The xor is needed because of an errata in some Intel cores.

There is a different bug already recording the issue with cltq (and should be
fixed soon or was already committed, there is a patch).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] Strange code generation for __builtin_ctzl()
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
  2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
  2022-07-28 23:48 ` [Bug target/106471] " pinskia at gcc dot gnu.org
@ 2022-07-28 23:56 ` pinskia at gcc dot gnu.org
  2022-07-29  0:01 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-28 23:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The xor is due to X86_TUNE_AVOID_FALSE_DEP_FOR_BMI setting:

/* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
   for bit-manipulation instructions.  */
DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
          m_SANDYBRIDGE | m_CORE_AVX2 | m_TREMONT | m_ALDERLAKE | m_LUJIAZUI
     | m_GENERIC)

See PR 62011 for more details on that one.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] Strange code generation for __builtin_ctzl()
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
                   ` (2 preceding siblings ...)
  2022-07-28 23:56 ` pinskia at gcc dot gnu.org
@ 2022-07-29  0:01 ` pinskia at gcc dot gnu.org
  2022-07-29  0:08 ` torvalds@linux-foundation.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-29  0:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598930.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] Strange code generation for __builtin_ctzl()
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
                   ` (3 preceding siblings ...)
  2022-07-29  0:01 ` pinskia at gcc dot gnu.org
@ 2022-07-29  0:08 ` torvalds@linux-foundation.org
  2022-07-29  0:16 ` torvalds@linux-foundation.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-29  0:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

--- Comment #5 from Linus Torvalds <torvalds@linux-foundation.org> ---
(In reply to Andrew Pinski from comment #2)
> The xor is needed because of an errata in some Intel cores.

The only errata I'm aware of is that tzcnt can act as tzcnt even when cpuid
doesn't enumerate it (so it would be expected to act as just bsf). Errata 010
for Gen 8/9 cores.

And yes, that's an errata, but the xor doesn't really help.

Sure, the xor means that on old machines, where 'rep' is ignored, and tzcnt
will always act as bsf, the result register is now going to be zero if the
input is zero.

But that's

 (a) not what tzcnt does (it sets the result to 64 when the input is zero)

 (b) not what __builtin_ctzl() is documented to do anyway

In particular, wrt (b), the documentation already states

 "If x is 0, the result is undefined"

which is exactly the old legacy 'bsf' behavior.

And the errata I'm aware of is that 'rep bsf' acts as tzcnt (ie "write 64 to
destination instead of leave unmodified"), so even with the xor you actually
get undefined behavior (0 or 64 depending on CPU).

So both (a) and (b) argue for that xor being wrong.

Now, of course, there may be some other errata that I'm not aware of. Can
somebody point to it?

(And yes, on old CPUs that don't have tzcnt at all the added xor will break a
false dependency and maybe help performance, but should gcc really care about
old CPUs like that? Particularly when it eats a register and makes it
impossible to have the same source and destination register?)

> There is a different bug already recording the issue with cltq (and should
> be fixed soon or was already committed, there is a patch).

Ok, thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] Strange code generation for __builtin_ctzl()
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
                   ` (4 preceding siblings ...)
  2022-07-29  0:08 ` torvalds@linux-foundation.org
@ 2022-07-29  0:16 ` torvalds@linux-foundation.org
  2022-07-29  7:58 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: torvalds@linux-foundation.org @ 2022-07-29  0:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

--- Comment #6 from Linus Torvalds <torvalds@linux-foundation.org> ---
Ahh, crossed comments.

(In reply to Andrew Pinski from comment #3)
> The xor is due to X86_TUNE_AVOID_FALSE_DEP_FOR_BMI setting:
> 
> /* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
>    for bit-manipulation instructions.  */

Ahh, ok, so it is indeed for that false dependency for old cpus.

Intel claims that's only for POPCNT on more recent CPU's.

But yes, the same thing definitely happened for BSF for much older cores (ie
pre-TZCNT).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] Strange code generation for __builtin_ctzl()
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
                   ` (5 preceding siblings ...)
  2022-07-29  0:16 ` torvalds@linux-foundation.org
@ 2022-07-29  7:58 ` rguenth at gcc dot gnu.org
  2023-05-18 23:30 ` [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-29  7:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization
                 CC|                            |sayle at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Roger is working on the sign-/zero-extension issue.  The false dependence thing
would be for adjusting generic tuning for more modern CPUs by default, I'm not
sure what's the oldest families still affected here.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
                   ` (6 preceding siblings ...)
  2022-07-29  7:58 ` rguenth at gcc dot gnu.org
@ 2023-05-18 23:30 ` pinskia at gcc dot gnu.org
  2023-05-19  1:52 ` [Bug target/106471] -march=generic " crazylht at gmail dot com
  2023-05-19  1:53 ` crazylht at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-18 23:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-05-18
             Status|UNCONFIRMED                 |NEW
            Summary|Strange code generation for |-mpcu=generic might want to
                   |__builtin_ctzl()            |remove
                   |                            |X86_TUNE_AVOID_FALSE_DEP_FO
                   |                            |R_BMI now
     Ever confirmed|0                           |1

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. I will let someone else make the decision on when this happens.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] -march=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
                   ` (7 preceding siblings ...)
  2023-05-18 23:30 ` [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now pinskia at gcc dot gnu.org
@ 2023-05-19  1:52 ` crazylht at gmail dot com
  2023-05-19  1:53 ` crazylht at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2023-05-19  1:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
lzcnt/tzcnt has been fixed since skylake, popcnt has been fixed since icelake.
At least for icelake and later intel Core processors, the errata tune is not
needed.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/106471] -march=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now
  2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
                   ` (8 preceding siblings ...)
  2023-05-19  1:52 ` [Bug target/106471] -march=generic " crazylht at gmail dot com
@ 2023-05-19  1:53 ` crazylht at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2023-05-19  1:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
The tune is added in PR62011

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-05-19  1:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-28 23:37 [Bug c/106471] New: Strange code generation for __builtin_ctzl() torvalds@linux-foundation.org
2022-07-28 23:39 ` [Bug c/106471] " torvalds@linux-foundation.org
2022-07-28 23:48 ` [Bug target/106471] " pinskia at gcc dot gnu.org
2022-07-28 23:56 ` pinskia at gcc dot gnu.org
2022-07-29  0:01 ` pinskia at gcc dot gnu.org
2022-07-29  0:08 ` torvalds@linux-foundation.org
2022-07-29  0:16 ` torvalds@linux-foundation.org
2022-07-29  7:58 ` rguenth at gcc dot gnu.org
2023-05-18 23:30 ` [Bug target/106471] -mpcu=generic might want to remove X86_TUNE_AVOID_FALSE_DEP_FOR_BMI now pinskia at gcc dot gnu.org
2023-05-19  1:52 ` [Bug target/106471] -march=generic " crazylht at gmail dot com
2023-05-19  1:53 ` crazylht at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).