* [Bug c/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
@ 2011-08-23 17:00 ` jakub at gcc dot gnu.org
2011-08-23 18:07 ` jakub at gcc dot gnu.org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-08-23 17:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-23 16:41:34 UTC ---
I'll look at this.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug c/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
2011-08-23 17:00 ` [Bug c/50168] " jakub at gcc dot gnu.org
@ 2011-08-23 18:07 ` jakub at gcc dot gnu.org
2011-08-23 22:01 ` gpiez at web dot de
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-08-23 18:07 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |uros at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-23 17:58:52 UTC ---
Those aren't equivalent unfortunately, because bsf and bsr insns on x86 have
undefined value if the source is zero. While __builtin_c[lt]z* documentation
says that the result is undefined in that case, I wonder if it would be fine
even if long l = (int) __builtin_c[lt]z* (x); gave a value that wasn't actually
sign-extended to 64 bits.
The combiner already simplifies zero or sign extension of popcount/parity/ffs
and, if ctz or clz value is defined at zero, also those, but if it is undefined
it assumes anything in any of the bits and thus can't optimize the sign/zero
extension away. With -mbmi it will be optimized just fine, because for tzcnt
(and lzcnt for -mlzcnt) insns are well defined even for source operand zero.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug c/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
2011-08-23 17:00 ` [Bug c/50168] " jakub at gcc dot gnu.org
2011-08-23 18:07 ` jakub at gcc dot gnu.org
@ 2011-08-23 22:01 ` gpiez at web dot de
2011-08-23 23:53 ` gpiez at web dot de
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gpiez at web dot de @ 2011-08-23 22:01 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
--- Comment #3 from Gunther Piez <gpiez at web dot de> 2011-08-23 21:54:40 UTC ---
On 23.08.2011 19:58, jakub at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |uros at gcc dot gnu.org
>
> --- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-23 17:58:52 UTC ---
> Those aren't equivalent unfortunately, because bsf and bsr insns on x86 have
> undefined value if the source is zero. While __builtin_c[lt]z* documentation
> says that the result is undefined in that case, I wonder if it would be fine
> even if long l = (int) __builtin_c[lt]z* (x); gave a value that wasn't actually
> sign-extended to 64 bits.
> The combiner already simplifies zero or sign extension of popcount/parity/ffs
> and, if ctz or clz value is defined at zero, also those, but if it is undefined
> it assumes anything in any of the bits and thus can't optimize the sign/zero
> extension away. With -mbmi it will be optimized just fine, because for tzcnt
> (and lzcnt for -mlzcnt) insns are well defined even for source operand zero.
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug c/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
` (2 preceding siblings ...)
2011-08-23 22:01 ` gpiez at web dot de
@ 2011-08-23 23:53 ` gpiez at web dot de
2011-08-24 8:18 ` [Bug middle-end/50168] " rguenth at gcc dot gnu.org
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gpiez at web dot de @ 2011-08-23 23:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
--- Comment #4 from Gunther Piez <gpiez at web dot de> 2011-08-23 22:00:31 UTC ---
On 23.08.2011 19:58, jakub at gcc dot gnu.org wrote:
> While __builtin_c[lt]z* documentation
> says that the result is undefined in that case, I wonder if it would be fine
> even if long l = (int) __builtin_c[lt]z* (x); gave a value that wasn't actually
> sign-extended to 64 bits.
So that software operating on the assumption that the value return by
__builtin_c[lt]z* is always int, even in the undefined case, would break
as soon at it sees a value outside the int range. Which could very well
be the case, AFAIK in the zero case the value of the target register is
just unchanged.
IMHO this is ok, I doubt that such code exists and even if, it is very
broken by design :-)
Just my 2 cent.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
` (3 preceding siblings ...)
2011-08-23 23:53 ` gpiez at web dot de
@ 2011-08-24 8:18 ` rguenth at gcc dot gnu.org
2011-08-24 9:41 ` jakub at gcc dot gnu.org
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-24 8:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|c |middle-end
--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-24 08:16:06 UTC ---
I'd say "undefined" is a good enough reason to optimize the case (as opposed
to "target implementation defined" whose behavior we'd need to preserve).
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
` (4 preceding siblings ...)
2011-08-24 8:18 ` [Bug middle-end/50168] " rguenth at gcc dot gnu.org
@ 2011-08-24 9:41 ` jakub at gcc dot gnu.org
2011-08-24 9:47 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-08-24 9:41 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-24 09:21:11 UTC ---
But it is just undefined value, not undefined behavior. If you do:
--- rtlanal.c 2011-08-23 19:46:13.000000000 +0200
+++ rtlanal.c 2011-08-24 11:18:01.720582231 +0200
@@ -4256,21 +4256,17 @@ nonzero_bits1 (const_rtx x, enum machine
case CLZ:
/* If CLZ has a known value at zero, then the nonzero bits are
that value, plus the number of bits in the mode minus one. */
- if (CLZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
- nonzero
- |= ((unsigned HOST_WIDE_INT) 1 << (floor_log2 (mode_width))) - 1;
- else
- nonzero = -1;
+ if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
+ nonzero = 0;
+ nonzero |= ((unsigned HOST_WIDE_INT) 1 << (floor_log2 (mode_width))) -
1;
break;
case CTZ:
/* If CTZ has a known value at zero, then the nonzero bits are
that value, plus the number of bits in the mode minus one. */
- if (CTZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
- nonzero
- |= ((unsigned HOST_WIDE_INT) 1 << (floor_log2 (mode_width))) - 1;
- else
- nonzero = -1;
+ if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
+ nonzero = 0;
+ nonzero |= ((unsigned HOST_WIDE_INT) 1 << (floor_log2 (mode_width))) -
1;
break;
case CLRSB:
then __builtin_clzl (x) & 0xff will happily give 0xdeadbeef as result if x
happens to be 0. Similarly long y = (short) __builtin_clzl (x) will result in
y being any value from LONG_MIN to LONG_MAX, instead of any value from
SHORT_MIN to SHORT_MAX.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
` (5 preceding siblings ...)
2011-08-24 9:41 ` jakub at gcc dot gnu.org
@ 2011-08-24 9:47 ` rguenth at gcc dot gnu.org
2011-08-24 9:49 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-24 9:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2011-08-24
Ever Confirmed|0 |1
--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-24 09:40:44 UTC ---
Hm, so the best thing we can do is a peephole recognizing that
10: 48 0f bc ff bsf %rdi,%rdi
14: 48 63 ff movslq %edi,%rdi
when rdi is zero it will stay so and the movslq is redundant?
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
` (6 preceding siblings ...)
2011-08-24 9:47 ` rguenth at gcc dot gnu.org
@ 2011-08-24 9:49 ` jakub at gcc dot gnu.org
2011-08-24 9:53 ` jakub at gcc dot gnu.org
2021-08-08 22:48 ` [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate extra sign extend " pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-08-24 9:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-24 09:45:56 UTC ---
What we IMHO should optimize and don't currently is the redundant sign
extension when using __builtin_ffsl - as it internally uses bsf + cmove,
nonzero_bits isn't able to figure out that the result of the sequence is
guaranteed to have nonzero-bits. Perhaps we should in that case add a
REG_EQUAL note to the last insn in the sequence and perhaps nonzero_bits could
also look at REG_EQUAL notes. Doing that could perhaps help even testcases
like:
int foo (long x)
{
return __builtin_popcountl (x) & 0xff;
}
where the andl $0x255, %eax could be optimized away, etc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
` (7 preceding siblings ...)
2011-08-24 9:49 ` jakub at gcc dot gnu.org
@ 2011-08-24 9:53 ` jakub at gcc dot gnu.org
2021-08-08 22:48 ` [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate extra sign extend " pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-08-24 9:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-24 09:48:20 UTC ---
(In reply to comment #7)
> Hm, so the best thing we can do is a peephole recognizing that
>
> 10: 48 0f bc ff bsf %rdi,%rdi
> 14: 48 63 ff movslq %edi,%rdi
>
> when rdi is zero it will stay so and the movslq is redundant?
I don't think so. While the SandyBridge CPU seems to behave that way, there is
no such guarantee in the Intel manuals which say that the destination value is
undefined if zero-flag is cleared (i.e. the source was zero) and thus some
older or future CPUs might behave differently.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate extra sign extend on x86_64
2011-08-23 16:40 [Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64 gpiez at web dot de
` (8 preceding siblings ...)
2011-08-24 9:53 ` jakub at gcc dot gnu.org
@ 2021-08-08 22:48 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-08 22:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |DUPLICATE
--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup of bug 29776.
*** This bug has been marked as a duplicate of bug 29776 ***
^ permalink raw reply [flat|nested] 11+ messages in thread