public inbox for gcc-bugs@sourceware.org
* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
@ 2011-12-24  8:01 ` pinskia at gcc dot gnu.org
  2012-07-29  8:16 ` pinskia at gcc dot gnu.org
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-12-24  8:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-12-24
     Ever Confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-12-24 06:46:16 UTC ---
Trying 9 -> 10:
Failed to match this instruction:
(set (reg:DI 69 [ D.1710 ])
    (sign_extend:DI (ctz:SI (reg:SI 67 [ x ]))))

This should be changed to a zero_extend, which could then actually be matched.



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
  2011-12-24  8:01 ` [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended pinskia at gcc dot gnu.org
@ 2012-07-29  8:16 ` pinskia at gcc dot gnu.org
  2012-07-29 13:24 ` steven at gcc dot gnu.org
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-07-29  8:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |neleai at seznam dot cz

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-07-29 08:15:14 UTC ---
*** Bug 54115 has been marked as a duplicate of this bug. ***



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
  2011-12-24  8:01 ` [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended pinskia at gcc dot gnu.org
  2012-07-29  8:16 ` pinskia at gcc dot gnu.org
@ 2012-07-29 13:24 ` steven at gcc dot gnu.org
  2012-07-30 11:37 ` ubizjak at gmail dot com
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-29 13:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2011-12-24 00:00:00         |2012-07-29 0:00
                 CC|                            |uros at gcc dot gnu.org

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-29 13:23:30 UTC ---
Tested with trunk r189929 on x86-64 (generic).

#define MODE signed // or unsigned
unsigned mask[8];

unsigned foo(unsigned y, unsigned char x)
{
  MODE int c = (MODE) __builtin_ctz(x);
  unsigned int m = mask[c];
  return y & m;
}


With MODE==signed:
        movzbl  %sil, %esi
        movl    %edi, %eax
        rep; bsfl       %esi, %esi
        movslq  %esi, %rsi
        andl    mask(,%rsi,4), %eax
        ret

With MODE==unsigned:
        movzbl  %sil, %esi
        movl    %edi, %eax
        rep; bsfl       %esi, %esi
        movl    %esi, %esi
        andl    mask(,%rsi,4), %eax
        ret

The "movl %esi, %esi" is a zeroextendsidi:
#(insn:TI 9 2 10 2 (parallel [
#            (set (reg:SI 4 si [orig:60 D.1716 ] [60])
#                (ctz:SI (reg:SI 4 si [orig:68 D.1715 ] [68])))
#            (clobber (reg:CC 17 flags))
#        ]) t.c:6 666 {ctzsi2}
#     (expr_list:REG_UNUSED (reg:CC 17 flags)
#        (nil)))
        rep; bsfl       %esi, %esi      # 9     ctzsi2  [length = 4]
#(insn:TI 10 9 12 2 (set (reg:DI 4 si [orig:70 D.1716 ] [70])
#        (zero_extend:DI (reg:SI 4 si [orig:60 D.1716 ] [60]))) t.c:7 113 {*zero_extendsidi2_rex64}
#     (nil))
        movl    %esi, %esi      # 10    *zero_extendsidi2_rex64/1       [length = 2]

Perhaps REE can be taught about ctz giving a non-negative result.



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2012-07-29 13:24 ` steven at gcc dot gnu.org
@ 2012-07-30 11:37 ` ubizjak at gmail dot com
  2012-07-30 12:49 ` ubizjak at gmail dot com
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: ubizjak at gmail dot com @ 2012-07-30 11:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #4 from Uros Bizjak <ubizjak at gmail dot com> 2012-07-30 11:36:23 UTC ---
> Perhaps REE can be taught about ctz giving a non-negative result.

Maybe we need some VRP pass to remove these extensions. Please note an example
from (duplicate) PR54115, where we generate:

#include <stdint.h>
uint64_t foo(long x){
 return __builtin_ctzl(x);
}

foo:
  bsfq  %rdi, %rax
  cltq

This is a DImode -> DImode operation, followed by a sign-extend from the SImode
partial register.

Your examples in comment #3:

  bsfl %esi, %esi
  movslq  %esi, %rsi

can be "fixed" by slapping any_extend:DI onto the CTZ pattern, so that it
consumes either a ZERO_EXTEND or a SIGN_EXTEND of the value (it doesn't matter
which for ranges [0..(some small number)]).

The DImode example above can be "fixed" by adding SUBREG to all patterns. But
I think there is a better implementation than exploding the number of
bit-manipulation patterns.
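
The remark above that the kind of extension doesn't matter for small non-negative ranges can be checked directly. The following is only a sketch; the helper names are made up for illustration, and the conversion of values above INT32_MAX to int32_t relies on GCC's documented modulo behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits (what movslq / cltq do). */
static uint64_t sign_extend_32_to_64(uint32_t v)
{
  return (uint64_t)(int64_t)(int32_t)v;
}

/* Zero-extend a 32-bit value to 64 bits (what "movl %esi, %esi" does). */
static uint64_t zero_extend_32_to_64(uint32_t v)
{
  return (uint64_t)v;
}

/* Returns 1 when both extensions produce the same 64-bit value. */
static int extensions_agree(uint32_t v)
{
  return sign_extend_32_to_64(v) == zero_extend_32_to_64(v);
}
```

For every possible ctz/clz/popcount result (0 up to the bit width) extensions_agree returns 1; the two extensions only differ once bit 31 of the value is set.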



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2012-07-30 11:37 ` ubizjak at gmail dot com
@ 2012-07-30 12:49 ` ubizjak at gmail dot com
  2013-07-03 22:40 ` sgunderson at bigfoot dot com
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: ubizjak at gmail dot com @ 2012-07-30 12:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2012-07-30 12:48:19 UTC ---
Created attachment 27895
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27895
Target patch that handles ctz extensions

x86 target patch that teaches gcc about extensions for ctz.

Patched gcc generates no extension for:

long long foo (int x) { return __builtin_ctz (x); }

and

int foo (long long x) { return __builtin_ctzll (x); }

(and their unsigned variants) on x86.



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2012-07-30 12:49 ` ubizjak at gmail dot com
@ 2013-07-03 22:40 ` sgunderson at bigfoot dot com
  2013-07-04 10:39 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: sgunderson at bigfoot dot com @ 2013-07-03 22:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #7 from sgunderson at bigfoot dot com ---
Wait, sorry, someone's already pointed that out. Ignore me, then...

I can at least confirm it still happens with GCC 4.8.1.



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2013-07-03 22:40 ` sgunderson at bigfoot dot com
@ 2013-07-04 10:39 ` jakub at gcc dot gnu.org
  2013-07-04 18:44 ` glisse at gcc dot gnu.org
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-04 10:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 30453
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30453&action=edit
gcc49-pr29776.patch

Untested VRP patch.  SSA_NAME_RANGE_INFO support, which will make VRP info
persistent, is currently being reviewed, along with some zero/sign-extension
pass that makes use of that info; perhaps that will take care of the rest.

I'm not 100% sure about CLZ/CTZ in the patch, because they could return any
value for an argument of 0, but as we document that as undefined behavior,
perhaps it is fine.

With more code we could do even better than this and derive from the argument's
VR a smaller VR_RANGE than this patch does.  Say for ffs, if 0 isn't in the
VR_RANGE of the argument, the minimum value of the builtin won't be 0 but 1,
and we could take floor_log of the argument's maximum value plus 1 as the
maximum of the resulting range.  Ditto for popcount; for parity the current
[0, 1] is sufficient; for clz we could use precision minus (floor_log of the
maximum value plus 1) as the minimum of the resulting VR; for ctz, derive the
maximum from it.  Dunno if it is worth it though.
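
The tighter ranges described above can be sketched as follows. This is not the VRP patch, only an illustration for unsigned arguments known to lie in [1, max]. All names are hypothetical, and floor_log2_uint stands in for the "floor_log" mentioned above. The clz minimum used here is precision - 1 - floor_log2(max), which is what a brute-force check confirms.

```c
#include <assert.h>
#include <limits.h>

#define UINT_PREC ((int)(sizeof(unsigned) * CHAR_BIT))

/* Index of the highest set bit of a nonzero value (hypothetical helper). */
static int floor_log2_uint(unsigned x)
{
  return UINT_PREC - 1 - __builtin_clz(x);
}

struct bitop_bounds {
  int ffs_min, ffs_max;
  int popcount_max;
  int clz_min;
  int ctz_max;
};

/* Result bounds for an argument known to lie in [1, max] (0 excluded). */
static struct bitop_bounds bounds_for_max(unsigned max)
{
  int top = floor_log2_uint(max);
  struct bitop_bounds b;
  b.ffs_min = 1;                   /* argument is never 0 */
  b.ffs_max = top + 1;
  b.popcount_max = top + 1;
  b.clz_min = UINT_PREC - 1 - top;
  b.ctz_max = top;
  return b;
}

/* Brute-force check for small max: 1 if every x in [1, max] fits the bounds. */
static int bounds_hold(unsigned max)
{
  struct bitop_bounds b = bounds_for_max(max);
  for (unsigned x = 1; x <= max; x++) {
    int f = __builtin_ffs((int)x);
    if (f < b.ffs_min || f > b.ffs_max) return 0;
    if (__builtin_popcount(x) > b.popcount_max) return 0;
    if (__builtin_clz(x) < b.clz_min) return 0;
    if (__builtin_ctz(x) > b.ctz_max) return 0;
  }
  return 1;
}
```

The bounds are safe but not always tight; for example the popcount bound for max = 256 is 9 although no value up to 256 has more than 8 bits set.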



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2013-07-04 10:39 ` jakub at gcc dot gnu.org
@ 2013-07-04 18:44 ` glisse at gcc dot gnu.org
  2013-07-04 19:08 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-07-04 18:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #11 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to dean from comment #10)
> this is unfortunate really -- the newer LZCNT/TZCNT instructions on x86 
> explicitly return 8*sizeof(type) for input of zero

Same as the power cntlz* instructions. And in both cases we implement
intrinsics as a call to __builtin_clz*. So it would make sense to assume this
behavior in the optimizations (if it is undefined behavior, picking this
behavior won't break more things than assuming it never happens).



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2013-07-04 18:44 ` glisse at gcc dot gnu.org
@ 2013-07-04 19:08 ` jakub at gcc dot gnu.org
  2013-07-05  6:52 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-04 19:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Not all CPUs that have defined behavior for 0 define it to the precision
though, and even on i?86 it is undefined even when using lzcnt/tzcnt on older
CPUs.
Even the libgcc routines provide various return values for 0 depending on
target (look for COUNT_LEADING_ZEROS_0).
So it is not an option to redefine the builtins, they are undefined behavior
for 0 and that can't change.
There is __builtin_ffs that provides defined behavior for 0.



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2013-07-04 19:08 ` jakub at gcc dot gnu.org
@ 2013-07-05  6:52 ` jakub at gcc dot gnu.org
  2013-07-05  7:38 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-05  6:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Defining the value at this point would mean that we'd need to generate
slower/larger code on various targets (including i386/x86_64 when not
targeting only the most recent CPUs) and possibly create new libgcc
entrypoints, at least for targets where the current library routine doesn't
return the value that would be newly defined.  That would slow down users that
know they aren't passing 0 to the builtins (all correct ones).
If you want builtins with defined behavior for 0, the only way is to add new
ones, but the question is what the return value should be for 0.
Or __builtin_ffs can be used, but that is 0 for 0 and otherwise 1 +
__builtin_ctz, so if you wanted ctz from that, you'd subtract one, which would
imply that for 0 this ctz variant would return -1.
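
As a sketch of what user code can do today when it needs a defined result for 0, here are two hypothetical wrappers (neither name exists in GCC): one built on __builtin_ffs as described above, yielding -1 for 0, and one that tests for 0 explicitly and picks the precision as its value.

```c
#include <assert.h>
#include <limits.h>

/* ctz built on ffs: defined for 0, where it yields -1 because ffs(0) == 0. */
static int ctz_via_ffs(unsigned x)
{
  return __builtin_ffs((int)x) - 1;
}

/* clz with an explicit zero check; returning the precision for 0 is a
   hypothetical choice, not something the builtin guarantees. */
static int clz_or_prec(unsigned x)
{
  int prec = (int)(sizeof(unsigned) * CHAR_BIT);
  return x ? __builtin_clz(x) : prec;
}
```

The explicit check costs a compare and branch (or conditional move) on targets whose instruction is undefined at 0, which is the slowdown the comment above wants to avoid imposing on every caller.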

BTW, the patch broke regtest on x86_64, because libgcc contained code like:
void foo (__int128_t u)
{
  if ((long long) u == u)
    return something;
  long long hi = u >> 64;
  if (hi < 0)
    hi = -hi;
  int count = __builtin_clzll (hi);
  if (count == 0)
    return somethingelse;
  ...
}

VRP figured out that count can't be 0 in a valid program (which is true), so I
had to fix it up to do hi = -(unsigned long long) hi; so that it doesn't
trigger undefined behavior for the minimum value of u.
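
The fix-up can be illustrated with a small stand-alone function. This is a sketch, not the libgcc code: the function name and the explicit zero check are assumptions added so the example is well defined for every input.

```c
#include <assert.h>
#include <limits.h>

/* Leading-zero count of |hi|, computed without signed overflow. */
static int clz_of_magnitude(long long hi)
{
  unsigned long long mag = (unsigned long long)hi;
  if (hi < 0)
    mag = -(unsigned long long)hi;  /* unsigned negation wraps; defined even for LLONG_MIN */
  if (mag == 0)
    return (int)(sizeof(mag) * CHAR_BIT);  /* hypothetical value; the builtin is undefined at 0 */
  return __builtin_clzll(mag);
}
```

With the signed form hi = -hi, the compiler may assume hi ends up non-negative, so its top bit is clear and a count of 0 looks impossible. With the unsigned negation, LLONG_MIN maps to a magnitude with the top bit set and the count really is 0.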



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2013-07-05  6:52 ` jakub at gcc dot gnu.org
@ 2013-07-05  7:38 ` jakub at gcc dot gnu.org
  2013-07-05  8:02 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-05  7:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Unfortunately, looking at longlong.h, the header on some targets uses
__builtin_clz* or __builtin_ctz* even while defining COUNT_LEADING_ZEROS_0 to
some value (e.g. on alpha, arm, avr, coldfire, mips).
libgcc itself doesn't use the COUNT_LEADING_ZEROS_0 macro, nor does glibc, but I
wonder whether other projects use it.
Code that happened to work (albeit undefined according to the gcc
documentation, though not according to longlong.h) could break with my patch.
So either we'd need to change longlong.h not to define COUNT_LEADING_ZEROS_0
when count_leading_zeros is implemented using the builtin, or we'd need to be
more conservative in the VRP patch.  The problem is what counts as more
conservative.

The C?Z_DEFINED_VALUE_AT_ZERO macros unfortunately aren't immediately usable,
because they are well defined only for some integer modes (the ones for which
there is some hw insn or at least an optab), while the builtin might be for
some other mode (larger or smaller).  If the builtin is for a smaller mode and
the HW supports only wider insns, clz is generally expanded as clz<larger> -
(<largerbitsize> - <thisbitsize>), and ctz as just the larger ctz with no
adjustment.  So perhaps VRP could check whether there is an optab for the
mode of the builtin and, if so, look at its C?Z_DEFINED_VALUE_AT_ZERO,
otherwise assume undefined.  That would let longlong.h work; supposedly it
uses the builtins only for the modes that are in hw (otherwise the libgcc
implementation using those macros would not work).

Trying to handle all the cases would be a mess.  Looking at
CLZ_DEFINED_VALUE_AT_ZERO, when it is defined it is usually the number of bits
in the mode, and even the various extra clz expansions probably honor it: both
the widening, which is expanded as the wider clz adjusted by the difference
between the wider and narrower mode bitsizes, and the doubleword narrowing,
which is expanded as a runtime non-zero check on the first word and then either
clz of the first word or the bitsize difference + clz of the second word.  For
ctz, however, some targets use the bitsize and some use -1, and e.g. when
widening we'd happily return the bitsize of the larger mode without adjusting,
and when ctz is expanded using clz the value can be different again.
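
The widening and doubleword expansions discussed above can be written out in C for nonzero inputs. This is only a sketch with hypothetical function names, assuming a 64-bit unsigned long long; the real expansions happen on RTL inside the compiler. For clz, the adjustment amounts to subtracting the extra leading zeros that zero-extension introduces.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define WIDE_BITS ((int)(sizeof(unsigned) * CHAR_BIT))

/* clz of a nonzero 8-bit value via the wider builtin: remove the extra
   leading zeros that come from zero-extending to unsigned. */
static int clz8_widened(uint8_t x)
{
  return __builtin_clz((unsigned)x) - (WIDE_BITS - 8);
}

/* ctz of a nonzero 8-bit value: the wider ctz needs no adjustment. */
static int ctz8_widened(uint8_t x)
{
  return __builtin_ctz((unsigned)x);
}

/* clz of a nonzero 128-bit value held as two 64-bit words: a runtime
   non-zero check on the high word selects which word to count in. */
static int clz128_two_words(uint64_t hi, uint64_t lo)
{
  if (hi != 0)
    return __builtin_clzll(hi);
  return 64 + __builtin_clzll(lo);
}
```

If the wide instruction returned its own bit width for 0, the clz adjustment would still give 8 for the 8-bit case, whereas the unadjusted ctz would report the wider width. That asymmetry is why a single value at zero is harder to pin down for ctz.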

So, thoughts on this?



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2013-07-05  7:38 ` jakub at gcc dot gnu.org
@ 2013-07-05  8:02 ` jakub at gcc dot gnu.org
  2013-07-05  8:28 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-05  8:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Yet another option is to always use a [0, prec] VR for CLZ (and lower it
for a defined range, as the patch does right now), i.e. for CLZ change
maxi = prec - 1; to maxi = prec; and drop if (maxi == prec) maxi = prec - 1;.
Looking at all targets where the value at 0 is defined for CLZ, it is the
mode size, and that ought to behave well even with the widening/narrowing
expansions.

And for CTZ just leave 0 as undefined, because there are really just too many
possibilities for what you can get, and it isn't relevant to longlong.h either,
because there is no COUNT_TRAILING_ZEROS_0 macro.



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2013-07-05  8:02 ` jakub at gcc dot gnu.org
@ 2013-07-05  8:28 ` jakub at gcc dot gnu.org
  2013-07-06  9:54 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-05  8:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Also, in expand_builtin_unop for the int bitops builtin
(ffs/parity/popcount/clz/ctz/clrsb) optabs, if the target returned by
expand_unop is wider than target_mode, perhaps we could set
SUBREG_PROMOTED_VAR_P and !SUBREG_PROMOTED_UNSIGNED_P.  When the builtins are
expanded using a libcall, the libcall returns int and so does the builtin, so
target_mode should then be equal to what we have; otherwise I'd think the
result should already be extended, i.e. target should hold the return value,
not some value with the target value only in the low bits.  But that would
require verification for all targets.



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2013-07-05  8:28 ` jakub at gcc dot gnu.org
@ 2013-07-06  9:54 ` jakub at gcc dot gnu.org
  2021-08-08 22:48 ` pinskia at gcc dot gnu.org
  2023-03-21 11:54 ` adam.warner.nz at gmail dot com
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-06  9:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #20 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Sat Jul  6 09:34:17 2013
New Revision: 200731

URL: http://gcc.gnu.org/viewcvs?rev=200731&root=gcc&view=rev
Log:
    PR target/29776
    * fold-const.c (tree_call_nonnegative_warnv_p): Return true
    for BUILT_IN_C{LZ,LRSB}*.
    * tree.h (CASE_INT_FN): Add FN##IMAX case.
    * tree-vrp.c (extract_range_basic): Handle
    BUILT_IN_{FFS,PARITY,POPCOUNT,C{LZ,TZ,LRSB}}*.  For
    BUILT_IN_CONSTANT_P if argument isn't (D) of PARM_DECL,
    fall thru to code calling set_value*.
    * builtins.c (expand_builtin): Remove *IMAX cases.
    (fold_builtin_bitop): For BUILT_IN_CLRSB* return NULL_TREE
    if width is bigger than 2*HWI.

    * libgcc2.c (__floattisf): Avoid undefined signed overflow.

    * gcc.dg/tree-ssa/vrp89.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/vrp89.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vrp.c
    trunk/gcc/tree.h
    trunk/libgcc/ChangeLog
    trunk/libgcc/libgcc2.c



* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2013-07-06  9:54 ` jakub at gcc dot gnu.org
@ 2021-08-08 22:48 ` pinskia at gcc dot gnu.org
  2023-03-21 11:54 ` adam.warner.nz at gmail dot com
  15 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-08 22:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gpiez at web dot de

--- Comment #24 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 50168 has been marked as a duplicate of this bug. ***


* [Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended
       [not found] <bug-29776-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2021-08-08 22:48 ` pinskia at gcc dot gnu.org
@ 2023-03-21 11:54 ` adam.warner.nz at gmail dot com
  15 siblings, 0 replies; 16+ messages in thread
From: adam.warner.nz at gmail dot com @ 2023-03-21 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #25 from Adam Warner <adam.warner.nz at gmail dot com> ---
Documenting a workaround I've found for the unnecessary sign extension. I'm
still perplexed at the improbability of this appearing to work!

workaround_bsr_sign_extension.c:

#include <stdint.h>

uint64_t bsr_u64(uint64_t a) {
  if (sizeof(unsigned long) == 8) return 63 - __builtin_clzl(a);
  if (sizeof(unsigned long long) == 8) return 63 - __builtin_clzll(a);
}

uint64_t bsr_u64_alt(uint64_t a) {
  if (sizeof(unsigned long) == 8) return UINT64_C(63) - __builtin_clzl(a);
  if (sizeof(unsigned long long) == 8) return UINT64_C(63) - __builtin_clzll(a);
}

int main(void) {return 0;}

$ gcc -O3 workaround_bsr_sign_extension.c && objdump -d -m i386:x86-64:intel a.out|less

Relevant output:

0000000000001140 <bsr_u64>:
    1140:       48 0f bd c7             bsr    rax,rdi
    1144:       48 98                   cdqe
    1146:       c3                      ret
    1147:       66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
    114e:       00 00 

0000000000001150 <bsr_u64_alt>:
    1150:       48 0f bd c7             bsr    rax,rdi
    1154:       c3                      ret

In the alternative implementation the superfluous 32- to 64-bit sign extension
instruction CDQE is no longer generated.
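
One plausible reading of the difference, not confirmed anywhere in this thread, is the C typing of the two expressions: with the plain constant the subtraction is done in int and only its result is widened, while with UINT64_C(63) the builtin's int result is widened first and the subtraction is done in 64 bits. The sketch below uses hypothetical names and __builtin_clzll throughout, assuming a 64-bit unsigned long long. It shows only that the two forms compute the same value for nonzero input; which form avoids the extension still depends on the compiler version.

```c
#include <assert.h>
#include <stdint.h>

/* 63 and the builtin's result are both int: the subtraction happens in
   int, and the int result is converted to uint64_t on return. */
static uint64_t bsr_int_arith(uint64_t a)
{
  return 63 - __builtin_clzll(a);
}

/* UINT64_C(63) makes the subtraction 64-bit unsigned: the builtin's int
   result is converted first, then subtracted. */
static uint64_t bsr_u64_arith(uint64_t a)
{
  return UINT64_C(63) - __builtin_clzll(a);
}
```

Both functions are undefined for a == 0, like the builtin they wrap.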

