public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/104376] New: Failure to optimize clz equivalent to clz
From: gabravier at gmail dot com @ 2022-02-04  1:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

            Bug ID: 104376
           Summary: Failure to optimize clz equivalent to clz
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

uint32_t countLeadingZeros32(uint32_t x)
{
    if (x == 0)
        return 32;
    return (31 - __builtin_clz(x)) ^ 31;
}

On x86, with `-mlzcnt`, GCC outputs this:

countLeadingZeros32(unsigned int):
  mov eax, 32
  test edi, edi
  je .L1
  mov eax, 31
  lzcnt edi, edi
  sub eax, edi
  xor eax, 31
.L1:
  ret

LLVM instead outputs this:

countLeadingZeros32(unsigned int):
  lzcnt eax, edi
  ret
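LLVM's single `lzcnt` is a valid lowering because x86 `lzcnt` is defined to return 32 for a zero input, while `__builtin_clz` is undefined there. The C sketch below is a sanity check and not part of the original report. It compares the reporter's function against a portable, loop-based model of `lzcnt`. The helper names `lzcnt32_ref` and `clz_matches_lzcnt_model` are hypothetical. The sketch assumes GCC or Clang (for `__builtin_clz`) and a 32-bit `unsigned int`.

```c
#include <stdint.h>

/* The reporter's function, unchanged. Assumes GCC/Clang __builtin_clz
   and a 32-bit unsigned int. */
uint32_t countLeadingZeros32(uint32_t x)
{
    if (x == 0)
        return 32;
    return (31 - __builtin_clz(x)) ^ 31;
}

/* Hypothetical portable model of lzcnt: scans from bit 31 downward and
   returns 32 when x == 0 (where __builtin_clz itself is undefined). */
uint32_t lzcnt32_ref(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && !(x & mask); mask >>= 1)
        n++;
    return n;
}

/* Returns 1 if countLeadingZeros32 agrees with the model on zero, on every
   single-bit value, on every low-bit mask, and on a pseudo-random sample. */
int clz_matches_lzcnt_model(void)
{
    if (countLeadingZeros32(0) != lzcnt32_ref(0))
        return 0;
    for (int k = 0; k < 32; k++) {
        uint32_t bit = UINT32_C(1) << k;
        uint32_t low = bit | (bit - 1); /* bits k..0 all set */
        if (countLeadingZeros32(bit) != lzcnt32_ref(bit) ||
            countLeadingZeros32(low) != lzcnt32_ref(low))
            return 0;
    }
    uint32_t v = 12345; /* arbitrary seed for a 32-bit LCG */
    for (int i = 0; i < 100000; i++) {
        v = v * 1664525u + 1013904223u;
        if (countLeadingZeros32(v) != lzcnt32_ref(v))
            return 0;
    }
    return 1;
}
```

Because the two functions agree everywhere, including at zero, the whole body is semantically a single count-leading-zeros that is defined at zero.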

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2022-02-04  4:26 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2022-02-04
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
             Target|                            |aarch64-*-* x86_64-*-*
                   |                            |(with -mlzcnt)

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There are two issues. Both are tree-level issues, though the second one is
handled fine at the RTL level (at least on aarch64).

Right now we have:
  _1 = __builtin_clz (x_5(D));
  _2 = 31 - _1;
  _3 = _2 ^ 31;

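But _3 can be optimized to just _1: for nonzero x, _1 is in [0, 31], so
31 - _1 involves no borrows and equals 31 ^ _1, which makes
_3 = (31 ^ _1) ^ 31 = _1.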

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2022-02-04  4:27 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The second issue can be seen with:
#include <stdint.h>

uint32_t countLeadingZeros32(uint32_t x)
{
    if (x == 0)
        return 32;
    return __builtin_clz(x);
}

This gets optimized at the RTL level for aarch64, but not for x86_64 with
-mlzcnt.

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2022-02-04  7:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |104378

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Filed PR 104378 for the (31 - x) ^ 31 issue.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104378
[Bug 104378] (N - x) ^ N should be optimized to x if x <= N (unsigned) and N is a pow2 - 1
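The identity behind PR 104378 can be checked mechanically. N = 2^k - 1 has all of its low k bits set. Any x <= N fits in those bits, so N - x needs no borrows. That gives N - x == N ^ x, and therefore (N - x) ^ N == x. The C sketch below verifies this exhaustively for k = 0..16 using unsigned 32-bit arithmetic. The helper names are hypothetical and do not come from GCC.

```c
#include <stdint.h>

/* Returns 1 if (n - x) ^ n == x for every x in [0, n], using unsigned
   32-bit wraparound arithmetic. Precondition: n < UINT32_MAX, otherwise
   the loop condition x <= n never becomes false. */
int identity_holds_for_all_x_le_n(uint32_t n)
{
    for (uint32_t x = 0; x <= n; x++)
        if (((n - x) ^ n) != x)
            return 0;
    return 1;
}

/* Checks every N = 2^k - 1 for k = 0..16 (about 131k iterations total). */
int identity_holds_for_pow2_minus_1(void)
{
    for (int k = 0; k <= 16; k++)
        if (!identity_holds_for_all_x_le_n((UINT32_C(1) << k) - 1))
            return 0;
    return 1;
}
```

Both conditions in the summary matter. With N = 31 and x = 32, x <= N fails: 31 - 32 wraps to 0xffffffff, and 0xffffffff ^ 31 is 0xffffffe0, not 32. With N = 30, which is not a power of two minus one, x = 1 gives 29 ^ 30 == 3.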

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2022-02-09  6:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> The second issue can be seen with:
> #include <stdint.h>
> 
> uint32_t countLeadingZeros32(uint32_t x)
> {
>     if (x == 0)
>         return 32;
>     return (__builtin_clz(x)) ;
> }

cond_removal_in_builtin_zero_pattern should have optimized the above, but it
does not for some reason.
Let me take a look.

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2022-02-09  7:40 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101822,
                   |                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99997,
                   |                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71016

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> cond_removal_in_builtin_zero_pattern should have optimized the above but
> does not for some reason.
> Let me take a look.

One problem is that we have:
  <bb 2> [local count: 1073741824]:
  if (x_3(D) == 0)
    goto <bb 4>; [21.72%]
  else
    goto <bb 3>; [78.28%]

  <bb 3> [local count: 840525097]:
  _1 = __builtin_clz (x_3(D));
  _4 = (uint32_t) _1;

  <bb 4> [local count: 1073741824]:
  # _2 = PHI <32(2), _4(3)>


cond_removal_in_builtin_zero_pattern does not handle this, because the PHI
argument _4 is a cast of the __builtin_clz result rather than the result
itself. This is similar to PR 99997 and PR 101822: the code that was added to
fix PR 71016 is getting in the way.

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2023-05-06 21:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
         Depends on|                            |101822

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The cast issue is basically PR 101822.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101822
[Bug 101822] Codegen bug for popcount

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2023-10-15 18:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I have a fix for the second issue that does not cause PR 71016 to reappear.
Basically, we should always allow nop conversions in
factor_out_conditional_operation.
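The idea of factoring a conversion out of a conditional can be pictured at the source level with the two hand-written C functions below. In the "before" form, the cast sits inside one arm of the conditional. In the "after" form, the arms merge first and a single cast follows the join. The function names are hypothetical, and this illustrates the transformation only; it is not GCC's code in tree-ssa-phiopt.cc. Once the cast is hoisted, the merge is between 32 and the raw __builtin_clz result. That appears to be the shape cond_removal_in_builtin_zero_pattern expects, according to comment #5.

```c
#include <limits.h>
#include <stdint.h>

/* The rewrite is only a nop conversion if int and uint32_t share a precision. */
_Static_assert(INT_MAX == 0x7fffffff, "assumes a 32-bit int");

/* "Before": the cast to uint32_t sits inside the non-zero arm, mirroring
   _4 = (uint32_t) _1 in bb 3 of the GIMPLE dump. */
uint32_t clz_cast_in_arm(uint32_t x)
{
    uint32_t r;
    if (x == 0) {
        r = 32;
    } else {
        int c = __builtin_clz(x);
        r = (uint32_t) c;
    }
    return r;
}

/* "After": the conditional merges int values and one cast follows the join.
   Valid because int -> uint32_t is a nop conversion here (same 32-bit
   precision), so no sign or zero extension is moved away from its definition. */
uint32_t clz_cast_after_join(uint32_t x)
{
    int c = (x == 0) ? 32 : __builtin_clz(x);
    return (uint32_t) c;
}

/* Returns 1 if both forms agree on zero, every single-bit value, and every
   low-bit mask. */
int factoring_preserves_result(void)
{
    if (clz_cast_in_arm(0) != clz_cast_after_join(0))
        return 0;
    for (int k = 0; k < 32; k++) {
        uint32_t bit = UINT32_C(1) << k;
        uint32_t low = bit | (bit - 1);
        if (clz_cast_in_arm(bit) != clz_cast_after_join(bit) ||
            clz_cast_in_arm(low) != clz_cast_after_join(low))
            return 0;
    }
    return 1;
}
```

If the conversion were a real widening, such as int to uint64_t, hoisting it past the join would move a zero or sign extension. The nop restriction exists to avoid that, and it is what keeps PR 71016 from reappearing.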

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2023-10-15 19:21 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 56117
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56117&action=edit
Patch which I am testing to fix the second issue

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: cvs-commit at gcc dot gnu.org @ 2023-10-24 11:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:

https://gcc.gnu.org/g:0fc13e8c0e39c51e82deb93f324d9d86ad8d7460

commit r14-4889-g0fc13e8c0e39c51e82deb93f324d9d86ad8d7460
Author: Andrew Pinski <pinskia@gmail.com>
Date:   Sun Oct 15 19:15:38 2023 +0000

    Improve factor_out_conditional_operation for conversions and constants

    In the case of a NOP conversion (the precisions of the two types are
    equal), factoring out the conversion can be done even if
    int_fits_type_p returns false, and even when the conversion is defined
    by a statement inside the conditional. Since it is a NOP conversion, no
    zero/sign extension happens, which is why it is OK to do here; the
    check was meant to prevent an extra sign/zero extension from being
    moved away from its definition, and no-op conversions are not
    extensions.

    Bootstrapped and tested on x86_64-linux-gnu with no regressions.

    gcc/ChangeLog:

            PR tree-optimization/104376
            PR tree-optimization/101541
            * tree-ssa-phiopt.cc (factor_out_conditional_operation):
            Allow nop conversions even if it is defined by a statement
            inside the conditional.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/101541
            * gcc.dg/tree-ssa/phi-opt-39.c: New test.

* [Bug tree-optimization/104376] Failure to optimize clz equivalent to clz
From: pinskia at gcc dot gnu.org @ 2023-10-24 11:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104376
Bug 104376 depends on bug 101822, which changed state.

Bug 101822 Summary: Codegen bug for popcount
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101822

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
