public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/95752] New: Failure to optimize complicated usage of __builtin_ctz with conditionals properly
@ 2020-06-18 20:18 gabravier at gmail dot com
  2021-08-20  5:38 ` [Bug tree-optimization/95752] " pinskia at gcc dot gnu.org
  2023-11-10 22:47 ` pinskia at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: gabravier at gmail dot com @ 2020-06-18 20:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95752

            Bug ID: 95752
           Summary: Failure to optimize complicated usage of __builtin_ctz
                    with conditionals properly
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

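#include <stdint.h>

unsigned long f(uint64_t value)
{
    unsigned int result;

    if ((value & 0xFFFFFFFF) == 0)
    {
        result = __builtin_ctz(value >> 32) + 32;
    }
    else
    {
        /* Always true here (the low 32 bits are nonzero), but there is
           no else branch assigning result. */
        if ((unsigned int)value != 0)
            result = __builtin_ctz((unsigned int)value);
    }

    return result;
}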

With -O3 -mbmi, LLVM outputs this:

f(unsigned long):
  mov rax, rdi
  shr rax, 32
  tzcnt ecx, eax
  or ecx, 32
  tzcnt eax, edi
  cmovb eax, ecx
  ret

GCC outputs this:

f(unsigned long):
  test edi, edi
  jne .L2

  shr rdi, 32
  xor eax, eax
  tzcnt eax, edi
  add eax, 32
  mov eax, eax
  ret

.L2:
  xor edx, edx
  mov eax, 0
  tzcnt edx, edi
  test edi, edi
  cmovne eax, edx
  mov eax, eax
  ret

This may be related to how GCC handles undefined behaviour around
`__builtin_ctz` and uninitialized variables, but the code still looks like it
could be optimized much further. Even if `cmovcc` is not the best choice here,
GCC could at least emit something like this:

f(unsigned long):
  test edi, edi
  jne .L2

  shr rdi, 32
  tzcnt eax, edi
  add eax, 32
  ret

.L2:
  tzcnt eax, edi
  ret

Using this code:

unsigned long f(uint64_t value)
{
    unsigned int result;

    if ((value & 0xFFFFFFFF) == 0)
    {
        result = __builtin_ctz(value >> 32) + 32;
    }
    else
    {
        if ((unsigned int)value != 0)
            result = __builtin_ctz((unsigned int)value);
        else
            __builtin_unreachable();
    }

    return result;
}

(i.e. adding __builtin_unreachable() on the path where an undefined value
would be created) generates better code:

f(unsigned long):
  xor eax, eax
  tzcnt eax, edi
  test edi, edi
  jne .L3
  shr rdi, 32
  tzcnt edi, edi
  lea eax, [rdi+32]
.L3:
  mov eax, eax
  ret
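
For reference, a minimal sketch of what the function is meant to compute: for
any nonzero `value`, it should return the number of trailing zero bits of the
full 64-bit input, which is what `__builtin_ctzll` returns. The names
`f_unreachable`, `f_ref` and `check_agreement` are hypothetical and only used
for illustration; zero is never passed because `__builtin_ctz(0)` is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Variant from the report with the impossible path marked unreachable. */
unsigned long f_unreachable(uint64_t value)
{
    unsigned int result;

    if ((value & 0xFFFFFFFF) == 0)
        result = __builtin_ctz(value >> 32) + 32;
    else if ((unsigned int)value != 0)
        result = __builtin_ctz((unsigned int)value);
    else
        __builtin_unreachable();

    return result;
}

/* Assumed reference: trailing-zero count of the whole 64-bit value. */
unsigned long f_ref(uint64_t value)
{
    return (unsigned long)__builtin_ctzll(value);
}

/* Check agreement for every single-bit input, with and without the top
   bit also set. value == 0 is never passed: the result is undefined. */
void check_agreement(void)
{
    for (int bit = 0; bit < 64; bit++) {
        uint64_t v = (uint64_t)1 << bit;
        assert(f_unreachable(v) == f_ref(v));
        assert(f_unreachable(v | 0x8000000000000000ULL) == f_ref(v));
    }
}
```

Calling check_agreement() exercises both branches: bits 0 to 31 take the
low-half path, bits 32 to 63 take the shifted high-half path.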

This looks like something the tree-ssa optimizers could do (inserting
__builtin_unreachable() on paths that invoke UB by using an undefined value).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94861 indicates that GCC doesn't
do this even for the simplest cases, and, looking at the tree dumps, tree-ssa
doesn't appear to make any assumptions about the initial value of variables.
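
As a hypothetical minimal illustration of that idea (this is not the test case
from bug 94861, and the names `pick` and `pick_annotated` are made up here):
the first function reads `x` uninitialized when `c` is zero, and the second
spells out the assumption a compiler would be making if it treated that read
as unreachable. Under that assumption both could, in principle, be reduced to
returning 42.

```c
/* Hypothetical minimal case: x is read uninitialized when c == 0. */
int pick(int c)
{
    int x;
    if (c)
        x = 42;
    return x;
}

/* Same function with the undefined path stated explicitly. */
int pick_annotated(int c)
{
    int x;
    if (c)
        x = 42;
    else
        __builtin_unreachable();
    return x;
}
```

Both functions should only be called with a nonzero argument; any other call
has no defined result.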


* [Bug tree-optimization/95752] Failure to optimize complicated usage of __builtin_ctz with conditionals properly
From: pinskia at gcc dot gnu.org @ 2021-08-20  5:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95752

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-20
     Ever confirmed|0                           |1
         Depends on|                            |56711

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Well, clang regressed in clang 12 :).
PR 56711 is also related.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56711
[Bug 56711] missed optimization for __uint128_t of (unsigned long long)x != x


* [Bug tree-optimization/95752] Failure to optimize complicated usage of __builtin_ctz with conditionals properly
From: pinskia at gcc dot gnu.org @ 2023-11-10 22:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95752
Bug 95752 depends on bug 56711, which changed state.

Bug 56711 Summary: missed optimization for __uint128_t of (unsigned long long)x != x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56711

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

