[Bug target/104773] New: compare with 1 not merged with subtract 1

public inbox for gcc-bugs@sourceware.org

* [Bug target/104773] New: compare with 1 not merged with subtract 1
From: peter at cordes dot ca @ 2022-03-03 16:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773

            Bug ID: 104773
           Summary: compare with 1 not merged with subtract 1
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*, arm-*-*

std::bit_ceil(x) involves if(x == 0 || x == 1) return 1;
and 1u << (32-clz(x-1)).

The compare of course compiles to an unsigned <= 1, which can be done with a
sub instead of cmp, producing the value we need as an input for the
leading-zero count.  But GCC does *not* do this.  (Neither does clang for
x86-64.)  I trimmed down the libstdc++ <bit> code into something I could
compile even when Godbolt doesn't have working headers for some ISAs:
https://godbolt.org/z/3EE7W5bna

// cut down from libstdc++ for normal integer cases; compiles the same
#include <limits>

  template<typename _Tp>
    constexpr _Tp
    bit_ceil(_Tp __x) noexcept
    {
      constexpr auto _Nd = std::numeric_limits<_Tp>::digits;
      if (__x == 0 || __x == 1)
        return 1;
      auto __shift_exponent = _Nd - __builtin_clz((_Tp)(__x - 1u));
      // using __promoted_type = decltype(__x << 1); ...
      // (removed check for x<<n widening the result)
      return (_Tp)1u << __shift_exponent;
    }
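
To make the claim about the compare concrete: the unsigned `x <= 1` test is the same condition as "x - 1 borrowed or produced zero", which is what JBE checks (CF=1 or ZF=1) after a SUB.  The sketch below models that in C++, assuming a 32-bit `unsigned` and the GCC/Clang `__builtin_sub_overflow` builtin.  The names `SubResult` and `sub1_with_flags` are made up for illustration and appear nowhere in libstdc++ or the report.

```cpp
#include <cassert>

// Hypothetical helper (not from the report): models the outputs of `sub edi, 1`.
struct SubResult {
    unsigned diff;  // x - 1, the value the leading-zero count needs
    bool     le1;   // true when x <= 1, i.e. the JBE condition (CF=1 or ZF=1)
};

inline SubResult sub1_with_flags(unsigned x) {
    unsigned diff;
    // Returns true when the unsigned subtraction wraps around, which
    // corresponds to the carry (borrow) flag after SUB on x86.
    bool borrow = __builtin_sub_overflow(x, 1u, &diff);
    bool zero = (diff == 0);  // corresponds to ZF
    return {diff, borrow || zero};
}
```

So a single subtraction yields both the branch condition and the input for clz/bsr, which is the merge this report is asking for.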

For x86-64 with GCC trunk -O3 -march=ivybridge, we get this inefficient code:

roundup(unsigned int):
        mov     eax, 1
        cmp     edi, 1
        jbe     .L1
        sub     edi, 1        # could have just done a sub in the first place
        bsr     edi, edi      # correctly avoiding a false dependency by *not*
                              # using ECX as the destination
        lea     ecx, [rdi+1]  # could have shifted  2<<n instead of  1<<(n+1)
        sal     eax, cl       # 3 uops, vs. 1 for bts, which is a more
                              # efficient way to materialize 1<<n
.L1:
        ret

Also, Ivybridge has no problem with DEC instead of SUB 1; IDK why it's avoiding
DEC here but not for Haswell, for example.  (For Haswell, GCC pessimizes by
using 32-lzcnt instead of lzcnt^31 or something; it could also still just use
BSR, because that performs identically on actual Haswell; lzcnt is only faster
on AMD.)

But this bug report is just about sub/cmp combining, not how to materialize
1<<(n+1) or other stuff:  Better would be

    sub  edi, 1
    jbe  .L1
    bsr  edi, edi
    xor  eax, eax
    inc  edi
    bts  eax, edi      # EAX |= 1<<EDI
    ret
.L1:
    mov  eax, 1
    ret

Intel SnB-family can macro-fuse sub/jbe.  AMD can't, so the change is
break-even for front-end uops when the branch is not-taken, and worse when it
is taken.  But it's still smaller code-size.
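
The side remarks about `2<<n` vs. `1<<(n+1)` and `lzcnt^31` rest on a few identities between the leading-zero count and the BSR bit index.  Here is a minimal sketch of them, assuming a 32-bit `unsigned` and nonzero inputs (`__builtin_clz(0)` is undefined).  The function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Bit index of the highest set bit, which is what BSR computes.
// For 32-bit values, 31 - clz(y) equals 31 ^ clz(y) since clz(y) is in [0, 31].
inline unsigned bsr_index(unsigned y) {
    return 31u ^ static_cast<unsigned>(__builtin_clz(y));
}

// Shift exponent as written in the trimmed-down bit_ceil: 32 - clz(y).
inline unsigned exp_via_clz(unsigned y) {
    return 32u - static_cast<unsigned>(__builtin_clz(y));
}

// The same exponent via BSR: bsr(y) + 1.
inline unsigned exp_via_bsr(unsigned y) {
    return bsr_index(y) + 1u;
}

// Two ways to form the result for x in [2, 2^31].
inline unsigned ceil_shift1(unsigned x) { return 1u << exp_via_clz(x - 1u); }
inline unsigned ceil_shift2(unsigned x) { return 2u << bsr_index(x - 1u); }
```

The `2u << n` form avoids the separate increment (the LEA in GCC's output, or the INC in the hand-written version).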

For ARM, clang finds a very clever way to combine it:

roundup(unsigned int):
        subs    r0, r0, #1
        clz     r0, r0
        rsb     r1, r0, #32         @ 32-clz
        mov     r0, #1
        lslhi   r0, r0, r1          @ using flags set by SUBS
        bx      lr                  @ 1<<(32-clz) or just 1
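
As a rough C++ model of what clang's ARM sequence computes (a sketch, not compiler output): everything runs unconditionally except the final shift, which is predicated on the HI condition from SUBS.  The helper names `arm_clz` and `roundup_model` are hypothetical.  The model assumes a 32-bit `unsigned`, the ARM CLZ behaviour of returning 32 for a zero input, and that a register-specified LSL by 32 yields 0.

```cpp
#include <cassert>

// Models the ARM CLZ instruction, which is defined for 0 (unlike __builtin_clz).
inline unsigned arm_clz(unsigned v) {
    return v ? static_cast<unsigned>(__builtin_clz(v)) : 32u;
}

inline unsigned roundup_model(unsigned x) {
    unsigned d = x - 1u;               // subs r0, r0, #1
    bool hi = x > 1u;                  // HI: no borrow and nonzero result
    unsigned shift = 32u - arm_clz(d); // clz + rsb, executed unconditionally
    if (!hi)
        return 1u;                     // mov r0, #1 with lslhi skipped
    // lslhi: a shift count of 32 is undefined in C++, so model the
    // assumed hardware result (0) explicitly.
    return shift < 32u ? 1u << shift : 0u;
}
```

Inputs above 2^31 violate the precondition of `std::bit_ceil`, so the 0 returned for them only mirrors the assumed hardware behaviour.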

GCC, on the other hand, does much worse with -O3 -std=gnu++20 -mcpu=cortex-a53
-mthumb:

roundup(unsigned int):
        cmp     r0, #1
        itttt   hi
        addhi   r3, r0, #-1
        movhi   r0, #1
        clzhi   r3, r3
        rsbhi   r3, r3, #32
        ite     hi
        lslhi   r0, r0, r3
        movls   r0, #1
        bx      lr

I suspect we could do better by combining the cmp and addhi, and doing
`mov r0, #1` outside of predication.  (I think that's a separate bug; I'm
planning to report it separately.)  Then one `it` would be enough to cover
things, I think.

That would basically reduce it to clang's strategy, although the predication of
the clz and rsb is optional.

* [Bug target/104773] compare with 1 not merged with subtract 1
From: crazylht at gmail dot com @ 2022-03-04  1:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
It looks like the same issue as PR98977.

* [Bug target/104773] compare with 1 not merged with subtract 1
From: rguenth at gcc dot gnu.org @ 2022-03-07  8:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-03-07
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  Depending on how the unsigned <= 1 is represented on RTL this might
or might not be "easily" taught to RTL PRE (likely RTL CSE is too local to
catch it, but at least the fallthru path might form an EBB)

* [Bug rtl-optimization/104773] compare with 1 not merged with subtract 1
From: cptarse-luke at yahoo dot com @ 2023-09-25 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773

Luke <cptarse-luke at yahoo dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cptarse-luke at yahoo dot com

--- Comment #3 from Luke <cptarse-luke at yahoo dot com> ---
*** Bug 111500 has been marked as a duplicate of this bug. ***

* [Bug rtl-optimization/104773] compare with 1 not merged with subtract 1
From: pinskia at gcc dot gnu.org @ 2023-10-25 22:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 56314
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56314&action=edit
testcase
