[Bug rtl-optimization/111378] New: Missed optimization for comparing with exact

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/111378] New: Missed optimization for comparing with exact_log2 constants
@ 2023-09-11 19:06 lis8215 at gmail dot com
  2023-09-12  8:47 ` [Bug middle-end/111378] " ktkachov at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: lis8215 at gmail dot com @ 2023-09-11 19:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

            Bug ID: 111378
           Summary: Missed optimization for comparing with exact_log2
                    constants
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lis8215 at gmail dot com
  Target Milestone: ---

The simple example below produces suboptimal code on many targets where
exact_log2 constant can't be represented as immediate operand.
(confirmed MIPS/PPC64/SPARC/RISC-V)

extern void do_something(char* p);
extern void do_something_other(char* p);

void test(char* p, uint32_t ch)
{
        if (ch < 0x10000)
        {
            do_something(p);
        }
        else /* ch >= 0x10000 */
        {
            do_something_other(p);
        }
}

However, instead of direct comparing with constant we can use shift & compare
to zero:
e.g. (ch < 0x10000) can be transformed into ((ch >> 16) == 0) which is usually
shorter & faster on many targets.

The condition appears in real world rarely AFAIK - 20-30 occurences per million
asm instructions. Fun fact: many of them related to unicode transformations.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/111378] Missed optimization for comparing with exact_log2 constants
  2023-09-11 19:06 [Bug rtl-optimization/111378] New: Missed optimization for comparing with exact_log2 constants lis8215 at gmail dot com
@ 2023-09-12  8:47 ` ktkachov at gcc dot gnu.org
  2023-09-12 15:09 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2023-09-12  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-09-12
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. On aarch64 GCC generates:
test:
        mov     w2, 65535
        cmp     w1, w2
        bhi     .L2
        b       do_something
.L2:
        b       do_something_other

but LLVM generates the shorter:
test:                                   // @test
        lsr     w8, w1, #16
        cbnz    w8, .LBB0_2
        b       do_something
.LBB0_2:
        b       do_something_other

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/111378] Missed optimization for comparing with exact_log2 constants
  2023-09-11 19:06 [Bug rtl-optimization/111378] New: Missed optimization for comparing with exact_log2 constants lis8215 at gmail dot com
  2023-09-12  8:47 ` [Bug middle-end/111378] " ktkachov at gcc dot gnu.org
@ 2023-09-12 15:09 ` pinskia at gcc dot gnu.org
  2023-09-12 15:10 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-09-12 15:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note I thought I saw another bug requesting the same thing but I could not find
it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/111378] Missed optimization for comparing with exact_log2 constants
  2023-09-11 19:06 [Bug rtl-optimization/111378] New: Missed optimization for comparing with exact_log2 constants lis8215 at gmail dot com
  2023-09-12  8:47 ` [Bug middle-end/111378] " ktkachov at gcc dot gnu.org
  2023-09-12 15:09 ` pinskia at gcc dot gnu.org
@ 2023-09-12 15:10 ` pinskia at gcc dot gnu.org
  2024-01-12 22:44 ` law at gcc dot gnu.org
  2024-01-13  1:29 ` gabravier at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-09-12 15:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> Note I thought I saw another bug requesting the same thing but I could not
> find it.

PR 85234 is mostly requesting the opposite way though ...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/111378] Missed optimization for comparing with exact_log2 constants
  2023-09-11 19:06 [Bug rtl-optimization/111378] New: Missed optimization for comparing with exact_log2 constants lis8215 at gmail dot com
                   ` (2 preceding siblings ...)
  2023-09-12 15:10 ` pinskia at gcc dot gnu.org
@ 2024-01-12 22:44 ` law at gcc dot gnu.org
  2024-01-13  1:29 ` gabravier at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: law at gcc dot gnu.org @ 2024-01-12 22:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Whether or not this is an optimization or a pessimization is dependent on the
target -- some targets can express the constant trivially in a branch
conditions, others can not.  Some targets have barrel shifters, others do not. 
In some cases the constant might get hoisted out of a loop, reducing the cost,
but increasing register pressure, etc.

If you look at a typical RISC target the transformation just trades constant
construction for a shift.  For constants that can be constructed in a single
instruction neither sequence is inherently faster than the other.  However, the
shift sequence has an additional data dependency relative to the constant
construction approach, so the shifted input approach would be considered
slightly less desirable.

For RISC-V we can construct any power of two constant in a single instruction
if we have the zbs extension (which should be common in modern
implementations).  I suspect it would not be profitable on MIPS or PPC either
as I think they have branches with ordered comparisons of two registers and the
ability to construct 2^n in a single instrution.

aarch64 seems to be the one target where this could be the helpful based on
c#1.  I guess it doesn't have branches with ordered comparisons of two
registers and instead has to do an explicit comparison.

Anyway, just wanted to get various thoughts recorded.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/111378] Missed optimization for comparing with exact_log2 constants
  2023-09-11 19:06 [Bug rtl-optimization/111378] New: Missed optimization for comparing with exact_log2 constants lis8215 at gmail dot com
                   ` (3 preceding siblings ...)
  2024-01-12 22:44 ` law at gcc dot gnu.org
@ 2024-01-13  1:29 ` gabravier at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: gabravier at gmail dot com @ 2024-01-13  1:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

--- Comment #5 from Gabriel Ravier <gabravier at gmail dot com> ---
It does seem as though this transformation is not particularly favorable on
most platforms. In fact, it seems as though the opposite transformation (which
Clang does on many targets, along with MSVC) would be useful on most target,
with some exceptions, including:

- PowerPC, on which llvm-mca appears to consider `srdi.` to be faster than
`cmplwi`

- MIPS16, though I am unsure of this - GCC code generation is messy on there
and I have trouble getting llvm-mca to parse GCC's output, but it seems to
consider loading the constant from memory to be far slower than even doing the
shift in two steps (which MIPS16 apparently requires, given GCC emits two `srl
$4, $4, 8` instructions to do the shift)

- Loongarch, which seems to give code for `x < 0x10000` that I would have a
hard time imagining being faster than a single shift given that it outputs
this:
  lu12i.w $r12,61440>>12 # 0xf000
  ori $r5,$r12,4095
  sltu $r4,$r5,$r4
  xori $r6,$r4,1
  andi $r4,$r6,1
whereas a shift outputs this:
  bstrpick.d $r4,$r4,31,16
  sltui $r4,$r4,1
(note: I am not too certain for some of these, but it also seems like Alpha,
C6x, FR-V, RISC-V 64 and Sparc emit much smaller code sequences (i.e. 2-3 times
smaller) that look faster at first glance for the shifting version as compared
to the comparing version)


(PS: Given I do not have a server farm containing every single target GCC
supports for the purposes of benchmarking this, I'm mostly assuming this from
manually peeking at assembly output to try and guess which would be better and
what from looking at what llvm-mca considers to be the faster instruction
sequence on the targets it supports, so potentially llvm-mca and me could just
be wrong, though I would hope LLVM correctly models the performance of the
chips it targets...)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-01-13  1:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11 19:06 [Bug rtl-optimization/111378] New: Missed optimization for comparing with exact_log2 constants lis8215 at gmail dot com
2023-09-12  8:47 ` [Bug middle-end/111378] " ktkachov at gcc dot gnu.org
2023-09-12 15:09 ` pinskia at gcc dot gnu.org
2023-09-12 15:10 ` pinskia at gcc dot gnu.org
2024-01-12 22:44 ` law at gcc dot gnu.org
2024-01-13  1:29 ` gabravier at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).