public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/113651] New: The GCC optimizer performs poorly on a very simple code snippet.
@ 2024-01-29 11:01 cuking998244353 at gmail dot com
  2024-01-29 12:38 ` [Bug middle-end/113651] " rguenth at gcc dot gnu.org
  2024-01-30  7:09 ` pinskia at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: cuking998244353 at gmail dot com @ 2024-01-29 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113651

            Bug ID: 113651
           Summary: The GCC optimizer performs poorly on a very simple
                    code snippet.
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cuking998244353 at gmail dot com
  Target Milestone: ---

I would like to report an issue with the GCC optimizer related to a specific
code snippet. The optimizer exhibits suboptimal behavior when handling a very
simple code segment. Below is the core code snippet:


#include <stdint.h>

constexpr uint32_t M = 0x04c11db7;

uint32_t calc_CRC1(const uint8_t *data, int len, uint32_t init_value = 0)
{
    uint32_t r = init_value;
    for (int i = 0; i < len; i++)
    {
        for (int j = 0; j < 8; j++)
        {
            // Issue in Code1:
            r = (r << 1) ^ (data[i] >> (7 - j) & 1) ^ (r >> 31 ? M : 0);

            // The following Code2 works as expected:
            // uint32_t flag = r >> 31 ? M : 0;
            // r = (r << 1) ^ (data[i] >> (7 - j) & 1);
            // r ^= flag;
        }
    }
    for (int i = 0; i < 32; i++)
    {
        r = (r << 1) ^ (r >> 31 ? M : 0);
    }
    return r;
}


When I replace the segment with Code2 instead of Code1, there is a noticeable
improvement in performance.

When I read the assembly code, I noticed that in the first approach, GCC
chooses to compute the result of (r << 1) ^ (data[i] >> (7 - j) & 1) and then
XOR this result with M, obtaining the desired value through cmov.

However, it is evident that there is no dependency among the three
sub-expressions here. Each sub-expression can be computed independently and
then XORed together. This optimization approach, on the contrary, results in a
lengthening of the dependency chain.

If the second code snippet is used, this issue does not arise. Such a simple
modification leads to significant differences in results, and when I switch to
compiling with Clang, there is no noticeable performance difference between the
two. I believe this may indicate the presence of a bug in the GCC optimizer.

GCC version:
gcc.exe (Rev2, Built by MSYS2 project) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

system:
windows11 (it can still be reproduced on Linux.)

cmd:
-O2 -Wall -Wextra

You can view the issue directly through this
link:https://godbolt.org/z/PG6b7xveo

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug middle-end/113651] The GCC optimizer performs poorly on a very simple code snippet.
  2024-01-29 11:01 [Bug c++/113651] New: The GCC optimizer performs poorly on a very simple code snippet cuking998244353 at gmail dot com
@ 2024-01-29 12:38 ` rguenth at gcc dot gnu.org
  2024-01-30  7:09 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-29 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113651

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-01-29
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.   This is a missed phiopt (or operation sinking) of

  if (r.1_90 < 0)
    goto <bb 6>; [41.00%]
  else
    goto <bb 7>; [59.00%]

  <bb 6> [local count: 391324129]:
  _91 = _89 ^ 79764919;

  <bb 7> [local count: 954449104]:
  # prephitmp_92 = PHI <_91(6), _89(5)>

to sth like

  if (r.1_90 < 0)
    goto <bb 6>; [41.00%]
  else
    goto <bb 7>; [59.00%]

  <bb 6> [local count: 391324129]:

  <bb 7> [local count: 954449104]:
  # prephitmp_91 = PHI <79764919(6), 0(5)>
  _92 = _89 ^ prephitmp_xx;

on some archs the conditional constant might be generated by a
conditional add of 79764919 to zero.

Whether this is better suited for GIMPLE or RTL if-conversion remains to be
seen.

That splitting the expression helps is just luck.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug middle-end/113651] The GCC optimizer performs poorly on a very simple code snippet.
  2024-01-29 11:01 [Bug c++/113651] New: The GCC optimizer performs poorly on a very simple code snippet cuking998244353 at gmail dot com
  2024-01-29 12:38 ` [Bug middle-end/113651] " rguenth at gcc dot gnu.org
@ 2024-01-30  7:09 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-30  7:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113651

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Whether this is better suited for GIMPLE or RTL if-conversion remains to be
> seen.

I suspect we could do something in isel. phiopt has something similar for casts
already though too. I have some ideas on how we undo the conditional xor and
then see if we do another phiopt if so don't put it back as conditional. And
then in isel if we see `a ? x ^ CST : 0` do it as `x ^ (a ? CST : 0)` if the
target has cmov but there needs to be some cost model; I am not sure how
though.

(I still wonder if x86's cmov has improved in recent years so that doing 2 cmov
back to back still worse than a branch; LLVM seems not to care about doing cmov
cost model and the performance there is ok).

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-01-30  7:09 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-29 11:01 [Bug c++/113651] New: The GCC optimizer performs poorly on a very simple code snippet cuking998244353 at gmail dot com
2024-01-29 12:38 ` [Bug middle-end/113651] " rguenth at gcc dot gnu.org
2024-01-30  7:09 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).