public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path
@ 2021-06-20  9:37 steinar+gcc at gunderson dot no
  2021-06-20  9:41 ` [Bug tree-optimization/101139] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-20  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

            Bug ID: 101139
           Summary: Unable to remove double byteswap in fast path
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

The following code is reduced from a real interpreter:

extern void (*a[])();
int d, e, h, l;
typedef struct {
  char ab;
} f;
f g;
short i();
short m68ki_read_imm_16() {
  short j, k;
  int b = d;
  f f = g;
  if (b < h)
    return __builtin_bswap16((&f.ab)[0]);
  k = i();
  short c = k;
  j = __builtin_bswap16(c);
  return j;
}
int b() {
  short m;
  do {
    m = m68ki_read_imm_16();
    short c = m;
    l = __builtin_bswap16(c);
    a[l]();
  } while (e);
  return e;
}

Compiling with arm-linux-gnueabihf-gcc-10 -O2 yields this interesting sequence
in the function:

        b       .L11
.L15:
        ldrb    r3, [r5, #8]    @ zero_extendqisi2
        rev16   r3, r3
        uxth    r3, r3
.L10:
        rev16   r3, r3
        uxth    r3, r3

The original code intention was to have a reusable function that returned in
big-endian, but that a specific use of it would be able to ignore endianness
into a table lookup, removing the double-swap entirely. GCC can normally do
that, but it seems that the branch in m68ki_read_imm_16() somehow gets in the
way. Just to be clear, I expect zero rev16 instructions altogether in b() when
m68ki_read_imm_16() is inlined.

The problem is not ARM-specific; x86 shows a similar problematic sequence:

        leaq    a(%rip), %rbx
        jmp     .L11
        .p2align 4,,10
        .p2align 3
.L15:
        movsbw  g(%rip), %ax
        rolw    $8, %ax
.L10:
        rolw    $8, %ax
        movzwl  %ax, %edx

Also verified with

gcc version 12.0.0 20210527 (experimental) [master revision
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] (Debian
20210527-1)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/101139] Unable to remove double byteswap in fast path
  2021-06-20  9:37 [Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path steinar+gcc at gunderson dot no
@ 2021-06-20  9:41 ` pinskia at gcc dot gnu.org
  2021-06-20  9:48 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-20  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-06-20
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org
         Depends on|                            |13563
             Status|UNCONFIRMED                 |ASSIGNED
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  if (b_13 < h.0_15)
    goto <bb 4>; [51.12%]
  else
    goto <bb 5>; [48.88%]

  <bb 4> [local count: 548896825]:
  _16 = (short unsigned int) f$ab_14;
  _17 = (int) _16;
  _18 = __builtin_bswap16 (_17);
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 524845000]:
  k_22 = i ();
  c.1_23 = (short unsigned int) k_22;
  _24 = (int) c.1_23;
  _25 = __builtin_bswap16 (_24);

  <bb 6> [local count: 1073741824]:
  # prephitmp_32 = PHI <_18(4), _25(5)>

Basically the same issue as PR 13563.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13563
[Bug 13563] if-conversion not agressive enough

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/101139] Unable to remove double byteswap in fast path
  2021-06-20  9:37 [Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path steinar+gcc at gunderson dot no
  2021-06-20  9:41 ` [Bug tree-optimization/101139] " pinskia at gcc dot gnu.org
@ 2021-06-20  9:48 ` pinskia at gcc dot gnu.org
  2021-06-21  6:37 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-20  9:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
>   if (b_13 < h.0_15)
>     goto <bb 4>; [51.12%]
>   else
>     goto <bb 5>; [48.88%]
> 
>   <bb 4> [local count: 548896825]:
>   _16 = (short unsigned int) f$ab_14;
>   _17 = (int) _16;
>   _18 = __builtin_bswap16 (_17);
>   goto <bb 6>; [100.00%]
> 
>   <bb 5> [local count: 524845000]:
>   k_22 = i ();
>   c.1_23 = (short unsigned int) k_22;
>   _24 = (int) c.1_23;
>   _25 = __builtin_bswap16 (_24);
> 
>   <bb 6> [local count: 1073741824]:
>   # prephitmp_32 = PHI <_18(4), _25(5)>
> 
> Basically the same issue as PR 13563.

Once that issue is fixed we should get:
  if (b_13 < h.0_15)
    goto <bb 4>; [51.12%]
  else
    goto <bb 5>; [48.88%]

  <bb 4> [local count: 548896825]:
  _16 = (short unsigned int) f$ab_14;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 524845000]:
  k_22 = i ();
  c.1_23 = (short unsigned int) k_22;

  <bb 6> [local count: 1073741824]:
  # c.1_23 = PHI <_16(4), c.1_23(5)>

  _24 = (int) c.1_23;
  prephitmp_32 = __builtin_bswap16 (_32);
  _2 = (int) prephitmp_32;
  _3 = __builtin_bswap16 (_2);
  _4 = (int) _3;

Which then should just optimize later on to:  if (b_13 < h.0_15)
    goto <bb 4>; [51.12%]
  else
    goto <bb 5>; [48.88%]

  <bb 4> [local count: 548896825]:
  _16 = (short unsigned int) f$ab_14;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 524845000]:
  k_22 = i ();
  c.1_23 = (short unsigned int) k_22;

  <bb 6> [local count: 1073741824]:
  # c.1_23 = PHI <_16(4), c.1_23(5)>

  _24 = (int) c.1_23;
  _4 = _24;
(If I did this conversion right)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/101139] Unable to remove double byteswap in fast path
  2021-06-20  9:37 [Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path steinar+gcc at gunderson dot no
  2021-06-20  9:41 ` [Bug tree-optimization/101139] " pinskia at gcc dot gnu.org
  2021-06-20  9:48 ` pinskia at gcc dot gnu.org
@ 2021-06-21  6:37 ` rguenth at gcc dot gnu.org
  2021-06-26 16:51 ` steinar+gcc at gunderson dot no
  2024-04-20  7:25 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-21  6:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
One odd thing is that while __builtin_bswap16 is declared as taking an uint16
argument the frontend promotes it to 'int' and that stays that way in GIMPLE:

  _16 = (short unsigned int) f$ab_14;
  _17 = (int) _16;
  _18 = __builtin_bswap16 (_17);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/101139] Unable to remove double byteswap in fast path
  2021-06-20  9:37 [Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path steinar+gcc at gunderson dot no
                   ` (2 preceding siblings ...)
  2021-06-21  6:37 ` rguenth at gcc dot gnu.org
@ 2021-06-26 16:51 ` steinar+gcc at gunderson dot no
  2024-04-20  7:25 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: steinar+gcc at gunderson dot no @ 2021-06-26 16:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

--- Comment #4 from Steinar H. Gunderson <steinar+gcc at gunderson dot no> ---
Yes, the integer promotion actually costs some performance. It happens on both
x86 and Arm. Should I file that as a separate bug?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/101139] Unable to remove double byteswap in fast path
  2021-06-20  9:37 [Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path steinar+gcc at gunderson dot no
                   ` (3 preceding siblings ...)
  2021-06-26 16:51 ` steinar+gcc at gunderson dot no
@ 2024-04-20  7:25 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-20  7:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101139

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57993
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57993&action=edit
Patch but it does not work for the code in this testcase

I have to look into why it is not working for the testcase in comment #0
(factor_out_conditional_operation is not even called) but it does work for:
```
short f(int a, short b, short c)
{
  if (a)
    return __builtin_bswap16(b);
  return __builtin_bswap16(c);
}
```

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-20  7:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-20  9:37 [Bug tree-optimization/101139] New: Unable to remove double byteswap in fast path steinar+gcc at gunderson dot no
2021-06-20  9:41 ` [Bug tree-optimization/101139] " pinskia at gcc dot gnu.org
2021-06-20  9:48 ` pinskia at gcc dot gnu.org
2021-06-21  6:37 ` rguenth at gcc dot gnu.org
2021-06-26 16:51 ` steinar+gcc at gunderson dot no
2024-04-20  7:25 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).