[Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization)

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
@ 2024-06-15 14:41 Explorer09 at gmail dot com
  2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Explorer09 at gmail dot com @ 2024-06-15 14:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506

            Bug ID: 115506
           Summary: Possible but missed "cmp" instruction merging (x86 &
                    ARM, optimization)
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

I'm not sure if the GCC developers are interested in this optimization or if
this issue has been reported before.

----

(Compiler Explorer link: https://godbolt.org/z/vqqf865cb)

```c
#include <stdint.h>

// Modified from <https://gitlab.com/-/snippets/3718423>
uint32_t utf8_to_code_point(const uint8_t **sequence) {
    if (**sequence <= 0x7F) {
        return *(*sequence)++;
    }

    unsigned int max_length;
    uint32_t min_code_point;
    if ((**sequence & 0xF0) == 0xE0) { /* NOTE 0 */
        max_length = 3;
        min_code_point = 0x0800;
    } else if ((**sequence & 0xF0) < 0xE0) { /* NOTE 1 */
        max_length = 2;
        min_code_point = 0x80;
    } else {
        max_length = 4;
        min_code_point = 0x10000;
    }

    uint32_t code_point = (uint32_t)**sequence - (0xFF & ~(0xFF >>
max_length));
    while (1) {
        (*sequence)++;
        if (--max_length == 0) {
            break;
        }
        unsigned int code_offset = (unsigned int)**sequence - 0x80;
        if (code_offset > 0x3F) {
            return (uint32_t)-1;
        }
        code_point = (code_point << 6) + code_offset;
    }

    if (code_point < min_code_point) {
        return (uint32_t)-1;
    }
    if (code_point >= 0xD800 && code_point <= 0xDFFF) {
        return (uint32_t)-1;
    }
    return code_point;
}
```

I wrote this code with the expectation that the compiler can optimize the "cmp"
 instructions so that the same compare status can be used twice.

But instead I get this:

```x86asm
#   ...
    movl    %eax, %ecx
    andl    $-16, %ecx
    cmpb    $-32, %cl
    je      .L8
    cmpb    $-33, %cl
    ja      .L9
#   ...
```

The immediate constant for the second "cmp" instruction gets modified
off-by-one (I can guess the reason for that) and missed this particular chance
of merging the comparisons.

A workaround is to modify the line marked with the `/* NOTE 1 */` comment to
use `<=` instead of `<`. But I wish GCC can detect this optimization
opportunity better.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug middle-end/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
  2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
@ 2024-06-15 14:52 ` pinskia at gcc dot gnu.org
  2024-06-17  5:55 ` [Bug rtl-optimization/115506] " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-06-15 14:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
          Component|rtl-optimization            |middle-end

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug rtl-optimization/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
  2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
  2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
@ 2024-06-17  5:55 ` rguenth at gcc dot gnu.org
  2024-06-17  9:27 ` ubizjak at gmail dot com
  2024-06-17  9:41 ` Explorer09 at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-17  5:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |rtl-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
On GIMPLE this cannot be realized since we do not represent condition codes so
this is ideally dealt with on RTL.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug rtl-optimization/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
  2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
  2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
  2024-06-17  5:55 ` [Bug rtl-optimization/115506] " rguenth at gcc dot gnu.org
@ 2024-06-17  9:27 ` ubizjak at gmail dot com
  2024-06-17  9:41 ` Explorer09 at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: ubizjak at gmail dot com @ 2024-06-17  9:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
For the original testcase tree optimizers optimize to:

  <bb 4> [local count: 114863530]:
  _30 = _2 & 240;
  if (_30 == 224)
    goto <bb 7>; [34.00%]
  else
    goto <bb 5>; [66.00%]

  <bb 5> [local count: 75809929]:
  if (_30 <= 223)
    goto <bb 6>; [50.00%]
  else
    goto <bb 7>; [50.00%]


and for /* NOTE 1 */ workaround:

  <bb 4> [local count: 114863530]:
  _30 = _2 & 240;
  if (_30 == 224)
    goto <bb 7>; [34.00%]
  else
    goto <bb 5>; [66.00%]

  <bb 5> [local count: 75809929]:
  if (_30 <= 224)
    goto <bb 6>; [50.00%]
  else
    goto <bb 7>; [50.00%]


If the tree optimizer didn't over-optimize the original case and left:

  <bb 5> [local count: 75809929]:
  if (_30 < 224)
    goto <bb 6>; [50.00%]
  else
    goto <bb 7>; [50.00%]

then RTL CSE2 pass would be able to merge:

(insn 31 30 32 4 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 111 [ _30 ])
            (const_int -32 [0xffffffffffffffe0]))) "pr115506.c":11:8 9
{*cmpqi_1}
     (nil))

and

(insn 36 33 37 5 (set (reg:CC 17 flags)
        (compare:CC (reg:QI 111 [ _30 ])
            (const_int -33 [0xffffffffffffffdf]))) "pr115506.c":14:15 9
{*cmpqi_1}
     (expr_list:REG_DEAD (reg:QI 111 [ _30 ])
        (nil)))

Is there a way to avoid the over-optimization with tree optimizers? RTL part
has no way to update the flags user during CSE2 pass:

(jump_insn 37 36 38 5 (set (pc)
        (if_then_else (gtu (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 90)
            (pc))) "pr115506.c":14:15 1130 {*jcc}
     (expr_list:REG_DEAD (reg:CC 17 flags)
        (int_list:REG_BR_PROB 536870916 (nil)))

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug rtl-optimization/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
  2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
                   ` (2 preceding siblings ...)
  2024-06-17  9:27 ` ubizjak at gmail dot com
@ 2024-06-17  9:41 ` Explorer09 at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: Explorer09 at gmail dot com @ 2024-06-17  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506

--- Comment #3 from Kang-Che Sung <Explorer09 at gmail dot com> ---
I'm not sure if this helps, but the idea is to recognize three-way comparison
as a special case.

My code was originally written in this ordering:

```c
if (x < c) {
  do_action_a();
} else if (x == c) {
  do_action_b();
} else {
  do_action_c();
}
```

But it should work no differently from this:

```c
if (x == c) {
  do_action_b();
} else if (x < c) {
  do_action_a();
} else {
  do_action_c();
}
```

Or this:

```c
if (x == c) {
  do_action_b();
} else if (x <= c) {
  do_action_a();
} else {
  do_action_c();
}
```

Or even this:

```c
if (x >= c) {
  if (x == c) {
    do_action_b();
  } else {
    do_action_c();
  }
} else {
  do_action_a();
}
```

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-06-17  9:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
2024-06-17  5:55 ` [Bug rtl-optimization/115506] " rguenth at gcc dot gnu.org
2024-06-17  9:27 ` ubizjak at gmail dot com
2024-06-17  9:41 ` Explorer09 at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).