public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
@ 2024-06-15 14:41 Explorer09 at gmail dot com
2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Explorer09 at gmail dot com @ 2024-06-15 14:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506
Bug ID: 115506
Summary: Possible but missed "cmp" instruction merging (x86 &
ARM, optimization)
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---
I'm not sure if the GCC developers are interested in this optimization or if
this issue has been reported before.
----
(Compiler Explorer link: https://godbolt.org/z/vqqf865cb)
```c
#include <stdint.h>
// Modified from <https://gitlab.com/-/snippets/3718423>
uint32_t utf8_to_code_point(const uint8_t **sequence) {
if (**sequence <= 0x7F) {
return *(*sequence)++;
}
unsigned int max_length;
uint32_t min_code_point;
if ((**sequence & 0xF0) == 0xE0) { /* NOTE 0 */
max_length = 3;
min_code_point = 0x0800;
} else if ((**sequence & 0xF0) < 0xE0) { /* NOTE 1 */
max_length = 2;
min_code_point = 0x80;
} else {
max_length = 4;
min_code_point = 0x10000;
}
uint32_t code_point = (uint32_t)**sequence - (0xFF & ~(0xFF >>
max_length));
while (1) {
(*sequence)++;
if (--max_length == 0) {
break;
}
unsigned int code_offset = (unsigned int)**sequence - 0x80;
if (code_offset > 0x3F) {
return (uint32_t)-1;
}
code_point = (code_point << 6) + code_offset;
}
if (code_point < min_code_point) {
return (uint32_t)-1;
}
if (code_point >= 0xD800 && code_point <= 0xDFFF) {
return (uint32_t)-1;
}
return code_point;
}
```
I wrote this code with the expectation that the compiler can optimize the "cmp"
instructions so that the same compare status can be used twice.
But instead I get this:
```x86asm
# ...
movl %eax, %ecx
andl $-16, %ecx
cmpb $-32, %cl
je .L8
cmpb $-33, %cl
ja .L9
# ...
```
The immediate constant for the second "cmp" instruction gets modified
off-by-one (I can guess the reason for that) and missed this particular chance
of merging the comparisons.
A workaround is to modify the line marked with the `/* NOTE 1 */` comment to
use `<=` instead of `<`. But I wish GCC can detect this optimization
opportunity better.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug middle-end/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
@ 2024-06-15 14:52 ` pinskia at gcc dot gnu.org
2024-06-17 5:55 ` [Bug rtl-optimization/115506] " rguenth at gcc dot gnu.org
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-06-15 14:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Component|rtl-optimization |middle-end
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug rtl-optimization/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
@ 2024-06-17 5:55 ` rguenth at gcc dot gnu.org
2024-06-17 9:27 ` ubizjak at gmail dot com
2024-06-17 9:41 ` Explorer09 at gmail dot com
3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-17 5:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|middle-end |rtl-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
On GIMPLE this cannot be realized since we do not represent condition codes so
this is ideally dealt with on RTL.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug rtl-optimization/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
2024-06-17 5:55 ` [Bug rtl-optimization/115506] " rguenth at gcc dot gnu.org
@ 2024-06-17 9:27 ` ubizjak at gmail dot com
2024-06-17 9:41 ` Explorer09 at gmail dot com
3 siblings, 0 replies; 5+ messages in thread
From: ubizjak at gmail dot com @ 2024-06-17 9:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
For the original testcase tree optimizers optimize to:
<bb 4> [local count: 114863530]:
_30 = _2 & 240;
if (_30 == 224)
goto <bb 7>; [34.00%]
else
goto <bb 5>; [66.00%]
<bb 5> [local count: 75809929]:
if (_30 <= 223)
goto <bb 6>; [50.00%]
else
goto <bb 7>; [50.00%]
and for /* NOTE 1 */ workaround:
<bb 4> [local count: 114863530]:
_30 = _2 & 240;
if (_30 == 224)
goto <bb 7>; [34.00%]
else
goto <bb 5>; [66.00%]
<bb 5> [local count: 75809929]:
if (_30 <= 224)
goto <bb 6>; [50.00%]
else
goto <bb 7>; [50.00%]
If the tree optimizer didn't over-optimize the original case and left:
<bb 5> [local count: 75809929]:
if (_30 < 224)
goto <bb 6>; [50.00%]
else
goto <bb 7>; [50.00%]
then RTL CSE2 pass would be able to merge:
(insn 31 30 32 4 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:QI 111 [ _30 ])
(const_int -32 [0xffffffffffffffe0]))) "pr115506.c":11:8 9
{*cmpqi_1}
(nil))
and
(insn 36 33 37 5 (set (reg:CC 17 flags)
(compare:CC (reg:QI 111 [ _30 ])
(const_int -33 [0xffffffffffffffdf]))) "pr115506.c":14:15 9
{*cmpqi_1}
(expr_list:REG_DEAD (reg:QI 111 [ _30 ])
(nil)))
Is there a way to avoid the over-optimization with tree optimizers? RTL part
has no way to update the flags user during CSE2 pass:
(jump_insn 37 36 38 5 (set (pc)
(if_then_else (gtu (reg:CC 17 flags)
(const_int 0 [0]))
(label_ref:DI 90)
(pc))) "pr115506.c":14:15 1130 {*jcc}
(expr_list:REG_DEAD (reg:CC 17 flags)
(int_list:REG_BR_PROB 536870916 (nil)))
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug rtl-optimization/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)
2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
` (2 preceding siblings ...)
2024-06-17 9:27 ` ubizjak at gmail dot com
@ 2024-06-17 9:41 ` Explorer09 at gmail dot com
3 siblings, 0 replies; 5+ messages in thread
From: Explorer09 at gmail dot com @ 2024-06-17 9:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506
--- Comment #3 from Kang-Che Sung <Explorer09 at gmail dot com> ---
I'm not sure if this helps, but the idea is to recognize three-way comparison
as a special case.
My code was originally written in this ordering:
```c
if (x < c) {
do_action_a();
} else if (x == c) {
do_action_b();
} else {
do_action_c();
}
```
But it should work no differently from this:
```c
if (x == c) {
do_action_b();
} else if (x < c) {
do_action_a();
} else {
do_action_c();
}
```
Or this:
```c
if (x == c) {
do_action_b();
} else if (x <= c) {
do_action_a();
} else {
do_action_c();
}
```
Or even this:
```c
if (x >= c) {
if (x == c) {
do_action_b();
} else {
do_action_c();
}
} else {
do_action_a();
}
```
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-06-17 9:41 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-15 14:41 [Bug other/115506] New: Possible but missed "cmp" instruction merging (x86 & ARM, optimization) Explorer09 at gmail dot com
2024-06-15 14:52 ` [Bug middle-end/115506] " pinskia at gcc dot gnu.org
2024-06-17 5:55 ` [Bug rtl-optimization/115506] " rguenth at gcc dot gnu.org
2024-06-17 9:27 ` ubizjak at gmail dot com
2024-06-17 9:41 ` Explorer09 at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).