public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/109483] New: Unoptimal jump threading with assembler flag output
From: ubizjak at gmail dot com @ 2023-04-12 8:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483
Bug ID: 109483
Summary: Unoptimal jump threading with assembler flag output
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
The following testcase (the int3 mnemonic serves only as a marker):
--cut here--
_Bool foo (int cnt)
{
  if (cnt == -1)
    {
      _Bool success;
      asm volatile("int3" : "=@ccz" (success));
      if (!success)
        return 0;
    }
  asm volatile ("" ::: "memory");
  return 1;
}
--cut here--
compiles with -O2 on x86_64 to:
0000000000000000 <foo>:
0: 83 ff ff cmp $0xffffffff,%edi
3: 74 0b je 10 <foo+0x10>
5: b8 01 00 00 00 mov $0x1,%eax
a: c3 retq
b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
10: cc int3
11: 0f 94 c0 sete %al
14: 74 ef je 5 <foo+0x5>
16: c3 retq
Please note that %al is set before the conditional jump. The instruction
could be moved after the jump, where the register could be cleared using
"xor %eax, %eax", similar to what clang generates:
0000000000000000 <foo>:
0: 83 ff ff cmp $0xffffffff,%edi
3: 75 06 jne b <foo+0xb>
5: cc int3
6: 74 03 je b <foo+0xb>
8: 31 c0 xor %eax,%eax
a: c3 retq
b: b0 01 mov $0x1,%al
d: c3 retq
Also note that for ZF=1, gcc sets %al to 1 and then jumps to address 5, where
the register is set to 1 again. The clang code does not do this.
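For experimenting with the "=@ccz" flag output without trapping, int3 can be
swapped for an instruction that actually sets ZF. The sketch below is
hypothetical and not part of the report: the probe parameter and the testl
stand-in are made up for illustration, and it assumes a compiler on x86 that
defines __GCC_ASM_FLAG_OUTPUTS__ (falling back to plain C elsewhere). testl
sets ZF exactly when probe is zero, so success becomes 1 in that case and 0
otherwise.

```c
/* Hypothetical runnable variant of foo(): int3 would raise SIGTRAP, so a
   "testl" on a made-up second parameter stands in for it.  testl sets ZF
   when probe == 0. */
static _Bool foo_runnable(int cnt, int probe)
{
    if (cnt == -1) {
        _Bool success;
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
        /* "=@ccz": success receives ZF as left by the asm (1 if set, 0 if clear). */
        __asm__ __volatile__("testl %1, %1" : "=@ccz"(success) : "r"(probe));
#else
        /* Assumed portable fallback with the same meaning on other targets. */
        success = (probe == 0);
#endif
        if (!success)
            return 0;
    }
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier, as in the testcase */
    return 1;
}
```

Calling foo_runnable(-1, 0) takes the success path and returns 1, while
foo_runnable(-1, 5) returns 0; compiling this variant with -O2 -S is one way
to check whether the same sete-before-je shape shows up for it.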
* [Bug middle-end/109483] Unoptimal jump threading with assembler flag output
From: pinskia at gcc dot gnu.org @ 2023-04-12 8:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |inline-asm,
| |missed-optimization
Component|rtl-optimization |middle-end
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect this is phiopt getting in the way; either that, or out-of-SSA.
* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: rguenth at gcc dot gnu.org @ 2023-04-12 10:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2023-04-12
Summary|Unoptimal jump threading |Unoptimal uncprop with
|with assembler flag output |assembler flag output
Ever confirmed|0 |1
Component|middle-end |tree-optimization
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand from
__asm__ __volatile__("int3" : "=@ccz" success_4);
if (success_4 != 0)
goto <bb 4>; [66.00%]
else
goto <bb 5>; [34.00%]
;; succ: 5
;; 4
;; basic block 4, loop depth 0
;; pred: 3
;; 2
__asm__ __volatile__("" : : : "memory");
;; succ: 5
;; basic block 5, loop depth 0
;; pred: 3
;; 4
# _1 = PHI <success_4(3), 1(4)>
return _1;
and it's not PHI-opt "getting in the way" but rather RTL expansion
placing the edge 3->4 copy of 'success_4' before the conditional branch
instead of into a new BB. I suppose splitting critical edges would fix
it (at the expense of some extra blocks and unconditional jumps).
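The two ways of placing the copy for the PHI argument success_4(3) can be
modeled in C with gotos that mirror the basic blocks of the dump above. This
is a hypothetical illustration, not GCC output: zf stands for the flag value
the asm leaves behind, ret plays the role of _1, and the memory barrier in
bb 4 is omitted.

```c
/* Copy hoisted above the branch, as the current expansion does.  */
static _Bool copy_before_branch(int cnt, _Bool zf)
{
    _Bool success, ret;          /* ret models the PHI result _1 */
    if (cnt != -1)
        goto bb4;
    success = zf;                /* bb 3: the asm defines success_4 */
    ret = success;               /* PHI copy runs before the branch ("sete %al") */
    if (success)
        goto bb4;
    goto bb5;
bb4:
    ret = 1;                     /* PHI copy from bb 4 ("mov $0x1,%eax") */
bb5:
    return ret;
}

/* Same CFG with the critical edge into bb 5 split into its own block.  */
static _Bool copy_on_split_edge(int cnt, _Bool zf)
{
    _Bool success, ret;
    if (cnt != -1)
        goto bb4;
    success = zf;
    if (success)
        goto bb4;
    ret = success;               /* new block: the copy runs only on this path */
    goto bb5;
bb4:
    ret = 1;
bb5:
    return ret;
}
```

Both functions compute the same result. In the first, the copy executes even
on the path that immediately overwrites it with 1, which is the redundant
sete in the gcc output; splitting the edge confines it to the path that needs
it, at the cost of an extra block and jump.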
Note that clang seems to propagate the constant equivalence which we
instead un-propagate. With -fdisable-tree-uncprop1 you'll get the
expected code:
foo:
.LFB0:
.cfi_startproc
cmpl $-1, %edi
je .L8
.L2:
movl $1, %eax
ret
.p2align 4,,10
.p2align 3
.L8:
xorl %eax, %eax
#APP
# 6 "t.c" 1
int3
# 0 "" 2
#NO_APP
je .L2
ret
What uncprop doesn't understand is that copying success also requires
materializing it (the value lives only in CC). Not knowing that, it prefers
the copy over a zero, since the zero does need materializing.
And it seems the RTL pipeline is not good enough at scheduling/sinking a
CC consumer across another CC consumer (or even at realizing that the
result is constant on the only edge where it is needed).
It might be possible to just special-case (bool) ASM defs in uncprop,
but that would be a heuristic. Not sure if we can portably identify
CC mode constraints.
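At the source level, the trade-off uncprop makes can be pictured as two
equivalent functions. This is a hypothetical model, not GCC output: zf stands
in for the ZF value the asm produces, and the memory barrier is dropped. On
the path where !success holds, success is known to be 0, so the returned
value may be written either as the constant 0 (the form kept with
-fdisable-tree-uncprop1, and apparently what clang does) or as success itself
(the form uncprop prefers).

```c
/* Propagated form: the PHI argument on the failure path is the constant 0. */
static _Bool foo_propagated(int cnt, _Bool zf)
{
    if (cnt == -1) {
        _Bool success = zf;
        if (!success)
            return 0;          /* constant: needs "xor %eax, %eax" */
    }
    return 1;
}

/* Un-propagated form: the constant 0 is replaced by the equal value success. */
static _Bool foo_unpropagated(int cnt, _Bool zf)
{
    if (cnt == -1) {
        _Bool success = zf;
        if (!success)
            return success;    /* equal to 0 here, but lives only in ZF */
    }
    return 1;
}
```

Both return identical results for every input; the difference only shows up
in code generation, where on x86 returning success means materializing ZF
with sete, while returning 0 needs an xor.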
* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: rguenth at gcc dot gnu.org @ 2023-04-12 10:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, the uncprop pass is gated on flag_tree_dom, so -fno-tree-dominator-opts
also "fixes" this.
* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: ubizjak at gmail dot com @ 2023-04-12 14:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483
--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> Note that clang seems to propagate the constant equivalence which we
> instead un-propagate. With -fdisable-tree-uncprop1 you'll get the
> expected code:
>
> foo:
> .LFB0:
> .cfi_startproc
> cmpl $-1, %edi
> je .L8
> .L2:
> movl $1, %eax
> ret
> .p2align 4,,10
> .p2align 3
> .L8:
> xorl %eax, %eax
> #APP
> # 6 "t.c" 1
> int3
> # 0 "" 2
> #NO_APP
> je .L2
> ret
Even the above is not optimal. I'd expect:
...
.L8:
#APP
# 6 "t.c" 1
int3
# 0 "" 2
#NO_APP
je .L2
xorl %eax, %eax
ret
* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: pinskia at gcc dot gnu.org @ 2023-04-12 19:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #4)
> Even the above is not optimal. I'd expect:
>
> ...
> .L8:
> #APP
> # 6 "t.c" 1
> int3
> # 0 "" 2
> #NO_APP
> je .L2
> xorl %eax, %eax
> ret
IF-CASE-2 found, start 3, else 5
verify found no changes in insn with uid = 15.
changing bb of uid 4
from 5 to 3
deleting block 5
Conversion succeeded on pass 1.
Adding -fno-if-conversion also gets what you want. But I am not 100% sure
whether it is "better".