public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/109483] New: Unoptimal jump threading with assembler flag output
@ 2023-04-12  8:54 ubizjak at gmail dot com
  2023-04-12  8:59 ` [Bug middle-end/109483] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: ubizjak at gmail dot com @ 2023-04-12  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

            Bug ID: 109483
           Summary: Unoptimal jump threading with assembler flag output
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following testcase (the int3 mnemonic serves only as a marker):


--cut here--
_Bool foo (int cnt)
{
  if (cnt == -1)
    {
      _Bool success;
      asm volatile("int3" : "=@ccz" (success));

      if (!success)
        return 0;
    }

  asm volatile ("" ::: "memory");
  return 1;
}
--cut here--

compiles with -O2 on x86_64 to:

0000000000000000 <foo>:
   0:   83 ff ff                cmp    $0xffffffff,%edi
   3:   74 0b                   je     10 <foo+0x10>
   5:   b8 01 00 00 00          mov    $0x1,%eax
   a:   c3                      retq   
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  10:   cc                      int3   
  11:   0f 94 c0                sete   %al
  14:   74 ef                   je     5 <foo+0x5>
  16:   c3                      retq   

Please note that %al is set before the conditional jump. The setting of %al
could be moved after the jump, where the register can simply be cleared with
"xor %eax, %eax", similar to what clang generates:

0000000000000000 <foo>:
   0:   83 ff ff                cmp    $0xffffffff,%edi
   3:   75 06                   jne    b <foo+0xb>
   5:   cc                      int3   
   6:   74 03                   je     b <foo+0xb>
   8:   31 c0                   xor    %eax,%eax
   a:   c3                      retq   
   b:   b0 01                   mov    $0x1,%al
   d:   c3                      retq   

Also note that for ZF=1 gcc sets %al to 1 and then jumps to offset 5, where
the register is set to 1 again. This redundancy is not present in the clang
code.
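
For anyone who wants to step through the generated code without hitting a
trap, here is a minimal sketch of a runnable variant. It is not part of the
original report: the name foo_runnable and the extra probe parameter are
hypothetical, and "testl" replaces "int3" only so that ZF has a defined value
(ZF=1 exactly when probe is 0). On targets without flag output operands the
sketch falls back to plain C.

```c
#include <stdbool.h>

/* Hypothetical runnable variant of the testcase.  "testl" stands in for
   "int3" so that ZF is well defined: ZF=1 iff probe == 0.  */
bool foo_runnable (int cnt, int probe)
{
  if (cnt == -1)
    {
      bool success;
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
      /* Flag output operand: success receives the value of ZF.  */
      __asm__ __volatile__ ("testl %1, %1" : "=@ccz" (success) : "r" (probe));
#else
      /* Fallback for targets without "=@cc" flag outputs.  */
      success = (probe == 0);
#endif
      if (!success)
        return 0;
    }

  __asm__ __volatile__ ("" ::: "memory");
  return 1;
}
```

The control flow is the same as in foo above, so the placement of the sete
relative to the conditional jump can be inspected in the same way.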


* [Bug middle-end/109483] Unoptimal jump threading with assembler flag output
From: pinskia at gcc dot gnu.org @ 2023-04-12  8:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |inline-asm,
                   |                            |missed-optimization
          Component|rtl-optimization            |middle-end

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect this is phiopt getting in the way; either that, or out-of-SSA.


* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: rguenth at gcc dot gnu.org @ 2023-04-12 10:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-04-12
            Summary|Unoptimal jump threading    |Unoptimal uncprop with
                   |with assembler flag output  |assembler flag output
     Ever confirmed|0                           |1
          Component|middle-end                  |tree-optimization

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand from

  __asm__ __volatile__("int3" : "=@ccz" success_4);
  if (success_4 != 0)
    goto <bb 4>; [66.00%]
  else
    goto <bb 5>; [34.00%]
;;    succ:       5 
;;                4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  __asm__ __volatile__("" :  :  : "memory");
;;    succ:       5

;;   basic block 5, loop depth 0
;;    pred:       3
;;                4
  # _1 = PHI <success_4(3), 1(4)>
  return _1;

and it's not PHI-opt "getting in the way" but instead RTL expansion
placing the copy of 'success_4' for the edge 3->5 before the conditional
branch rather than in a new BB.  I suppose if we'd split critical edges that
would fix it (at the expense of some extra blocks and unconditional
jumps).

Note that clang seems to propagate the constant equivalence which we
instead un-propagate.  With -fdisable-tree-uncprop1 you'll get the
expected code:

foo:
.LFB0:
        .cfi_startproc
        cmpl    $-1, %edi
        je      .L8
.L2:
        movl    $1, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L8:
        xorl    %eax, %eax
#APP
# 6 "t.c" 1
        int3
# 0 "" 2
#NO_APP
        je      .L2
        ret

What uncprop doesn't understand is that copying success requires
materializing it (it lives only in CC).  It prefers the copy over a zero
because the zero needs materializing, without seeing that the copy does too.

And the RTL pipeline is apparently not good enough at scheduling/sinking a
CC consumer across another CC consumer (or even at realizing that the
result is constant on the only edge that needs it).

It might be possible to just special-case (bool) ASM defs in uncprop,
but that would be a heuristic.  Not sure if we can portably identify
CC mode constraints.
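
As a rough source-level analogy of the two PHI shapes discussed above (a
sketch only; the function names are made up for illustration, and success is
an ordinary parameter here rather than a flag output, so compiling this is
not expected to reproduce the code difference):

```c
#include <stdbool.h>

/* Shape after uncprop: the edge into the return block carries success
   itself, i.e. PHI <success_4(3), 1(4)>.  success is known to be 0 on
   that edge, but the copy still has to be materialized from CC.  */
bool ret_uncprop (int cnt, bool success)
{
  if (cnt == -1 && !success)
    return success;
  return 1;
}

/* Shape with the constant propagated: the same edge carries a literal 0,
   which can be materialized after the branch.  */
bool ret_constprop (int cnt, bool success)
{
  if (cnt == -1 && !success)
    return 0;
  return 1;
}
```

Both functions return the same value for every input; they differ only in
which operand sits on the edge where success is known to be zero.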


* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: rguenth at gcc dot gnu.org @ 2023-04-12 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, the uncprop pass is gated with flag_tree_dom, so -fno-tree-dominator-opts
also "fixes" this.


* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: ubizjak at gmail dot com @ 2023-04-12 14:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #2)

> Note that clang seems to propagate the constant equivalence which we
> instead un-propagate.  With -fdisable-tree-uncprop1 you'll get the
> expected code:
> 
> foo:
> .LFB0:
>         .cfi_startproc
>         cmpl    $-1, %edi
>         je      .L8
> .L2:
>         movl    $1, %eax
>         ret
>         .p2align 4,,10
>         .p2align 3
> .L8:
>         xorl    %eax, %eax
> #APP
> # 6 "t.c" 1
>         int3
> # 0 "" 2
> #NO_APP
>         je      .L2
>         ret

Even the above is not optimal. I'd expect:

...
.L8:
#APP
# 6 "t.c" 1
        int3
# 0 "" 2
#NO_APP
        je      .L2
        xorl    %eax, %eax
        ret


* [Bug tree-optimization/109483] Unoptimal uncprop with assembler flag output
From: pinskia at gcc dot gnu.org @ 2023-04-12 19:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109483

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #4)
> Even the above is not optimal. I'd expect:
> 
> ...
> .L8:
> #APP
> # 6 "t.c" 1
>         int3
> # 0 "" 2
> #NO_APP
>         je      .L2
>         xorl    %eax, %eax
>         ret

IF-CASE-2 found, start 3, else 5
verify found no changes in insn with uid = 15.
changing bb of uid 4
  from 5 to 3
deleting block 5
Conversion succeeded on pass 1.

Adding -fno-if-conversion gets what you want too. But I am not 100% sure if it
is "better".
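
Reading the dump above, the IF-CASE-2 conversion appears to move the insn
that zeroes the return register from block 5 into block 3, that is, ahead of
the conditional jump. A source-level analogy of the two orderings follows;
it is only a sketch, the function names are hypothetical, and zf stands in
for the flag produced by the asm:

```c
#include <stdbool.h>

/* Ordering with -fno-if-conversion: branch first, zero the result only
   on the path that needs it.  */
bool tail_branch_then_zero (bool zf)
{
  if (zf)
    return 1;          /* je .L2 */
  bool result = 0;     /* xorl %eax, %eax placed after the branch */
  return result;
}

/* Ordering after the conversion: the zeroing is executed unconditionally
   ahead of the branch, and the separate else block disappears.  */
bool tail_zero_then_branch (bool zf)
{
  bool result = 0;     /* zeroing hoisted above the branch */
  if (zf)
    return 1;
  return result;
}
```

Both orderings compute the same result; the second removes a basic block at
the cost of zeroing the register on a path that overwrites it anyway, which
is why it is not obvious that one of them is "better".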
