[Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0)

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0)
@ 2022-04-21 17:47 denis.campredon at gmail dot com
  2022-04-22  6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: denis.campredon at gmail dot com @ 2022-04-21 17:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

            Bug ID: 105338
           Summary: Regression: jump or cmove generated for pattern (x ?
                    CST : 0)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: denis.campredon at gmail dot com
  Target Milestone: ---

int f(int i) {
    return i ? 5 : 0;
}

int g(int i) {
    return i ? -2 : 0;
}

int h(int b) {
    return !!b * -2;
}

int i(int b) {
    return !!b * 5;
}

int j(int b) {
    if (!b) return 0;
    return -2;
}
-------------
With -02 gcc 11.2 the five functions above output branchless code like:

f(int):
        neg     edi
        sbb     eax, eax
        and     eax, 5
        ret
g(int):
        neg     edi
        sbb     eax, eax
        and     eax, -2
        ret
...
--------------

Whereas -02 gcc 12 now outputs a cmov or jump depending of sign of the
constant:
f(int):
        mov     eax, edi
        test    edi, edi
        mov     edx, 5
        cmovne  eax, edx
        ret

g(int):
        mov     eax, edi
        test    edi, edi
        jne     .L11
        ret
.L11:
        mov     eax, -2
        ret

h(int):
        mov     eax, edi
        test    edi, edi
        jne     .L17
        ret
.L17:
        mov     eax, -2
        ret

i(int):
        mov     eax, edi
        test    edi, edi
        mov     edx, 5
        cmovne  eax, edx
        ret

j(int):
        mov     eax, edi
        test    edi, edi
        mov     edx, -2
        cmovne  eax, edx
        ret
---------------------------

Alternatively, with the following code

int k(int b) {
    bool b2 = b;
    return b2 * 5;
}
----------------

Both gcc 12 and 11.2 are outputing
k(int):
        xor     eax, eax
        test    edi, edi
        setne   al
        lea     eax, [rax+rax*4]
        ret

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
@ 2022-04-22  6:13 ` rguenth at gcc dot gnu.org
  2022-04-22  9:17 ` jakub at gcc dot gnu.org
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22  6:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Regression: jump or cmove   |[12 Regression] Regression:
                   |generated for pattern (x ?  |jump or cmove generated for
                   |CST : 0)                    |pattern (x ? CST : 0)
   Target Milestone|---                         |12.0

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
  2022-04-22  6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
@ 2022-04-22  9:17 ` jakub at gcc dot gnu.org
  2022-04-22  9:48 ` pinskia at gcc dot gnu.org
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22  9:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r12-7687-g3a7ba8fd0cda387809e4902328af2473662b6a4a
All the *.optimized dump differences with that change look like:
@@ -8,14 +8,14 @@ int f (int i)

   <bb 2> [local count: 1073741824]:
   if (i_2(D) != 0)
-    goto <bb 4>; [35.00%]
+    goto <bb 3>; [35.00%]
   else
-    goto <bb 3>; [65.00%]
+    goto <bb 4>; [65.00%]

-  <bb 3> [local count: 697932184]:
+  <bb 3> [local count: 375809640]:

   <bb 4> [local count: 1073741824]:
-  # iftmp.0_1 = PHI <5(2), i_2(D)(3)>
+  # iftmp.0_1 = PHI <5(3), i_2(D)(2)>
   return iftmp.0_1;

 }

i.e. swapping whether true or false edge goes through the empty bb.  Wonder why
we emit different code based on that...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
  2022-04-22  6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
  2022-04-22  9:17 ` jakub at gcc dot gnu.org
@ 2022-04-22  9:48 ` pinskia at gcc dot gnu.org
  2022-04-22 11:00 ` jakub at gcc dot gnu.org
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-04-22  9:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this is the same issue as PR 105314 really.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (2 preceding siblings ...)
  2022-04-22  9:48 ` pinskia at gcc dot gnu.org
@ 2022-04-22 11:00 ` jakub at gcc dot gnu.org
  2022-04-22 11:11 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 11:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, if I change passes.def
      NEXT_PASS (pass_sink_code, true /* unsplit edges */);
line
      NEXT_PASS (pass_sink_code, false /* unsplit edges */);
I get back the 11.x code.
Before ce1 pass, in that case say f looks like:
    8: flags:CCZ=cmp(r83:SI,0)
    9: pc={(flags:CCZ!=0)?L22:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 375809644
   10: NOTE_INSN_BASIC_BLOCK 3
    5: r82:SI=0
      REG_DEAD r83:SI
      ; pc falls through to BB 5
   22: L22:
   21: NOTE_INSN_BASIC_BLOCK 4
    4: r82:SI=0x5
   15: L15:
   18: NOTE_INSN_BASIC_BLOCK 5
use (r82)
i.e. essentially
  if (tmp83 != 0)
    tmp82 = 5;
  else
    tmp82 = 0;
But with vanilla trunk, it is instead:
    7: flags:CCZ=cmp(r83:SI,0)
    8: pc={(flags:CCZ==0)?L10:pc}
      REG_DEAD flags:CCZ
      REG_BR_PROB 697932188
    9: NOTE_INSN_BASIC_BLOCK 4
    4: r83:SI=0x5
   10: L10:
   11: NOTE_INSN_BASIC_BLOCK 5
use (r83)
i.e.
  if (tmp83 != 0)
    tmp83 = 5;

Before fwprop1, the code looks roughly the same except for the swapped
branches,
so in C
  if (tmp83 != 0)
    tmp82 = 5;
  else
    tmp82 = tmp83;
vs.
  if (tmp83 == 0)
    tmp82 = tmp83;
  else
    tmp82 = 5;
I think fwprop1 only works on extended basic blocks and therefore turns the
latter into
  if (tmp83 == 0)
    tmp82 = 0;
  else
    tmp82 = 5;
and not the former.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (3 preceding siblings ...)
  2022-04-22 11:00 ` jakub at gcc dot gnu.org
@ 2022-04-22 11:11 ` rguenth at gcc dot gnu.org
  2022-04-22 11:14 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22 11:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-04-22

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think none is more canonical so I wonder if there's an easy way to have ifcvt
recognize both?  But yes, it's probably a duplicate, but at least for x86 and
with complete testcases.

Note one of the issues is that creating/removing forwarders can end up swapping
things.  To avoid this (for GCC 13) we can try to not as aggressively remove
them (basically keep critical edges split) and maybe mark (temporarily) created
forwarders with a flag so that CFG cleanup can pick the "correct" one when
it is used as vehicle to remove them again (as in this case).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (4 preceding siblings ...)
  2022-04-22 11:11 ` rguenth at gcc dot gnu.org
@ 2022-04-22 11:14 ` rguenth at gcc dot gnu.org
  2022-04-22 11:20 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22 11:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #3)
> Before fwprop1, the code looks roughly the same except for the swapped
> branches,
> so in C
>   if (tmp83 != 0)
>     tmp82 = 5;
>   else
>     tmp82 = tmp83;
> vs.
>   if (tmp83 == 0)
>     tmp82 = tmp83;
>   else
>     tmp82 = 5;

Specifically for this we could also look to canonicalize this in
uncprop which introduces the copies, making sure to rewrite this into
an equality compare?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (5 preceding siblings ...)
  2022-04-22 11:14 ` rguenth at gcc dot gnu.org
@ 2022-04-22 11:20 ` rguenth at gcc dot gnu.org
  2022-04-22 12:28 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-22 11:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Jakub Jelinek from comment #3)
> > Before fwprop1, the code looks roughly the same except for the swapped
> > branches,
> > so in C
> >   if (tmp83 != 0)
> >     tmp82 = 5;
> >   else
> >     tmp82 = tmp83;
> > vs.
> >   if (tmp83 == 0)
> >     tmp82 = tmp83;
> >   else
> >     tmp82 = 5;
> 
> Specifically for this we could also look to canonicalize this in
> uncprop which introduces the copies, making sure to rewrite this into
> an equality compare?

When un-propagating into a PHI in uncprop_into_successor_phis mark the
BB of the PHI node.

In uncprop_dom_walker::before_dom_children, when it is marked, match
a diamond/half-diamond with the immediate dominator and do the rewrite
of the conditional and swapping of edge flags.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (6 preceding siblings ...)
  2022-04-22 11:20 ` rguenth at gcc dot gnu.org
@ 2022-04-22 12:28 ` jakub at gcc dot gnu.org
  2022-04-22 13:59 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 12:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/config/i386/i386-expand.cc.jj   2022-04-13 15:42:39.000000000 +0200
+++ gcc/config/i386/i386-expand.cc      2022-04-22 14:18:27.347135185 +0200
@@ -3136,6 +3136,8 @@ ix86_expand_int_movcc (rtx operands[])
   bool sign_bit_compare_p = false;
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];

   if (GET_MODE (op0) == TImode
       || (GET_MODE (op0) == DImode
@@ -3153,17 +3155,29 @@ ix86_expand_int_movcc (rtx operands[])
       || (op1 == constm1_rtx && (code == GT || code == LE)))
     sign_bit_compare_p = true;

+  /* op0 == op1 ? op0 : op3 is equivalent to op0 == op1 ? op1 : op3,
+     but if op1 is a constant, the latter form allows more optimizations,
+     either through the last 2 ops being constant handling, or the one
+     constant and one variable cases.  On the other side, for cmov the
+     former might be better as we don't need to load the constant into
+     another register.  */
+  if (code == EQ && CONST_INT_P (op1) && rtx_equal_p (op0, op2))
+    op2 = op1;
+  /* Similarly for op0 != op1 ? op2 : op0 and op0 != op1 ? op2 : op1.  */
+  else if (code == NE && CONST_INT_P (op1) && rtx_equal_p (op0, op3))
+    op3 = op1;
+
   /* Don't attempt mode expansion here -- if we had to expand 5 or 6
      HImode insns, we'd be swallowed in word prefix ops.  */

   if ((mode != HImode || TARGET_FAST_PREFIX)
       && (mode != (TARGET_64BIT ? TImode : DImode))
-      && CONST_INT_P (operands[2])
-      && CONST_INT_P (operands[3]))
+      && CONST_INT_P (op2)
+      && CONST_INT_P (op3))
     {
       rtx out = operands[0];
-      HOST_WIDE_INT ct = INTVAL (operands[2]);
-      HOST_WIDE_INT cf = INTVAL (operands[3]);
+      HOST_WIDE_INT ct = INTVAL (op2);
+      HOST_WIDE_INT cf = INTVAL (op3);
       HOST_WIDE_INT diff;

       diff = ct - cf;
@@ -3559,6 +3573,9 @@ ix86_expand_int_movcc (rtx operands[])
       if (BRANCH_COST (optimize_insn_for_speed_p (), false) <= 2)
        return false;

+      operands[2] = op2;
+      operands[3] = op3;
+
       /* If one of the two operands is an interesting constant, load a
         constant with the above and mask it in with a logical operation.  */

fixes the first and last 2 functions, now to look what's going on with the
second and third one...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (7 preceding siblings ...)
  2022-04-22 12:28 ` jakub at gcc dot gnu.org
@ 2022-04-22 13:59 ` jakub at gcc dot gnu.org
  2022-04-22 14:11 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 13:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ok, so at least for the 2nd function, the problem is that we reject it from
noce_try_cmove as well from noce_try_cmove_arith based on costs.
That is the case for the NEXT_PASS (pass_sink_code, false /* unsplit edges */);
compilation too, though it seems the cost is 4 higher in the vanilla trunk
case.

I think that can be perhaps fixed with:
--- gcc/config/i386/i386-expand.cc.jj   2022-04-22 14:18:27.000000000 +0200
+++ gcc/config/i386/i386-expand.cc      2022-04-22 15:13:47.263829089 +0200
@@ -3224,8 +3224,7 @@ ix86_expand_int_movcc (rtx operands[])
                }
              diff = ct - cf;

-             if (reg_overlap_mentioned_p (out, op0)
-                 || reg_overlap_mentioned_p (out, op1))
+             if (reg_overlap_mentioned_p (out, compare_op))
                tmp = gen_reg_rtx (mode);

              if (mode == DImode)
- at least I don't really see the point of using yet another temporary when we
already emitted the comparison earlier and all we emit is compare_op and
assignment to out.

Anyway, with NEXT_PASS (pass_sink_code, false /* unsplit edges */); it succeeds
then with cond_move_process_if_block.  But it doesn't with vanilla trunk,
because it punts on:
4060          /* Don't try to handle this if the condition uses the
4061             destination register.  */
4062          if (reg_overlap_mentioned_p (dest, cond))
4063            return FALSE;
I'd say it is reasonable to punt on that, because the whole
cond_move_process_if_block is meant to handle multiple cmoves, not just one,
and we handle all of them with the same cond.  The only case that could be
handled is if it is the very last set in then_bb and else_bb is not present, or
if it is the last set in else_bb.  I must say I'm also quite surprised we don't
really check any costs in the cond_move_process_if_block and just blindly
assume it will always be a win, so it seems it happily handles even the cases
of a single dest assignment that earlier noce_* routines attempt (but those do
check the costs and in this case punt).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug rtl-optimization/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (8 preceding siblings ...)
  2022-04-22 13:59 ` jakub at gcc dot gnu.org
@ 2022-04-22 14:11 ` jakub at gcc dot gnu.org
  2022-04-22 14:24 ` [Bug target/105338] " jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 14:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, in both the first and second function if_info->original_cost is smaller
than seq_cost of the new sequence, but in the first function
if_info->max_seq_cost is larger than seq_cost of the new sequence, in the
second it is 0.
And the reason for that difference is that predictable_p is true in the second
function and not the first one, I presume because of the negative return value
predictor that predicts returning -2 as unlikely.  But if the negative value
return heuristic is right (at least most of the time), then using conditional
jump seems to be the right choice.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (9 preceding siblings ...)
  2022-04-22 14:11 ` jakub at gcc dot gnu.org
@ 2022-04-22 14:24 ` jakub at gcc dot gnu.org
  2022-04-23  8:26 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 14:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 52850
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52850&action=edit
gcc12-pr105338.patch

Full untested patch for everything but the predictable cases.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (10 preceding siblings ...)
  2022-04-22 14:24 ` [Bug target/105338] " jakub at gcc dot gnu.org
@ 2022-04-23  8:26 ` cvs-commit at gcc dot gnu.org
  2022-04-25 14:48 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-04-23  8:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1ceddd7497e15d262ead6f673f8f5ce79dd63714

commit r12-8233-g1ceddd7497e15d262ead6f673f8f5ce79dd63714
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Apr 23 10:25:31 2022 +0200

    i386: Improve ix86_expand_int_movcc [PR105338]

    The following testcase regressed on x86_64 on the trunk, due to some GIMPLE
    pass changes (r12-7687) we end up an *.optimized dump difference of:
    @@ -8,14 +8,14 @@ int foo (int i)

       <bb 2> [local count: 1073741824]:
       if (i_2(D) != 0)
    -    goto <bb 4>; [35.00%]
    +    goto <bb 3>; [35.00%]
       else
    -    goto <bb 3>; [65.00%]
    +    goto <bb 4>; [65.00%]

    -  <bb 3> [local count: 697932184]:
    +  <bb 3> [local count: 375809640]:

       <bb 4> [local count: 1073741824]:
    -  # iftmp.0_1 = PHI <5(2), i_2(D)(3)>
    +  # iftmp.0_1 = PHI <5(3), i_2(D)(2)>
       return iftmp.0_1;

     }
    and similarly for the other functions.  That is functionally equivalent and
    there is no canonical form for those.  The reason for i_2(D) in the PHI
    argument as opposed to 0 is the uncprop pass, that is in many cases
    beneficial for expansion as we don't need to load the value into some
pseudo
    in one of the if blocks.
    Now, for the 11.x ordering we have the pseudo = i insn in the extended
basic
    block (it comes first) and so forwprop1 undoes what uncprop does by
    propagating constant 0 there.  But for the 12.x ordering, the extended
basic
    block contains pseudo = 5 and pseudo = i is in the other bb and so fwprop1
    doesn't change it.
    During the ce1 pass, we attempt to emit a conditional move and we have very
    nice code for the cases where both last operands of ?: are constant, and
yet
    another for !TARGET_CMOVE if at least one of them is.

    The following patch will undo the uncprop behavior during
    ix86_expand_int_movcc, but just for those spots that can benefit from both
    or at least one operands being constant, leaving the pure cmov case as is
    (because then it is useful not to have to load a constant into a pseudo
    as it already is in one).  We can do that in the
    op0 == op1 ? op0 : op3
    or
    op0 != op1 ? op2 : op0
    cases if op1 is a CONST_INT by pretending it is
    op0 == op1 ? op1 : op3
    or
    op0 != op1 ? op2 : op1

    2022-04-23  Jakub Jelinek  <jakub@redhat.com>

            PR target/105338
            * config/i386/i386-expand.cc (ix86_expand_int_movcc): Handle
            op0 == cst1 ? op0 : op3 like op0 == cst1 ? cst1 : op3 for the
non-cmov
            cases.

            * gcc.target/i386/pr105338.c: New test.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (11 preceding siblings ...)
  2022-04-23  8:26 ` cvs-commit at gcc dot gnu.org
@ 2022-04-25 14:48 ` jakub at gcc dot gnu.org
  2022-04-26  5:05 ` denis.campredon at gmail dot com
  2022-04-26  8:21 ` jakub at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-25 14:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.  I think you can use if (__builtin_expect_with_probability (!b, 0, 0.5))
return -2; else return 0; if you want to override the return negative
heuristics.
Or if you use it in places where that heuristics doesn't trigger, say:
int bar (int);
int g(int i) {
    return bar (i ? -2 : 0);
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (12 preceding siblings ...)
  2022-04-25 14:48 ` jakub at gcc dot gnu.org
@ 2022-04-26  5:05 ` denis.campredon at gmail dot com
  2022-04-26  8:21 ` jakub at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: denis.campredon at gmail dot com @ 2022-04-26  5:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #13 from denis.campredon at gmail dot com ---
Thanks a lots.

I have a question though: foo and bar are similar, foo produces a branchless
code whereas bar uses a jump.

int foo(int i) {
    return !i ? 0 : -2;
}

int bar(int i) {
    return i ? -2 : 0;
}

If I'm readding correctly in the two functions the probabilities are the same.
Is this "normal" or worth a new ticket ?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/105338] [12 Regression] Regression: jump or cmove generated for pattern (x ? CST : 0)
  2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
                   ` (13 preceding siblings ...)
  2022-04-26  5:05 ` denis.campredon at gmail dot com
@ 2022-04-26  8:21 ` jakub at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-26  8:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105338

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to denis.campredon from comment #13)
> Thanks a lots.
> 
> I have a question though: foo and bar are similar, foo produces a branchless
> code whereas bar uses a jump.
> 
> int foo(int i) {
>     return !i ? 0 : -2;
> }
> 
> int bar(int i) {
>     return i ? -2 : 0;
> }
> 
> If I'm readding correctly in the two functions the probabilities are the
> same. Is this "normal" or worth a new ticket ?

There is the bug where cond_move_process_if_block ignores the costs and another
where if the cost checks are added, it might be worth for the destination
overlap with comparison operand case force the comparison operand or
destination into a new temporary and add one more first or last instead of
giving up.  The costs comparison would then catch that.
So yes, it is worth a new PR, but not something that will be addressed for GCC
12.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-04-26  8:21 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-21 17:47 [Bug rtl-optimization/105338] New: Regression: jump or cmove generated for pattern (x ? CST : 0) denis.campredon at gmail dot com
2022-04-22  6:13 ` [Bug rtl-optimization/105338] [12 Regression] " rguenth at gcc dot gnu.org
2022-04-22  9:17 ` jakub at gcc dot gnu.org
2022-04-22  9:48 ` pinskia at gcc dot gnu.org
2022-04-22 11:00 ` jakub at gcc dot gnu.org
2022-04-22 11:11 ` rguenth at gcc dot gnu.org
2022-04-22 11:14 ` rguenth at gcc dot gnu.org
2022-04-22 11:20 ` rguenth at gcc dot gnu.org
2022-04-22 12:28 ` jakub at gcc dot gnu.org
2022-04-22 13:59 ` jakub at gcc dot gnu.org
2022-04-22 14:11 ` jakub at gcc dot gnu.org
2022-04-22 14:24 ` [Bug target/105338] " jakub at gcc dot gnu.org
2022-04-23  8:26 ` cvs-commit at gcc dot gnu.org
2022-04-25 14:48 ` jakub at gcc dot gnu.org
2022-04-26  5:05 ` denis.campredon at gmail dot com
2022-04-26  8:21 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).