public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/7] ifcvt: Convert multiple
@ 2021-06-25 16:08 Robin Dapp
  2021-06-25 16:08 ` [PATCH 1/7] ifcvt: Check if cmovs are needed Robin Dapp
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:08 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Hi,

I finally got around to reworking this patch set for
if conversion, so it is effectively a v3 already.  Since so
much time has passed, I'm not replying to the old thread.

There are several problems with noce_convert_multiple
and most are due to the instructions costs not being
properly counted:

- We create too many temporaries that, by default,
  are counted as instructions even though they will
  be removed by later passes.
- We emit a new compare for every cmov.
- Original costs are unfairly low.

These are tackled in this patch set.
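
For context, the kind of block noce_convert_multiple handles is a
then-block consisting only of simple SETs, e.g. (hand-written sketch,
not taken from the testsuite):

  if (x > y)
    {
      a = b;
      c = d;
    }

which is turned into one conditional move per SET, all predicated on
the same x > y condition.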

As mentioned in the last discussion, what still remains
is to properly unify some parts of ifcvt.  We have several
"sub-passes" that can handle similar situations but the
restrictions of each of them are not clearly documented.
Costing also differs in that we sometimes just count
emitted cmovs and sometimes we indeed use pattern_cost
or insn_cost.

Ideally, we will also unify passing value comparisons
as well as CC comparisons to backends in the future.
This will allow a backend to choose whether to emit
a comparison, re-use a comparison or even get rid of
a comparison by emitting a min/max.
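
For example (sketch only), given

  if (x > y)
    y = x;

a backend that sees the comparison directly could emit a single max
instruction instead of a compare plus conditional move.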

Nonetheless, this patch set improves the situation on
s390 and, in general, allows more accurate costing.

Regards
 Robin

Robin Dapp (7):
  ifcvt: Check if cmovs are needed.
  ifcvt: Allow constants for noce_convert_multiple.
  ifcvt: Improve costs handling for noce_convert_multiple.
  ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  ifcvt: Try re-using CC for conditional moves.
  testsuite/s390: Add tests for noce_convert_multiple.
  s390: Increase costs for load on condition and change movqicc
    expander.

 gcc/config/s390/s390.c                        |   2 +-
 gcc/config/s390/s390.md                       |   4 +-
 gcc/ifcvt.c                                   | 273 +++++++++++++++---
 gcc/optabs.c                                  | 163 +++++++----
 gcc/optabs.h                                  |   1 +
 gcc/testsuite/gcc.dg/ifcvt-4.c                |   2 +-
 .../gcc.target/s390/ifcvt-two-insns-bool.c    |  39 +++
 .../gcc.target/s390/ifcvt-two-insns-int.c     |  39 +++
 .../gcc.target/s390/ifcvt-two-insns-long.c    |  39 +++
 9 files changed, 464 insertions(+), 98 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c

-- 
2.31.1


* [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
@ 2021-06-25 16:08 ` Robin Dapp
  2021-07-15 20:10   ` Richard Sandiford
  2021-06-25 16:09 ` [PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple Robin Dapp
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:08 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

When if-converting multiple SETs and we encounter a swap-style idiom

  if (a > b)
    {
      tmp = c;   // [1]
      c = d;
      d = tmp;
    }

ifcvt should not generate a conditional move for the instruction at
[1].

In order to achieve that, this patch goes through all relevant SETs
and marks the instructions that do not need a conditional move.
This also helps to evaluate costs.

On top of that, only generate temporaries if the current cmov is going
to overwrite one of the comparands of the initial compare.
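
For illustration (a sketch in source terms, mirroring the comment added
below), the block above would then end up roughly as

  tmp = c;             /* plain move, tmp is dead afterwards */
  c = a > b ? d : c;   /* conditional move */
  d = a > b ? tmp : d; /* conditional move */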
---
 gcc/ifcvt.c | 104 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 87 insertions(+), 17 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79..eef6490626a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -98,6 +98,7 @@ static int dead_or_predicable (basic_block, basic_block, basic_block,
 			       edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void check_need_cmovs (basic_block, hash_map<rtx, bool> *);
 \f
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3203,6 +3204,10 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   auto_vec<rtx_insn *> unmodified_insns;
   int count = 0;
 
+  hash_map<rtx, bool> need_cmovs;
+
+  check_need_cmovs (then_bb, &need_cmovs);
+
   FOR_BB_INSNS (then_bb, insn)
     {
       /* Skip over non-insns.  */
@@ -3213,26 +3218,38 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
       gcc_checking_assert (set);
 
       rtx target = SET_DEST (set);
-      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx temp;
       rtx new_val = SET_SRC (set);
       rtx old_val = target;
 
-      /* If we were supposed to read from an earlier write in this block,
-	 we've changed the register allocation.  Rewire the read.  While
-	 we are looking, also try to catch a swap idiom.  */
-      for (int i = count - 1; i >= 0; --i)
-	if (reg_overlap_mentioned_p (new_val, targets[i]))
-	  {
-	    /* Catch a "swap" style idiom.  */
-	    if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
-	      /* The write to targets[i] is only live until the read
-		 here.  As the condition codes match, we can propagate
-		 the set to here.  */
-	      new_val = SET_SRC (single_set (unmodified_insns[i]));
-	    else
-	      new_val = temporaries[i];
-	    break;
-	  }
+      /* As we are transforming
+	 if (x > y)
+	   a = b;
+	   c = d;
+	 into
+	   a = (x > y) ...
+	   c = (x > y) ...
+
+	 we potentially check x > y before every set here.
+	 (Even though might be removed by subsequent passes.)
+	 We cannot transform
+	   if (x > y)
+	     x = y;
+	     ...
+	 into
+	   x = (x > y) ...
+	   ...
+	 since this would invalidate x.  Therefore we introduce a temporary
+	 every time we are about to overwrite a variable used in the
+	 check.  Costing of a sequence with these is going to be inaccurate.  */
+      if (reg_overlap_mentioned_p (target, cond))
+	temp = gen_reg_rtx (GET_MODE (target));
+      else
+	temp = target;
+
+      bool need_cmov = true;
+      if (need_cmovs.get (insn))
+	need_cmov = false;
 
       /* If we had a non-canonical conditional jump (i.e. one where
 	 the fallthrough is to the "else" case) we need to reverse
@@ -3808,6 +3825,59 @@ check_cond_move_block (basic_block bb,
   return TRUE;
 }
 
+/* Find local swap-style idioms in BB and mark the first insn (1)
+   that is only a temporary as not needing a conditional move as
+   it is going to be dead afterwards anyway.
+
+     (1) int tmp = a;
+	 a = b;
+	 b = tmp;
+
+	 ifcvt
+	 -->
+
+	 load tmp,a
+	 cmov a,b
+	 cmov b,tmp   */
+
+static void
+check_need_cmovs (basic_block bb, hash_map<rtx, bool> *need_cmov)
+{
+  rtx_insn *insn;
+  int count = 0;
+  auto_vec<rtx> insns;
+  auto_vec<rtx> dests;
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      rtx set, src, dest;
+
+      if (!active_insn_p (insn))
+	continue;
+
+      set = single_set (insn);
+      if (set == NULL_RTX)
+	continue;
+
+      src = SET_SRC (set);
+      dest = SET_DEST (set);
+
+      for (int i = count - 1; i >= 0; --i)
+	{
+	  if (reg_overlap_mentioned_p (src, dests[i])
+	      && find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
+	    {
+	      need_cmov->put (insns[i], false);
+	    }
+	}
+
+      insns.safe_push (insn);
+      dests.safe_push (dest);
+
+      count++;
+    }
+}
+
 /* Given a basic block BB suitable for conditional move conversion,
    a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing
    the register values depending on COND, emit the insns in the block as
-- 
2.31.1


* [PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple.
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
  2021-06-25 16:08 ` [PATCH 1/7] ifcvt: Check if cmovs are needed Robin Dapp
@ 2021-06-25 16:09 ` Robin Dapp
  2021-07-15 20:25   ` Richard Sandiford
  2021-06-25 16:09 ` [PATCH 3/7] ifcvt: Improve costs handling " Robin Dapp
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:09 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

This lifts the restriction of not allowing constants for
noce_convert_multiple.  The code later checks if a valid sequence
is produced anyway.
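
A small example of a block that becomes eligible with this change
(hand-written, not from the testsuite):

  if (x > y)
    {
      a = 3;   /* constant source, previously rejected */
      b = x;
    }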
---
 gcc/ifcvt.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index eef6490626a..6006055f26a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3269,7 +3269,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	 we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
 	 Wrap the two cmove operands into subregs if appropriate to prevent
 	 that.  */
-      if (GET_MODE (new_val) != GET_MODE (temp))
+
+      if (!CONSTANT_P (new_val)
+	  && GET_MODE (new_val) != GET_MODE (temp))
 	{
 	  machine_mode src_mode = GET_MODE (new_val);
 	  machine_mode dst_mode = GET_MODE (temp);
@@ -3280,7 +3282,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	    }
 	  new_val = lowpart_subreg (dst_mode, new_val, src_mode);
 	}
-      if (GET_MODE (old_val) != GET_MODE (temp))
+      if (!CONSTANT_P (old_val)
+	  && GET_MODE (old_val) != GET_MODE (temp))
 	{
 	  machine_mode src_mode = GET_MODE (old_val);
 	  machine_mode dst_mode = GET_MODE (temp);
@@ -3409,9 +3412,9 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
       if (!REG_P (dest))
 	return false;
 
-      if (!(REG_P (src)
-	   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
-	       && subreg_lowpart_p (src))))
+      if (!((REG_P (src) || CONSTANT_P (src))
+	    || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
+	      && subreg_lowpart_p (src))))
 	return false;
 
       /* Destination must be appropriate for a conditional write.  */
-- 
2.31.1


* [PATCH 3/7] ifcvt: Improve costs handling for noce_convert_multiple.
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
  2021-06-25 16:08 ` [PATCH 1/7] ifcvt: Check if cmovs are needed Robin Dapp
  2021-06-25 16:09 ` [PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple Robin Dapp
@ 2021-06-25 16:09 ` Robin Dapp
  2021-07-15 20:42   ` Richard Sandiford
  2021-06-25 16:09 ` [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move Robin Dapp
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:09 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

When noce_convert_multiple is called the original costs are not yet
initialized.  Therefore, up to now, costs were only ever unfairly
compared against COSTS_N_INSNS (2).  This would lead to
default_noce_conversion_profitable_p () rejecting all but the most
contrived of sequences.

This patch temporarily initializes the original costs by counting
a compare and all the sets inside the then_bb.
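
For reference (paraphrased from memory, not quoted verbatim), the check
that consumes these costs boils down to

  /* default_noce_conversion_profitable_p, roughly:  */
  return seq_cost (seq, if_info->speed_p) <= if_info->original_cost;

so with original_cost left at its default of COSTS_N_INSNS (2), almost
every multi-set sequence was rejected.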
---
 gcc/ifcvt.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 6006055f26a..ac0c142c9fe 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3382,14 +3382,17 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
    (SET (REG) (REG)) insns suitable for conversion to a series
    of conditional moves.  Also check that we have more than one set
    (other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
+   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+   through the insns store the sum of their potential costs in COST.  */
 
 static bool
-bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
 {
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = param_max_rtl_if_conversion_insns;
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+  unsigned potential_cost = 0;
 
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -3425,9 +3428,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
       if (!can_conditionally_move_p (GET_MODE (dest)))
 	return false;
 
+      potential_cost += pattern_cost (set, speed_p);
+
       count++;
     }
 
+  *cost += potential_cost;
+
   /* If we would only put out one conditional move, the other strategies
      this pass tries are better optimized and will be more appropriate.
      Some targets want to strictly limit the number of conditional moves
@@ -3475,11 +3482,23 @@ noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  */
+  /* First look for multiple SETS.  The original costs already
+     include a compare that we will be needing either way.  When
+     comparing costs we want to use the branch instruction cost and
+     the sets vs. the cmovs generated here.  Therefore subtract
+     the costs of the compare before checking.
+     ??? Actually, instead of the branch instruction costs we might want
+     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.*/
+
+  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned old_cost = if_info->original_cost;
   if (!else_bb
       && HAVE_conditional_move
-      && bb_ok_for_noce_convert_multiple_sets (then_bb))
+      && bb_ok_for_noce_convert_multiple_sets (then_bb, &potential_cost))
     {
+      /* Temporarily set the original costs to what we estimated so
+	 we can determine if the transformation is worth it.  */
+      if_info->original_cost = potential_cost;
       if (noce_convert_multiple_sets (if_info))
 	{
 	  if (dump_file && if_info->transform_name)
@@ -3487,6 +3506,9 @@ noce_process_if_block (struct noce_if_info *if_info)
 		     if_info->transform_name);
 	  return TRUE;
 	}
+
+      /* Restore the original costs.  */
+      if_info->original_cost = old_cost;
     }
 
   bool speed_p = optimize_bb_for_speed_p (test_bb);
-- 
2.31.1


* [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
                   ` (2 preceding siblings ...)
  2021-06-25 16:09 ` [PATCH 3/7] ifcvt: Improve costs handling " Robin Dapp
@ 2021-06-25 16:09 ` Robin Dapp
  2021-07-15 20:54   ` Richard Sandiford
  2021-06-25 16:09 ` [PATCH 5/7] ifcvt: Try re-using CC for conditional moves Robin Dapp
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:09 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump.  Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries for each conditional move,
causes sky-high costs for conditional moves.

This patch allows re-using a CC so the costing situation is improved a
bit.
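
The new overload takes ready-made comparison rtxes instead of a code
plus operands.  A caller-side sketch (variable names as in
noce_emit_cmove):

  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
    /* cc_cmp/rev_cc_cmp come from the existing jump, e.g.
       (ne (reg:CC 33) (const_int 0)) and its reverse.  */
    target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
                                    vtrue, vfalse, GET_MODE (x));
  else
    target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
                                    vtrue, vfalse, GET_MODE (x),
                                    unsignedp);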
---
 gcc/ifcvt.c  |  16 +++--
 gcc/optabs.c | 163 ++++++++++++++++++++++++++++++++++-----------------
 gcc/optabs.h |   1 +
 3 files changed, 121 insertions(+), 59 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index ac0c142c9fe..c5b8641e2aa 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -771,7 +771,7 @@ static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
-			    rtx, rtx, rtx);
+			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static int noce_try_cmove (struct noce_if_info *);
 static int noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
@@ -1710,7 +1710,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 static rtx
 noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
-		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
+		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
+		 rtx rev_cc_cmp)
 {
   rtx target ATTRIBUTE_UNUSED;
   int unsignedp ATTRIBUTE_UNUSED;
@@ -1756,9 +1757,14 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
   unsignedp = (code == LTU || code == GEU
 	       || code == LEU || code == GTU);
 
-  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
-				  vtrue, vfalse, GET_MODE (x),
-				  unsignedp);
+  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
+    target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
+				    vtrue, vfalse, GET_MODE (x));
+  else
+    target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
+				    vtrue, vfalse, GET_MODE (x),
+				    unsignedp);
+
   if (target)
     return target;
 
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59..6bf486b9b50 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -52,6 +52,8 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
 static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
 static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
 
+static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
+
 /* Debug facility for use in GDB.  */
 void debug_optab_libfuncs (void);
 \f
@@ -4747,7 +4749,6 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
 		       machine_mode mode, int unsignedp)
 {
   rtx comparison;
-  rtx_insn *last;
   enum insn_code icode;
   enum rtx_code reversed;
 
@@ -4774,6 +4775,7 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   /* get_condition will prefer to generate LT and GT even if the old
      comparison was against zero, so undo that canonicalization here since
      comparisons against zero are cheaper.  */
+
   if (code == LT && op1 == const1_rtx)
     code = LE, op1 = const0_rtx;
   else if (code == GT && op1 == constm1_rtx)
@@ -4782,17 +4784,29 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   if (cmode == VOIDmode)
     cmode = GET_MODE (op0);
 
-  enum rtx_code orig_code = code;
+  /* If the first source operand is constant and the second is not, swap
+     it into the second.  In that case we also need to reverse the
+     comparison.  It is possible, though, that the conditional move
+     will not expand with operands in this order, so we might also need
+     to revert to the original comparison and operand order.  */
+
+  rtx rev_comparison = NULL_RTX;
   bool swapped = false;
-  if (swap_commutative_operands_p (op2, op3)
-      && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
-          != UNKNOWN))
+
+  code = unsignedp ? unsigned_condition (code) : code;
+  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
+
+  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
+      != UNKNOWN)
     {
-      std::swap (op2, op3);
-      code = reversed;
-      swapped = true;
+      reversed = unsignedp ? unsigned_condition (reversed) : reversed;
+      rev_comparison = simplify_gen_relational (reversed, VOIDmode, cmode,
+						op0, op1);
     }
 
+  if (swap_commutative_operands_p (op2, op3) && reversed != UNKNOWN)
+    swapped = true;
+
   if (mode == VOIDmode)
     mode = GET_MODE (op2);
 
@@ -4804,58 +4818,99 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   if (!target)
     target = gen_reg_rtx (mode);
 
-  for (int pass = 0; ; pass++)
+  if (comparison && COMPARISON_P (comparison))
+    prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
+		      GET_CODE (comparison), NULL_RTX, unsignedp, OPTAB_WIDEN,
+		      &comparison, &cmode);
+  else
+    return NULL_RTX;
+
+  if (rev_comparison && COMPARISON_P (rev_comparison))
+    prepare_cmp_insn (XEXP (rev_comparison, 0), XEXP (rev_comparison, 1),
+		      GET_CODE (rev_comparison), NULL_RTX,
+		      unsignedp, OPTAB_WIDEN, &rev_comparison, &cmode);
+
+  if (!swapped)
+    return emit_conditional_move (target, comparison, rev_comparison,
+				  op2, op3, mode);
+  else
+    return emit_conditional_move (target, rev_comparison, comparison,
+				  op3, op2, mode);
+}
+
+/* Helper function for emitting a conditional move.  Given a COMPARISON
+   and a reversed REV_COMPARISON it will try to expand a conditional move
+   with COMPARISON first and try with REV_COMPARISON if that fails.  */
+
+rtx
+emit_conditional_move (rtx target, rtx comparison, rtx rev_comparison,
+		       rtx op2, rtx op3, machine_mode mode)
+{
+
+  rtx res = emit_conditional_move (target, comparison, op2, op3, mode);
+
+  if (res != NULL_RTX)
+    return res;
+
+  return emit_conditional_move (target, rev_comparison, op3, op2, mode);
+}
+
+/* Helper for emitting a conditional move.  */
+
+static rtx
+emit_conditional_move (rtx target, rtx comparison,
+		       rtx op2, rtx op3, machine_mode mode)
+{
+  rtx_insn *last;
+  enum insn_code icode;
+
+  if (comparison == NULL_RTX || !COMPARISON_P (comparison))
+    return NULL_RTX;
+
+  /* If the two source operands are identical, that's just a move.  */
+  if (rtx_equal_p (op2, op3))
     {
-      code = unsignedp ? unsigned_condition (code) : code;
-      comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
+      if (!target)
+	target = gen_reg_rtx (mode);
 
-      /* We can get const0_rtx or const_true_rtx in some circumstances.  Just
-	 punt and let the caller figure out how best to deal with this
-	 situation.  */
-      if (COMPARISON_P (comparison))
-	{
-	  saved_pending_stack_adjust save;
-	  save_pending_stack_adjust (&save);
-	  last = get_last_insn ();
-	  do_pending_stack_adjust ();
-	  machine_mode cmpmode = cmode;
-	  prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
-			    GET_CODE (comparison), NULL_RTX, unsignedp,
-			    OPTAB_WIDEN, &comparison, &cmpmode);
-	  if (comparison)
-	    {
-	      class expand_operand ops[4];
+      emit_move_insn (target, op3);
+      return target;
+    }
 
-	      create_output_operand (&ops[0], target, mode);
-	      create_fixed_operand (&ops[1], comparison);
-	      create_input_operand (&ops[2], op2, mode);
-	      create_input_operand (&ops[3], op3, mode);
-	      if (maybe_expand_insn (icode, 4, ops))
-		{
-		  if (ops[0].value != target)
-		    convert_move (target, ops[0].value, false);
-		  return target;
-		}
-	    }
-	  delete_insns_since (last);
-	  restore_pending_stack_adjust (&save);
-	}
+  if (mode == VOIDmode)
+    mode = GET_MODE (op2);
 
-      if (pass == 1)
-	return NULL_RTX;
+  icode = direct_optab_handler (movcc_optab, mode);
 
-      /* If the preferred op2/op3 order is not usable, retry with other
-	 operand order, perhaps it will expand successfully.  */
-      if (swapped)
-	code = orig_code;
-      else if ((reversed = reversed_comparison_code_parts (orig_code, op0, op1,
-							   NULL))
-	       != UNKNOWN)
-	code = reversed;
-      else
-	return NULL_RTX;
-      std::swap (op2, op3);
+  if (icode == CODE_FOR_nothing)
+    return NULL_RTX;
+
+  if (!target)
+    target = gen_reg_rtx (mode);
+
+  saved_pending_stack_adjust save;
+  save_pending_stack_adjust (&save);
+  last = get_last_insn ();
+  do_pending_stack_adjust ();
+
+  class expand_operand ops[4];
+
+  create_output_operand (&ops[0], target, mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_input_operand (&ops[2], op2, mode);
+  create_input_operand (&ops[3], op3, mode);
+
+  if (maybe_expand_insn (icode, 4, ops))
+    {
+      if (ops[0].value != target)
+	convert_move (target, ops[0].value, false);
+      return target;
     }
+
+  delete_insns_since (last);
+  restore_pending_stack_adjust (&save);
+
+  return NULL_RTX;
 }
 
 
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 3bbceff92d9..f853b93f37f 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -281,6 +281,7 @@ extern void emit_indirect_jump (rtx);
 /* Emit a conditional move operation.  */
 rtx emit_conditional_move (rtx, enum rtx_code, rtx, rtx, machine_mode,
 			   rtx, rtx, machine_mode, int);
+rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
 
 /* Emit a conditional negate or bitwise complement operation.  */
 rtx emit_conditional_neg_or_complement (rtx, rtx_code, machine_mode, rtx,
-- 
2.31.1


* [PATCH 5/7] ifcvt: Try re-using CC for conditional moves.
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
                   ` (3 preceding siblings ...)
  2021-06-25 16:09 ` [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move Robin Dapp
@ 2021-06-25 16:09 ` Robin Dapp
  2021-07-22 12:12   ` Robin Dapp
  2021-06-25 16:09 ` [PATCH 6/7] testsuite/s390: Add tests for noce_convert_multiple Robin Dapp
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:09 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Following up on the previous patch, this patch makes
noce_convert_multiple emit two cmov sequences: the same one as before
and a second one that tries to re-use the existing CC.  Their costs are
then compared and the cheaper one is selected.
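
Condensed, the per-set logic now looks like this (see the hunk below
for the full version):

  seq1 = try_emit_cmove_seq (if_info, temp, cond, target, new_val,
                             old_val, need_cmov, &cost1, &temp_dest1);
  /* Second attempt, re-using the CC comparison from the jump.  */
  seq2 = try_emit_cmove_seq (if_info, target, cond, target, new_val,
                             old_val, need_cmov, &cost2, &temp_dest2,
                             cc_cmp, rev_cc_cmp);
  if (seq1 != NULL_RTX && cost1 <= cost2)
    emit_insn (seq1);
  else if (seq2 != NULL_RTX)
    emit_insn (seq2);
  else
    return FALSE;  /* Nothing worked, bail out.  */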
---
 gcc/ifcvt.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 86 insertions(+), 8 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index c5b8641e2aa..55405ce9ef7 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3142,6 +3142,47 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* Helper function to emit a cmov sequence.  */
+
+static rtx_insn*
+try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
+		    rtx cond, rtx target,
+		    rtx new_val, rtx old_val, bool need_cmov,
+		    unsigned *cost, rtx *temp_dest,
+		    rtx cc_cmp = NULL, rtx rev_cc_cmp = NULL)
+{
+  rtx_insn *seq = NULL;
+  *cost = 0;
+
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  start_sequence ();
+
+  if (need_cmov)
+    *temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				 x, y, new_val, old_val, cc_cmp, rev_cc_cmp);
+  else
+    {
+      *temp_dest = target;
+      if (if_info->then_else_reversed)
+	noce_emit_move_insn (target, old_val);
+      else
+	noce_emit_move_insn (target, new_val);
+    }
+
+  if (*temp_dest != NULL_RTX)
+    {
+      seq = get_insns ();
+      *cost = seq_cost (seq, if_info->speed_p);
+    }
+
+  end_sequence ();
+
+  return seq;
+}
+
 /* We have something like:
 
      if (x > y)
@@ -3199,7 +3240,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx cond = noce_get_condition (jump, &cond_earliest, false);
   rtx x = XEXP (cond, 0);
   rtx y = XEXP (cond, 1);
-  rtx_code cond_code = GET_CODE (cond);
+
+  rtx cc_cmp = cond_exec_get_condition (jump);
+  rtx rev_cc_cmp = noce_get_condition (jump, &cond_earliest, true);
 
   /* The true targets for a conditional move.  */
   auto_vec<rtx> targets;
@@ -3301,18 +3344,52 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
 	}
 
-      /* Actually emit the conditional move.  */
-      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
-				       x, y, new_val, old_val);
+      /* Try emitting a conditional move passing the backend the
+	 canonicalized comparison.  The backend is then able to
+	 recognize expressions like
 
-      /* If we failed to expand the conditional move, drop out and don't
-	 try to continue.  */
-      if (temp_dest == NULL_RTX)
+	   if (x > y)
+	     y = x;
+
+	 as min/max and emit an insn accordingly.
+	 We will still emit a superfluous CC comparison before the
+	 min/max, though, which complicates costing.  */
+      unsigned cost1 = 0, cost2 = 0;
+      rtx_insn *seq, *seq1, *seq2;
+      rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
+
+      seq1 = try_emit_cmove_seq (if_info, temp, cond, target,
+			     new_val, old_val, need_cmov, &cost1, &temp_dest1);
+
+      /* Here, we try to pass the backend a non-canonicalized cc comparison
+	 as well.  This allows the backend to emit a cmov directly without
+	 creating an additional compare for each.  If successful, costing
+	 is easier and this sequence is usually preferred.  */
+      seq2 = try_emit_cmove_seq (if_info, target, cond, target,
+			     new_val, old_val, need_cmov, &cost2, &temp_dest2,
+			     cc_cmp, rev_cc_cmp);
+
+      /* Check which version is less expensive.  */
+      if (seq1 != NULL_RTX && cost1 <= cost2)
+	{
+	  seq = seq1;
+	  temp_dest = temp_dest1;
+	}
+      else if (seq2 != NULL_RTX)
+	{
+	  seq = seq2;
+	  temp_dest = temp_dest2;
+	}
+      else
 	{
+	  /* Nothing worked, bail out.  */
 	  end_sequence ();
 	  return FALSE;
 	}
 
+      /* End the sub sequence and emit to the main sequence.  */
+      emit_insn (seq);
+
       /* Bookkeeping.  */
       count++;
       targets.safe_push (target);
@@ -3326,7 +3403,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   /* Now fixup the assignments.  */
   for (int i = 0; i < count; i++)
-    noce_emit_move_insn (targets[i], temporaries[i]);
+    if (targets[i] != temporaries[i])
+      noce_emit_move_insn (targets[i], temporaries[i]);
 
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
-- 
2.31.1


* [PATCH 6/7] testsuite/s390: Add tests for noce_convert_multiple.
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
                   ` (4 preceding siblings ...)
  2021-06-25 16:09 ` [PATCH 5/7] ifcvt: Try re-using CC for conditional moves Robin Dapp
@ 2021-06-25 16:09 ` Robin Dapp
  2021-06-25 16:09 ` [PATCH 7/7] s390: Increase costs for load on condition and change movqicc expander Robin Dapp
  2021-07-13 12:42 ` [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
  7 siblings, 0 replies; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:09 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Add new s390-specific tests that check if we convert two SETs into two
loads on condition.  Remove the s390-specific target-check in
gcc.dg/ifcvt-4.c.
---
 gcc/testsuite/gcc.dg/ifcvt-4.c                |  2 +-
 .../gcc.target/s390/ifcvt-two-insns-bool.c    | 39 +++++++++++++++++++
 .../gcc.target/s390/ifcvt-two-insns-int.c     | 39 +++++++++++++++++++
 .../gcc.target/s390/ifcvt-two-insns-long.c    | 39 +++++++++++++++++++
 4 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c

diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index ec142cfd943..871ab950756 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -2,7 +2,7 @@
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-additional-options "-march=z196" { target { s390x-*-* } } } */
 /* { dg-additional-options "-mtune-ctrl=^one_if_conv_insn" { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* avr-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* msp430-*-* } }  */
+/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* avr-*-* hppa*64*-*-* visium-*-*" riscv*-*-* msp430-*-* } }  */
 /* { dg-skip-if "" { "s390x-*-*" } { "-m31" } }  */
 
 typedef int word __attribute__((mode(word)));
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
new file mode 100644
index 00000000000..4415cb49f11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "lochinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locrnhe\t.*" } } */
+#include <stdbool.h>
+#include <limits.h>
+#include <stdio.h>
+#include <assert.h>
+
+__attribute__ ((noinline))
+int foo (int *a, unsigned int n)
+{
+  int min = 999999;
+  bool bla = false;
+  for (int i = 0; i < n; i++)
+    {
+      if (a[i] < min)
+	{
+	  min = a[i];
+	  bla = true;
+	}
+    }
+
+  if (bla)
+    min += 1;
+  return min;
+}
+
+int main()
+{
+  int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
+
+  int res = foo (a, sizeof (a));
+
+  assert (res == (INT_MIN + 1));
+}
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
new file mode 100644
index 00000000000..db8df84cced
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "lochinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locrnhe\t.*" } } */
+#include <stdbool.h>
+#include <limits.h>
+#include <stdio.h>
+#include <assert.h>
+
+__attribute__ ((noinline))
+int foo (int *a, unsigned int n)
+{
+  int min = 999999;
+  int bla = 0;
+  for (int i = 0; i < n; i++)
+    {
+      if (a[i] < min)
+	{
+	  min = a[i];
+	  bla = 1;
+	}
+    }
+
+  if (bla)
+    min += 1;
+  return min;
+}
+
+int main()
+{
+  int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
+
+  int res = foo (a, sizeof (a));
+
+  assert (res == (INT_MIN + 1));
+}
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
new file mode 100644
index 00000000000..bee63328729
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "locghinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locgrnhe\t.*" } } */
+#include <stdbool.h>
+#include <limits.h>
+#include <stdio.h>
+#include <assert.h>
+
+__attribute__ ((noinline))
+long foo (long *a, unsigned long n)
+{
+  long min = 999999;
+  long bla = 0;
+  for (int i = 0; i < n; i++)
+    {
+      if (a[i] < min)
+	{
+	  min = a[i];
+	  bla = 1;
+	}
+    }
+
+  if (bla)
+    min += 1;
+  return min;
+}
+
+int main()
+{
+  long a[] = {2, 1, -13, LONG_MAX, LONG_MIN, 0};
+
+  long res = foo (a, sizeof (a));
+
+  assert (res == (LONG_MIN + 1));
+}
-- 
2.31.1


* [PATCH 7/7] s390: Increase costs for load on condition and change movqicc expander.
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
                   ` (5 preceding siblings ...)
  2021-06-25 16:09 ` [PATCH 6/7] testsuite/s390: Add tests for noce_convert_multiple Robin Dapp
@ 2021-06-25 16:09 ` Robin Dapp
  2021-07-13 12:42 ` [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
  7 siblings, 0 replies; 31+ messages in thread
From: Robin Dapp @ 2021-06-25 16:09 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

---
 gcc/config/s390/s390.c  | 2 +-
 gcc/config/s390/s390.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..e3b1f99669f 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3636,7 +3636,7 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
 	/* It is going to be a load/store on condition.  Make it
 	   slightly more expensive than a normal load.  */
-	*total = COSTS_N_INSNS (1) + 1;
+	*total = COSTS_N_INSNS (1) + 2;
 
 	rtx dst = SET_DEST (x);
 	rtx then = XEXP (SET_SRC (x), 1);
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 0c5b4dc9029..ee4fac2f9ad 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -7006,9 +7006,9 @@
   if (!CONSTANT_P (els))
     els = simplify_gen_subreg (E_SImode, els, <MODE>mode, 0);
 
-  rtx tmp_target = gen_reg_rtx (E_SImode);
+  rtx tmp_target = simplify_gen_subreg (E_SImode, operands[0], <MODE>mode, 0);
+
   emit_insn (gen_movsicc (tmp_target, operands[1], then, els));
-  emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp_target));
   DONE;
 })
 
-- 
2.31.1


* Re: [PATCH 0/7] ifcvt: Convert multiple
  2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
                   ` (6 preceding siblings ...)
  2021-06-25 16:09 ` [PATCH 7/7] s390: Increase costs for load on condition and change movqicc expander Robin Dapp
@ 2021-07-13 12:42 ` Robin Dapp
  7 siblings, 0 replies; 31+ messages in thread
From: Robin Dapp @ 2021-07-13 12:42 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Ping :)

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-06-25 16:08 ` [PATCH 1/7] ifcvt: Check if cmovs are needed Robin Dapp
@ 2021-07-15 20:10   ` Richard Sandiford
  2021-07-22 12:06     ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-07-15 20:10 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Sorry for the slow review.

Robin Dapp <rdapp@linux.ibm.com> writes:
> When if-converting multiple SETs and we encounter a swap-style idiom
>
>   if (a > b)
>     {
>       tmp = c;   // [1]
>       c = d;
>       d = tmp;
>     }
>
> ifcvt should not generate a conditional move for the instruction at
> [1].
>
> In order to achieve that, this patch goes through all relevant SETs
> and marks the relevant instructions.  This helps to evaluate costs.
>
> On top, only generate temporaries if the current cmov is going to
> overwrite one of the comparands of the initial compare.
> ---
>  gcc/ifcvt.c | 104 +++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 87 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 017944f4f79..eef6490626a 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -98,6 +98,7 @@ static int dead_or_predicable (basic_block, basic_block, basic_block,
>  			       edge, int);
>  static void noce_emit_move_insn (rtx, rtx);
>  static rtx_insn *block_has_only_trap (basic_block);
> +static void check_need_cmovs (basic_block, hash_map<rtx, bool> *);
>  \f
>  /* Count the number of non-jump active insns in BB.  */
>  
> @@ -3203,6 +3204,10 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>    auto_vec<rtx_insn *> unmodified_insns;
>    int count = 0;
>  
> +  hash_map<rtx, bool> need_cmovs;

A hash_set might be simpler, given that the code only enters insns for
which the bool is false.  “rtx_insn *” would be better than rtx.
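
Something like this, as an untested sketch of that suggestion:

  hash_set<rtx_insn *> need_no_cmov;
  check_need_cmovs (then_bb, &need_no_cmov);
  /* ... per insn in the conversion loop ...  */
  bool need_cmov = !need_no_cmov.contains (insn);

  /* and in check_need_cmovs:  */
  need_no_cmov.add (insns[i]);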

> +
> +  check_need_cmovs (then_bb, &need_cmovs);
> +
>    FOR_BB_INSNS (then_bb, insn)
>      {
>        /* Skip over non-insns.  */
> @@ -3213,26 +3218,38 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>        gcc_checking_assert (set);
>  
>        rtx target = SET_DEST (set);
> -      rtx temp = gen_reg_rtx (GET_MODE (target));
> +      rtx temp;
>        rtx new_val = SET_SRC (set);
>        rtx old_val = target;
>  
> -      /* If we were supposed to read from an earlier write in this block,
> -	 we've changed the register allocation.  Rewire the read.  While
> -	 we are looking, also try to catch a swap idiom.  */
> -      for (int i = count - 1; i >= 0; --i)
> -	if (reg_overlap_mentioned_p (new_val, targets[i]))
> -	  {
> -	    /* Catch a "swap" style idiom.  */
> -	    if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> -	      /* The write to targets[i] is only live until the read
> -		 here.  As the condition codes match, we can propagate
> -		 the set to here.  */
> -	      new_val = SET_SRC (single_set (unmodified_insns[i]));
> -	    else
> -	      new_val = temporaries[i];
> -	    break;
> -	  }
> +      /* As we are transforming
> +	 if (x > y)
> +	   a = b;
> +	   c = d;

Looks like some missing braces here.
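
Presumably the intended example, with braces:

   if (x > y)
     {
       a = b;
       c = d;
     }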

> +	 into
> +	   a = (x > y) ...
> +	   c = (x > y) ...
> +
> +	 we potentially check x > y before every set here.
> +	 (Even though might be removed by subsequent passes.)

Do you mean the sets might be removed or that the checks might be removed?

> +	 We cannot transform
> +	   if (x > y)
> +	     x = y;
> +	     ...
> +	 into
> +	   x = (x > y) ...
> +	   ...
> +	 since this would invalidate x.  Therefore we introduce a temporary
> +	 every time we are about to overwrite a variable used in the
> +	 check.  Costing of a sequence with these is going to be inaccurate.  */
> +      if (reg_overlap_mentioned_p (target, cond))
> +	temp = gen_reg_rtx (GET_MODE (target));
> +      else
> +	temp = target;
> +
> +      bool need_cmov = true;
> +      if (need_cmovs.get (insn))
> +	need_cmov = false;

The patch is quite hard to review on its own, since nothing actually uses
this variable.  It's also not obvious how the reg_overlap_mentioned_p
code works if the old target is referenced later.

Could you refactor the series a bit so that each patch is self-contained?
It's OK if that means fewer patches.

Thanks,
Richard

>        /* If we had a non-canonical conditional jump (i.e. one where
>  	 the fallthrough is to the "else" case) we need to reverse
> @@ -3808,6 +3825,59 @@ check_cond_move_block (basic_block bb,
>    return TRUE;
>  }
>  
> +/* Find local swap-style idioms in BB and mark the first insn (1)
> +   that is only a temporary as not needing a conditional move as
> +   it is going to be dead afterwards anyway.
> +
> +     (1) int tmp = a;
> +	 a = b;
> +	 b = tmp;
> +
> +	 ifcvt
> +	 -->
> +
> +	 load tmp,a
> +	 cmov a,b
> +	 cmov b,tmp   */
> +
> +static void
> +check_need_cmovs (basic_block bb, hash_map<rtx, bool> *need_cmov)
> +{
> +  rtx_insn *insn;
> +  int count = 0;
> +  auto_vec<rtx> insns;
> +  auto_vec<rtx> dests;
> +
> +  FOR_BB_INSNS (bb, insn)
> +    {
> +      rtx set, src, dest;
> +
> +      if (!active_insn_p (insn))
> +	continue;
> +
> +      set = single_set (insn);
> +      if (set == NULL_RTX)
> +	continue;
> +
> +      src = SET_SRC (set);
> +      dest = SET_DEST (set);
> +
> +      for (int i = count - 1; i >= 0; --i)
> +	{
> +	  if (reg_overlap_mentioned_p (src, dests[i])
> +	      && find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
> +	    {
> +	      need_cmov->put (insns[i], false);
> +	    }
> +	}
> +
> +      insns.safe_push (insn);
> +      dests.safe_push (dest);
> +
> +      count++;
> +    }
> +}
> +
>  /* Given a basic block BB suitable for conditional move conversion,
>     a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing
>     the register values depending on COND, emit the insns in the block as

* Re: [PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple.
  2021-06-25 16:09 ` [PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple Robin Dapp
@ 2021-07-15 20:25   ` Richard Sandiford
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Sandiford @ 2021-07-15 20:25 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Robin Dapp <rdapp@linux.ibm.com> writes:
> This lifts the restriction of not allowing constants for
> noce_convert_multiple.  The code later checks if a valid sequence
> is produced anyway.

OK, thanks.

I was initially worried that this might trump later, more targetted
optimisations, but it looks like that's already accounted for:

  /* If we would only put out one conditional move, the other strategies
     this pass tries are better optimized and will be more appropriate.
     Some targets want to strictly limit the number of conditional moves
     that are emitted, they set this through PARAM, we need to respect
     that.  */
  return count > 1 && count <= param;

Richard


> ---
>  gcc/ifcvt.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index eef6490626a..6006055f26a 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3269,7 +3269,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>  	 we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
>  	 Wrap the two cmove operands into subregs if appropriate to prevent
>  	 that.  */
> -      if (GET_MODE (new_val) != GET_MODE (temp))
> +
> +      if (!CONSTANT_P (new_val)
> +	  && GET_MODE (new_val) != GET_MODE (temp))
>  	{
>  	  machine_mode src_mode = GET_MODE (new_val);
>  	  machine_mode dst_mode = GET_MODE (temp);
> @@ -3280,7 +3282,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>  	    }
>  	  new_val = lowpart_subreg (dst_mode, new_val, src_mode);
>  	}
> -      if (GET_MODE (old_val) != GET_MODE (temp))
> +      if (!CONSTANT_P (old_val)
> +	  && GET_MODE (old_val) != GET_MODE (temp))
>  	{
>  	  machine_mode src_mode = GET_MODE (old_val);
>  	  machine_mode dst_mode = GET_MODE (temp);
> @@ -3409,9 +3412,9 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
>        if (!REG_P (dest))
>  	return false;
>  
> -      if (!(REG_P (src)
> -	   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> -	       && subreg_lowpart_p (src))))
> +      if (!((REG_P (src) || CONSTANT_P (src))
> +	    || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> +	      && subreg_lowpart_p (src))))
>  	return false;
>  
>        /* Destination must be appropriate for a conditional write.  */

* Re: [PATCH 3/7] ifcvt: Improve costs handling for noce_convert_multiple.
  2021-06-25 16:09 ` [PATCH 3/7] ifcvt: Improve costs handling " Robin Dapp
@ 2021-07-15 20:42   ` Richard Sandiford
  2021-07-22 12:07     ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-07-15 20:42 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Robin Dapp <rdapp@linux.ibm.com> writes:
> When noce_convert_multiple is called the original costs are not yet
> initialized.  Therefore, up to now, costs were only ever unfairly
> compared against COSTS_N_INSNS (2).  This would lead to
> default_noce_conversion_profitable_p () rejecting all but the most
> contrived of sequences.
>
> This patch temporarily initialized the original costs by counting
> a compare and all the sets inside the then_bb.
> ---
>  gcc/ifcvt.c | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 6006055f26a..ac0c142c9fe 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3382,14 +3382,17 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>     (SET (REG) (REG)) insns suitable for conversion to a series
>     of conditional moves.  Also check that we have more than one set
>     (other routines can handle a single set better than we would), and
> -   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
> +   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> +   through the insns store the sum of their potential costs in COST.  */
>  
>  static bool
> -bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
> +bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
>  {
>    rtx_insn *insn;
>    unsigned count = 0;
>    unsigned param = param_max_rtl_if_conversion_insns;
> +  bool speed_p = optimize_bb_for_speed_p (test_bb);
> +  unsigned potential_cost = 0;
>  
>    FOR_BB_INSNS (test_bb, insn)
>      {
> @@ -3425,9 +3428,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
>        if (!can_conditionally_move_p (GET_MODE (dest)))
>  	return false;
>  
> +      potential_cost += pattern_cost (set, speed_p);
> +

It looks like this is an existing (potential) problem,
but default_noce_conversion_profitable_p uses seq_cost, which in turn
uses insn_cost.  And insn_cost has an optional target hook behind it,
which allows for costing based on insn attributes etc.  For a true
apples-with-apples comparison we should use insn_cost here too.
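
I.e., roughly (untested):

      potential_cost += insn_cost (insn, speed_p);

so that the estimate and the later seq_cost-based check use the same
underlying (possibly target-specific) cost function.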

>        count++;
>      }
>  
> +  *cost += potential_cost;
> +
>    /* If we would only put out one conditional move, the other strategies
>       this pass tries are better optimized and will be more appropriate.
>       Some targets want to strictly limit the number of conditional moves
> @@ -3475,11 +3482,23 @@ noce_process_if_block (struct noce_if_info *if_info)
>       to calculate a value for x.
>       ??? For future expansion, further expand the "multiple X" rules.  */
>  
> -  /* First look for multiple SETS.  */
> +  /* First look for multiple SETS.  The original costs already
> +     include a compare that we will be needing either way.

I think the detail that COSTS_N_INSNS (2) is the default is useful here.
(In other words, I'd forgotten by the time I'd poked around other bits of
ifcvt and was about to ask why we didn't cost the condition “properly”.)
So how about something like:

   The original costs already include a base cost of COSTS_N_INSNS (2):
   one instruction for the compare (which we will be needing either way)
   and one instruction for the branch.

> +     When
> +     comparing costs we want to use the branch instruction cost and
> +     the sets vs. the cmovs generated here.  Therefore subtract
> +     the costs of the compare before checking.
> +     ??? Actually, instead of the branch instruction costs we might want
> +     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.*/

Hmm, not sure about the ??? either way.  The units of BRANCH_COST aren't
entirely clear.  But it's a ???, so keeping it is fine.

Formatting nit: should be two spaces between “.” and “*/”.

Looks good otherwise, thanks.

Richard

> +
> +  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  unsigned old_cost = if_info->original_cost;
>    if (!else_bb
>        && HAVE_conditional_move
> -      && bb_ok_for_noce_convert_multiple_sets (then_bb))
> +      && bb_ok_for_noce_convert_multiple_sets (then_bb, &potential_cost))
>      {
> +      /* Temporarily set the original costs to what we estimated so
> +	 we can determine if the transformation is worth it.  */
> +      if_info->original_cost = potential_cost;
>        if (noce_convert_multiple_sets (if_info))
>  	{
>  	  if (dump_file && if_info->transform_name)
> @@ -3487,6 +3506,9 @@ noce_process_if_block (struct noce_if_info *if_info)
>  		     if_info->transform_name);
>  	  return TRUE;
>  	}
> +
> +      /* Restore the original costs.  */
> +      if_info->original_cost = old_cost;
>      }
>  
>    bool speed_p = optimize_bb_for_speed_p (test_bb);

* Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  2021-06-25 16:09 ` [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move Robin Dapp
@ 2021-07-15 20:54   ` Richard Sandiford
  2021-07-22 12:07     ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-07-15 20:54 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Robin Dapp <rdapp@linux.ibm.com> writes:
> Currently we only ever call emit_conditional_move with the comparison
> (as well as its comparands) we got from the jump.  Thus, backends are
> going to emit a CC comparison for every conditional move that is being
> generated instead of re-using the existing CC.
> This, combined with emitting temporaries for each conditional move,
> causes sky-high costs for conditional moves.
>
> This patch allows to re-use a CC so the costing situation is improved a
> bit.
> ---
>  gcc/ifcvt.c  |  16 +++--
>  gcc/optabs.c | 163 ++++++++++++++++++++++++++++++++++-----------------
>  gcc/optabs.h |   1 +
>  3 files changed, 121 insertions(+), 59 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index ac0c142c9fe..c5b8641e2aa 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -771,7 +771,7 @@ static int noce_try_addcc (struct noce_if_info *);
>  static int noce_try_store_flag_constants (struct noce_if_info *);
>  static int noce_try_store_flag_mask (struct noce_if_info *);
>  static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
> -			    rtx, rtx, rtx);
> +			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
>  static int noce_try_cmove (struct noce_if_info *);
>  static int noce_try_cmove_arith (struct noce_if_info *);
>  static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
> @@ -1710,7 +1710,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
>  
>  static rtx
>  noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
> -		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
> +		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
> +		 rtx rev_cc_cmp)
>  {
>    rtx target ATTRIBUTE_UNUSED;
>    int unsignedp ATTRIBUTE_UNUSED;
> @@ -1756,9 +1757,14 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
>    unsignedp = (code == LTU || code == GEU
>  	       || code == LEU || code == GTU);
>  
> -  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
> -				  vtrue, vfalse, GET_MODE (x),
> -				  unsignedp);
> +  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
> +    target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
> +				    vtrue, vfalse, GET_MODE (x));
> +  else
> +    target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
> +				    vtrue, vfalse, GET_MODE (x),
> +				    unsignedp);

It might make sense to move:

  /* Don't even try if the comparison operands are weird
     except that the target supports cbranchcc4.  */
  if (! general_operand (cmp_a, GET_MODE (cmp_a))
      || ! general_operand (cmp_b, GET_MODE (cmp_b)))
    {
      if (!have_cbranchcc4
	  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
	  || cmp_b != const0_rtx)
	return NULL_RTX;
    }

into the “else” arm, since it seems odd to be checking cmp_a and cmp_b
when we're not going to use them.  Looks like the later call to
emit_conditional_move should get the same treatment.
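
Roughly (a sketch only, ignoring the code that currently sits between
the check and the call):

  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
    target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
                                    vtrue, vfalse, GET_MODE (x));
  else
    {
      /* Don't even try if the comparison operands are weird
         except that the target supports cbranchcc4.  */
      if (! general_operand (cmp_a, GET_MODE (cmp_a))
          || ! general_operand (cmp_b, GET_MODE (cmp_b)))
        {
          if (!have_cbranchcc4
              || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
              || cmp_b != const0_rtx)
            return NULL_RTX;
        }
      target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
                                      vtrue, vfalse, GET_MODE (x),
                                      unsignedp);
    }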

> +
>    if (target)
>      return target;
>  
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 62a6bdb4c59..6bf486b9b50 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -52,6 +52,8 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
>  static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
>  static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
>  
> +static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
> +
>  /* Debug facility for use in GDB.  */
>  void debug_optab_libfuncs (void);
>  \f
> @@ -4747,7 +4749,6 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>  		       machine_mode mode, int unsignedp)
>  {
>    rtx comparison;
> -  rtx_insn *last;
>    enum insn_code icode;
>    enum rtx_code reversed;
>  
> @@ -4774,6 +4775,7 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>    /* get_condition will prefer to generate LT and GT even if the old
>       comparison was against zero, so undo that canonicalization here since
>       comparisons against zero are cheaper.  */
> +
>    if (code == LT && op1 == const1_rtx)
>      code = LE, op1 = const0_rtx;
>    else if (code == GT && op1 == constm1_rtx)
> @@ -4782,17 +4784,29 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>    if (cmode == VOIDmode)
>      cmode = GET_MODE (op0);
>  
> -  enum rtx_code orig_code = code;
> +  /* If the first source operand is constant and the second is not, swap
> +     it into the second.  In that case we also need to reverse the
> +     comparison.  It is possible, though, that the conditional move
> +     will not expand with operands in this order, so we might also need
> +     to revert to the original comparison and operand order.  */

Why's that the case though?  The swapped form is the canonical one,
so it's the one that the target ought to accept.

Thanks,
Richard

> +
> +  rtx rev_comparison = NULL_RTX;
>    bool swapped = false;
> -  if (swap_commutative_operands_p (op2, op3)
> -      && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
> -          != UNKNOWN))
> +
> +  code = unsignedp ? unsigned_condition (code) : code;
> +  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
> +
> +  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
> +      != UNKNOWN)
>      {
> -      std::swap (op2, op3);
> -      code = reversed;
> -      swapped = true;
> +      reversed = unsignedp ? unsigned_condition (reversed) : reversed;
> +      rev_comparison = simplify_gen_relational (reversed, VOIDmode, cmode,
> +						op0, op1);
>      }
>  
> +  if (swap_commutative_operands_p (op2, op3) && reversed != UNKNOWN)
> +    swapped = true;
> +
>    if (mode == VOIDmode)
>      mode = GET_MODE (op2);
>  
> @@ -4804,58 +4818,99 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>    if (!target)
>      target = gen_reg_rtx (mode);
>  
> -  for (int pass = 0; ; pass++)
> +  if (comparison && COMPARISON_P (comparison))
> +    prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> +		      GET_CODE (comparison), NULL_RTX, unsignedp, OPTAB_WIDEN,
> +		      &comparison, &cmode);
> +  else
> +    return NULL_RTX;
> +
> +  if (rev_comparison && COMPARISON_P (rev_comparison))
> +    prepare_cmp_insn (XEXP (rev_comparison, 0), XEXP (rev_comparison, 1),
> +		      GET_CODE (rev_comparison), NULL_RTX,
> +		      unsignedp, OPTAB_WIDEN, &rev_comparison, &cmode);
> +
> +  if (!swapped)
> +    return emit_conditional_move (target, comparison, rev_comparison,
> +				  op2, op3, mode);
> +  else
> +    return emit_conditional_move (target, rev_comparison, comparison,
> +				  op3, op2, mode);
> +}
> +
> +/* Helper function for emitting a conditional move.  Given a COMPARISON
> +   and a reversed REV_COMPARISON it will try to expand a conditional move
> +   with COMPARISON first and try with REV_COMPARISON if that fails.  */
> +
> +rtx
> +emit_conditional_move (rtx target, rtx comparison, rtx rev_comparison,
> +		       rtx op2, rtx op3, machine_mode mode)
> +{
> +
> +  rtx res = emit_conditional_move (target, comparison, op2, op3, mode);
> +
> +  if (res != NULL_RTX)
> +    return res;
> +
> +  return emit_conditional_move (target, rev_comparison, op3, op2, mode);
> +}
> +
> +/* Helper for emitting a conditional move.  */
> +
> +static rtx
> +emit_conditional_move (rtx target, rtx comparison,
> +		       rtx op2, rtx op3, machine_mode mode)
> +{
> +  rtx_insn *last;
> +  enum insn_code icode;
> +
> +  if (comparison == NULL_RTX || !COMPARISON_P (comparison))
> +    return NULL_RTX;
> +
> +  /* If the two source operands are identical, that's just a move.  */
> +  if (rtx_equal_p (op2, op3))
>      {
> -      code = unsignedp ? unsigned_condition (code) : code;
> -      comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
> +      if (!target)
> +	target = gen_reg_rtx (mode);
>  
> -      /* We can get const0_rtx or const_true_rtx in some circumstances.  Just
> -	 punt and let the caller figure out how best to deal with this
> -	 situation.  */
> -      if (COMPARISON_P (comparison))
> -	{
> -	  saved_pending_stack_adjust save;
> -	  save_pending_stack_adjust (&save);
> -	  last = get_last_insn ();
> -	  do_pending_stack_adjust ();
> -	  machine_mode cmpmode = cmode;
> -	  prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> -			    GET_CODE (comparison), NULL_RTX, unsignedp,
> -			    OPTAB_WIDEN, &comparison, &cmpmode);
> -	  if (comparison)
> -	    {
> -	      class expand_operand ops[4];
> +      emit_move_insn (target, op3);
> +      return target;
> +    }
>  
> -	      create_output_operand (&ops[0], target, mode);
> -	      create_fixed_operand (&ops[1], comparison);
> -	      create_input_operand (&ops[2], op2, mode);
> -	      create_input_operand (&ops[3], op3, mode);
> -	      if (maybe_expand_insn (icode, 4, ops))
> -		{
> -		  if (ops[0].value != target)
> -		    convert_move (target, ops[0].value, false);
> -		  return target;
> -		}
> -	    }
> -	  delete_insns_since (last);
> -	  restore_pending_stack_adjust (&save);
> -	}
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (op2);
>  
> -      if (pass == 1)
> -	return NULL_RTX;
> +  icode = direct_optab_handler (movcc_optab, mode);
>  
> -      /* If the preferred op2/op3 order is not usable, retry with other
> -	 operand order, perhaps it will expand successfully.  */
> -      if (swapped)
> -	code = orig_code;
> -      else if ((reversed = reversed_comparison_code_parts (orig_code, op0, op1,
> -							   NULL))
> -	       != UNKNOWN)
> -	code = reversed;
> -      else
> -	return NULL_RTX;
> -      std::swap (op2, op3);
> +  if (icode == CODE_FOR_nothing)
> +    return NULL_RTX;
> +
> +  if (!target)
> +    target = gen_reg_rtx (mode);
> +
> +  saved_pending_stack_adjust save;
> +  save_pending_stack_adjust (&save);
> +  last = get_last_insn ();
> +  do_pending_stack_adjust ();
> +
> +  class expand_operand ops[4];
> +
> +  create_output_operand (&ops[0], target, mode);
> +  create_fixed_operand (&ops[1], comparison);
> +  create_input_operand (&ops[2], op2, mode);
> +  create_input_operand (&ops[3], op3, mode);
> +
> +  if (maybe_expand_insn (icode, 4, ops))
> +    {
> +      if (ops[0].value != target)
> +	convert_move (target, ops[0].value, false);
> +      return target;
>      }
> +
> +  delete_insns_since (last);
> +  restore_pending_stack_adjust (&save);
> +
> +  return NULL_RTX;
>  }
>  
>  
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 3bbceff92d9..f853b93f37f 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -281,6 +281,7 @@ extern void emit_indirect_jump (rtx);
>  /* Emit a conditional move operation.  */
>  rtx emit_conditional_move (rtx, enum rtx_code, rtx, rtx, machine_mode,
>  			   rtx, rtx, machine_mode, int);
> +rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
>  
>  /* Emit a conditional negate or bitwise complement operation.  */
>  rtx emit_conditional_neg_or_complement (rtx, rtx_code, machine_mode, rtx,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-07-15 20:10   ` Richard Sandiford
@ 2021-07-22 12:06     ` Robin Dapp
  2021-07-26 19:08       ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-07-22 12:06 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

Hi Richard,

thanks for going through the patch set.

> A hash_set might be simpler, given that the code only enters insns
> for which the bool is false.  “rtx_insn *” would be better than rtx.

Right, changed that.

> Do you mean the sets might be removed or that the checks might be
> removed?

It's actually the checks that are removed.  I clarified and amended the 
comments.

> The patch is quite hard to review on its own, since nothing actually
> uses this variable.  It's also not obvious how the
> reg_overlap_mentioned_p code works if the old target is referenced
> later.
> 
> Could you refactor the series a bit so that each patch is
> self-contained? It's OK if that means fewer patches.
The attached v2 makes use of need_cmov now; I hope this makes it a bit 
clearer.  Regarding reg_overlap_mentioned_p: it is used to detect the 
swap idioms as well as cases where a cmov destination is also used in 
the condition.  If needed this could be two separate patches (most 
of the second patch would be comments, though).
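
For illustration, the two cases look roughly like this (made-up example):

  /* Swap-style idiom: the write to tmp only feeds the final read,
     so it does not need to become a cmov.  */
  if (x > y)
    {
      tmp = c;
      c = d;
      d = tmp;
    }

  /* Destination used in the condition: overwriting x directly would
     invalidate the repeated x > y checks, so the cmov goes through a
     temporary.  */
  if (x > y)
    x = y;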

Regards
  Robin

[-- Attachment #2: v2-0001-ifcvt-Check-if-cmovs-are-needed.patch --]
[-- Type: text/x-patch, Size: 6311 bytes --]

From bf7e372d7f48e614f20676c7cf4b2fbde741d4fc Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@linux.ibm.com>
Date: Thu, 24 Jun 2021 16:40:04 +0200
Subject: [PATCH v2 1/7] ifcvt: Check if cmovs are needed.

When if-converting multiple SETs and we encounter a swap-style idiom

  if (a > b)
    {
      tmp = c;   // [1]
      c = d;
      d = tmp;
    }

ifcvt should not generate a conditional move for the instruction at
[1].

In order to achieve that, this patch goes through all relevant SETs
and marks the relevant instructions.  This helps to evaluate costs.

On top, only generate temporaries if the current cmov is going to
overwrite one of the comparands of the initial compare.
---
 gcc/ifcvt.c | 138 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 118 insertions(+), 20 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79..a5e55456d88 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -98,6 +98,7 @@ static int dead_or_predicable (basic_block, basic_block, basic_block,
 			       edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void check_need_cmovs (basic_block, hash_set<rtx_insn *> *);
 \f
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3203,6 +3204,10 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   auto_vec<rtx_insn *> unmodified_insns;
   int count = 0;
 
+  hash_set<rtx_insn *> need_no_cmov;
+
+  check_need_cmovs (then_bb, &need_no_cmov);
+
   FOR_BB_INSNS (then_bb, insn)
     {
       /* Skip over non-insns.  */
@@ -3213,26 +3218,47 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
       gcc_checking_assert (set);
 
       rtx target = SET_DEST (set);
-      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx temp;
       rtx new_val = SET_SRC (set);
       rtx old_val = target;
 
-      /* If we were supposed to read from an earlier write in this block,
-	 we've changed the register allocation.  Rewire the read.  While
-	 we are looking, also try to catch a swap idiom.  */
-      for (int i = count - 1; i >= 0; --i)
-	if (reg_overlap_mentioned_p (new_val, targets[i]))
-	  {
-	    /* Catch a "swap" style idiom.  */
-	    if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
-	      /* The write to targets[i] is only live until the read
-		 here.  As the condition codes match, we can propagate
-		 the set to here.  */
-	      new_val = SET_SRC (single_set (unmodified_insns[i]));
-	    else
-	      new_val = temporaries[i];
-	    break;
-	  }
+      /* As we are transforming
+	 if (x > y)
+	   {
+	     a = b;
+	     c = d;
+	   }
+	 into
+	   a = (x > y) ...
+	   c = (x > y) ...
+
+	 we potentially check x > y before every set.
+	 Even though the check might be removed by subsequent passes, this means
+	 that we cannot transform
+	   if (x > y)
+	     {
+	       x = y;
+	       ...
+	     }
+	 into
+	   x = (x > y) ...
+	   ...
+	 since this would invalidate x and the following to-be-removed checks.
+	 Therefore we introduce a temporary every time we are about to
+	 overwrite a variable used in the check.  Costing of a sequence with
+	 these is going to be inaccurate so only use temporaries when
+	 needed.  */
+      if (reg_overlap_mentioned_p (target, cond))
+	temp = gen_reg_rtx (GET_MODE (target));
+      else
+	temp = target;
+
+      /* We have identified swap-style idioms in check_need_cmovs.  A normal
+	 set will need to be a cmov while the first instruction of a swap-style
+	 idiom can be a regular move.  This helps with costing.  */
+      bool need_cmov = true;
+      if (need_no_cmov.contains (insn))
+	need_cmov = false;
 
       /* If we had a non-canonical conditional jump (i.e. one where
 	 the fallthrough is to the "else" case) we need to reverse
@@ -3275,9 +3301,22 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
 	}
 
-      /* Actually emit the conditional move.  */
-      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
-				       x, y, new_val, old_val);
+      rtx temp_dest = NULL_RTX;
+
+      if (need_cmov)
+	{
+	  /* Actually emit the conditional move.  */
+	  temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+					   x, y, new_val, old_val);
+	}
+      else
+	{
+	  if (if_info->then_else_reverse)
+	    noce_emit_move_insn (temp, old_val);
+	  else
+	    noce_emit_move_insn (temp, new_val);
+	  temp_dest = temp;
+	}
 
       /* If we failed to expand the conditional move, drop out and don't
 	 try to continue.  */
@@ -3808,6 +3847,65 @@ check_cond_move_block (basic_block bb,
   return TRUE;
 }
 
+/* Find local swap-style idioms in BB and mark the first insn (1)
+   that is only a temporary as not needing a conditional move as
+   it is going to be dead afterwards anyway.
+
+     (1) int tmp = a;
+	 a = b;
+	 b = tmp;
+
+	 ifcvt
+	 -->
+
+	 load tmp,a
+	 cmov a,b
+	 cmov b,tmp   */
+
+static void
+check_need_cmovs (basic_block bb, hash_set<rtx_insn *> *need_no_cmov)
+{
+  rtx_insn *insn;
+  int count = 0;
+  auto_vec<rtx_insn *> insns;
+  auto_vec<rtx> dests;
+
+  /* Iterate over all SETs, storing the destinations
+     in DEST.  If we hit a SET that reads from a destination
+     that we have seen before and the corresponding register
+     is dead afterwards, it must be a swap.  */
+  FOR_BB_INSNS (bb, insn)
+    {
+      rtx set, src, dest;
+
+      if (!active_insn_p (insn))
+	continue;
+
+      set = single_set (insn);
+      if (set == NULL_RTX)
+	continue;
+
+      src = SET_SRC (set);
+      dest = SET_DEST (set);
+
+      /* Check if the current SET's source is the same
+	 as any previously seen destination.
+	 This is quadratic but the number of insns in BB
+	 is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS.  */
+      for (int i = count - 1; i >= 0; --i)
+	{
+	  if (reg_overlap_mentioned_p (src, dests[i])
+	      && find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
+	    need_no_cmov->add (insns[i]);
+	}
+
+      insns.safe_push (insn);
+      dests.safe_push (dest);
+
+      count++;
+    }
+}
+
 /* Given a basic block BB suitable for conditional move conversion,
    a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing
    the register values depending on COND, emit the insns in the block as
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/7] ifcvt: Improve costs handling for noce_convert_multiple.
  2021-07-15 20:42   ` Richard Sandiford
@ 2021-07-22 12:07     ` Robin Dapp
  2021-07-26 19:10       ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-07-22 12:07 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 909 bytes --]

> It looks like this is an existing (potential) problem,
> but default_noce_conversion_profitable_p uses seq_cost, which in turn
> uses insn_cost.  And insn_cost has an optional target hook behind it,
> which allows for costing based on insn attributes etc.  For a true
> apples-with-apples comparison we should use insn_cost here too.

Good point, fixed that.

> I think the detail that COSTS_N_INSNS (2) is the default is useful here.
> (In other words, I'd forgotten by the time I'd poked around other bits of
> ifcvt and was about to ask why we didn't cost the condition “properly”.)
> So how about something like:
> 
>     The original costs already include a base cost of COSTS_N_INSNS (2):
>     one instruction for the compare (which we will be needing either way)
>     and one instruction for the branch.

Yes, this is much clearer. I went with that wording in the attached v2.
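
Schematically, the cost bookkeeping then boils down to the following
(condensed from the patch below, not the literal code):

  bool speed_p = optimize_bb_for_speed_p (then_bb);

  /* Keep the branch cost, drop the compare from the original costs.  */
  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);

  /* Add the cost of every SET in then_bb.  */
  FOR_BB_INSNS (then_bb, insn)
    if (active_insn_p (insn) && single_set (insn))
      potential_cost += insn_cost (insn, speed_p);

  /* Temporarily pretend these were the original costs so that
     default_noce_conversion_profitable_p compares the emitted cmov
     sequence against them rather than against the bare
     COSTS_N_INSNS (2).  */
  if_info->original_cost = potential_cost;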

Regards
  Robin

[-- Attachment #2: v2-0003-ifcvt-Improve-costs-handling-for-noce_convert_mul.patch --]
[-- Type: text/x-patch, Size: 3850 bytes --]

From 54508fa00fbee082fa62f4e1a8167f477938e781 Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@linux.ibm.com>
Date: Wed, 27 Nov 2019 13:46:17 +0100
Subject: [PATCH v2 3/7] ifcvt: Improve costs handling for
 noce_convert_multiple.

When noce_convert_multiple is called the original costs are not yet
initialized.  Therefore, up to now, costs were only ever unfairly
compared against COSTS_N_INSNS (2).  This would lead to
default_noce_conversion_profitable_p () rejecting all but the most
contrived of sequences.

This patch temporarily initializes the original costs by adding up the
costs for all sets inside the then_bb.
---
 gcc/ifcvt.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f9d4478ec18..91b54dbea9c 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3404,14 +3404,17 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
    (SET (REG) (REG)) insns suitable for conversion to a series
    of conditional moves.  Also check that we have more than one set
    (other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
+   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+   through the insns store the sum of their potential costs in COST.  */
 
 static bool
-bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
 {
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = param_max_rtl_if_conversion_insns;
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+  unsigned potential_cost = 0;
 
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -3447,9 +3450,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
       if (!can_conditionally_move_p (GET_MODE (dest)))
 	return false;
 
+      potential_cost += insn_cost (insn, speed_p);
+
       count++;
     }
 
+  *cost += potential_cost;
+
   /* If we would only put out one conditional move, the other strategies
      this pass tries are better optimized and will be more appropriate.
      Some targets want to strictly limit the number of conditional moves
@@ -3497,11 +3504,24 @@ noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  */
+  /* First look for multiple SETS.  The original costs already include
+     a base cost of COSTS_N_INSNS (2): one instruction for the compare
+     (which we will be needing either way) and one instruction for the
+     branch.  When comparing costs we want to use the branch instruction
+     cost and the sets vs. the cmovs generated here.  Therefore subtract
+     the costs of the compare before checking.
+     ??? Actually, instead of the branch instruction costs we might want
+     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
+
+  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned old_cost = if_info->original_cost;
   if (!else_bb
       && HAVE_conditional_move
-      && bb_ok_for_noce_convert_multiple_sets (then_bb))
+      && bb_ok_for_noce_convert_multiple_sets (then_bb, &potential_cost))
     {
+      /* Temporarily set the original costs to what we estimated so
+	 we can determine if the transformation is worth it.  */
+      if_info->original_cost = potential_cost;
       if (noce_convert_multiple_sets (if_info))
 	{
 	  if (dump_file && if_info->transform_name)
@@ -3509,6 +3529,9 @@ noce_process_if_block (struct noce_if_info *if_info)
 		     if_info->transform_name);
 	  return TRUE;
 	}
+
+      /* Restore the original costs.  */
+      if_info->original_cost = old_cost;
     }
 
   bool speed_p = optimize_bb_for_speed_p (test_bb);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  2021-07-15 20:54   ` Richard Sandiford
@ 2021-07-22 12:07     ` Robin Dapp
  2021-07-26 19:31       ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-07-22 12:07 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]

> It might make sense to move:
> 
>    /* Don't even try if the comparison operands are weird
>       except that the target supports cbranchcc4.  */
>    if (! general_operand (cmp_a, GET_MODE (cmp_a))
>        || ! general_operand (cmp_b, GET_MODE (cmp_b)))
>      {
>        if (!have_cbranchcc4
> 	  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
> 	  || cmp_b != const0_rtx)
> 	return NULL_RTX;
>      }
> 
> into the “else” arm, since it seems odd to be checking cmp_a and cmp_b
> when we're not going to use them.  Looks like the later call to
> emit_conditional_move should get the same treatment.

Moved it.

> Why's that the case though?  The swapped form is the canonical one,
> so it's the one that the target ought to accept.

My wording was a bit misleading in the comment and I tried to improve it 
in the attached v2.

Before, we had a loop with two iterations that tried "emit_cmov (cond, 
op2, op3)"; op2 and op3 could be (or even already were) swapped at that 
point.  If this did not succeed we would reverse the condition and swap 
op2 and op3 in-place and try again.  I found that a bit cumbersome and 
intended to make this explicit, but it's still kind of involved, 
particularly since cond may come in reversed and we additionally swap 
op2 and op3 and all the way back again.
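
For reference, the old flow was roughly this (heavily condensed, the
UNKNOWN checks are omitted):

  for (int pass = 0; ; pass++)
    {
      code = unsignedp ? unsigned_condition (code) : code;
      comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
      /* ... prepare_cmp_insn and build the movcc operands ...  */
      if (maybe_expand_insn (icode, 4, ops))
        return target;

      if (pass == 1)
        return NULL_RTX;

      /* Second pass: reversed condition, swapped operands.  */
      code = swapped ? orig_code
                     : reversed_comparison_code_parts (orig_code, op0, op1, NULL);
      std::swap (op2, op3);
    }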

I remember from the first iteration of these patches that this (double 
revert) was needed for exactly one test case in the test suite on x86. 
When re-running it today I could not reproduce it anymore, though.

Regards
  Robin

[-- Attachment #2: v2-0004-ifcvt-optabs-Allow-using-a-CC-comparison-for-emit.patch --]
[-- Type: text/x-patch, Size: 10888 bytes --]

From 0526fa57290941c8b7febd138ab4c637b83f3d05 Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@linux.ibm.com>
Date: Wed, 27 Nov 2019 13:53:40 +0100
Subject: [PATCH v2 4/7] ifcvt/optabs: Allow using a CC comparison for
 emit_conditional_move.

Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump.  Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries for each conditional move,
causes sky-high costs for conditional moves.

This patch allows re-using a CC so the costing situation is improved a
bit.
---
 gcc/ifcvt.c  |  40 ++++++++-----
 gcc/optabs.c | 164 ++++++++++++++++++++++++++++++++++-----------------
 gcc/optabs.h |   1 +
 3 files changed, 135 insertions(+), 70 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 91b54dbea9c..7ce60d0af8d 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -771,7 +771,7 @@ static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
-			    rtx, rtx, rtx);
+			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static int noce_try_cmove (struct noce_if_info *);
 static int noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
@@ -1710,7 +1710,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 static rtx
 noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
-		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
+		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
+		 rtx rev_cc_cmp)
 {
   rtx target ATTRIBUTE_UNUSED;
   int unsignedp ATTRIBUTE_UNUSED;
@@ -1742,23 +1743,30 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
       end_sequence ();
     }
 
-  /* Don't even try if the comparison operands are weird
-     except that the target supports cbranchcc4.  */
-  if (! general_operand (cmp_a, GET_MODE (cmp_a))
-      || ! general_operand (cmp_b, GET_MODE (cmp_b)))
-    {
-      if (!have_cbranchcc4
-	  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
-	  || cmp_b != const0_rtx)
-	return NULL_RTX;
-    }
-
   unsignedp = (code == LTU || code == GEU
 	       || code == LEU || code == GTU);
 
-  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
-				  vtrue, vfalse, GET_MODE (x),
-				  unsignedp);
+  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
+    target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
+				    vtrue, vfalse, GET_MODE (x));
+  else
+    {
+      /* Don't even try if the comparison operands are weird
+	 except that the target supports cbranchcc4.  */
+      if (! general_operand (cmp_a, GET_MODE (cmp_a))
+	  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
+	{
+	  if (!have_cbranchcc4
+	      || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
+	      || cmp_b != const0_rtx)
+	    return NULL_RTX;
+	}
+
+      target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
+				      vtrue, vfalse, GET_MODE (x),
+				      unsignedp);
+    }
+
   if (target)
     return target;
 
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59..7410f23e0ad 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -52,6 +52,8 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
 static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
 static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
 
+static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
+
 /* Debug facility for use in GDB.  */
 void debug_optab_libfuncs (void);
 \f
@@ -4747,7 +4749,6 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
 		       machine_mode mode, int unsignedp)
 {
   rtx comparison;
-  rtx_insn *last;
   enum insn_code icode;
   enum rtx_code reversed;
 
@@ -4774,6 +4775,7 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   /* get_condition will prefer to generate LT and GT even if the old
      comparison was against zero, so undo that canonicalization here since
      comparisons against zero are cheaper.  */
+
   if (code == LT && op1 == const1_rtx)
     code = LE, op1 = const0_rtx;
   else if (code == GT && op1 == constm1_rtx)
@@ -4782,17 +4784,30 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   if (cmode == VOIDmode)
     cmode = GET_MODE (op0);
 
-  enum rtx_code orig_code = code;
+  /* If the first condition operand is constant and the second is not, swap
+     them.  In that case we also need to reverse the comparison.
+     If both are non-constant, it is possible that the conditional move
+     will not expand with operands in this order, e.g. when we are about to
+     create a min/max expression and the operands do not match the condition.
+     In that case we also need the reversed condition code and comparison.  */
+
+  rtx rev_comparison = NULL_RTX;
   bool swapped = false;
-  if (swap_commutative_operands_p (op2, op3)
-      && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
-          != UNKNOWN))
+
+  code = unsignedp ? unsigned_condition (code) : code;
+  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
+
+  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
+      != UNKNOWN)
     {
-      std::swap (op2, op3);
-      code = reversed;
-      swapped = true;
+      reversed = unsignedp ? unsigned_condition (reversed) : reversed;
+      rev_comparison = simplify_gen_relational (reversed, VOIDmode, cmode,
+						op0, op1);
     }
 
+  if (swap_commutative_operands_p (op2, op3) && reversed != UNKNOWN)
+    swapped = true;
+
   if (mode == VOIDmode)
     mode = GET_MODE (op2);
 
@@ -4804,58 +4819,99 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   if (!target)
     target = gen_reg_rtx (mode);
 
-  for (int pass = 0; ; pass++)
+  if (comparison && COMPARISON_P (comparison))
+    prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
+		      GET_CODE (comparison), NULL_RTX,
+		      unsignedp, OPTAB_WIDEN, &comparison, &cmode);
+  else
+    return NULL_RTX;
+
+  if (rev_comparison && COMPARISON_P (rev_comparison))
+    prepare_cmp_insn (XEXP (rev_comparison, 0), XEXP (rev_comparison, 1),
+		      GET_CODE (rev_comparison), NULL_RTX,
+		      unsignedp, OPTAB_WIDEN, &rev_comparison, &cmode);
+
+  if (!swapped)
+    return emit_conditional_move (target, comparison, rev_comparison,
+				  op2, op3, mode);
+  else
+    return emit_conditional_move (target, rev_comparison, comparison,
+				  op3, op2, mode);
+}
+
+/* Helper function for emitting a conditional move.  Given a COMPARISON
+   and a reversed REV_COMPARISON it will try to expand a conditional move
+   with COMPARISON first and try with REV_COMPARISON if that fails.  */
+
+rtx
+emit_conditional_move (rtx target, rtx comparison, rtx rev_comparison,
+		       rtx op2, rtx op3, machine_mode mode)
+{
+
+  rtx res = emit_conditional_move (target, comparison, op2, op3, mode);
+
+  if (res != NULL_RTX)
+    return res;
+
+  return emit_conditional_move (target, rev_comparison, op3, op2, mode);
+}
+
+/* Helper for emitting a conditional move.  */
+
+static rtx
+emit_conditional_move (rtx target, rtx comparison,
+		       rtx op2, rtx op3, machine_mode mode)
+{
+  rtx_insn *last;
+  enum insn_code icode;
+
+  if (comparison == NULL_RTX || !COMPARISON_P (comparison))
+    return NULL_RTX;
+
+  /* If the two source operands are identical, that's just a move.  */
+  if (rtx_equal_p (op2, op3))
     {
-      code = unsignedp ? unsigned_condition (code) : code;
-      comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
+      if (!target)
+	target = gen_reg_rtx (mode);
 
-      /* We can get const0_rtx or const_true_rtx in some circumstances.  Just
-	 punt and let the caller figure out how best to deal with this
-	 situation.  */
-      if (COMPARISON_P (comparison))
-	{
-	  saved_pending_stack_adjust save;
-	  save_pending_stack_adjust (&save);
-	  last = get_last_insn ();
-	  do_pending_stack_adjust ();
-	  machine_mode cmpmode = cmode;
-	  prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
-			    GET_CODE (comparison), NULL_RTX, unsignedp,
-			    OPTAB_WIDEN, &comparison, &cmpmode);
-	  if (comparison)
-	    {
-	      class expand_operand ops[4];
+      emit_move_insn (target, op3);
+      return target;
+    }
 
-	      create_output_operand (&ops[0], target, mode);
-	      create_fixed_operand (&ops[1], comparison);
-	      create_input_operand (&ops[2], op2, mode);
-	      create_input_operand (&ops[3], op3, mode);
-	      if (maybe_expand_insn (icode, 4, ops))
-		{
-		  if (ops[0].value != target)
-		    convert_move (target, ops[0].value, false);
-		  return target;
-		}
-	    }
-	  delete_insns_since (last);
-	  restore_pending_stack_adjust (&save);
-	}
+  if (mode == VOIDmode)
+    mode = GET_MODE (op2);
 
-      if (pass == 1)
-	return NULL_RTX;
+  icode = direct_optab_handler (movcc_optab, mode);
 
-      /* If the preferred op2/op3 order is not usable, retry with other
-	 operand order, perhaps it will expand successfully.  */
-      if (swapped)
-	code = orig_code;
-      else if ((reversed = reversed_comparison_code_parts (orig_code, op0, op1,
-							   NULL))
-	       != UNKNOWN)
-	code = reversed;
-      else
-	return NULL_RTX;
-      std::swap (op2, op3);
+  if (icode == CODE_FOR_nothing)
+    return NULL_RTX;
+
+  if (!target)
+    target = gen_reg_rtx (mode);
+
+  saved_pending_stack_adjust save;
+  save_pending_stack_adjust (&save);
+  last = get_last_insn ();
+  do_pending_stack_adjust ();
+
+  class expand_operand ops[4];
+
+  create_output_operand (&ops[0], target, mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_input_operand (&ops[2], op2, mode);
+  create_input_operand (&ops[3], op3, mode);
+
+  if (maybe_expand_insn (icode, 4, ops))
+    {
+      if (ops[0].value != target)
+	convert_move (target, ops[0].value, false);
+      return target;
     }
+
+  delete_insns_since (last);
+  restore_pending_stack_adjust (&save);
+
+  return NULL_RTX;
 }
 
 
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 3bbceff92d9..f853b93f37f 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -281,6 +281,7 @@ extern void emit_indirect_jump (rtx);
 /* Emit a conditional move operation.  */
 rtx emit_conditional_move (rtx, enum rtx_code, rtx, rtx, machine_mode,
 			   rtx, rtx, machine_mode, int);
+rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
 
 /* Emit a conditional negate or bitwise complement operation.  */
 rtx emit_conditional_neg_or_complement (rtx, rtx_code, machine_mode, rtx,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] ifcvt: Try re-using CC for conditional moves.
  2021-06-25 16:09 ` [PATCH 5/7] ifcvt: Try re-using CC for conditional moves Robin Dapp
@ 2021-07-22 12:12   ` Robin Dapp
  0 siblings, 0 replies; 31+ messages in thread
From: Robin Dapp @ 2021-07-22 12:12 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 151 bytes --]

Hi,

v2 now properly gets the reversed CC comparison.  It also handles a 
cost2 == 0 situation in which an empty seq2 would previously have been 
preferred.

Regards
  Robin

[-- Attachment #2: v2-0005-ifcvt-Try-re-using-CC-for-conditional-moves.patch --]
[-- Type: text/x-patch, Size: 6171 bytes --]

From 12b796c4e081ba8a2e136958f4bf919c63516de6 Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@linux.ibm.com>
Date: Thu, 24 Jun 2021 15:22:42 +0200
Subject: [PATCH v2 5/7] ifcvt: Try re-using CC for conditional moves.

Following up on the previous patch, this patch makes
noce_convert_multiple emit two cmov sequences:  The same one as before
and a second one that tries to re-use the existing CC.  Then their costs
are compared and the cheaper one is selected.
---
 gcc/ifcvt.c | 108 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 87 insertions(+), 21 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 7ce60d0af8d..a82b41d9e4a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -83,7 +83,7 @@ static rtx_insn *last_active_insn (basic_block, int);
 static rtx_insn *find_active_insn_before (basic_block, rtx_insn *);
 static rtx_insn *find_active_insn_after (basic_block, rtx_insn *);
 static basic_block block_fallthru (basic_block);
-static rtx cond_exec_get_condition (rtx_insn *);
+static rtx cond_exec_get_condition (rtx_insn *, bool);
 static rtx noce_get_condition (rtx_insn *, rtx_insn **, bool);
 static int noce_operand_ok (const_rtx);
 static void merge_if_block (ce_if_block *);
@@ -426,7 +426,7 @@ cond_exec_process_insns (ce_if_block *ce_info ATTRIBUTE_UNUSED,
 /* Return the condition for a jump.  Do not do any special processing.  */
 
 static rtx
-cond_exec_get_condition (rtx_insn *jump)
+cond_exec_get_condition (rtx_insn *jump, bool get_reversed = false)
 {
   rtx test_if, cond;
 
@@ -438,8 +438,9 @@ cond_exec_get_condition (rtx_insn *jump)
 
   /* If this branches to JUMP_LABEL when the condition is false,
      reverse the condition.  */
-  if (GET_CODE (XEXP (test_if, 2)) == LABEL_REF
-      && label_ref_label (XEXP (test_if, 2)) == JUMP_LABEL (jump))
+  if (get_reversed || (GET_CODE (XEXP (test_if, 2)) == LABEL_REF
+		       && label_ref_label (XEXP (test_if, 2))
+		       == JUMP_LABEL (jump)))
     {
       enum rtx_code rev = reversed_comparison_code (cond, jump);
       if (rev == UNKNOWN)
@@ -3144,6 +3145,46 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* Helper function to emit a cmov sequence.  */
+
+static rtx_insn*
+try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
+		    rtx cond, rtx new_val, rtx old_val, bool need_cmov,
+		    unsigned *cost, rtx *temp_dest,
+		    rtx cc_cmp = NULL, rtx rev_cc_cmp = NULL)
+{
+  rtx_insn *seq = NULL;
+  *cost = 0;
+
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  start_sequence ();
+
+  if (need_cmov)
+    *temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				 x, y, new_val, old_val, cc_cmp, rev_cc_cmp);
+  else
+    {
+      *temp_dest = temp;
+      if (if_info->then_else_reversed)
+	noce_emit_move_insn (temp, old_val);
+      else
+	noce_emit_move_insn (temp, new_val);
+    }
+
+  if (*temp_dest != NULL_RTX)
+    {
+      seq = get_insns ();
+      *cost = seq_cost (seq, if_info->speed_p);
+    }
+
+  end_sequence ();
+
+  return seq;
+}
+
 /* We have something like:
 
      if (x > y)
@@ -3201,7 +3242,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx cond = noce_get_condition (jump, &cond_earliest, false);
   rtx x = XEXP (cond, 0);
   rtx y = XEXP (cond, 1);
-  rtx_code cond_code = GET_CODE (cond);
+
+  rtx cc_cmp = cond_exec_get_condition (jump);
+  rtx rev_cc_cmp = cond_exec_get_condition (jump, true);
 
   /* The true targets for a conditional move.  */
   auto_vec<rtx> targets;
@@ -3312,31 +3355,53 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
 	}
 
-      rtx temp_dest = NULL_RTX;
+      /* Try emitting a conditional move passing the backend the
+	 canonicalized comparison.  The backend is then able to
+	 recognize expressions like
 
-      if (need_cmov)
+	   if (x > y)
+	     y = x;
+
+	 as min/max and emit an insn, accordingly.
+	 We will still emit a superfluous CC comparison before the
+	 min/max, though, which complicates costing.  */
+      unsigned cost1 = 0, cost2 = 0;
+      rtx_insn *seq, *seq1, *seq2;
+      rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
+
+      seq1 = try_emit_cmove_seq (if_info, temp, cond,
+				 new_val, old_val, need_cmov,
+				 &cost1, &temp_dest1);
+
+      /* Here, we try to pass the backend a non-canonicalized cc comparison
+	 as well.  This allows the backend to emit a cmov directly without
+	 creating an additional compare for each.  If successful, costing
+	 is easier and this sequence is usually preferred.  */
+      seq2 = try_emit_cmove_seq (if_info, target, cond,
+				 new_val, old_val, need_cmov,
+				 &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+
+      /* Check which version is less expensive.  */
+      if (seq1 != NULL_RTX && (cost1 <= cost2 || seq2 == NULL_RTX))
 	{
-	  /* Actually emit the conditional move.  */
-	  temp_dest = noce_emit_cmove (if_info, temp, cond_code,
-					   x, y, new_val, old_val);
+	  seq = seq1;
+	  temp_dest = temp_dest1;
 	}
-      else
+      else if (seq2 != NULL_RTX)
 	{
-	  if (if_info->then_else_reverse)
-	    noce_emit_move_insn (temp, old_val);
-	  else
-	    noce_emit_move_insn (temp, new_val);
-	  temp_dest = temp;
+	  seq = seq2;
+	  temp_dest = temp_dest2;
 	}
-
-      /* If we failed to expand the conditional move, drop out and don't
-	 try to continue.  */
-      if (temp_dest == NULL_RTX)
+      else
 	{
+	  /* Nothing worked, bail out.  */
 	  end_sequence ();
 	  return FALSE;
 	}
 
+      /* End the sub sequence and emit to the main sequence.  */
+      emit_insn (seq);
+
       /* Bookkeeping.  */
       count++;
       targets.safe_push (target);
@@ -3350,7 +3415,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   /* Now fixup the assignments.  */
   for (int i = 0; i < count; i++)
-    noce_emit_move_insn (targets[i], temporaries[i]);
+    if (targets[i] != temporaries[i])
+      noce_emit_move_insn (targets[i], temporaries[i]);
 
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-07-22 12:06     ` Robin Dapp
@ 2021-07-26 19:08       ` Richard Sandiford
  2021-09-15  8:39         ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-07-26 19:08 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Robin Dapp <rdapp@linux.ibm.com> writes:
> Hi Richard,
>
> thanks for going through the patch set.
>
>> A hash_set might be simpler, given that the code only enters insns
>> for which the bool is false.  “rtx_insn *” would be better than rtx.
>
> Right, changed that.
>
>> Do you mean the sets might be removed or that the checks might be
>> removed?
>
> It's actually the checks that are removed.  I clarified and amended the 
> comments.
>
>> The patch is quite hard to review on its own, since nothing actually
>> uses this variable.  It's also not obvious how the
>> reg_overlap_mentioned_p code works if the old target is referenced
>> later.
>> 
>> Could you refactor the series a bit so that each patch is
>> self-contained? It's OK if that means fewer patches.
> The attached v2 makes use of need_cmov now; I hope this makes it a bit 
> clearer.  Regarding reg_overlap_mentioned_p: it is used to detect the 
> swap idioms as well as cases where a cmov destination is also used in 
> the condition.  If needed this could be two separate patches (most 
> of the second patch would be comments, though).

Thanks for the update.  No need for further splitting, this looks like a
nice self-contained patch as it is.

> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 017944f4f79..a5e55456d88 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -98,6 +98,7 @@ static int dead_or_predicable (basic_block, basic_block, basic_block,
>  			       edge, int);
>  static void noce_emit_move_insn (rtx, rtx);
>  static rtx_insn *block_has_only_trap (basic_block);
> +static void check_need_cmovs (basic_block, hash_set<rtx_insn *> *);
>  \f
>  /* Count the number of non-jump active insns in BB.  */
>  
> @@ -3203,6 +3204,10 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>    auto_vec<rtx_insn *> unmodified_insns;
>    int count = 0;
>  
> +  hash_set<rtx_insn *> need_no_cmov;
> +
> +  check_need_cmovs (then_bb, &need_no_cmov);
> +
>    FOR_BB_INSNS (then_bb, insn)
>      {
>        /* Skip over non-insns.  */
> @@ -3213,26 +3218,47 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>        gcc_checking_assert (set);
>  
>        rtx target = SET_DEST (set);
> -      rtx temp = gen_reg_rtx (GET_MODE (target));
> +      rtx temp;
>        rtx new_val = SET_SRC (set);
>        rtx old_val = target;
>  
> -      /* If we were supposed to read from an earlier write in this block,
> -	 we've changed the register allocation.  Rewire the read.  While
> -	 we are looking, also try to catch a swap idiom.  */
> -      for (int i = count - 1; i >= 0; --i)
> -	if (reg_overlap_mentioned_p (new_val, targets[i]))
> -	  {
> -	    /* Catch a "swap" style idiom.  */
> -	    if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> -	      /* The write to targets[i] is only live until the read
> -		 here.  As the condition codes match, we can propagate
> -		 the set to here.  */
> -	      new_val = SET_SRC (single_set (unmodified_insns[i]));
> -	    else
> -	      new_val = temporaries[i];
> -	    break;
> -	  }

Don't we still need this code (without the REG_DEAD handling) for the
case in which…

> +      /* As we are transforming
> +	 if (x > y)
> +	   {
> +	     a = b;
> +	     c = d;
> +	   }
> +	 into
> +	   a = (x > y) ...
> +	   c = (x > y) ...
> +
> +	 we potentially check x > y before every set.
> +	 Even though the check might be removed by subsequent passes, this means
> +	 that we cannot transform
> +	   if (x > y)
> +	     {
> +	       x = y;
> +	       ...
> +	     }
> +	 into
> +	   x = (x > y) ...
> +	   ...
> +	 since this would invalidate x and the following to-be-removed checks.
> +	 Therefore we introduce a temporary every time we are about to
> +	 overwrite a variable used in the check.  Costing of a sequence with
> +	 these is going to be inaccurate so only use temporaries when
> +	 needed.  */
> +      if (reg_overlap_mentioned_p (target, cond))
> +	temp = gen_reg_rtx (GET_MODE (target));

…this code triggers?  I don't see otherwise how later uses of x would
pick up “temp” instead of the original target.  E.g. suppose we had:

    if (x > y)
      {
        x = …;
        z = x; // x does not die here
      }

Without the loop, it looks like z would pick up the old value of x
(used in the comparison) instead of the new one.
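
To make that concrete (using a hypothetical source a for the elided
assignment), I think the v2 code would emit something like:

  temp_x = x > y ? a : x;   /* through a temporary, since x is used
                               in the condition */
  z = x > y ? x : z;        /* still reads the old value of x */
  x = temp_x;               /* final fixup */

whereas the removed loop would have rewired the second cmov to use
temp_x instead of x.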

> +      else
> +	temp = target;
> +
> +      /* We have identified swap-style idioms in check_need_cmovs.  A normal
> +	 set will need to be a cmov while the first instruction of a swap-style
> +	 idiom can be a regular move.  This helps with costing.  */
> +      bool need_cmov = true;
> +      if (need_no_cmov.contains (insn))
> +	need_cmov = false;

Would be simpler as:

      bool need_cmov = !need_no_cmov.contains (insn);
>  
>        /* If we had a non-canonical conditional jump (i.e. one where
>  	 the fallthrough is to the "else" case) we need to reverse
> @@ -3275,9 +3301,22 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>  	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
>  	}
>  
> -      /* Actually emit the conditional move.  */
> -      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
> -				       x, y, new_val, old_val);
> +      rtx temp_dest = NULL_RTX;
> +
> +      if (need_cmov)
> +	{
> +	  /* Actually emit the conditional move.  */
> +	  temp_dest = noce_emit_cmove (if_info, temp, cond_code,
> +					   x, y, new_val, old_val);
> +	}
> +      else
> +	{
> +	  if (if_info->then_else_reverse)
> +	    noce_emit_move_insn (temp, old_val);
> +	  else
> +	    noce_emit_move_insn (temp, new_val);
> +	  temp_dest = temp;
> +	}
>  
>        /* If we failed to expand the conditional move, drop out and don't
>  	 try to continue.  */

I think this comment and the associated null check belong in the
“if (need_cmov)”

> @@ -3808,6 +3847,65 @@ check_cond_move_block (basic_block bb,
>    return TRUE;
>  }
>  
> +/* Find local swap-style idioms in BB and mark the first insn (1)
> +   that is only a temporary as not needing a conditional move as
> +   it is going to be dead afterwards anyway.
> +
> +     (1) int tmp = a;
> +	 a = b;
> +	 b = tmp;
> +
> +	 ifcvt
> +	 -->
> +
> +	 load tmp,a
> +	 cmov a,b
> +	 cmov b,tmp   */
> +
> +static void
> +check_need_cmovs (basic_block bb, hash_set<rtx_insn *> *need_no_cmov)
> +{
> +  rtx_insn *insn;
> +  int count = 0;
> +  auto_vec<rtx_insn *> insns;
> +  auto_vec<rtx> dests;
> +
> +  /* Iterate over all SETs, storing the destinations
> +     in DEST.  If we hit a SET that reads from a destination
> +     that we have seen before and the corresponding register
> +     is dead afterwards, it must be a swap.  */

This is probably pedantic, but I guess it could also be a missed
forward-propagation opportunity.  E.g. there's no reason in principle
why we couldn't see:

  int tmp = a;
  b = tmp; // tmp dies here

The code handles that case correctly, but it isn't a swap.

> +  FOR_BB_INSNS (bb, insn)
> +    {
> +      rtx set, src, dest;
> +
> +      if (!active_insn_p (insn))
> +	continue;
> +
> +      set = single_set (insn);
> +      if (set == NULL_RTX)
> +	continue;
> +
> +      src = SET_SRC (set);
> +      dest = SET_DEST (set);
> +
> +      /* Check if the current SET's source is the same
> +	 as any previously seen destination.
> +	 This is quadratic but the number of insns in BB
> +	 is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS.  */
> +      for (int i = count - 1; i >= 0; --i)
> +	{
> +	  if (reg_overlap_mentioned_p (src, dests[i])
> +	      && find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
> +	    need_no_cmov->add (insns[i]);
> +	}

Minor formatting nit, sorry, but the braces aren't needed.

This is really a comment about the existing swap recognition code,
but I think the reg_overlap_mentioned_p check would be more obviously
correct if we guard the “for” loop with:

      if (REG_P (src))

This guarantees that any previous instructions are equal to SRC or
subregs of it.  I guess it might be more efficient too.
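
I.e. combining both comments, something like (untested):

      if (REG_P (src))
        for (int i = count - 1; i >= 0; --i)
          if (reg_overlap_mentioned_p (src, dests[i])
              && find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
            need_no_cmov->add (insns[i]);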

Thanks,
Richard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/7] ifcvt: Improve costs handling for noce_convert_multiple.
  2021-07-22 12:07     ` Robin Dapp
@ 2021-07-26 19:10       ` Richard Sandiford
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Sandiford @ 2021-07-26 19:10 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Robin Dapp <rdapp@linux.ibm.com> writes:
>> It looks like this is an existing (potential) problem,
>> but default_noce_conversion_profitable_p uses seq_cost, which in turn
>> uses insn_cost.  And insn_cost has an optional target hook behind it,
>> which allows for costing based on insn attributes etc.  For a true
>> apples-with-apples comparison we should use insn_cost here too.
>
> Good point, fixed that.
>
>> I think the detail that COSTS_N_INSNS (2) is the default is useful here.
>> (In other words, I'd forgotten by the time I'd poked around other bits of
>> ifcvt and was about to ask why we didn't cost the condition “properly”.)
>> So how about something like:
>> 
>>     The original costs already include a base cost of COSTS_N_INSNS (2):
>>     one instruction for the compare (which we will be needing either way)
>>     and one instruction for the branch.
>
> Yes, this is much clearer. I went with that wording in the attached v2.
>
> Regards
>   Robin
>
> From 54508fa00fbee082fa62f4e1a8167f477938e781 Mon Sep 17 00:00:00 2001
> From: Robin Dapp <rdapp@linux.ibm.com>
> Date: Wed, 27 Nov 2019 13:46:17 +0100
> Subject: [PATCH v2 3/7] ifcvt: Improve costs handling for
>  noce_convert_multiple.
>
> When noce_convert_multiple is called the original costs are not yet
> initialized.  Therefore, up to now, costs were only ever unfairly
> compared against COSTS_N_INSNS (2).  This would lead to
> default_noce_conversion_profitable_p () rejecting all but the most
> contrived of sequences.
>
> This patch temporarily initializes the original costs by adding up the
> costs for all sets inside the then_bb.

OK, thanks.

Richard

> ---
>  gcc/ifcvt.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index f9d4478ec18..91b54dbea9c 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3404,14 +3404,17 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>     (SET (REG) (REG)) insns suitable for conversion to a series
>     of conditional moves.  Also check that we have more than one set
>     (other routines can handle a single set better than we would), and
> -   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
> +   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> +   through the insns store the sum of their potential costs in COST.  */
>  
>  static bool
> -bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
> +bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
>  {
>    rtx_insn *insn;
>    unsigned count = 0;
>    unsigned param = param_max_rtl_if_conversion_insns;
> +  bool speed_p = optimize_bb_for_speed_p (test_bb);
> +  unsigned potential_cost = 0;
>  
>    FOR_BB_INSNS (test_bb, insn)
>      {
> @@ -3447,9 +3450,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
>        if (!can_conditionally_move_p (GET_MODE (dest)))
>  	return false;
>  
> +      potential_cost += insn_cost (insn, speed_p);
> +
>        count++;
>      }
>  
> +  *cost += potential_cost;
> +
>    /* If we would only put out one conditional move, the other strategies
>       this pass tries are better optimized and will be more appropriate.
>       Some targets want to strictly limit the number of conditional moves
> @@ -3497,11 +3504,24 @@ noce_process_if_block (struct noce_if_info *if_info)
>       to calculate a value for x.
>       ??? For future expansion, further expand the "multiple X" rules.  */
>  
> -  /* First look for multiple SETS.  */
> +  /* First look for multiple SETS.  The original costs already include
> +     a base cost of COSTS_N_INSNS (2): one instruction for the compare
> +     (which we will be needing either way) and one instruction for the
> +     branch.  When comparing costs we want to use the branch instruction
> +     cost and the sets vs. the cmovs generated here.  Therefore subtract
> +     the costs of the compare before checking.
> +     ??? Actually, instead of the branch instruction costs we might want
> +     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
> +
> +  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  unsigned old_cost = if_info->original_cost;
>    if (!else_bb
>        && HAVE_conditional_move
> -      && bb_ok_for_noce_convert_multiple_sets (then_bb))
> +      && bb_ok_for_noce_convert_multiple_sets (then_bb, &potential_cost))
>      {
> +      /* Temporarily set the original costs to what we estimated so
> +	 we can determine if the transformation is worth it.  */
> +      if_info->original_cost = potential_cost;
>        if (noce_convert_multiple_sets (if_info))
>  	{
>  	  if (dump_file && if_info->transform_name)
> @@ -3509,6 +3529,9 @@ noce_process_if_block (struct noce_if_info *if_info)
>  		     if_info->transform_name);
>  	  return TRUE;
>  	}
> +
> +      /* Restore the original costs.  */
> +      if_info->original_cost = old_cost;
>      }
>  
>    bool speed_p = optimize_bb_for_speed_p (test_bb);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  2021-07-22 12:07     ` Robin Dapp
@ 2021-07-26 19:31       ` Richard Sandiford
  2021-07-27 20:49         ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-07-26 19:31 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches


> Before, we had a loop with two iterations that tried "emit_cmov (cond, 
> op2, op3)"; op2 and op3 could be (or even already were) swapped at that 
> point.  If this did not succeed we would reverse the condition and swap 
> op2 and op3 in-place and try again.  I found that a bit cumbersome and 
> intended to make this explicit, but it's still kind of involved, 
> particularly since cond may come in reversed and we additionally swap 
> op2 and op3 and all the way back again.
>
> I remember from the first iteration of these patches that this (double 
> revert) was needed for exactly one test case in the test suite on x86. 
> When re-running it today I could not reproduce it anymore, though.

Hmm, OK.  Doesn't expanding both versions up-front create the same kind of
problem that the patch is fixing, in that we expand (and therefore cost)
both the reversed and unreversed comparison?  Also…

Robin Dapp <rdapp@linux.ibm.com> writes:
> @@ -4782,17 +4784,30 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>    if (cmode == VOIDmode)
>      cmode = GET_MODE (op0);
>  
> -  enum rtx_code orig_code = code;
> +  /* If the first condition operand is constant and the second is not, swap
> +     them.  In that case we also need to reverse the comparison.
> +     If both are non-constant, it is possible that the conditional move
> +     will not expand with operands in this order, e.g. when we are about to
> +     create a min/max expression and the operands do not match the condition.
> +     In that case we also need the reversed condition code and comparison.  */

…for min/max, I would have expected this swap to create the canonical
operand order for the min and max too.  What causes it to be rejected?

> +
> +  rtx rev_comparison = NULL_RTX;
>    bool swapped = false;
> -  if (swap_commutative_operands_p (op2, op3)
> -      && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
> -          != UNKNOWN))
> +
> +  code = unsignedp ? unsigned_condition (code) : code;
> +  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
> +
> +  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
> +      != UNKNOWN)
>      {
> -      std::swap (op2, op3);
> -      code = reversed;
> -      swapped = true;
> +      reversed = unsignedp ? unsigned_condition (reversed) : reversed;

When is this needed?  I'd have expected the reversed form of an unsigned
code to be naturally unsigned.

Thanks,
Richard

> +      rev_comparison = simplify_gen_relational (reversed, VOIDmode, cmode,
> +						op0, op1);
>      }
>  
> +  if (swap_commutative_operands_p (op2, op3) && reversed != UNKNOWN)
> +    swapped = true;
> +
>    if (mode == VOIDmode)
>      mode = GET_MODE (op2);
>  
> @@ -4804,58 +4819,99 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>    if (!target)
>      target = gen_reg_rtx (mode);
>  
> -  for (int pass = 0; ; pass++)
> +  if (comparison && COMPARISON_P (comparison))
> +    prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> +		      GET_CODE (comparison), NULL_RTX,
> +		      unsignedp, OPTAB_WIDEN, &comparison, &cmode);
> +  else
> +    return NULL_RTX;
> +
> +  if (rev_comparison && COMPARISON_P (rev_comparison))
> +    prepare_cmp_insn (XEXP (rev_comparison, 0), XEXP (rev_comparison, 1),
> +		      GET_CODE (rev_comparison), NULL_RTX,
> +		      unsignedp, OPTAB_WIDEN, &rev_comparison, &cmode);
> +
> +  if (!swapped)
> +    return emit_conditional_move (target, comparison, rev_comparison,
> +				  op2, op3, mode);
> +  else
> +    return emit_conditional_move (target, rev_comparison, comparison,
> +				  op3, op2, mode);
> +}
> +
> +/* Helper function for emitting a conditional move.  Given a COMPARISON
> +   and a reversed REV_COMPARISON it will try to expand a conditional move
> +   with COMPARISON first and try with REV_COMPARISON if that fails.  */
> +
> +rtx
> +emit_conditional_move (rtx target, rtx comparison, rtx rev_comparison,
> +		       rtx op2, rtx op3, machine_mode mode)
> +{
> +
> +  rtx res = emit_conditional_move (target, comparison, op2, op3, mode);
> +
> +  if (res != NULL_RTX)
> +    return res;
> +
> +  return emit_conditional_move (target, rev_comparison, op3, op2, mode);
> +}
> +
> +/* Helper for emitting a conditional move.  */
> +
> +static rtx
> +emit_conditional_move (rtx target, rtx comparison,
> +		       rtx op2, rtx op3, machine_mode mode)
> +{
> +  rtx_insn *last;
> +  enum insn_code icode;
> +
> +  if (comparison == NULL_RTX || !COMPARISON_P (comparison))
> +    return NULL_RTX;
> +
> +  /* If the two source operands are identical, that's just a move.  */
> +  if (rtx_equal_p (op2, op3))
>      {
> -      code = unsignedp ? unsigned_condition (code) : code;
> -      comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
> +      if (!target)
> +	target = gen_reg_rtx (mode);
>  
> -      /* We can get const0_rtx or const_true_rtx in some circumstances.  Just
> -	 punt and let the caller figure out how best to deal with this
> -	 situation.  */
> -      if (COMPARISON_P (comparison))
> -	{
> -	  saved_pending_stack_adjust save;
> -	  save_pending_stack_adjust (&save);
> -	  last = get_last_insn ();
> -	  do_pending_stack_adjust ();
> -	  machine_mode cmpmode = cmode;
> -	  prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> -			    GET_CODE (comparison), NULL_RTX, unsignedp,
> -			    OPTAB_WIDEN, &comparison, &cmpmode);
> -	  if (comparison)
> -	    {
> -	      class expand_operand ops[4];
> +      emit_move_insn (target, op3);
> +      return target;
> +    }
>  
> -	      create_output_operand (&ops[0], target, mode);
> -	      create_fixed_operand (&ops[1], comparison);
> -	      create_input_operand (&ops[2], op2, mode);
> -	      create_input_operand (&ops[3], op3, mode);
> -	      if (maybe_expand_insn (icode, 4, ops))
> -		{
> -		  if (ops[0].value != target)
> -		    convert_move (target, ops[0].value, false);
> -		  return target;
> -		}
> -	    }
> -	  delete_insns_since (last);
> -	  restore_pending_stack_adjust (&save);
> -	}
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (op2);
>  
> -      if (pass == 1)
> -	return NULL_RTX;
> +  icode = direct_optab_handler (movcc_optab, mode);
>  
> -      /* If the preferred op2/op3 order is not usable, retry with other
> -	 operand order, perhaps it will expand successfully.  */
> -      if (swapped)
> -	code = orig_code;
> -      else if ((reversed = reversed_comparison_code_parts (orig_code, op0, op1,
> -							   NULL))
> -	       != UNKNOWN)
> -	code = reversed;
> -      else
> -	return NULL_RTX;
> -      std::swap (op2, op3);
> +  if (icode == CODE_FOR_nothing)
> +    return NULL_RTX;
> +
> +  if (!target)
> +    target = gen_reg_rtx (mode);
> +
> +  saved_pending_stack_adjust save;
> +  save_pending_stack_adjust (&save);
> +  last = get_last_insn ();
> +  do_pending_stack_adjust ();
> +
> +  class expand_operand ops[4];
> +
> +  create_output_operand (&ops[0], target, mode);
> +  create_fixed_operand (&ops[1], comparison);
> +  create_input_operand (&ops[2], op2, mode);
> +  create_input_operand (&ops[3], op3, mode);
> +
> +  if (maybe_expand_insn (icode, 4, ops))
> +    {
> +      if (ops[0].value != target)
> +	convert_move (target, ops[0].value, false);
> +      return target;
>      }
> +
> +  delete_insns_since (last);
> +  restore_pending_stack_adjust (&save);
> +
> +  return NULL_RTX;
>  }
>  
>  
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 3bbceff92d9..f853b93f37f 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -281,6 +281,7 @@ extern void emit_indirect_jump (rtx);
>  /* Emit a conditional move operation.  */
>  rtx emit_conditional_move (rtx, enum rtx_code, rtx, rtx, machine_mode,
>  			   rtx, rtx, machine_mode, int);
> +rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
>  
>  /* Emit a conditional negate or bitwise complement operation.  */
>  rtx emit_conditional_neg_or_complement (rtx, rtx_code, machine_mode, rtx,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  2021-07-26 19:31       ` Richard Sandiford
@ 2021-07-27 20:49         ` Robin Dapp
  2021-08-06 12:14           ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-07-27 20:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

> Hmm, OK.  Doesn't expanding both versions up-front create the same kind of
> problem that the patch is fixing, in that we expand (and therefore cost)
> both the reversed and unreversed comparison?  Also…
> 
[..]
> 
> …for min/max, I would have expected this swap to create the canonical
> operand order for the min and max too.  What causes it to be rejected?
> 

We should not be expanding two comparisons but only emit (and cost) the 
reversed comparison if expanding the non-reversed one failed.

Regarding the reversal, I checked again - the commit introducing the 
op2/op3 swap is g:deed3da9af697ecf073aea855ecce2d22d85ef71, the 
corresponding test case is gcc.target/i386/pr70465-2.c.  It inlines one 
long double ternary operation into another, probably causing the swap to 
be needed in the first place; this is for a single set, not for multiple 
sets, mind you.  The situation doesn't occur with double.
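
For illustration only (this is not the literal test case, just the rough
shape after inlining):

  long double
  foo (long double a, long double b, long double c)
  {
    long double t = a < b ? a : b;
    return t < c ? t : c;
  }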

>> +
>> +  rtx rev_comparison = NULL_RTX;
>>     bool swapped = false;
>> -  if (swap_commutative_operands_p (op2, op3)
>> -      && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
>> -          != UNKNOWN))
>> +
>> +  code = unsignedp ? unsigned_condition (code) : code;
>> +  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
>> +
>> +  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
>> +      != UNKNOWN)
>>       {
>> -      std::swap (op2, op3);
>> -      code = reversed;
>> -      swapped = true;
>> +      reversed = unsignedp ? unsigned_condition (reversed) : reversed;
> 
> When is this needed?  I'd have expected the reversed form of an unsigned
> code to be naturally unsigned.

This was also introduced by the commit above, probably just repeating 
what was done for the non-reversed comparison.

Regards
  Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  2021-07-27 20:49         ` Robin Dapp
@ 2021-08-06 12:14           ` Richard Sandiford
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Sandiford @ 2021-08-06 12:14 UTC (permalink / raw)
  To: Robin Dapp via Gcc-patches

Sorry for the slow reply.

Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Hmm, OK.  Doesn't expanding both versions up-front create the same kind of
>> problem that the patch is fixing, in that we expand (and therefore cost)
>> both the reversed and unreversed comparison?  Also…
>> 
> [..]
>> 
>> …for min/max, I would have expected this swap to create the canonical
>> operand order for the min and max too.  What causes it to be rejected?
>> 
>
> We should not be expanding two comparisons but only emit (and cost) the 
> reversed comparison if expanding the non-reversed one failed.

The (potential) problem is that prepare_cmp_insn can itself emit
instructions.  With the current code we rewind any prepare_cmp_insn
that isn't needed, whereas with the new code we might keep both.

This also means that prepare_cmp_insn calls need to stay inside the:

  saved_pending_stack_adjust save;
  save_pending_stack_adjust (&save);
  last = get_last_insn ();
  do_pending_stack_adjust ();

  …

  delete_insns_since (last);
  restore_pending_stack_adjust (&save);

block.

> Regarding the reversal, I checked again - the commit introducing the 
> op2/op3 swap is g:deed3da9af697ecf073aea855ecce2d22d85ef71, the 
> corresponding test case is gcc.target/i386/pr70465-2.c.  It inlines one 
> long double ternary operation into another, probably causing  not for 
> multiple sets, mind you.  The situation doesn't occur with double.

OK, so going back to that revision and using the original SciMark test
case, we first try:

  (lt (reg/v:DF 272 [ ab ])
      (reg/v:DF 271 [ t ]))
  (reg/v:SI 227 [ jp ])
  (subreg:SI (reg:DI 346 [ ivtmp.59 ]) 0)

but i386 doesn't provide a native cbranchdf4 for lt and so the
prepare_cmp_insn fails.  Interesting that we use cbranch<mode>4
as the test for what conditional moves should accept, but I guess
that isn't something to change now.

So the key piece of information that I didn't realise before is
that it was the prepare_cmp_insn that failed, not the mov<mode>cc
expander.  I think we can accomodate that in the new scheme
by doing:

  if (rev_comparison && COMPARISON_P (rev_comparison))
    prepare_cmp_insn (XEXP (rev_comparison, 0), XEXP (rev_comparison, 1),
		      GET_CODE (rev_comparison), NULL_RTX,
		      unsignedp, OPTAB_WIDEN, &rev_comparison, &cmode);

first and then making:

  if (comparison && COMPARISON_P (comparison))
    prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
		      GET_CODE (comparison), NULL_RTX,
		      unsignedp, OPTAB_WIDEN, &comparison, &cmode);

conditional on !rev_comparison.  But maybe the above makes
that moot.
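
Concretely, and ignoring the save/restore point above for a moment, the
reordering would look something like this (untested sketch):

  /* Prepare the reversed comparison first; only fall back to the
     unreversed one if that failed.  */
  if (rev_comparison && COMPARISON_P (rev_comparison))
    prepare_cmp_insn (XEXP (rev_comparison, 0), XEXP (rev_comparison, 1),
		      GET_CODE (rev_comparison), NULL_RTX,
		      unsignedp, OPTAB_WIDEN, &rev_comparison, &cmode);

  if (!rev_comparison && comparison && COMPARISON_P (comparison))
    prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
		      GET_CODE (comparison), NULL_RTX,
		      unsignedp, OPTAB_WIDEN, &comparison, &cmode);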

>>> +
>>> +  rtx rev_comparison = NULL_RTX;
>>>     bool swapped = false;
>>> -  if (swap_commutative_operands_p (op2, op3)
>>> -      && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
>>> -          != UNKNOWN))
>>> +
>>> +  code = unsignedp ? unsigned_condition (code) : code;
>>> +  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
>>> +
>>> +  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
>>> +      != UNKNOWN)
>>>       {
>>> -      std::swap (op2, op3);
>>> -      code = reversed;
>>> -      swapped = true;
>>> +      reversed = unsignedp ? unsigned_condition (reversed) : reversed;
>> 
>> When is this needed?  I'd have expected the reversed form of an unsigned
>> code to be naturally unsigned.
>
> This was also introduced by the commit above, probably just repeating 
> what was done for the non-reversed comparison.

Yeah, but in the original code, the first reversed_comparison_code_parts
happens outside the loop, before the first unsigned_condition (which
happens inside the loop).  In the new code, the unsigned_condition
happens first, before we try reversing it.

IMO the new order makes more sense than the old one.  But it means that
reversed_comparison_code_parts always sees a comparison of the right
signedness, so we shouldn't need to adjust the result.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-07-26 19:08       ` Richard Sandiford
@ 2021-09-15  8:39         ` Robin Dapp
  2021-10-14  8:45           ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-09-15  8:39 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

Hi Richard,

> Don't we still need this code (without the REG_DEAD handling) for the
> case in which…
> 
>> +      /* As we are transforming
>> +	 if (x > y)
>> +	   {
>> +	     a = b;
>> +	     c = d;
>> +	   }
>> +	 into
>> +	   a = (x > y) ...
>> +	   c = (x > y) ...
>> +
>> +	 we potentially check x > y before every set.
>> +	 Even though the check might be removed by subsequent passes, this means
>> +	 that we cannot transform
>> +	   if (x > y)
>> +	     {
>> +	       x = y;
>> +	       ...
>> +	     }
>> +	 into
>> +	   x = (x > y) ...
>> +	   ...
>> +	 since this would invalidate x and the following to-be-removed checks.
>> +	 Therefore we introduce a temporary every time we are about to
>> +	 overwrite a variable used in the check.  Costing of a sequence with
>> +	 these is going to be inaccurate so only use temporaries when
>> +	 needed.  */
>> +      if (reg_overlap_mentioned_p (target, cond))
>> +	temp = gen_reg_rtx (GET_MODE (target));
> 
> …this code triggers?  I don't see otherwise how later uses of x would
> pick up “temp” instead of the original target.  E.g. suppose we had:
> 
>      if (x > y)
>        {
>          x = …;
>          z = x; // x does not die here
>        }
> 
> Without the loop, it looks like z would pick up the old value of x
> (used in the comparison) instead of the new one.

getting back to this now.  I re-added handling of the situation you 
mentioned (even though I didn't manage to trigger it myself).

Regards
  Robin

[-- Attachment #2: check-need-cmovs.diff --]
[-- Type: text/x-patch, Size: 7235 bytes --]

commit 2d909ee93ee1eb0f7474ed57581713367c22ba6c
Author: Robin Dapp <rdapp@linux.ibm.com>
Date:   Thu Jun 24 16:40:04 2021 +0200

    ifcvt: Check if cmovs are needed.
    
    When if-converting multiple SETs and we encounter a swap-style idiom
    
      if (a > b)
        {
          tmp = c;   // [1]
          c = d;
          d = tmp;
        }
    
    ifcvt should not generate a conditional move for the instruction at
    [1].
    
    In order to achieve that, this patch goes through all relevant SETs
    and marks the relevant instructions.  This helps to evaluate costs.
    
    On top, only generate temporaries if the current cmov is going to
    overwrite one of the comparands of the initial compare.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79..f1448667732 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -98,6 +98,8 @@ static int dead_or_predicable (basic_block, basic_block, basic_block,
 			       edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void need_cmov_or_rewire (basic_block, hash_set<rtx_insn *> *,
+				 hash_map<rtx_insn *, int> *);
 \f
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3203,6 +3205,11 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   auto_vec<rtx_insn *> unmodified_insns;
   int count = 0;
 
+  hash_set<rtx_insn *> need_no_cmov;
+  hash_map<rtx_insn *, int> rewired_src;
+
+  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
+
   FOR_BB_INSNS (then_bb, insn)
     {
       /* Skip over non-insns.  */
@@ -3213,26 +3220,47 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
       gcc_checking_assert (set);
 
       rtx target = SET_DEST (set);
-      rtx temp = gen_reg_rtx (GET_MODE (target));
-      rtx new_val = SET_SRC (set);
+      rtx temp;
+
+      int *ii = rewired_src.get (insn);
+      rtx new_val = ii == NULL ? SET_SRC (set) : temporaries[*ii];
       rtx old_val = target;
 
-      /* If we were supposed to read from an earlier write in this block,
-	 we've changed the register allocation.  Rewire the read.  While
-	 we are looking, also try to catch a swap idiom.  */
-      for (int i = count - 1; i >= 0; --i)
-	if (reg_overlap_mentioned_p (new_val, targets[i]))
-	  {
-	    /* Catch a "swap" style idiom.  */
-	    if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
-	      /* The write to targets[i] is only live until the read
-		 here.  As the condition codes match, we can propagate
-		 the set to here.  */
-	      new_val = SET_SRC (single_set (unmodified_insns[i]));
-	    else
-	      new_val = temporaries[i];
-	    break;
-	  }
+      /* As we are transforming
+	 if (x > y)
+	   {
+	     a = b;
+	     c = d;
+	   }
+	 into
+	   a = (x > y) ...
+	   c = (x > y) ...
+
+	 we potentially check x > y before every set.
+	 Even though the check might be removed by subsequent passes, this means
+	 that we cannot transform
+	   if (x > y)
+	     {
+	       x = y;
+	       ...
+	     }
+	 into
+	   x = (x > y) ...
+	   ...
+	 since this would invalidate x and the following to-be-removed checks.
+	 Therefore we introduce a temporary every time we are about to
+	 overwrite a variable used in the check.  Costing of a sequence with
+	 these is going to be inaccurate so only use temporaries when
+	 needed.  */
+      if (reg_overlap_mentioned_p (target, cond))
+	temp = gen_reg_rtx (GET_MODE (target));
+      else
+	temp = target;
+
+      /* We have identified swap-style idioms in need_cmov_or_rewire.  A normal
+	 set will need to be a cmov while the first instruction of a swap-style
+	 idiom can be a regular move.  This helps with costing.  */
+      bool need_cmov = !need_no_cmov.contains (insn);
 
       /* If we had a non-canonical conditional jump (i.e. one where
 	 the fallthrough is to the "else" case) we need to reverse
@@ -3275,16 +3303,29 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
 	}
 
-      /* Actually emit the conditional move.  */
-      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
-				       x, y, new_val, old_val);
+      rtx temp_dest = NULL_RTX;
 
-      /* If we failed to expand the conditional move, drop out and don't
-	 try to continue.  */
-      if (temp_dest == NULL_RTX)
+      if (need_cmov)
 	{
-	  end_sequence ();
-	  return FALSE;
+	  /* Actually emit the conditional move.  */
+	  temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+					   x, y, new_val, old_val);
+
+	  /* If we failed to expand the conditional move, drop out and don't
+	     try to continue.  */
+	  if (temp_dest == NULL_RTX)
+	    {
+	      end_sequence ();
+	      return FALSE;
+	    }
+	}
+      else
+	{
+	  if (if_info->then_else_reversed)
+	    noce_emit_move_insn (temp, old_val);
+	  else
+	    noce_emit_move_insn (temp, new_val);
+	  temp_dest = temp;
 	}
 
       /* Bookkeeping.  */
@@ -3808,6 +3849,87 @@ check_cond_move_block (basic_block bb,
   return TRUE;
 }
 
+/* Find local swap-style idioms in BB and mark the first insn (1)
+   that is only a temporary as not needing a conditional move as
+   it is going to be dead afterwards anyway.
+
+     (1) int tmp = a;
+	 a = b;
+	 b = tmp;
+
+	 ifcvt
+	 -->
+
+	 tmp = a;
+	 a = cond ? b : a_old;
+	 b = cond ? tmp : b_old;
+
+    Additionally, store the index of insns like (2) when a subsequent
+    SET reads from their destination.
+
+    (2) int c = a;
+	int d = c;
+
+	ifcvt
+	-->
+
+	c = cond ? a : c_old;
+	d = cond ? d : c;     // Need to use c rather than c_old here.
+*/
+
+static void
+need_cmov_or_rewire (basic_block bb,
+		     hash_set<rtx_insn *> *need_no_cmov,
+		     hash_map<rtx_insn *, int> *rewired_src)
+{
+  rtx_insn *insn;
+  int count = 0;
+  auto_vec<rtx_insn *> insns;
+  auto_vec<rtx> dests;
+
+  /* Iterate over all SETs, storing the destinations
+     in DEST.
+     - If we hit a SET that reads from a destination
+       that we have seen before and the corresponding register
+       is dead afterwards, the register does not need to be
+       moved conditionally.
+     - If we encounter a previously changed register,
+       rewire the read to the original source.  */
+  FOR_BB_INSNS (bb, insn)
+    {
+      rtx set, src, dest;
+
+      if (!active_insn_p (insn))
+	continue;
+
+      set = single_set (insn);
+      if (set == NULL_RTX)
+	continue;
+
+      src = SET_SRC (set);
+      dest = SET_DEST (set);
+
+      /* Check if the current SET's source is the same
+	 as any previously seen destination.
+	 This is quadratic but the number of insns in BB
+	 is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS.  */
+      if (REG_P (src))
+	for (int i = count - 1; i >= 0; --i)
+	  if (reg_overlap_mentioned_p (src, dests[i]))
+	    {
+	      if (find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
+		need_no_cmov->add (insns[i]);
+	      else
+		rewired_src->put (insn, i);
+	    }
+
+      insns.safe_push (insn);
+      dests.safe_push (dest);
+
+      count++;
+    }
+}
+
 /* Given a basic block BB suitable for conditional move conversion,
    a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing
    the register values depending on COND, emit the insns in the block as

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-09-15  8:39         ` Robin Dapp
@ 2021-10-14  8:45           ` Richard Sandiford
  2021-10-14 14:20             ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-10-14  8:45 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Hi Robin,

Thanks for the update and sorry for the late response.

Robin Dapp <rdapp@linux.ibm.com> writes:
> Hi Richard,
>
>> Don't we still need this code (without the REG_DEAD handling) for the
>> case in which…
>> 
>>> +      /* As we are transforming
>>> +	 if (x > y)
>>> +	   {
>>> +	     a = b;
>>> +	     c = d;
>>> +	   }
>>> +	 into
>>> +	   a = (x > y) ...
>>> +	   c = (x > y) ...
>>> +
>>> +	 we potentially check x > y before every set.
>>> +	 Even though the check might be removed by subsequent passes, this means
>>> +	 that we cannot transform
>>> +	   if (x > y)
>>> +	     {
>>> +	       x = y;
>>> +	       ...
>>> +	     }
>>> +	 into
>>> +	   x = (x > y) ...
>>> +	   ...
>>> +	 since this would invalidate x and the following to-be-removed checks.
>>> +	 Therefore we introduce a temporary every time we are about to
>>> +	 overwrite a variable used in the check.  Costing of a sequence with
>>> +	 these is going to be inaccurate so only use temporaries when
>>> +	 needed.  */
>>> +      if (reg_overlap_mentioned_p (target, cond))
>>> +	temp = gen_reg_rtx (GET_MODE (target));
>> 
>> …this code triggers?  I don't see otherwise how later uses of x would
>> pick up “temp” instead of the original target.  E.g. suppose we had:
>> 
>>      if (x > y)
>>        {
>>          x = …;
>>          z = x; // x does not die here
>>        }
>> 
>> Without the loop, it looks like z would pick up the old value of x
>> (used in the comparison) instead of the new one.
>
> getting back to this now.  I re-added handling of the situation you 
> mentioned (even though I didn't manage to trigger it myself).

Yeah, the only reliable way to test for this might be an rtx test
(see testsuite/gcc.dg/rtl for some examples).  A test isn't necessary
though.
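
For the record, the C-level shape I have in mind is roughly the one below,
although earlier passes may well simplify it before ifcvt sees it, which
is why an rtx test would be the more dependable option:

  int
  f (int x, int y)
  {
    int z = 0;
    if (x > y)
      {
	x = y;
	z = x;	/* x does not die here; this later use must see the new value.  */
      }
    return x + z;
  }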

My only remaining concern is that bb_ok_for_noce_convert_multiple_sets
allows subreg sources:

      if (!(REG_P (src)
	   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
	       && subreg_lowpart_p (src))))
	return false;

These subregs could also require rewrites.  It looks like the original
code checks for overlaps correctly, but doesn't necessarily deal with
a hit correctly.  So this is at least partly pre-existing.
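
A source-level shape that can produce such a subreg source is, roughly
(assuming LP64; whether it survives to ifcvt in this exact form depends
on the target and earlier passes):

  int
  g (long a, long b)
  {
    long x = 0;
    int z = 0;
    if (a > b)
      {
	x = b;
	z = x;	/* the source here tends to be (subreg:SI (reg:DI x) 0)  */
      }
    return z + (int) x;
  }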

I think the way of handling that is as follows:

> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 017944f4f79..f1448667732 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -98,6 +98,8 @@ static int dead_or_predicable (basic_block, basic_block, basic_block,
>  			       edge, int);
>  static void noce_emit_move_insn (rtx, rtx);
>  static rtx_insn *block_has_only_trap (basic_block);
> +static void need_cmov_or_rewire (basic_block, hash_set<rtx_insn *> *,
> +				 hash_map<rtx_insn *, int> *);
>  \f
>  /* Count the number of non-jump active insns in BB.  */
>  
> @@ -3203,6 +3205,11 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>    auto_vec<rtx_insn *> unmodified_insns;
>    int count = 0;
>  
> +  hash_set<rtx_insn *> need_no_cmov;
> +  hash_map<rtx_insn *, int> rewired_src;
> +
> +  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
> +
>    FOR_BB_INSNS (then_bb, insn)
>      {
>        /* Skip over non-insns.  */
> @@ -3213,26 +3220,47 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>        gcc_checking_assert (set);
>  
>        rtx target = SET_DEST (set);
> -      rtx temp = gen_reg_rtx (GET_MODE (target));
> -      rtx new_val = SET_SRC (set);
> +      rtx temp;
> +
> +      int *ii = rewired_src.get (insn);
> +      rtx new_val = ii == NULL ? SET_SRC (set) : temporaries[*ii];

(1) here use something like:

    rtx new_val = SET_SRC (set);
    if (int *ii = rewired_src.get (insn))
      new_val = simplify_replace_rtx (new_val, targets[*ii],
				      temporaries[*ii]);

>        rtx old_val = target;
>  
> -      /* If we were supposed to read from an earlier write in this block,
> -	 we've changed the register allocation.  Rewire the read.  While
> -	 we are looking, also try to catch a swap idiom.  */
> -      for (int i = count - 1; i >= 0; --i)
> -	if (reg_overlap_mentioned_p (new_val, targets[i]))
> -	  {
> -	    /* Catch a "swap" style idiom.  */
> -	    if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> -	      /* The write to targets[i] is only live until the read
> -		 here.  As the condition codes match, we can propagate
> -		 the set to here.  */
> -	      new_val = SET_SRC (single_set (unmodified_insns[i]));
> -	    else
> -	      new_val = temporaries[i];
> -	    break;
> -	  }
> +      /* As we are transforming
> +	 if (x > y)
> +	   {
> +	     a = b;
> +	     c = d;
> +	   }
> +	 into
> +	   a = (x > y) ...
> +	   c = (x > y) ...
> +
> +	 we potentially check x > y before every set.
> +	 Even though the check might be removed by subsequent passes, this means
> +	 that we cannot transform
> +	   if (x > y)
> +	     {
> +	       x = y;
> +	       ...
> +	     }
> +	 into
> +	   x = (x > y) ...
> +	   ...
> +	 since this would invalidate x and the following to-be-removed checks.
> +	 Therefore we introduce a temporary every time we are about to
> +	 overwrite a variable used in the check.  Costing of a sequence with
> +	 these is going to be inaccurate so only use temporaries when
> +	 needed.  */
> +      if (reg_overlap_mentioned_p (target, cond))
> +	temp = gen_reg_rtx (GET_MODE (target));
> +      else
> +	temp = target;
> +
> +      /* We have identified swap-style idioms in need_cmov_or_rewire.  A normal
> +	 set will need to be a cmov while the first instruction of a swap-style
> +	 idiom can be a regular move.  This helps with costing.  */
> +      bool need_cmov = !need_no_cmov.contains (insn);
>  
>        /* If we had a non-canonical conditional jump (i.e. one where
>  	 the fallthrough is to the "else" case) we need to reverse
> @@ -3275,16 +3303,29 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>  	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
>  	}
>  
> -      /* Actually emit the conditional move.  */
> -      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
> -				       x, y, new_val, old_val);
> +      rtx temp_dest = NULL_RTX;
>  
> -      /* If we failed to expand the conditional move, drop out and don't
> -	 try to continue.  */
> -      if (temp_dest == NULL_RTX)
> +      if (need_cmov)
>  	{
> -	  end_sequence ();
> -	  return FALSE;
> +	  /* Actually emit the conditional move.  */
> +	  temp_dest = noce_emit_cmove (if_info, temp, cond_code,
> +					   x, y, new_val, old_val);
> +
> +	  /* If we failed to expand the conditional move, drop out and don't
> +	     try to continue.  */
> +	  if (temp_dest == NULL_RTX)
> +	    {
> +	      end_sequence ();
> +	      return FALSE;
> +	    }
> +	}
> +      else
> +	{
> +	  if (if_info->then_else_reversed)
> +	    noce_emit_move_insn (temp, old_val);
> +	  else
> +	    noce_emit_move_insn (temp, new_val);
> +	  temp_dest = temp;
>  	}
>  
>        /* Bookkeeping.  */
> @@ -3808,6 +3849,87 @@ check_cond_move_block (basic_block bb,
>    return TRUE;
>  }
>  
> +/* Find local swap-style idioms in BB and mark the first insn (1)
> +   that is only a temporary as not needing a conditional move as
> +   it is going to be dead afterwards anyway.
> +
> +     (1) int tmp = a;
> +	 a = b;
> +	 b = tmp;
> +
> +	 ifcvt
> +	 -->
> +
> +	 tmp = a;
> +	 a = cond ? b : a_old;
> +	 b = cond ? tmp : b_old;
> +
> +    Additionally, store the index of insns like (2) when a subsequent
> +    SET reads from their destination.
> +
> +    (2) int c = a;
> +	int d = c;
> +
> +	ifcvt
> +	-->
> +
> +	c = cond ? a : c_old;
> +	d = cond ? d : c;     // Need to use c rather than c_old here.
> +*/
> +
> +static void
> +need_cmov_or_rewire (basic_block bb,
> +		     hash_set<rtx_insn *> *need_no_cmov,
> +		     hash_map<rtx_insn *, int> *rewired_src)
> +{
> +  rtx_insn *insn;
> +  int count = 0;
> +  auto_vec<rtx_insn *> insns;
> +  auto_vec<rtx> dests;
> +
> +  /* Iterate over all SETs, storing the destinations
> +     in DEST.
> +     - If we hit a SET that reads from a destination
> +       that we have seen before and the corresponding register
> +       is dead afterwards, the register does not need to be
> +       moved conditionally.
> +     - If we encounter a previously changed register,
> +       rewire the read to the original source.  */
> +  FOR_BB_INSNS (bb, insn)
> +    {
> +      rtx set, src, dest;
> +
> +      if (!active_insn_p (insn))
> +	continue;
> +
> +      set = single_set (insn);
> +      if (set == NULL_RTX)
> +	continue;
> +
> +      src = SET_SRC (set);
> +      dest = SET_DEST (set);
> +
> +      /* Check if the current SET's source is the same
> +	 as any previously seen destination.
> +	 This is quadratic but the number of insns in BB
> +	 is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS.  */

(2) Insert:

    if (SUBREG_P (src))
      src = SUBREG_REG (src);

here.

OK with those changes if that works.  Let me know if they don't —
I'll try to be quicker with the next review.

Thanks,
Richard

> +      if (REG_P (src))
> +	for (int i = count - 1; i >= 0; --i)
> +	  if (reg_overlap_mentioned_p (src, dests[i]))
> +	    {
> +	      if (find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
> +		need_no_cmov->add (insns[i]);
> +	      else
> +		rewired_src->put (insn, i);
> +	    }
> +
> +      insns.safe_push (insn);
> +      dests.safe_push (dest);
> +
> +      count++;
> +    }
> +}
> +
>  /* Given a basic block BB suitable for conditional move conversion,
>     a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing
>     the register values depending on COND, emit the insns in the block as

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-10-14  8:45           ` Richard Sandiford
@ 2021-10-14 14:20             ` Robin Dapp
  2021-10-14 14:32               ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-10-14 14:20 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Hi Richard,

> (2) Insert:
> 
>      if (SUBREG_P (src))
>        src = SUBREG_REG (src);
> 
> here.
> 
> OK with those changes if that works.  Let me know if they don't —
> I'll try to be quicker with the next review.

thank you, this looks good in a first testsuite run on s390.  If you 
have time, would you mind looking at the other outstanding patches of 
this series as well? In case of further comments, which I am sure there 
will be, I could test them all at once afterwards.

Thanks again
  Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-10-14 14:20             ` Robin Dapp
@ 2021-10-14 14:32               ` Richard Sandiford
  2021-10-18 11:40                 ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-10-14 14:32 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Robin Dapp <rdapp@linux.ibm.com> writes:
> Hi Richard,
>
>> (2) Insert:
>> 
>>      if (SUBREG_P (src))
>>        src = SUBREG_REG (src);
>> 
>> here.
>> 
>> OK with those changes if that works.  Let me know if they don't —
>> I'll try to be quicker with the next review.
>
> thank you, this looks good in a first testsuite run on s390.  If you 
> have time, would you mind looking at the other outstanding patches of 
> this series as well? In case of further comments, which I am sure there 
> will be, I could test them all at once afterwards.

Which ones still need review?  I think 2/7 and 3/7 are approved,
but 4/7 was still being discussed.  AFAICT the last message on
that was:

   https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576865.html

We probably need to reach a conclusion on that before 5/7.
6/7 and 7/7 looked to be s390-specific.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-10-14 14:32               ` Richard Sandiford
@ 2021-10-18 11:40                 ` Robin Dapp
  2021-11-03  8:55                   ` Robin Dapp
  2021-11-05 15:33                   ` Richard Sandiford
  0 siblings, 2 replies; 31+ messages in thread
From: Robin Dapp @ 2021-10-18 11:40 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 608 bytes --]

Hi Richard,

after giving it a second thought, and seeing that most of the changes to 
existing code are not strictly necessary anymore, I figured it could be 
easier not to change the current control flow too much, as in the 
attached patch.

The remaining changes are to "outsource" the maybe_expand_insn part and 
to make the emit_conditional_move variant that takes a full comparison 
and rev_comparison externally available.

I suppose straightening out the arguably somewhat baroque parts can be 
deferred to a separate patch.

On s390 this works nicely but I haven't yet done a bootstrap on other archs.

Regards
  Robin

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 7483 bytes --]

commit eb50384ee0cdeeefa61ae89bdbb2875500b7ce60
Author: Robin Dapp <rdapp@linux.ibm.com>
Date:   Wed Nov 27 13:53:40 2019 +0100

    ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
    
    Currently we only ever call emit_conditional_move with the comparison
    (as well as its comparands) we got from the jump.  Thus, backends are
    going to emit a CC comparison for every conditional move that is being
    generated instead of re-using the existing CC.
    This, combined with emitting temporaries for each conditional move,
    causes sky-high costs for conditional moves.
    
    This patch allows re-using a CC so the costing situation is improved a
    bit.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 6ae883cbdd4..f7765e60548 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -772,7 +772,7 @@ static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
-			    rtx, rtx, rtx);
+			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static int noce_try_cmove (struct noce_if_info *);
 static int noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
@@ -1711,7 +1711,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 static rtx
 noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
-		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
+		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
+		 rtx rev_cc_cmp)
 {
   rtx target ATTRIBUTE_UNUSED;
   int unsignedp ATTRIBUTE_UNUSED;
@@ -1743,23 +1744,30 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
       end_sequence ();
     }
 
-  /* Don't even try if the comparison operands are weird
-     except that the target supports cbranchcc4.  */
-  if (! general_operand (cmp_a, GET_MODE (cmp_a))
-      || ! general_operand (cmp_b, GET_MODE (cmp_b)))
-    {
-      if (!have_cbranchcc4
-	  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
-	  || cmp_b != const0_rtx)
-	return NULL_RTX;
-    }
-
   unsignedp = (code == LTU || code == GEU
 	       || code == LEU || code == GTU);
 
-  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
-				  vtrue, vfalse, GET_MODE (x),
-				  unsignedp);
+  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
+    target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
+				    vtrue, vfalse, GET_MODE (x));
+  else
+    {
+      /* Don't even try if the comparison operands are weird
+	 except that the target supports cbranchcc4.  */
+      if (! general_operand (cmp_a, GET_MODE (cmp_a))
+	  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
+	{
+	  if (!have_cbranchcc4
+	      || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
+	      || cmp_b != const0_rtx)
+	    return NULL_RTX;
+	}
+
+      target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
+				      vtrue, vfalse, GET_MODE (x),
+				      unsignedp);
+    }
+
   if (target)
     return target;
 
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 019bbb62882..25eecf29ed8 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -52,6 +52,9 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
 static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
 static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
 
+static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
+rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
+
 /* Debug facility for use in GDB.  */
 void debug_optab_libfuncs (void);
 \f
@@ -4875,6 +4878,7 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   /* get_condition will prefer to generate LT and GT even if the old
      comparison was against zero, so undo that canonicalization here since
      comparisons against zero are cheaper.  */
+
   if (code == LT && op1 == const1_rtx)
     code = LE, op1 = const0_rtx;
   else if (code == GT && op1 == constm1_rtx)
@@ -4925,18 +4929,10 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
 			    OPTAB_WIDEN, &comparison, &cmpmode);
 	  if (comparison)
 	    {
-	      class expand_operand ops[4];
-
-	      create_output_operand (&ops[0], target, mode);
-	      create_fixed_operand (&ops[1], comparison);
-	      create_input_operand (&ops[2], op2, mode);
-	      create_input_operand (&ops[3], op3, mode);
-	      if (maybe_expand_insn (icode, 4, ops))
-		{
-		  if (ops[0].value != target)
-		    convert_move (target, ops[0].value, false);
-		  return target;
-		}
+	       rtx res = emit_conditional_move (target, comparison,
+						op2, op3, mode);
+	       if (res != NULL_RTX)
+		 return res;
 	    }
 	  delete_insns_since (last);
 	  restore_pending_stack_adjust (&save);
@@ -4959,6 +4955,73 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
     }
 }
 
+/* Helper function that, in addition to COMPARISON, also tries
+   the reversed REV_COMPARISON with swapped OP2 and OP3.  As opposed
+   to when we pass the specific constituents of a comparison, no
+   additional insns are emitted for it.  It might still be necessary
+   to emit more than one insn for the final conditional move, though.  */
+
+rtx
+emit_conditional_move (rtx target, rtx comparison, rtx rev_comparison,
+		       rtx op2, rtx op3, machine_mode mode)
+{
+  rtx res = emit_conditional_move (target, comparison, op2, op3, mode);
+
+  if (res != NULL_RTX)
+    return res;
+
+  return emit_conditional_move (target, rev_comparison, op3, op2, mode);
+}
+
+/* Helper for emitting a conditional move.  */
+
+static rtx
+emit_conditional_move (rtx target, rtx comparison,
+		       rtx op2, rtx op3, machine_mode mode)
+{
+  enum insn_code icode;
+
+  if (comparison == NULL_RTX || !COMPARISON_P (comparison))
+    return NULL_RTX;
+
+  /* If the two source operands are identical, that's just a move.  */
+  if (rtx_equal_p (op2, op3))
+    {
+      if (!target)
+	target = gen_reg_rtx (mode);
+
+      emit_move_insn (target, op3);
+      return target;
+    }
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op2);
+
+  icode = direct_optab_handler (movcc_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return NULL_RTX;
+
+  if (!target)
+    target = gen_reg_rtx (mode);
+
+  class expand_operand ops[4];
+
+  create_output_operand (&ops[0], target, mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_input_operand (&ops[2], op2, mode);
+  create_input_operand (&ops[3], op3, mode);
+
+  if (maybe_expand_insn (icode, 4, ops))
+    {
+      if (ops[0].value != target)
+	convert_move (target, ops[0].value, false);
+      return target;
+    }
+
+  return NULL_RTX;
+}
+
 
 /* Emit a conditional negate or bitwise complement using the
    negcc or notcc optabs if available.  Return NULL_RTX if such operations
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 3bbceff92d9..f853b93f37f 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -281,6 +281,7 @@ extern void emit_indirect_jump (rtx);
 /* Emit a conditional move operation.  */
 rtx emit_conditional_move (rtx, enum rtx_code, rtx, rtx, machine_mode,
 			   rtx, rtx, machine_mode, int);
+rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
 
 /* Emit a conditional negate or bitwise complement operation.  */
 rtx emit_conditional_neg_or_complement (rtx, rtx_code, machine_mode, rtx,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-10-18 11:40                 ` Robin Dapp
@ 2021-11-03  8:55                   ` Robin Dapp
  2021-11-05 15:33                   ` Richard Sandiford
  1 sibling, 0 replies; 31+ messages in thread
From: Robin Dapp @ 2021-11-03  8:55 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Ping :)

Not wanting to pester, but I had hoped to still get this into stage 1
as it helps performance on s390 in some cases.
Since I expect there will be some more review rounds for the remaining
patches, would I need to open a PR in order to be able to "officially"
commit this in a later stage?  That could relax the time constraints.

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-10-18 11:40                 ` Robin Dapp
  2021-11-03  8:55                   ` Robin Dapp
@ 2021-11-05 15:33                   ` Richard Sandiford
  2021-11-12 13:00                     ` Robin Dapp
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2021-11-05 15:33 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

Robin Dapp <rdapp@linux.ibm.com> writes:
> Hi Richard,
>
> after giving it a second thought, and seeing that most of the changes to 
> existing code are not strictly necessary anymore, I figured it could be 
> easier not changing the current control flow too much like in the 
> attached patch.
>
> The changes remaining are to "outsource" the maybe_expand_insn part and 
> making the emit_conditional_move with full comparison and rev_comparsion 
> externally available.
>
> I suppose straightening of the arguably somewhat baroque parts, we can 
> defer to a separate patch.
>
> On s390 this works nicely but I haven't yet done a bootstrap on other archs.
>
> Regards
>   Robin
>
> commit eb50384ee0cdeeefa61ae89bdbb2875500b7ce60
> Author: Robin Dapp <rdapp@linux.ibm.com>
> Date:   Wed Nov 27 13:53:40 2019 +0100
>
>     ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
>     
>     Currently we only ever call emit_conditional_move with the comparison
>     (as well as its comparands) we got from the jump.  Thus, backends are
>     going to emit a CC comparison for every conditional move that is being
>     generated instead of re-using the existing CC.
>     This, combined with emitting temporaries for each conditional move,
>     causes sky-high costs for conditional moves.
>     
>     This patch allows re-using a CC so the costing situation is improved a
>     bit.

Sorry for the slow reply.

> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 6ae883cbdd4..f7765e60548 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -772,7 +772,7 @@ static int noce_try_addcc (struct noce_if_info *);
>  static int noce_try_store_flag_constants (struct noce_if_info *);
>  static int noce_try_store_flag_mask (struct noce_if_info *);
>  static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
> -			    rtx, rtx, rtx);
> +			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
>  static int noce_try_cmove (struct noce_if_info *);
>  static int noce_try_cmove_arith (struct noce_if_info *);
>  static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
> @@ -1711,7 +1711,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
>  
>  static rtx
>  noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
> -		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
> +		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
> +		 rtx rev_cc_cmp)
>  {
>    rtx target ATTRIBUTE_UNUSED;
>    int unsignedp ATTRIBUTE_UNUSED;
> @@ -1743,23 +1744,30 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
>        end_sequence ();
>      }
>  
> -  /* Don't even try if the comparison operands are weird
> -     except that the target supports cbranchcc4.  */
> -  if (! general_operand (cmp_a, GET_MODE (cmp_a))
> -      || ! general_operand (cmp_b, GET_MODE (cmp_b)))
> -    {
> -      if (!have_cbranchcc4
> -	  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
> -	  || cmp_b != const0_rtx)
> -	return NULL_RTX;
> -    }
> -
>    unsignedp = (code == LTU || code == GEU
>  	       || code == LEU || code == GTU);
>  
> -  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
> -				  vtrue, vfalse, GET_MODE (x),
> -				  unsignedp);
> +  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
> +    target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
> +				    vtrue, vfalse, GET_MODE (x));
> +  else
> +    {
> +      /* Don't even try if the comparison operands are weird
> +	 except that the target supports cbranchcc4.  */
> +      if (! general_operand (cmp_a, GET_MODE (cmp_a))
> +	  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
> +	{
> +	  if (!have_cbranchcc4
> +	      || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
> +	      || cmp_b != const0_rtx)
> +	    return NULL_RTX;
> +	}
> +
> +      target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
> +				      vtrue, vfalse, GET_MODE (x),
> +				      unsignedp);
> +    }
> +

It's hard to judge this in isolation because it's not clear when
and how the new arguments are going to be used, but it seems OK
in principle.  Do you still want:

  /* If earliest == jump, try to build the cmove insn directly.
     This is helpful when combine has created some complex condition
     (like for alpha's cmovlbs) that we can't hope to regenerate
     through the normal interface.  */

  if (if_info->cond_earliest == if_info->jump)
    {

to be used when cc_cmp and rev_cc_cmp are nonnull?

>    if (target)
>      return target;
>  
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 019bbb62882..25eecf29ed8 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -52,6 +52,9 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
>  static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
>  static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
>  
> +static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
> +rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);

This is redundant with the header file declaration.

> +
>  /* Debug facility for use in GDB.  */
>  void debug_optab_libfuncs (void);
>  \f
> @@ -4875,6 +4878,7 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>    /* get_condition will prefer to generate LT and GT even if the old
>       comparison was against zero, so undo that canonicalization here since
>       comparisons against zero are cheaper.  */
> +
>    if (code == LT && op1 == const1_rtx)
>      code = LE, op1 = const0_rtx;
>    else if (code == GT && op1 == constm1_rtx)
> @@ -4925,18 +4929,10 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>  			    OPTAB_WIDEN, &comparison, &cmpmode);
>  	  if (comparison)
>  	    {
> -	      class expand_operand ops[4];
> -
> -	      create_output_operand (&ops[0], target, mode);
> -	      create_fixed_operand (&ops[1], comparison);
> -	      create_input_operand (&ops[2], op2, mode);
> -	      create_input_operand (&ops[3], op3, mode);
> -	      if (maybe_expand_insn (icode, 4, ops))
> -		{
> -		  if (ops[0].value != target)
> -		    convert_move (target, ops[0].value, false);
> -		  return target;
> -		}
> +	       rtx res = emit_conditional_move (target, comparison,
> +						op2, op3, mode);
> +	       if (res != NULL_RTX)
> +		 return res;
>  	    }
>  	  delete_insns_since (last);
>  	  restore_pending_stack_adjust (&save);
> @@ -4959,6 +4955,73 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
>      }
>  }
>  
> +/* Helper function that, in addition to COMPARISON, also tries
> +   the reversed REV_COMPARISON with swapped OP2 and OP3.  As opposed
> +   to when we pass the specific constituents of a comparison, no
> +   additional insns are emitted for it.  It might still be necessary
> +   to emit more than one insn for the final conditional move, though.  */
> +
> +rtx
> +emit_conditional_move (rtx target, rtx comparison, rtx rev_comparison,
> +		       rtx op2, rtx op3, machine_mode mode)
> +{
> +  rtx res = emit_conditional_move (target, comparison, op2, op3, mode);
> +
> +  if (res != NULL_RTX)
> +    return res;
> +
> +  return emit_conditional_move (target, rev_comparison, op3, op2, mode);
> +}
> +
> +/* Helper for emitting a conditional move.  */
> +
> +static rtx
> +emit_conditional_move (rtx target, rtx comparison,
> +		       rtx op2, rtx op3, machine_mode mode)

I think it'd be better to call one of these functions something else,
rather than make the interpretation of the third parameter depend on
the total number of parameters.  In the second overload, the comparison
rtx effectively replaces four parameters of the existing
emit_conditional_move, so perhaps that's the one that should remain
emit_conditional_move.  Maybe the first one should be called
emit_conditional_move_with_rev or something.

Part of me wonders if this would be simpler if we created a structure
to describe a comparison and passed that around instead of individual
fields, but I guess it could become a rat hole.

Thanks,
Richard

> +{
> +  enum insn_code icode;
> +
> +  if (comparison == NULL_RTX || !COMPARISON_P (comparison))
> +    return NULL_RTX;
> +
> +  /* If the two source operands are identical, that's just a move.  */
> +  if (rtx_equal_p (op2, op3))
> +    {
> +      if (!target)
> +	target = gen_reg_rtx (mode);
> +
> +      emit_move_insn (target, op3);
> +      return target;
> +    }
> +
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (op2);
> +
> +  icode = direct_optab_handler (movcc_optab, mode);
> +
> +  if (icode == CODE_FOR_nothing)
> +    return NULL_RTX;
> +
> +  if (!target)
> +    target = gen_reg_rtx (mode);
> +
> +  class expand_operand ops[4];
> +
> +  create_output_operand (&ops[0], target, mode);
> +  create_fixed_operand (&ops[1], comparison);
> +  create_input_operand (&ops[2], op2, mode);
> +  create_input_operand (&ops[3], op3, mode);
> +
> +  if (maybe_expand_insn (icode, 4, ops))
> +    {
> +      if (ops[0].value != target)
> +	convert_move (target, ops[0].value, false);
> +      return target;
> +    }
> +
> +  return NULL_RTX;
> +}
> +
>  
>  /* Emit a conditional negate or bitwise complement using the
>     negcc or notcc optabs if available.  Return NULL_RTX if such operations
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 3bbceff92d9..f853b93f37f 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -281,6 +281,7 @@ extern void emit_indirect_jump (rtx);
>  /* Emit a conditional move operation.  */
>  rtx emit_conditional_move (rtx, enum rtx_code, rtx, rtx, machine_mode,
>  			   rtx, rtx, machine_mode, int);
> +rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
>  
>  /* Emit a conditional negate or bitwise complement operation.  */
>  rtx emit_conditional_neg_or_complement (rtx, rtx_code, machine_mode, rtx,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-11-05 15:33                   ` Richard Sandiford
@ 2021-11-12 13:00                     ` Robin Dapp
  2021-11-30 16:36                       ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2021-11-12 13:00 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

Hi Richard,

> It's hard to judge this in isolation because it's not clear when
> and how the new arguments are going to be used, but it seems OK
> in principle.  Do you still want:
> 
>   /* If earliest == jump, try to build the cmove insn directly.
>      This is helpful when combine has created some complex condition
>      (like for alpha's cmovlbs) that we can't hope to regenerate
>      through the normal interface.  */
> 
>   if (if_info->cond_earliest == if_info->jump)
>     {
> 
> to be used when cc_cmp and rev_cc_cmp are nonnull?

My initial hunch was to just leave it in place as I did not manage to
trigger it.  As it is going to be called and costed both ways (with
cc_cmp, rev_cc_cmp and without) it is probably better to move it into
the else branch.

The single usage of this is in patch 5/7.  We are passing the already
existing condition from the jump and its reverse to see if the backend
can come up with something better than when creating a new comparison.
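
Roughly along these lines (a hypothetical sketch only; the actual hunk is
in patch 5/7 and may differ):

  /* In noce_convert_multiple_sets: hand the jump's CC comparison and its
     reverse down to noce_emit_cmove so the backend can re-use the CC.  */
  rtx cc_cmp = cond;
  rtx rev_cc_cmp = reversed_comparison (cond, GET_MODE (cond));

  temp_dest = noce_emit_cmove (if_info, temp, cond_code, x, y,
			       new_val, old_val, cc_cmp, rev_cc_cmp);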

>> +static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
>> +rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
> 
> This is redundant with the header file declaration.
> 

Removed it.

> I think it'd be better to call one of these functions something else,
> rather than make the interpretation of the third parameter depend on
> the total number of parameters.  In the second overload, the comparison
> rtx effectively replaces four parameters of the existing
> emit_conditional_move, so perhaps that's the one that should remain
> emit_conditional_move.  Maybe the first one should be called
> emit_conditional_move_with_rev or something.

Not entirely fond of calling the first one _with_rev because essentially
both try normal and reversed variants but I agree that the naming is not
ideal.  I don't have any great ideas how to properly untangle it so I
would go with your suggestions in order to move forward.  As there is
only one caller of the second function, we could also let the caller
handle the reversing.  Then, the third function would need to be
non-static, though.

The third, static emit_conditional_move I already renamed locally to
emit_conditional_move_1.
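
So, just to spell out what that leaves us with (a signature sketch based
on the hunks above):

  /* Original entry point, taking the comparison as separate pieces.  */
  rtx emit_conditional_move (rtx, enum rtx_code, rtx, rtx, machine_mode,
			     rtx, rtx, machine_mode, int);

  /* Overload taking a ready-made comparison plus its reverse.  */
  rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);

  /* Static helper taking a single prepared comparison.  */
  static rtx emit_conditional_move_1 (rtx, rtx, rtx, rtx, machine_mode);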

> Part of me wonders if this would be simpler if we created a structure
> to describe a comparison and passed that around instead of individual
> fields, but I guess it could become a rat hole.

I also thought about this as it would allow us to use either
representation as required by the usage site.  Even tried it in a branch
locally but indeed it became ugly quickly so I postponed it for now.

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.
  2021-11-12 13:00                     ` Robin Dapp
@ 2021-11-30 16:36                       ` Richard Sandiford
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Sandiford @ 2021-11-30 16:36 UTC (permalink / raw)
  To: Robin Dapp; +Cc: gcc-patches

BTW, in response to your earlier concern about stage 3: you posted the
series well in time for end of stage 1, so I think it can still go in
during stage 3.

Robin Dapp <rdapp@linux.ibm.com> writes:
> Hi Richard,
>
>> It's hard to judge this in isolation because it's not clear when
>> and how the new arguments are going to be used, but it seems OK
>> in principle.  Do you still want:
>> 
>>   /* If earliest == jump, try to build the cmove insn directly.
>>      This is helpful when combine has created some complex condition
>>      (like for alpha's cmovlbs) that we can't hope to regenerate
>>      through the normal interface.  */
>> 
>>   if (if_info->cond_earliest == if_info->jump)
>>     {
>> 
>> to be used when cc_cmp and rev_cc_cmp are nonnull?
>
> My initial hunch was to just leave it in place as I did not manage to
> trigger it.  As it is going to be called and costed both ways (with
> cc_cmp, rev_cc_cmp and without) it is probably better to move it into
> the else branch.
>
> The single usage of this is in patch 5/7.  We are passing the already
> existing condition from the jump and its reverse to see if the backend
> can come up with something better than when creating a new comparison.
>
>>> +static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
>>> +rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
>> 
>> This is redundant with the header file declaration.
>> 
>
> Removed it.
>
>> I think it'd be better to call one of these functions something else,
>> rather than make the interpretation of the third parameter depend on
>> the total number of parameters.  In the second overload, the comparison
>> rtx effectively replaces four parameters of the existing
>> emit_conditional_move, so perhaps that's the one that should remain
>> emit_conditional_move.  Maybe the first one should be called
>> emit_conditional_move_with_rev or something.
>
> I'm not entirely fond of calling the first one _with_rev because
> essentially both try the normal and the reversed variant, but I agree
> that the naming is not ideal.  I don't have any great ideas on how to
> properly untangle it, so I would go with your suggestions in order to
> move forward.  As there is only one caller of the second function, we
> could also let the caller handle the reversing.  Then, the third
> function would need to be non-static, though.
>
> The third, static emit_conditional_move I already renamed locally to
> emit_conditional_move_1.

Thanks, renaming the third function helps.

>> Part of me wonders if this would be simpler if we created a structure
>> to describe a comparison and passed that around instead of individual
>> fields, but I guess it could become a rat hole.
>
> I also thought about this, as it would allow us to use either
> representation as required by the usage site.  I even tried it in a
> branch locally, but it indeed became ugly quickly, so I postponed it
> for now.

Still, perhaps we could at least add (in rtl.h):

struct rtx_comparison {
  rtx_code code;
  machine_mode op_mode;
  rtx op0, op1;
};

and make the existing emit_conditional_moves use it instead of four
separate parameters.  The raw rtx comparison arguments of the new
overload would then be standing in for rtx_comparison arguments rather
than for plain rtxes, which would avoid the ambiguity in the overloads.

With C++ it should be possible to rewrite the calls using { … }, e.g.:

  if (!emit_conditional_move (into_target, { cmp_code, op1_mode, cmp1, cmp2 },
			      into_target, into_superword, word_mode, false))

so the new type wouldn't need to spread too far.
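
For the sake of illustration, the overloads might then be declared
along these lines (only a sketch, parameter names are placeholders and
the exact lists are up to you):

  /* Builds the comparison itself from an rtx_comparison.  */
  rtx emit_conditional_move (rtx target, rtx_comparison comp,
                             rtx vtrue, rtx vfalse,
                             machine_mode mode, int unsignedp);
  /* Takes an already-emitted comparison and its reverse.  */
  rtx emit_conditional_move (rtx target, rtx comparison, rtx rev_comparison,
                             rtx vtrue, rtx vfalse, machine_mode mode);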

Does that sound OK?  If so, could you post the current version of the
full patch series and say which bits still need review?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread

Thread overview: 31+ messages
2021-06-25 16:08 [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
2021-06-25 16:08 ` [PATCH 1/7] ifcvt: Check if cmovs are needed Robin Dapp
2021-07-15 20:10   ` Richard Sandiford
2021-07-22 12:06     ` Robin Dapp
2021-07-26 19:08       ` Richard Sandiford
2021-09-15  8:39         ` Robin Dapp
2021-10-14  8:45           ` Richard Sandiford
2021-10-14 14:20             ` Robin Dapp
2021-10-14 14:32               ` Richard Sandiford
2021-10-18 11:40                 ` Robin Dapp
2021-11-03  8:55                   ` Robin Dapp
2021-11-05 15:33                   ` Richard Sandiford
2021-11-12 13:00                     ` Robin Dapp
2021-11-30 16:36                       ` Richard Sandiford
2021-06-25 16:09 ` [PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple Robin Dapp
2021-07-15 20:25   ` Richard Sandiford
2021-06-25 16:09 ` [PATCH 3/7] ifcvt: Improve costs handling " Robin Dapp
2021-07-15 20:42   ` Richard Sandiford
2021-07-22 12:07     ` Robin Dapp
2021-07-26 19:10       ` Richard Sandiford
2021-06-25 16:09 ` [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move Robin Dapp
2021-07-15 20:54   ` Richard Sandiford
2021-07-22 12:07     ` Robin Dapp
2021-07-26 19:31       ` Richard Sandiford
2021-07-27 20:49         ` Robin Dapp
2021-08-06 12:14           ` Richard Sandiford
2021-06-25 16:09 ` [PATCH 5/7] ifcvt: Try re-using CC for conditional moves Robin Dapp
2021-07-22 12:12   ` Robin Dapp
2021-06-25 16:09 ` [PATCH 6/7] testsuite/s390: Add tests for noce_convert_multiple Robin Dapp
2021-06-25 16:09 ` [PATCH 7/7] s390: Increase costs for load on condition and change movqicc expander Robin Dapp
2021-07-13 12:42 ` [PATCH 0/7] ifcvt: Convert multiple Robin Dapp
