[PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
@ 2023-08-30 10:13 Manolis Tsamis
  2023-08-30 10:13 ` [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Manolis Tsamis @ 2023-08-30 10:13 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Robin Dapp, Jakub Jelinek, Richard Sandiford,
	Manolis Tsamis

noce_convert_multiple_sets has been introduced and extended over time to handle
if conversion for blocks with multiple sets. Currently this is focused on
register moves and rejects any sort of arithmetic operations.

This series is an extension to allow more sequences to take part in if
conversion. The first patch is a required change to emit correct code and the
second patch whitelists a larger number of operations through
bb_ok_for_noce_convert_multiple_sets. The third patch adds support to rewire
multiple registers in noce_convert_multiple_sets_1 and refactors the code with
a new helper info struct. The fourth patch removes some old code that should
not be needed anymore.

For targets that have a rich selection of conditional instructions,
like aarch64, I have seen an ~5x increase of profitable if conversions for
multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
benchmarks and I have not seen performance regressions on either x64 / aarch64.

Some samples that previously resulted in a branch but now better use these
instructions can be seen in the provided test cases.

Bootstrapped and tested on AArch64 and x86-64.

Changes in v3:
        - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
        - Allow rewiring of multiple regs.
        - Refactor code with noce_multiple_sets_info.
        - Remove old code for subregs.

Manolis Tsamis (4):
  ifcvt: handle sequences that clobber flags in
    noce_convert_multiple_sets
  ifcvt: Allow more operations in multiple set if conversion
  ifcvt: Handle multiple rewired regs and refactor
    noce_convert_multiple_sets
  ifcvt: Remove obsolete code for subreg handling in
    noce_convert_multiple_sets

 gcc/ifcvt.cc                                  | 403 ++++++++----------
 gcc/ifcvt.h                                   |  16 +
 .../aarch64/ifcvt_multiple_sets_arithm.c      |  79 ++++
 .../aarch64/ifcvt_multiple_sets_rewire.c      |  20 +
 4 files changed, 290 insertions(+), 228 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c

-- 
2.34.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-08-30 10:13 [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
@ 2023-08-30 10:13 ` Manolis Tsamis
  2023-10-19 19:41   ` Richard Sandiford
  2023-08-30 10:13 ` [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Manolis Tsamis @ 2023-08-30 10:13 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Robin Dapp, Jakub Jelinek, Richard Sandiford,
	Manolis Tsamis

This is an extension of what was done in PR106590.

Currently if a sequence generated in noce_convert_multiple_sets clobbers the
condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
(sequences that emit the comparison itself). Since this applies only from the
next iteration it assumes that the sequences generated (in particular seq2)
doesn't clobber the condition rtx itself before using it in the if_then_else,
which is only true in specific cases (currently only register/subregister moves
are allowed).

This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
the current iteration. This makes it possible to include arithmetic operations
in noce_convert_multiple_sets.

gcc/ChangeLog:

	* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
	(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

(no changes since v1)

 gcc/ifcvt.cc | 49 +++++++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..3273aeca125 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3375,20 +3375,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   return true;
 }
 
-/* Helper function for noce_convert_multiple_sets_1.  If store to
-   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
-
-static void
-check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
-{
-  rtx *p = (rtx *) p0;
-  if (p[0] == NULL_RTX)
-    return;
-  if (reg_overlap_mentioned_p (dest, p[0])
-      || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
-    p[0] = NULL_RTX;
-}
-
 /* This goes through all relevant insns of IF_INFO->then_bb and tries to
    create conditional moves.  In case a simple move sufficis the insn
    should be listed in NEED_NO_CMOV.  The rewired-src cases should be
@@ -3552,9 +3538,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	 creating an additional compare for each.  If successful, costing
 	 is easier and this sequence is usually preferred.  */
       if (cc_cmp)
-	seq2 = try_emit_cmove_seq (if_info, temp, cond,
-				   new_val, old_val, need_cmov,
-				   &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+	{
+	  seq2 = try_emit_cmove_seq (if_info, temp, cond,
+				     new_val, old_val, need_cmov,
+				     &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+
+	  /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
+	     clobbered.  We can't safely use the sequence in this case.  */
+	  if (seq2 && (modified_in_p (cc_cmp, seq2)
+	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2))))
+	    seq2 = NULL;
+	}
 
       /* The backend might have created a sequence that uses the
 	 condition.  Check this.  */
@@ -3609,21 +3603,16 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	  return false;
 	}
 
-      if (cc_cmp)
+      if (cc_cmp && seq == seq1)
 	{
-	  /* Check if SEQ can clobber registers mentioned in
-	     cc_cmp and/or rev_cc_cmp.  If yes, we need to use
-	     only seq1 from that point on.  */
-	  rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
-	  for (walk = seq; walk; walk = NEXT_INSN (walk))
+	  /* Check if SEQ can clobber registers mentioned in cc_cmp/rev_cc_cmp.
+	     If yes, we need to use only seq1 from that point on.
+	     Only check when we use seq1 since we have already tested seq2.  */
+	  if (modified_in_p (cc_cmp, seq)
+	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq)))
 	    {
-	      note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
-	      if (cc_cmp_pair[0] == NULL_RTX)
-		{
-		  cc_cmp = NULL_RTX;
-		  rev_cc_cmp = NULL_RTX;
-		  break;
-		}
+	      cc_cmp = NULL_RTX;
+	      rev_cc_cmp = NULL_RTX;
 	    }
 	}
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion
  2023-08-30 10:13 [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
  2023-08-30 10:13 ` [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
@ 2023-08-30 10:13 ` Manolis Tsamis
  2023-10-19 19:46   ` Richard Sandiford
  2023-08-30 10:13 ` [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets Manolis Tsamis
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Manolis Tsamis @ 2023-08-30 10:13 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Robin Dapp, Jakub Jelinek, Richard Sandiford,
	Manolis Tsamis

Currently the operations allowed for if conversion of a basic block with
multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
bb_ok_for_noce_convert_multiple_sets).

This commit allows more operations (arithmetic, compare, etc) to participate
in if conversion. The target's profitability hook and ifcvt's costing is
expected to reject sequences that are unprofitable.

This is especially useful for targets which provide a rich selection of
conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
which are currently not used in basic blocks with more than a single set.

gcc/ChangeLog:

	* ifcvt.cc (try_emit_cmove_seq): Modify comments.
	(noce_convert_multiple_sets_1): Modify comments.
	(bb_ok_for_noce_convert_multiple_sets): Allow more operations.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

Changes in v3:
        - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
        - Allow rewiring of multiple regs.
        - Refactor code with noce_multiple_sets_info.
        - Remove old code for subregs.

 gcc/ifcvt.cc                                  | 63 ++++++++++-----
 .../aarch64/ifcvt_multiple_sets_arithm.c      | 79 +++++++++++++++++++
 2 files changed, 123 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 3273aeca125..efe8ab1577a 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3215,13 +3215,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
 /* We have something like:
 
      if (x > y)
-       { i = a; j = b; k = c; }
+       { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
 
    Make it:
 
-     tmp_i = (x > y) ? a : i;
-     tmp_j = (x > y) ? b : j;
-     tmp_k = (x > y) ? c : k;
+     tmp_i = (x > y) ? EXPR_A : i;
+     tmp_j = (x > y) ? EXPR_B : j;
+     tmp_k = (x > y) ? EXPR_C : k;
      i = tmp_i;
      j = tmp_j;
      k = tmp_k;
@@ -3637,11 +3637,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 
 
 
-/* Return true iff basic block TEST_BB is comprised of only
-   (SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  Also check that we have more than one set
-   (other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+/* Return true iff basic block TEST_BB is suitable for conversion to a
+   series of conditional moves.  Also check that we have more than one
+   set (other routines can handle a single set better than we would),
+   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
    through the insns store the sum of their potential costs in COST.  */
 
 static bool
@@ -3667,20 +3666,46 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
       rtx dest = SET_DEST (set);
       rtx src = SET_SRC (set);
 
-      /* We can possibly relax this, but for now only handle REG to REG
-	 (including subreg) moves.  This avoids any issues that might come
-	 from introducing loads/stores that might violate data-race-freedom
-	 guarantees.  */
-      if (!REG_P (dest))
+      /* Do not handle anything involving memory loads/stores since it might
+	 violate data-race-freedom guarantees.  */
+      if (!REG_P (dest) || contains_mem_rtx_p (src))
+	return false;
+
+      if (!SCALAR_INT_MODE_P (GET_MODE (src)))
 	return false;
 
-      if (!((REG_P (src) || CONSTANT_P (src))
-	    || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
-	      && subreg_lowpart_p (src))))
+      /* Allow a wide range of operations and let the costing function decide
+	 if the conversion is worth it later.  */
+      enum rtx_code code = GET_CODE (src);
+      if (!(CONSTANT_P (src)
+	    || code == REG
+	    || code == SUBREG
+	    || code == ZERO_EXTEND
+	    || code == SIGN_EXTEND
+	    || code == NOT
+	    || code == NEG
+	    || code == PLUS
+	    || code == MINUS
+	    || code == AND
+	    || code == IOR
+	    || code == MULT
+	    || code == ASHIFT
+	    || code == ASHIFTRT
+	    || code == NE
+	    || code == EQ
+	    || code == GE
+	    || code == GT
+	    || code == LE
+	    || code == LT
+	    || code == GEU
+	    || code == GTU
+	    || code == LEU
+	    || code == LTU
+	    || code == COMPARE))
 	return false;
 
-      /* Destination must be appropriate for a conditional write.  */
-      if (!noce_operand_ok (dest))
+      /* Destination and source must be appropriate.  */
+      if (!noce_operand_ok (dest) || !noce_operand_ok (src))
 	return false;
 
       /* We must be able to conditionally move in this mode.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
new file mode 100644
index 00000000000..d977f4d62ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ce1" } */
+
+void sink2(int, int);
+void sink3(int, int, int);
+
+void cond1(int cond, int x, int y)
+{
+  if (cond)
+    {
+      x = x << 4;
+      y = 1;
+    }
+
+  sink2(x, y);
+}
+
+void cond2(int cond, int x, int y)
+{
+  if (cond)
+    {
+      x++;
+      y++;
+    }
+
+  sink2(x, y);
+}
+
+void cond3(int cond, int x1, int x2, int x3)
+{
+  if (cond)
+    {
+      x1++;
+      x2++;
+      x3++;
+    }
+
+  sink3(x1, x2, x3);
+}
+
+void cond4(int cond, int x, int y)
+{
+  if (cond)
+    {
+      x += 2;
+      y += 3;
+    }
+
+  sink2(x, y);
+}
+
+void cond5(int cond, int x, int y, int r1, int r2)
+{
+  if (cond)
+    {
+      x = r1 + 2;
+      y = r2 - 34;
+    }
+
+  sink2(x, y);
+}
+
+void cond6(int cond, int x, int y)
+{
+  if (cond)
+    {
+      x = -x;
+      y = ~y;
+    }
+
+  sink2(x, y);
+}
+
+/* { dg-final { scan-assembler-times "cinc\t" 5 } } */
+/* { dg-final { scan-assembler-times "csneg\t" 1 } } */
+/* { dg-final { scan-assembler-times "csinv\t" 1 } } */
+/* { dg-final { scan-assembler "csel\t" } } */
+
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_convert_multiple_sets" 6 "ce1" } } */
\ No newline at end of file
-- 
2.34.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets
  2023-08-30 10:13 [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
  2023-08-30 10:13 ` [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
  2023-08-30 10:13 ` [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
@ 2023-08-30 10:13 ` Manolis Tsamis
  2023-11-10 23:20   ` Jeff Law
  2023-08-30 10:14 ` [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets Manolis Tsamis
  2023-09-18  8:18 ` [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
  4 siblings, 1 reply; 24+ messages in thread
From: Manolis Tsamis @ 2023-08-30 10:13 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Robin Dapp, Jakub Jelinek, Richard Sandiford,
	Manolis Tsamis

The existing implementation of need_cmov_or_rewire and
noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
This commit enchances them so they can handle/rewire arbitrary set statements.

To do that a new helper struct noce_multiple_sets_info is introduced which is
used by noce_convert_multiple_sets and its helper functions. This results in
cleaner function signatures, improved efficientcy (a number of vecs and hash
set/map are replaced with a single vec of struct) and simplicity.

gcc/ChangeLog:

	* ifcvt.cc (need_cmov_or_rewire): Renamed init_noce_multiple_sets_info.
	(init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
	(noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
	rewiring of multiple registers.
	(noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
	* ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
	noce_multiple_sets_info to store info for noce_convert_multiple_sets.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

(no changes since v1)

 gcc/ifcvt.cc                                  | 255 ++++++++----------
 gcc/ifcvt.h                                   |  16 ++
 .../aarch64/ifcvt_multiple_sets_rewire.c      |  20 ++
 3 files changed, 149 insertions(+), 142 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index efe8ab1577a..ecc0cbabef9 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -98,14 +98,10 @@ static bool dead_or_predicable (basic_block, basic_block, basic_block,
 				edge, bool);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
-static void need_cmov_or_rewire (basic_block, hash_set<rtx_insn *> *,
-				 hash_map<rtx_insn *, int> *);
+static void init_noce_multiple_sets_info (basic_block,
+  auto_delete_vec<noce_multiple_sets_info> &);
 static bool noce_convert_multiple_sets_1 (struct noce_if_info *,
-					  hash_set<rtx_insn *> *,
-					  hash_map<rtx_insn *, int> *,
-					  auto_vec<rtx> *,
-					  auto_vec<rtx> *,
-					  auto_vec<rtx_insn *> *, int *);
+  auto_delete_vec<noce_multiple_sets_info> &, int *);
 \f
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3270,24 +3266,13 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx x = XEXP (cond, 0);
   rtx y = XEXP (cond, 1);
 
-  /* The true targets for a conditional move.  */
-  auto_vec<rtx> targets;
-  /* The temporaries introduced to allow us to not consider register
-     overlap.  */
-  auto_vec<rtx> temporaries;
-  /* The insns we've emitted.  */
-  auto_vec<rtx_insn *> unmodified_insns;
-
-  hash_set<rtx_insn *> need_no_cmov;
-  hash_map<rtx_insn *, int> rewired_src;
-
-  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
+  auto_delete_vec<noce_multiple_sets_info> insn_info;
+  init_noce_multiple_sets_info (then_bb, insn_info);
 
   int last_needs_comparison = -1;
 
   bool ok = noce_convert_multiple_sets_1
-    (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-     &unmodified_insns, &last_needs_comparison);
+    (if_info, insn_info, &last_needs_comparison);
   if (!ok)
       return false;
 
@@ -3302,8 +3287,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
       end_sequence ();
       start_sequence ();
       ok = noce_convert_multiple_sets_1
-	(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-	 &unmodified_insns, &last_needs_comparison);
+	(if_info, insn_info, &last_needs_comparison);
       /* Actually we should not fail anymore if we reached here,
 	 but better still check.  */
       if (!ok)
@@ -3312,12 +3296,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   /* We must have seen some sort of insn to insert, otherwise we were
      given an empty BB to convert, and we can't handle that.  */
-  gcc_assert (!unmodified_insns.is_empty ());
+  gcc_assert (!insn_info.is_empty ());
 
   /* Now fixup the assignments.  */
-  for (unsigned i = 0; i < targets.length (); i++)
-    if (targets[i] != temporaries[i])
-      noce_emit_move_insn (targets[i], temporaries[i]);
+  for (unsigned i = 0; i < insn_info.length (); i++)
+    if (insn_info[i]->target != insn_info[i]->temporary)
+      noce_emit_move_insn (insn_info[i]->target, insn_info[i]->temporary);
 
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
@@ -3332,10 +3316,10 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
     set_used_flags (insn);
 
   /* Mark all our temporaries and targets as used.  */
-  for (unsigned i = 0; i < targets.length (); i++)
+  for (unsigned i = 0; i < insn_info.length (); i++)
     {
-      set_used_flags (temporaries[i]);
-      set_used_flags (targets[i]);
+      set_used_flags (insn_info[i]->temporary);
+      set_used_flags (insn_info[i]->target);
     }
 
   set_used_flags (cond);
@@ -3354,7 +3338,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
       return false;
 
   emit_insn_before_setloc (seq, if_info->jump,
-			   INSN_LOCATION (unmodified_insns.last ()));
+			   INSN_LOCATION (insn_info.last ()->unmodified_insn));
 
   /* Clean up THEN_BB and the edges in and out of it.  */
   remove_edge (find_edge (test_bb, join_bb));
@@ -3375,20 +3359,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   return true;
 }
 
-/* This goes through all relevant insns of IF_INFO->then_bb and tries to
-   create conditional moves.  In case a simple move sufficis the insn
-   should be listed in NEED_NO_CMOV.  The rewired-src cases should be
-   specified via REWIRED_SRC.  TARGETS, TEMPORARIES and UNMODIFIED_INSNS
-   are specified and used in noce_convert_multiple_sets and should be passed
-   to this function..  */
+/* This goes through all relevant insns of IF_INFO->then_bb and tries to create
+   conditional moves.  Information for the insns is kept in INSN_INFO.  */
 
 static bool
 noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
-			      hash_set<rtx_insn *> *need_no_cmov,
-			      hash_map<rtx_insn *, int> *rewired_src,
-			      auto_vec<rtx> *targets,
-			      auto_vec<rtx> *temporaries,
-			      auto_vec<rtx_insn *> *unmodified_insns,
+			      auto_delete_vec<noce_multiple_sets_info> &insn_info,
 			      int *last_needs_comparison)
 {
   basic_block then_bb = if_info->then_bb;
@@ -3407,11 +3383,6 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 
   rtx_insn *insn;
   int count = 0;
-
-  targets->truncate (0);
-  temporaries->truncate (0);
-  unmodified_insns->truncate (0);
-
   bool second_try = *last_needs_comparison != -1;
 
   FOR_BB_INSNS (then_bb, insn)
@@ -3420,6 +3391,8 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
       if (!active_insn_p (insn))
 	continue;
 
+      noce_multiple_sets_info *info = insn_info[count];
+
       rtx set = single_set (insn);
       gcc_checking_assert (set);
 
@@ -3427,9 +3400,12 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
       rtx temp;
 
       rtx new_val = SET_SRC (set);
-      if (int *ii = rewired_src->get (insn))
-	new_val = simplify_replace_rtx (new_val, (*targets)[*ii],
-					(*temporaries)[*ii]);
+
+      int i, ii;
+      FOR_EACH_VEC_ELT (info->rewired_src, i, ii)
+	new_val = simplify_replace_rtx (new_val, insn_info[ii]->target,
+					insn_info[ii]->temporary);
+
       rtx old_val = target;
 
       /* As we are transforming
@@ -3467,11 +3443,6 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
       else
 	temp = target;
 
-      /* We have identified swap-style idioms before.  A normal
-	 set will need to be a cmov while the first instruction of a swap-style
-	 idiom can be a regular move.  This helps with costing.  */
-      bool need_cmov = !need_no_cmov->contains (insn);
-
       /* If we had a non-canonical conditional jump (i.e. one where
 	 the fallthrough is to the "else" case) we need to reverse
 	 the conditional select.  */
@@ -3516,6 +3487,11 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
 	}
 
+      /* We have identified swap-style idioms before.  A normal
+	 set will need to be a cmov while the first instruction of a swap-style
+	 idiom can be a regular move.  This helps with costing.  */
+      bool need_cmov = info->need_cmov;
+
       /* Try emitting a conditional move passing the backend the
 	 canonicalized comparison.  The backend is then able to
 	 recognize expressions like
@@ -3621,9 +3597,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 
       /* Bookkeeping.  */
       count++;
-      targets->safe_push (target);
-      temporaries->safe_push (temp_dest);
-      unmodified_insns->safe_push (insn);
+
+      info->target = target;
+      info->temporary = temp_dest;
+      info->unmodified_insn = insn;
     }
 
   /* Even if we did not actually need the comparison, we want to make sure
@@ -3631,11 +3608,88 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
   if (*last_needs_comparison == -1)
     *last_needs_comparison = 0;
 
-
   return true;
 }
 
+/* Find local swap-style idioms in BB and mark the first insn (1)
+   that is only a temporary as not needing a conditional move as
+   it is going to be dead afterwards anyway.
+
+     (1) int tmp = a;
+	 a = b;
+	 b = tmp;
 
+	 ifcvt
+	 -->
+
+	 tmp = a;
+	 a = cond ? b : a_old;
+	 b = cond ? tmp : b_old;
+
+    Additionally, store the index of insns like (2) when a subsequent
+    SET reads from their destination.
+
+    (2) int c = a;
+	int d = c;
+
+	ifcvt
+	-->
+
+	c = cond ? a : c_old;
+	d = cond ? d : c;     // Need to use c rather than c_old here.
+*/
+
+static void
+init_noce_multiple_sets_info (basic_block bb,
+		     auto_delete_vec<noce_multiple_sets_info> &insn_info)
+{
+  rtx_insn *insn;
+  int count = 0;
+  auto_vec<rtx> dests;
+
+  /* Iterate over all SETs, storing the destinations
+     in DEST.
+     - If we hit a SET that reads from a destination
+       that we have seen before and the corresponding register
+       is dead afterwards, the register does not need to be
+       moved conditionally.
+     - If we encounter a previously changed register,
+       rewire the read to the original source.  */
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!active_insn_p (insn))
+	continue;
+
+      noce_multiple_sets_info *info = new noce_multiple_sets_info;
+      info->target = NULL_RTX;
+      info->temporary = NULL_RTX;
+      info->unmodified_insn = NULL;
+      info->need_cmov = true;
+      insn_info.safe_push (info);
+
+      rtx set = single_set (insn);
+      gcc_checking_assert (set);
+
+      rtx src = SET_SRC (set);
+      rtx dest = SET_DEST (set);
+
+      /* Check if the current SET's source is the same
+	 as any previously seen destination.
+	 This is quadratic but the number of insns in BB
+	 is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS.  */
+      for (int i = count - 1; i >= 0; --i)
+	if (reg_mentioned_p (dests[i], src))
+	  {
+	    if (find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
+	      insn_info[i]->need_cmov = false;
+	    else
+	      insn_info[count]->rewired_src.safe_push (i);
+	  }
+
+      dests.safe_push (dest);
+      count++;
+    }
+}
 
 /* Return true iff basic block TEST_BB is suitable for conversion to a
    series of conditional moves.  Also check that we have more than one
@@ -4135,89 +4189,6 @@ check_cond_move_block (basic_block bb,
   return true;
 }
 
-/* Find local swap-style idioms in BB and mark the first insn (1)
-   that is only a temporary as not needing a conditional move as
-   it is going to be dead afterwards anyway.
-
-     (1) int tmp = a;
-	 a = b;
-	 b = tmp;
-
-	 ifcvt
-	 -->
-
-	 tmp = a;
-	 a = cond ? b : a_old;
-	 b = cond ? tmp : b_old;
-
-    Additionally, store the index of insns like (2) when a subsequent
-    SET reads from their destination.
-
-    (2) int c = a;
-	int d = c;
-
-	ifcvt
-	-->
-
-	c = cond ? a : c_old;
-	d = cond ? d : c;     // Need to use c rather than c_old here.
-*/
-
-static void
-need_cmov_or_rewire (basic_block bb,
-		     hash_set<rtx_insn *> *need_no_cmov,
-		     hash_map<rtx_insn *, int> *rewired_src)
-{
-  rtx_insn *insn;
-  int count = 0;
-  auto_vec<rtx_insn *> insns;
-  auto_vec<rtx> dests;
-
-  /* Iterate over all SETs, storing the destinations
-     in DEST.
-     - If we hit a SET that reads from a destination
-       that we have seen before and the corresponding register
-       is dead afterwards, the register does not need to be
-       moved conditionally.
-     - If we encounter a previously changed register,
-       rewire the read to the original source.  */
-  FOR_BB_INSNS (bb, insn)
-    {
-      rtx set, src, dest;
-
-      if (!active_insn_p (insn))
-	continue;
-
-      set = single_set (insn);
-      if (set == NULL_RTX)
-	continue;
-
-      src = SET_SRC (set);
-      if (SUBREG_P (src))
-	src = SUBREG_REG (src);
-      dest = SET_DEST (set);
-
-      /* Check if the current SET's source is the same
-	 as any previously seen destination.
-	 This is quadratic but the number of insns in BB
-	 is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS.  */
-      if (REG_P (src))
-	for (int i = count - 1; i >= 0; --i)
-	  if (reg_overlap_mentioned_p (src, dests[i]))
-	    {
-	      if (find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
-		need_no_cmov->add (insns[i]);
-	      else
-		rewired_src->put (insn, i);
-	    }
-
-      insns.safe_push (insn);
-      dests.safe_push (dest);
-
-      count++;
-    }
-}
-
 /* Given a basic block BB suitable for conditional move conversion,
    a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing
    the register values depending on COND, emit the insns in the block as
diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h
index be1385aabe4..f953705f887 100644
--- a/gcc/ifcvt.h
+++ b/gcc/ifcvt.h
@@ -40,6 +40,22 @@ struct ce_if_block
   int pass;				/* Pass number.  */
 };
 
+struct noce_multiple_sets_info
+{
+  /* A list of indices to instructions that we need to rewire into this
+     instruction when we replace them with temporary conditional moves.  */
+  auto_vec<int> rewired_src;
+  /* The true targets for a conditional move.  */
+  rtx target;
+  /* The temporaries introduced to allow us to not consider register
+     overlap.  */
+  rtx temporary;
+  /* The insns we've emitted.  */
+  rtx_insn *unmodified_insn;
+  /* True if a simple move can be used instead of a conditional move.  */
+  bool need_cmov;
+};
+
 /* Used by noce_process_if_block to communicate with its subroutines.
 
    The subroutines know that A and B may be evaluated freely.  They
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c
new file mode 100644
index 00000000000..411874e96c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ce1" } */
+
+void sink2(int, int);
+
+void cond1(int cond, int x, int y, int z)
+{
+  if (x)
+    {
+      x = y + z;
+      y = z + x;
+    }
+
+  sink2(x, y);
+}
+
+/* { dg-final { scan-assembler-times "csel\tw0, w0, w1" 1 } } */
+/* { dg-final { scan-assembler-times "csel\tw1, w3, w2" 1 } } */
+
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_convert_multiple_sets" 1 "ce1" } } */
\ No newline at end of file
-- 
2.34.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets
  2023-08-30 10:13 [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
                   ` (2 preceding siblings ...)
  2023-08-30 10:13 ` [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets Manolis Tsamis
@ 2023-08-30 10:14 ` Manolis Tsamis
  2023-11-10 22:03   ` Jeff Law
  2023-09-18  8:18 ` [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
  4 siblings, 1 reply; 24+ messages in thread
From: Manolis Tsamis @ 2023-08-30 10:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Robin Dapp, Jakub Jelinek, Richard Sandiford,
	Manolis Tsamis

This code used to handle register replacement issues with SUBREG before
simplify_replace_rtx was introduced. This should not be needed anymore as
new_val has the correct mode and that should be preserved by
simplify_replace_rtx.

gcc/ChangeLog:

	* ifcvt.cc (noce_convert_multiple_sets_1): Remove old code.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

(no changes since v1)

 gcc/ifcvt.cc | 38 --------------------------------------
 1 file changed, 38 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index ecc0cbabef9..3b4b873612c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3449,44 +3449,6 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
       if (if_info->then_else_reversed)
 	std::swap (old_val, new_val);
 
-
-      /* We allow simple lowpart register subreg SET sources in
-	 bb_ok_for_noce_convert_multiple_sets.  Be careful when processing
-	 sequences like:
-	 (set (reg:SI r1) (reg:SI r2))
-	 (set (reg:HI r3) (subreg:HI (r1)))
-	 For the second insn new_val or old_val (r1 in this example) will be
-	 taken from the temporaries and have the wider mode which will not
-	 match with the mode of the other source of the conditional move, so
-	 we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
-	 Wrap the two cmove operands into subregs if appropriate to prevent
-	 that.  */
-
-      if (!CONSTANT_P (new_val)
-	  && GET_MODE (new_val) != GET_MODE (temp))
-	{
-	  machine_mode src_mode = GET_MODE (new_val);
-	  machine_mode dst_mode = GET_MODE (temp);
-	  if (!partial_subreg_p (dst_mode, src_mode))
-	    {
-	      end_sequence ();
-	      return false;
-	    }
-	  new_val = lowpart_subreg (dst_mode, new_val, src_mode);
-	}
-      if (!CONSTANT_P (old_val)
-	  && GET_MODE (old_val) != GET_MODE (temp))
-	{
-	  machine_mode src_mode = GET_MODE (old_val);
-	  machine_mode dst_mode = GET_MODE (temp);
-	  if (!partial_subreg_p (dst_mode, src_mode))
-	    {
-	      end_sequence ();
-	      return false;
-	    }
-	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
-	}
-
       /* We have identified swap-style idioms before.  A normal
 	 set will need to be a cmov while the first instruction of a swap-style
 	 idiom can be a regular move.  This helps with costing.  */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
  2023-08-30 10:13 [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
                   ` (3 preceding siblings ...)
  2023-08-30 10:14 ` [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets Manolis Tsamis
@ 2023-09-18  8:18 ` Manolis Tsamis
  2023-10-19  6:53   ` Manolis Tsamis
  4 siblings, 1 reply; 24+ messages in thread
From: Manolis Tsamis @ 2023-09-18  8:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: Philipp Tomsich, Robin Dapp, Jakub Jelinek, Richard Sandiford

Kind ping for V3 of these ifcvt changes
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html

Thanks,
Manolis

On Wed, Aug 30, 2023 at 1:14 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
>
> noce_convert_multiple_sets has been introduced and extended over time to handle
> if conversion for blocks with multiple sets. Currently this is focused on
> register moves and rejects any sort of arithmetic operations.
>
> This series is an extension to allow more sequences to take part in if
> conversion. The first patch is a required change to emit correct code and the
> second patch whitelists a larger number of operations through
> bb_ok_for_noce_convert_multiple_sets. The third patch adds support to rewire
> multiple registers in noce_convert_multiple_sets_1 and refactors the code with
> a new helper info struct. The fourth patch removes some old code that should
> not be needed anymore.
>
> For targets that have a rich selection of conditional instructions,
> like aarch64, I have seen an ~5x increase of profitable if conversions for
> multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> benchmarks and I have not seen performance regressions on either x64 / aarch64.
>
> Some samples that previously resulted in a branch but now better use these
> instructions can be seen in the provided test cases.
>
> Bootstrapped and tested on AArch64 and x86-64.
>
>
> Changes in v3:
>         - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
>         - Allow rewiring of multiple regs.
>         - Refactor code with noce_multiple_sets_info.
>         - Remove old code for subregs.
>
> Manolis Tsamis (4):
>   ifcvt: handle sequences that clobber flags in
>     noce_convert_multiple_sets
>   ifcvt: Allow more operations in multiple set if conversion
>   ifcvt: Handle multiple rewired regs and refactor
>     noce_convert_multiple_sets
>   ifcvt: Remove obsolete code for subreg handling in
>     noce_convert_multiple_sets
>
>  gcc/ifcvt.cc                                  | 403 ++++++++----------
>  gcc/ifcvt.h                                   |  16 +
>  .../aarch64/ifcvt_multiple_sets_arithm.c      |  79 ++++
>  .../aarch64/ifcvt_multiple_sets_rewire.c      |  20 +
>  4 files changed, 290 insertions(+), 228 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
  2023-09-18  8:18 ` [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
@ 2023-10-19  6:53   ` Manolis Tsamis
  0 siblings, 0 replies; 24+ messages in thread
From: Manolis Tsamis @ 2023-10-19  6:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Philipp Tomsich, Robin Dapp, Jakub Jelinek, Richard Sandiford

Hi all,

Pinging once more (
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html).

Manolis

On Mon, Sep 18, 2023 at 11:18 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> Kind ping for V3 of these ifcvt changes
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html
>
> Thanks,
> Manolis
>
> On Wed, Aug 30, 2023 at 1:14 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> >
> >
> > noce_convert_multiple_sets has been introduced and extended over time to handle
> > if conversion for blocks with multiple sets. Currently this is focused on
> > register moves and rejects any sort of arithmetic operations.
> >
> > This series is an extension to allow more sequences to take part in if
> > conversion. The first patch is a required change to emit correct code and the
> > second patch whitelists a larger number of operations through
> > bb_ok_for_noce_convert_multiple_sets. The third patch adds support to rewire
> > multiple registers in noce_convert_multiple_sets_1 and refactors the code with
> > a new helper info struct. The fourth patch removes some old code that should
> > not be needed anymore.
> >
> > For targets that have a rich selection of conditional instructions,
> > like aarch64, I have seen an ~5x increase of profitable if conversions for
> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> > benchmarks and I have not seen performance regressions on either x64 / aarch64.
> >
> > Some samples that previously resulted in a branch but now better use these
> > instructions can be seen in the provided test cases.
> >
> > Bootstrapped and tested on AArch64 and x86-64.
> >
> >
> > Changes in v3:
> >         - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
> >         - Allow rewiring of multiple regs.
> >         - Refactor code with noce_multiple_sets_info.
> >         - Remove old code for subregs.
> >
> > Manolis Tsamis (4):
> >   ifcvt: handle sequences that clobber flags in
> >     noce_convert_multiple_sets
> >   ifcvt: Allow more operations in multiple set if conversion
> >   ifcvt: Handle multiple rewired regs and refactor
> >     noce_convert_multiple_sets
> >   ifcvt: Remove obsolete code for subreg handling in
> >     noce_convert_multiple_sets
> >
> >  gcc/ifcvt.cc                                  | 403 ++++++++----------
> >  gcc/ifcvt.h                                   |  16 +
> >  .../aarch64/ifcvt_multiple_sets_arithm.c      |  79 ++++
> >  .../aarch64/ifcvt_multiple_sets_rewire.c      |  20 +
> >  4 files changed, 290 insertions(+), 228 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c
> >
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-08-30 10:13 ` [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
@ 2023-10-19 19:41   ` Richard Sandiford
  2023-10-20  7:04     ` Robin Dapp
                       ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Richard Sandiford @ 2023-10-19 19:41 UTC (permalink / raw)
  To: Manolis Tsamis; +Cc: gcc-patches, Philipp Tomsich, Robin Dapp, Jakub Jelinek

Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> This is an extension of what was done in PR106590.
>
> Currently if a sequence generated in noce_convert_multiple_sets clobbers the
> condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
> (sequences that emit the comparison itself). Since this applies only from the
> next iteration it assumes that the sequences generated (in particular seq2)
> doesn't clobber the condition rtx itself before using it in the if_then_else,
> which is only true in specific cases (currently only register/subregister moves
> are allowed).
>
> This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
> the current iteration. This makes it possible to include arithmetic operations
> in noce_convert_multiple_sets.
>
> gcc/ChangeLog:
>
> 	* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
> 	(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
> (no changes since v1)
>
>  gcc/ifcvt.cc | 49 +++++++++++++++++++------------------------------
>  1 file changed, 19 insertions(+), 30 deletions(-)

Sorry for the slow review.  TBH I was hoping someone else would pick
it up, since (a) I'm not very familiar with this code, and (b) I don't
really agree with the way that the current code works.  I'm not sure the
current dependency checking is safe, so I'm nervous about adding even
more cases to it.  And it feels like the different ifcvt techniques are
not now well distinguished, so that they're beginning to overlap and
compete with one another.  None of that is your fault, of course. :)

> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index a0af553b9ff..3273aeca125 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3375,20 +3375,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>    return true;
>  }
>  
> -/* Helper function for noce_convert_multiple_sets_1.  If store to
> -   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
> -
> -static void
> -check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
> -{
> -  rtx *p = (rtx *) p0;
> -  if (p[0] == NULL_RTX)
> -    return;
> -  if (reg_overlap_mentioned_p (dest, p[0])
> -      || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
> -    p[0] = NULL_RTX;
> -}
> -
>  /* This goes through all relevant insns of IF_INFO->then_bb and tries to
>     create conditional moves.  In case a simple move sufficis the insn
>     should be listed in NEED_NO_CMOV.  The rewired-src cases should be
> @@ -3552,9 +3538,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>  	 creating an additional compare for each.  If successful, costing
>  	 is easier and this sequence is usually preferred.  */
>        if (cc_cmp)
> -	seq2 = try_emit_cmove_seq (if_info, temp, cond,
> -				   new_val, old_val, need_cmov,
> -				   &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> +	{
> +	  seq2 = try_emit_cmove_seq (if_info, temp, cond,
> +				     new_val, old_val, need_cmov,
> +				     &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> +
> +	  /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
> +	     clobbered.  We can't safely use the sequence in this case.  */
> +	  if (seq2 && (modified_in_p (cc_cmp, seq2)
> +	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2))))
> +	    seq2 = NULL;

modified_in_p only checks the first instruction in seq2, not the whole
sequence.

I think the unpatched approach is OK in cases where seq2 clobbers
cc_cmp/rev_cc_cmp in or after the last use, since instructions are
defined to operate on a read-all/compute/write-all basis.

Soon after the snippet above, the unpatched code has this loop:

      /* The backend might have created a sequence that uses the
	 condition.  Check this.  */
      rtx_insn *walk = seq2;
      while (walk)
	{
	  rtx set = single_set (walk);

	  if (!set || !SET_SRC (set))

This condition looks odd.  A SET_SRC is never null.  But more importantly,
continuing means "assume the best", and I don't think we should assume
the best for (say) an insn with two parallel sets.

It doesn't look like the series addresses this, but !set seems more
likely to occur if we extend the function to general operations.

	    {
	      walk = NEXT_INSN (walk);
	      continue;
	    }

	  rtx src = SET_SRC (set);

	  if (XEXP (set, 1) && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
	    ; /* We assume that this is the cmove created by the backend that
		 naturally uses the condition.  Therefore we ignore it.  */

XEXP (set, 1) is src.  But here too, I'm a bit nervous about assuming
that any if_then_else is safe to ignore, even currently.  It seems
more dangerous if we extend the function to handle more cases.

	  else
	    {
	      if (reg_mentioned_p (XEXP (cond, 0), src)
		  || reg_mentioned_p (XEXP (cond, 1), src))
		{
		  read_comparison = true;
		  break;
		}
	    }

	  walk = NEXT_INSN (walk);
	}

Maybe a safer way to check whether "insn" is sensitive to the values
in "val" is to use:

  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, val, NONCONST)
    if (OBJECT_P (*iter) && reg_overlap_mentioned_p (*iter, insn))
      return true;
  return false;

which doesn't assume that insn is a single set.

But I'm not sure which cases this code is trying to catch.  Is it trying
to catch cases where seq2 "spontaneously" uses registers that happen to
overlap with cond?  If so, then when does that happen?  And if it does
happen, wouldn't the sequence also have to set the registers first?

If instead the code is trying to detect uses of cond that were fed
to seq2 outside of cond itself, via try_emit_cmove_seq, then wouldn't
it be enough to check old_val and new_val?

Anyway, the point I was trying to get to was: we could use the
FOR_EACH_SUBRTX above to check whether an instruction in seq2
is sensitive to the value of cc_cmp/rev_cc_cmp.  And we can use
modified_in_p, as in your patch, to check whether an instruction
changes the value of cc_cmp/rev_cc_cmp.  We could therefore record
whether we've seen a modification of cc_cmp/rev_cc_cmp, then reject
seq2 if we see a subsequent use.

Thanks,
Richard

> +	}
>  
>        /* The backend might have created a sequence that uses the
>  	 condition.  Check this.  */
> @@ -3609,21 +3603,16 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>  	  return false;
>  	}
>  
> -      if (cc_cmp)
> +      if (cc_cmp && seq == seq1)
>  	{
> -	  /* Check if SEQ can clobber registers mentioned in
> -	     cc_cmp and/or rev_cc_cmp.  If yes, we need to use
> -	     only seq1 from that point on.  */
> -	  rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
> -	  for (walk = seq; walk; walk = NEXT_INSN (walk))
> +	  /* Check if SEQ can clobber registers mentioned in cc_cmp/rev_cc_cmp.
> +	     If yes, we need to use only seq1 from that point on.
> +	     Only check when we use seq1 since we have already tested seq2.  */
> +	  if (modified_in_p (cc_cmp, seq)
> +	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq)))
>  	    {
> -	      note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
> -	      if (cc_cmp_pair[0] == NULL_RTX)
> -		{
> -		  cc_cmp = NULL_RTX;
> -		  rev_cc_cmp = NULL_RTX;
> -		  break;
> -		}
> +	      cc_cmp = NULL_RTX;
> +	      rev_cc_cmp = NULL_RTX;
>  	    }
>  	}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion
  2023-08-30 10:13 ` [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
@ 2023-10-19 19:46   ` Richard Sandiford
  2023-11-10 21:53     ` Jeff Law
                       ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Richard Sandiford @ 2023-10-19 19:46 UTC (permalink / raw)
  To: Manolis Tsamis; +Cc: gcc-patches, Philipp Tomsich, Robin Dapp, Jakub Jelinek

Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> Currently the operations allowed for if conversion of a basic block with
> multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
> bb_ok_for_noce_convert_multiple_sets).
>
> This commit allows more operations (arithmetic, compare, etc) to participate
> in if conversion. The target's profitability hook and ifcvt's costing is
> expected to reject sequences that are unprofitable.
>
> This is especially useful for targets which provide a rich selection of
> conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
> which are currently not used in basic blocks with more than a single set.
>
> gcc/ChangeLog:
>
> 	* ifcvt.cc (try_emit_cmove_seq): Modify comments.
> 	(noce_convert_multiple_sets_1): Modify comments.
> 	(bb_ok_for_noce_convert_multiple_sets): Allow more operations.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
> Changes in v3:
>         - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
>         - Allow rewiring of multiple regs.
>         - Refactor code with noce_multiple_sets_info.
>         - Remove old code for subregs.
>
>  gcc/ifcvt.cc                                  | 63 ++++++++++-----
>  .../aarch64/ifcvt_multiple_sets_arithm.c      | 79 +++++++++++++++++++
>  2 files changed, 123 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 3273aeca125..efe8ab1577a 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3215,13 +3215,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
>  /* We have something like:
>  
>       if (x > y)
> -       { i = a; j = b; k = c; }
> +       { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
>  
>     Make it:
>  
> -     tmp_i = (x > y) ? a : i;
> -     tmp_j = (x > y) ? b : j;
> -     tmp_k = (x > y) ? c : k;
> +     tmp_i = (x > y) ? EXPR_A : i;
> +     tmp_j = (x > y) ? EXPR_B : j;
> +     tmp_k = (x > y) ? EXPR_C : k;
>       i = tmp_i;
>       j = tmp_j;
>       k = tmp_k;
> @@ -3637,11 +3637,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>  
>  
>  
> -/* Return true iff basic block TEST_BB is comprised of only
> -   (SET (REG) (REG)) insns suitable for conversion to a series
> -   of conditional moves.  Also check that we have more than one set
> -   (other routines can handle a single set better than we would), and
> -   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> +/* Return true iff basic block TEST_BB is suitable for conversion to a
> +   series of conditional moves.  Also check that we have more than one
> +   set (other routines can handle a single set better than we would),
> +   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
>     through the insns store the sum of their potential costs in COST.  */
>  
>  static bool
> @@ -3667,20 +3666,46 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
>        rtx dest = SET_DEST (set);
>        rtx src = SET_SRC (set);
>  
> -      /* We can possibly relax this, but for now only handle REG to REG
> -	 (including subreg) moves.  This avoids any issues that might come
> -	 from introducing loads/stores that might violate data-race-freedom
> -	 guarantees.  */
> -      if (!REG_P (dest))
> +      /* Do not handle anything involving memory loads/stores since it might
> +	 violate data-race-freedom guarantees.  */
> +      if (!REG_P (dest) || contains_mem_rtx_p (src))
> +	return false;
> +
> +      if (!SCALAR_INT_MODE_P (GET_MODE (src)))
>  	return false;
>  
> -      if (!((REG_P (src) || CONSTANT_P (src))
> -	    || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> -	      && subreg_lowpart_p (src))))
> +      /* Allow a wide range of operations and let the costing function decide
> +	 if the conversion is worth it later.  */
> +      enum rtx_code code = GET_CODE (src);
> +      if (!(CONSTANT_P (src)
> +	    || code == REG
> +	    || code == SUBREG
> +	    || code == ZERO_EXTEND
> +	    || code == SIGN_EXTEND
> +	    || code == NOT
> +	    || code == NEG
> +	    || code == PLUS
> +	    || code == MINUS
> +	    || code == AND
> +	    || code == IOR
> +	    || code == MULT
> +	    || code == ASHIFT
> +	    || code == ASHIFTRT
> +	    || code == NE
> +	    || code == EQ
> +	    || code == GE
> +	    || code == GT
> +	    || code == LE
> +	    || code == LT
> +	    || code == GEU
> +	    || code == GTU
> +	    || code == LEU
> +	    || code == LTU
> +	    || code == COMPARE))
>  	return false;

I'm nervous about lists of operations like these, for two reasons:

(1) It isn't obvious what criteria are used to select the codes.

(2) It requires the top-level code to belong to a given set, but it
    allows subrtxes of src to be arbitrarily complex.  E.g. (to pick
    a silly example) a toplevel (popcount ...) would be rejected, but
    (plus (popcount ...) (const_int 1)) would be OK.

Could we just remove this condition instead?

Thanks,
Richard

>  
> -      /* Destination must be appropriate for a conditional write.  */
> -      if (!noce_operand_ok (dest))
> +      /* Destination and source must be appropriate.  */
> +      if (!noce_operand_ok (dest) || !noce_operand_ok (src))
>  	return false;
>  
>        /* We must be able to conditionally move in this mode.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> new file mode 100644
> index 00000000000..d977f4d62ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> @@ -0,0 +1,79 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ce1" } */
> +
> +void sink2(int, int);
> +void sink3(int, int, int);
> +
> +void cond1(int cond, int x, int y)
> +{
> +  if (cond)
> +    {
> +      x = x << 4;
> +      y = 1;
> +    }
> +
> +  sink2(x, y);
> +}
> +
> +void cond2(int cond, int x, int y)
> +{
> +  if (cond)
> +    {
> +      x++;
> +      y++;
> +    }
> +
> +  sink2(x, y);
> +}
> +
> +void cond3(int cond, int x1, int x2, int x3)
> +{
> +  if (cond)
> +    {
> +      x1++;
> +      x2++;
> +      x3++;
> +    }
> +
> +  sink3(x1, x2, x3);
> +}
> +
> +void cond4(int cond, int x, int y)
> +{
> +  if (cond)
> +    {
> +      x += 2;
> +      y += 3;
> +    }
> +
> +  sink2(x, y);
> +}
> +
> +void cond5(int cond, int x, int y, int r1, int r2)
> +{
> +  if (cond)
> +    {
> +      x = r1 + 2;
> +      y = r2 - 34;
> +    }
> +
> +  sink2(x, y);
> +}
> +
> +void cond6(int cond, int x, int y)
> +{
> +  if (cond)
> +    {
> +      x = -x;
> +      y = ~y;
> +    }
> +
> +  sink2(x, y);
> +}
> +
> +/* { dg-final { scan-assembler-times "cinc\t" 5 } } */
> +/* { dg-final { scan-assembler-times "csneg\t" 1 } } */
> +/* { dg-final { scan-assembler-times "csinv\t" 1 } } */
> +/* { dg-final { scan-assembler "csel\t" } } */
> +
> +/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_convert_multiple_sets" 6 "ce1" } } */
> \ No newline at end of file

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-10-19 19:41   ` Richard Sandiford
@ 2023-10-20  7:04     ` Robin Dapp
  2023-10-20  9:16       ` Richard Sandiford
  2023-11-10 21:31       ` Jeff Law
  2023-11-10 21:25     ` Jeff Law
  2024-04-23 10:58     ` Manolis Tsamis
  2 siblings, 2 replies; 24+ messages in thread
From: Robin Dapp @ 2023-10-20  7:04 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Jakub Jelinek,
	richard.sandiford
  Cc: rdapp.gcc

> Sorry for the slow review.  TBH I was hoping someone else would pick
> it up, since (a) I'm not very familiar with this code, and (b) I don't
> really agree with the way that the current code works.  I'm not sure the
> current dependency checking is safe, so I'm nervous about adding even
> more cases to it.  And it feels like the different ifcvt techniques are
> not now well distinguished, so that they're beginning to overlap and
> compete with one another.  None of that is your fault, of course. :)

I might be to blame, at least partially :)  The idea back then was to
do it here because (1) it can handle cases the other part cannot and
(2) its costing is based on sequence cost rather than just counting
instructions.

> This condition looks odd.  A SET_SRC is never null.  But more importantly,
> continuing means "assume the best", and I don't think we should assume
> the best for (say) an insn with two parallel sets.

IIRC I didn't consider two parallel sets at all :/

> But I'm not sure which cases this code is trying to catch.  Is it trying
> to catch cases where seq2 "spontaneously" uses registers that happen to
> overlap with cond?  If so, then when does that happen?  And if it does
> happen, wouldn't the sequence also have to set the registers first?

In order for sequence costing to be superior to just counting "conditional"
instructions we need to make sure that as few redundant instructions as
possible are present in the costed sequences. (redundant as in "will be
removed in a subsequent pass").

So, originally, the backend would always be passed a comparison
(set cc (cmp a b)).  On e.g. s390 this would result in at least two
redundant instructions per conditional move so costing was always wrong.
When passing the backend the comparison "result" e.g. (cmp cc 0)
there is no redundant instruction.

Now, passing the comparison is useful as well when we want to turn
a conditional move into a min/max in the backend.  This, however,
overwrites the initial condition result and any subsequent iterations
must not pass the result but the comparison itself to the backend.

In my first approach I hadn't considered several special cases which,
of course, complicated things further down the line.

All that said - maybe trying hard to get rid of later-redundant insns
is not a good approach after all?  A lot of complexity would go away
if we counted emitted conditional instructions just like the other
ifcvt part.  Maybe a middle ground would be to cost the initial
sequence as well as if + jmp and compare this against insn_cost of
only the created conditional insns?

In that case we might need to tighten/audit our preconditions in order
not to accidentally allow cases that result in strictly worse code.
But maybe that's an easier problem than what we are currently solving?

Regards
 Robin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-10-20  7:04     ` Robin Dapp
@ 2023-10-20  9:16       ` Richard Sandiford
  2023-11-10 21:48         ` Jeff Law
  2023-11-10 21:31       ` Jeff Law
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2023-10-20  9:16 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Manolis Tsamis, gcc-patches, Philipp Tomsich, Jakub Jelinek

Thanks for the context.

Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Sorry for the slow review.  TBH I was hoping someone else would pick
>> it up, since (a) I'm not very familiar with this code, and (b) I don't
>> really agree with the way that the current code works.  I'm not sure the
>> current dependency checking is safe, so I'm nervous about adding even
>> more cases to it.  And it feels like the different ifcvt techniques are
>> not now well distinguished, so that they're beginning to overlap and
>> compete with one another.  None of that is your fault, of course. :)
>
> I might be to blame, at least partially :)  The idea back then was to
> do it here because (1) it can handle cases the other part cannot and
> (2) its costing is based on sequence cost rather than just counting
> instructions.

Ah, OK.  (2) seems like a good reason.

>> This condition looks odd.  A SET_SRC is never null.  But more importantly,
>> continuing means "assume the best", and I don't think we should assume
>> the best for (say) an insn with two parallel sets.
>
> IIRC I didn't consider two parallel sets at all :/
>
>> But I'm not sure which cases this code is trying to catch.  Is it trying
>> to catch cases where seq2 "spontaneously" uses registers that happen to
>> overlap with cond?  If so, then when does that happen?  And if it does
>> happen, wouldn't the sequence also have to set the registers first?
>
> In order for sequence costing to be superior to just counting "conditional"
> instructions we need to make sure that as few redundant instructions as
> possible are present in the costed sequences. (redundant as in "will be
> removed in a subsequent pass").
>  
> So, originally, the backend would always be passed a comparison
> (set cc (cmp a b)).  On e.g. s390 this would result in at least two
> redundant instructions per conditional move so costing was always wrong.
> When passing the backend the comparison "result" e.g. (cmp cc 0)
> there is no redundant instruction.
>
> Now, passing the comparison is useful as well when we want to turn
> a conditional move into a min/max in the backend.  This, however,
> overwrites the initial condition result and any subsequent iterations
> must not pass the result but the comparison itself to the backend.

Yeah, makes sense.  Using your example, there seem to be two things
that we're checking:

(1) Does the sequence change cc?  This is checked via:

      if (cc_cmp)
	{
	  /* Check if SEQ can clobber registers mentioned in
	     cc_cmp and/or rev_cc_cmp.  If yes, we need to use
	     only seq1 from that point on.  */
	  rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
	  for (walk = seq; walk; walk = NEXT_INSN (walk))
	    {
	      note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
	      if (cc_cmp_pair[0] == NULL_RTX)
		{
		  cc_cmp = NULL_RTX;
		  rev_cc_cmp = NULL_RTX;
		  break;
		}
	    }
	}

    and is the case that Manolis's patch is dealing with.

(2) Does the sequence use a and b?  If so, we need to use temporary
    destinations for any earlier writes to a and b.

Is that right?

(1) looks OK, although Manolis's modified_in_p would work too.
(2) is the code I quoted yesterday and is the part I'm not sure
about.  First of all:

      seq1 = try_emit_cmove_seq (if_info, temp, cond,
				 new_val, old_val, need_cmov,
				 &cost1, &temp_dest1);

must have a consistent view of what a and b are.  So old_val and new_val
cannot at this point reference "newer" values of a and b (as set by previous
instructions in the sequence).  AIUI:

      if (int *ii = rewired_src->get (insn))
	new_val = simplify_replace_rtx (new_val, (*targets)[*ii],
					(*temporaries)[*ii]);

is the code that ensures this.  If a previous write to a and b has been
replaced by a temporary, this code substitutes that temporary into new_val.

The same cond, new_val and old_val are used in:

	seq2 = try_emit_cmove_seq (if_info, temp, cond,
				   new_val, old_val, need_cmov,
				   &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);

So won't any use of a and b in seq2 to be from cond, rather than old_val
and new_val?  If so, shouldn't we set read_comparison for any use of a
and b, rather than skipping IF_THEN_ELSE?

> In my first approach I hadn't considered several special cases which,
> of course, complicated things further down the line.
>
> All that said - maybe trying hard to get rid of later-redundant insns
> is not a good approach after all?  A lot of complexity would go away
> if we counted emitted conditional instructions just like the other
> ifcvt part.  Maybe a middle ground would be to cost the initial
> sequence as well as if + jmp and compare this against insn_cost of
> only the created conditional insns?

Using seq_cost seems like the best way of costing things.  And I agree
that it's worth trying to avoid costing (and generating) redundant
instructions.

> In that case we might need to tighten/audit our preconditions in order
> not to accidentally allow cases that result in strictly worse code.
> But maybe that's an easier problem than what we are currently solving?

Not sure :)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-10-19 19:41   ` Richard Sandiford
  2023-10-20  7:04     ` Robin Dapp
@ 2023-11-10 21:25     ` Jeff Law
  2024-04-23 10:58     ` Manolis Tsamis
  2 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2023-11-10 21:25 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Robin Dapp,
	Jakub Jelinek, richard.sandiford



On 10/19/23 13:41, Richard Sandiford wrote:
> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
>> This is an extension of what was done in PR106590.
>>
>> Currently if a sequence generated in noce_convert_multiple_sets clobbers the
>> condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
>> (sequences that emit the comparison itself). Since this applies only from the
>> next iteration it assumes that the sequences generated (in particular seq2)
>> doesn't clobber the condition rtx itself before using it in the if_then_else,
>> which is only true in specific cases (currently only register/subregister moves
>> are allowed).
>>
>> This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
>> the current iteration. This makes it possible to include arithmetic operations
>> in noce_convert_multiple_sets.
>>
>> gcc/ChangeLog:
>>
>> 	* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
>> 	(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
>>
>> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
>> ---
>>
>> (no changes since v1)
>>
>>   gcc/ifcvt.cc | 49 +++++++++++++++++++------------------------------
>>   1 file changed, 19 insertions(+), 30 deletions(-)
> 
> Sorry for the slow review.  TBH I was hoping someone else would pick
> it up, since (a) I'm not very familiar with this code, and (b) I don't
> really agree with the way that the current code works.  I'm not sure the
> current dependency checking is safe, so I'm nervous about adding even
> more cases to it.  And it feels like the different ifcvt techniques are
> not now well distinguished, so that they're beginning to overlap and
> compete with one another.  None of that is your fault, of course. :)
I'd been hoping to get it it as well, particularly since I've got a TODO 
to sort out the conditional zero support in ifcvt.cc from various 
contibutors.  While there isn't any overlap I can see between that work 
and this submission from Manolis, it's close enough that if I'm going to 
get re-familiar with ifcvt.cc I figured I'd try to handle both.


jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-10-20  7:04     ` Robin Dapp
  2023-10-20  9:16       ` Richard Sandiford
@ 2023-11-10 21:31       ` Jeff Law
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff Law @ 2023-11-10 21:31 UTC (permalink / raw)
  To: Robin Dapp, Manolis Tsamis, gcc-patches, Philipp Tomsich,
	Jakub Jelinek, richard.sandiford



On 10/20/23 01:04, Robin Dapp wrote:
> 
>> But I'm not sure which cases this code is trying to catch.  Is it trying
>> to catch cases where seq2 "spontaneously" uses registers that happen to
>> overlap with cond?  If so, then when does that happen?  And if it does
>> happen, wouldn't the sequence also have to set the registers first?
> 
> In order for sequence costing to be superior to just counting "conditional"
> instructions we need to make sure that as few redundant instructions as
> possible are present in the costed sequences. (redundant as in "will be
> removed in a subsequent pass").
[ ... ]
Sounds a lot like a scenario we had with threading.  Threading will 
often generate code in duplicated blocks that will trivially be eliminated.

IIRC I had Alex O. tackle that in threading several years back with good 
success.  I don't think he tried to be exhaustive, just sensible in what 
was likely to be dead after threading.  It helped numerous cases where 
we clearly should have threaded, but weren't because of artificially 
high costing.

It's not a trivial balancing act.

Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-10-20  9:16       ` Richard Sandiford
@ 2023-11-10 21:48         ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2023-11-10 21:48 UTC (permalink / raw)
  To: Robin Dapp, Manolis Tsamis, gcc-patches, Philipp Tomsich,
	Jakub Jelinek, richard.sandiford



On 10/20/23 03:16, Richard Sandiford wrote:
> Thanks for the context.
> 
> Robin Dapp <rdapp.gcc@gmail.com> writes:
>>> Sorry for the slow review.  TBH I was hoping someone else would pick
>>> it up, since (a) I'm not very familiar with this code, and (b) I don't
>>> really agree with the way that the current code works.  I'm not sure the
>>> current dependency checking is safe, so I'm nervous about adding even
>>> more cases to it.  And it feels like the different ifcvt techniques are
>>> not now well distinguished, so that they're beginning to overlap and
>>> compete with one another.  None of that is your fault, of course. :)
>>
>> I might be to blame, at least partially :)  The idea back then was to
>> do it here because (1) it can handle cases the other part cannot and
>> (2) its costing is based on sequence cost rather than just counting
>> instructions.
> 
> Ah, OK.  (2) seems like a good reason.
Agreed.  It's been a problem area (costing ifcvt), but it's still the 
right thing to do.  No doubt if we change something from counting insns 
to sequence costing it'll cause another set of problems, but that 
shouldn't stop us from doing the right thing here.


> 
> Yeah, makes sense.  Using your example, there seem to be two things
> that we're checking:
> 
> (1) Does the sequence change cc?  This is checked via:
> 
>        if (cc_cmp)
> 	{
> 	  /* Check if SEQ can clobber registers mentioned in
> 	     cc_cmp and/or rev_cc_cmp.  If yes, we need to use
> 	     only seq1 from that point on.  */
> 	  rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
> 	  for (walk = seq; walk; walk = NEXT_INSN (walk))
> 	    {
> 	      note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
> 	      if (cc_cmp_pair[0] == NULL_RTX)
> 		{
> 		  cc_cmp = NULL_RTX;
> 		  rev_cc_cmp = NULL_RTX;
> 		  break;
> 		}
> 	    }
> 	}
> 
>      and is the case that Manolis's patch is dealing with.
> 
> (2) Does the sequence use a and b?  If so, we need to use temporary
>      destinations for any earlier writes to a and b.
> 
> Is that right?
> 
> (1) looks OK, although Manolis's modified_in_p would work too.
Agreed.


> (2) is the code I quoted yesterday and is the part I'm not sure
> about.  First of all:
> 
>        seq1 = try_emit_cmove_seq (if_info, temp, cond,
> 				 new_val, old_val, need_cmov,
> 				 &cost1, &temp_dest1);
> 
> must have a consistent view of what a and b are.  So old_val and new_val
> cannot at this point reference "newer" values of a and b (as set by previous
> instructions in the sequence).  AIUI:
Sigh.  ifcvt seems to pervasively adjust arguments, then you have to 
figure out which one is the right one for any given context.  I was 
driving me nuts a couple weeks ago when I was looking at the condzero 
work.  It's part of why I set everything down at the time.  I ran into 
it in the VRULL code, Ventana's hacks on top of the VRULL code and in 
the ESWIN code, got frustrated and decided to look at something else for 
a bit (which has led to its own little rathole).





> 
> The same cond, new_val and old_val are used in:
> 
> 	seq2 = try_emit_cmove_seq (if_info, temp, cond,
> 				   new_val, old_val, need_cmov,
> 				   &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> 
> So won't any use of a and b in seq2 to be from cond, rather than old_val
> and new_val?  If so, shouldn't we set read_comparison for any use of a
> and b, rather than skipping IF_THEN_ELSE?
Seems like it to me, yes.



> 
> Using seq_cost seems like the best way of costing things.  And I agree
> that it's worth trying to avoid costing (and generating) redundant
> instructions.
I think there's general agreement on seq_cost.  I wonder if we should 
look to split that out on its own, then figure out what to do with the 
bigger issues in this space.

Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion
  2023-10-19 19:46   ` Richard Sandiford
@ 2023-11-10 21:53     ` Jeff Law
  2023-11-13 12:47     ` Manolis Tsamis
  2024-04-23 11:00     ` Manolis Tsamis
  2 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2023-11-10 21:53 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Robin Dapp,
	Jakub Jelinek, richard.sandiford



On 10/19/23 13:46, Richard Sandiford wrote:
>> +      /* Allow a wide range of operations and let the costing function decide
>> +	 if the conversion is worth it later.  */
>> +      enum rtx_code code = GET_CODE (src);
>> +      if (!(CONSTANT_P (src)
>> +	    || code == REG
>> +	    || code == SUBREG
>> +	    || code == ZERO_EXTEND
>> +	    || code == SIGN_EXTEND
>> +	    || code == NOT
>> +	    || code == NEG
>> +	    || code == PLUS
>> +	    || code == MINUS
>> +	    || code == AND
>> +	    || code == IOR
>> +	    || code == MULT
>> +	    || code == ASHIFT
>> +	    || code == ASHIFTRT
>> +	    || code == NE
>> +	    || code == EQ
>> +	    || code == GE
>> +	    || code == GT
>> +	    || code == LE
>> +	    || code == LT
>> +	    || code == GEU
>> +	    || code == GTU
>> +	    || code == LEU
>> +	    || code == LTU
>> +	    || code == COMPARE))
>>   	return false;
> 
> I'm nervous about lists of operations like these, for two reasons:
> 
> (1) It isn't obvious what criteria are used to select the codes.
> 
> (2) It requires the top-level code to belong to a given set, but it
>      allows subrtxes of src to be arbitrarily complex.  E.g. (to pick
>      a silly example) a toplevel (popcount ...) would be rejected, but
>      (plus (popcount ...) (const_int 1)) would be OK.
> 
> Could we just remove this condition instead?
I'd be all for that.    We've actually got a similar problem in Joern's 
ext-dce code that I'm working on.  At least in that case I think we'll 
be able to enumerate how/why things are on the list if we still need it 
after the cleanup phase.

So I think the guidance on patch #2 would be to remove the list entirely 
if we can.

jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets
  2023-08-30 10:14 ` [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets Manolis Tsamis
@ 2023-11-10 22:03   ` Jeff Law
  2023-11-13 12:43     ` Manolis Tsamis
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2023-11-10 22:03 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches; +Cc: Jakub Jelinek, Richard Sandiford



On 8/30/23 04:14, Manolis Tsamis wrote:
> This code used to handle register replacement issues with SUBREG before
> simplify_replace_rtx was introduced. This should not be needed anymore as
> new_val has the correct mode and that should be preserved by
> simplify_replace_rtx.
> 
> gcc/ChangeLog:
> 
> 	* ifcvt.cc (noce_convert_multiple_sets_1): Remove old code.
So is it the case that this code is supposed to no longer be needed as a 
result of your kit or it is unnecessary independent of patches 1..3?  If 
the latter then it's OK for the trunk now.

Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets
  2023-08-30 10:13 ` [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets Manolis Tsamis
@ 2023-11-10 23:20   ` Jeff Law
  2023-11-13 12:40     ` Manolis Tsamis
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2023-11-10 23:20 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches; +Cc: Jakub Jelinek, Richard Sandiford



On 8/30/23 04:13, Manolis Tsamis wrote:
> The existing implementation of need_cmov_or_rewire and
> noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
> This commit enchances them so they can handle/rewire arbitrary set statements.
> 
> To do that a new helper struct noce_multiple_sets_info is introduced which is
> used by noce_convert_multiple_sets and its helper functions. This results in
> cleaner function signatures, improved efficientcy (a number of vecs and hash
> set/map are replaced with a single vec of struct) and simplicity.
> 
> gcc/ChangeLog:
> 
> 	* ifcvt.cc (need_cmov_or_rewire): Renamed init_noce_multiple_sets_info.
> 	(init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
> 	(noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
> 	rewiring of multiple registers.
> 	(noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
> 	* ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
> 	noce_multiple_sets_info to store info for noce_convert_multiple_sets.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.
So this seems like (in theory) it could move forward independently.  The 
handling of arbitrary statements code wouldn't be exercised yet, but 
that's OK IMHO as I don't think anyone is fundamentally against trying 
to handle additional kinds of statements.

So my suggestion would be to bootstrap & regression test this 
independently.  AFAICT this should have no functional change if it were 
to go in on its own.  Note the testsuite entry might not be applicable 
if this were to go in on its own and would need to roll into another 
patch in the series.


Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets
  2023-11-10 23:20   ` Jeff Law
@ 2023-11-13 12:40     ` Manolis Tsamis
  2023-11-21 18:10       ` Manolis Tsamis
  0 siblings, 1 reply; 24+ messages in thread
From: Manolis Tsamis @ 2023-11-13 12:40 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Jakub Jelinek, Richard Sandiford

Hi Jeff,

Indeed, that sounds like a good idea. I will make this separate and
send it after the required testing.
I'll see what can be done about a testcase.

Best,
Manolis

On Sat, Nov 11, 2023 at 1:20 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 8/30/23 04:13, Manolis Tsamis wrote:
> > The existing implementation of need_cmov_or_rewire and
> > noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
> > This commit enchances them so they can handle/rewire arbitrary set statements.
> >
> > To do that a new helper struct noce_multiple_sets_info is introduced which is
> > used by noce_convert_multiple_sets and its helper functions. This results in
> > cleaner function signatures, improved efficientcy (a number of vecs and hash
> > set/map are replaced with a single vec of struct) and simplicity.
> >
> > gcc/ChangeLog:
> >
> >       * ifcvt.cc (need_cmov_or_rewire): Renamed init_noce_multiple_sets_info.
> >       (init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
> >       (noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
> >       rewiring of multiple registers.
> >       (noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
> >       * ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
> >       noce_multiple_sets_info to store info for noce_convert_multiple_sets.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.
> So this seems like (in theory) it could move forward independently.  The
> handling of arbitrary statements code wouldn't be exercised yet, but
> that's OK IMHO as I don't think anyone is fundamentally against trying
> to handle additional kinds of statements.
>
> So my suggestion would be to bootstrap & regression test this
> independently.  AFAICT this should have no functional change if it were
> to go in on its own.  Note the testsuite entry might not be applicable
> if this were to go in on its own and would need to roll into another
> patch in the series.
>
>
> Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets
  2023-11-10 22:03   ` Jeff Law
@ 2023-11-13 12:43     ` Manolis Tsamis
  2023-11-21 18:08       ` Manolis Tsamis
  0 siblings, 1 reply; 24+ messages in thread
From: Manolis Tsamis @ 2023-11-13 12:43 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Jakub Jelinek, Richard Sandiford

Yes, my finding back then was that this is leftover code from the
initial implementation, nothing to do with the rest of the changes.
I will first re-evaluate this and test it separately from the other
series. If all is good I'll let you know so we can proceed.

Manolis

On Sat, Nov 11, 2023 at 12:03 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 8/30/23 04:14, Manolis Tsamis wrote:
> > This code used to handle register replacement issues with SUBREG before
> > simplify_replace_rtx was introduced. This should not be needed anymore as
> > new_val has the correct mode and that should be preserved by
> > simplify_replace_rtx.
> >
> > gcc/ChangeLog:
> >
> >       * ifcvt.cc (noce_convert_multiple_sets_1): Remove old code.
> So is it the case that this code is supposed to no longer be needed as a
> result of your kit or it is unnecessary independent of patches 1..3?  If
> the latter then it's OK for the trunk now.
>
> Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion
  2023-10-19 19:46   ` Richard Sandiford
  2023-11-10 21:53     ` Jeff Law
@ 2023-11-13 12:47     ` Manolis Tsamis
  2024-04-23 11:00     ` Manolis Tsamis
  2 siblings, 0 replies; 24+ messages in thread
From: Manolis Tsamis @ 2023-11-13 12:47 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Robin Dapp,
	Jakub Jelinek, richard.sandiford

On Thu, Oct 19, 2023 at 10:46 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> > Currently the operations allowed for if conversion of a basic block with
> > multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
> > bb_ok_for_noce_convert_multiple_sets).
> >
> > This commit allows more operations (arithmetic, compare, etc) to participate
> > in if conversion. The target's profitability hook and ifcvt's costing is
> > expected to reject sequences that are unprofitable.
> >
> > This is especially useful for targets which provide a rich selection of
> > conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
> > which are currently not used in basic blocks with more than a single set.
> >
> > gcc/ChangeLog:
> >
> >       * ifcvt.cc (try_emit_cmove_seq): Modify comments.
> >       (noce_convert_multiple_sets_1): Modify comments.
> >       (bb_ok_for_noce_convert_multiple_sets): Allow more operations.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >
> > Changes in v3:
> >         - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
> >         - Allow rewiring of multiple regs.
> >         - Refactor code with noce_multiple_sets_info.
> >         - Remove old code for subregs.
> >
> >  gcc/ifcvt.cc                                  | 63 ++++++++++-----
> >  .../aarch64/ifcvt_multiple_sets_arithm.c      | 79 +++++++++++++++++++
> >  2 files changed, 123 insertions(+), 19 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> >
> > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> > index 3273aeca125..efe8ab1577a 100644
> > --- a/gcc/ifcvt.cc
> > +++ b/gcc/ifcvt.cc
> > @@ -3215,13 +3215,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
> >  /* We have something like:
> >
> >       if (x > y)
> > -       { i = a; j = b; k = c; }
> > +       { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
> >
> >     Make it:
> >
> > -     tmp_i = (x > y) ? a : i;
> > -     tmp_j = (x > y) ? b : j;
> > -     tmp_k = (x > y) ? c : k;
> > +     tmp_i = (x > y) ? EXPR_A : i;
> > +     tmp_j = (x > y) ? EXPR_B : j;
> > +     tmp_k = (x > y) ? EXPR_C : k;
> >       i = tmp_i;
> >       j = tmp_j;
> >       k = tmp_k;
> > @@ -3637,11 +3637,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
> >
> >
> >
> > -/* Return true iff basic block TEST_BB is comprised of only
> > -   (SET (REG) (REG)) insns suitable for conversion to a series
> > -   of conditional moves.  Also check that we have more than one set
> > -   (other routines can handle a single set better than we would), and
> > -   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> > +/* Return true iff basic block TEST_BB is suitable for conversion to a
> > +   series of conditional moves.  Also check that we have more than one
> > +   set (other routines can handle a single set better than we would),
> > +   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> >     through the insns store the sum of their potential costs in COST.  */
> >
> >  static bool
> > @@ -3667,20 +3666,46 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
> >        rtx dest = SET_DEST (set);
> >        rtx src = SET_SRC (set);
> >
> > -      /* We can possibly relax this, but for now only handle REG to REG
> > -      (including subreg) moves.  This avoids any issues that might come
> > -      from introducing loads/stores that might violate data-race-freedom
> > -      guarantees.  */
> > -      if (!REG_P (dest))
> > +      /* Do not handle anything involving memory loads/stores since it might
> > +      violate data-race-freedom guarantees.  */
> > +      if (!REG_P (dest) || contains_mem_rtx_p (src))
> > +     return false;
> > +
> > +      if (!SCALAR_INT_MODE_P (GET_MODE (src)))
> >       return false;
> >
> > -      if (!((REG_P (src) || CONSTANT_P (src))
> > -         || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> > -           && subreg_lowpart_p (src))))
> > +      /* Allow a wide range of operations and let the costing function decide
> > +      if the conversion is worth it later.  */
> > +      enum rtx_code code = GET_CODE (src);
> > +      if (!(CONSTANT_P (src)
> > +         || code == REG
> > +         || code == SUBREG
> > +         || code == ZERO_EXTEND
> > +         || code == SIGN_EXTEND
> > +         || code == NOT
> > +         || code == NEG
> > +         || code == PLUS
> > +         || code == MINUS
> > +         || code == AND
> > +         || code == IOR
> > +         || code == MULT
> > +         || code == ASHIFT
> > +         || code == ASHIFTRT
> > +         || code == NE
> > +         || code == EQ
> > +         || code == GE
> > +         || code == GT
> > +         || code == LE
> > +         || code == LT
> > +         || code == GEU
> > +         || code == GTU
> > +         || code == LEU
> > +         || code == LTU
> > +         || code == COMPARE))
> >       return false;
>
> I'm nervous about lists of operations like these, for two reasons:
>
> (1) It isn't obvious what criteria are used to select the codes.

Indeed; I was hoping to find some existing GCC predicate that accepts
'common binary ops' or similar, in order to avoid complicated
situations. But it looks like this is mostly unnecessary.
I'll try removing this list and see if there are any issues.

>
> (2) It requires the top-level code to belong to a given set, but it
>     allows subrtxes of src to be arbitrarily complex.  E.g. (to pick
>     a silly example) a toplevel (popcount ...) would be rejected, but
>     (plus (popcount ...) (const_int 1)) would be OK.
>

I have completely missed that, so thanks for mentioning.

> Could we just remove this condition instead?
>

So, yes, will do.

Thanks and sorry for the late reply,
Manolis


> Thanks,
> Richard
>
> >
> > -      /* Destination must be appropriate for a conditional write.  */
> > -      if (!noce_operand_ok (dest))
> > +      /* Destination and source must be appropriate.  */
> > +      if (!noce_operand_ok (dest) || !noce_operand_ok (src))
> >       return false;
> >
> >        /* We must be able to conditionally move in this mode.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> > new file mode 100644
> > index 00000000000..d977f4d62ec
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> > @@ -0,0 +1,79 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-ce1" } */
> > +
> > +void sink2(int, int);
> > +void sink3(int, int, int);
> > +
> > +void cond1(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x = x << 4;
> > +      y = 1;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond2(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x++;
> > +      y++;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond3(int cond, int x1, int x2, int x3)
> > +{
> > +  if (cond)
> > +    {
> > +      x1++;
> > +      x2++;
> > +      x3++;
> > +    }
> > +
> > +  sink3(x1, x2, x3);
> > +}
> > +
> > +void cond4(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x += 2;
> > +      y += 3;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond5(int cond, int x, int y, int r1, int r2)
> > +{
> > +  if (cond)
> > +    {
> > +      x = r1 + 2;
> > +      y = r2 - 34;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond6(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x = -x;
> > +      y = ~y;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "cinc\t" 5 } } */
> > +/* { dg-final { scan-assembler-times "csneg\t" 1 } } */
> > +/* { dg-final { scan-assembler-times "csinv\t" 1 } } */
> > +/* { dg-final { scan-assembler "csel\t" } } */
> > +
> > +/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_convert_multiple_sets" 6 "ce1" } } */
> > \ No newline at end of file

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets
  2023-11-13 12:43     ` Manolis Tsamis
@ 2023-11-21 18:08       ` Manolis Tsamis
  0 siblings, 0 replies; 24+ messages in thread
From: Manolis Tsamis @ 2023-11-21 18:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Jakub Jelinek, Richard Sandiford

Retested, made independent of the rest of the series and submitted as
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637662.html.

Manolis

On Mon, Nov 13, 2023 at 2:43 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> Yes, my finding back then was that this is leftover code from the
> initial implementation, nothing to do with the rest of the changes.
> I will first re-evaluate this and test it separately from the other
> series. If all is good I'll let you know so we can proceed.
>
> Manolis
>
> On Sat, Nov 11, 2023 at 12:03 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >
> >
> >
> > On 8/30/23 04:14, Manolis Tsamis wrote:
> > > This code used to handle register replacement issues with SUBREG before
> > > simplify_replace_rtx was introduced. This should not be needed anymore as
> > > new_val has the correct mode and that should be preserved by
> > > simplify_replace_rtx.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * ifcvt.cc (noce_convert_multiple_sets_1): Remove old code.
> > So is it the case that this code is supposed to no longer be needed as a
> > result of your kit or it is unnecessary independent of patches 1..3?  If
> > the latter then it's OK for the trunk now.
> >
> > Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets
  2023-11-13 12:40     ` Manolis Tsamis
@ 2023-11-21 18:10       ` Manolis Tsamis
  0 siblings, 0 replies; 24+ messages in thread
From: Manolis Tsamis @ 2023-11-21 18:10 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Jakub Jelinek, Richard Sandiford

I have made this independent from the rest of the series, cleaned up
some comments and moved some code to its original position.
Re-submitted as
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637660.html.

Manolis

On Mon, Nov 13, 2023 at 2:40 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> Hi Jeff,
>
> Indeed, that sounds like a good idea. I will make this separate and
> send it after the required testing.
> I'll see what can be done about a testcase.
>
> Best,
> Manolis
>
> On Sat, Nov 11, 2023 at 1:20 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >
> >
> >
> > On 8/30/23 04:13, Manolis Tsamis wrote:
> > > The existing implementation of need_cmov_or_rewire and
> > > noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
> > > This commit enchances them so they can handle/rewire arbitrary set statements.
> > >
> > > To do that a new helper struct noce_multiple_sets_info is introduced which is
> > > used by noce_convert_multiple_sets and its helper functions. This results in
> > > cleaner function signatures, improved efficientcy (a number of vecs and hash
> > > set/map are replaced with a single vec of struct) and simplicity.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * ifcvt.cc (need_cmov_or_rewire): Renamed init_noce_multiple_sets_info.
> > >       (init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
> > >       (noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
> > >       rewiring of multiple registers.
> > >       (noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
> > >       * ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
> > >       noce_multiple_sets_info to store info for noce_convert_multiple_sets.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.
> > So this seems like (in theory) it could move forward independently.  The
> > handling of arbitrary statements code wouldn't be exercised yet, but
> > that's OK IMHO as I don't think anyone is fundamentally against trying
> > to handle additional kinds of statements.
> >
> > So my suggestion would be to bootstrap & regression test this
> > independently.  AFAICT this should have no functional change if it were
> > to go in on its own.  Note the testsuite entry might not be applicable
> > if this were to go in on its own and would need to roll into another
> > patch in the series.
> >
> >
> > Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-10-19 19:41   ` Richard Sandiford
  2023-10-20  7:04     ` Robin Dapp
  2023-11-10 21:25     ` Jeff Law
@ 2024-04-23 10:58     ` Manolis Tsamis
  2 siblings, 0 replies; 24+ messages in thread
From: Manolis Tsamis @ 2024-04-23 10:58 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Robin Dapp,
	Jakub Jelinek, richard.sandiford

On Thu, Oct 19, 2023 at 10:41 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> > This is an extension of what was done in PR106590.
> >
> > Currently if a sequence generated in noce_convert_multiple_sets clobbers the
> > condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
> > (sequences that emit the comparison itself). Since this applies only from the
> > next iteration it assumes that the sequences generated (in particular seq2)
> > doesn't clobber the condition rtx itself before using it in the if_then_else,
> > which is only true in specific cases (currently only register/subregister moves
> > are allowed).
> >
> > This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
> > the current iteration. This makes it possible to include arithmetic operations
> > in noce_convert_multiple_sets.
> >
> > gcc/ChangeLog:
> >
> >       * ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
> >       (noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >
> > (no changes since v1)
> >
> >  gcc/ifcvt.cc | 49 +++++++++++++++++++------------------------------
> >  1 file changed, 19 insertions(+), 30 deletions(-)
>
> Sorry for the slow review.  TBH I was hoping someone else would pick
> it up, since (a) I'm not very familiar with this code, and (b) I don't
> really agree with the way that the current code works.  I'm not sure the
> current dependency checking is safe, so I'm nervous about adding even
> more cases to it.  And it feels like the different ifcvt techniques are
> not now well distinguished, so that they're beginning to overlap and
> compete with one another.  None of that is your fault, of course. :)
>
> > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> > index a0af553b9ff..3273aeca125 100644
> > --- a/gcc/ifcvt.cc
> > +++ b/gcc/ifcvt.cc
> > @@ -3375,20 +3375,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
> >    return true;
> >  }
> >
> > -/* Helper function for noce_convert_multiple_sets_1.  If store to
> > -   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
> > -
> > -static void
> > -check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
> > -{
> > -  rtx *p = (rtx *) p0;
> > -  if (p[0] == NULL_RTX)
> > -    return;
> > -  if (reg_overlap_mentioned_p (dest, p[0])
> > -      || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
> > -    p[0] = NULL_RTX;
> > -}
> > -
> >  /* This goes through all relevant insns of IF_INFO->then_bb and tries to
> >     create conditional moves.  In case a simple move sufficis the insn
> >     should be listed in NEED_NO_CMOV.  The rewired-src cases should be
> > @@ -3552,9 +3538,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
> >        creating an additional compare for each.  If successful, costing
> >        is easier and this sequence is usually preferred.  */
> >        if (cc_cmp)
> > -     seq2 = try_emit_cmove_seq (if_info, temp, cond,
> > -                                new_val, old_val, need_cmov,
> > -                                &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> > +     {
> > +       seq2 = try_emit_cmove_seq (if_info, temp, cond,
> > +                                  new_val, old_val, need_cmov,
> > +                                  &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> > +
> > +       /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
> > +          clobbered.  We can't safely use the sequence in this case.  */
> > +       if (seq2 && (modified_in_p (cc_cmp, seq2)
> > +           || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2))))
> > +         seq2 = NULL;
>
> modified_in_p only checks the first instruction in seq2, not the whole
> sequence.
>
> I think the unpatched approach is OK in cases where seq2 clobbers
> cc_cmp/rev_cc_cmp in or after the last use, since instructions are
> defined to operate on a read-all/compute/write-all basis.
>
> Soon after the snippet above, the unpatched code has this loop:
>
>       /* The backend might have created a sequence that uses the
>          condition.  Check this.  */
>       rtx_insn *walk = seq2;
>       while (walk)
>         {
>           rtx set = single_set (walk);
>
>           if (!set || !SET_SRC (set))
>
> This condition looks odd.  A SET_SRC is never null.  But more importantly,
> continuing means "assume the best", and I don't think we should assume
> the best for (say) an insn with two parallel sets.
>
> It doesn't look like the series addresses this, but !set seems more
> likely to occur if we extend the function to general operations.
>
>             {
>               walk = NEXT_INSN (walk);
>               continue;
>             }
>
>           rtx src = SET_SRC (set);
>
>           if (XEXP (set, 1) && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
>             ; /* We assume that this is the cmove created by the backend that
>                  naturally uses the condition.  Therefore we ignore it.  */
>
> XEXP (set, 1) is src.  But here too, I'm a bit nervous about assuming
> that any if_then_else is safe to ignore, even currently.  It seems
> more dangerous if we extend the function to handle more cases.
>
>           else
>             {
>               if (reg_mentioned_p (XEXP (cond, 0), src)
>                   || reg_mentioned_p (XEXP (cond, 1), src))
>                 {
>                   read_comparison = true;
>                   break;
>                 }
>             }
>
>           walk = NEXT_INSN (walk);
>         }
>
> Maybe a safer way to check whether "insn" is sensitive to the values
> in "val" is to use:
>
>   subrtx_iterator::array_type array;
>   FOR_EACH_SUBRTX (iter, array, val, NONCONST)
>     if (OBJECT_P (*iter) && reg_overlap_mentioned_p (*iter, insn))
>       return true;
>   return false;
>
> which doesn't assume that insn is a single set.
>
> But I'm not sure which cases this code is trying to catch.  Is it trying
> to catch cases where seq2 "spontaneously" uses registers that happen to
> overlap with cond?  If so, then when does that happen?  And if it does
> happen, wouldn't the sequence also have to set the registers first?
>
> If instead the code is trying to detect uses of cond that were fed
> to seq2 outside of cond itself, via try_emit_cmove_seq, then wouldn't
> it be enough to check old_val and new_val?
>
> Anyway, the point I was trying to get to was: we could use the
> FOR_EACH_SUBRTX above to check whether an instruction in seq2
> is sensitive to the value of cc_cmp/rev_cc_cmp.  And we can use
> modified_in_p, as in your patch, to check whether an instruction
> changes the value of cc_cmp/rev_cc_cmp.  We could therefore record
> whether we've seen a modification of cc_cmp/rev_cc_cmp, then reject
> seq2 if we see a subsequent use.
>
Hi Richard,

I have re-implemented this code based on all your suggestions and
resubmitted it as an RFC:
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649898.html
The code is a bit more complicated but it handles a lot more cases.

On a side note, I also observed that the original code was sometimes
setting read_comparison = true more conservatively than needed due to
a limitation of reg_mentioned_p.
In the expression reg_mentioned_p (XEXP (cond, 1), src) both XEXP
(cond, 1) and src can be (const_int 1) and reg_mentioned_p will return
true, although it should return false. This is because it has a check

if (reg == in)
  return true;

Which will return true. If we look at reg_overlap_mentioned_p instead,
it has a check

if (CONSTANT_P (in))
  return false;

and I believe we need something similar for reg_mentioned_p.

Thanks,
Manolis

> Thanks,
> Richard
>
> > +     }
> >
> >        /* The backend might have created a sequence that uses the
> >        condition.  Check this.  */
> > @@ -3609,21 +3603,16 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
> >         return false;
> >       }
> >
> > -      if (cc_cmp)
> > +      if (cc_cmp && seq == seq1)
> >       {
> > -       /* Check if SEQ can clobber registers mentioned in
> > -          cc_cmp and/or rev_cc_cmp.  If yes, we need to use
> > -          only seq1 from that point on.  */
> > -       rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
> > -       for (walk = seq; walk; walk = NEXT_INSN (walk))
> > +       /* Check if SEQ can clobber registers mentioned in cc_cmp/rev_cc_cmp.
> > +          If yes, we need to use only seq1 from that point on.
> > +          Only check when we use seq1 since we have already tested seq2.  */
> > +       if (modified_in_p (cc_cmp, seq)
> > +           || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq)))
> >           {
> > -           note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
> > -           if (cc_cmp_pair[0] == NULL_RTX)
> > -             {
> > -               cc_cmp = NULL_RTX;
> > -               rev_cc_cmp = NULL_RTX;
> > -               break;
> > -             }
> > +           cc_cmp = NULL_RTX;
> > +           rev_cc_cmp = NULL_RTX;
> >           }
> >       }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion
  2023-10-19 19:46   ` Richard Sandiford
  2023-11-10 21:53     ` Jeff Law
  2023-11-13 12:47     ` Manolis Tsamis
@ 2024-04-23 11:00     ` Manolis Tsamis
  2 siblings, 0 replies; 24+ messages in thread
From: Manolis Tsamis @ 2024-04-23 11:00 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Robin Dapp,
	Jakub Jelinek, richard.sandiford

On Thu, Oct 19, 2023 at 10:46 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> > Currently the operations allowed for if conversion of a basic block with
> > multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
> > bb_ok_for_noce_convert_multiple_sets).
> >
> > This commit allows more operations (arithmetic, compare, etc) to participate
> > in if conversion. The target's profitability hook and ifcvt's costing is
> > expected to reject sequences that are unprofitable.
> >
> > This is especially useful for targets which provide a rich selection of
> > conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
> > which are currently not used in basic blocks with more than a single set.
> >
> > gcc/ChangeLog:
> >
> >       * ifcvt.cc (try_emit_cmove_seq): Modify comments.
> >       (noce_convert_multiple_sets_1): Modify comments.
> >       (bb_ok_for_noce_convert_multiple_sets): Allow more operations.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >
> > Changes in v3:
> >         - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
> >         - Allow rewiring of multiple regs.
> >         - Refactor code with noce_multiple_sets_info.
> >         - Remove old code for subregs.
> >
> >  gcc/ifcvt.cc                                  | 63 ++++++++++-----
> >  .../aarch64/ifcvt_multiple_sets_arithm.c      | 79 +++++++++++++++++++
> >  2 files changed, 123 insertions(+), 19 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> >
> > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> > index 3273aeca125..efe8ab1577a 100644
> > --- a/gcc/ifcvt.cc
> > +++ b/gcc/ifcvt.cc
> > @@ -3215,13 +3215,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
> >  /* We have something like:
> >
> >       if (x > y)
> > -       { i = a; j = b; k = c; }
> > +       { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
> >
> >     Make it:
> >
> > -     tmp_i = (x > y) ? a : i;
> > -     tmp_j = (x > y) ? b : j;
> > -     tmp_k = (x > y) ? c : k;
> > +     tmp_i = (x > y) ? EXPR_A : i;
> > +     tmp_j = (x > y) ? EXPR_B : j;
> > +     tmp_k = (x > y) ? EXPR_C : k;
> >       i = tmp_i;
> >       j = tmp_j;
> >       k = tmp_k;
> > @@ -3637,11 +3637,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
> >
> >
> >
> > -/* Return true iff basic block TEST_BB is comprised of only
> > -   (SET (REG) (REG)) insns suitable for conversion to a series
> > -   of conditional moves.  Also check that we have more than one set
> > -   (other routines can handle a single set better than we would), and
> > -   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> > +/* Return true iff basic block TEST_BB is suitable for conversion to a
> > +   series of conditional moves.  Also check that we have more than one
> > +   set (other routines can handle a single set better than we would),
> > +   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> >     through the insns store the sum of their potential costs in COST.  */
> >
> >  static bool
> > @@ -3667,20 +3666,46 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
> >        rtx dest = SET_DEST (set);
> >        rtx src = SET_SRC (set);
> >
> > -      /* We can possibly relax this, but for now only handle REG to REG
> > -      (including subreg) moves.  This avoids any issues that might come
> > -      from introducing loads/stores that might violate data-race-freedom
> > -      guarantees.  */
> > -      if (!REG_P (dest))
> > +      /* Do not handle anything involving memory loads/stores since it might
> > +      violate data-race-freedom guarantees.  */
> > +      if (!REG_P (dest) || contains_mem_rtx_p (src))
> > +     return false;
> > +
> > +      if (!SCALAR_INT_MODE_P (GET_MODE (src)))
> >       return false;
> >
> > -      if (!((REG_P (src) || CONSTANT_P (src))
> > -         || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> > -           && subreg_lowpart_p (src))))
> > +      /* Allow a wide range of operations and let the costing function decide
> > +      if the conversion is worth it later.  */
> > +      enum rtx_code code = GET_CODE (src);
> > +      if (!(CONSTANT_P (src)
> > +         || code == REG
> > +         || code == SUBREG
> > +         || code == ZERO_EXTEND
> > +         || code == SIGN_EXTEND
> > +         || code == NOT
> > +         || code == NEG
> > +         || code == PLUS
> > +         || code == MINUS
> > +         || code == AND
> > +         || code == IOR
> > +         || code == MULT
> > +         || code == ASHIFT
> > +         || code == ASHIFTRT
> > +         || code == NE
> > +         || code == EQ
> > +         || code == GE
> > +         || code == GT
> > +         || code == LE
> > +         || code == LT
> > +         || code == GEU
> > +         || code == GTU
> > +         || code == LEU
> > +         || code == LTU
> > +         || code == COMPARE))
> >       return false;
>
> I'm nervous about lists of operations like these, for two reasons:
>
> (1) It isn't obvious what criteria are used to select the codes.
>
> (2) It requires the top-level code to belong to a given set, but it
>     allows subrtxes of src to be arbitrarily complex.  E.g. (to pick
>     a silly example) a toplevel (popcount ...) would be rejected, but
>     (plus (popcount ...) (const_int 1)) would be OK.
>
> Could we just remove this condition instead?
>
True and done: https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649899.html

I also removed the if (!SCALAR_INT_MODE_P (GET_MODE (src))) because it
was rejecting (const_int) apparently. The mode is checked in the rest
of the code so it wasn't needed in the first place.

Manolis

> Thanks,
> Richard
>
> >
> > -      /* Destination must be appropriate for a conditional write.  */
> > -      if (!noce_operand_ok (dest))
> > +      /* Destination and source must be appropriate.  */
> > +      if (!noce_operand_ok (dest) || !noce_operand_ok (src))
> >       return false;
> >
> >        /* We must be able to conditionally move in this mode.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> > new file mode 100644
> > index 00000000000..d977f4d62ec
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
> > @@ -0,0 +1,79 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-ce1" } */
> > +
> > +void sink2(int, int);
> > +void sink3(int, int, int);
> > +
> > +void cond1(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x = x << 4;
> > +      y = 1;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond2(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x++;
> > +      y++;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond3(int cond, int x1, int x2, int x3)
> > +{
> > +  if (cond)
> > +    {
> > +      x1++;
> > +      x2++;
> > +      x3++;
> > +    }
> > +
> > +  sink3(x1, x2, x3);
> > +}
> > +
> > +void cond4(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x += 2;
> > +      y += 3;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond5(int cond, int x, int y, int r1, int r2)
> > +{
> > +  if (cond)
> > +    {
> > +      x = r1 + 2;
> > +      y = r2 - 34;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +void cond6(int cond, int x, int y)
> > +{
> > +  if (cond)
> > +    {
> > +      x = -x;
> > +      y = ~y;
> > +    }
> > +
> > +  sink2(x, y);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "cinc\t" 5 } } */
> > +/* { dg-final { scan-assembler-times "csneg\t" 1 } } */
> > +/* { dg-final { scan-assembler-times "csinv\t" 1 } } */
> > +/* { dg-final { scan-assembler "csel\t" } } */
> > +
> > +/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_convert_multiple_sets" 6 "ce1" } } */
> > \ No newline at end of file

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-04-23 11:01 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-30 10:13 [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
2023-08-30 10:13 ` [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
2023-10-19 19:41   ` Richard Sandiford
2023-10-20  7:04     ` Robin Dapp
2023-10-20  9:16       ` Richard Sandiford
2023-11-10 21:48         ` Jeff Law
2023-11-10 21:31       ` Jeff Law
2023-11-10 21:25     ` Jeff Law
2024-04-23 10:58     ` Manolis Tsamis
2023-08-30 10:13 ` [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
2023-10-19 19:46   ` Richard Sandiford
2023-11-10 21:53     ` Jeff Law
2023-11-13 12:47     ` Manolis Tsamis
2024-04-23 11:00     ` Manolis Tsamis
2023-08-30 10:13 ` [PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets Manolis Tsamis
2023-11-10 23:20   ` Jeff Law
2023-11-13 12:40     ` Manolis Tsamis
2023-11-21 18:10       ` Manolis Tsamis
2023-08-30 10:14 ` [PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets Manolis Tsamis
2023-11-10 22:03   ` Jeff Law
2023-11-13 12:43     ` Manolis Tsamis
2023-11-21 18:08       ` Manolis Tsamis
2023-09-18  8:18 ` [PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
2023-10-19  6:53   ` Manolis Tsamis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).