public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
@ 2023-07-13 14:09 Manolis Tsamis
  2023-07-13 14:09 ` [PATCH v2 1/2] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Manolis Tsamis @ 2023-07-13 14:09 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Jakub Jelinek, Andrew Pinski, Robin Dapp,
	Manolis Tsamis


noce_convert_multiple_sets has been introduced and extended over time to handle
if conversion for blocks with multiple sets. Currently this is focused on
register moves and rejects any sort of arithmetic operations.

This series is an extension to allow more sequences to take part in if
conversion. The first patch is a required change to emit correct code and the
second patch whitelists a larger number of operations through
bb_ok_for_noce_convert_multiple_sets.

For targets that have a rich selection of conditional instructions,
like aarch64, I have seen an ~5x increase of profitable if conversions for
multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
benchmarks and I have not seen performance regressions on either x64 / aarch64.

Some samples that previously resulted in a branch but now better use these
instructions can be seen in the provided test case.

Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
failing with an ICE but I believe that it's not an issue of this change.
force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
is provided through emit_conditional_move.


Changes in v2:
        - Change "conditional moves" to "conditional instructions"
        in bb_ok_for_noce_convert_multiple_sets's comment.

Manolis Tsamis (2):
  ifcvt: handle sequences that clobber flags in
    noce_convert_multiple_sets
  ifcvt: Allow more operations in multiple set if conversion

 gcc/ifcvt.cc                                  | 109 ++++++++++--------
 .../aarch64/ifcvt_multiple_sets_arithm.c      |  67 +++++++++++
 2 files changed, 127 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/2] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets
  2023-07-13 14:09 [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
@ 2023-07-13 14:09 ` Manolis Tsamis
  2023-07-13 14:09 ` [PATCH v2 2/2] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
  2023-07-17 22:12 ` [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Richard Sandiford
  2 siblings, 0 replies; 7+ messages in thread
From: Manolis Tsamis @ 2023-07-13 14:09 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Jakub Jelinek, Andrew Pinski, Robin Dapp,
	Manolis Tsamis

This is an extension of what was done in PR106590.

Currently if a sequence generated in noce_convert_multiple_sets clobbers the
condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
(sequences that emit the comparison itself). Since this applies only from the
next iteration it assumes that the sequences generated (in particular seq2)
doesn't clobber the condition rtx itself before using it in the if_then_else,
which is only true in specific cases (currently only register/subregister moves
are allowed).

This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
the current iteration. This makes it possible to include arithmetic operations
in noce_convert_multiple_sets.

gcc/ChangeLog:

	* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
	(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

(no changes since v1)

 gcc/ifcvt.cc | 49 +++++++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..3273aeca125 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3375,20 +3375,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   return true;
 }
 
-/* Helper function for noce_convert_multiple_sets_1.  If store to
-   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
-
-static void
-check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
-{
-  rtx *p = (rtx *) p0;
-  if (p[0] == NULL_RTX)
-    return;
-  if (reg_overlap_mentioned_p (dest, p[0])
-      || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
-    p[0] = NULL_RTX;
-}
-
 /* This goes through all relevant insns of IF_INFO->then_bb and tries to
    create conditional moves.  In case a simple move sufficis the insn
    should be listed in NEED_NO_CMOV.  The rewired-src cases should be
@@ -3552,9 +3538,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	 creating an additional compare for each.  If successful, costing
 	 is easier and this sequence is usually preferred.  */
       if (cc_cmp)
-	seq2 = try_emit_cmove_seq (if_info, temp, cond,
-				   new_val, old_val, need_cmov,
-				   &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+	{
+	  seq2 = try_emit_cmove_seq (if_info, temp, cond,
+				     new_val, old_val, need_cmov,
+				     &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+
+	  /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
+	     clobbered.  We can't safely use the sequence in this case.  */
+	  if (seq2 && (modified_in_p (cc_cmp, seq2)
+	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2))))
+	    seq2 = NULL;
+	}
 
       /* The backend might have created a sequence that uses the
 	 condition.  Check this.  */
@@ -3609,21 +3603,16 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	  return false;
 	}
 
-      if (cc_cmp)
+      if (cc_cmp && seq == seq1)
 	{
-	  /* Check if SEQ can clobber registers mentioned in
-	     cc_cmp and/or rev_cc_cmp.  If yes, we need to use
-	     only seq1 from that point on.  */
-	  rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
-	  for (walk = seq; walk; walk = NEXT_INSN (walk))
+	  /* Check if SEQ can clobber registers mentioned in cc_cmp/rev_cc_cmp.
+	     If yes, we need to use only seq1 from that point on.
+	     Only check when we use seq1 since we have already tested seq2.  */
+	  if (modified_in_p (cc_cmp, seq)
+	      || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq)))
 	    {
-	      note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
-	      if (cc_cmp_pair[0] == NULL_RTX)
-		{
-		  cc_cmp = NULL_RTX;
-		  rev_cc_cmp = NULL_RTX;
-		  break;
-		}
+	      cc_cmp = NULL_RTX;
+	      rev_cc_cmp = NULL_RTX;
 	    }
 	}
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] ifcvt: Allow more operations in multiple set if conversion
  2023-07-13 14:09 [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
  2023-07-13 14:09 ` [PATCH v2 1/2] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
@ 2023-07-13 14:09 ` Manolis Tsamis
  2023-07-17 22:12 ` [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Richard Sandiford
  2 siblings, 0 replies; 7+ messages in thread
From: Manolis Tsamis @ 2023-07-13 14:09 UTC (permalink / raw)
  To: gcc-patches
  Cc: Philipp Tomsich, Jakub Jelinek, Andrew Pinski, Robin Dapp,
	Manolis Tsamis

Currently the operations allowed for if conversion of a basic block with
multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
bb_ok_for_noce_convert_multiple_sets).

This commit allows more operations (arithmetic, compare, etc) to participate
in if conversion. The target's profitability hook and ifcvt's costing is
expected to reject sequences that are unprofitable.

This is especially useful for targets which provide a rich selection of
conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
which are currently not used in basic blocks with more than a single set.

gcc/ChangeLog:

	* ifcvt.cc (try_emit_cmove_seq): Modify comments.
	(noce_convert_multiple_sets_1): Modify comments.
	(bb_ok_for_noce_convert_multiple_sets): Allow more operations.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

Changes in v2:
        - Change "conditional moves" to "conditional instructions"
        in bb_ok_for_noce_convert_multiple_sets's comment.

 gcc/ifcvt.cc                                  | 60 +++++++++++------
 .../aarch64/ifcvt_multiple_sets_arithm.c      | 67 +++++++++++++++++++
 2 files changed, 108 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 3273aeca125..be29403a5b5 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3215,13 +3215,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
 /* We have something like:
 
      if (x > y)
-       { i = a; j = b; k = c; }
+       { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
 
    Make it:
 
-     tmp_i = (x > y) ? a : i;
-     tmp_j = (x > y) ? b : j;
-     tmp_k = (x > y) ? c : k;
+     tmp_i = (x > y) ? EXPR_A : i;
+     tmp_j = (x > y) ? EXPR_B : j;
+     tmp_k = (x > y) ? EXPR_C : k;
      i = tmp_i;
      j = tmp_j;
      k = tmp_k;
@@ -3637,11 +3637,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 
 
 
-/* Return true iff basic block TEST_BB is comprised of only
-   (SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  Also check that we have more than one set
-   (other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+/* Return true iff basic block TEST_BB is suitable for conversion to a
+   series of conditional instructions.  Also check that we have more than
+   one set (other routines can handle a single set better than we would),
+   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
    through the insns store the sum of their potential costs in COST.  */
 
 static bool
@@ -3667,20 +3666,43 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
       rtx dest = SET_DEST (set);
       rtx src = SET_SRC (set);
 
-      /* We can possibly relax this, but for now only handle REG to REG
-	 (including subreg) moves.  This avoids any issues that might come
-	 from introducing loads/stores that might violate data-race-freedom
-	 guarantees.  */
-      if (!REG_P (dest))
+      /* Do not handle anything involving memory loads/stores since it might
+	 violate data-race-freedom guarantees.  */
+      if (!REG_P (dest) || contains_mem_rtx_p (src))
 	return false;
 
-      if (!((REG_P (src) || CONSTANT_P (src))
-	    || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
-	      && subreg_lowpart_p (src))))
+      /* Allow a wide range of operations and let the costing function decide
+	 if the conversion is worth it later.  */
+      enum rtx_code code = GET_CODE (src);
+      if (!(CONSTANT_P (src)
+	    || code == REG
+	    || code == SUBREG
+	    || code == ZERO_EXTEND
+	    || code == SIGN_EXTEND
+	    || code == NOT
+	    || code == NEG
+	    || code == PLUS
+	    || code == MINUS
+	    || code == AND
+	    || code == IOR
+	    || code == MULT
+	    || code == ASHIFT
+	    || code == ASHIFTRT
+	    || code == NE
+	    || code == EQ
+	    || code == GE
+	    || code == GT
+	    || code == LE
+	    || code == LT
+	    || code == GEU
+	    || code == GTU
+	    || code == LEU
+	    || code == LTU
+	    || code == COMPARE))
 	return false;
 
-      /* Destination must be appropriate for a conditional write.  */
-      if (!noce_operand_ok (dest))
+      /* Destination and source must be appropriate.  */
+      if (!noce_operand_ok (dest) || !noce_operand_ok (src))
 	return false;
 
       /* We must be able to conditionally move in this mode.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
new file mode 100644
index 00000000000..f29cc72263a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ce1" } */
+
+void sink2(int, int);
+void sink3(int, int, int);
+
+void cond_1(int cond, int x, int y) {
+    if (cond) {
+        x = x << 4;
+        y = 1;
+    }
+
+    sink2(x, y);
+}
+
+void cond_2(int cond, int x, int y) {
+    if (cond) {
+        x++;
+        y++;
+    }
+
+    sink2(x, y);
+}
+
+void cond_3(int cond, int x1, int x2, int x3) {
+    if (cond) {
+        x1++;
+        x2++;
+        x3++;
+    }
+
+    sink3(x1, x2, x3);
+}
+
+void cond_4(int cond, int x, int y) {
+    if (cond) {
+        x += 2;
+        y += 3;
+    }
+
+    sink2(x, y);
+}
+
+void cond_5(int cond, int x, int y, int r1, int r2) {
+    if (cond) {
+        x = r1 + 2;
+        y = r2 - 34;
+    }
+
+    sink2(x, y);
+}
+
+void cond_6(int cond, int x, int y) {
+    if (cond) {
+        x = -x;
+        y = ~y;
+    }
+
+    sink2(x, y);
+}
+
+/* { dg-final { scan-assembler-times "cinc\t" 5 } } */
+/* { dg-final { scan-assembler-times "csneg\t" 1 } } */
+/* { dg-final { scan-assembler-times "csinv\t" 1 } } */
+/* { dg-final { scan-assembler "csel\t" 1 } } */
+
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_convert_multiple_sets" 6 "ce1" } } */
\ No newline at end of file
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
  2023-07-13 14:09 [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
  2023-07-13 14:09 ` [PATCH v2 1/2] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
  2023-07-13 14:09 ` [PATCH v2 2/2] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
@ 2023-07-17 22:12 ` Richard Sandiford
  2023-07-18 17:03   ` Manolis Tsamis
  2 siblings, 1 reply; 7+ messages in thread
From: Richard Sandiford @ 2023-07-17 22:12 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Philipp Tomsich, Jakub Jelinek, Andrew Pinski, Robin Dapp

Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> noce_convert_multiple_sets has been introduced and extended over time to handle
> if conversion for blocks with multiple sets. Currently this is focused on
> register moves and rejects any sort of arithmetic operations.
>
> This series is an extension to allow more sequences to take part in if
> conversion. The first patch is a required change to emit correct code and the
> second patch whitelists a larger number of operations through
> bb_ok_for_noce_convert_multiple_sets.
>
> For targets that have a rich selection of conditional instructions,
> like aarch64, I have seen an ~5x increase of profitable if conversions for
> multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> benchmarks and I have not seen performance regressions on either x64 / aarch64.

Interesting results.  Are you free to say which target you used for aarch64?

If I've understood the cost heuristics correctly, we'll allow a "predictable"
branch to be replaced by up to 5 simple conditional instructions and an
"unpredictable" branch to be replaced by up to 10 simple conditional
instructions.  That seems pretty high.  And I'm not sure how well we
guess predictability in the absence of real profile information.

So my gut instinct was that the limitations of the current code might
be saving us from overly generous limits.  It sounds from your results
like that might not be the case though.

Still, if it does turn out to be the case in future, I agree we should
fix the costs rather than hamstring the code.

> Some samples that previously resulted in a branch but now better use these
> instructions can be seen in the provided test case.
>
> Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
> failing with an ICE but I believe that it's not an issue of this change.
> force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
> is provided through emit_conditional_move.

I guess that needs to be fixed first though.  (Thanks for checking both
targets.)

My main comments on the series are:

(1) It isn't obvious which operations should be included in the list
    in patch 2 and which shouldn't.  Also, the patch only checks the
    outermost operation, and so it allows the inner rtxes to be
    arbitrarily complex.

    Because of that, it might be better to remove the condition
    altogether and just rely on the other routines to do costing and
    correctness checks.

(2) Don't you also need to update the "rewiring" mechanism, to cope
    with cases where the then block has something like:

      if (a == 0) {
        a = b op c;       ->    a' = a == 0 ? b op c : a;
        d = a op b;       ->    d = a == 0 ? a' op b : d;
      }                         a = a'

    At the moment the code only handles regs and subregs, whereas but IIUC
    it should now iterate over all the regs in the SET_SRC.  And I suppose
    that creates the need for multiple possible rewirings in the same insn,
    so that it isn't a simple insn -> index mapping any more.

Thanks,
Richard

>
>
> Changes in v2:
>         - Change "conditional moves" to "conditional instructions"
>         in bb_ok_for_noce_convert_multiple_sets's comment.
>
> Manolis Tsamis (2):
>   ifcvt: handle sequences that clobber flags in
>     noce_convert_multiple_sets
>   ifcvt: Allow more operations in multiple set if conversion
>
>  gcc/ifcvt.cc                                  | 109 ++++++++++--------
>  .../aarch64/ifcvt_multiple_sets_arithm.c      |  67 +++++++++++
>  2 files changed, 127 insertions(+), 49 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
  2023-07-17 22:12 ` [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Richard Sandiford
@ 2023-07-18 17:03   ` Manolis Tsamis
  2023-07-18 18:38     ` Richard Sandiford
  0 siblings, 1 reply; 7+ messages in thread
From: Manolis Tsamis @ 2023-07-18 17:03 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Jakub Jelinek,
	Andrew Pinski, Robin Dapp, richard.sandiford

Hi Richard,

Thanks for your insightful reply.

On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> > noce_convert_multiple_sets has been introduced and extended over time to handle
> > if conversion for blocks with multiple sets. Currently this is focused on
> > register moves and rejects any sort of arithmetic operations.
> >
> > This series is an extension to allow more sequences to take part in if
> > conversion. The first patch is a required change to emit correct code and the
> > second patch whitelists a larger number of operations through
> > bb_ok_for_noce_convert_multiple_sets.
> >
> > For targets that have a rich selection of conditional instructions,
> > like aarch64, I have seen an ~5x increase of profitable if conversions for
> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> > benchmarks and I have not seen performance regressions on either x64 / aarch64.
>
> Interesting results.  Are you free to say which target you used for aarch64?
>
> If I've understood the cost heuristics correctly, we'll allow a "predictable"
> branch to be replaced by up to 5 simple conditional instructions and an
> "unpredictable" branch to be replaced by up to 10 simple conditional
> instructions.  That seems pretty high.  And I'm not sure how well we
> guess predictability in the absence of real profile information.
>
> So my gut instinct was that the limitations of the current code might
> be saving us from overly generous limits.  It sounds from your results
> like that might not be the case though.
>
> Still, if it does turn out to be the case in future, I agree we should
> fix the costs rather than hamstring the code.
>

My writing may have been confusing, but with "~5x increase of
profitable if conversions" I just meant that ifcvt considers these
profitable, not that they actually are when executed in particular
hardware.
But at the same time I haven't yet seen any obvious performance
regressions in some benchmarks that I have ran.
In any case it could be interesting to microbenchmark branches vs
conditional instructions and see how sane these numbers are.

> > Some samples that previously resulted in a branch but now better use these
> > instructions can be seen in the provided test case.
> >
> > Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
> > failing with an ICE but I believe that it's not an issue of this change.
> > force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
> > is provided through emit_conditional_move.
>
> I guess that needs to be fixed first though.  (Thanks for checking both
> targets.)
>

I get a feeling this may be fixed if I properly take care of your
points 1 & 2 below. I will report on that.

> My main comments on the series are:
>
> (1) It isn't obvious which operations should be included in the list
>     in patch 2 and which shouldn't.  Also, the patch only checks the
>     outermost operation, and so it allows the inner rtxes to be
>     arbitrarily complex.
>
>     Because of that, it might be better to remove the condition
>     altogether and just rely on the other routines to do costing and
>     correctness checks.
>

That is true; I wanted to somehow only allow "normal arithmetic
operations" and avoid generating sequences with stranger codes. I will
try and see what happens if I remove the condition altogether. I also
totally missed the fact that I was allowing arbitrarily complex inner
rtxes so thanks for pointing that out.

> (2) Don't you also need to update the "rewiring" mechanism, to cope
>     with cases where the then block has something like:
>
>       if (a == 0) {
>         a = b op c;       ->    a' = a == 0 ? b op c : a;
>         d = a op b;       ->    d = a == 0 ? a' op b : d;
>       }                         a = a'
>
>     At the moment the code only handles regs and subregs, whereas but IIUC
>     it should now iterate over all the regs in the SET_SRC.  And I suppose
>     that creates the need for multiple possible rewirings in the same insn,
>     so that it isn't a simple insn -> index mapping any more.
>

Indeed, I believe this current patch cannot properly handle these. I
will create testcases for this and see what changes need to be done in
the next iteration so that correct code is generated.

Thanks,
Manolis

> Thanks,
> Richard
>
> >
> >
> > Changes in v2:
> >         - Change "conditional moves" to "conditional instructions"
> >         in bb_ok_for_noce_convert_multiple_sets's comment.
> >
> > Manolis Tsamis (2):
> >   ifcvt: handle sequences that clobber flags in
> >     noce_convert_multiple_sets
> >   ifcvt: Allow more operations in multiple set if conversion
> >
> >  gcc/ifcvt.cc                                  | 109 ++++++++++--------
> >  .../aarch64/ifcvt_multiple_sets_arithm.c      |  67 +++++++++++
> >  2 files changed, 127 insertions(+), 49 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
  2023-07-18 17:03   ` Manolis Tsamis
@ 2023-07-18 18:38     ` Richard Sandiford
  2023-08-30 10:30       ` Manolis Tsamis
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Sandiford @ 2023-07-18 18:38 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Philipp Tomsich, Jakub Jelinek, Andrew Pinski, Robin Dapp

Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
>> > noce_convert_multiple_sets has been introduced and extended over time to handle
>> > if conversion for blocks with multiple sets. Currently this is focused on
>> > register moves and rejects any sort of arithmetic operations.
>> >
>> > This series is an extension to allow more sequences to take part in if
>> > conversion. The first patch is a required change to emit correct code and the
>> > second patch whitelists a larger number of operations through
>> > bb_ok_for_noce_convert_multiple_sets.
>> >
>> > For targets that have a rich selection of conditional instructions,
>> > like aarch64, I have seen an ~5x increase of profitable if conversions for
>> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
>> > benchmarks and I have not seen performance regressions on either x64 / aarch64.
>>
>> Interesting results.  Are you free to say which target you used for aarch64?
>>
>> If I've understood the cost heuristics correctly, we'll allow a "predictable"
>> branch to be replaced by up to 5 simple conditional instructions and an
>> "unpredictable" branch to be replaced by up to 10 simple conditional
>> instructions.  That seems pretty high.  And I'm not sure how well we
>> guess predictability in the absence of real profile information.
>>
>> So my gut instinct was that the limitations of the current code might
>> be saving us from overly generous limits.  It sounds from your results
>> like that might not be the case though.
>>
>> Still, if it does turn out to be the case in future, I agree we should
>> fix the costs rather than hamstring the code.
>>
>
> My writing may have been confusing, but with "~5x increase of
> profitable if conversions" I just meant that ifcvt considers these
> profitable, not that they actually are when executed in particular
> hardware.

Yeah, sorry, I'd read that part as measuring the number of if-converisons.
But...

> But at the same time I haven't yet seen any obvious performance
> regressions in some benchmarks that I have ran.

...it was a pleasant surprise that doing so much more if-conversion
didn't make things noticeably worse. :)

> In any case it could be interesting to microbenchmark branches vs
> conditional instructions and see how sane these numbers are.

I think for this we really do need the real workload, since it's
hard to measure realistic branch mispredict penalties with a
microbenchmark.

> [...]
>> (2) Don't you also need to update the "rewiring" mechanism, to cope
>>     with cases where the then block has something like:
>>
>>       if (a == 0) {
>>         a = b op c;       ->    a' = a == 0 ? b op c : a;
>>         d = a op b;       ->    d = a == 0 ? a' op b : d;
>>       }                         a = a'
>>
>>     At the moment the code only handles regs and subregs, whereas but IIUC
>>     it should now iterate over all the regs in the SET_SRC.  And I suppose
>>     that creates the need for multiple possible rewirings in the same insn,
>>     so that it isn't a simple insn -> index mapping any more.
>>
>
> Indeed, I believe this current patch cannot properly handle these. I
> will create testcases for this and see what changes need to be done in
> the next iteration so that correct code is generated.

Perhaps we should change the way that the rewiring is done.
At the moment, need_cmov_or_rewire detects the renumbering
ahead of time.  But it might be easier to:

- have noce_convert_multiple_sets_1 keep track of which
  SET_DESTs it has replaced with temporaries.

- for each subsequent instruction, go through that list in order
  and use insn_propagation (from recog.h) to apply each replacement.

That might be simpler, and should also be more robust, since the
insn_propagation routines return false on failure.

Thanks,
Richard


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
  2023-07-18 18:38     ` Richard Sandiford
@ 2023-08-30 10:30       ` Manolis Tsamis
  0 siblings, 0 replies; 7+ messages in thread
From: Manolis Tsamis @ 2023-08-30 10:30 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches, Philipp Tomsich, Jakub Jelinek,
	Andrew Pinski, Robin Dapp, richard.sandiford

On Tue, Jul 18, 2023 at 9:38 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> >> > noce_convert_multiple_sets has been introduced and extended over time to handle
> >> > if conversion for blocks with multiple sets. Currently this is focused on
> >> > register moves and rejects any sort of arithmetic operations.
> >> >
> >> > This series is an extension to allow more sequences to take part in if
> >> > conversion. The first patch is a required change to emit correct code and the
> >> > second patch whitelists a larger number of operations through
> >> > bb_ok_for_noce_convert_multiple_sets.
> >> >
> >> > For targets that have a rich selection of conditional instructions,
> >> > like aarch64, I have seen an ~5x increase of profitable if conversions for
> >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> >> > benchmarks and I have not seen performance regressions on either x64 / aarch64.
> >>
> >> Interesting results.  Are you free to say which target you used for aarch64?
> >>
> >> If I've understood the cost heuristics correctly, we'll allow a "predictable"
> >> branch to be replaced by up to 5 simple conditional instructions and an
> >> "unpredictable" branch to be replaced by up to 10 simple conditional
> >> instructions.  That seems pretty high.  And I'm not sure how well we
> >> guess predictability in the absence of real profile information.
> >>
> >> So my gut instinct was that the limitations of the current code might
> >> be saving us from overly generous limits.  It sounds from your results
> >> like that might not be the case though.
> >>
> >> Still, if it does turn out to be the case in future, I agree we should
> >> fix the costs rather than hamstring the code.
> >>
> >
> > My writing may have been confusing, but with "~5x increase of
> > profitable if conversions" I just meant that ifcvt considers these
> > profitable, not that they actually are when executed in particular
> > hardware.
>
> Yeah, sorry, I'd read that part as measuring the number of if-converisons.
> But...
>
> > But at the same time I haven't yet seen any obvious performance
> > regressions in some benchmarks that I have ran.
>
> ...it was a pleasant surprise that doing so much more if-conversion
> didn't make things noticeably worse. :)
>
> > In any case it could be interesting to microbenchmark branches vs
> > conditional instructions and see how sane these numbers are.
>
> I think for this we really do need the real workload, since it's
> hard to measure realistic branch mispredict penalties with a
> microbenchmark.
>
Yes indeed. I'm still trying to get properly analyze the effects of this change.
I'll share when I have something interesting on the benchmarks side.

> > [...]
> >> (2) Don't you also need to update the "rewiring" mechanism, to cope
> >>     with cases where the then block has something like:
> >>
> >>       if (a == 0) {
> >>         a = b op c;       ->    a' = a == 0 ? b op c : a;
> >>         d = a op b;       ->    d = a == 0 ? a' op b : d;
> >>       }                         a = a'
> >>
> >>     At the moment the code only handles regs and subregs, whereas but IIUC
> >>     it should now iterate over all the regs in the SET_SRC.  And I suppose
> >>     that creates the need for multiple possible rewirings in the same insn,
> >>     so that it isn't a simple insn -> index mapping any more.
> >>
> >
> > Indeed, I believe this current patch cannot properly handle these. I
> > will create testcases for this and see what changes need to be done in
> > the next iteration so that correct code is generated.
>
> Perhaps we should change the way that the rewiring is done.
> At the moment, need_cmov_or_rewire detects the renumbering
> ahead of time.  But it might be easier to:
>
> - have noce_convert_multiple_sets_1 keep track of which
>   SET_DESTs it has replaced with temporaries.
>
> - for each subsequent instruction, go through that list in order
>   and use insn_propagation (from recog.h) to apply each replacement.
>
> That might be simpler, and should also be more robust, since the
> insn_propagation routines return false on failure.
>
Thanks, I've tried various designs with these ideas, including moving
the rewiring code to noce_convert_multiple_sets_1  but there was
always some issue.
For example we cannot remove the need_cmov_or_rewire function because
the part that calculates need_no_cmov needs to iterate once before the
conversion, otherwise it wouldn't work.
Also noce_convert_multiple_sets_1 is ran twice each time so doing the
new rewiring logic which is somewhat expensive two times felt like a
regression without making things much easier.

In the end I opted to keep need_cmov_or_rewire (albait renamed) and
introduce a new struct noce_multiple_sets_info, which I thing made
things much nicer.
That includes taking care of the very long signatures of the helper
functions and improving efficiency by removing a lot of vectors/hash
set/hash maps.

I've also added a SCALAR_INT_MODE_P check that takes care of the
force_reg x86-64 issue.

These changes can be found in the two last commits of the new series.
Any feedback on the new implementation would be welcome:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html

> Thanks,
> Richard
>

Thanks,
Manolis

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-08-30 10:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-13 14:09 [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Manolis Tsamis
2023-07-13 14:09 ` [PATCH v2 1/2] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
2023-07-13 14:09 ` [PATCH v2 2/2] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
2023-07-17 22:12 ` [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Richard Sandiford
2023-07-18 17:03   ` Manolis Tsamis
2023-07-18 18:38     ` Richard Sandiford
2023-08-30 10:30       ` Manolis Tsamis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).