public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch] Teach RTL ifcvt to handle multiple simple set instructions
@ 2015-09-08 15:01 James Greenhalgh
  2015-09-10 18:24 ` Bernd Schmidt
  0 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2015-09-08 15:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]


Hi,

RTL "noce" ifcvt will currently give up if the branches it is trying to
make conditional are too complicated. One of the conditions for "too
complicated" is that the branch sets more than one value.

One common idiom that this misses is something like:

  int d = a[i];
  int e = b[i];
  if (d > e)
    std::swap (d, e)
  [...]

Which is currently going to generate something like

  compare (d, e)
  branch.le L1
    tmp = d;
    d = e;
    e = tmp;
  L1:

In the case that this is an unpredictable branch, we can do better
with:

  compare (d, e)
  d1 = if_then_else (le, e, d)
  e1 = if_then_else (le, d, e)
  d = d1
  e = e1

Register allocation will eliminate the two trailing unconditional
assignments, and we get a neater sequence.

This patch introduces this logic to the RTL if convert passes, catching
cases where a basic block does nothing other than multiple SETs. This
helps both with the std::swap idiom above, and with pathological cases
where tree passes create new basic blocks to resolve Phi nodes, which
contain only set instructions and end up unprecdictable.

One big question I have with this patch is how I ought to write a meaningful
cost model I've used. It seems like yet another misuse of RTX costs, and
another bit of stuff for targets to carefully balance. Now, if the
relative cost of branches and conditional move instructions is not
carefully managed, you may enable or disable these optimisations. This is
probably acceptable, but I dislike adding more and more gotcha's to
target costs, as I get bitten by them hard enough as is!

Elsewhere the ifcvt cost usage is pretty lacking - esentially counting
the number of instructions which will be if-converted and comparing that
against the magic number "2". I could follow this lead and just count
the number of moves I would convert, then compare that to the branch cost,
but this feels... wrong. This makes it pretty tough to choose a "good"
number for TARGET_BRANCH_COST. This isn't helped now that higher branch
costs can mean pulling expensive instructions in to the main execution
stream.

I've picked a fairly straightforward cost model for this patch, trying to
compare the cost of each conditional move, as calculated with rtx_costs,
against COSTS_N_INSNS (branch_cost). This essentially kills the
optimisation for any target with conditional-move cost > 1. Personally, I
consider that a pretty horrible bug in this patch - but I couldn't think of
anything better to try.

As you might expect, this triggers all over the place when
TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006 build,
I saw 3.9% more CSEL operations with this patch and TARGET_BRANCH_COST set
to 4. Performance is also good on AArch64 on a range of microbenchmarks
and larger workloads (after playing with the branch costs). I didn't see
any performance regression on x86_64, as you would expect given that the
cost models preclude x86_64 targets from ever hitting this optimisation.

Bootstrapped and tested on x86_64 and AArch64 with no issues, and
bootstrapped and tested with the cost model turned off, to have some
confidence that we will continue to do the right thing if any targets do
up their branch costs and start using this code.

No testcase provided, as currently I don't know of targets with a high
enough branch cost to actually trigger the optimisation.

OK?

Thanks,
James

---
gcc/

2015-09-07  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
	(noce_convert_multiple_sets): Likewise.
	(noce_process_if_block): Call them.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Patch-Teach-RTL-ifcvt-to-handle-multiple-simple-set-.patch --]
[-- Type: text/x-patch;  name=0001-Patch-Teach-RTL-ifcvt-to-handle-multiple-simple-set-.patch, Size: 8062 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..059bd89 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+     if (x > y)
+       { i = a; j = b; k = c; }
+
+   Make it:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? b : j;
+     tmp_k = (x > y) ? c : k;
+     i = tmp_i; <- Should be cleaned up
+     j = tmp_j; <- Likewise.
+     k = tmp_k; <- Likewise.
+
+   Look for special cases such as use of temporary registers (for
+   example in a swap idiom).
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  unsigned int cost = 0;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec<rtx> targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+     overlap.  */
+  vec<rtx> temporaries = vNULL;
+  /* The insns we've emitted.  */
+  vec<rtx_insn *> unmodified_insns = vNULL;
+  unsigned count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+    {
+      /* Skip over non-insns.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      gcc_checking_assert (set);
+
+      rtx target = SET_DEST (set);
+      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx new_val = SET_SRC (set);
+      rtx old_val = target;
+
+      /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+      rtx candidate_rewire = new_val;
+      for (unsigned i = 0; i < count; i++)
+	{
+	  if (reg_overlap_mentioned_p (new_val, targets[i]))
+	    {
+	      /* Catch a "swap" style idiom.  */
+	      if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+		{
+		  /* The write to targets[i] is only live until the read
+		     here.  As the condition codes match, we can propagate
+		     the set to here.  */
+		   candidate_rewire
+		     = SET_SRC (single_set (unmodified_insns[i]));
+
+		   /* Discount the cost calculation by one conditional
+		      set instruction.  As we are just putting out
+		      a group of SET instructions, any will do.  */
+		   cost -= insn_rtx_cost (PATTERN (get_last_insn ()),
+					  optimize_bb_for_speed_p (test_bb));
+		}
+	      else
+		candidate_rewire = temporaries[i];
+	    }
+	}
+      new_val = candidate_rewire;
+
+      /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */
+      if (if_info->then_else_reversed)
+	std::swap (old_val, new_val);
+
+      /* Actually emit the conditional move.  */
+      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				       x, y, new_val, old_val);
+
+      /* If we failed to expand the conditional move, drop out and don't
+	 try to continue.  */
+      if (temp_dest == NULL_RTX)
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+
+      /* Track the cost of building these conditional instructions.  */
+      cost += insn_rtx_cost (PATTERN (get_last_insn ()),
+			     optimize_bb_for_speed_p (test_bb));
+
+      /* Bookkeeping.  */
+      count++;
+      targets.safe_push (target);
+      temporaries.safe_push (temp_dest);
+      unmodified_insns.safe_push (insn);
+    }
+
+  /* We must have seen some sort of insn to insert, otherwise we were
+     given an empty BB to convert, and we can't handle that.  */
+  if (unmodified_insns.is_empty ())
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
+  /* Check if this is actually beneficial.  */
+  if (cost > COSTS_N_INSNS (if_info->branch_cost))
+     {
+       end_sequence ();
+       return FALSE;
+     }
+
+  /* Now fixup the assignments.  */
+  for (unsigned i = 0; i < count; i++)
+    noce_emit_move_insn (targets[i], temporaries[i]);
+
+  /* Actually emit the sequence.  */
+  rtx_insn *seq = get_insns ();
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
+  unshare_all_rtl_in_chain (seq);
+  end_sequence ();
+
+  if (!seq)
+    return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			   INSN_LOCATION (unmodified_insns.last ()));
+
+  /* Clean up THEN_BB and the edges in and out of it.  */
+  remove_edge (find_edge (test_bb, join_bb));
+  remove_edge (find_edge (then_bb, join_bb));
+  redirect_edge_and_branch_force (single_succ_edge (test_bb), join_bb);
+  delete_basic_block (then_bb);
+  num_true_changes++;
+
+  /* Maybe merge blocks now the jump is simple enough.  */
+  if (can_merge_blocks_p (test_bb, join_bb))
+    {
+      merge_blocks (test_bb, join_bb);
+      num_true_changes++;
+    }
+
+  num_updated_if_blocks++;
+  return TRUE;
+}
+
+/* Return true iff basic block TEST_BB is comprised of only
+   (SET (REG) (REG)) insns suitable for conversion to a series
+   of conditional moves.  */
+
+static bool
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+{
+  rtx_insn *insn;
+
+  /* We must have at least one real insn to convert, or there will
+     be trouble!  */
+  bool bb_is_not_empty = false;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      /* Skip over notes etc.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      /* We only handle SET insns.  */
+      rtx set = single_set (insn);
+      if (set == NULL_RTX)
+	return false;
+
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      /* We can possibly relax this, but for now only handle REG to REG
+	 moves.  This avoids any issues that might come from introducing
+	 loads/stores that might violate data-race-freedom guarantees.  */
+      if (!(REG_P (src) && REG_P (dest)))
+	return false;
+
+      /* Destination must be appropriate for a conditional write.  */
+      if (!noce_operand_ok (dest))
+	return false;
+
+      /* We must be able to conditionally move in this mode.  */
+      if (!can_conditionally_move_p (GET_MODE (dest)))
+	return false;
+
+      bb_is_not_empty = true;
+    }
+  return bb_is_not_empty;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -3004,12 +3221,22 @@ noce_process_if_block (struct noce_if_info *if_info)
      (1) if (...) x = a; else x = b;
      (2) x = b; if (...) x = a;
      (3) if (...) x = a;   // as if with an initial x = x.
-
+     (4) if (...) { x = a; y = b; z = c; }  // Like 3, for multiple SETS.
      The later patterns require jumps to be more expensive.
      For the if (...) x = a; else x = b; case we allow multiple insns
      inside the then and else blocks as long as their only effect is
      to calculate a value for x.
-     ??? For future expansion, look for multiple X in such patterns.  */
+     ??? For future expansion, further expand the "multiple X" rules.  */
+
+  /* First look for multiple SETS.  */
+  if (!else_bb
+      && HAVE_conditional_move
+      && !HAVE_cc0
+      && bb_ok_for_noce_convert_multiple_sets (then_bb))
+    {
+      if (noce_convert_multiple_sets (if_info))
+	return TRUE;
+    }
 
   if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
 				    &if_info->then_simple))

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2016-07-21 11:32 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-08 15:01 [Patch] Teach RTL ifcvt to handle multiple simple set instructions James Greenhalgh
2015-09-10 18:24 ` Bernd Schmidt
2015-09-10 21:34   ` Jeff Law
2015-09-11  8:51     ` Kyrill Tkachov
2015-09-11 21:49       ` Jeff Law
2015-09-11  9:04     ` Bernd Schmidt
2015-09-11  9:08       ` Ramana Radhakrishnan
2015-09-11 10:55         ` James Greenhalgh
2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
2015-09-25 15:06             ` [Patch ifcvt 1/3] Factor out cost calculations from noce cases James Greenhalgh
2015-09-25 15:08             ` [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h James Greenhalgh
2015-09-25 15:14             ` [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64 James Greenhalgh
2015-09-29 10:43               ` Richard Biener
2015-09-25 15:28             ` [Patch ifcvt 3/3] Create a new target hook for deciding profitability of noce if-conversion James Greenhalgh
2015-09-29 10:36             ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs Richard Biener
2015-09-29 15:28               ` James Greenhalgh
2015-09-29 19:52                 ` Mike Stump
2015-09-30  8:42                   ` Richard Biener
2015-09-30  8:48                 ` Richard Biener
2015-09-30 19:01                   ` Mike Stump
2015-10-01  9:37                 ` Bernd Schmidt
2015-10-09 11:28                   ` Bernd Schmidt
2015-10-09 15:28                     ` Jeff Law
2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
2016-06-02 16:54                       ` [RFC: Patch 4/6] Modify cost model for noce_cmove_arith James Greenhalgh
2016-06-02 16:54                       ` [RFC: Patch 1/6] New target hook: rtx_branch_cost James Greenhalgh
2016-06-03 10:39                         ` Richard Biener
2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
2016-06-21 15:51                             ` [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
2016-07-13 21:18                               ` Jeff Law
2016-06-21 15:51                             ` [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets James Greenhalgh
2016-07-13 21:23                               ` Jeff Law
2016-06-21 15:51                             ` [RFC: Patch 3/6 v2] Remove if_info->branch_cost James Greenhalgh
2016-07-13 21:19                               ` Jeff Law
2016-06-21 15:53                             ` [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
2016-07-13 21:24                               ` Jeff Law
2016-06-21 15:53                             ` [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith James Greenhalgh
2016-07-13 21:22                               ` Jeff Law
2016-06-21 21:31                             ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost Bernhard Reutner-Fischer
2016-06-30 12:01                             ` Bernd Schmidt
2016-07-13 21:16                             ` Jeff Law
2016-07-20  9:52                               ` [Re: RFC: Patch 1/2 v3] " James Greenhalgh
2016-07-20  9:52                                 ` [Patch RFC: 3/2 v3] Don't expand a conditional move between identical sources James Greenhalgh
2016-07-20  9:53                                 ` [RFC: Patch 2/2 v3] Introduce a new cost model for ifcvt James Greenhalgh
2016-07-20  9:53                                 ` [Patch RFC 4/2 v3] Refactor noce_try_cmove_arith James Greenhalgh
2016-07-20 11:41                                 ` [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost Bernd Schmidt
2016-07-20 16:40                                   ` James Greenhalgh
2016-07-21 11:32                                     ` Bernd Schmidt
2016-06-02 16:55                       ` [RFC: Patch 2/6] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
2016-06-02 16:55                       ` [RFC: Patch 5/6] Improve the cost model for multiple-sets James Greenhalgh
2016-06-02 16:56                       ` [RFC: Patch 6/6] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
2016-06-02 16:56                       ` [RFC: Patch 3/6] Remove if_info->branch_cost James Greenhalgh
2016-06-03  9:32                       ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models Bernd Schmidt
2016-06-09 16:58                       ` Jeff Law
2016-06-10 10:45                         ` James Greenhalgh
2015-09-12 14:04     ` [Patch] Teach RTL ifcvt to handle multiple simple set instructions Eric Botcazou
2015-10-30 18:09   ` [Patch ifcvt] " James Greenhalgh
2015-11-04 11:04     ` Bernd Schmidt
2015-11-04 15:37       ` James Greenhalgh
2015-11-06  9:13         ` Christophe Lyon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).