[Patch] Teach RTL ifcvt to handle multiple simple set instructions


* [Patch] Teach RTL ifcvt to handle multiple simple set instructions
@ 2015-09-08 15:01 James Greenhalgh
  2015-09-10 18:24 ` Bernd Schmidt
  0 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2015-09-08 15:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]


Hi,

RTL "noce" ifcvt will currently give up if the branches it is trying to
make conditional are too complicated. One of the conditions for "too
complicated" is that the branch sets more than one value.

One common idiom that this misses is something like:

  int d = a[i];
  int e = b[i];
  if (d > e)
    std::swap (d, e);
  [...]

This currently generates something like:

  compare (d, e)
  branch.le L1
    tmp = d;
    d = e;
    e = tmp;
  L1:

In the case that this is an unpredictable branch, we can do better
with:

  compare (d, e)
  d1 = if_then_else (le, d, e)
  e1 = if_then_else (le, e, d)
  d = d1
  e = e1

Register allocation will eliminate the two trailing unconditional
assignments, and we get a neater sequence.
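To check the select sequence against the branchy swap, here is a minimal C sketch. Plain C stands in for the RTL, and the function names are hypothetical. Both selects read the original values of d and e before either one is written back, which is why the sequence needs the two trailing moves:

```c
#include <assert.h>
#include <stdbool.h>

/* Branchy form of the idiom: swap only when d > e.  */
static void
swap_if_greater_branchy (int *d, int *e)
{
  if (*d > *e)
    {
      int tmp = *d;
      *d = *e;
      *e = tmp;
    }
}

/* Select form: both selects read the original values of d and e, and
   only afterwards are the results copied back.  Those two trailing
   copies are the moves expected to disappear in register allocation.  */
static void
swap_if_greater_select (int *d, int *e)
{
  bool le = *d <= *e;     /* The condition the branch tests.  */
  int d1 = le ? *d : *e;  /* Keep d when le, otherwise take e.  */
  int e1 = le ? *e : *d;  /* Keep e when le, otherwise take d.  */
  *d = d1;
  *e = e1;
}

/* Exhaustively compare both forms over a small range of inputs.  */
static void
check_swap_forms (void)
{
  for (int x = -3; x <= 3; x++)
    for (int y = -3; y <= 3; y++)
      {
        int d1 = x, e1 = y, d2 = x, e2 = y;
        swap_if_greater_branchy (&d1, &e1);
        swap_if_greater_select (&d2, &e2);
        assert (d1 == d2 && e1 == e2);
      }
}
```

Writing `d = le ? d : e; e = le ? e : d;` directly would not work: the second select would read the already-updated d. That register overlap is exactly what the temporaries avoid.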

This patch introduces this logic to the RTL if-convert passes, catching
cases where a basic block does nothing other than multiple SETs. This
helps both with the std::swap idiom above, and with pathological cases
where tree passes create new basic blocks to resolve Phi nodes; these
blocks contain only set instructions and end up unpredictable.

One big question I have with this patch is how I ought to write a
meaningful cost model. The one I've used seems like yet another misuse
of RTX costs, and another bit of stuff for targets to carefully balance.
If the relative cost of branches and conditional move instructions is
not carefully managed, you may enable or disable these optimisations.
This is probably acceptable, but I dislike adding more and more gotchas
to target costs, as I get bitten by them hard enough as is!

Elsewhere the ifcvt cost usage is pretty lacking: essentially counting
the number of instructions which will be if-converted and comparing that
against the magic number "2". I could follow this lead and just count
the number of moves I would convert, then compare that to the branch cost,
but this feels... wrong. These magic numbers make it pretty tough to
choose a "good" number for TARGET_BRANCH_COST. This isn't helped now that
higher branch costs can mean pulling expensive instructions into the main
execution stream.

I've picked a fairly straightforward cost model for this patch, trying to
compare the cost of each conditional move, as calculated with rtx_costs,
against COSTS_N_INSNS (branch_cost). This essentially kills the
optimisation for any target with conditional-move cost > 1. Personally, I
consider that a pretty horrible bug in this patch - but I couldn't think of
anything better to try.
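As a rough worked example of this check, here is a C sketch of the comparison. It is a simplification, not the patch's code: it assumes every conditional move and every swap discount costs exactly COSTS_N_INSNS (1), whereas the patch subtracts the rtx cost of the last emitted insn. The name multiple_sets_profitable_p is hypothetical. COSTS_N_INSNS (N) is (N) * 4, as in GCC's rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* As in GCC's rtl.h: the cost of N "typical" insns.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Simplified profitability test: sum the costs of the emitted
   conditional moves, subtract one COSTS_N_INSNS (1) per swap discount,
   and compare against COSTS_N_INSNS (BRANCH_COST).  */
static bool
multiple_sets_profitable_p (const unsigned *cmove_costs, unsigned n,
                            unsigned swap_discounts, int branch_cost)
{
  unsigned cost = 0;
  for (unsigned i = 0; i < n; i++)
    cost += cmove_costs[i];

  /* A discount only ever cancels a move that was already counted.  */
  assert (cost >= swap_discounts * COSTS_N_INSNS (1));
  cost -= swap_discounts * COSTS_N_INSNS (1);

  assert (branch_cost >= 0);
  return cost <= (unsigned) COSTS_N_INSNS (branch_cost);
}
```

Under those assumptions the std::swap idiom (three conditional moves, one discounted) costs COSTS_N_INSNS (2). It is therefore rejected at the default branch cost of 1 and accepted from a branch cost of 2 upwards.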

As you might expect, this triggers all over the place when
TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006 build,
I saw 3.9% more CSEL operations with this patch and TARGET_BRANCH_COST set
to 4. Performance is also good on AArch64 on a range of microbenchmarks
and larger workloads (after playing with the branch costs). I didn't see
any performance regression on x86_64, as you would expect given that the
cost models preclude x86_64 targets from ever hitting this optimisation.

Bootstrapped and tested on x86_64 and AArch64 with no issues, and
bootstrapped and tested with the cost model turned off, to have some
confidence that we will continue to do the right thing if any targets do
up their branch costs and start using this code.

No testcase provided, as currently I don't know of targets with a high
enough branch cost to actually trigger the optimisation.

OK?

Thanks,
James

---
gcc/

2015-09-07  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
	(noce_convert_multiple_sets): Likewise.
	(noce_process_if_block): Call them.


[-- Attachment #2: 0001-Patch-Teach-RTL-ifcvt-to-handle-multiple-simple-set-.patch --]
[-- Type: text/x-patch;  name=0001-Patch-Teach-RTL-ifcvt-to-handle-multiple-simple-set-.patch, Size: 8062 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..059bd89 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+     if (x > y)
+       { i = a; j = b; k = c; }
+
+   Make it:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? b : j;
+     tmp_k = (x > y) ? c : k;
+     i = tmp_i; <- Should be cleaned up
+     j = tmp_j; <- Likewise.
+     k = tmp_k; <- Likewise.
+
+   Look for special cases such as use of temporary registers (for
+   example in a swap idiom).
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  unsigned int cost = 0;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec<rtx> targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+     overlap.  */
+  vec<rtx> temporaries = vNULL;
+  /* The insns we've emitted.  */
+  vec<rtx_insn *> unmodified_insns = vNULL;
+  unsigned count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+    {
+      /* Skip over non-insns.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      gcc_checking_assert (set);
+
+      rtx target = SET_DEST (set);
+      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx new_val = SET_SRC (set);
+      rtx old_val = target;
+
+      /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+      rtx candidate_rewire = new_val;
+      for (unsigned i = 0; i < count; i++)
+	{
+	  if (reg_overlap_mentioned_p (new_val, targets[i]))
+	    {
+	      /* Catch a "swap" style idiom.  */
+	      if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+		{
+		  /* The write to targets[i] is only live until the read
+		     here.  As the condition codes match, we can propagate
+		     the set to here.  */
+		   candidate_rewire
+		     = SET_SRC (single_set (unmodified_insns[i]));
+
+		   /* Discount the cost calculation by one conditional
+		      set instruction.  As we are just putting out
+		      a group of SET instructions, any will do.  */
+		   cost -= insn_rtx_cost (PATTERN (get_last_insn ()),
+					  optimize_bb_for_speed_p (test_bb));
+		}
+	      else
+		candidate_rewire = temporaries[i];
+	    }
+	}
+      new_val = candidate_rewire;
+
+      /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */
+      if (if_info->then_else_reversed)
+	std::swap (old_val, new_val);
+
+      /* Actually emit the conditional move.  */
+      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				       x, y, new_val, old_val);
+
+      /* If we failed to expand the conditional move, drop out and don't
+	 try to continue.  */
+      if (temp_dest == NULL_RTX)
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+
+      /* Track the cost of building these conditional instructions.  */
+      cost += insn_rtx_cost (PATTERN (get_last_insn ()),
+			     optimize_bb_for_speed_p (test_bb));
+
+      /* Bookkeeping.  */
+      count++;
+      targets.safe_push (target);
+      temporaries.safe_push (temp_dest);
+      unmodified_insns.safe_push (insn);
+    }
+
+  /* We must have seen some sort of insn to insert, otherwise we were
+     given an empty BB to convert, and we can't handle that.  */
+  if (unmodified_insns.is_empty ())
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
+  /* Check if this is actually beneficial.  */
+  if (cost > COSTS_N_INSNS (if_info->branch_cost))
+     {
+       end_sequence ();
+       return FALSE;
+     }
+
+  /* Now fixup the assignments.  */
+  for (unsigned i = 0; i < count; i++)
+    noce_emit_move_insn (targets[i], temporaries[i]);
+
+  /* Actually emit the sequence.  */
+  rtx_insn *seq = get_insns ();
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
+  unshare_all_rtl_in_chain (seq);
+  end_sequence ();
+
+  if (!seq)
+    return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			   INSN_LOCATION (unmodified_insns.last ()));
+
+  /* Clean up THEN_BB and the edges in and out of it.  */
+  remove_edge (find_edge (test_bb, join_bb));
+  remove_edge (find_edge (then_bb, join_bb));
+  redirect_edge_and_branch_force (single_succ_edge (test_bb), join_bb);
+  delete_basic_block (then_bb);
+  num_true_changes++;
+
+  /* Maybe merge blocks now the jump is simple enough.  */
+  if (can_merge_blocks_p (test_bb, join_bb))
+    {
+      merge_blocks (test_bb, join_bb);
+      num_true_changes++;
+    }
+
+  num_updated_if_blocks++;
+  return TRUE;
+}
+
+/* Return true iff basic block TEST_BB is comprised of only
+   (SET (REG) (REG)) insns suitable for conversion to a series
+   of conditional moves.  */
+
+static bool
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+{
+  rtx_insn *insn;
+
+  /* We must have at least one real insn to convert, or there will
+     be trouble!  */
+  bool bb_is_not_empty = false;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      /* Skip over notes etc.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      /* We only handle SET insns.  */
+      rtx set = single_set (insn);
+      if (set == NULL_RTX)
+	return false;
+
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      /* We can possibly relax this, but for now only handle REG to REG
+	 moves.  This avoids any issues that might come from introducing
+	 loads/stores that might violate data-race-freedom guarantees.  */
+      if (!(REG_P (src) && REG_P (dest)))
+	return false;
+
+      /* Destination must be appropriate for a conditional write.  */
+      if (!noce_operand_ok (dest))
+	return false;
+
+      /* We must be able to conditionally move in this mode.  */
+      if (!can_conditionally_move_p (GET_MODE (dest)))
+	return false;
+
+      bb_is_not_empty = true;
+    }
+  return bb_is_not_empty;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -3004,12 +3221,22 @@ noce_process_if_block (struct noce_if_info *if_info)
      (1) if (...) x = a; else x = b;
      (2) x = b; if (...) x = a;
      (3) if (...) x = a;   // as if with an initial x = x.
-
+     (4) if (...) { x = a; y = b; z = c; }  // Like 3, for multiple SETS.
      The later patterns require jumps to be more expensive.
      For the if (...) x = a; else x = b; case we allow multiple insns
      inside the then and else blocks as long as their only effect is
      to calculate a value for x.
-     ??? For future expansion, look for multiple X in such patterns.  */
+     ??? For future expansion, further expand the "multiple X" rules.  */
+
+  /* First look for multiple SETS.  */
+  if (!else_bb
+      && HAVE_conditional_move
+      && !HAVE_cc0
+      && bb_ok_for_noce_convert_multiple_sets (then_bb))
+    {
+      if (noce_convert_multiple_sets (if_info))
+	return TRUE;
+    }
 
   if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
 				    &if_info->then_simple))

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-08 15:01 [Patch] Teach RTL ifcvt to handle multiple simple set instructions James Greenhalgh
@ 2015-09-10 18:24 ` Bernd Schmidt
  2015-09-10 21:34   ` Jeff Law
  2015-10-30 18:09   ` [Patch ifcvt] " James Greenhalgh
  0 siblings, 2 replies; 60+ messages in thread
From: Bernd Schmidt @ 2015-09-10 18:24 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches; +Cc: law, ebotcazou, steven

On 09/08/2015 04:53 PM, James Greenhalgh wrote:
> One big question I have with this patch is how I ought to write a meaningful
> cost model I've used. It seems like yet another misuse of RTX costs, and
> another bit of stuff for targets to carefully balance. Now, if the
> relative cost of branches and conditional move instructions is not
> carefully managed, you may enable or disable these optimisations. This is
> probably acceptable, but I dislike adding more and more gotcha's to
> target costs, as I get bitten by them hard enough as is!

The code you have seems reasonable, except that for compile time it 
might make sense to not even attempt the optimization if the number of 
sets is too large. I'm not too worried about that, but maybe you could 
bail out early if your cost estimate goes too much above the branch cost.
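One way to act on the early bail-out suggestion is a running-total check that stops as soon as the estimate clearly exceeds the budget. This is a hedged C sketch in which cost_within_budget_p and its SLACK parameter are hypothetical. The slack leaves room for discounts applied later in the scan, since the swap discount can lower the running total.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical early-exit pre-filter: give up as soon as the running
   cost exceeds COSTS_N_INSNS (BRANCH_COST) + SLACK.  *EXAMINED receives
   the number of costs looked at before deciding.  */
static bool
cost_within_budget_p (const unsigned *costs, unsigned n, int branch_cost,
                      unsigned slack, unsigned *examined)
{
  assert (branch_cost >= 0);
  unsigned limit = (unsigned) COSTS_N_INSNS (branch_cost) + slack;
  unsigned cost = 0;

  for (unsigned i = 0; i < n; i++)
    {
      cost += costs[i];
      if (cost > limit)
        {
          *examined = i + 1;
          return false;  /* Stop early: no need to build the rest.  */
        }
    }
  *examined = n;
  return true;
}
```

In the pass itself, bailing out would still need end_sequence () before returning FALSE. The exact comparison against COSTS_N_INSNS (branch_cost) would also still run once the scan completes.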

> +      /* If we were supposed to read from an earlier write in this block,
> +	 we've changed the register allocation.  Rewire the read.  While
> +	 we are looking, also try to catch a swap idiom.  */

So this is one interesting case; do you also have to worry about others 
(such as maybe setting the same register multiple times)?
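One way to explore questions like this is a toy model of the rewiring rule in noce_convert_multiple_sets, checked against plain sequential execution. The sketch below is hypothetical C; struct toy_set and both run_* functions are illustrations, not GCC's RTL API. Registers are array slots, and each set becomes temp = cond ? new : old. A read of a register written earlier in the block is rewired to the temporary of the latest such write, and the real targets are written only at the end. It models the temporaries path only, not the REG_DEAD shortcut the patch uses for swaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: each insn is "regs[dest] = regs[src]".  */
struct toy_set
{
  int dest;
  int src;
};

#define TOY_MAX_SETS 16

/* Reference semantics: when COND holds, run the sets in order.  */
static void
run_branchy (int *regs, const struct toy_set *sets, size_t n, bool cond)
{
  if (!cond)
    return;
  for (size_t i = 0; i < n; i++)
    regs[sets[i].dest] = regs[sets[i].src];
}

/* If-converted semantics: set I becomes temps[I] = COND ? NEW : OLD.
   OLD is the original value of the destination; NEW is the source,
   rewired to the temporary of the latest earlier set that wrote it.
   The real destinations are only written at the very end.  */
static void
run_converted (int *regs, const struct toy_set *sets, size_t n, bool cond)
{
  int temps[TOY_MAX_SETS];
  assert (n <= TOY_MAX_SETS);

  for (size_t i = 0; i < n; i++)
    {
      int new_val = regs[sets[i].src];
      for (size_t j = 0; j < i; j++)
        if (sets[j].dest == sets[i].src)
          new_val = temps[j];  /* Later matches override earlier ones.  */
      temps[i] = cond ? new_val : regs[sets[i].dest];
    }

  /* The trailing moves from temporaries to the real targets.  */
  for (size_t i = 0; i < n; i++)
    regs[sets[i].dest] = temps[i];
}
```

For example, with the sets r0 = r1; r0 = r2, both runs leave r0 holding the old r2. This works because the trailing moves are emitted in the original order.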

> +  /* We must have seen some sort of insn to insert, otherwise we were
> +     given an empty BB to convert, and we can't handle that.  */
> +  if (unmodified_insns.is_empty ())
> +    {
> +      end_sequence ();
> +      return FALSE;
> +    }

Looks like some of the error conditions are tested twice across the two 
new functions? I think it would be better to get rid of one copy or turn 
the second one into a gcc_assert.

 > No testcase provided, as currently I don't know of targets with a high
 > enough branch cost to actually trigger the optimisation.

Hmm, so the code would not actually be used right now? In that case I'll 
leave it to others to decide whether we want to apply it. Other than the 
points above it looks OK to me.

Bernd


* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-10 18:24 ` Bernd Schmidt
@ 2015-09-10 21:34   ` Jeff Law
  2015-09-11  8:51     ` Kyrill Tkachov
                       ` (2 more replies)
  2015-10-30 18:09   ` [Patch ifcvt] " James Greenhalgh
  1 sibling, 3 replies; 60+ messages in thread
From: Jeff Law @ 2015-09-10 21:34 UTC (permalink / raw)
  To: Bernd Schmidt, James Greenhalgh, gcc-patches; +Cc: ebotcazou, steven

On 09/10/2015 12:23 PM, Bernd Schmidt wrote:
>
>  > No testcase provided, as currently I don't know of targets with a high
>  > enough branch cost to actually trigger the optimisation.
>
> Hmm, so the code would not actually be used right now? In that case I'll
> leave it to others to decide whether we want to apply it. Other than the
> points above it looks OK to me.
Some targets have -mbranch-cost to allow overriding the default costing.
visium has a branch cost of 10!  Several ports have a cost of 6 either
unconditionally or when the branch is not well predicted.

Presumably James is more interested in the ARM/AArch64 targets ;-)

I think that's probably what James is most interested in getting some 
ideas around -- the cost model.

I think the fundamental problem is BRANCH_COST isn't actually relative 
to anything other than the default value of "1".  It doesn't directly 
correspond to COSTS_N_INSNS or anything else.  So while using 
COSTS_N_INSNS (BRANCH_COST) would seem to make sense, it actually
doesn't.  It's not even clear how a value of 10 relates to a value of 1 
other than it's more expensive.

ifcvt (and others) comparing to magic #s is more than a bit lame.  But 
with BRANCH_COST having no meaning relative to anything else I can see 
why Richard did things that way.

In an ideal world we'd find some mapping from BRANCH_COST that relates 
to COSTS_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
and we'll likely regress targets with any simplistic mapping.  But maybe 
now is the time to address that fundamental problem and deal with the 
fallout.

jeff


* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-10 21:34   ` Jeff Law
@ 2015-09-11  8:51     ` Kyrill Tkachov
  2015-09-11 21:49       ` Jeff Law
  2015-09-11  9:04     ` Bernd Schmidt
  2015-09-12 14:04     ` [Patch] Teach RTL ifcvt to handle multiple simple set instructions Eric Botcazou
  2 siblings, 1 reply; 60+ messages in thread
From: Kyrill Tkachov @ 2015-09-11  8:51 UTC (permalink / raw)
  To: Jeff Law, Bernd Schmidt, James Greenhalgh, gcc-patches; +Cc: ebotcazou, steven


On 10/09/15 22:11, Jeff Law wrote:
> On 09/10/2015 12:23 PM, Bernd Schmidt wrote:
>>   > No testcase provided, as currently I don't know of targets with a high
>>   > enough branch cost to actually trigger the optimisation.
>>
>> Hmm, so the code would not actually be used right now? In that case I'll
>> leave it to others to decide whether we want to apply it. Other than the
>> points above it looks OK to me.
> Some targets have -mbranch-cost to allow overriding the default costing.
>    visium has a branch cost of 10!  Several ports have a cost of 6 either
> unconditionally or when the branch is not well predicted.
>
> Presumably James is more interested in the ARM/AArch64 targets ;-)
>
> I think that's probably what James is most interested in getting some
> ideas around -- the cost model.
>
> I think the fundamental problem is BRANCH_COST isn't actually relative
> to anything other than the default value of "1".  It doesn't directly
> correspond to COSTS_N_INSNS or anything else.  So while using
> COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> doesn't.  It's not even clear how a value of 10 relates to a value of 1
> other than it's more expensive.
>
> ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> with BRANCH_COST having no meaning relative to anything else I can see
> why Richard did things that way.

Out of interest, what was the intended original meaning
of branch costs if it was not to be relative to instructions?

Thanks,
Kyrill



* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-10 21:34   ` Jeff Law
  2015-09-11  8:51     ` Kyrill Tkachov
@ 2015-09-11  9:04     ` Bernd Schmidt
  2015-09-11  9:08       ` Ramana Radhakrishnan
  2015-09-12 14:04     ` [Patch] Teach RTL ifcvt to handle multiple simple set instructions Eric Botcazou
  2 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2015-09-11  9:04 UTC (permalink / raw)
  To: Jeff Law, James Greenhalgh, gcc-patches; +Cc: ebotcazou, steven

On 09/10/2015 11:11 PM, Jeff Law wrote:
> I think that's probably what James is most interested in getting some
> ideas around -- the cost model.
>
> I think the fundamental problem is BRANCH_COST isn't actually relative
> to anything other than the default value of "1".  It doesn't directly
> correspond to COSTS_N_INSNS or anything else.  So while using
> COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> doesn't.  It's not even clear how a value of 10 relates to a value of 1
> other than it's more expensive.
>
> ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> with BRANCH_COST having no meaning relative to anything else I can see
> why Richard did things that way.
>
> In an ideal world we'd find some mapping from BRANCH_COST that relates
> to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
> and we'll likely regress targets with any simplistic mapping.  But maybe
> now is the time to address that fundamental problem and deal with the
> fallout.

I think the right approach if we want to fix this is a new 
branch_cost_ninsns target hook (maybe with arguments taken_percentage, 
predictability), and gradually move everything to use that instead of 
BRANCH_COST.


Bernd



* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-11  9:04     ` Bernd Schmidt
@ 2015-09-11  9:08       ` Ramana Radhakrishnan
  2015-09-11 10:55         ` James Greenhalgh
  0 siblings, 1 reply; 60+ messages in thread
From: Ramana Radhakrishnan @ 2015-09-11  9:08 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, James Greenhalgh, gcc-patches, ebotcazou, steven

On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote:
> On 09/10/2015 11:11 PM, Jeff Law wrote:
> >I think that's probably what James is most interested in getting some
> >ideas around -- the cost model.
> >
> >I think the fundamental problem is BRANCH_COST isn't actually relative
> >to anything other than the default value of "1".  It doesn't directly
> >correspond to COSTS_N_INSNS or anything else.  So while using
> >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> >doesn't.  It's not even clear how a value of 10 relates to a value of 1
> >other than it's more expensive.
> >
> >ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> >with BRANCH_COST having no meaning relative to anything else I can see
> >why Richard did things that way.
> >
> >In an ideal world we'd find some mapping from BRANCH_COST that relates
> >to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
> >and we'll likely regress targets with any simplistic mapping.  But maybe
> >now is the time to address that fundamental problem and deal with the
> >fallout.
> 
> I think the right approach if we want to fix this is a new
> branch_cost_ninsns target hook (maybe with arguments
> taken_percentage, predictability), and gradually move everything to
> use that instead of BRANCH_COST.

Another approach may be to provide backends with the entire
if-then-else block being if-converted, along with the information
mentioned above.  This allows the backends to analyse which cases are
good to if-convert for the ISA or micro-architecture and which aren't.

regards
Ramana


* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-11  9:08       ` Ramana Radhakrishnan
@ 2015-09-11 10:55         ` James Greenhalgh
  2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
  0 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2015-09-11 10:55 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Bernd Schmidt, Jeff Law, gcc-patches, ebotcazou, steven

On Fri, Sep 11, 2015 at 10:04:12AM +0100, Ramana Radhakrishnan wrote:
> On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote:
> > On 09/10/2015 11:11 PM, Jeff Law wrote:
> > >I think that's probably what James is most interested in getting some
> > >ideas around -- the cost model.
> > >
> > >I think the fundamental problem is BRANCH_COST isn't actually relative
> > >to anything other than the default value of "1".  It doesn't directly
> > >correspond to COSTS_N_INSNS or anything else.  So while using
> > >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> > >doesn't.  It's not even clear how a value of 10 relates to a value of 1
> > >other than it's more expensive.
> > >
> > >ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> > >with BRANCH_COST having no meaning relative to anything else I can see
> > >why Richard did things that way.
> > >
> > >In an ideal world we'd find some mapping from BRANCH_COST that relates
> > >to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
> > >and we'll likely regress targets with any simplistic mapping.  But maybe
> > >now is the time to address that fundamental problem and deal with the
> > >fallout.
> > 
> > I think the right approach if we want to fix this is a new
> > branch_cost_ninsns target hook (maybe with arguments
> > taken_percentage, predictability), and gradually move everything to
> > use that instead of BRANCH_COST.
> 
> Perhaps providing backends with the entire if-then-else block along
> with the above mentioned information being if converted may be another
> approach, it allows the backends to analyse what cases are good to
> if-convert as per the ISA or micro-architecture and what aren't.

I'm not sure how much of this is likely to be target-dependent and how
much can just be abstracted to common ifcvt code reusing rtx_costs.

I've been sketching out a rough idea of a more applicable cost model for
RTL ifcvt, taking into consideration what David mentioned regarding the
talks at Cauldron. The question we want to ask is:

Which is preferable between:

  Before:
   (COSTS_N_INSNS cost of the compare+branch insns at the tail of the if BB.
     ??? (possibly) some factor related to BRANCH_COST)
   + weighted cost of then BB.
   + (if needed) weighted cost of else BB.

  After:
   seq_cost of the candidate new sequence.

The weighted cost of the two BBs should mix in some idea as to the relative
probability that we execute them.
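The before/after comparison above can be written down directly. The following is a minimal C sketch. It assumes all costs are already in one common unit (such as COSTS_N_INSNS) and that the then-block probability is scaled to a hypothetical PROB_BASE of 10000, in the spirit of GCC's REG_BR_PROB_BASE. How BRANCH_COST should feed into the branch_cost argument is deliberately left open, since that is the hard part here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical probability scale (then_prob / PROB_BASE).  */
#define PROB_BASE 10000

/* BEFORE = compare+branch cost
            + P(then) * cost (then BB) + P(else) * cost (else BB).
   AFTER  = cost of the candidate if-converted sequence.
   Everything is multiplied by PROB_BASE to stay in integers.  */
static bool
ifcvt_profitable_p (long branch_cost, long then_prob,
                    long then_cost, long else_cost, long seq_cost)
{
  assert (then_prob >= 0 && then_prob <= PROB_BASE);
  long before_scaled = branch_cost * PROB_BASE
                       + then_prob * then_cost
                       + (PROB_BASE - then_prob) * else_cost;
  return seq_cost * PROB_BASE <= before_scaled;
}
```

With a branch cost of 4, a then block costing 8 that runs half the time, and an empty else block, the "before" side is 8. A sequence costing 8 would be accepted and one costing 12 rejected.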

The tough part is figuring out how to (reasonably) factor in branch cost.
The reason that is tough is that BRANCH_COST is used inconsistently. Normally
it is not measured relative to anything, but is compared against magic numbers
for optimizations (each of which is really its own question to be posed
as above).

I don't have a good answer to that, nor a good answer as to what BRANCH_COST
should represent in future. The use in the compiler is sort-of consistent
with a measurement against instruction counts (i.e. a branch cost of 3 means
a branch is equivalent to 3 cheap instructions), but is sometimes just used
as a measure of "expensive" (a branch cost of >= 2 means that abs should be
expanded using a sequence of bit operations).
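For reference, the abs example is the classic branch-versus-bit-trick trade-off. Below is a hedged C sketch of both forms, plus a toy threshold mirroring the "branch cost >= 2" rule described above. expand_abs is hypothetical and is not GCC's actual expander. Results are unsigned so that INT_MIN is well defined.

```c
#include <assert.h>

/* Conditional form.  */
static unsigned
abs_branchy (int x)
{
  return x < 0 ? 0u - (unsigned) x : (unsigned) x;
}

/* Branch-free form: MASK is all ones when x is negative and zero
   otherwise, so (ux ^ mask) - mask negates exactly the negative inputs.  */
static unsigned
abs_branchless (int x)
{
  unsigned ux = (unsigned) x;
  unsigned mask = 0u - (unsigned) (x < 0);
  return (ux ^ mask) - mask;
}

/* Toy version of the magic-number decision: a "branch cost" of 2 or
   more selects the bit-twiddling sequence.  */
static unsigned
expand_abs (int x, int branch_cost)
{
  return branch_cost >= 2 ? abs_branchless (x) : abs_branchy (x);
}

/* Both forms must agree on every input they are given.  */
static void
check_abs_forms (void)
{
  for (int x = -100; x <= 100; x++)
    assert (abs_branchy (x) == abs_branchless (x));
}
```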

I'll look into how the code in ifcvt starts to look with a modified cost
model and get back to you...

James


* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-11  8:51     ` Kyrill Tkachov
@ 2015-09-11 21:49       ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2015-09-11 21:49 UTC (permalink / raw)
  To: Kyrill Tkachov, Bernd Schmidt, James Greenhalgh, gcc-patches
  Cc: ebotcazou, steven

On 09/11/2015 02:49 AM, Kyrill Tkachov wrote:
>
> On 10/09/15 22:11, Jeff Law wrote:
>> On 09/10/2015 12:23 PM, Bernd Schmidt wrote:
>>>   > No testcase provided, as currently I don't know of targets with a
>>> high
>>>   > enough branch cost to actually trigger the optimisation.
>>>
>>> Hmm, so the code would not actually be used right now? In that case I'll
>>> leave it to others to decide whether we want to apply it. Other than the
>>> points above it looks OK to me.
>> Some targets have -mbranch-cost to allow overriding the default costing.
>>    visium has a branch cost of 10!  Several ports have a cost of 6 either
>> unconditionally or when the branch is not well predicted.
>>
>> Presumably James is more interested in the ARM/AArch64 targets ;-)
>>
>> I think that's probably what James is most interested in getting some
>> ideas around -- the cost model.
>>
>> I think the fundamental problem is BRANCH_COST isn't actually relative
>> to anything other than the default value of "1".  It doesn't directly
>> correspond to COSTS_N_INSNS or anything else.  So while using
>> COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
>> doesn't.  It's not even clear how a value of 10 relates to a value of 1
>> other than it's more expensive.
>>
>> ifcvt (and others) comparing to magic #s is more than a bit lame.  But
>> with BRANCH_COST having no meaning relative to anything else I can see
>> why Richard did things that way.
>
> Out of interest, what was the intended original meaning
> of branch costs if it was not to be relative to instructions?
I don't think it ever had one.  It's self-relative.  A cost of 2 is 
greater than a cost of 1.  No more, no less IIRC.   Lame?  Yes. 
Short-sighted?  Yes.  Should we try to fix it?  Yes.

If you look at how BRANCH_COST actually gets used, AFAIK it's tested 
only against "magic constants", which are themselves lame, short-sighted 
and need to be fixed.

jeff


* Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-10 21:34   ` Jeff Law
  2015-09-11  8:51     ` Kyrill Tkachov
  2015-09-11  9:04     ` Bernd Schmidt
@ 2015-09-12 14:04     ` Eric Botcazou
  2 siblings, 0 replies; 60+ messages in thread
From: Eric Botcazou @ 2015-09-12 14:04 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Bernd Schmidt, James Greenhalgh, steven

> Some targets have -mbranch-cost to allow overriding the default costing.
>   visium has a branch cost of 10!

Yeah, the GR5 variant is pipelined but has no branch prediction; moreover
there is an additional adverse effect coming from the instruction bus...

>   Several ports have a cost of 6 either unconditionally or when the branch
>   is not well predicted.

9 for UltraSPARC3, although this should probably be lowered if predictable_p.

-- 
Eric Botcazou


* [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-11 10:55         ` James Greenhalgh
@ 2015-09-25 15:06           ` James Greenhalgh
  2015-09-25 15:06             ` [Patch ifcvt 1/3] Factor out cost calculations from noce cases James Greenhalgh
                               ` (4 more replies)
  0 siblings, 5 replies; 60+ messages in thread
From: James Greenhalgh @ 2015-09-25 15:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 3733 bytes --]

Hi,

In relation to the patch I put up for review a few weeks ago to teach
RTL if-convert to handle multiple sets in a basic block [1], I was
asking about a sensible cost model to use. There was some consensus at
Cauldron that what should be done in this situation is to introduce a
target hook that delegates answering the question to the target.

This patch series introduces that new target hook to provide cost
decisions for the RTL ifcvt pass.

The idea is to give the target full visibility of the proposed
transformation, and allow it to respond as to whether if-conversion in that
way is profitable.

In order to preserve current behaviour across targets, we will need the
default implementation to keep to the strategy of simply comparing branch
cost against a magic number. Patch 1/3 performs this refactoring, which is
a bit hairy in some corner cases.

Patch 2/3 is a simple code move, pulling the definition of the if_info
structure used by RTL if-convert into ifcvt.h where it can be included
by targets.

Patch 3/3 then introduces the new target hook, with the same default
behaviour as was previously in noce_is_profitable_p.
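To illustrate the shape of the proposal, here is a hypothetical C sketch of a default profitability hook next to a target override. The struct, field names and functions are invented for illustration. In particular, it assumes the default reduces to comparing branch cost against the per-case magic number. The magic_number field in the ChangeLog suggests this, but the mail does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what the hook would see in noce_if_info.  */
struct toy_if_info
{
  int branch_cost;   /* BRANCH_COST for this branch.  */
  int magic_number;  /* Threshold the old per-case code compared against.  */
  int seq_cost;      /* Cost of the proposed if-converted sequence.  */
};

typedef bool (*toy_profitable_hook) (const struct toy_if_info *);

/* Default: preserve the old behaviour (assumed to be a magic-number test).  */
static bool
toy_default_profitable_p (const struct toy_if_info *info)
{
  return info->branch_cost >= info->magic_number;
}

/* A target override that looks at the proposed sequence instead.  */
static bool
toy_target_profitable_p (const struct toy_if_info *info)
{
  return info->seq_cost <= 4 * info->branch_cost;  /* 4 == COSTS_N_INSNS (1) */
}

/* Use the target's hook when it provides one, else the default.  */
static bool
toy_noce_is_profitable_p (const struct toy_if_info *info,
                          toy_profitable_hook target_hook)
{
  assert (info != NULL);
  return target_hook ? target_hook (info) : toy_default_profitable_p (info);
}
```

Consider a branch_cost of 3, a magic_number of 2 and a sequence cost of 20: the default accepts the conversion while the override rejects it. This is the kind of decision the hook would hand to the target.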

The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
I've verified with Spec2000 and Spec2006 runs that there are no code
generation differences for any of these three targets after the patch.

I also gave UltraSPARC3 a quick go. From what I could see, the series
changed the register allocation for the floating-point condition code
registers. Presumably this is a side effect of first constructing RTXen
that I then discard. I didn't see anything which looked like more frequent
reloads or substantial code generation changes, though I'm not familiar
with the intricacies of the SPARC condition registers :).

I've included a patch 4/3 to give an example of what a target might want
to do with this hook. It still needs tuning, and a decision on how the
function should actually behave, so it is best thought of as a
strawman/prototype rather than a patch submission.

Are parts 1, 2 and 3 OK?

Thanks,
James

[1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html

---
[Patch ifcvt 1/3] Factor out cost calculations from noce cases

2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): Add a magic_number field :-(.
	(noce_is_profitable_p): New.
	(noce_try_store_flag_constants): Move cost calculation
	to after sequence generation, factor it out to noce_is_profitable_p.
	(noce_try_addcc): Likewise.
	(noce_try_store_flag_mask): Likewise.
	(noce_try_cmove): Likewise.
	(noce_try_cmove_arith): Likewise.
	(noce_try_sign_mask): Add comment regarding cost calculations.

[Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h

2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): Move to...
	* ifcvt.h (noce_if_info): ...Here.

[Patch ifcvt 3/3] Create a new target hook for deciding profitability
    of noce if-conversion

2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* target.def (costs): New hook vector.
	(ifcvt_noce_profitable_p): New hook.
	* doc/tm.texi.in: Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_ifcvt_noce_profitable_p): New.
	* targhooks.c (default_ifcvt_noce_profitable_p): New.
	* ifcvt.c (noce_is_profitable_p): Use new target hook.

[Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
    hook for AArch64

2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.c
	(aarch64_additional_branch_cost_for_probability): New.
	(aarch64_ifcvt_noce_profitable_p): Likewise.
	(TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.

* [Patch ifcvt 1/3] Factor out cost calculations from noce cases
  2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
@ 2015-09-25 15:06             ` James Greenhalgh
  2015-09-25 15:08             ` [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h James Greenhalgh
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2015-09-25 15:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]


Hi,

In this patch we try to pull out the cost calculations used by the
no-conditional-execution if-convert functions. We want to replicate the
logic of the current cost decisions, but to phrase it in a way which
can be pulled out as common. To preserve the current behaviour as best
as we can, this means asking the common question, "is a magic_number
less than or equal to branch_cost?". Clearly this is not the question
we want to be asking in the longer term, but this preserves existing target
behaviour.
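For illustration, here is a minimal sketch of the legacy question as the default implementation would ask it. It assumes a cut-down stand-in for struct noce_if_info (if_info_sketch and both helper names are hypothetical) and reproduces the noce_try_cmove thresholds from the patch below, so a STORE_FLAG_VALUE == 1 target needs a branch cost of 3 and a STORE_FLAG_VALUE == -1 target needs a branch cost of 2.

```c
#include <stdbool.h>

/* Hypothetical cut-down stand-in for struct noce_if_info, keeping only
   the two fields the legacy check reads.  */
struct if_info_sketch
{
  unsigned int branch_cost;   /* BRANCH_COST for this branch.  */
  unsigned int magic_number;  /* Threshold chosen by the noce_try_* case.  */
};

/* Same test as noce_is_profitable_p in patch 1/3: accept the
   transformation when the magic number is less than or equal to the
   branch cost.  SEQ is ignored, as it is ATTRIBUTE_UNUSED there.  */
static bool
is_profitable_sketch (const void *seq, const struct if_info_sketch *info)
{
  (void) seq;
  return info->branch_cost >= info->magic_number;
}

/* Threshold noce_try_cmove sets for its constant-arm case, replacing the
   old ((branch_cost >= 2 && STORE_FLAG_VALUE == -1) || branch_cost >= 3).  */
static unsigned int
cmove_const_magic_sketch (int store_flag_value)
{
  return store_flag_value == -1 ? 2 : 3;
}
```

So a branch cost of 2 is enough on a STORE_FLAG_VALUE == -1 target but not on a STORE_FLAG_VALUE == 1 target, matching the condition the patch removes.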

This is imperfect for a few reasons.

First, some of the more ambitious noce if-convert functions have a
(slightly) more complicated cost-model. This means that we have to jump
through hoops to present the cost calculation in the common form. These
hoops are not very big, but it does make the logic seem a bit... weird.
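The main hoop is turning a difference of rtx costs into a whole-number magic_number. A minimal sketch, assuming COSTS_N_INSNS (N) expands to N * 4 as in GCC's rtl.h (the macro and helper names below are hypothetical), checks that dividing through by COSTS_N_INSNS (1) and rounding up accepts exactly the cases the old "diff <= COSTS_N_INSNS (branch_cost)" test accepted. The sketch clamps a negative difference to zero before dividing; that is the sketch's own choice, not something the patch does. In noce_try_store_flag_mask the signed difference new_cost - old_cost is stored straight into the unsigned magic_number field, so if the new sequence were ever cheaper than insn_a the value would wrap to a very large threshold and the conversion would be refused, where the old code accepted it.

```c
#include <stdbool.h>

/* Assumption: GCC's rtl.h defines COSTS_N_INSNS (N) as (N) * 4.  */
#define SKETCH_COSTS_N_INSNS(N) ((N) * 4)

/* Old form: accept when the extra cost fits in BRANCH_COST insns.  */
static bool
legacy_accepts (int cost_diff, unsigned int branch_cost)
{
  return cost_diff <= SKETCH_COSTS_N_INSNS ((int) branch_cost);
}

/* Patch 1/3 form: divide through by COSTS_N_INSNS (1), rounding up.
   Negative differences are clamped to zero so they cannot wrap.  */
static unsigned int
magic_from_cost_diff (int cost_diff)
{
  if (cost_diff <= 0)
    return 0;
  unsigned int diff = (unsigned int) cost_diff;
  unsigned int unit = SKETCH_COSTS_N_INSNS (1u);
  return diff / unit + (diff % unit ? 1 : 0);
}

static bool
magic_accepts (int cost_diff, unsigned int branch_cost)
{
  return branch_cost >= magic_from_cost_diff (cost_diff);
}

/* True if both forms agree for every cost difference in [-LIMIT, LIMIT]
   and every branch cost in [0, MAX_BRANCH_COST].  */
static bool
forms_agree (int limit, unsigned int max_branch_cost)
{
  for (int diff = -limit; diff <= limit; diff++)
    for (unsigned int bc = 0; bc <= max_branch_cost; bc++)
      if (legacy_accepts (diff, bc) != magic_accepts (diff, bc))
        return false;
  return true;
}
```

Rounding up matters: with a difference of 5 units, a branch cost of 1 (4 units) must still reject, which ceil (5 / 4) = 2 guarantees.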

Second, because our long term goal is to hand the cost calculation off
to the target and make it better reflect a meaningful question, we must
first build the candidate ifcvt sequence for comparison. This will cause
a slight compile time regression as we now generate more sequences before
bailing out (each of which needs a cost calculation).
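The reordering can be modelled in a few lines. This is a hedged sketch, not GCC code: struct try_result and both functions are hypothetical, and incrementing sequences_built stands in for a start_sequence ... end_sequence pair. The no_cost_model flag mirrors the bypass patch 1/3 keeps in noce_try_store_flag_constants.

```c
#include <stdbool.h>

/* Hypothetical outcome of one noce_try_* attempt.  */
struct try_result
{
  unsigned int sequences_built;
  bool converted;
};

/* Old ordering: the cost gate runs before any sequence is built.  */
static struct try_result
try_check_then_build (unsigned int branch_cost, unsigned int magic_number)
{
  struct try_result r = { 0, false };
  if (branch_cost < magic_number)
    return r;                   /* Bail out early; nothing built.  */
  r.sequences_built++;
  r.converted = true;
  return r;
}

/* New ordering: build the candidate first so a cost hook can inspect
   it, then ask the profitability question.  */
static struct try_result
try_build_then_check (unsigned int branch_cost, unsigned int magic_number,
                      bool no_cost_model)
{
  struct try_result r = { 0, false };
  r.sequences_built++;
  if (!no_cost_model && branch_cost < magic_number)
    return r;                   /* Built, then discarded.  */
  r.converted = true;
  return r;
}
```

For an unprofitable candidate, both orderings refuse the conversion, but the new one pays for building a sequence first; that extra work is the compile-time cost described above.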

On the other hand, it should be clear from this point what we have to do
to lift this out to a target hook which can do a smart job, and I think
this fits with the overall direction we intend to take.

Bootstrapped and checked on x86_64-none-linux-gnu, aarch64-none-linux-gnu
and arm-none-linux-gnueabihf without issue. Comparison of Spec2000/Spec2006
code generation for these three targets showed no changes.

OK?

Thanks,
James

---
2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): Add a magic_number field :-(.
	(noce_is_profitable_p): New.
	(noce_try_store_flag_constants): Move cost calculation
	to after sequence generation, factor it out to noce_is_profitable_p.
	(noce_try_addcc): Likewise.
	(noce_try_store_flag_mask): Likewise.
	(noce_try_cmove): Likewise.
	(noce_try_cmove_arith): Likewise.
	(noce_try_sign_mask): Add comment regarding cost calculations.


[-- Attachment #2: 0001-Patch-ifcvt-1-3-Factor-out-cost-calculations-from-no.patch --]
[-- Type: text/x-patch;  name=0001-Patch-ifcvt-1-3-Factor-out-cost-calculations-from-no.patch, Size: 10828 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..e89d567 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -820,6 +820,14 @@ struct noce_if_info
 
   /* Estimated cost of the particular branch instruction.  */
   unsigned int branch_cost;
+
+  /* For if-convert transformations, the legacy way to decide whether
+     the transformation should be applied is a comparison of a magic
+     number against BRANCH_COST.  Ultimately, this should go away, but
+     to avoid regressing targets this field encodes that number so the
+     profitability analysis can remain unchanged.  */
+  unsigned int magic_number;
+
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -836,6 +844,19 @@ static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
+static bool noce_is_profitable_p (rtx_insn *, struct noce_if_info *);
+
+/* Given SEQ, which is a sequence we might want to generate after
+   if-conversion, and a basic-block structure in IF_INFO which represents
+   the code generation before if-conversion, return TRUE if this would
+   be a profitable transformation.  */
+
+static bool
+noce_is_profitable_p (rtx_insn *seq ATTRIBUTE_UNUSED,
+		      struct noce_if_info *if_info)
+{
+  return (if_info->branch_cost >= if_info->magic_number);
+}
 
 /* Helper function for noce_try_store_flag*.  */
 
@@ -1192,8 +1213,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   HOST_WIDE_INT itrue, ifalse, diff, tmp;
   int normalize;
   bool can_reverse;
+  bool no_cost_model = false;
   machine_mode mode = GET_MODE (if_info->x);;
   rtx common = NULL_RTX;
+  /* ??? There are paths through this function from which no cost function
+     is checked before conversion.  Maintain that behaviour by setting
+     the magic number used by noce_is_profitable_p to zero.  */
+  if_info->magic_number = 0;
 
   rtx a = if_info->a;
   rtx b = if_info->b;
@@ -1204,9 +1230,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
       && CONST_INT_P (XEXP (a, 1))
       && CONST_INT_P (XEXP (b, 1))
       && rtx_equal_p (XEXP (a, 0), XEXP (b, 0))
-      && noce_operand_ok (XEXP (a, 0))
-      && if_info->branch_cost >= 2)
+      && noce_operand_ok (XEXP (a, 0)))
     {
+      if_info->magic_number = 2;
       common = XEXP (a, 0);
       a = XEXP (a, 1);
       b = XEXP (b, 1);
@@ -1278,23 +1304,33 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  else
 	    gcc_unreachable ();
 	}
-      else if (ifalse == 0 && exact_log2 (itrue) >= 0
-	       && (STORE_FLAG_VALUE == 1
-		   || if_info->branch_cost >= 2))
-	normalize = 1;
-      else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
-	       && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
+      else if (ifalse == 0 && exact_log2 (itrue) >= 0)
 	{
+	  if_info->magic_number = 2;
+	  if (STORE_FLAG_VALUE == 1)
+	    no_cost_model = true;
+	  normalize = 1;
+	}
+      else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse)
+	{
+	  if_info->magic_number = 2;
+	  if (STORE_FLAG_VALUE == 1)
+	    no_cost_model = true;
 	  normalize = 1;
 	  reversep = true;
 	}
-      else if (itrue == -1
-	       && (STORE_FLAG_VALUE == -1
-		   || if_info->branch_cost >= 2))
-	normalize = -1;
-      else if (ifalse == -1 && can_reverse
-	       && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
+      else if (itrue == -1)
 	{
+	  if_info->magic_number = 2;
+	  if (STORE_FLAG_VALUE == -1)
+	    no_cost_model = true;
+	  normalize = -1;
+	}
+      else if (ifalse == -1 && can_reverse)
+	{
+	  if_info->magic_number = 2;
+	  if (STORE_FLAG_VALUE == -1)
+	    no_cost_model = true;
 	  normalize = -1;
 	  reversep = true;
 	}
@@ -1385,6 +1421,10 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
       if (!seq)
 	return FALSE;
 
+      /* Check if this is actually beneficial.  */
+      if (!no_cost_model && !noce_is_profitable_p (seq, if_info))
+	return FALSE;
+
       emit_insn_before_setloc (seq, if_info->jump,
 			       INSN_LOCATION (if_info->insn_a));
       return TRUE;
@@ -1446,8 +1486,7 @@ noce_try_addcc (struct noce_if_info *if_info)
 
       /* If that fails, construct conditional increment or decrement using
 	 setcc.  */
-      if (if_info->branch_cost >= 2
-	  && (XEXP (if_info->a, 1) == const1_rtx
+      if ((XEXP (if_info->a, 1) == const1_rtx
 	      || XEXP (if_info->a, 1) == constm1_rtx))
         {
 	  start_sequence ();
@@ -1477,6 +1516,11 @@ noce_try_addcc (struct noce_if_info *if_info)
 	      if (!seq)
 		return FALSE;
 
+	      /* Check if this is actually beneficial.  */
+	      if_info->magic_number = 2;
+	      if (!noce_is_profitable_p (seq, if_info))
+		return FALSE;
+
 	      emit_insn_before_setloc (seq, if_info->jump,
 				       INSN_LOCATION (if_info->insn_a));
 	      return TRUE;
@@ -1501,15 +1545,14 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
     return FALSE;
 
   reversep = 0;
-  if ((if_info->branch_cost >= 2
-       || STORE_FLAG_VALUE == -1)
-      && ((if_info->a == const0_rtx
+
+  if ((if_info->a == const0_rtx
 	   && rtx_equal_p (if_info->b, if_info->x))
 	  || ((reversep = (reversed_comparison_code (if_info->cond,
 						     if_info->jump)
 			   != UNKNOWN))
 	      && if_info->b == const0_rtx
-	      && rtx_equal_p (if_info->a, if_info->x))))
+	      && rtx_equal_p (if_info->a, if_info->x)))
     {
       start_sequence ();
       target = noce_emit_store_flag (if_info,
@@ -1523,7 +1566,7 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
       if (target)
 	{
-	  int old_cost, new_cost, insn_cost;
+	  int new_cost, old_cost;
 	  int speed_p;
 
 	  if (target != if_info->x)
@@ -1533,12 +1576,26 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 	  if (!seq)
 	    return FALSE;
 
+	  /* The previous costing code here calculated everything in the
+	     rtx_cost base, and compared it against
+	     COSTS_N_INSNS (if_info->branch_cost).  As we don't want to
+	     multiply branch cost, instead divide through by
+	     COSTS_N_INSNS (1) to get our magic_number.  We also need to
+	     take account of a possible magic_number of 2 which should
+	     be applied if STORE_FLAG_VALUE != -1.  */
 	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
-	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  old_cost = COSTS_N_INSNS (if_info->branch_cost) + insn_cost;
+	  old_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
 	  new_cost = seq_cost (seq, speed_p);
+	  if_info->magic_number = (new_cost - old_cost);
+	  if_info->magic_number
+	    = (if_info->magic_number / COSTS_N_INSNS (1))
+	      + ((if_info->magic_number % COSTS_N_INSNS (1)) ? 1 : 0);
 
-	  if (new_cost > old_cost)
+	  if_info->magic_number
+	    = MAX (((STORE_FLAG_VALUE != -1) ? 2 : 0),
+		   (if_info->magic_number));
+
+	  if (!noce_is_profitable_p (seq, if_info))
 	    return FALSE;
 
 	  emit_insn_before_setloc (seq, if_info->jump,
@@ -1703,9 +1760,7 @@ noce_try_cmove (struct noce_if_info *if_info)
 	 we don't know about, so give them a chance before trying this
 	 approach.  */
       else if (!targetm.have_conditional_execution ()
-		&& CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b)
-		&& ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
-		    || if_info->branch_cost >= 3))
+		&& CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b))
 	{
 	  machine_mode mode = GET_MODE (if_info->x);
 	  HOST_WIDE_INT ifalse = INTVAL (if_info->a);
@@ -1744,6 +1799,12 @@ noce_try_cmove (struct noce_if_info *if_info)
 	      if (!seq)
 		return FALSE;
 
+	      /* Check if this is actually beneficial.  */
+	      if_info->magic_number = STORE_FLAG_VALUE == -1
+				      ? 2 : 3;
+	      if (!noce_is_profitable_p (seq, if_info))
+		return FALSE;
+
 	      emit_insn_before_setloc (seq, if_info->jump,
 				   INSN_LOCATION (if_info->insn_a));
 	      return TRUE;
@@ -1930,7 +1991,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
      conditional on their addresses followed by a load.  Don't do this
      early because it'll screw alias analysis.  Note that we've
      already checked for no side effects.  */
-  /* ??? FIXME: Magic number 5.  */
+  /* ??? FIXME: Magic number 5.  We cost this here rather than through
+     noce_is_profitable_p as the fallback cases below can produce viable
+     transformations in the case where (if_info->branch_cost < 5).  */
   if (cse_not_expected
       && MEM_P (a) && MEM_P (b)
       && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
@@ -1973,10 +2036,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   else
     else_cost = 0;
 
-  /* We're going to execute one of the basic blocks anyway, so
-     bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
-    return FALSE;
+  /* We want the most expensive of the above, divided through by
+     COSTS_N_INSNS (1).  */
+  if_info->magic_number = MAX (then_cost, else_cost);
+  if_info->magic_number
+    = (if_info->magic_number / COSTS_N_INSNS (1))
+      + ((if_info->magic_number % COSTS_N_INSNS (1)) ? 1 : 0);
 
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
@@ -2115,6 +2180,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!ifcvt_seq)
     return FALSE;
 
+  if (!noce_is_profitable_p (ifcvt_seq, if_info))
+    return FALSE;
+
   emit_insn_before_setloc (ifcvt_seq, if_info->jump,
 			   INSN_LOCATION (if_info->insn_a));
   return TRUE;
@@ -2571,7 +2639,12 @@ noce_try_sign_mask (struct noce_if_info *if_info)
      non-zero (T) value and if INSN_B was taken from TEST_BB, or there was no
      INSN_B which can happen for e.g. conditional stores to memory.  For the
      cost computation use the block TEST_BB where the evaluation will end up
-     after the transformation.  */
+     after the transformation.
+     ??? The underlying calculation is a natural fit for the long term
+     direction of noce_is_profitable_p, but there is no way to transform
+     this cost calculation into a comparison against branch_cost.  When
+     noce_is_profitable_p becomes a proper cost calculation, this logic
+     should be cleaned up.  */
   t_unconditional =
     (t == if_info->b
      && (if_info->insn_b == NULL_RTX

* [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h
  2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
  2015-09-25 15:06             ` [Patch ifcvt 1/3] Factor out cost calculations from noce cases James Greenhalgh
@ 2015-09-25 15:08             ` James Greenhalgh
  2015-09-25 15:14             ` [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64 James Greenhalgh
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2015-09-25 15:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 350 bytes --]


Simple code move. We're going to allow targets to work with this
information, so pull it somewhere they can see it.

No issues building toolchains after this transformation.

OK?

Thanks,
James

---
2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): Move to...
	* ifcvt.h (noce_if_info): ...Here.


[-- Attachment #2: 0002-Patch-ifcvt-2-3-Move-noce_if_info-in-to-ifcvt.h.patch --]
[-- Type: text/x-patch;  name=0002-Patch-ifcvt-2-3-Move-noce_if_info-in-to-ifcvt.h.patch, Size: 4899 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index e89d567..d7fc523 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -769,67 +769,6 @@ cond_exec_process_if_block (ce_if_block * ce_info,
   return FALSE;
 }
 \f
-/* Used by noce_process_if_block to communicate with its subroutines.
-
-   The subroutines know that A and B may be evaluated freely.  They
-   know that X is a register.  They should insert new instructions
-   before cond_earliest.  */
-
-struct noce_if_info
-{
-  /* The basic blocks that make up the IF-THEN-{ELSE-,}JOIN block.  */
-  basic_block test_bb, then_bb, else_bb, join_bb;
-
-  /* The jump that ends TEST_BB.  */
-  rtx_insn *jump;
-
-  /* The jump condition.  */
-  rtx cond;
-
-  /* New insns should be inserted before this one.  */
-  rtx_insn *cond_earliest;
-
-  /* Insns in the THEN and ELSE block.  There is always just this
-     one insns in those blocks.  The insns are single_set insns.
-     If there was no ELSE block, INSN_B is the last insn before
-     COND_EARLIEST, or NULL_RTX.  In the former case, the insn
-     operands are still valid, as if INSN_B was moved down below
-     the jump.  */
-  rtx_insn *insn_a, *insn_b;
-
-  /* The SET_SRC of INSN_A and INSN_B.  */
-  rtx a, b;
-
-  /* The SET_DEST of INSN_A.  */
-  rtx x;
-
-  /* True if this if block is not canonical.  In the canonical form of
-     if blocks, the THEN_BB is the block reached via the fallthru edge
-     from TEST_BB.  For the noce transformations, we allow the symmetric
-     form as well.  */
-  bool then_else_reversed;
-
-  /* True if the contents of then_bb and else_bb are a
-     simple single set instruction.  */
-  bool then_simple;
-  bool else_simple;
-
-  /* The total rtx cost of the instructions in then_bb and else_bb.  */
-  unsigned int then_cost;
-  unsigned int else_cost;
-
-  /* Estimated cost of the particular branch instruction.  */
-  unsigned int branch_cost;
-
-  /* For if-convert transformations, the legacy way to decide whether
-     the transformation should be applied is a comparison of a magic
-     number against BRANCH_COST.  Ultimately, this should go away, but
-     to avoid regressing targets this field encodes that number so the
-     profitability analysis can remain unchanged.  */
-  unsigned int magic_number;
-
-};
-
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
 static int noce_try_move (struct noce_if_info *);
 static int noce_try_store_flag (struct noce_if_info *);
diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h
index 3e3dc5b..f1c2dc9 100644
--- a/gcc/ifcvt.h
+++ b/gcc/ifcvt.h
@@ -40,4 +40,64 @@ struct ce_if_block
   int pass;				/* Pass number.  */
 };
 
+/* Used by noce_process_if_block to communicate with its subroutines.
+
+   The subroutines know that A and B may be evaluated freely.  They
+   know that X is a register.  They should insert new instructions
+   before cond_earliest.  */
+
+struct noce_if_info
+{
+  /* The basic blocks that make up the IF-THEN-{ELSE-,}JOIN block.  */
+  basic_block test_bb, then_bb, else_bb, join_bb;
+
+  /* The jump that ends TEST_BB.  */
+  rtx_insn *jump;
+
+  /* The jump condition.  */
+  rtx cond;
+
+  /* New insns should be inserted before this one.  */
+  rtx_insn *cond_earliest;
+
+  /* Insns in the THEN and ELSE block.  There is always just this
+     one insns in those blocks.  The insns are single_set insns.
+     If there was no ELSE block, INSN_B is the last insn before
+     COND_EARLIEST, or NULL_RTX.  In the former case, the insn
+     operands are still valid, as if INSN_B was moved down below
+     the jump.  */
+  rtx_insn *insn_a, *insn_b;
+
+  /* The SET_SRC of INSN_A and INSN_B.  */
+  rtx a, b;
+
+  /* The SET_DEST of INSN_A.  */
+  rtx x;
+
+  /* True if this if block is not canonical.  In the canonical form of
+     if blocks, the THEN_BB is the block reached via the fallthru edge
+     from TEST_BB.  For the noce transformations, we allow the symmetric
+     form as well.  */
+  bool then_else_reversed;
+
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
+  /* Estimated cost of the particular branch instruction.  */
+  unsigned int branch_cost;
+
+  /* For some if-convert transformations, the canonical way to decide
+     whether the transformation should be applied is a comparison of
+     a magic number against BRANCH_COST.  Ultimately, this should go
+     away, but to avoid regressing targets this field encodes that
+     number so the profitability analysis can remain unchanged.  */
+  unsigned int magic_number;
+};
+
 #endif /* GCC_IFCVT_H */

* [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64
  2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
  2015-09-25 15:06             ` [Patch ifcvt 1/3] Factor out cost calculations from noce cases James Greenhalgh
  2015-09-25 15:08             ` [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h James Greenhalgh
@ 2015-09-25 15:14             ` James Greenhalgh
  2015-09-29 10:43               ` Richard Biener
  2015-09-25 15:28             ` [Patch ifcvt 3/3] Create a new target hook for deciding profitability of noce if-conversion James Greenhalgh
  2015-09-29 10:36             ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs Richard Biener
  4 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2015-09-25 15:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 758 bytes --]


Hi,

This patch is a simple prototype showing how a target might choose
to implement TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P.  It has not been
tuned, tested or looked at in any meaningful way.

While the patch is in need of more detailed analysis, it is sufficient to
serve as an indication of the direction I was aiming for with this
patch set.

Clearly this is not OK for trunk without further work, but I thought I'd
include it as an afterthought for the costs rework.
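The intent described in the comments of this prototype can be sketched in a few lines of plain C. This is a hedged illustration, not the AArch64 implementation: the names are hypothetical, COSTS_N_INSNS (N) is assumed to be N * 4 as in GCC's rtl.h, a linear ramp replaces the prototype's log2 curve (so no libm is needed), and then_probability is taken to be the probability that the THEN block runs. Two differences from the posted code are deliberate. Read literally, aarch64_additional_branch_cost_for_probability returns its lower bound for a predictability of 0.0 and its upper bound for 1.0, which appears to be the reverse of what its comment asks for. The posted weighted average also adds then_cost where a conventional weighted average would add else_cost. The sketch follows the stated intent in both places.

```c
#include <stdbool.h>

/* Assumption: COSTS_N_INSNS (N) is N * 4, as in GCC's rtl.h.  */
#define SKETCH_COSTS_N_INSNS(N) ((N) * 4.0f)

/* Fold a probability into [0, 0.5] and map it to a predictability in
   [0, 1]: 0.5 (a coin flip) gives 0.0; 0.0 or 1.0 gives 1.0.  */
static float
sketch_predictability (float then_probability)
{
  float p = then_probability > 0.5f ? 1.0f - then_probability
                                    : then_probability;
  return 1.0f - 2.0f * p;
}

/* Extra cost charged to the branchy code: unpredictable branches get the
   upper bound, predictable ones the lower bound (linear ramp).  */
static float
sketch_branch_penalty (float predictability)
{
  float lower = SKETCH_COSTS_N_INSNS (2);
  float upper = SKETCH_COSTS_N_INSNS (6);
  return upper - predictability * (upper - lower);
}

/* Expected cost of the unconverted code: each arm weighted by how often
   it runs, plus the branch penalty.  */
static float
sketch_unconverted_cost (float then_cost, float else_cost,
                         float then_probability)
{
  float expected = then_probability * then_cost
                   + (1.0f - then_probability) * else_cost;
  return expected
         + sketch_branch_penalty (sketch_predictability (then_probability));
}

static bool
sketch_should_convert (float seq_cost, float then_cost, float else_cost,
                       float then_probability)
{
  return seq_cost <= sketch_unconverted_cost (then_cost, else_cost,
                                              then_probability);
}
```

With these numbers, a 20-unit sequence replacing an 8-unit THEN block with no ELSE block is accepted for a coin-flip branch (estimate 4 + 24 = 28) but rejected for a fully predictable one (estimate 8 + 8 = 16).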

Thanks,
James

---
2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.c
	(aarch64_additional_branch_cost_for_probability): New.
	(aarch64_ifcvt_noce_profitable_p): Likewise.
	(TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.


[-- Attachment #2: 0004-Patch-Prototype-AArch64-ifcvt-4-3-Wire-up-the-new-if.patch --]
[-- Type: text/x-patch;  name=0004-Patch-Prototype-AArch64-ifcvt-4-3-Wire-up-the-new-if.patch, Size: 5538 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4fa6a4e..6a753e9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -76,6 +76,8 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "ifcvt.h"
+#include "math.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -13362,6 +13364,7 @@ aarch64_unspec_may_trap_p (const_rtx x, unsigned flags)
   return default_unspec_may_trap_p (x, flags);
 }
 
+
 /* Implement TARGET_PROMOTED_TYPE to promote __fp16 to float.  */
 static tree
 aarch64_promoted_type (const_tree t)
@@ -13370,6 +13373,117 @@ aarch64_promoted_type (const_tree t)
     return float_type_node;
   return NULL_TREE;
 }
+
+/* Get an additional cost roughly mapping to the cycles cost of a
+   mispredicted branch, scaled by how predictable we think that branch
+   is.  Note that in the AArch64 back-end we try to map COSTS_N_INSNS (1)
+   to the latency of the cheapest instruction, with other instructions
+   scaled from that.
+
+   PREDICTABILITY is derived from the computed branch taken probability
+   and will be in the range [0..1], where 0.0 is a branch which should be
+   considered unpredictable, and 1.0 is a branch which is predictable.
+
+   Branches which are more predictable are less profitable to if-convert and
+   should return smaller values.  Branches which are less predictable are more
+   profitable to if-convert and should return higher values.
+
+   FORNOW: use math.h's log2 and take and return floats.
+   TODO: Design a function which gives more meaningful results.  */
+
+static float
+aarch64_additional_branch_cost_for_probability (float predictability)
+{
+  /* Scale to be in range [1..2], then take log2 of it.  Use this to
+     pick a point between two -mcpu (well, one day) dependent values.  */
+  unsigned int lower_bound = COSTS_N_INSNS (2);
+  unsigned int upper_bound = COSTS_N_INSNS (6);
+
+  return log2 (predictability  + 1.0f)
+	 * (upper_bound - lower_bound) + lower_bound;
+}
+
+/* Prototype only, return TRUE if SEQ is a profitable transformation of
+   the basic block structure defined in IF_INFO.
+
+   TODO: Design and analyze how this function should actually behave.
+   This is just guesswork.  */
+
+static bool
+aarch64_ifcvt_noce_profitable_p (rtx_insn *seq,
+				 struct noce_if_info *if_info)
+{
+  bool speed_p
+    = optimize_bb_for_speed_p (if_info->test_bb);
+
+  float sequence_cost (seq_cost (seq, speed_p));
+
+  /* We know that when expanding an if-convert sequence for AArch64 we
+     will generate a number of redundant comparisons.  Account for that
+     by slightly modifying the numbers.  */
+  sequence_cost /= 1.5f;
+
+  float then_cost (if_info->then_cost);
+  float else_cost = 0.0;
+
+  if (if_info->else_bb)
+    else_cost = if_info->else_cost;
+
+  /* Get the weighting factors.  */
+  edge te = EDGE_SUCC (if_info->test_bb, 0);
+  float taken_probability = ((float) te->probability) / REG_BR_PROB_BASE;
+
+  /* We have to reverse the branch taken probability if we have a
+     then_else_reversed branch structure.  We want to get this correct
+     so we scale the cost of the correct branch.  */
+  if (if_info->then_else_reversed)
+    taken_probability = 1.0f - taken_probability;
+
+  /* For branch_probability, we don't care which branch we are
+     considering, we just need a value in the range [0..0.5].  */
+  float branch_probability = taken_probability;
+  if (branch_probability > 0.5f)
+    branch_probability = 1.0f - branch_probability;
+
+  /* Branch_probability is in range [0.0..0.5].  Scale to be in
+     range [0.0..1.0], and subtract from 1.0 so a branch_probability of 0.5
+     gives a predictability of 0.0.  */
+  float predictability = 1.0 - (2.0 * branch_probability);
+
+  float weighted_average_cost = ((taken_probability
+				  * (then_cost - else_cost)) + then_cost);
+
+  if (!if_info->else_bb)
+    weighted_average_cost /= 2.0f;
+  float branch_cost
+    = aarch64_additional_branch_cost_for_probability (predictability);
+
+  float estimate_unconverted_cost = weighted_average_cost + branch_cost;
+
+  bool judgement = sequence_cost <= estimate_unconverted_cost;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\n AArch64 cost calculations for if-conversion:\n"
+	       "      taken probability %f\n"
+	       "      then block cost %f\n"
+	       "      else block cost %f\n"
+	       "      weighted average cost %f\n"
+	       "      additional branch cost %f\n"
+	       "      total unconverted cost %f\n"
+	       "      new sequence cost %f\n"
+	       "\n       Judgement: %s\n",
+	       taken_probability, then_cost, else_cost,
+	       weighted_average_cost, branch_cost,
+	       estimate_unconverted_cost, sequence_cost,
+	       judgement
+	       ? "Convert"
+	       : "Do not convert");
+    }
+
+  return judgement;
+}
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
@@ -13674,6 +13788,10 @@ aarch64_promoted_type (const_tree t)
 #undef TARGET_USE_PSEUDO_PIC_REG
 #define TARGET_USE_PSEUDO_PIC_REG aarch64_use_pseudo_pic_reg
 
+#undef TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P
+#define TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P \
+  aarch64_ifcvt_noce_profitable_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"

* [Patch ifcvt 3/3] Create a new target hook for deciding profitability of noce if-conversion
  2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
                               ` (2 preceding siblings ...)
  2015-09-25 15:14             ` [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64 James Greenhalgh
@ 2015-09-25 15:28             ` James Greenhalgh
  2015-09-29 10:36             ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs Richard Biener
  4 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2015-09-25 15:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 1512 bytes --]


Hi,

This patch introduces a new costs hook for deciding on the profitability
of an if-conversion candidate. We defer as much of this decision as
possible to the target, permitting the target to vary the outcome based
on the specific behaviour of its branch predictor, in addition to any other
target-specific knowledge that might be available.

I had hoped to keep more of this generic, using rtx_costs and an additional
branch weighting factor to come up with a common formula, but that
proves troublesome for AArch64, where the expansion of multiple conditional
moves generates multiple redundant comparisons which we know will be
cleaned up later.

As a target would have to make a judgement on how much of the new sequence
to cost, and can probably only do that reasonably with the old sequence as
context, I just expose both parts to the target and allow them to implement
whatever they feel best.
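Here is a hedged sketch of the resulting dispatch. Generic code calls through a hook vector, which holds either the default (branch_cost against magic_number) or a target's own function that can look at both the new sequence and the old blocks. In GCC itself the vector is generated from target.def, and a target overrides an entry with #undef/#define before TARGET_INITIALIZER, as patch 4/3 does. The struct and function names below are hypothetical stand-ins, and the target's formula, including its flat penalty of 8, is invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for rtx_insn and struct noce_if_info.  */
struct sketch_insn { int cost; };
struct sketch_if_info
{
  unsigned int branch_cost;
  unsigned int magic_number;
  unsigned int then_cost;
};

/* Default behaviour: ignore SEQ and compare the legacy numbers.  */
static bool
sketch_default_profitable_p (const struct sketch_insn *seq,
                             const struct sketch_if_info *info)
{
  (void) seq;
  return info->branch_cost >= info->magic_number;
}

/* A target that compares the new sequence against the old THEN block
   plus a flat branch penalty (numbers invented).  */
static bool
sketch_target_profitable_p (const struct sketch_insn *seq,
                            const struct sketch_if_info *info)
{
  return seq->cost <= (int) (info->then_cost + 8);
}

/* Hook vector, in the style of targetm.costs.  */
struct sketch_costs_hooks
{
  bool (*ifcvt_noce_profitable_p) (const struct sketch_insn *,
                                   const struct sketch_if_info *);
};

/* Generic code only calls through the hook, as noce_is_profitable_p
   does after this patch.  */
static bool
sketch_is_profitable_p (const struct sketch_costs_hooks *hooks,
                        const struct sketch_insn *seq,
                        const struct sketch_if_info *info)
{
  return hooks->ifcvt_noce_profitable_p (seq, info);
}
```

The same candidate can then get different answers: with branch_cost 1 and magic_number 2 the default refuses, while the target hook accepts a 12-unit sequence against a 4-unit THEN block.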

Bootstrapped on aarch64-none-linux-gnu, arm-none-linux-gnueabihf and
x86_64-none-linux-gnu with no issues, and checked code generation on these
platforms to ensure it has not changed.

OK?

Thanks,
James

---
2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>

	* target.def (costs): New hook vector.
	(ifcvt_noce_profitable_p): New hook.
	* doc/tm.texi.in: Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_ifcvt_noce_profitable_p): New.
	* targhooks.c (default_ifcvt_noce_profitable_p): New.
	* ifcvt.c (noce_is_profitable_p): Use new target hook.

[-- Attachment #2: 0003-Patch-ifcvt-3-3-Create-a-new-target-hook-for-decidin.patch --]
[-- Type: text/x-patch;  name=0003-Patch-ifcvt-3-3-Create-a-new-target-hook-for-decidin.patch, Size: 3626 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index eb495a8..b169d7c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6190,6 +6190,12 @@ true for well-predicted branches. On many architectures the
 @code{BRANCH_COST} can be reduced then.
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P (rtx_insn *@var{seq}, struct noce_if_info *@var{info})
+This hook should return TRUE if converting the IF-THEN-ELSE blocks
+  described in INFO with the if-converted sequence SEQ is expected to
+  be profitable.
+@end deftypefn
+
 Here are additional macros which do not specify precise relative costs,
 but only that certain actions are more expensive than GCC would
 ordinarily expect.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 92835c1..4765ec9 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4575,6 +4575,8 @@ true for well-predicted branches. On many architectures the
 @code{BRANCH_COST} can be reduced then.
 @end defmac
 
+@hook TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P
+
 Here are additional macros which do not specify precise relative costs,
 but only that certain actions are more expensive than GCC would
 ordinarily expect.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index d7fc523..e5e76bc 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -794,7 +794,7 @@ static bool
 noce_is_profitable_p (rtx_insn *seq ATTRIBUTE_UNUSED,
 		      struct noce_if_info *if_info)
 {
-  return (if_info->branch_cost >= if_info->magic_number);
+  return targetm.costs.ifcvt_noce_profitable_p (seq, if_info);
 }
 
 /* Helper function for noce_try_store_flag*.  */
diff --git a/gcc/target.def b/gcc/target.def
index f330709..996f31d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5876,6 +5876,21 @@ DEFHOOK
 
 HOOK_VECTOR_END (mode_switching)
 
+/* Cost functions.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_COSTS_"
+HOOK_VECTOR (TARGET_COSTS_, costs)
+
+DEFHOOK
+(ifcvt_noce_profitable_p,
+ "This hook should return TRUE if converting the IF-THEN-ELSE blocks\n\
+  described in INFO with the if-converted sequence SEQ is expected to\n\
+  be profitable.",
+ bool, (rtx_insn *seq, struct noce_if_info *info),
+ default_ifcvt_noce_profitable_p)
+
+HOOK_VECTOR_END (costs)
+
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
 
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 7238c8f..7b6dbe8 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "stringpool.h"
 #include "tree-ssanames.h"
+#include "ifcvt.h"
 
 
 bool
@@ -1922,4 +1923,14 @@ can_use_doloop_if_innermost (const widest_int &, const widest_int &,
   return loop_depth == 1;
 }
 
+/* For the default implementation, match the legacy logic by simply
+   comparing the estimated branch cost against a magic number.  */
+
+bool
+default_ifcvt_noce_profitable_p (rtx_insn *seq ATTRIBUTE_UNUSED,
+				 struct noce_if_info *if_info)
+{
+  return (if_info->branch_cost >= if_info->magic_number);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..076d513 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -240,4 +240,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 						  tree type ATTRIBUTE_UNUSED,
 						  int *pretend_arg_size ATTRIBUTE_UNUSED,
 						  int second_time ATTRIBUTE_UNUSED);
+
+extern bool default_ifcvt_noce_profitable_p (rtx_insn *,
+					     struct noce_if_info *);
 #endif /* GCC_TARGHOOKS_H */

* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
                               ` (3 preceding siblings ...)
  2015-09-25 15:28             ` [Patch ifcvt 3/3] Create a new target hook for deciding profitability of noce if-conversion James Greenhalgh
@ 2015-09-29 10:36             ` Richard Biener
  2015-09-29 15:28               ` James Greenhalgh
  4 siblings, 1 reply; 60+ messages in thread
From: Richard Biener @ 2015-09-29 10:36 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: GCC Patches, Ramana Radhakrishnan, Bernd Schmidt, Jeff Law,
	Eric Botcazou, Steven Bosscher

On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> Hi,
>
> In relation to the patch I put up for review a few weeks ago to teach
> RTL if-convert to handle multiple sets in a basic block [1], I was
> asking about a sensible cost model to use. There was some consensus at
> Cauldron that what should be done in this situation is to introduce a
> target hook that delegates answering the question to the target.

Err - the consensus was to _not_ add gazillion of special target hooks
but instead enhance what we have with rtx_cost so that passes can
rely on comparing before and after costs of a sequence of insns.

Richard.

> This patch series introduces that new target hook to provide cost
> decisions for the RTL ifcvt pass.
>
> The idea is to give the target full visibility of the proposed
> transformation, and allow it to respond as to whether if-conversion in that
> way is profitable.
>
> In order to preserve current behaviour across targets, we will need the
> default implementation to keep to the strategy of simply comparing branch
> cost against a magic number. Patch 1/3 performs this refactoring, which is
> a bit hairy in some corner cases.
>
> Patch 2/3 is a simple code move, pulling the definition of the if_info
> structure used by RTL if-convert into ifcvt.h where it can be included
> by targets.
>
> Patch 3/3 then introduces the new target hook, with the same default
> behaviour as was previously in noce_is_profitable_p.
>
> The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
> I've verified with Spec2000 and Spec2006 runs that there are no code
> generation differences for any of these three targets after the patch.
>
> I also gave ultrasparc3 a quick go; from what I could see, I changed the
> register allocation for the floating-point condition code registers.
> Presumably this is a side effect of first constructing RTXen that I then
> discard. I didn't see anything which looked like more frequent reloads or
> substantial code generation changes, though I'm not familiar with the
> intricacies of the Sparc condition registers :).
>
> I've included a patch 4/3, to give an example of what a target might want
> to do with this hook. It needs work for tuning and deciding how the function
> should actually behave, but works if it is thought of as more of a
> strawman/prototype than a patch submission.
>
> Are parts 1, 2 and 3 OK?
>
> Thanks,
> James
>
> [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html
>
> ---
> [Patch ifcvt 1/3] Factor out cost calculations from noce cases
>
> 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * ifcvt.c (noce_if_info): Add a magic_number field :-(.
>         (noce_is_profitable_p): New.
>         (noce_try_store_flag_constants): Move cost calculation
>         to after sequence generation, factor it out to noce_is_profitable_p.
>         (noce_try_addcc): Likewise.
>         (noce_try_store_flag_mask): Likewise.
>         (noce_try_cmove): Likewise.
>         (noce_try_cmove_arith): Likewise.
>         (noce_try_sign_mask): Add comment regarding cost calculations.
>
> [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h
>
> 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * ifcvt.c (noce_if_info): Move to...
>         * ifcvt.h (noce_if_info): ...Here.
>
> [Patch ifcvt 3/3] Create a new target hook for deciding profitability
>     of noce if-conversion
>
> 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * target.def (costs): New hook vector.
>         (ifcvt_noce_profitable_p): New hook.
>         * doc/tm.texi.in: Document it.
>         * doc/tm.texi: Regenerate.
>         * targhooks.h (default_ifcvt_noce_profitable_p): New.
>         * targhooks.c (default_ifcvt_noce_profitable_p): New.
>         * ifcvt.c (noce_is_profitable_p): Use new target hook.
>
> [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
>     hook for AArch64
>
> 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64.c
>         (aarch64_additional_branch_cost_for_probability): New.
>         (aarch64_ifcvt_noce_profitable_p): Likewise.
>         (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.

* Re: [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64
  2015-09-25 15:14             ` [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64 James Greenhalgh
@ 2015-09-29 10:43               ` Richard Biener
  0 siblings, 0 replies; 60+ messages in thread
From: Richard Biener @ 2015-09-29 10:43 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: GCC Patches, Ramana Radhakrishnan, Bernd Schmidt, Jeff Law,
	Eric Botcazou, Steven Bosscher

On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch is a simple prototype showing how a target might choose
> to implement TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P.  It has not been
> tuned, tested or looked at in any meaningful way.
>
> While the patch is in need of more detailed analysis it is sufficient to
> serve as an indication of what direction I was aiming for with this
> patch set.
>
> Clearly this is not OK for trunk without further work, but I thought I'd
> include it as an afterthought for the costs rework.

First of all, don't include math.h or use FP math on the host.  If you need
fractional arithmetic use sreal.

It looks like with your hook implementation you are mostly hiding magic
numbers in the target.  I'm not sure how this is better than exposing them
as user-accessible --params (and thus their defaults controllable by
the target).

Richard.

> Thanks,
> James
>
> ---
> 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64.c
>         (aarch64_additional_branch_cost_for_probability): New.
>         (aarch64_ifcvt_noce_profitable_p): Likewise.
>         (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
>

* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-29 10:36             ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs Richard Biener
@ 2015-09-29 15:28               ` James Greenhalgh
  2015-09-29 19:52                 ` Mike Stump
                                   ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: James Greenhalgh @ 2015-09-29 15:28 UTC (permalink / raw)
  To: Richard Biener
  Cc: GCC Patches, Ramana Radhakrishnan, Bernd Schmidt, Jeff Law,
	Eric Botcazou, Steven Bosscher

On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> > Hi,
> >
> > In relation to the patch I put up for review a few weeks ago to teach
> > RTL if-convert to handle multiple sets in a basic block [1], I was
> > asking about a sensible cost model to use. There was some consensus at
> > Cauldron that what should be done in this situation is to introduce a
> > target hook that delegates answering the question to the target.
> 
> Err - the consensus was to _not_ add gazillion of special target hooks
> but instead enhance what we have with rtx_cost so that passes can
> rely on comparing before and after costs of a sequence of insns.

Ah, I was not able to attend Cauldron this year, so I was trying to pick out
"consensus" from the video. Rewatching it now, I see a better phrase would
be "suggestion with some support".

Watching the video a second time, it seems your proposal is that we improve
the RTX costs infrastructure to handle sequences of Gimple/RTX. That would
get us some way towards making a smart decision in if-convert, but I'm not
convinced it allows us to answer the question we are interested in.

We have the rtx for before and after, and we can generate costs for these
sequences. This allows us to calculate some weighted cost of the
instructions based on the calculated probabilities that each block is
executed. However, we are missing information on how expensive the branch
is, and we have no way to get that through an RTX-costs infrastructure.

We could add a hook to give a cost in COSTS_N_INSNS units to a branch based
on its predictability. This is difficult as COSTS_N_INSNS units can differ
depending on whether you are talking about floating-point or integer code.
By this I mean that the compiler considers a SET which costs more than
COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost
of both an integer SET and a floating-point SET to COSTS_N_INSNS (1).
In reality, these instructions may have different latency and performance
characteristics. What real-world quantity are we trying to capture when we
say a branch costs the same as 3 SET instructions of any type? It certainly
isn't the mispredict penalty (likely measured in cycles, not relative to the
cost of a SET instruction, which may well be completely free on modern x86
processors), nor is it the cost of executing the branch instruction, which
often takes a constant time to resolve regardless of whether the branch was
predicted correctly.
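To make the units problem concrete, a small sketch. COSTS_N_INSNS is
defined as in GCC's rtl.h (4 units per insn); the "core" and its latencies
are hypothetical numbers chosen only for illustration. Both SETs report the
same rtx cost, so a branch said to cost "3 SETs" is 12 units either way,
yet those 3 SETs mean 3 cycles on the integer side and 12 on the
floating-point side.

```c
/* COSTS_N_INSNS as defined in GCC's rtl.h: one "typical" insn is 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical per-SET model for an imaginary core.  */
struct set_model
{
  const char *name;
  int rtx_cost;        /* what the target reports through rtx costs */
  int latency_cycles;  /* hypothetical pipeline latency */
};

/* Both are reported as COSTS_N_INSNS (1), as some targets do.  */
static const struct set_model int_set = { "integer SET", COSTS_N_INSNS (1), 1 };
static const struct set_model fp_set = { "fp SET", COSTS_N_INSNS (1), 4 };

/* A branch said to "cost the same as N SETs", in rtx-cost units.  */
static int
branch_cost_as_sets (int n_sets)
{
  return n_sets * COSTS_N_INSNS (1);
}

/* What those N SETs actually take, in cycles, on the hypothetical core.  */
static int
sets_latency (const struct set_model *set, int n_sets)
{
  return n_sets * set->latency_cycles;
}
```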

On the other side of the equation, we want a cost for the converted
sequence. We can build a cost of the generated rtl sequence, but for
targets like AArch64 this is going to be wildly off. AArch64 will expand
(a > b) ? x : y; as a set to the CC register, followed by a conditional
move based on the CC register. Consequently, where we have multiple sets
back to back we end up with:

  set CC (a > b)
  set x1 (CC ? x : y)
  set CC (a > b)
  set x2 (CC ? x : z)
  set CC (a > b)
  set x3 (CC ? x : k)

Which we know will be simplified later to:

  set CC (a > b)
  set x1 (CC ? x : y)
  set x2 (CC ? x : z)
  set x3 (CC ? x : k)

I imagine other targets have something similar in their expansion of
mov<mode>cc (though I haven't looked).
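A rough sketch of the cleanup described above, on a hypothetical toy insn
representation (not GCC's RTL): a compare that recomputes exactly the
comparison already held in CC is dropped. It assumes only compares write CC
and that the compared operands are not modified in between, which holds for
the example. For the six-insn sequence it leaves four, the kind of ratio a
"redundancy_factor" would have to predict.

```c
#include <string.h>

/* Hypothetical toy insns, not GCC RTL: a compare that sets CC, or a
   conditional move that reads CC.  */
enum toy_kind { TOY_SET_CC, TOY_CMOV };

struct toy_insn
{
  enum toy_kind kind;
  const char *text;   /* comparison for TOY_SET_CC, e.g. "a > b" */
};

/* Return how many insns remain once every TOY_SET_CC that recomputes the
   comparison already live in CC is deleted.  Assumes nothing else writes
   CC and that the compared operands are not modified in between.  */
static int
count_after_cc_cleanup (const struct toy_insn *seq, int n)
{
  const char *live_cc = NULL;
  int kept = 0;

  for (int i = 0; i < n; i++)
    {
      if (seq[i].kind == TOY_SET_CC)
        {
          if (live_cc != NULL && strcmp (live_cc, seq[i].text) == 0)
            continue;   /* Same comparison already in CC: drop it.  */
          live_cc = seq[i].text;
        }
      kept++;
    }
  return kept;
}
```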

Our comparison for if-conversion then must be:

  weighted_old_cost = (taken_probability * (then_bb_cost)
			+ (1 - taken_probability) * (else_bb_cost));
  branch_cost = branch_cost_in_insns (taken_probability)
  weighted_new_cost = redundancy_factor (new_sequence) * seq_cost (new_sequence)

  profitable = weighted_new_cost <= weighted_old_cost + branch_cost

And we must define:

  branch_cost_in_insns (taken_probability)
  redundancy_factor (new_sequence)
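A minimal sketch of that comparison in integer arithmetic, assuming
probabilities are fixed-point fractions of a hypothetical PROB_BASE
(playing the role of GCC's REG_BR_PROB_BASE) and that every cost is in the
same units. Each arm of the original code is weighted by the probability
that it executes, and the hypothetical redundancy_factor is modelled as
surviving/generated insns; none of these names are existing GCC interfaces.

```c
#include <stdbool.h>

/* Probabilities are fixed-point fractions of PROB_BASE, so no host
   floating point is needed.  PROB_BASE is a hypothetical name.  */
#define PROB_BASE 10000

/* Expected cost of the branchy code: each arm weighted by the
   probability that it executes.  Integer division truncates.  */
static long
weighted_old_cost (int taken_prob, int then_bb_cost, int else_bb_cost)
{
  return ((long) taken_prob * then_bb_cost
          + (long) (PROB_BASE - taken_prob) * else_bb_cost) / PROB_BASE;
}

/* Hypothetical redundancy_factor: scale the new sequence by the fraction
   of its insns expected to survive later cleanup (e.g. 4 of 6).  */
static long
weighted_new_cost (int seq_cost, int surviving_insns, int generated_insns)
{
  return (long) seq_cost * surviving_insns / generated_insns;
}

/* profitable = weighted_new_cost <= weighted_old_cost + branch_cost  */
static bool
ifcvt_profitable_p (int taken_prob, int then_bb_cost, int else_bb_cost,
                    int branch_cost, int seq_cost,
                    int surviving_insns, int generated_insns)
{
  return (weighted_new_cost (seq_cost, surviving_insns, generated_insns)
          <= weighted_old_cost (taken_prob, then_bb_cost, else_bb_cost)
             + branch_cost);
}
```

For example, with a 50% branch, a then-block costing 12 and an empty
else-block, the old code is expected to cost 6; a six-insn converted
sequence costing 24, of which four insns survive, weighs in at 16. That is
profitable with a branch cost of 12 (16 <= 18) but not with a branch cost
of 8 (16 > 14).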

At that point, I feel you are better giving the entire sequence to the
target and asking it to implement whatever logic is needed to return a
profitable/unprofitable analysis of the transformation.

The "redundancy_factor" in particular is pretty tough to define in a way
which makes sense outside of if_convert, without adding some pretty
detailed analysis to decide what might or might not be eliminated by
later passes. The alternative is to weight the other side of the equation
by tuning the cost of branch_cost_in_insns high. This only serves to increase
the disconnect between a real-world cost and a number to tweak to game
code generation.

If you have a different way of phrasing the if-conversion question that
avoids the two very specific hooks, I'd be happy to try taking the patches
in that direction. I don't see a way to implement this as just queries to
a costing function which does not need substantial target and pass
dependent tweaking to make behave correctly.

Thanks,
James

> > This patch series introduces that new target hook to provide cost
> > decisions for the RTL ifcvt pass.
> >
> > The idea is to give the target full visibility of the proposed
> > transformation, and allow it to respond as to whether if-conversion in that
> > way is profitable.
> >
> > In order to preserve current behaviour across targets, we will need the
> > default implementation to keep to the strategy of simply comparing branch
> > cost against a magic number. Patch 1/3 performs this refactoring, which is
> > a bit hairy in some corner cases.
> >
> > Patch 2/3 is a simple code move, pulling the definition of the if_info
> > structure used by RTL if-convert into ifcvt.h where it can be included
> > by targets.
> >
> > Patch 3/3 then introduces the new target hook, with the same default
> > behaviour as was previously in noce_is_profitable_p.
> >
> > The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
> > I've verified with Spec2000 and Spec2006 runs that there are no code
> > generation differences for any of these three targets after the patch.
> >
> > I also gave ultrasparc3 a quick go; from what I could see, I changed the
> > register allocation for the floating-point condition code registers.
> > Presumably this is a side effect of first constructing RTXen that I then
> > discard. I didn't see anything which looked like more frequent reloads or
> > substantial code generation changes, though I'm not familiar with the
> > intricacies of the Sparc condition registers :).
> >
> > I've included a patch 4/3, to give an example of what a target might want
> > to do with this hook. It needs work for tuning and deciding how the function
> > should actually behave, but works if it is thought of as more of a
> > strawman/prototype than a patch submission.
> >
> > Are parts 1, 2 and 3 OK?
> >
> > Thanks,
> > James
> >
> > [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html
> >
> > ---
> > [Patch ifcvt 1/3] Factor out cost calculations from noce cases
> >
> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         * ifcvt.c (noce_if_info): Add a magic_number field :-(.
> >         (noce_is_profitable_p): New.
> >         (noce_try_store_flag_constants): Move cost calculation
> >         to after sequence generation, factor it out to noce_is_profitable_p.
> >         (noce_try_addcc): Likewise.
> >         (noce_try_store_flag_mask): Likewise.
> >         (noce_try_cmove): Likewise.
> >         (noce_try_cmove_arith): Likewise.
> >         (noce_try_sign_mask): Add comment regarding cost calculations.
> >
> > [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h
> >
> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         * ifcvt.c (noce_if_info): Move to...
> >         * ifcvt.h (noce_if_info): ...Here.
> >
> > [Patch ifcvt 3/3] Create a new target hook for deciding profitability
> >     of noce if-conversion
> >
> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         * target.def (costs): New hook vector.
> >         (ifcvt_noce_profitable_p): New hook.
> >         * doc/tm.texi.in: Document it.
> >         * doc/tm.texi: Regenerate.
> >         * targhooks.h (default_ifcvt_noce_profitable_p): New.
> >         * targhooks.c (default_ifcvt_noce_profitable_p): New.
> >         * ifcvt.c (noce_is_profitable_p): Use new target hook.
> >
> > [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
> >     hook for AArch64
> >
> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         * config/aarch64/aarch64.c
> >         (aarch64_additional_branch_cost_for_probability): New.
> >         (aarch64_ifcvt_noce_profitable_p): Likewise.
> >         (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
> 

* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-29 15:28               ` James Greenhalgh
@ 2015-09-29 19:52                 ` Mike Stump
  2015-09-30  8:42                   ` Richard Biener
  2015-09-30  8:48                 ` Richard Biener
  2015-10-01  9:37                 ` Bernd Schmidt
  2 siblings, 1 reply; 60+ messages in thread
From: Mike Stump @ 2015-09-29 19:52 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Richard Biener, GCC Patches, Ramana Radhakrishnan, Bernd Schmidt,
	Jeff Law, Eric Botcazou, Steven Bosscher

On Sep 29, 2015, at 7:31 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>>> 
>>> In relation to the patch I put up for review a few weeks ago to teach
>>> RTL if-convert to handle multiple sets in a basic block [1], I was
>>> asking about a sensible cost model to use. There was some consensus at
>>> Cauldron that what should be done in this situation is to introduce a
>>> target hook that delegates answering the question to the target.
>> 
>> Err - the consensus was to _not_ add gazillion of special target hooks
>> but instead enhance what we have with rtx_cost so that passes can
>> rely on comparing before and after costs of a sequence of insns.
> 
> Ah, I was not able to attend Cauldron this year, so I was trying to pick out
> "consensus" from the video. Rewatching it now, I see a better phrase would
> be "suggestion with some support".

I’m not a big fan of rtx_cost.  To me it feels more like a crude sledgehammer.  Now, that is the gcc way, we have a ton of these things, but it would be nice to refine the tools so that the big escape hatch isn’t used as often and we have finer-grained ways of doing things.  rtx_cost should be what a code-generator generates for most new ports when they use the nice api to do a port.  The old sledgehammer-wielding ports may well always define rtx_cost themselves, but we should shoot for something better.

As a concrete example, I now have a code-generator for enum reg_class, N_REG_CLASSES, REG_CLASS_NAMES, REG_CLASS_CONTENTS, REGISTER_NAMES, FIXED_REGISTERS, CALL_USED_REGISTERS, ADDITIONAL_REGISTER_NAMES, REG_ALLOC_ORDER and more (some binutils code-gen to do with registers), and oh my, it is so much nicer to use than the original api.  If you only ever have to write these things once, fine, but if you develop and prototype CPUs, the existing interface is, well, less than ideal.  I can do things like:

gccrclass
  rc_gprs = "GENERAL";

r gpr[] = { rc_gprs, Fixed, Used,
                "$zero", "$sp", "$fp", "$lr" };
r gpr_sav[] = { Notfixed, Notused, alias ("$save_first"),
                "$sav1",   "$sav2",   "$sav3",   "$sav4",

and get all the other goop I need for free.  I’d encourage people to find a way to do up an rtx_cost generator.  If you're a port maintainer, and want to redo your port to use a nicer api to do the registers, let me know.  I’d love to see progress made to rid gcc of the old crappy apis.

* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-29 19:52                 ` Mike Stump
@ 2015-09-30  8:42                   ` Richard Biener
  0 siblings, 0 replies; 60+ messages in thread
From: Richard Biener @ 2015-09-30  8:42 UTC (permalink / raw)
  To: Mike Stump
  Cc: James Greenhalgh, GCC Patches, Ramana Radhakrishnan,
	Bernd Schmidt, Jeff Law, Eric Botcazou, Steven Bosscher

On Tue, Sep 29, 2015 at 9:23 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Sep 29, 2015, at 7:31 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
>>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>>> <james.greenhalgh@arm.com> wrote:
>>>>
>>>> In relation to the patch I put up for review a few weeks ago to teach
>>>> RTL if-convert to handle multiple sets in a basic block [1], I was
>>>> asking about a sensible cost model to use. There was some consensus at
>>>> Cauldron that what should be done in this situation is to introduce a
>>>> target hook that delegates answering the question to the target.
>>>
>>> Err - the consensus was to _not_ add gazillion of special target hooks
>>> but instead enhance what we have with rtx_cost so that passes can
>>> rely on comparing before and after costs of a sequence of insns.
>>
>> Ah, I was not able to attend Cauldron this year, so I was trying to pick out
>> "consensus" from the video. Rewatching it now, I see a better phrase would
>> be "suggestion with some support".
>
> I’m not a big fan of rtx_cost.  To me it feels more like a crude sledgehammer.  Now, that is the gcc way, we have a ton of these things, but it would be nice to refine the tools so that the big escape hatch isn’t used as often and we have finer-grained ways of doing things.  rtx_cost should be what a code-generator generates for most new ports when they use the nice api to do a port.  The old sledgehammer-wielding ports may well always define rtx_cost themselves, but we should shoot for something better.
>
> As a concrete example, I now have a code-generator for enum reg_class, N_REG_CLASSES, REG_CLASS_NAMES, REG_CLASS_CONTENTS, REGISTER_NAMES, FIXED_REGISTERS, CALL_USED_REGISTERS, ADDITIONAL_REGISTER_NAMES, REG_ALLOC_ORDER and more (some binutils code-gen to do with registers), and oh my, it is so much nicer to use than the original api.  If you only ever have to write these things once, fine, but if you develop and prototype CPUs, the existing interface is, well, less than ideal.  I can do things like:
>
> gccrclass
>   rc_gprs = "GENERAL";
>
> r gpr[] = { rc_gprs, Fixed, Used,
>                 "$zero", "$sp", "$fp", "$lr" };
> r gpr_sav[] = { Notfixed, Notused, alias ("$save_first"),
>                 "$sav1",   "$sav2",   "$sav3",   "$sav4",
>
> and get all the other goop I need for free.  I’d encourage people to find a way to do up an rtx_cost generator.  If you're a port maintainer, and want to redo your port to use a nicer api to do the registers, let me know.  I’d love to see progress made to rid gcc of the old crappy apis.

I agree that rtx_cost isn't the nicest thing either.  But adding hooks
like my_transform_profitable_p (X *) with 'X' being a pass-private
data structure isn't a great design either.  In fact it's worse IMHO.

Richard.

* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-29 15:28               ` James Greenhalgh
  2015-09-29 19:52                 ` Mike Stump
@ 2015-09-30  8:48                 ` Richard Biener
  2015-09-30 19:01                   ` Mike Stump
  2015-10-01  9:37                 ` Bernd Schmidt
  2 siblings, 1 reply; 60+ messages in thread
From: Richard Biener @ 2015-09-30  8:48 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: GCC Patches, Ramana Radhakrishnan, Bernd Schmidt, Jeff Law,
	Eric Botcazou, Steven Bosscher

On Tue, Sep 29, 2015 at 4:31 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>> > Hi,
>> >
>> > In relation to the patch I put up for review a few weeks ago to teach
>> > RTL if-convert to handle multiple sets in a basic block [1], I was
>> > asking about a sensible cost model to use. There was some consensus at
>> > Cauldron that what should be done in this situation is to introduce a
>> > target hook that delegates answering the question to the target.
>>
>> Err - the consensus was to _not_ add gazillion of special target hooks
>> but instead enhance what we have with rtx_cost so that passes can
>> rely on comparing before and after costs of a sequence of insns.
>
> Ah, I was not able to attend Cauldron this year, so I was trying to pick out
> "consensus" from the video. Rewatching it now, I see a better phrase would
> be "suggestion with some support".
>
> Watching the video a second time, it seems your proposal is that we improve
> the RTX costs infrastructure to handle sequences of Gimple/RTX. That would
> get us some way towards making a smart decision in if-convert, but I'm not
> convinced it allows us to answer the question we are interested in.
>
> We have the rtx for before and after, and we can generate costs for these
> sequences. This allows us to calculate some weighted cost of the
> instructions based on the calculated probabilities that each block is
> executed. However, we are missing information on how expensive the branch
> is, and we have no way to get that through an RTX-costs infrastructure.

Yeah, though during the meeting at the Cauldron I was asking whether we
maybe want a replacement_cost hook that can assume the to-be-replaced
sequence is in the IL, so the hook can inspect insn_bb and thus get at
branch costs ...

Surely the proposed rtx_cost infrastructure enhancements will not cover all
cases, so the thing I wanted to throw in was that there was _not_ consensus
that cost should be computed by pass-specific target hooks that allow the
target to inspect pass-private data.  Because that's a maintenance
nightmare if you change a pass and have to second-guess 50 targets' cost
hook implementations.

> We could add a hook to give a cost in COSTS_N_INSNS units to a branch based
> on its predictability. This is difficult as COSTS_N_INSNS units can differ
> depending on whether you are talking about floating-point or integer code.

Yes, which is why I suggested a replacement cost ...  Of course the question is
what you feed that hook, as in principle it would be nice to avoid building the
replacement RTXen until we know doing that will be necessary (the replacement
is profitable).

Maybe as soon as CFG changes are involved we need to think about adding
a BB frequency to the hook, though factoring that in can be done in the
passes.  What is interesting for the target is probably the cost of the
branch itself, as that can vary depending on predictability (which is
usually very hard to assess at compile time anyway).  So what about a
branch_cost hook that takes taken/not-taken probabilities as arguments?
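One possible shape for such a branch_cost hook, as a sketch only: the
constants and names are hypothetical, and it assumes branch outcomes are
independent, so the best a predictor can do is pick the likelier direction
and be wrong min(taken, not-taken) of the time. Real predictors learn
patterns and can beat that bound.

```c
/* Probabilities are fixed-point fractions of PROB_BASE (hypothetical).  */
#define PROB_BASE 10000

/* Hypothetical numbers for an imaginary core, in COSTS_N_INSNS-style
   units (4 units per "typical" insn).  */
#define BRANCH_BASE_COST 4     /* a correctly predicted branch */
#define MISPREDICT_PENALTY 60  /* extra cost of one mispredict */

/* Sketch of a branch cost taking taken/not-taken probabilities.  The
   expected mispredict rate is taken to be the smaller probability.  */
static int
branch_cost_for_probs (int taken_prob, int not_taken_prob)
{
  int mispredict_prob = taken_prob < not_taken_prob ? taken_prob
                                                    : not_taken_prob;
  return BRANCH_BASE_COST + MISPREDICT_PENALTY * mispredict_prob / PROB_BASE;
}
```

With these numbers, an always-taken branch costs 4, a 90/10 branch
costs 10 and a 50/50 branch costs 34.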

Note that the agreement during the discussion was that all costs need to be
comparable.

> By this I mean, the compiler considers a SET which costs more than
> COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost
> of both an integer SET and a floating-point SET to be COSTS_N_INSNS (1).
> In reality, these instructions may have different latency performance
> characteristics. What real world quantity are we trying to invoke when we
> say a branch costs the same as 3 SET instructions of any type? It certainly
> isn't mispredict penalty (likely measured in cycles, not relative to the cost
> of a SET instruction, which may well be completely free on modern x86
> processors), nor is it the cost of executing the branch instruction which
> is often constant to resolve regardless of predicted/mispredicted status.
>
> On the other side of the equation, we want a cost for the converted
> sequence. We can build a cost of the generated rtl sequence, but for
> targets like AArch64 this is going to be wildly off. AArch64 will expand
> (a > b) ? x : y; as a set to the CC register, followed by a conditional
> move based on the CC register. Consequently, where we have multiple sets
> back to back we end up with:
>
>   set CC (a > b)
>   set x1 (CC ? x : y)
>   set CC (a > b)
>   set x2 (CC ? x : z)
>   set CC (a > b)
>   set x3 (CC ? x : k)
>
> Which we know will be simplified later to:
>
>   set CC (a > b)
>   set x1 (CC ? x : y)
>   set x2 (CC ? x : z)
>   set x3 (CC ? x : k)
>
> I imagine other targets have something similar in their expansion of
> mov<mode>cc (though I haven't looked).
>
> Our comparison for if-conversion then must be:
>
>   weighted_old_cost = (taken_probability * then_bb_cost
>                        + (1 - taken_probability) * else_bb_cost);
>   branch_cost = branch_cost_in_insns (taken_probability);
>   weighted_new_cost = redundancy_factor (new_sequence) * seq_cost (new_sequence);
>
>   profitable = weighted_new_cost <= weighted_old_cost + branch_cost;
>
> And we must define:
>
>   branch_cost_in_insns (taken_probability)
>   redundancy_factor (new_sequence)
>
> At that point, I feel you are better giving the entire sequence to the
> target and asking it to implement whatever logic is needed to return a
> profitable/unprofitable analysis of the transformation.

Sure, that was what was suggested at the Cauldron - rtx_cost on individual
insns (rtx cost doesn't even work on that level usually!) isn't coarse enough.
We're C++ now so we'd pass the hook an iterator which it can use to
iterate over the insns it should compute the cost of.

> The "redundancy_factor" in particular is pretty tough to define in a way
> which makes sense outside of if_convert, without adding some pretty
> detailed analysis to decide what might or might not be eliminated by
> later passes. The alternative is to weight the other side of the equation
> by tuning the cost of branch_cost_in_insns high. This only serves to increase
> the disconnect between a real-world cost and a number to tweak to game
> code generation.
>
> If you have a different way of phrasing the if-conversion question that
> avoids the two very specific hooks, I'd be happy to try taking the patches
> in that direction. I don't see a way to implement this as just queries to
> a costing function which does not need substantial target and pass
> dependent tweaking to make behave correctly.

From the patch I can't even see what the question is ;)  I only see
"is the transform profitable".

I saw from your hook implementation that it's basically a set of magic
numbers.  So I suggested making those numbers target-configurable
instead.

IMHO we shouldn't accept new cost hooks that are only implemented
by a single target.  Instead it should be proved that the hook can
improve code on multiple targets.

Richard.

> Thanks,
> James
>
>> > This patch series introduces that new target hook to provide cost
>> > decisions for the RTL ifcvt pass.
>> >
>> > The idea is to give the target full visibility of the proposed
>> > transformation, and allow it to respond as to whether if-conversion in that
>> > way is profitable.
>> >
>> > In order to preserve current behaviour across targets, we will need the
>> > default implementation to keep to the strategy of simply comparing branch
>> > cost against a magic number. Patch 1/3 performs this refactoring, which is
>> > a bit hairy in some corner cases.
>> >
>> > Patch 2/3 is a simple code move, pulling the definition of the if_info
>> > structure used by RTL if-convert into ifcvt.h where it can be included
>> > by targets.
>> >
>> > Patch 3/3 then introduces the new target hook, with the same default
>> > behaviour as was previously in noce_is_profitable_p.
>> >
>> > The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
>> > I've verified with Spec2000 and Spec2006 runs that there are no code
>> > generation differences for any of these three targets after the patch.
>> >
>> > I also gave ultrasparc3 a quick go; from what I could see, I changed the
>> > register allocation for the floating-point condition code registers.
>> > Presumably this is a side effect of first constructing RTXen that I then
>> > discard. I didn't see anything which looked like more frequent reloads or
>> > substantial code generation changes, though I'm not familiar with the
>> > intricacies of the Sparc condition registers :).
>> >
>> > I've included a patch 4/3, to give an example of what a target might want
>> > to do with this hook. It needs work for tuning and deciding how the function
>> > should actually behave, but works if it is thought of as more of a
>> > strawman/prototype than a patch submission.
>> >
>> > Are parts 1, 2 and 3 OK?
>> >
>> > Thanks,
>> > James
>> >
>> > [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html
>> >
>> > ---
>> > [Patch ifcvt 1/3] Factor out cost calculations from noce cases
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>> >
>> >         * ifcvt.c (noce_if_info): Add a magic_number field :-(.
>> >         (noce_is_profitable_p): New.
>> >         (noce_try_store_flag_constants): Move cost calculation
>> >         to after sequence generation, factor it out to noce_is_profitable_p.
>> >         (noce_try_addcc): Likewise.
>> >         (noce_try_store_flag_mask): Likewise.
>> >         (noce_try_cmove): Likewise.
>> >         (noce_try_cmove_arith): Likewise.
>> >         (noce_try_sign_mask): Add comment regarding cost calculations.
>> >
>> > [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>> >
>> >         * ifcvt.c (noce_if_info): Move to...
>> >         * ifcvt.h (noce_if_info): ...Here.
>> >
>> > [Patch ifcvt 3/3] Create a new target hook for deciding profitability
>> >     of noce if-conversion
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>> >
>> >         * target.def (costs): New hook vector.
>> >         (ifcvt_noce_profitable_p): New hook.
>> >         * doc/tm.texi.in: Document it.
>> >         * doc/tm.texi: Regenerate.
>> >         * targhooks.h (default_ifcvt_noce_profitable_p): New.
>> >         * targhooks.c (default_ifcvt_noce_profitable_p): New.
>> >         * ifcvt.c (noce_is_profitable_p): Use new target hook.
>> >
>> > [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
>> >     hook for AArch64
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenhalgh@arm.com>
>> >
>> >         * config/aarch64/aarch64.c
>> >         (aarch64_additional_branch_cost_for_probability): New.
>> >         (aarch64_ifcvt_noce_profitable_p): Likewise.
>> >         (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
>>


* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-30  8:48                 ` Richard Biener
@ 2015-09-30 19:01                   ` Mike Stump
  0 siblings, 0 replies; 60+ messages in thread
From: Mike Stump @ 2015-09-30 19:01 UTC (permalink / raw)
  To: Richard Biener
  Cc: James Greenhalgh, GCC Patches, Ramana Radhakrishnan,
	Bernd Schmidt, Jeff Law, Eric Botcazou, Steven Bosscher

On Sep 30, 2015, at 1:04 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> So what about a branch_cost hook that takes taken/not-taken probabilities as argument?

So, for my port, I need to know the prediction rate as well to calculate cost.  I know, kinda sucks.  Or, put another way, I want to describe the cost for taken-predicted, not-taken-predicted, taken-mispredicted and not-taken-mispredicted, and let the caller sort out whether the branch will be predicted or mispredicted, as it can do the math itself and that math is target independent.


* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-09-29 15:28               ` James Greenhalgh
  2015-09-29 19:52                 ` Mike Stump
  2015-09-30  8:48                 ` Richard Biener
@ 2015-10-01  9:37                 ` Bernd Schmidt
  2015-10-09 11:28                   ` Bernd Schmidt
  2 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2015-10-01  9:37 UTC (permalink / raw)
  To: James Greenhalgh, Richard Biener
  Cc: GCC Patches, Ramana Radhakrishnan, Bernd Schmidt, Jeff Law,
	Eric Botcazou, Steven Bosscher

On 09/29/2015 04:31 PM, James Greenhalgh wrote:
> On the other side of the equation, we want a cost for the converted
> sequence. We can build a cost of the generated rtl sequence, but for
> targets like AArch64 this is going to be wildly off. AArch64 will expand
> (a > b) ? x : y; as a set to the CC register, followed by a conditional
> move based on the CC register. Consequently, where we have multiple sets
> back to back we end up with:
>
>    set CC (a > b)
>    set x1 (CC ? x : y)
>    set CC (a > b)
>    set x2 (CC ? x : z)
>    set CC (a > b)
>    set x3 (CC ? x : k)
>
> Which we know will be simplified later to:
>
>    set CC (a > b)
>    set x1 (CC ? x : y)
>    set x2 (CC ? x : z)
>    set x3 (CC ? x : k)

I guess the transformation you want to make is a bit unusual in that it 
generates such extra instructions. rtx_cost has problems taking such 
secondary considerations into account.

I haven't quite made up my mind about the new target hook, but I wonder 
if it might be a good idea to try and simplify the above sequence on the 
spot before calculating costs for it. (Incidentally, which pass removes 
the extra CC sets?)


Bernd


* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-10-01  9:37                 ` Bernd Schmidt
@ 2015-10-09 11:28                   ` Bernd Schmidt
  2015-10-09 15:28                     ` Jeff Law
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
  0 siblings, 2 replies; 60+ messages in thread
From: Bernd Schmidt @ 2015-10-09 11:28 UTC (permalink / raw)
  To: James Greenhalgh, Richard Biener
  Cc: GCC Patches, Ramana Radhakrishnan, Bernd Schmidt, Jeff Law,
	Eric Botcazou, Steven Bosscher

On 10/01/2015 11:37 AM, Bernd Schmidt wrote:
> On 09/29/2015 04:31 PM, James Greenhalgh wrote:
>> On the other side of the equation, we want a cost for the converted
>> sequence. We can build a cost of the generated rtl sequence, but for
>> targets like AArch64 this is going to be wildly off. AArch64 will expand
>> (a > b) ? x : y; as a set to the CC register, followed by a conditional
>> move based on the CC register. Consequently, where we have multiple sets
>> back to back we end up with:
>>
>>    set CC (a > b)
>>    set x1 (CC ? x : y)
>>    set CC (a > b)
>>    set x2 (CC ? x : z)
>>    set CC (a > b)
>>    set x3 (CC ? x : k)
>>
>> Which we know will be simplified later to:
>>
>>    set CC (a > b)
>>    set x1 (CC ? x : y)
>>    set x2 (CC ? x : z)
>>    set x3 (CC ? x : k)
>
> I guess the transformation you want to make is a bit unusual in that it
> generates such extra instructions. rtx_cost has problems taking such
> secondary considerations into account.
>
> I haven't quite made up my mind about the new target hook, but I wonder
> if it might be a good idea to try and simplify the above sequence on the
> spot before calculating costs for it. (Incidentally, which pass removes
> the extra CC sets?)

I don't know whether you've done any more work on the patch series, but 
I think I've made up my mind that optimizing the sequence before 
computing its cost would be a good thing to try first. Either with a 
better expander interface which generates the right thing immediately, 
or running something like cselib over the generated code. I think this 
would give more reliable results rather than guesstimating a redundancy 
factor in the cost function.


Bernd


* Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.
  2015-10-09 11:28                   ` Bernd Schmidt
@ 2015-10-09 15:28                     ` Jeff Law
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
  1 sibling, 0 replies; 60+ messages in thread
From: Jeff Law @ 2015-10-09 15:28 UTC (permalink / raw)
  To: Bernd Schmidt, James Greenhalgh, Richard Biener
  Cc: GCC Patches, Ramana Radhakrishnan, Bernd Schmidt, Eric Botcazou,
	Steven Bosscher

On 10/09/2015 05:28 AM, Bernd Schmidt wrote:
>
> I don't know whether you've done any more work on the patch series, but
> I think I've made up my mind that optimizing the sequence before
> computing its cost would be a good thing to try first. Either with a
> better expander interface which generates the right thing immediately,
> or running something like cselib over the generated code. I think this
> would give more reliable results rather than guesstimating a redundancy
> factor in the cost function.
Another great example of what we could do if we could give an RTL 
sequence and mini CFG to the various optimizers and have them just work 
on the sequence (making the appropriate assumptions about live-in and 
live-out objects).  Retrofitting this capability would likely be very 
hard at this point :(

Even just DCE & CSE would probably be a notable step forward.

jeff


* Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions
  2015-09-10 18:24 ` Bernd Schmidt
  2015-09-10 21:34   ` Jeff Law
@ 2015-10-30 18:09   ` James Greenhalgh
  2015-11-04 11:04     ` Bernd Schmidt
  1 sibling, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2015-10-30 18:09 UTC (permalink / raw)
  To: gcc-patches
  Cc: bernds_cb1, law, ebotcazou, steven, kyrylo.tkachov, richard.guenther

[-- Attachment #1: Type: text/plain, Size: 4899 bytes --]


On Thu, Sep 10, 2015 at 07:23:28PM +0100, Bernd Schmidt wrote:
> On 09/08/2015 04:53 PM, James Greenhalgh wrote:
> > One big question I have with this patch is how I ought to write a meaningful
> > cost model I've used. It seems like yet another misuse of RTX costs, and
> > another bit of stuff for targets to carefully balance. Now, if the
> > relative cost of branches and conditional move instructions is not
> > carefully managed, you may enable or disable these optimisations. This is
> > probably acceptable, but I dislike adding more and more gotcha's to
> > target costs, as I get bitten by them hard enough as is!
>
> The code you have seems reasonable, except that for compile time it
> might make sense to not even attempt the optimization if the number of
> sets is too large. I'm not too worried about that, but maybe you could
> bail out early if your cost estimate goes too much above the branch cost.

Thanks all for the review of this patch. I was hoping to look at fixing the
costs model first, but that seems likely to continue to be controversial
and worthy of a more considerable period to work through kinks in the
design rather than rushed into the closing days of Stage 1. I'll get back
to the list soon with what I'd like to achieve with a costs rework and
how I plan to go about it.

In the interim, I'd like to try to find a route to get this patch accepted
and committed.

The goal after I rewrite the costs should be for us to use whatever
new replacement_cost (or similar) hook we end up building. That will
necessitate building and optimising some RTX which we are going to throw
away. I think that is the proffered direction (though it still needs to be
discussed), and compile time will be a necessary sacrifice, I'm afraid.

In the FORNOW cost model, we just count instructions. We can do that
early and avoid creating the extra RTX. That's what I do here.

> > +      /* If we were supposed to read from an earlier write in this block,
> > +	 we've changed the register allocation.  Rewire the read.  While
> > +	 we are looking, also try to catch a swap idiom.  */
>
> So this is one interesting case; do you also have to worry about others
> (such as maybe setting the same register multiple times)?

I'm hoping that cases like that will have either been cleaned up already
or will be cleaned up later. For the particular example you mention, we
would have something like:

    if (a > b)
      {
        x = 1;
        y = 2;
        x = x;
        y = x;
      }

This would if-convert (catching with the existing swap idiom check) to:

  tmp_x = (a > b) ? 1 : x;
  tmp_y = (a > b) ? 2 : y;
  tmp_x2 = (a > b) ? 1 : x;
  tmp_y2 = (a > b) ? 1 : y;
  x = tmp_x;
  y = tmp_y;
  x = tmp_x2;
  y = tmp_y2;

While ugly, one would hope this would be cleaned up to:

  tmp_x2 = (a > b) ? 1 : x;
  tmp_y2 = (a > b) ? 1 : y;
  x = tmp_x2;
  y = tmp_y2;

I've expanded the comment to talk about this case.

> > +  /* We must have seen some sort of insn to insert, otherwise we were
> > +     given an empty BB to convert, and we can't handle that.  */
> > +  if (unmodified_insns.is_empty ())
> > +    {
> > +      end_sequence ();
> > +      return FALSE;
> > +    }
>
> Looks like some of the error conditions are tested twice across the two
> new functions? I think it would be better to get rid of one copy or turn
> the second one into a gcc_assert.

I've cleaned this up a little, thanks.

>
>  > No testcase provided, as currently I don't know of targets with a high
>  > enough branch cost to actually trigger the optimisation.
>
> Hmm, so the code would not actually be used right now? In that case I'll
> leave it to others to decide whether we want to apply it. Other than the
> points above it looks OK to me.

We have more than a few targets with BRANCH_COST >= 2 for unpredictable
branches, so I've put together a testcase which exercises at least some of
the code. I'll need to rely on target maintainers to XFAIL their target
(or at least to shout at me if a FAIL pops up, and I can XFAIL their
target).

I've bootstrapped this revision on aarch64-none-linux-gnu and
x86_64-none-linux-gnu with no issues.

By inspection of the x86_64 and aarch64 generated code for Spec2006 we
if-convert in a few additional places (though within the defined cost
model) and we end up with inverse condition codes for the cmoves we put
out. Neither of these looked bad to me.

Is this OK to progress as is while I build myself up to wrestle with
costs for GCC 7?

Thanks,
James

---
gcc/

2015-10-30  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
	(noce_convert_multiple_sets): Likewise.
	(noce_process_if_block): Call them.

gcc/testsuite/

2015-10-30  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.dg/ifcvt-4.c: New.


[-- Attachment #2: 0001-Re-Patch-ifcvt-Teach-RTL-ifcvt-to-handle-multiple-si.patch --]
[-- Type: text/x-patch;  name=0001-Re-Patch-ifcvt-Teach-RTL-ifcvt-to-handle-multiple-si.patch, Size: 9333 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 592e86d..4071490 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3025,6 +3025,249 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+     if (x > y)
+       { i = a; j = b; k = c; }
+
+   Make it:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? b : j;
+     tmp_k = (x > y) ? c : k;
+     i = tmp_i; <- Should be cleaned up
+     j = tmp_j; <- Likewise.
+     k = tmp_k; <- Likewise.
+
+   Look for special cases such as writes to one register which are
+   read back in anotyher SET, as might occur in a swap idiom or
+   similar.
+
+   These look like:
+
+   if (x > y)
+     i = a;
+     j = i;
+
+   Which we want to rewrite to:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? tmp_i : j;
+     i = tmp_i;
+     j = tmp_j;
+
+   We can catch these when looking at (SET x y) by keeping a list of the
+   registers we would have targetted before if-conversion and looking back
+   through it for an overlap with Y.  If we find one, we rewire the
+   conditional set to use the temporary we introduced earlier.
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec<rtx> targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+     overlap.  */
+  vec<rtx> temporaries = vNULL;
+  /* The insns we've emitted.  */
+  vec<rtx_insn *> unmodified_insns = vNULL;
+  int count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+    {
+      /* Skip over non-insns.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      gcc_checking_assert (set);
+
+      rtx target = SET_DEST (set);
+      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx new_val = SET_SRC (set);
+      rtx old_val = target;
+
+      /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+      for (int i = count - 1; i >= 0; --i)
+	{
+	  if (reg_overlap_mentioned_p (new_val, targets[i]))
+	    {
+	      /* Catch a "swap" style idiom.  */
+	      if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+		{
+		  /* The write to targets[i] is only live until the read
+		     here.  As the condition codes match, we can propagate
+		     the set to here.  */
+		   new_val = SET_SRC (single_set (unmodified_insns[i]));
+		}
+	      else
+		new_val = temporaries[i];
+	      break;
+	    }
+	}
+
+      /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */
+      if (if_info->then_else_reversed)
+	std::swap (old_val, new_val);
+
+      /* Actually emit the conditional move.  */
+      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				       x, y, new_val, old_val);
+
+      /* If we failed to expand the conditional move, drop out and don't
+	 try to continue.  */
+      if (temp_dest == NULL_RTX)
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+
+      /* Bookkeeping.  */
+      count++;
+      targets.safe_push (target);
+      temporaries.safe_push (temp_dest);
+      unmodified_insns.safe_push (insn);
+    }
+
+  /* We must have seen some sort of insn to insert, otherwise we were
+     given an empty BB to convert, and we can't handle that.  */
+  gcc_assert (!unmodified_insns.is_empty ());
+
+  /* Now fixup the assignments.  */
+  for (int i = 0; i < count; i++)
+    noce_emit_move_insn (targets[i], temporaries[i]);
+
+  /* Actually emit the sequence.  */
+  rtx_insn *seq = get_insns ();
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
+  /* Mark all our temporaries and targets as used.  */
+  for (int i = 0; i < count; i++)
+    {
+      set_used_flags (temporaries[i]);
+      set_used_flags (targets[i]);
+    }
+
+  set_used_flags (cond);
+  set_used_flags (x);
+  set_used_flags (y);
+
+  unshare_all_rtl_in_chain (seq);
+  end_sequence ();
+
+  if (!seq)
+    return FALSE;
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    if (JUMP_P (insn)
+	|| recog_memoized (insn) == -1)
+      return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			   INSN_LOCATION (unmodified_insns.last ()));
+
+  /* Clean up THEN_BB and the edges in and out of it.  */
+  remove_edge (find_edge (test_bb, join_bb));
+  remove_edge (find_edge (then_bb, join_bb));
+  redirect_edge_and_branch_force (single_succ_edge (test_bb), join_bb);
+  delete_basic_block (then_bb);
+  num_true_changes++;
+
+  /* Maybe merge blocks now the jump is simple enough.  */
+  if (can_merge_blocks_p (test_bb, join_bb))
+    {
+      merge_blocks (test_bb, join_bb);
+      num_true_changes++;
+    }
+
+  num_updated_if_blocks++;
+  return TRUE;
+}
+
+/* Return true iff basic block TEST_BB is comprised of only
+   (SET (REG) (REG)) insns suitable for conversion to a series
+   of conditional moves.  FORNOW: Use II to find the expected cost of
+   the branch into/over TEST_BB.
+
+   TODO: This creates an implicit "magic number" for branch_cost.
+   II->branch_cost now guides the maximum number of set instructions in
+   a basic block which is considered profitable to completely
+   if-convert.  */
+
+static bool
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
+				      struct noce_if_info *ii)
+{
+  rtx_insn *insn;
+
+  /* We must have at least one real insn to convert, or there will
+     be trouble!  */
+  unsigned count = 0;
+
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      /* Skip over notes etc.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      /* We only handle SET insns.  */
+      rtx set = single_set (insn);
+      if (set == NULL_RTX)
+	return false;
+
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      /* We can possibly relax this, but for now only handle REG to REG
+	 moves.  This avoids any issues that might come from introducing
+	 loads/stores that might violate data-race-freedom guarantees.  */
+      if (!(REG_P (src) && REG_P (dest)))
+	return false;
+
+      /* Destination must be appropriate for a conditional write.  */
+      if (!noce_operand_ok (dest))
+	return false;
+
+      /* We must be able to conditionally move in this mode.  */
+      if (!can_conditionally_move_p (GET_MODE (dest)))
+	return false;
+
+      ++count;
+    }
+
+  /* FORNOW: Our cost model is a count of the number of instructions we
+     would if-convert.  This is suboptimal, and should be improved as part
+     of a wider rework of branch_cost.  */
+  if (count > ii->branch_cost)
+    return FALSE;
+
+  return count > 0;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -3047,12 +3290,22 @@ noce_process_if_block (struct noce_if_info *if_info)
      (1) if (...) x = a; else x = b;
      (2) x = b; if (...) x = a;
      (3) if (...) x = a;   // as if with an initial x = x.
-
+     (4) if (...) { x = a; y = b; z = c; }  // Like 3, for multiple SETS.
      The later patterns require jumps to be more expensive.
      For the if (...) x = a; else x = b; case we allow multiple insns
      inside the then and else blocks as long as their only effect is
      to calculate a value for x.
-     ??? For future expansion, look for multiple X in such patterns.  */
+     ??? For future expansion, further expand the "multiple X" rules.  */
+
+  /* First look for multiple SETS.  */
+  if (!else_bb
+      && HAVE_conditional_move
+      && !HAVE_cc0
+      && bb_ok_for_noce_convert_multiple_sets (then_bb, if_info))
+    {
+      if (noce_convert_multiple_sets (if_info))
+	return TRUE;
+    }
 
   if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
 				    &if_info->then_simple))
diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
new file mode 100644
index 0000000..16be2b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -0,0 +1,16 @@
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+int
+foo (int x, int y, int a)
+{
+  int i = x;
+  int j = y;
+  /* Try to make taking the branch likely.  */
+  __builtin_expect (x > y, 1);
+  if (x > y)
+    {
+      i = a;
+      j = i;
+    }
+  return i * j;
+}
+/* { dg-final { scan-rtl-dump "2 true changes made" "ce1" } } */


* Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions
  2015-10-30 18:09   ` [Patch ifcvt] " James Greenhalgh
@ 2015-11-04 11:04     ` Bernd Schmidt
  2015-11-04 15:37       ` James Greenhalgh
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2015-11-04 11:04 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: bernds_cb1, law, ebotcazou, steven, kyrylo.tkachov, richard.guenther

On 10/30/2015 07:03 PM, James Greenhalgh wrote:
> +     i = tmp_i; <- Should be cleaned up

Maybe reword as "Subsequent passes are expected to clean up the extra 
moves", otherwise it sounds like a TODO item.

> +   read back in anotyher SET, as might occur in a swap idiom or

Typo.

> +	      if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> +		{
> +		  /* The write to targets[i] is only live until the read
> +		     here.  As the condition codes match, we can propagate
> +		     the set to here.  */
> +		   new_val = SET_SRC (single_set (unmodified_insns[i]));
> +		}

Shouldn't use braces around single statements (also goes for the 
surrounding for loop).

> +  /* We must have at least one real insn to convert, or there will
> +     be trouble!  */
> +  unsigned count = 0;

The comment seems a bit strange in this context - I think it's left over 
from the earlier version?

As far as I'm concerned this is otherwise ok.


Bernd


* Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions
  2015-11-04 11:04     ` Bernd Schmidt
@ 2015-11-04 15:37       ` James Greenhalgh
  2015-11-06  9:13         ` Christophe Lyon
  0 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2015-11-04 15:37 UTC (permalink / raw)
  To: gcc-patches
  Cc: bernds_cb1, law, ebotcazou, steven, Kyrylo.Tkachov, richard.guenther

[-- Attachment #1: Type: text/plain, Size: 1767 bytes --]


On Wed, Nov 04, 2015 at 12:04:19PM +0100, Bernd Schmidt wrote:
> On 10/30/2015 07:03 PM, James Greenhalgh wrote:
> >+     i = tmp_i; <- Should be cleaned up
>
> Maybe reword as "Subsequent passes are expected to clean up the
> extra moves", otherwise it sounds like a TODO item.
>
> >+   read back in anotyher SET, as might occur in a swap idiom or
>
> Typo.
>
> >+	      if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> >+		{
> >+		  /* The write to targets[i] is only live until the read
> >+		     here.  As the condition codes match, we can propagate
> >+		     the set to here.  */
> >+		   new_val = SET_SRC (single_set (unmodified_insns[i]));
> >+		}
>
> Shouldn't use braces around single statements (also goes for the
> surrounding for loop).
>
> >+  /* We must have at least one real insn to convert, or there will
> >+     be trouble!  */
> >+  unsigned count = 0;
>
> The comment seems a bit strange in this context - I think it's left
> over from the earlier version?
>
> As far as I'm concerned this is otherwise ok.

Thanks,

I've updated the patch with those issues addressed. As the cost model was
controversial in an earlier revision, I'll leave this on list for 24 hours
and, if nobody jumps in to object, commit it tomorrow.

I've bootstrapped and tested the updated patch on x86_64-none-linux-gnu
just to check that I got the braces right, with no issues.

Thanks,
James

---
gcc/

2015-11-04  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
	(noce_convert_multiple_sets): Likewise.
	(noce_process_if_block): Call them.

gcc/testsuite/

2015-11-04  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.dg/ifcvt-4.c: New.


[-- Attachment #2: 0001-Re-Patch-ifcvt-Teach-RTL-ifcvt-to-handle-multiple-si.patch --]
[-- Type: text/x-patch;  name=0001-Re-Patch-ifcvt-Teach-RTL-ifcvt-to-handle-multiple-si.patch, Size: 9221 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f23d9afd..1c33283 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3016,6 +3016,244 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+     if (x > y)
+       { i = a; j = b; k = c; }
+
+   Make it:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? b : j;
+     tmp_k = (x > y) ? c : k;
+     i = tmp_i;
+     j = tmp_j;
+     k = tmp_k;
+
+   Subsequent passes are expected to clean up the extra moves.
+
+   Look for special cases such as writes to one register which are
+   read back in another SET, as might occur in a swap idiom or
+   similar.
+
+   These look like:
+
+   if (x > y)
+     i = a;
+     j = i;
+
+   Which we want to rewrite to:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? tmp_i : j;
+     i = tmp_i;
+     j = tmp_j;
+
+   We can catch these when looking at (SET x y) by keeping a list of the
+   registers we would have targeted before if-conversion and looking back
+   through it for an overlap with Y.  If we find one, we rewire the
+   conditional set to use the temporary we introduced earlier.
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  auto_vec<rtx> targets;
+  /* The temporaries introduced so that we need not consider register
+     overlap.  */
+  auto_vec<rtx> temporaries;
+  /* The original insns from THEN_BB that we have converted.  */
+  auto_vec<rtx_insn *> unmodified_insns;
+  int count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+    {
+      /* Skip over non-insns.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      gcc_checking_assert (set);
+
+      rtx target = SET_DEST (set);
+      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx new_val = SET_SRC (set);
+      rtx old_val = target;
+
+      /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+      for (int i = count - 1; i >= 0; --i)
+	if (reg_overlap_mentioned_p (new_val, targets[i]))
+	  {
+	    /* Catch a "swap" style idiom.  */
+	    if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+	      /* The write to targets[i] is only live until the read
+		 here.  As the condition codes match, we can propagate
+		 the set to here.  */
+	      new_val = SET_SRC (single_set (unmodified_insns[i]));
+	    else
+	      new_val = temporaries[i];
+	    break;
+	  }
+
+      /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */
+      if (if_info->then_else_reversed)
+	std::swap (old_val, new_val);
+
+      /* Actually emit the conditional move.  */
+      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				       x, y, new_val, old_val);
+
+      /* If we failed to expand the conditional move, drop out and don't
+	 try to continue.  */
+      if (temp_dest == NULL_RTX)
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+
+      /* Bookkeeping.  */
+      count++;
+      targets.safe_push (target);
+      temporaries.safe_push (temp_dest);
+      unmodified_insns.safe_push (insn);
+    }
+
+  /* We must have seen some sort of insn to insert, otherwise we were
+     given an empty BB to convert, and we can't handle that.  */
+  gcc_assert (!unmodified_insns.is_empty ());
+
+  /* Now fixup the assignments.  */
+  for (int i = 0; i < count; i++)
+    noce_emit_move_insn (targets[i], temporaries[i]);
+
+  /* Actually emit the sequence.  */
+  rtx_insn *seq = get_insns ();
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
+  /* Mark all our temporaries and targets as used.  */
+  for (int i = 0; i < count; i++)
+    {
+      set_used_flags (temporaries[i]);
+      set_used_flags (targets[i]);
+    }
+
+  set_used_flags (cond);
+  set_used_flags (x);
+  set_used_flags (y);
+
+  unshare_all_rtl_in_chain (seq);
+  end_sequence ();
+
+  if (!seq)
+    return FALSE;
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    if (JUMP_P (insn)
+	|| recog_memoized (insn) == -1)
+      return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			   INSN_LOCATION (unmodified_insns.last ()));
+
+  /* Clean up THEN_BB and the edges in and out of it.  */
+  remove_edge (find_edge (test_bb, join_bb));
+  remove_edge (find_edge (then_bb, join_bb));
+  redirect_edge_and_branch_force (single_succ_edge (test_bb), join_bb);
+  delete_basic_block (then_bb);
+  num_true_changes++;
+
+  /* Maybe merge blocks now the jump is simple enough.  */
+  if (can_merge_blocks_p (test_bb, join_bb))
+    {
+      merge_blocks (test_bb, join_bb);
+      num_true_changes++;
+    }
+
+  num_updated_if_blocks++;
+  return TRUE;
+}
+
+/* Return true iff basic block TEST_BB consists only of
+   (SET (REG) (REG)) insns suitable for conversion to a series
+   of conditional moves.  FORNOW: Use II to find the expected cost of
+   the branch into/over TEST_BB.
+
+   TODO: This creates an implicit "magic number" for branch_cost.
+   II->branch_cost now guides the maximum number of set instructions in
+   a basic block which is considered profitable to completely
+   if-convert.  */
+
+static bool
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
+				      struct noce_if_info *ii)
+{
+  rtx_insn *insn;
+  unsigned count = 0;
+
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      /* Skip over notes etc.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      /* We only handle SET insns.  */
+      rtx set = single_set (insn);
+      if (set == NULL_RTX)
+	return false;
+
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      /* We can possibly relax this, but for now only handle REG to REG
+	 moves.  This avoids any issues that might come from introducing
+	 loads/stores that might violate data-race-freedom guarantees.  */
+      if (!(REG_P (src) && REG_P (dest)))
+	return false;
+
+      /* Destination must be appropriate for a conditional write.  */
+      if (!noce_operand_ok (dest))
+	return false;
+
+      /* We must be able to conditionally move in this mode.  */
+      if (!can_conditionally_move_p (GET_MODE (dest)))
+	return false;
+
+      ++count;
+    }
+
+  /* FORNOW: Our cost model is a count of the number of instructions we
+     would if-convert.  This is suboptimal, and should be improved as part
+     of a wider rework of branch_cost.  */
+  if (count > ii->branch_cost)
+    return false;
+
+  return count > 0;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -3038,12 +3276,22 @@ noce_process_if_block (struct noce_if_info *if_info)
      (1) if (...) x = a; else x = b;
      (2) x = b; if (...) x = a;
      (3) if (...) x = a;   // as if with an initial x = x.
-
+     (4) if (...) { x = a; y = b; z = c; }  // Like 3, for multiple SETS.
      The later patterns require jumps to be more expensive.
      For the if (...) x = a; else x = b; case we allow multiple insns
      inside the then and else blocks as long as their only effect is
      to calculate a value for x.
-     ??? For future expansion, look for multiple X in such patterns.  */
+     ??? For future expansion, further expand the "multiple X" rules.  */
+
+  /* First look for multiple SETS.  */
+  if (!else_bb
+      && HAVE_conditional_move
+      && !HAVE_cc0
+      && bb_ok_for_noce_convert_multiple_sets (then_bb, if_info))
+    {
+      if (noce_convert_multiple_sets (if_info))
+	return TRUE;
+    }
 
   if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
 				    &if_info->then_simple))
diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
new file mode 100644
index 0000000..16be2b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -0,0 +1,16 @@
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+int
+foo (int x, int y, int a)
+{
+  int i = x;
+  int j = y;
+  /* Try to make taking the branch likely.  */
+  __builtin_expect (x > y, 1);
+  if (x > y)
+    {
+      i = a;
+      j = i;
+    }
+  return i * j;
+}
+/* { dg-final { scan-rtl-dump "2 true changes made" "ce1" } } */
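The conversion described in the comment above noce_convert_multiple_sets can be modelled outside GCC. The following is a minimal sketch, not GCC code: registers are assumed to be slots in an int array, each insn in THEN_BB is a register-to-register move, and the names (struct reg_move, run_branchy, run_if_converted, MAX_MOVES) are hypothetical. It uses plain register equality in place of reg_overlap_mentioned_p and omits the REG_DEAD shortcut for the swap idiom.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each "insn" in THEN_BB is a register-to-register
   move REGS[dest] = REGS[src], where registers are slots in an array.  */
struct reg_move
{
  int dest;
  int src;
};

#define MAX_MOVES 16

/* Reference semantics: perform the moves only when COND holds.  */
static void
run_branchy (int *regs, const struct reg_move *moves, size_t n, bool cond)
{
  if (cond)
    for (size_t i = 0; i < n; i++)
      regs[moves[i].dest] = regs[moves[i].src];
}

/* If-converted semantics: one select per move into a fresh temporary,
   reading an earlier move's temporary when our source is that move's
   target, then unconditional copies back to the real targets.
   Returns false (giving up, like the pass) if there are too many moves.  */
static bool
run_if_converted (int *regs, const struct reg_move *moves, size_t n,
                  bool cond)
{
  int temps[MAX_MOVES];

  if (n > MAX_MOVES)
    return false;

  for (size_t i = 0; i < n; i++)
    {
      int new_val = regs[moves[i].src];

      /* Walk back over earlier moves; the most recent write to our
         source register wins, so read its temporary instead.  */
      for (size_t j = i; j-- > 0;)
        if (moves[j].dest == moves[i].src)
          {
            new_val = temps[j];
            break;
          }

      /* REGS is not written until the loop below, so this is still
         the value the target held before the block.  */
      int old_val = regs[moves[i].dest];

      /* The conditional move.  */
      temps[i] = cond ? new_val : old_val;
    }

  /* The trailing copies that later passes are expected to clean up.  */
  for (size_t i = 0; i < n; i++)
    regs[moves[i].dest] = temps[i];

  return true;
}
```

With slots 0, 1 and 2 standing for d, e and tmp, the swap moves {tmp = d; d = e; e = tmp} leave d and e exchanged under both functions when COND is true, and untouched when it is false. For the chain {i = a; j = i}, dropping the look-back loop would make j receive the old value of i rather than a; that is the rewiring the patch performs.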


* Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions
  2015-11-04 15:37       ` James Greenhalgh
@ 2015-11-06  9:13         ` Christophe Lyon
From: Christophe Lyon @ 2015-11-06  9:13 UTC
  To: James Greenhalgh
  Cc: gcc-patches, bernds_cb1, Jeff Law, ebotcazou, steven,
	Kyrylo Tkachov, Richard Biener

On 4 November 2015 at 16:37, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> On Wed, Nov 04, 2015 at 12:04:19PM +0100, Bernd Schmidt wrote:
>> On 10/30/2015 07:03 PM, James Greenhalgh wrote:
>> >+     i = tmp_i; <- Should be cleaned up
>>
>> Maybe reword as "Subsequent passes are expected to clean up the
>> extra moves", otherwise it sounds like a TODO item.
>>
>> >+   read back in anotyher SET, as might occur in a swap idiom or
>>
>> Typo.
>>
>> >+          if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
>> >+            {
>> >+              /* The write to targets[i] is only live until the read
>> >+                 here.  As the condition codes match, we can propagate
>> >+                 the set to here.  */
>> >+               new_val = SET_SRC (single_set (unmodified_insns[i]));
>> >+            }
>>
>> Shouldn't use braces around single statements (also goes for the
>> surrounding for loop).
>>
>> >+  /* We must have at least one real insn to convert, or there will
>> >+     be trouble!  */
>> >+  unsigned count = 0;
>>
>> The comment seems a bit strange in this context - I think it's left
>> over from the earlier version?
>>
>> As far as I'm concerned this is otherwise ok.
>
> Thanks,
>
> I've updated the patch with those issues addressed. As the cost model was
> controversial in an earlier revision, I'll leave this on list for 24 hours
> and, if nobody jumps in to object, commit it tomorrow.
>
> I've bootstrapped and tested the updated patch on x86_64-none-linux-gnu
> just to check that I got the braces right, with no issues.
>

The new test does not pass on some ARM configurations; I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

Christophe.

> Thanks,
> James
>
> ---
> gcc/
>
> 2015-11-04  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
>         (noce_convert_multiple_sets): Likewise.
>         (noce_process_if_block): Call them.
>
> gcc/testsuite/
>
> 2015-11-04  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.dg/ifcvt-4.c: New.
>


* [RFC: Patch 4/6] Modify cost model for noce_cmove_arith
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
  2016-06-02 16:54                       ` [RFC: Patch 1/6] New target hook: rtx_branch_cost James Greenhalgh
@ 2016-06-02 16:54                       ` James Greenhalgh
  2016-06-02 16:55                       ` [RFC: Patch 2/6] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
From: James Greenhalgh @ 2016-06-02 16:54 UTC
  To: gcc-patches; +Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]


Hi,

This patch clears up the cost model for noce_try_cmove_arith. We lose
the "??? FIXME: Magic number 5" comment, and gain a more realistic cost
model for if-converting memory accesses.

This is the patch that will cause the largest behavioural change for most
targets. The current heuristic does not take into consideration the cost
of a conditional move; once we add that, the cost of the converted sequence
often looks much higher than (BRANCH_COST * COSTS_N_INSNS (1)).

I think that omitting the cost of the conditional move from these sequences
is not a good idea, and that the cost model should rely on the target giving
back good information. A target that finds tests failing after this patch
should consider either reducing the cost of a conditional move sequence, or
increasing TARGET_RTX_BRANCH_COST.
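To make the behavioural change concrete, here is a minimal sketch of the register (non-memory) case of the two heuristics in noce_try_cmove_arith, before and after this patch. It is a standalone model, not GCC code: costs are plain integers, COSTS_N_INSNS follows its definition in GCC's rtl.h, and the function names are hypothetical.

```c
#include <stdbool.h>

/* As defined in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

static unsigned int
max_u (unsigned int a, unsigned int b)
{
  return a > b ? a : b;
}

/* Old heuristic: reject if the more expensive arm alone exceeds the
   cost of the branch.  */
static bool
old_model_converts (unsigned int then_cost, unsigned int else_cost,
                    unsigned int edge_cost)
{
  return !(max_u (then_cost, else_cost) > edge_cost);
}

/* New heuristic: the conditional move is charged as well.  */
static bool
new_model_converts (unsigned int then_cost, unsigned int else_cost,
                    unsigned int cmove_cost, unsigned int edge_cost)
{
  return !(max_u (then_cost, else_cost) + cmove_cost > edge_cost);
}
```

With one-instruction arms (COSTS_N_INSNS (1) each) and an edge cost of COSTS_N_INSNS (2), the old check converts, while the new one converts only if the conditional move costs no more than COSTS_N_INSNS (1).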

OK?

Thanks,
James

---
gcc/

2016-06-02  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_code_is_comparison_p): New.
	(noce_cmove_estimate_cost): Likewise.
	(noce_try_cmove_arith): Use it.


[-- Attachment #2: 0004-RFC-Patch-4-6-Modify-cost-model-for-noce_cmove_arith.patch --]
[-- Type: text/x-patch;  name=0004-RFC-Patch-4-6-Modify-cost-model-for-noce_cmove_arith.patch, Size: 5102 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index b192c85..bd3f55d 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -830,6 +830,78 @@ static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
 
+/* Return TRUE if CODE is an RTX comparison operator.  */
+
+static bool
+noce_code_is_comparison_p (rtx_code code)
+{
+  switch (code)
+    {
+    case NE:
+    case EQ:
+    case GE:
+    case GT:
+    case LE:
+    case LT:
+    case GEU:
+    case GTU:
+    case LEU:
+    case LTU:
+    case UNORDERED:
+    case ORDERED:
+    case UNEQ:
+    case UNGE:
+    case UNGT:
+    case UNLE:
+    case UNLT:
+    case LTGT:
+      return true;
+    default:
+      return false;
+    }
+}
+
+/* Return the estimated cost of a single conditional move, where the
+   condition is calculated using the COMPARISON operator in mode CMODE,
+   and the store is in mode SMODE, depending on whether we are compiling
+   for SPEED_P.  */
+
+static unsigned int
+noce_cmove_estimate_cost (machine_mode cmode, machine_mode smode,
+			  rtx_code comparison, bool speed_p)
+{
+  unsigned int cost = 0;
+
+  gcc_checking_assert (noce_code_is_comparison_p (comparison));
+
+  start_sequence ();
+
+  /* We're only estimating, so we don't need to be too careful about
+     getting the operands exactly right; a representative sequence is
+     enough.  We do need at least two registers, to avoid the comparison
+     being folded.  */
+  rtx creg = gen_reg_rtx (cmode);
+  rtx creg2 = gen_reg_rtx (cmode);
+  rtx sreg = gen_reg_rtx (smode);
+  rtx sreg2 = gen_reg_rtx (smode);
+  rtx dest = emit_conditional_move (sreg, comparison, creg, creg2,
+				    cmode, sreg, sreg2, smode, false);
+  if (!dest)
+    {
+      /* Set something suitably high in here, as our best guess
+	 is that the if-conversion will fail.  */
+      cost = COSTS_N_INSNS (32);
+    }
+  else
+    {
+      rtx_insn *seq = get_insns ();
+      cost = seq_cost (seq, speed_p);
+    }
+  end_sequence ();
+
+  return cost;
+}
+
 /* This function is always called when we would expand a number of "cheap"
    instructions.  Multiply NINSNS by COSTS_N_INSNS (1) to approximate the
    RTX cost of those cheap instructions.  */
@@ -2040,7 +2112,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx a = if_info->a;
   rtx b = if_info->b;
   rtx x = if_info->x;
-  rtx orig_a, orig_b;
+  rtx orig_a = a;
+  rtx orig_b = b;
   rtx_insn *insn_a, *insn_b;
   bool a_simple = if_info->then_simple;
   bool b_simple = if_info->else_simple;
@@ -2050,16 +2123,15 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   int is_mem = 0;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
+  bool speed_p = optimize_bb_for_speed_p (if_info->test_bb);
 
   /* A conditional move from two memory sources is equivalent to a
      conditional on their addresses followed by a load.  Don't do this
      early because it'll screw alias analysis.  Note that we've
      already checked for no side effects.  */
-  /* ??? FIXME: Magic number 5.  */
   if (cse_not_expected
       && MEM_P (a) && MEM_P (b)
-      && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
-      && noce_estimate_conversion_profitable_p (if_info, 5))
+      && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b))
     {
       machine_mode address_mode = get_address_mode (a);
 
@@ -2087,6 +2159,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   insn_b = if_info->insn_b;
 
   machine_mode x_mode = GET_MODE (x);
+  machine_mode cmode = GET_MODE (XEXP (if_info->cond, 0));
 
   if (!can_conditionally_move_p (x_mode))
     return FALSE;
@@ -2103,12 +2176,32 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   else
     else_cost = 0;
 
-  /* We're going to execute one of the basic blocks anyway, so
-     bail out if the most expensive of the two blocks is unacceptable.  */
+  if (!is_mem)
+    {
+      /* If-convert in the case that:
+
+	 then_cost + else_cost + noce_cmove_estimate_cost (x_mode)
+	   <= MIN (then_cost, else_cost) + if_info->rtx_edge_cost
 
-  /* TODO: Revisit cost model.  */
-  if (MAX (then_cost, else_cost) > if_info->rtx_edge_cost)
-    return FALSE;
+	   Which we rearrange using the rule
+	     (a + b - MIN (a, b) == MAX (a, b))
+	   to get...  */
+      if ((MAX (then_cost, else_cost)
+	   + noce_cmove_estimate_cost (cmode, x_mode, code, speed_p))
+	  > if_info->rtx_edge_cost)
+	{
+          return FALSE;
+	}
+    }
+  else
+    {
+      /* We're trying to convert a branch and a conditional load into an
+	 unconditional load and a conditional move.  Cost those directly.  */
+      if ((then_cost
+	   + noce_cmove_estimate_cost (cmode, x_mode, code, speed_p))
+          > if_info->rtx_edge_cost + MIN (then_cost, else_cost))
+        return FALSE;
+    }
 
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
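For reference, the register-case rearrangement described in the comment above can be written out as follows, next to the memory-case test as coded. The symbols are shorthand introduced here: $c_t$ and $c_e$ stand for then_cost and else_cost, $c_m$ for the result of noce_cmove_estimate_cost, and $c_b$ for if_info->rtx_edge_cost.

```latex
\begin{align*}
  &\text{Register case, convert when:}
    && c_t + c_e + c_m \le \min(c_t, c_e) + c_b \\
  &\text{using } a + b - \min(a, b) = \max(a, b):
    && \max(c_t, c_e) + c_m \le c_b \\
  &\text{Memory case, reject when:}
    && c_t + c_m > c_b + \min(c_t, c_e)
\end{align*}
```

The code tests the negation of the second line, returning FALSE when $\max(c_t, c_e) + c_m > c_b$.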


* [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models
  2015-10-09 11:28                   ` Bernd Schmidt
  2015-10-09 15:28                     ` Jeff Law
@ 2016-06-02 16:54                     ` James Greenhalgh
  2016-06-02 16:54                       ` [RFC: Patch 1/6] New target hook: rtx_branch_cost James Greenhalgh
From: James Greenhalgh @ 2016-06-02 16:54 UTC
  To: gcc-patches; +Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 4727 bytes --]

Hi,

When I was working in the area last year, I promised to revisit the cost
model for noce if-conversion and see if I could improve the modeling. This
turned out to be more tricky than I expected.

This patch set rewrites the cost model for noce if-conversion. The goal is
to rationalise the units used in the calculations away from BRANCH_COST,
which is only defined relative to itself.

I've introduced a new target hook "rtx_branch_cost" which is defined to return
the cost of a branch in RTX cost units, suitable for comparing directly with
the calculated cost of a conditional move, or the conditionally executed
branches.

If you're looking at that and thinking it doesn't sound much different from
our current call to BRANCH_COST, you're right. This isn't as large a
departure from the existing cost model as I had originally intended. I
started out experimenting with much larger hooks (with many
parameters/pass-specific data), or hooks that passed the whole edge to
the target asking for the cost. These ended up feeling quite silly
to write in the target, and didn't match the direction discussed at
last year's Cauldron. We don't want to go around leaking pass-internal
data to back-ends. That way lies madness: the passes change, and we
find that targets have invented baroque calculations to arrive at a
magic number.

I tried implementing a "replacement_cost" hook, which would take before
and after code sequences and try to guess profitability, but because you
want to take edge probabilities into consideration while trying to calculate
the costs of an if-then-else structure, the code gets hairy quickly. Worse,
this would need duplicating across any target implementing the
hook. I found that I was constructing lots of RTX just to throw it away
again, and sometimes we were constructing RTX that would trivially be
optimised by a future pass. As a metric for if-conversion, this hook
seemed more harmful than useful, both for the quality of the decisions
we'd make and for the quality of the GCC source.

As I iterated through versions of this patch set, I realised that all we
really wanted for ifcvt was a way to estimate the cost of a branch in units
that were comparable to the cost of instructions. The trouble with BRANCH_COST
wasn't that it was returning a magic number, it was just that it was returning
a magic number which had inconsistent meanings in the compiler. Otherwise,
BRANCH_COST was a straightforward, low-complexity target hook.

So the new hook simply defines the relative units that it will use and
splits off the use in ifcvt from other BRANCH_COST calls.

Having introduced the hook, and added some framework to make use of it, the
rest of the patch set works through each of the cost models in ifcvt.c,
makes them consistent, and moves them to the new hook.

This act of making the cost models consistent will cause code generation
changes on a number of targets, most notably x86_64. On x86_64 the RTX
cost of a conditional move comes out at "20", far higher than
COSTS_N_INSNS (BRANCH_COST) for the x86 targets, so they lose lots
of if-conversion. The easy fix for this would be to implement the new hook.
I measured the performance impact on Spec2000 as a smoke test: it didn't
seem to harm anything, and the result was a slight (< 3%) uplift on
Spec2000FP. I'm no expert on x86_64, so I haven't looked more closely at
the reasons.
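As a rough illustration of the x86_64 numbers above, assume (hypothetically) a then-arm of a single cheap instruction, COSTS_N_INSNS (1), alongside the conditional move cost of 20. Under the register-case rule from patch 4, the edge cost must then be at least 24, which under the default hook means a BRANCH_COST of at least 6. The helper below is illustrative arithmetic only, and its name is hypothetical; the actual x86 BRANCH_COST values are not stated here.

```c
/* As defined in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Smallest BRANCH_COST for which the default hook value,
   BRANCH_COST * COSTS_N_INSNS (1), reaches at least REQUIRED
   (a ceiling division).  */
static unsigned int
min_branch_cost_for (unsigned int required)
{
  return (required + COSTS_N_INSNS (1) - 1) / COSTS_N_INSNS (1);
}
```

For the conditional move alone (20) the answer is 5; adding the one-instruction arm pushes it to 6.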

Having worked through the patch set, I'd say it is probably a small
improvement over what we currently do, but I'm not very happy with it. I'm
posting it for comment so we can discuss any directions for costs that I
haven't thought about or prototyped. I'm also happy to drop the costs
rewrite if this seems like complexity for no benefit.

Any thoughts?

I've bootstrapped and tested the patch set on x86_64 and aarch64, but
they probably need some further polishing if we were to decide this was a
useful direction.

Thanks,
James

James Greenhalgh (6):
  [RFC: Patch 1/6] New target hook: rtx_branch_cost
  [RFC: Patch 2/6] Factor out the comparisons against magic numbers in
    ifcvt
  [RFC: Patch 3/6] Remove if_info->branch_cost
  [RFC: Patch 4/6] Modify cost model for noce_cmove_arith
  [RFC: Patch 5/6] Improve the cost model for multiple-sets
  [RFC: Patch 6/6] Remove second cost model from
    noce_try_store_flag_mask

 gcc/doc/tm.texi    |  10 +++
 gcc/doc/tm.texi.in |   2 +
 gcc/ifcvt.c        | 204 +++++++++++++++++++++++++++++++++++++++++------------
 gcc/target.def     |  14 ++++
 gcc/targhooks.c    |  10 +++
 gcc/targhooks.h    |   2 +
 6 files changed, 197 insertions(+), 45 deletions(-)


* [RFC: Patch 1/6] New target hook: rtx_branch_cost
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
@ 2016-06-02 16:54                       ` James Greenhalgh
  2016-06-03 10:39                         ` Richard Biener
  2016-06-02 16:54                       ` [RFC: Patch 4/6] Modify cost model for noce_cmove_arith James Greenhalgh
From: James Greenhalgh @ 2016-06-02 16:54 UTC
  To: gcc-patches; +Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]


Hi,

This patch introduces a new target hook, to be used like BRANCH_COST but
with a guaranteed unit of measurement. We want this to break away from
the current ambiguous uses of BRANCH_COST.

BRANCH_COST is used in ifcvt.c in two kinds of comparison: against
instruction counts, where it limits the number of new instructions we
are permitted to generate; and (after multiplying by COSTS_N_INSNS (1))
directly against RTX costs.

Of these, a comparison against RTX costs is the more easily understood
metric across the compiler, and the one I've pulled out to the new hook.
To keep things consistent for targets which don't migrate, this new hook
has a default value of BRANCH_COST * COSTS_N_INSNS (1).
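A minimal standalone sketch of the two ways a target could end up with a value for the new hook, assuming the semantics described above. example_branch_cost stands in for a target's BRANCH_COST macro and example_target_rtx_branch_cost for a target override; both are hypothetical, and their values are invented purely for illustration.

```c
#include <stdbool.h>

/* As defined in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for a target's BRANCH_COST (speed_p, predictable_p).  */
static int
example_branch_cost (bool speed_p, bool predictable_p)
{
  return (speed_p && !predictable_p) ? 3 : 1;
}

/* Mirrors default_rtx_branch_cost from the patch: scale the unitless
   BRANCH_COST into RTX-cost units.  */
static unsigned int
model_default_rtx_branch_cost (bool speed_p, bool predictable_p)
{
  return example_branch_cost (speed_p, predictable_p) * COSTS_N_INSNS (1);
}

/* Hypothetical target override that answers directly in RTX-cost units.  */
static unsigned int
example_target_rtx_branch_cost (bool speed_p, bool predictable_p)
{
  if (!speed_p || predictable_p)
    return COSTS_N_INSNS (1);
  return COSTS_N_INSNS (2) + 2;
}
```

Because the hook is defined in RTX-cost units, an override is free to return values between multiples of COSTS_N_INSNS (1), which the default scaling of BRANCH_COST cannot express.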

OK?

Thanks,
James

---
2016-06-02  James Greenhalgh  <james.greenhalgh@arm.com>

	* target.def (rtx_branch_cost): New.
	* doc/tm.texi.in (TARGET_RTX_BRANCH_COST): Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_rtx_branch_cost): New.
	* targhooks.c (default_rtx_branch_cost): New.

[-- Attachment #2: 0001-RFC-Patch-1-6-New-target-hook-rtx_branch_cost.patch --]
[-- Type: text/x-patch;  name=0001-RFC-Patch-1-6-New-target-hook-rtx_branch_cost.patch, Size: 3660 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..32efa1f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6499,6 +6499,16 @@ should probably only be given to addresses with different numbers of
 registers on machines with lots of registers.
 @end deftypefn
 
+@deftypefn {Target Hook} {unsigned int} TARGET_RTX_BRANCH_COST (bool @var{speed_p}, bool @var{predictable_p})
+This hook should return a cost in the same units as
+  @code{TARGET_RTX_COSTS}, giving the estimated cost of a branch.
+  @code{speed_p} is true if we are compiling for speed.
+  @code{predictable_p} is true if analysis suggests that the branch
+  will be predictable.  The default implementation of this hook
+  multiplies @code{BRANCH_COST} by the cost of a cheap instruction to
+  approximate the cost of a branch in the appropriate units.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
 This predicate controls the use of the eager delay slot filler to disallow
 speculatively executed instructions being placed in delay slots.  Targets
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..92461b0 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4748,6 +4748,8 @@ Define this macro if a non-short-circuit operation produced by
 
 @hook TARGET_ADDRESS_COST
 
+@hook TARGET_RTX_BRANCH_COST
+
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 
 @node Scheduling
diff --git a/gcc/target.def b/gcc/target.def
index 6392e73..f049a8b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3559,6 +3559,20 @@ registers on machines with lots of registers.",
  int, (rtx address, machine_mode mode, addr_space_t as, bool speed),
  default_address_cost)
 
+/* Give a cost, in RTX Costs units, for an edge.  Like BRANCH_COST, but with
+   well defined units.  */
+DEFHOOK
+(rtx_branch_cost,
+ "This hook should return a cost in the same units as\n\
+  @code{TARGET_RTX_COSTS}, giving the estimated cost of a branch.\n\
+  @code{speed_p} is true if we are compiling for speed.\n\
+  @code{predictable_p} is true if analysis suggests that the branch\n\
+  will be predictable.  The default implementation of this hook\n\
+  multiplies @code{BRANCH_COST} by the cost of a cheap instruction to\n\
+  approximate the cost of a branch in the appropriate units.",
+  unsigned int, (bool speed_p, bool predictable_p),
+  default_rtx_branch_cost)
+
 /* Permit speculative instructions in delay slots during delayed-branch 
    scheduling.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6b4601b..dcffeb8 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "opts.h"
 #include "gimplify.h"
+#include "predict.h"
 
 
 bool
@@ -1965,4 +1966,13 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Default implementation of TARGET_RTX_BRANCH_COST.  */
+
+unsigned int
+default_rtx_branch_cost (bool speed_p,
+			 bool predictable_p)
+{
+  return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7687c39..b7ff94c 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -254,4 +254,6 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern unsigned int default_rtx_branch_cost (bool, bool);
+
 #endif /* GCC_TARGHOOKS_H */


* [RFC: Patch 2/6] Factor out the comparisons against magic numbers in ifcvt
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
  2016-06-02 16:54                       ` [RFC: Patch 1/6] New target hook: rtx_branch_cost James Greenhalgh
  2016-06-02 16:54                       ` [RFC: Patch 4/6] Modify cost model for noce_cmove_arith James Greenhalgh
@ 2016-06-02 16:55                       ` James Greenhalgh
  2016-06-02 16:55                       ` [RFC: Patch 5/6] Improve the cost model for multiple-sets James Greenhalgh
From: James Greenhalgh @ 2016-06-02 16:55 UTC
  To: gcc-patches; +Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]


Hi,

This patch pulls the comparisons between if_info->branch_cost and a magic
number representing an instruction count into a common function. While I'm
doing it, I've documented the instructions that the magic numbers relate
to, and updated them where they were inconsistent.

If our measure of the cost of a branch is now in rtx costs units, we can
get to an estimate for the cost of an expression from the number of
instructions by multiplying through by COSTS_N_INSNS (1).

This isn't the great revolution in ifcvt costs I hoped for when I sat down
to take on the work. But, it looks like the best I can do short of
constructing ADD and SUB rtx all over the place just to get back a
judgement from the target that they are "cheap". The nicest thing about
doing it this way is that it mostly preserves behaviour, and by factoring
it out we have an upgrade path to a more detailed cost model should we want
it.
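The claim that factoring out the checks mostly preserves behaviour can be checked with a small standalone model. Under the default hook, rtx_edge_cost is BRANCH_COST * COSTS_N_INSNS (1), so the new test agrees with the old `branch_cost >= ninsns` whenever the instruction count itself is unchanged (not every count is: the ADD case in noce_try_store_flag_constants moves from 2 to 1). The function names below are hypothetical.

```c
#include <stdbool.h>

/* As defined in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Model of noce_estimate_conversion_profitable_p from the patch, with
   the rtx_edge_cost field passed in directly.  */
static bool
estimate_conversion_profitable_p (unsigned int rtx_edge_cost,
                                  unsigned int ninsns)
{
  return rtx_edge_cost >= ninsns * COSTS_N_INSNS (1);
}

/* Check that, with rtx_edge_cost == BRANCH_COST * COSTS_N_INSNS (1),
   the new test matches the old "branch_cost >= ninsns" for every
   BRANCH_COST and NINSNS up to the given bounds.  */
static bool
default_hook_preserves_old_check (unsigned int max_branch_cost,
                                  unsigned int max_ninsns)
{
  for (unsigned int bc = 0; bc <= max_branch_cost; bc++)
    for (unsigned int n = 0; n <= max_ninsns; n++)
      if ((bc >= n)
          != estimate_conversion_profitable_p (bc * COSTS_N_INSNS (1), n))
        return false;
  return true;
}
```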

OK?

Thanks,
James

---

2016-06-02  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): New field: rtx_edge_cost.
	(noce_estimate_conversion_profitable_p): New.
	(noce_try_store_flag_constants): Use it.
	(noce_try_addcc): Likewise.
	(noce_try_store_flag_mask): Likewise.
	(noce_try_cmove): Likewise.
	(noce_try_cmove_arith): Likewise.
	(noce_find_if_block): Record targetm.rtx_branch_cost.


[-- Attachment #2: 0002-RFC-Patch-2-6-Factor-out-the-comparisons-against-mag.patch --]
[-- Type: text/x-patch;  name=0002-RFC-Patch-2-6-Factor-out-the-comparisons-against-mag.patch, Size: 5114 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 44ae020..22cb5e7 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -813,6 +813,7 @@ struct noce_if_info
 
   /* Estimated cost of the particular branch instruction.  */
   unsigned int branch_cost;
+  unsigned int rtx_edge_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -830,6 +831,17 @@ static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
 
+/* This function is always called when we would expand a number of "cheap"
+   instructions.  Multiply NINSNS by COSTS_N_INSNS (1) to approximate the
+   RTX cost of those cheap instructions.  */
+
+inline static bool
+noce_estimate_conversion_profitable_p (struct noce_if_info *if_info,
+				       unsigned int ninsns)
+{
+  return (if_info->rtx_edge_cost >= ninsns * COSTS_N_INSNS (1));
+}
+
 /* Helper function for noce_try_store_flag*.  */
 
 static rtx
@@ -1279,7 +1291,8 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
       && (REG_P (XEXP (a, 0))
 	  || (noce_operand_ok (XEXP (a, 0))
 	      && ! reg_overlap_mentioned_p (if_info->x, XEXP (a, 0))))
-      && if_info->branch_cost >= 2)
+      /* We need one instruction, the ADD of the store flag.  */
+      && noce_estimate_conversion_profitable_p (if_info, 1))
     {
       common = XEXP (a, 0);
       a = XEXP (a, 1);
@@ -1352,22 +1365,32 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  else
 	    gcc_unreachable ();
 	}
+      /* Is this (cond) ? 2^n : 0?  */
       else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	       && (STORE_FLAG_VALUE == 1
-		   || if_info->branch_cost >= 2))
+		   /* We need ASHIFT, IOR.   */
+		   || noce_estimate_conversion_profitable_p (if_info, 2)))
 	normalize = 1;
+      /* Is this (cond) ? 0 : 2^n?  */
       else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
-	       && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
+	       && (STORE_FLAG_VALUE == 1
+		   /* We need ASHIFT, IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 2)))
 	{
 	  normalize = 1;
 	  reversep = true;
 	}
+      /* Is this (cond) ? -1 : x?  */
       else if (itrue == -1
 	       && (STORE_FLAG_VALUE == -1
-		   || if_info->branch_cost >= 2))
+		   /* Just an IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 1)))
 	normalize = -1;
+      /* Is this (cond) ? x : -1?  */
       else if (ifalse == -1 && can_reverse
-	       && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
+	       && (STORE_FLAG_VALUE == -1
+		   /* Just an IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 1)))
 	{
 	  normalize = -1;
 	  reversep = true;
@@ -1519,8 +1542,8 @@ noce_try_addcc (struct noce_if_info *if_info)
 	}
 
       /* If that fails, construct conditional increment or decrement using
-	 setcc.  */
-      if (if_info->branch_cost >= 2
+	 setcc.  We'd only need an ADD/SUB for this.  */
+      if (noce_estimate_conversion_profitable_p (if_info, 1)
 	  && (XEXP (if_info->a, 1) == const1_rtx
 	      || XEXP (if_info->a, 1) == constm1_rtx))
         {
@@ -1575,7 +1598,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
     return FALSE;
 
   reversep = 0;
-  if ((if_info->branch_cost >= 2
+
+  /* Two insns, AND, NEG.  */
+  if ((noce_estimate_conversion_profitable_p (if_info, 2)
        || STORE_FLAG_VALUE == -1)
       && ((if_info->a == const0_rtx
 	   && rtx_equal_p (if_info->b, if_info->x))
@@ -1778,8 +1803,11 @@ noce_try_cmove (struct noce_if_info *if_info)
 	 approach.  */
       else if (!targetm.have_conditional_execution ()
 		&& CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b)
-		&& ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
-		    || if_info->branch_cost >= 3))
+		/* If STORE_FLAG_VALUE is -1, we need SUB, AND, PLUS.  */
+		&& ((noce_estimate_conversion_profitable_p (if_info, 3)
+		     && STORE_FLAG_VALUE == -1)
+		    /* Otherwise, we need NEG, SUB, AND, PLUS.  */
+		    || noce_estimate_conversion_profitable_p (if_info, 4)))
 	{
 	  machine_mode mode = GET_MODE (if_info->x);
 	  HOST_WIDE_INT ifalse = INTVAL (if_info->a);
@@ -2031,7 +2059,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (cse_not_expected
       && MEM_P (a) && MEM_P (b)
       && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
-      && if_info->branch_cost >= 5)
+      && noce_estimate_conversion_profitable_p (if_info, 5))
     {
       machine_mode address_mode = get_address_mode (a);
 
@@ -3967,6 +3995,9 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if_info.then_else_reversed = then_else_reversed;
   if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
 				     predictable_edge_p (then_edge));
+  if_info.rtx_edge_cost
+    = targetm.rtx_branch_cost (optimize_bb_for_speed_p (test_bb),
+			       predictable_edge_p (then_edge));
 
   /* Do the real work.  */
 


* [RFC: Patch 5/6] Improve the cost model for multiple-sets
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
                                         ` (2 preceding siblings ...)
  2016-06-02 16:55                       ` [RFC: Patch 2/6] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
@ 2016-06-02 16:55                       ` James Greenhalgh
  2016-06-02 16:56                       ` [RFC: Patch 6/6] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
                                         ` (3 subsequent siblings)
  7 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-06-02 16:55 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 331 bytes --]


Hi,

This patch is a small rewrite of the cost model for
bb_ok_for_noce_convert_multiple_sets to use the new noce_cmove_estimate_cost
function added in the previous patches.
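
The acceptance rule the patch below ends up with can be modelled in isolation. This is a simplified sketch: each conditional move's cost is passed in as a plain number standing in for noce_cmove_estimate_cost (whose real inputs, the comparison mode, destination mode, comparison code and speed_p, are omitted), and PARAM_MAX_RTL_IF_CONVERSION_INSNS becomes an ordinary argument. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the new bb_ok_for_noce_convert_multiple_sets
   decision.  CMOVE_COSTS[i] stands in for the estimated cost of the
   conditional move that would replace the i-th (set (reg) (reg)).  */
static bool
sketch_multiple_sets_ok_p (const unsigned int *cmove_costs,
			   unsigned int count,
			   unsigned int cost_limit,
			   unsigned int max_insns_param)
{
  unsigned int cost = 0;
  for (unsigned int i = 0; i < count; i++)
    cost += cmove_costs[i];

  /* A single move is left to the simpler noce_try_* strategies;
     otherwise both the summed cost and the instruction count must fit.  */
  return count > 1 && cost <= cost_limit && count <= max_insns_param;
}
```

For the three-set std::swap idiom with each move costed at 4, a limit of 12 accepts the block and a limit of 11 rejects it.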

Thanks,
James

---
2016-06-02  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Change cost model.


[-- Attachment #2: 0005-RFC-Patch-5-6-Improve-the-cost-model-for-multiple-se.patch --]
[-- Type: text/x-patch;  name=0005-RFC-Patch-5-6-Improve-the-cost-model-for-multiple-se.patch, Size: 2664 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index bd3f55d..f71889e 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3373,12 +3373,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 /* Return true iff basic block TEST_BB is comprised of only
    (SET (REG) (REG)) insns suitable for conversion to a series
    of conditional moves.  FORNOW: Use II to find the expected cost of
-   the branch into/over TEST_BB.
-
-   TODO: This creates an implicit "magic number" for if conversion.
-   II->rtx_edge_cost now guides the maximum number of set instructions in
-   a basic block which is considered profitable to completely
-   if-convert.  */
+   the branch into/over TEST_BB.  */
 
 static bool
 bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
@@ -3387,8 +3382,11 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS);
-  /* TODO:  Revisit this cost model.  */
-  unsigned limit = MIN (ii->rtx_edge_cost / COSTS_N_INSNS (1), param);
+  unsigned cost_limit = ii->rtx_edge_cost;
+  unsigned cost = 0;
+  bool speed_p = optimize_bb_for_speed_p (ii->test_bb);
+  rtx_code code = GET_CODE (ii->cond);
+  machine_mode cmode = GET_MODE (XEXP (ii->cond, 0));
 
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -3404,6 +3402,9 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
       rtx dest = SET_DEST (set);
       rtx src = SET_SRC (set);
 
+      cost += noce_cmove_estimate_cost (cmode, GET_MODE (dest),
+					code, speed_p);
+
       /* We can possibly relax this, but for now only handle REG to REG
 	 moves.  This avoids any issues that might come from introducing
 	 loads/stores that might violate data-race-freedom guarantees.  */
@@ -3418,14 +3419,14 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
       if (!can_conditionally_move_p (GET_MODE (dest)))
 	return false;
 
-      /* FORNOW: Our cost model is a count of the number of instructions we
-	 would if-convert.  This is suboptimal, and should be improved as part
-	 of a wider rework of branch_cost.  */
-      if (++count > limit)
-	return false;
+      count++;
     }
 
-  return count > 1;
+  /* If we would only put out one conditional move, the other strategies
+     this pass tries are better optimized and will be more appropriate.
+     If the cost in instructions is higher than the limit we've imposed,
+     also give up.  */
+  return (count > 1 && cost <= cost_limit && count <= param);
 }
 
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert


* [RFC: Patch 6/6] Remove second cost model from noce_try_store_flag_mask
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
                                         ` (3 preceding siblings ...)
  2016-06-02 16:55                       ` [RFC: Patch 5/6] Improve the cost model for multiple-sets James Greenhalgh
@ 2016-06-02 16:56                       ` James Greenhalgh
  2016-06-02 16:56                       ` [RFC: Patch 3/6] Remove if_info->branch_cost James Greenhalgh
                                         ` (2 subsequent siblings)
  7 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-06-02 16:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]


Hi,

This transformation tries two cost models, one estimating the number
of insns to use and one estimating the RTX cost of the transformed
sequence. This is inconsistent with the other cost models used in ifcvt.c
and unnecessary, so eliminate the second cost model.
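
To make the inconsistency concrete, here is a simplified model of the two gates noce_try_store_flag_mask applied before this patch, followed by the single gate that remains after it. The names are hypothetical, all costs are plain integers in COSTS_N_INSNS units (assumed to be 4 per insn), and the STORE_FLAG_VALUE == -1 shortcut is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Before: an instruction-count estimate (AND, NEG), then a second check
   comparing the real sequence cost against the branch plus the removed
   insn.  */
static bool
sketch_old_store_flag_mask_ok (unsigned int rtx_edge_cost,
			       unsigned int insn_cost,
			       unsigned int seq_cost)
{
  if (rtx_edge_cost < 2 * COSTS_N_INSNS (1))
    return false;
  return seq_cost <= rtx_edge_cost + insn_cost;
}

/* After: only the estimate remains, now counting a single AND.  */
static bool
sketch_new_store_flag_mask_ok (unsigned int rtx_edge_cost)
{
  return rtx_edge_cost >= COSTS_N_INSNS (1);
}
```

An edge cost of COSTS_N_INSNS (1) is rejected by the old first gate but accepted by the new one, which is the behaviour change that comes with the updated insn count.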

Thanks,
James

---
2016-06-02  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_try_store_flag_mask): Delete redundant cost model.


[-- Attachment #2: 0006-RFC-Patch-6-6-Remove-second-cost-model-from-noce_try.patch --]
[-- Type: text/x-patch;  name=0006-RFC-Patch-6-6-Remove-second-cost-model-from-noce_try.patch, Size: 1329 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f71889e..6e9997e 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1670,8 +1670,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
   reversep = 0;
 
-  /* Two insns, AND, NEG.  */
-  if ((noce_estimate_conversion_profitable_p (if_info, 2)
+  /* One insn, AND.  */
+  if ((noce_estimate_conversion_profitable_p (if_info, 1)
        || STORE_FLAG_VALUE == -1)
       && ((if_info->a == const0_rtx
 	   && rtx_equal_p (if_info->b, if_info->x))
@@ -1693,9 +1693,6 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
       if (target)
 	{
-	  int old_cost, new_cost, insn_cost;
-	  int speed_p;
-
 	  if (target != if_info->x)
 	    noce_emit_move_insn (if_info->x, target);
 
@@ -1703,15 +1700,6 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 	  if (!seq)
 	    return FALSE;
 
-	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
-	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  /* TODO: Revisit this cost model.  */
-	  old_cost = if_info->rtx_edge_cost + insn_cost;
-	  new_cost = seq_cost (seq, speed_p);
-
-	  if (new_cost > old_cost)
-	    return FALSE;
-
 	  emit_insn_before_setloc (seq, if_info->jump,
 				   INSN_LOCATION (if_info->insn_a));
 	  return TRUE;


* [RFC: Patch 3/6] Remove if_info->branch_cost
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
                                         ` (4 preceding siblings ...)
  2016-06-02 16:56                       ` [RFC: Patch 6/6] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
@ 2016-06-02 16:56                       ` James Greenhalgh
  2016-06-03  9:32                       ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models Bernd Schmidt
  2016-06-09 16:58                       ` Jeff Law
  7 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-06-02 16:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 660 bytes --]


Hi,

This patch removes the remaining uses of branch_cost, moving them to
the new hook and tagging each leftover spot with a TODO to revisit it.
All these uses are in rtx cost units, so there is no more work to do at
this point.
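
One way to check that these are pure unit conversions: assuming the default hook returns BRANCH_COST * COSTS_N_INSNS (1) and COSTS_N_INSNS (N) is (N) * 4, the old and new limits in bb_ok_for_noce_convert_multiple_sets agree for any branch cost. The helpers below are illustrative, not taken from ifcvt.c.

```c
#include <assert.h>

#define COSTS_N_INSNS(N) ((N) * 4)

static unsigned int
min_u (unsigned int a, unsigned int b)
{
  return a < b ? a : b;
}

/* Old limit: BRANCH_COST used directly as an instruction count.  */
static unsigned int
sketch_old_limit (unsigned int branch_cost, unsigned int param)
{
  return min_u (branch_cost, param);
}

/* New limit: the rtx-unit edge cost converted back to instructions.  */
static unsigned int
sketch_new_limit (unsigned int rtx_edge_cost, unsigned int param)
{
  return min_u (rtx_edge_cost / COSTS_N_INSNS (1), param);
}
```

A target hook returning a value that is not a multiple of COSTS_N_INSNS (1) loses the remainder in the division: an edge cost of 11 gives a limit of 2.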

OK?

Thanks,
James

---
2016-06-02  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): Remove branch_cost.
	(noce_try_store_flag_mask): Use rtx_edge_cost rather than
	branch_cost, tag as a TODO.
	(noce_try_cmove_arith): Likewise.
	(noce_convert_multiple_sets): Likewise.
	(bb_ok_for_noce_convert_multiple_sets): Likewise.
	(noce_find_if_block): Remove set of branch_cost.


[-- Attachment #2: 0003-RFC-Patch-3-6-Remove-if_info-branch_cost.patch --]
[-- Type: text/x-patch;  name=0003-RFC-Patch-3-6-Remove-if_info-branch_cost.patch, Size: 2720 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 22cb5e7..b192c85 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -812,7 +812,6 @@ struct noce_if_info
   unsigned int else_cost;
 
   /* Estimated cost of the particular branch instruction.  */
-  unsigned int branch_cost;
   unsigned int rtx_edge_cost;
 };
 
@@ -1634,7 +1633,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
 	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  old_cost = COSTS_N_INSNS (if_info->branch_cost) + insn_cost;
+	  /* TODO: Revisit this cost model.  */
+	  old_cost = if_info->rtx_edge_cost + insn_cost;
 	  new_cost = seq_cost (seq, speed_p);
 
 	  if (new_cost > old_cost)
@@ -2105,7 +2105,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* We're going to execute one of the basic blocks anyway, so
      bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+
+  /* TODO: Revisit cost model.  */
+  if (MAX (then_cost, else_cost) > if_info->rtx_edge_cost)
     return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
@@ -3280,8 +3282,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
    of conditional moves.  FORNOW: Use II to find the expected cost of
    the branch into/over TEST_BB.
 
-   TODO: This creates an implicit "magic number" for branch_cost.
-   II->branch_cost now guides the maximum number of set instructions in
+   TODO: This creates an implicit "magic number" for if conversion.
+   II->rtx_edge_cost now guides the maximum number of set instructions in
    a basic block which is considered profitable to completely
    if-convert.  */
 
@@ -3292,7 +3294,8 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS);
-  unsigned limit = MIN (ii->branch_cost, param);
+  /* TODO:  Revisit this cost model.  */
+  unsigned limit = MIN (ii->rtx_edge_cost / COSTS_N_INSNS (1), param);
 
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -3993,8 +3996,6 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if_info.cond_earliest = cond_earliest;
   if_info.jump = jump;
   if_info.then_else_reversed = then_else_reversed;
-  if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
-				     predictable_edge_p (then_edge));
   if_info.rtx_edge_cost
     = targetm.rtx_branch_cost (optimize_bb_for_speed_p (test_bb),
 			       predictable_edge_p (then_edge));


* Re: [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
                                         ` (5 preceding siblings ...)
  2016-06-02 16:56                       ` [RFC: Patch 3/6] Remove if_info->branch_cost James Greenhalgh
@ 2016-06-03  9:32                       ` Bernd Schmidt
  2016-06-09 16:58                       ` Jeff Law
  7 siblings, 0 replies; 60+ messages in thread
From: Bernd Schmidt @ 2016-06-03  9:32 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, ramana.radhakrishnan, bernds_cb1, law, ebotcazou, steven

On 06/02/2016 06:53 PM, James Greenhalgh wrote:
> As I iterated through versions of this patch set, I realised that all we
> really wanted for ifcvt was a way to estimate the cost of a branch in units
> that were comparable to the cost of instructions. The trouble with BRANCH_COST
> wasn't that it was returning a magic number, it was just that it was returning
> a magic number which had inconsistent meanings in the compiler. Otherwise,
> BRANCH_COST was a straightforward, low-complexity target hook.

[...]

> Having worked through the patch set, I'd say it is probably a small
> improvement over what we currently do, but I'm not very happy with it. I'm
> posting it for comment so we can discuss any directions for costs that I
> haven't thought about or prototyped. I'm also happy to drop the costs
> rewrite if this seems like complexity for no benefit.
>
> Any thoughts?

I think it all looks fairly reasonable, and on the whole lower 
complexity is likely a better approach. A few comments on individual 
patches:

> +unsigned int
> +default_rtx_branch_cost (bool speed_p,
> +			 bool predictable_p)

No need to wrap the line.

> +noce_estimate_conversion_profitable_p (struct noce_if_info *if_info,
> +				       unsigned int ninsns)
> +{
> +  return (if_info->rtx_edge_cost >= ninsns * COSTS_N_INSNS (1));

Please no parens around return. There are several examples across the 
series.

NINSNS is the number of simple instructions we're going to add, right? 
How about the instructions we're going to remove, shouldn't these be 
counted too? I think that kind of thing was implicit in the old tests vs 
branch_cost.
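
A hypothetical variant of the check that also credits the instructions the conversion removes, as the question above suggests, might look like this sketch. NREMOVED and the function name are illustrative assumptions, not part of the posted series.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical: compare what we add against what we save, i.e. the
   branch itself plus the NREMOVED cheap insns that go away.  */
static bool
sketch_net_profitable_p (unsigned int rtx_edge_cost,
			 unsigned int nadded, unsigned int nremoved)
{
  return nadded * COSTS_N_INSNS (1)
	 <= rtx_edge_cost + nremoved * COSTS_N_INSNS (1);
}
```

With a one-insn branch cost, adding two instructions fails on its own but passes once the one removed instruction is credited.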

>    if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
>  				     predictable_edge_p (then_edge));
> +  if_info.rtx_edge_cost
> +    = targetm.rtx_branch_cost (optimize_bb_for_speed_p (test_bb),
> +			       predictable_edge_p (then_edge));

This I have the most problems with, mostly as an issue with naming. 
Calling it an edge_cost implies that it depends on whether the branch is 
taken or not, which I believe is not the case. Maybe the interface ought 
to be able to provide taken/not-taken information, although I can't 
off-hand think of a way to make use of such information.

Here, I'd rather call the field branch_cost, but there's already one 
with that name. Are there still places that use the old one after your 
patch series?

Hmm, I guess information about whether the branch is likely taken/not
taken/unpredictable would be useful for adding the cost of the
instructions behind it to the cost of the existing code.

> +/* Return TRUE if CODE is an RTX comparison operator.  */
> +
> +static bool
> +noce_code_is_comparison_p (rtx_code code)

Isn't there some way to do this based on GET_RTX_CLASS?

In the noce_cmove_arith patch, is it possible to just construct the 
actual sequence we want to use and test its cost (much like the 
combiner's approach), rather than building up a random one for 
estimation? Seems like bailing out early based on a cost estimate is no 
longer profitable for compile-time if getting the estimate is as much 
work as doing the conversion in the first place.

> I've bootstrapped and tested the patch set on x86_64 and aarch64, but
> they probably need some further polishing if we were to decide this was a
> useful direction.

Also, I'd like some information on what this does to code generation on 
a few different targets.

Bernd


* Re: [RFC: Patch 1/6] New target hook: rtx_branch_cost
  2016-06-02 16:54                       ` [RFC: Patch 1/6] New target hook: rtx_branch_cost James Greenhalgh
@ 2016-06-03 10:39                         ` Richard Biener
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
  0 siblings, 1 reply; 60+ messages in thread
From: Richard Biener @ 2016-06-03 10:39 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: GCC Patches, nd, Ramana Radhakrishnan, Bernd Schmidt, Jeff Law,
	Eric Botcazou, Steven Bosscher

On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch introduces a new target hook, to be used like BRANCH_COST but
> with a guaranteed unit of measurement. We want this to break away from
> the current ambiguous uses of BRANCH_COST.
>
> BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
> instruction counts - where it is used as the limit on the number of new
> instructions we are permitted to generate. The other (after multiplying
> by COSTS_N_INSNS (1)) directly against RTX costs.
>
> Of these, a comparison against RTX costs is the more easily understood
> metric across the compiler, and the one I've pulled out to the new hook.
> To keep things consistent for targets which don't migrate, this new hook
> has a default value of BRANCH_COST * COSTS_N_INSNS (1).
>
> OK?

How does the caller compute "predictable"?  There are some archs where
information on whether this is a forward or backward jump is more
useful, I guess.  Also, at least for !speed_p, the distance of the branch
is important, given that not all targets support arbitrary branch offsets.

I remember that at the last Cauldron we discussed changing things to
compare costs of sequences of instructions rather than giving targets no
context by just asking for single (sub-)insn rtx costs.

That said, the patch is certainly an improvement.

Thanks,
Richard.

> Thanks,
> James
>
> ---
> 2016-06-02  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * target.def (rtx_branch_cost): New.
>         * doc/tm.texi.in (TARGET_RTX_BRANCH_COST): Document it.
>         * doc/tm.texi: Regenerate.
>         * targhooks.h (default_rtx_branch_cost): New.
>         * targhooks.c (default_rtx_branch_cost): New.


* Re: [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models
  2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
                                         ` (6 preceding siblings ...)
  2016-06-03  9:32                       ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models Bernd Schmidt
@ 2016-06-09 16:58                       ` Jeff Law
  2016-06-10 10:45                         ` James Greenhalgh
  7 siblings, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-06-09 16:58 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, ramana.radhakrishnan, bernds_cb1, ebotcazou, steven

On 06/02/2016 10:53 AM, James Greenhalgh wrote:
> Hi,
>
> When I was working in the area last year, I promised to revisit the cost
> model for noce if-conversion and see if I could improve the modeling. This
> turned out to be more tricky than I expected.
>
> This patch set rewrites the cost model for noce if-conversion. The goal is
> to rationalise the units used in the calculations away from BRANCH_COST,
> which is only defined relative to itself.
Right.  I think we all agreed that the key weakness of BRANCH_COST was 
that its meaning is only defined relative to itself.

What we want is a costing metric that would allow us to estimate the 
cost of different forms of the computation, which might include branches 
and which may include edge probability information.

>
> If you're looking at that and thinking it doesn't sound much different from
> our current call to BRANCH_COST, you're right. This isn't as large a
> departure from the existing cost model as I had originally intended.
Perhaps not as large of a change as you intended, but I think you're 
hitting the key issue with BRANCH_COST.


> This act of making the cost models consistent will cause code generation
> changes on a number of targets - most notably x86_64. On x86_64 the RTX
> cost of a conditional move comes out at "20" - this is far higher than
> COSTS_N_INSNS (BRANCH_COST) for the x86 targets, so they lose lots
> of if-conversion. The easy fix for this would be to implement the new hook.
> I measured the performance impact on Spec2000 as a smoke test, it didn't
> seem to harm anything, and the result was a slight (< 3%) uplift on
> Spec2000FP. I'm no expert on x86_64, so I haven't taken a closer look for
> the reasons.
I'd be comfortable with Uros guiding the implementation of the target 
hook for x86 so that we don't take a major step backward.


Jeff


* Re: [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models
  2016-06-09 16:58                       ` Jeff Law
@ 2016-06-10 10:45                         ` James Greenhalgh
  0 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-06-10 10:45 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, nd, ramana.radhakrishnan, bernds_cb1, ebotcazou,
	steven, kyrtka01

On Thu, Jun 09, 2016 at 10:58:52AM -0600, Jeff Law wrote:
> On 06/02/2016 10:53 AM, James Greenhalgh wrote:
> >Hi,
> >
> >When I was working in the area last year, I promised to revisit the cost
> >model for noce if-conversion and see if I could improve the modeling. This
> >turned out to be more tricky than I expected.
> >
> >This patch set rewrites the cost model for noce if-conversion. The goal is
> >to rationalise the units used in the calculations away from BRANCH_COST,
> >which is only defined relative to itself.
> Right.  I think we all agreed that the key weakness of BRANCH_COST
> was that its meaning is only defined relative to itself.
> 
> What we want is a costing metric that would allow us to estimate the
> cost of different forms of the computation, which might include
> branches and which may include edge probability information.
> 
> >
> >If you're looking at that and thinking it doesn't sound much different from
> >our current call to BRANCH_COST, you're right. This isn't as large a
> >departure from the existing cost model as I had originally intended.
> Perhaps not as large of a change as you intended, but I think you're
> hitting the key issue with BRANCH_COST.
> 
> 
> >This act of making the cost models consistent will cause code generation
> >changes on a number of targets - most notably x86_64. On x86_64 the RTX
> >cost of a conditional move comes out at "20" - this is far higher than
> >COSTS_N_INSNS (BRANCH_COST) for the x86 targets, so they lose lots
> >of if-conversion. The easy fix for this would be to implement the new hook.
> >I measured the performance impact on Spec2000 as a smoke test, it didn't
> >seem to harm anything, and the result was a slight (< 3%) uplift on
> >Spec2000FP. I'm no expert on x86_64, so I haven't taken a closer look for
> >the reasons.
> I'd be comfortable with Uros guiding the implementation of the
> target hook for x86 so that we don't take a major step backward.

The trouble I'm having with all targets is noce_try_cmove_arith and the
testcases added as part of that patch set.

The current cost model for noce_try_cmove_arith doesn't take into
consideration the cost of a conditional move at all; it just checks the
cost of each branch, takes the maximum of those, and compares it against
COSTS_N_INSNS (BRANCH_COST).

As we move to my patch set, I want to compute the cost of the whole
if-converted sequence, which is going to include the cost of a conditional
move. On x86_64, the total cost of a simple conditional move
is COSTS_N_INSNS (5). This alone would prevent if-conversion for most
x86_64 subtargets; once you add in the maximum cost of the two branches,
you guarantee that no x86_64 target will be converting through
noce_try_cmove_arith.
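
The arithmetic can be written out. In this sketch the BRANCH_COST of 3 and the one-instruction arms are assumed values chosen for illustration, not x86_64 measurements; only the COSTS_N_INSNS (5) conditional move cost comes from the text above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

static unsigned int
max_u (unsigned int a, unsigned int b)
{
  return a > b ? a : b;
}

/* Old noce_try_cmove_arith gate: only the arms are costed.  */
static bool
sketch_old_cmove_arith_ok (unsigned int then_cost, unsigned int else_cost,
			   unsigned int branch_cost)
{
  return max_u (then_cost, else_cost) <= COSTS_N_INSNS (branch_cost);
}

/* Whole-sequence gate: the conditional move is costed as well.  */
static bool
sketch_new_cmove_arith_ok (unsigned int then_cost, unsigned int else_cost,
			   unsigned int cmove_cost, unsigned int branch_cost)
{
  return cmove_cost + max_u (then_cost, else_cost)
	 <= COSTS_N_INSNS (branch_cost);
}
```

With the assumed values the old gate sees 4 <= 12 and converts, while the whole-sequence gate sees 20 + 4 = 24 > 12 and refuses; even with empty arms, the conditional move alone needs a BRANCH_COST of at least 5.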

From the test results I'm seeing, this holds true for other targets which
show regressions with the new cost models. Clearly my idea for the default
hook implementation is not going to fly. If more targets had set the
PARAM_MAX_RTL_IF_CONVERSION_INSNS parameter, I could use that to guide the
default implementation, but that is currently x86_64 only. I could add a
multiplier against BRANCH_COST in the new hook, but then we'd be guaranteeing
that the "cheap" if-conversions, where we use STORE_FLAG_VALUE rather than
introducing a conditional move, always fired unless the target had a
branch_cost of 0 (this might not be a bad model, actually...). Finding what
this multiplier should be will be tough, as it depends directly on the cost
of a conditional move, which targets don't expose easily, and which I don't
want to discover by constructing junk conditional moves just to pass them
to rtx_cost.
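
A sketch of the multiplier idea, with a made-up multiplier of 3 (the text deliberately leaves the value open, and the names are hypothetical): any nonzero BRANCH_COST then admits the one- and two-instruction store-flag sequences, and only a BRANCH_COST of 0 blocks them.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical multiplier; choosing a real value is the open problem.  */
#define SKETCH_BRANCH_COST_MULTIPLIER 3

static unsigned int
sketch_scaled_branch_cost (unsigned int branch_cost)
{
  return branch_cost * SKETCH_BRANCH_COST_MULTIPLIER * COSTS_N_INSNS (1);
}

/* Would a "cheap" conversion needing NINSNS instructions fire?  */
static bool
sketch_cheap_conversion_fires_p (unsigned int branch_cost,
				 unsigned int ninsns)
{
  return sketch_scaled_branch_cost (branch_cost)
	 >= ninsns * COSTS_N_INSNS (1);
}
```

The design question is then only where the cutoff sits: with this multiplier, a BRANCH_COST of 1 admits up to three instructions and rejects four.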

I'll keep giving it some thought, and I'd appreciate any suggestions for the
default hook implementation.

I've respun the patch set around Bernd's suggestion, and now that we don't
check the cost model until after constructing the new sequence the patch set
looks much nicer. I'll wait a bit before sending the respin out in the hope
that I'll have a good idea for the default hook implementation to reduce
the number of performance changes I'll introduce.
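
The reordering described above (build the candidate sequence first, then cost it) can be sketched as follows. Here a sequence is just an array of per-instruction costs standing in for what seq_cost would compute over real RTL, and max_seq_cost plays the role of the per-branch limit from the new hook; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an emitted insn sequence: just the cost of each insn.  */
struct sketch_seq
{
  const unsigned int *insn_costs;
  size_t n;
};

/* Rough analogue of seq_cost: sum the costs of every insn.  */
static unsigned int
sketch_seq_cost (struct sketch_seq seq)
{
  unsigned int total = 0;
  for (size_t i = 0; i < seq.n; i++)
    total += seq.insn_costs[i];
  return total;
}

/* Accept the already-built sequence only if it fits the budget.  */
static bool
sketch_accept_seq_p (struct sketch_seq seq, unsigned int max_seq_cost)
{
  return sketch_seq_cost (seq) <= max_seq_cost;
}
```

Costing the real sequence avoids building a throwaway one just for estimation, at the price of always doing the conversion work before deciding.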

Thanks,
James


* [RFC: Patch 3/6 v2] Remove if_info->branch_cost
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
  2016-06-21 15:51                             ` [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets James Greenhalgh
@ 2016-06-21 15:51                             ` James Greenhalgh
  2016-07-13 21:19                               ` Jeff Law
  2016-06-21 15:51                             ` [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
                                               ` (5 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2016-06-21 15:51 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 726 bytes --]


Hi,

This patch removes the remaining uses of branch_cost, moving them to
the new hook and tagging each leftover spot with a TODO to revisit it.
All these uses are in rtx cost units, so there is no more work to do at
this point.

Bootstrapped as part of the patch series on aarch64 and x86-64.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): Remove branch_cost.
	(noce_try_store_flag_mask): Use max_seq_cost rather than
	branch_cost, tag as a TODO.
	(noce_try_cmove_arith): Likewise.
	(noce_convert_multiple_sets): Likewise.
	(bb_ok_for_noce_convert_multiple_sets): Likewise.
	(noce_find_if_block): Remove set of branch_cost.


[-- Attachment #2: 0003-RFC-Patch-3-6-v2-Remove-if_info-branch_cost.patch --]
[-- Type: text/x-patch;  name=0003-RFC-Patch-3-6-v2-Remove-if_info-branch_cost.patch, Size: 2842 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 0b97114..f4ad037 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -811,9 +811,6 @@ struct noce_if_info
   unsigned int then_cost;
   unsigned int else_cost;
 
-  /* Estimated cost of the particular branch instruction.  */
-  unsigned int branch_cost;
-
   /* Maximum permissible cost for the unconditional sequence we should
      generate to replace this branch.  */
   unsigned int max_seq_cost;
@@ -1683,7 +1680,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
 	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  old_cost = COSTS_N_INSNS (if_info->branch_cost) + insn_cost;
+	  /* TODO: Revisit this cost model.  */
+	  old_cost = if_info->max_seq_cost + insn_cost;
 	  new_cost = seq_cost (seq, speed_p);
 
 	  if (new_cost > old_cost)
@@ -2159,7 +2157,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* We're going to execute one of the basic blocks anyway, so
      bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+
+  /* TODO: Revisit cost model.  */
+  if (MAX (then_cost, else_cost) > if_info->max_seq_cost)
     return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
@@ -3341,8 +3341,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
    of conditional moves.  FORNOW: Use II to find the expected cost of
    the branch into/over TEST_BB.
 
-   TODO: This creates an implicit "magic number" for branch_cost.
-   II->branch_cost now guides the maximum number of set instructions in
+   TODO: This creates an implicit "magic number" for if conversion.
+   II->max_seq_cost now guides the maximum number of set instructions in
    a basic block which is considered profitable to completely
    if-convert.  */
 
@@ -3353,7 +3353,8 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS);
-  unsigned limit = MIN (ii->branch_cost, param);
+  /* TODO:  Revisit this cost model.  */
+  unsigned limit = MIN (ii->max_seq_cost / COSTS_N_INSNS (1), param);
 
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -4070,8 +4071,6 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if_info.cond_earliest = cond_earliest;
   if_info.jump = jump;
   if_info.then_else_reversed = then_else_reversed;
-  if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
-				     predictable_edge_p (then_edge));
   if_info.max_seq_cost
     = targetm.max_noce_ifcvt_seq_cost (optimize_bb_for_speed_p (test_bb),
 				       then_edge);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
  2016-06-21 15:51                             ` [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets James Greenhalgh
  2016-06-21 15:51                             ` [RFC: Patch 3/6 v2] Remove if_info->branch_cost James Greenhalgh
@ 2016-06-21 15:51                             ` James Greenhalgh
  2016-07-13 21:18                               ` Jeff Law
  2016-06-21 15:53                             ` [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
                                               ` (4 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2016-06-21 15:51 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 1412 bytes --]


Hi,

This patch pulls the comparisons between if_info->branch_cost and a magic
number representing an instruction count into a common function. While
doing so, I've documented the instructions that each magic number relates
to, and updated the numbers where they were inconsistent.

Now that our measure of the cost of a branch is in RTX cost units, we can
estimate the cost of an expression from its instruction count by
multiplying through by COSTS_N_INSNS (1).

Alternatively, we could actually construct the cheap sequences and check
their cost. But in these cases we expect to if-convert on almost all
targets: the transforms in this patch are almost universally a good idea,
even for targets with a very powerful branch predictor, because eliminating
the branch removes a basic block boundary, which may help scheduling,
combine, and other RTL optimizers.
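The estimate described above can be exercised outside GCC with a cut-down struct. This sketch assumes COSTS_N_INSNS (N) is (N) * 4 as in rtl.h; struct if_info_sketch and estimate_conversion_profitable_p are hypothetical stand-ins for noce_if_info and noce_estimate_conversion_profitable_p. With a budget of two instructions, the one-instruction transforms (ADD, IOR, AND) and the two-instruction ASHIFT/IOR case pass, while the three- and four-instruction noce_try_cmove sequences do not.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the definition in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical cut-down noce_if_info: only the field the helper reads.  */
struct if_info_sketch
{
  unsigned int max_seq_cost;
};

/* Same comparison as noce_estimate_conversion_profitable_p: treat each of
   the NINSNS "cheap" insns as costing COSTS_N_INSNS (1).  */
static inline bool
estimate_conversion_profitable_p (const struct if_info_sketch *if_info,
                                  unsigned int ninsns)
{
  return if_info->max_seq_cost >= ninsns * COSTS_N_INSNS (1);
}
```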

Bootstrapped on x86-64 and aarch64 as part of the full sequence.

OK?

Thanks,
James

---

2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): New field: max_seq_cost.
	(noce_estimate_conversion_profitable_p): New.
	(noce_try_store_flag_constants): Use it.
	(noce_try_addcc): Likewise.
	(noce_try_store_flag_mask): Likewise.
	(noce_try_cmove): Likewise.
	(noce_try_cmove_arith): Likewise.
	(noce_find_if_block): Record targetm.max_noce_ifcvt_seq_cost.


[-- Attachment #2: 0002-RFC-Patch-2-6-v2-Factor-out-the-comparisons-against-.patch --]
[-- Type: text/x-patch;  name=0002-RFC-Patch-2-6-v2-Factor-out-the-comparisons-against-.patch, Size: 5287 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index fd29516..0b97114 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -814,6 +814,10 @@ struct noce_if_info
   /* Estimated cost of the particular branch instruction.  */
   unsigned int branch_cost;
 
+  /* Maximum permissible cost for the unconditional sequence we should
+     generate to replace this branch.  */
+  unsigned int max_seq_cost;
+
   /* The name of the noce transform that succeeded in if-converting
      this structure.  Used for debugging.  */
   const char *transform_name;
@@ -835,6 +839,17 @@ static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
 
+/* This function is always called when we would expand a number of "cheap"
+   instructions.  Multiply NINSNS by COSTS_N_INSNS (1) to approximate the
+   RTX cost of those cheap instructions.  */
+
+inline static bool
+noce_estimate_conversion_profitable_p (struct noce_if_info *if_info,
+				       unsigned int ninsns)
+{
+  return if_info->max_seq_cost >= ninsns * COSTS_N_INSNS (1);
+}
+
 /* Helper function for noce_try_store_flag*.  */
 
 static rtx
@@ -1320,7 +1335,8 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
       && (REG_P (XEXP (a, 0))
 	  || (noce_operand_ok (XEXP (a, 0))
 	      && ! reg_overlap_mentioned_p (if_info->x, XEXP (a, 0))))
-      && if_info->branch_cost >= 2)
+      /* We need one instruction, the ADD of the store flag.  */
+      && noce_estimate_conversion_profitable_p (if_info, 1))
     {
       common = XEXP (a, 0);
       a = XEXP (a, 1);
@@ -1393,22 +1409,32 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  else
 	    gcc_unreachable ();
 	}
+      /* Is this (cond) ? 2^n : 0?  */
       else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	       && (STORE_FLAG_VALUE == 1
-		   || if_info->branch_cost >= 2))
+		   /* We need ASHIFT, IOR.   */
+		   || noce_estimate_conversion_profitable_p (if_info, 2)))
 	normalize = 1;
+      /* Is this (cond) ? 0 : 2^n?  */
       else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
-	       && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
+	       && (STORE_FLAG_VALUE == 1
+		   /* We need ASHIFT, IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 2)))
 	{
 	  normalize = 1;
 	  reversep = true;
 	}
+      /* Is this (cond) ? -1 : x?  */
       else if (itrue == -1
 	       && (STORE_FLAG_VALUE == -1
-		   || if_info->branch_cost >= 2))
+		   /* Just an IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 1)))
 	normalize = -1;
+      /* Is this (cond) ? x : -1?  */
       else if (ifalse == -1 && can_reverse
-	       && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
+	       && (STORE_FLAG_VALUE == -1
+		   /* Just an IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 1)))
 	{
 	  normalize = -1;
 	  reversep = true;
@@ -1564,8 +1590,8 @@ noce_try_addcc (struct noce_if_info *if_info)
 	}
 
       /* If that fails, construct conditional increment or decrement using
-	 setcc.  */
-      if (if_info->branch_cost >= 2
+	 setcc.  We'd only need an ADD/SUB for this.  */
+      if (noce_estimate_conversion_profitable_p (if_info, 1)
 	  && (XEXP (if_info->a, 1) == const1_rtx
 	      || XEXP (if_info->a, 1) == constm1_rtx))
         {
@@ -1621,7 +1647,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
     return FALSE;
 
   reversep = 0;
-  if ((if_info->branch_cost >= 2
+
+  /* One instruction, AND.  */
+  if ((noce_estimate_conversion_profitable_p (if_info, 1)
        || STORE_FLAG_VALUE == -1)
       && ((if_info->a == const0_rtx
 	   && rtx_equal_p (if_info->b, if_info->x))
@@ -1828,8 +1856,11 @@ noce_try_cmove (struct noce_if_info *if_info)
 	 approach.  */
       else if (!targetm.have_conditional_execution ()
 		&& CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b)
-		&& ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
-		    || if_info->branch_cost >= 3))
+		/* If STORE_FLAG_VALUE is -1, we need SUB, AND, PLUS.  */
+		&& ((noce_estimate_conversion_profitable_p (if_info, 3)
+		     && STORE_FLAG_VALUE == -1)
+		    /* Otherwise, we need NEG, SUB, AND, PLUS.  */
+		    || noce_estimate_conversion_profitable_p (if_info, 4)))
 	{
 	  machine_mode mode = GET_MODE (if_info->x);
 	  HOST_WIDE_INT ifalse = INTVAL (if_info->a);
@@ -2082,7 +2113,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (cse_not_expected
       && MEM_P (a) && MEM_P (b)
       && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
-      && if_info->branch_cost >= 5)
+      && noce_estimate_conversion_profitable_p (if_info, 5))
     {
       machine_mode address_mode = get_address_mode (a);
 
@@ -4041,6 +4072,9 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if_info.then_else_reversed = then_else_reversed;
   if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
 				     predictable_edge_p (then_edge));
+  if_info.max_seq_cost
+    = targetm.max_noce_ifcvt_seq_cost (optimize_bb_for_speed_p (test_bb),
+				       then_edge);
 
   /* Do the real work.  */
 

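The instruction counts in the new comments are easier to follow next to the C-level identities that the store-flag transforms rely on. This sketch assumes STORE_FLAG_VALUE == 1 (a comparison yields 0 or 1, as in C) and uses a < b as a hypothetical condition; the function names are illustrative only, and nothing here claims GCC emits exactly these sequences.

```c
#include <assert.h>
#include <stdint.h>

/* The store-flag value: 1 if the condition holds, else 0.  */
static uint32_t
flag (int a, int b)
{
  return a < b;
}

/* (cond) ? 2^n : 0  ->  flag << n  (requires n < 32).  */
static uint32_t
pow2_or_zero (int a, int b, unsigned n)
{
  return flag (a, b) << n;
}

/* (cond) ? -1 : x  ->  -flag | x  (IOR with the negated flag).  */
static int32_t
minus1_or_x (int a, int b, int32_t x)
{
  return -(int32_t) flag (a, b) | x;
}

/* Conditional increment, as in noce_try_addcc: x + flag.  */
static int32_t
cond_increment (int a, int b, int32_t x)
{
  return x + (int32_t) flag (a, b);
}

/* (cond) ? x : 0  ->  x & -flag  (AND with an all-ones or all-zeros mask).  */
static int32_t
x_or_zero (int a, int b, int32_t x)
{
  return x & -(int32_t) flag (a, b);
}
```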

* [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
@ 2016-06-21 15:51                             ` James Greenhalgh
  2016-07-13 21:23                               ` Jeff Law
  2016-06-21 15:51                             ` [RFC: Patch 3/6 v2] Remove if_info->branch_cost James Greenhalgh
                                               ` (6 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2016-06-21 15:51 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]


Hi,

This patch rewrites the cost model for multiple-set if-conversion
(bb_ok_for_noce_convert_multiple_sets and noce_convert_multiple_sets)
to use the max_seq_cost heuristic added earlier in this series.

As with the previous patch, I've used the new parameters to ensure that
the testsuite is still testing the functionality rather than relying on
the target setting the costs appropriately.
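The reworked flow splits the decision into a structural gate and a cost gate. The sketch below models both outside GCC; sketch_seq_cost is a hypothetical simplification that just sums per-insn costs (GCC's seq_cost is more involved), the other names are also hypothetical, and COSTS_N_INSNS (N) is assumed to be (N) * 4 as in rtl.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors the definition in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Structural gate, as in the reworked bb_ok_for_noce_convert_multiple_sets:
   more than one set, and no more than max-rtl-if-conversion-insns.  */
static bool
multiple_sets_count_ok (unsigned count, unsigned param)
{
  return count > 1 && count <= param;
}

/* Hypothetical stand-in for seq_cost: sum the per-insn costs.  */
static unsigned
sketch_seq_cost (const unsigned *insn_costs, size_t n)
{
  unsigned total = 0;
  for (size_t i = 0; i < n; i++)
    total += insn_costs[i];
  return total;
}

/* Cost gate, as in the reworked noce_convert_multiple_sets: keep the
   already-built sequence only if it fits within max_seq_cost.  */
static bool
multiple_sets_profitable_p (const unsigned *insn_costs, size_t n,
                            unsigned max_seq_cost)
{
  return sketch_seq_cost (insn_costs, n) <= max_seq_cost;
}
```

The values mirror the testsuite changes: ifcvt-5.c's max-rtl-if-conversion-insns=1 fails the count gate regardless of cost, while a generous budget such as 100 lets a four-insn swap sequence (two conditional moves plus two moves) through the cost gate.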

Thanks,
James

---
gcc/

2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_convert_multiple_sets): Move cost model to here,
	check the sequence cost after constructing the converted sequence.
	(bb_ok_for_noce_convert_multiple_sets): Move cost model out; remove
	the noce_if_info parameter.
	(noce_process_if_block): Update call.

gcc/testsuite/

2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.dg/ifcvt-4.c: Use parameter to guide if-conversion heuristics.
	* gcc.dg/ifcvt-5.c: Use parameter to guide if-conversion heuristics.


[-- Attachment #2: 0005-RFC-Patch-5-6-v2-Improve-the-cost-model-for-multiple.patch --]
[-- Type: text/x-patch;  name=0005-RFC-Patch-5-6-v2-Improve-the-cost-model-for-multiple.patch, Size: 4748 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 78906d3..8f892b0 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3191,6 +3191,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx_insn *jump = if_info->jump;
   rtx_insn *cond_earliest;
   rtx_insn *insn;
+  bool speed_p = optimize_bb_for_speed_p (if_info->test_bb);
 
   start_sequence ();
 
@@ -3273,9 +3274,17 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   for (int i = 0; i < count; i++)
     noce_emit_move_insn (targets[i], temporaries[i]);
 
-  /* Actually emit the sequence.  */
+  /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
 
+  /*  Check the cost model to ensure this is profitable.  */
+  if (seq_cost (seq, speed_p)
+      > if_info->max_seq_cost)
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
   for (insn = seq; insn; insn = NEXT_INSN (insn))
     set_used_flags (insn);
 
@@ -3325,23 +3334,16 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
 /* Return true iff basic block TEST_BB is comprised of only
    (SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  FORNOW: Use II to find the expected cost of
-   the branch into/over TEST_BB.
-
-   TODO: This creates an implicit "magic number" for if conversion.
-   II->max_seq_cost now guides the maximum number of set instructions in
-   a basic block which is considered profitable to completely
-   if-convert.  */
+   of conditional moves.  Also check that we have more than one set
+   (other routines can handle a single set better than we would), and
+   at most PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
 
 static bool
-bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
-				      struct noce_if_info *ii)
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
 {
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS);
-  /* TODO:  Revisit this cost model.  */
-  unsigned limit = MIN (ii->max_seq_cost / COSTS_N_INSNS (1), param);
 
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -3377,14 +3379,15 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
       if (!can_conditionally_move_p (GET_MODE (dest)))
 	return false;
 
-      /* FORNOW: Our cost model is a count of the number of instructions we
-	 would if-convert.  This is suboptimal, and should be improved as part
-	 of a wider rework of branch_cost.  */
-      if (++count > limit)
-	return false;
+      count++;
     }
 
-  return count > 1;
+  /* If we would only put out one conditional move, the other strategies
+     this pass tries are better optimized and will be more appropriate.
+     Some targets want to strictly limit the number of conditional moves
+     that are emitted; they set this through PARAM, and we need to respect
+     it.  */
+  return count > 1 && count <= param;
 }
 
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
@@ -3420,7 +3423,7 @@ noce_process_if_block (struct noce_if_info *if_info)
   if (!else_bb
       && HAVE_conditional_move
       && !HAVE_cc0
-      && bb_ok_for_noce_convert_multiple_sets (then_bb, if_info))
+      && bb_ok_for_noce_convert_multiple_sets (then_bb))
     {
       if (noce_convert_multiple_sets (if_info))
 	{
diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index 319b583..0d1671c 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -1,4 +1,4 @@
-/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* visium-*-*" } }  */
 
diff --git a/gcc/testsuite/gcc.dg/ifcvt-5.c b/gcc/testsuite/gcc.dg/ifcvt-5.c
index 818099a..d2a9476 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-5.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-5.c
@@ -1,7 +1,8 @@
 /* Check that multi-insn if-conversion is not done if the override
-   parameter would not allow it.  */
+   parameter would not allow it.  Set the cost parameter very high
+   to ensure that the limiting factor is actually the count parameter.  */
 
-/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=1" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=1 --param max-rtl-if-conversion-unpredictable-cost=200" } */
 
 typedef int word __attribute__((mode(word)));
 


* [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost
  2016-06-03 10:39                         ` Richard Biener
@ 2016-06-21 15:51                           ` James Greenhalgh
  2016-06-21 15:51                             ` [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets James Greenhalgh
                                               ` (7 more replies)
  0 siblings, 8 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-06-21 15:51 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 4545 bytes --]


On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:
> On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > This patch introduces a new target hook, to be used like BRANCH_COST but
> > with a guaranteed unit of measurement. We want this to break away from
> > the current ambiguous uses of BRANCH_COST.
> >
> > BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
> > instruction counts - where it is used as the limit on the number of new
> > instructions we are permitted to generate. The other (after multiplying
> > by COSTS_N_INSNS (1)) directly against RTX costs.
> >
> > Of these, a comparison against RTX costs is the more easily understood
> > metric across the compiler, and the one I've pulled out to the new hook.
> > To keep things consistent for targets which don't migrate, this new hook
> > has a default value of BRANCH_COST * COSTS_N_INSNS (1).
> >
> > OK?
>
> How does the caller compute "predictable"?  There are some archs where
> an information on whether this is a forward or backward jump is more
> useful I guess.  Also at least for !speed_p the distance of the branch is
> important given not all targets support arbitrary branch offsets.

Just through a call to predictable_edge_p. It isn't perfect. My worry
with adding more details of the branch is that you end up with a nonsense
target implementation that tries way too hard to be clever. But I don't
mind passing the edge through to the target hook, so that a target has it
if it wants it. In this patch revision, I pass the edge through.

> I remember that at the last Cauldron we discussed to change things to
> compare costs of sequences of instructions rather than giving targets no
> context with just asking for single (sub-)insn rtx costs.

I've made better use of seq_cost in this respin. Bernd was right:
constructing dummy RTX just for costs, then discarding it, then
constructing the actual RTX for matching doesn't make sense as a pipeline.
It is better just to construct the real sequence and use its cost.

In this patch revision, I started by removing the idea that this costs
a branch at all. It doesn't; the hook is really a way for a target to
stop if-conversion from pulling too much onto the unconditional path.
It seems better to expose that limit directly by explicitly asking for
the maximum cost of the unconditional sequence we would create, and
comparing it against seq_cost of the new RTL. This saves a target from
trying to work out what is meant by the cost of a branch.

Having done that, I think I can see a clearer path to getting the
default hook implementation in shape. I've introduced two new params,
which give maximum costs for the generated sequence (one for a "predictable"
branch, one for "unpredictable") in the speed_p cases. I'm not expecting it
to be useful to give the user control in the case we are compiling for
size - whether this is a size win or not is independent of whether the
branch is predictable.

For the default implementation, if the parameters are not set, I just
multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
of ideas on how best to form the default implementation. This means we're
still potentially going to introduce performance regressions for targets
that don't provide an implementation of the new hook, or a default value
for the new parameters. It does mean we can keep the testsuite clean by
setting parameter values suitably high for all targets that have
conditional move instructions.
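The default described above can be modelled as a pure function. In this sketch struct seq_cost_inputs and sketch_default_max_seq_cost are hypothetical: branch_cost stands for the already-evaluated BRANCH_COST (speed_p, predictable_p), and a negative parameter value stands for "not set on the command line" (the real code checks global_options_set instead). COSTS_N_INSNS (N) is again assumed to be (N) * 4 as in rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the definition in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical bundle of the inputs default_max_noce_ifcvt_seq_cost uses.
   A negative *_param value means "not set by the user".  */
struct seq_cost_inputs
{
  unsigned branch_cost;   /* BRANCH_COST (speed_p, predictable_p).  */
  bool predictable_p;     /* predictable_edge_p (e).  */
  int predictable_param;
  int unpredictable_param;
};

static unsigned
sketch_default_max_seq_cost (bool speed_p, const struct seq_cost_inputs *in)
{
  /* For size: BRANCH_COST scaled to RTX cost units, so a target that sets
     BRANCH_COST to zero still disables if-conversion.  */
  if (!speed_p)
    return in->branch_cost * COSTS_N_INSNS (1);

  /* For speed: an explicitly set parameter wins.  */
  int param = in->predictable_p ? in->predictable_param
                                : in->unpredictable_param;
  if (param >= 0)
    return (unsigned) param;

  /* Otherwise guess from BRANCH_COST with a larger multiplier.  */
  return in->branch_cost * COSTS_N_INSNS (3);
}
```

One consequence worth noting: because the parameters are only consulted when explicitly set, the defaults recorded in params.def (20 and 40) never reach this path.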

The new default causes some changes in generated conditional move sequences
for x86_64. Whether these changes are for the better or not I can't say.

This first patch introduces the new target hook and the two new
parameters, and uses the parameters in the default implementation of
the hook.

Bootstrapped on x86_64 and aarch64 with no issues.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* target.def (max_noce_ifcvt_seq_cost): New.
	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
	* doc/invoke.texi: Document new params.


[-- Attachment #2: 0001-RFC-Patch-1-6-v2-New-target-hook-max_noce_ifcvt_seq_.patch --]
[-- Type: text/x-patch;  name=0001-RFC-Patch-1-6-v2-New-target-hook-max_noce_ifcvt_seq_.patch, Size: 8082 bytes --]

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e000218..b71968f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8816,6 +8816,17 @@ considered for if-conversion.  The default is 10, though the compiler will
 also use other heuristics to decide whether if-conversion is likely to be
 profitable.
 
+@item max-rtl-if-conversion-predictable-cost
+@itemx max-rtl-if-conversion-unpredictable-cost
+RTL if-conversion tries to remove conditional branches around a block and
+replace them with conditionally executed instructions.  These parameters
+give the maximum permissible cost for the sequence that would be generated
+by if-conversion depending on whether the branch is statically determined
+to be predictable or not.  The units for these parameters are the same as
+those for the GCC internal seq_cost metric.  The compiler will try to
+provide reasonable defaults for these parameters using the BRANCH_COST
+target macro.
+
 @item max-crossjump-edges
 The maximum number of incoming edges to consider for cross-jumping.
 The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b318615..bbf6c1b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6526,6 +6526,31 @@ should probably only be given to addresses with different numbers of
 registers on machines with lots of registers.
 @end deftypefn
 
+@deftypefn {Target Hook} {unsigned int} TARGET_MAX_NOCE_IFCVT_SEQ_COST (bool @var{speed_p}, edge @var{e})
+This hook should return a value in the same units as
+@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for
+a sequence generated by the RTL if-conversion pass when conditional
+execution is not available.  The RTL if-conversion pass attempts
+to convert conditional operations that would require a branch to a
+series of unconditional operations and @code{mov@var{mode}cc} insns.
+This hook gives the maximum cost of the unconditional instructions and
+the @code{mov@var{mode}cc} insns.  RTL if-conversion is cancelled if the
+cost of the converted sequence is greater than the value returned by this
+hook.
+
+@code{speed_p} is true if we are compiling for speed.
+@code{e} is the ``then'' edge of the conditional branch being considered
+for if-conversion.  A target may decide to implement this hook to
+return a lower maximum cost for branches that the compiler believes
+will be predictable, for example by calling @code{predictable_edge_p}.
+
+The default implementation of this hook uses
+@code{BRANCH_COST * COSTS_N_INSNS (1)} if we are compiling for size,
+uses the @code{max-rtl-if-conversion-[un]predictable} parameters if they
+are set, and uses a multiple of @code{BRANCH_COST} if we are compiling
+for speed and the appropriate parameter is not set.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
 This predicate controls the use of the eager delay slot filler to disallow
 speculatively executed instructions being placed in delay slots.  Targets
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1e8423c..d2b7f41 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4762,6 +4762,8 @@ Define this macro if a non-short-circuit operation produced by
 
 @hook TARGET_ADDRESS_COST
 
+@hook TARGET_MAX_NOCE_IFCVT_SEQ_COST
+
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 
 @node Scheduling
diff --git a/gcc/params.def b/gcc/params.def
index 894b7f3..682adbd 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1217,6 +1217,20 @@ DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_INSNS,
 	  "if-conversion.",
 	  10, 0, 99)
 
+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST,
+	  "max-rtl-if-conversion-predictable-cost",
+	  "Maximum permissible cost for the sequence that would be "
+	  "generated by the RTL if-conversion pass for a branch which "
+	  "is considered predictable.",
+	  20, 0, 200)
+
+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST,
+	  "max-rtl-if-conversion-unpredictable-cost",
+	  "Maximum permissible cost for the sequence that would be "
+	  "generated by the RTL if-conversion pass for a branch which "
+	  "is considered unpredictable.",
+	  40, 0, 200)
+
 DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
 	  "hsa-gen-debug-stores",
 	  "Level of hsa debug stores verbosity",
diff --git a/gcc/target.def b/gcc/target.def
index a4df363..22e4898 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3572,6 +3572,35 @@ registers on machines with lots of registers.",
  int, (rtx address, machine_mode mode, addr_space_t as, bool speed),
  default_address_cost)
 
+/* Return the maximum permissible cost, in RTX cost units, of a sequence
+   generated by RTL if-conversion.  Like BRANCH_COST, but with well-defined units.  */
+DEFHOOK
+(max_noce_ifcvt_seq_cost,
+ "This hook should return a value in the same units as\n\
+@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for\n\
+a sequence generated by the RTL if-conversion pass when conditional\n\
+execution is not available.  The RTL if-conversion pass attempts\n\
+to convert conditional operations that would require a branch to a\n\
+series of unconditional operations and @code{mov@var{mode}cc} insns.\n\
+This hook gives the maximum cost of the unconditional instructions and\n\
+the @code{mov@var{mode}cc} insns.  RTL if-conversion is cancelled if the\n\
+cost of the converted sequence is greater than the value returned by this\n\
+hook.\n\
+\n\
+@code{speed_p} is true if we are compiling for speed.\n\
+@code{e} is the ``then'' edge of the conditional branch being considered\n\
+for if-conversion.  A target may decide to implement this hook to\n\
+return a lower maximum cost for branches that the compiler believes\n\
+will be predictable, for example by calling @code{predictable_edge_p}.\n\
+\n\
+The default implementation of this hook uses\n\
+@code{BRANCH_COST * COSTS_N_INSNS (1)} if we are compiling for size,\n\
+uses the @code{max-rtl-if-conversion-[un]predictable-cost} parameters if they\n\
+are set, and uses a multiple of @code{BRANCH_COST} if we are compiling\n\
+for speed and the appropriate parameter is not set.",
+unsigned int, (bool speed_p, edge e),
+default_max_noce_ifcvt_seq_cost)
+
 /* Permit speculative instructions in delay slots during delayed-branch 
    scheduling.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 3e089e7..42dea3b 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -74,6 +74,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "opts.h"
 #include "gimplify.h"
+#include "predict.h"
+#include "params.h"
 
 
 bool
@@ -1977,4 +1979,29 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Default implementation of TARGET_MAX_NOCE_IFCVT_SEQ_COST.  */
+
+unsigned int
+default_max_noce_ifcvt_seq_cost (bool speed_p, edge e)
+{
+  bool predictable_p = predictable_edge_p (e);
+  /* For size, some targets like to set a BRANCH_COST of zero to disable
+     ifcvt, continue to allow that.  Then multiply through by
+     COSTS_N_INSNS (1) so we're in a comparable base.  */
+
+  if (!speed_p)
+    return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1);
+
+  enum compiler_param param = predictable_p
+			      ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST
+			      : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST;
+
+  /* If we have a parameter set, use that, otherwise take a guess using
+     BRANCH_COST.  */
+  if (global_options_set.x_param_values[param])
+    return PARAM_VALUE (param);
+  else
+    return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (3);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index d6581cf..e1bae6b 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -255,4 +255,6 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern unsigned int default_max_noce_ifcvt_seq_cost (bool, edge);
+
 #endif /* GCC_TARGHOOKS_H */


* [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
                                               ` (2 preceding siblings ...)
  2016-06-21 15:51                             ` [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
@ 2016-06-21 15:53                             ` James Greenhalgh
  2016-07-13 21:24                               ` Jeff Law
  2016-06-21 15:53                             ` [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith James Greenhalgh
                                               ` (3 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2016-06-21 15:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]


Hi,

This transformation tries two cost models, one estimating the number
of insns to use, one estimating the RTX cost of the transformed sequence.
This is inconsistent with the other cost models used in ifcvt.c and
unnecessary, so this patch eliminates the second cost model.

Thanks,
James

---
2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_try_store_flag_mask): Delete redundant cost model.


[-- Attachment #2: 0006-RFC-Patch-6-6-v2-Remove-second-cost-model-from-noce_.patch --]
[-- Type: text/x-patch;  name=0006-RFC-Patch-6-6-v2-Remove-second-cost-model-from-noce_.patch, Size: 967 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 8f892b0..0cb8280 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1668,9 +1668,6 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
       if (target)
 	{
-	  int old_cost, new_cost, insn_cost;
-	  int speed_p;
-
 	  if (target != if_info->x)
 	    noce_emit_move_insn (if_info->x, target);
 
@@ -1678,15 +1675,6 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 	  if (!seq)
 	    return FALSE;
 
-	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
-	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  /* TODO: Revisit this cost model.  */
-	  old_cost = if_info->max_seq_cost + insn_cost;
-	  new_cost = seq_cost (seq, speed_p);
-
-	  if (new_cost > old_cost)
-	    return FALSE;
-
 	  emit_insn_before_setloc (seq, if_info->jump,
 				   INSN_LOCATION (if_info->insn_a));
 	  if_info->transform_name = "noce_try_store_flag_mask";

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
                                               ` (3 preceding siblings ...)
  2016-06-21 15:53                             ` [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
@ 2016-06-21 15:53                             ` James Greenhalgh
  2016-07-13 21:22                               ` Jeff Law
  2016-06-21 21:31                             ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost Bernhard Reutner-Fischer
                                               ` (2 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2016-06-21 15:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]


Hi,

This patch clears up the cost model for noce_try_cmove_arith. We lose
the "??? FIXME: Magic number 5" comment, and gain a more realistic cost
model for if-converting memory accesses.

This is the patch that has the chance to cause the largest behavioural
changes for most targets. The current heuristic does not take into
consideration the cost of a conditional move; once we add that, the cost
of the converted sequence often looks higher than we allowed before.

I think that missing the cost of the conditional move from these sequences
is not a good idea, and that the cost model should rely on the target giving
back good information. A target that finds tests failing after this patch
should consider either reducing the cost of a conditional move sequence, or
increasing TARGET_MAX_NOCE_IFCVT_SEQ_COST.
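
A minimal sketch of why costing the conditional move can flip the decision. This is not GCC code. `struct model_insn`, `model_seq_cost` and the per-insn costs are hypothetical stand-ins. Only `COSTS_N_INSNS` mirrors GCC's `rtl.h` scaling of 4 units per insn. The conditional move is assumed, purely for illustration, to cost two ordinary insns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same scaling as GCC's rtl.h: one "typical" insn is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for an emitted insn; only its cost matters.  */
struct model_insn
{
  int cost;
};

/* Hypothetical model of seq_cost: the sum of the per-insn costs.  */
static int
model_seq_cost (const struct model_insn *seq, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (seq[i].cost >= 0);
      total += seq[i].cost;
    }
  return total;
}

/* Old heuristic: only the more expensive arm is compared with the
   limit, so the conditional move itself is never costed.  */
static bool
model_old_cmove_arith_ok (int then_cost, int else_cost, int max_seq_cost)
{
  int worst = then_cost > else_cost ? then_cost : else_cost;
  return worst <= max_seq_cost;
}

/* New heuristic: build the whole if-converted sequence first (both
   arms plus the conditional move) and compare its total cost.  */
static bool
model_new_cmove_arith_ok (const struct model_insn *seq, size_t n,
                          int max_seq_cost)
{
  return model_seq_cost (seq, n) <= max_seq_cost;
}
```

Take a limit of COSTS_N_INSNS (3), two one-insn arms, and a two-insn cmove. The converted sequence totals 16 units. The old check accepts it because the worst arm is 4 units. The new check rejects it.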

As this ups the cost of if-convert dramatically, I've used the new
parameters to ensure that the tests in the testsuite continue to pass on
all targets.

Bootstrapped in series on aarch64 and x86-64.

OK?

Thanks,
James

---
gcc/

2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_try_cmove_arith): Check costs after constructing
	new sequence.

gcc/testsuite/

2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.dg/ifcvt-2.c: Use parameter to guide if-conversion heuristics.
	* gcc.dg/ifcvt-3.c: Use parameter to guide if-conversion heuristics.
	* gcc.dg/pr68435.c: Use parameter to guide if-conversion heuristics.


[-- Attachment #2: 0004-RFC-Patch-4-6-v2-Modify-cost-model-for-noce_cmove_ar.patch --]
[-- Type: text/x-patch;  name=0004-RFC-Patch-4-6-v2-Modify-cost-model-for-noce_cmove_ar.patch, Size: 3904 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f4ad037..78906d3 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2092,7 +2092,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx a = if_info->a;
   rtx b = if_info->b;
   rtx x = if_info->x;
-  rtx orig_a, orig_b;
+  rtx orig_a = a;
+  rtx orig_b = b;
   rtx_insn *insn_a, *insn_b;
   bool a_simple = if_info->then_simple;
   bool b_simple = if_info->else_simple;
@@ -2102,16 +2103,15 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   int is_mem = 0;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
+  bool speed_p = optimize_bb_for_speed_p (if_info->test_bb);
 
   /* A conditional move from two memory sources is equivalent to a
      conditional on their addresses followed by a load.  Don't do this
      early because it'll screw alias analysis.  Note that we've
      already checked for no side effects.  */
-  /* ??? FIXME: Magic number 5.  */
   if (cse_not_expected
       && MEM_P (a) && MEM_P (b)
-      && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
-      && noce_estimate_conversion_profitable_p (if_info, 5))
+      && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b))
     {
       machine_mode address_mode = get_address_mode (a);
 
@@ -2143,25 +2143,6 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!can_conditionally_move_p (x_mode))
     return FALSE;
 
-  unsigned int then_cost;
-  unsigned int else_cost;
-  if (insn_a)
-    then_cost = if_info->then_cost;
-  else
-    then_cost = 0;
-
-  if (insn_b)
-    else_cost = if_info->else_cost;
-  else
-    else_cost = 0;
-
-  /* We're going to execute one of the basic blocks anyway, so
-     bail out if the most expensive of the two blocks is unacceptable.  */
-
-  /* TODO: Revisit cost model.  */
-  if (MAX (then_cost, else_cost) > if_info->max_seq_cost)
-    return FALSE;
-
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
     {
@@ -2353,6 +2334,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!ifcvt_seq)
     return FALSE;
 
+  /* Check that our cost model will allow the transform.  */
+
+  if (seq_cost (ifcvt_seq, speed_p) > if_info->max_seq_cost)
+    /* Just return false, the sequence has already been finalized.  */
+    return FALSE;
+
   emit_insn_before_setloc (ifcvt_seq, if_info->jump,
 			   INSN_LOCATION (if_info->insn_a));
   if_info->transform_name = "noce_try_cmove_arith";
diff --git a/gcc/testsuite/gcc.dg/ifcvt-2.c b/gcc/testsuite/gcc.dg/ifcvt-2.c
index e0e1728..73e0dcc 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-2.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
-/* { dg-options "-fdump-rtl-ce1 -O2" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 
 typedef unsigned char uint8_t;
diff --git a/gcc/testsuite/gcc.dg/ifcvt-3.c b/gcc/testsuite/gcc.dg/ifcvt-3.c
index 44233d4..b250bc1 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-3.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { { aarch64*-*-* i?86-*-* x86_64-*-* } && lp64 } } } */
-/* { dg-options "-fdump-rtl-ce1 -O2" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 typedef long long s64;
 
diff --git a/gcc/testsuite/gcc.dg/pr68435.c b/gcc/testsuite/gcc.dg/pr68435.c
index 765699a..f86b7f8 100644
--- a/gcc/testsuite/gcc.dg/pr68435.c
+++ b/gcc/testsuite/gcc.dg/pr68435.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
-/* { dg-options "-fdump-rtl-ce1 -O2 -w" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 -w --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 typedef struct cpp_reader cpp_reader;
 enum cpp_ttype
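
The comment kept in noce_try_cmove_arith says that a conditional move from two memory sources is equivalent to a conditional on their addresses followed by a load. The C below illustrates only that source-level equivalence, and the function names are hypothetical. Whether the address select becomes a real conditional move is left to the target and to the cost check added above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Branchy form: exactly one of the two loads is executed.  */
static int
load_branchy (bool cond, const int *a, const int *b)
{
  if (cond)
    return *a;
  return *b;
}

/* If-converted form: select the address, then perform a single load.
   Like the RTL transform, this relies on the loads having no side
   effects.  */
static int
load_select_address (bool cond, const int *a, const int *b)
{
  const int *p = cond ? a : b;
  assert (p != NULL);
  return *p;
}
```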

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
                                               ` (4 preceding siblings ...)
  2016-06-21 15:53                             ` [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith James Greenhalgh
@ 2016-06-21 21:31                             ` Bernhard Reutner-Fischer
  2016-06-30 12:01                             ` Bernd Schmidt
  2016-07-13 21:16                             ` Jeff Law
  7 siblings, 0 replies; 60+ messages in thread
From: Bernhard Reutner-Fischer @ 2016-06-21 21:31 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

On June 21, 2016 5:50:26 PM GMT+02:00, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
>On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:
>> On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This patch introduces a new target hook, to be used like BRANCH_COST but
>> > with a guaranteed unit of measurement. We want this to break away from
>> > the current ambiguous uses of BRANCH_COST.
>> >
>> > BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
>> > instruction counts - where it is used as the limit on the number of new
>> > instructions we are permitted to generate. The other (after multiplying
>> > by COSTS_N_INSNS (1)) directly against RTX costs.
>> >
>> > Of these, a comparison against RTX costs is the more easily understood
>> > metric across the compiler, and the one I've pulled out to the new hook.
>> > To keep things consistent for targets which don't migrate, this new hook
>> > has a default value of BRANCH_COST * COSTS_N_INSNS (1).
>> >
>> > OK?
>>
>> How does the caller compute "predictable"?  There are some archs where
>> an information on whether this is a forward or backward jump is more
>> useful I guess.  Also at least for !speed_p the distance of the branch is
>> important given not all targets support arbitrary branch offsets.
>
>Just through a call to predictable_edge_p. It isn't perfect. My worry
>with adding more details of the branch is that you end up with a nonsense
>target implementation that tries way too hard to be clever. But, I don't
>mind passing the edge through to the target hook, that way a target has
>it if they want it. In this patch revision, I pass the edge through.
>
>> I remember that at the last Cauldron we discussed to change things to
>> compare costs of sequences of instructions rather than giving targets no
>> context with just asking for single (sub-)insn rtx costs.
>
>I've made better use of seq_cost in this respin. Bernd was right,
>constructing dummy RTX just for costs, then discarding it, then
>constructing the actual RTX for matching doesn't make sense as a pipeline.
>Better just to construct the real sequence and use the cost of that.
>
>In this patch revision, I started by removing the idea that this costs
>a branch at all. It doesn't, the use of this hook is really a target
>trying to limit if-convert to not end up pulling too much on to the
>unconditional path. It seems better to expose that limit directly by
>explicitly asking for the maximum cost of an unconditional sequence we
>would create, and comparing against seq_cost of the new RTL. This saves
>a target trying to figure out what is meant by a cost of a branch.
>
>Having done that, I think I can see a clearer path to getting the
>default hook implementation in shape. I've introduced two new params,
>which give maximum costs for the generated sequence (one for a "predictable"
>branch, one for "unpredictable") in the speed_p cases. I'm not expecting it
>to be useful to give the user control in the case we are compiling for
>size - whether this is a size win or not is independent of whether the
>branch is predictable.
>
>For the default implementation, if the parameters are not set, I just
>multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
>COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
>of ideas on how best to form the default implementation.

How bad is it in e.g. CSiBE?

>This means we're still potentially going to introduce performance
>regressions for targets that don't provide an implementation of the new
>hook, or a default value for the new parameters. It does mean we can keep
>the testsuite clean by setting parameter values suitably high for all
>targets that have conditional move instructions.
>
>The new default causes some changes in generated conditional move sequences
>for x86_64. Whether these changes are for the better or not I can't say.
>
>This first patch introduces the two new parameters, and uses them in the
>default implementation of the target hook.

s/precitable/predictable/ ?

Present tense in documentation (s/will try to/tries to/).
s/should return/returns/

TARGET_MAX_NOCE_IFCVT_SEQ_COST (bool @var{speed_p}, edge @var{e}) talks about predictable_p but doesn't document e.


+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST,
+	 "max-rtl-if-conversion-unpredictable-cost",
+	 "Maximum permissible cost for the sequence that would be "
+	 "generated by the RTL if-conversion pass for a branch which "
+	 "is considered predictable.",
+	 40, 0, 200)

unpredictable.

Present tense also in target.def.

+@code{predictable_p} is true

no predictable_p anymore but e missing in docs.

/Then multiply through by/s/through by/with/

thanks,
>
>Bootstrapped on x86_64 and aarch64 with no issues.
>
>OK?
>
>Thanks,
>James
>
>---
>2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
>	* target.def (max_noce_ifcvt_seq_cost): New.
>	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
>	* doc/tm.texi: Regenerate.
>	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
>	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
>	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
>	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
>	* doc/invoke.texi: Document new params.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
                                               ` (5 preceding siblings ...)
  2016-06-21 21:31                             ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost Bernhard Reutner-Fischer
@ 2016-06-30 12:01                             ` Bernd Schmidt
  2016-07-13 21:16                             ` Jeff Law
  7 siblings, 0 replies; 60+ messages in thread
From: Bernd Schmidt @ 2016-06-30 12:01 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1, law,
	ebotcazou, steven

On 06/21/2016 05:50 PM, James Greenhalgh wrote:
> For the default implementation, if the parameters are not set, I just
> multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
> COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
> of ideas on how best to form the default implementation.

Yeah, this does seem kind of arbitrary. It looks especially odd
considering that BRANCH_COST is likely to already vary between the
size/speed cases. What's wrong with just multiplying through by
COSTS_N_INSNS (1)?

I'm not sure we want params for this; targets should just eventually 
upgrade their cost models.

> The new default causes some changes in generated conditional move sequences
> for x86_64. Whether these changes are for the better or not I can't say.

How about arm/aarch64? I think some benchmark results might be good to have.

Bernhard already pointed out some issues with the patch; I'll omit these.

> +(max_noce_ifcvt_seq_cost,
> + "This hook should return a value in the same units as\n\
> +@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for\n\
> +a sequence generated by the RTL if-conversion pass when conditional\n\
> +execution is not available.

There's still the issue that we're also replacing instructions when 
doing if-conversion. Let's say in this case,

  /* Convert "if (test) x = a; else x = b", for A and B constant.
    Also allow A = y + c1, B = y + c2, with a common y between A
    and B.  */

we're removing two assignments for the purposes of optimizing for size, 
and one assignment when considering optimization for speed. This needs 
to factor into the cost calculations somehow if we want to do it 
properly. I think we can leave the hook as-is, but maybe add 
documentation to the effect of "The caller should increase the limit by 
the cost of whatever instructions are removed in the transformation."
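
A minimal sketch of the adjustment suggested above, assuming the caller credits the cost of the removed assignments back to the limit. `model_ifcvt_ok` is a hypothetical helper, not an existing GCC function. Following the quoted example, it credits both assignments when optimizing for size. When optimizing for speed it credits a single assignment, conservatively choosing the cheaper of the two.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: accept an if-converted sequence costing
   NEW_SEQ_COST if it fits in MAX_SEQ_COST plus the cost of the
   assignments that the transformation removes.  */
static bool
model_ifcvt_ok (int new_seq_cost, int max_seq_cost,
                int then_assign_cost, int else_assign_cost, bool speed_p)
{
  assert (then_assign_cost >= 0 && else_assign_cost >= 0);
  int removed;
  if (speed_p)
    /* Only one arm would have executed; credit the cheaper one.  */
    removed = (then_assign_cost < else_assign_cost
               ? then_assign_cost : else_assign_cost);
  else
    /* For size, both assignments disappear from the emitted code.  */
    removed = then_assign_cost + else_assign_cost;
  return new_seq_cost <= max_seq_cost + removed;
}
```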

> +/* Default implementation of TARGET_RTX_BRANCH_COST.  */

Wrong name for the hook.

> +
> +unsigned int
> +default_max_noce_ifcvt_seq_cost (bool speed_p, edge e)
> +{
> +  bool predictable_p = predictable_edge_p (e);
> +  /* For size, some targets like to set a BRANCH_COST of zero to disable
> +     ifcvt, continue to allow that.  Then multiply through by
> +     COSTS_N_INSNS (1) so we're in a comparable base.  */
> +
> +  if (!speed_p)
> +    return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1);

Blank line before the comment would be more readable.

> +  enum compiler_param param = predictable_p
> +			      ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST
> +			      : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST;

When splitting expressions across multiple lines, wrap in parens so that 
emacs formats them automatically.

Bernd
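
The default hook quoted above can be modelled outside GCC as a sketch. Everything here is a hypothetical stand-in. `struct model_param` replaces the `--param` machinery and carries an explicit "was it set" flag, because the quoted hunk does not show how that is detected. A function pointer replaces the BRANCH_COST macro. The speed fallback of BRANCH_COST * COSTS_N_INSNS (3) comes from the patch description. The conditional is wrapped in parentheses, as Bernd asks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for a --param: VALUE only counts if IS_SET.  */
struct model_param
{
  bool is_set;
  unsigned int value;
};

/* Hypothetical stand-in for the target's BRANCH_COST macro.  */
typedef unsigned int (*model_branch_cost_fn) (bool speed_p, bool predictable_p);

static unsigned int
model_default_max_noce_ifcvt_seq_cost (bool speed_p, bool predictable_p,
                                       model_branch_cost_fn branch_cost,
                                       struct model_param predictable_param,
                                       struct model_param unpredictable_param)
{
  assert (branch_cost != NULL);

  /* For size, some targets set BRANCH_COST to zero to disable ifcvt;
     keep allowing that, scaled into RTX cost units.  */
  if (!speed_p)
    return branch_cost (speed_p, predictable_p) * COSTS_N_INSNS (1);

  struct model_param param = (predictable_p
                              ? predictable_param
                              : unpredictable_param);

  /* An explicitly set parameter wins; otherwise guess from BRANCH_COST.  */
  if (param.is_set)
    return param.value;
  return branch_cost (speed_p, predictable_p) * COSTS_N_INSNS (3);
}

/* Hypothetical example target: unpredictable branches cost 2.  */
static unsigned int
example_branch_cost (bool speed_p, bool predictable_p)
{
  (void) speed_p;
  return predictable_p ? 1 : 2;
}
```

Suppose a target never sets the params and uses this example BRANCH_COST. For an unpredictable branch at speed, the limit is 2 * COSTS_N_INSNS (3) = 24 units.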

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost
  2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
                                               ` (6 preceding siblings ...)
  2016-06-30 12:01                             ` Bernd Schmidt
@ 2016-07-13 21:16                             ` Jeff Law
  2016-07-20  9:52                               ` [Re: RFC: Patch 1/2 v3] " James Greenhalgh
  7 siblings, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-07-13 21:16 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven

On 06/21/2016 09:50 AM, James Greenhalgh wrote:
>
> On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:
>> On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>>>
>>> Hi,
>>>
>>> This patch introduces a new target hook, to be used like BRANCH_COST but
>>> with a guaranteed unit of measurement. We want this to break away from
>>> the current ambiguous uses of BRANCH_COST.
>>>
>>> BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
>>> instruction counts - where it is used as the limit on the number of new
>>> instructions we are permitted to generate. The other (after multiplying
>>> by COSTS_N_INSNS (1)) directly against RTX costs.
>>>
>>> Of these, a comparison against RTX costs is the more easily understood
>>> metric across the compiler, and the one I've pulled out to the new hook.
>>> To keep things consistent for targets which don't migrate, this new hook
>>> has a default value of BRANCH_COST * COSTS_N_INSNS (1).
>>>
>>> OK?
>>
>> How does the caller compute "predictable"?  There are some archs where
>> an information on whether this is a forward or backward jump is more
>> useful I guess.  Also at least for !speed_p the distance of the branch is
>> important given not all targets support arbitrary branch offsets.
>
> Just through a call to predictable_edge_p. It isn't perfect. My worry
> with adding more details of the branch is that you end up with a nonsense
> target implementation that tries way too hard to be clever. But, I don't
> mind passing the edge through to the target hook, that way a target has
> it if they want it. In this patch revision, I pass the edge through.
There are so many things that can factor into this decision.  But I 
suspect we get most of the benefit from a small amount of work (ie, 
using the prediction information we've already generated).  If the 
target wants to override based on other factors, there's a mechanism for 
that, but I don't think it's likely to be heavily, if at all, used.


> In this patch revision, I started by removing the idea that this costs
> a branch at all. It doesn't, the use of this hook is really a target
> trying to limit if-convert to not end up pulling too much on to the
> unconditional path. It seems better to expose that limit directly by
> explicitly asking for the maximum cost of an unconditional sequence we
> would create, and comparing against seq_cost of the new RTL. This saves
> a target trying to figure out what is meant by a cost of a branch.
Seems sensible.  Essentially you're just asking a different but related 
(and more direct) question that side-steps the need to be able to 
compare branch cost with costs of other insns.

A target maintainer may have a sense of how branch cost relates to 
normal insns on their target and they may choose to define the hooks in 
those terms (though I would discourage that as it re-introduces the 
precise problem we're trying to get around).

>
> Having done that, I think I can see a clearer path to getting the
> default hook implementation in shape. I've introduced two new params,
> which give maximum costs for the generated sequence (one for a "predictable"
> branch, one for "unpredictable") in the speed_p cases. I'm not expecting it
> to be useful to give the user control in the case we are compiling for
> size - whether this is a size win or not is independent of whether the
> branch is predictable.
Maybe not for MIPS :-)  But I think that should be tabled.


>
> For the default implementation, if the parameters are not set, I just
> multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
> COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
> of ideas on how best to form the default implementation. This means we're
> still potentially going to introduce performance regressions for targets
> that don't provide an implementation of the new hook, or a default value
> for the new parameters. It does mean we can keep the testsuite clean by
> setting parameter values suitably high for all targets that have
> conditional move instructions.
But I think that's an OK place to be -- I don't think it's sensible for 
you to have to figure that out for every target.  It's something the 
target maintainers ought to be able to guesstimate more accurately.  My 
only objection is conceptual based on mixing BRANCH_COST & 
COSTS_N_INSNS.  But there may simply not be another way to set the default.

>
> The new default causes some changes in generated conditional move sequences
> for x86_64. Whether these changes are for the better or not I can't say.
You might consider passing those along to Uros to get his opinion.

>
> This first patch introduces the two new parameters, and uses them in the
> default implementation of the target hook.
>
> Bootstrapped on x86_64 and aarch64 with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* target.def (max_noce_ifcvt_seq_cost): New.
> 	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
> 	* doc/tm.texi: Regenerate.
> 	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
> 	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
> 	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
> 	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
> 	* doc/invoke.texi: Document new params.
This is fine IMHO.

jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt
  2016-06-21 15:51                             ` [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
@ 2016-07-13 21:18                               ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-07-13 21:18 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven

On 06/21/2016 09:50 AM, James Greenhalgh wrote:
>
> Hi,
>
> This patch pulls the comparisons between if_info->branch_cost and a magic
> number representing an instruction count to a common function. While I'm
> doing it, I've documented the instructions that the magic numbers relate
> to, and updated them where they were inconsistent.
>
> If our measure of the cost of a branch is now in rtx costs units, we can
> get to an estimate for the cost of an expression from the number of
> instructions by multiplying through by COSTS_N_INSNS (1).
>
> Alternatively, we could actually construct the cheap sequences and
> check their cost. But in these cases we expect to if-convert on
> almost all targets: the transforms in this patch are almost universally
> a good idea, even for targets with a very powerful branch predictor, as
> eliminating the branch removes a basic block boundary, which may help
> scheduling, combine, and other RTL optimizers.
>
> Bootstrapped on x86-64 and aarch64 as part of the full sequence.
>
> OK?
>
> Thanks,
> James
>
> ---
>
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* ifcvt.c (noce_if_info): New field: max_seq_cost.
> 	(noce_estimate_conversion_profitable_p): New.
> 	(noce_try_store_flag_constants): Use it.
> 	(noce_try_addcc): Likewise.
> 	(noce_try_store_flag_mask): Likewise.
> 	(noce_try_cmove): Likewise.
> 	(noce_try_cmove_arith): Likewise.
> 	(noce_find_if_block): Record targetm.max_noce_ifcvt_seq_cost.
>
LGTM.

jeff
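
The refactoring described in the quoted patch can be checked with a small model. The model makes two assumptions, since the helper body is not shown here. First, the new helper compares `ninsns * COSTS_N_INSNS (1)` against `max_seq_cost`, which is what the description implies. Second, the old code compared `branch_cost >= ninsns`. Under these assumptions, whenever the limit is BRANCH_COST * COSTS_N_INSNS (1), both tests accept exactly the same cases. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Old style: a magic instruction count compared against BRANCH_COST.  */
static bool
model_old_magic_check (unsigned int branch_cost, unsigned int ninsns)
{
  return branch_cost >= ninsns;
}

/* New style (assumed form of the factored-out helper): scale the
   instruction count into RTX cost units and compare with the limit.  */
static bool
model_estimate_conversion_profitable_p (unsigned int max_seq_cost,
                                        unsigned int ninsns)
{
  assert (ninsns > 0);
  return ninsns * COSTS_N_INSNS (1) <= max_seq_cost;
}
```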

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 3/6 v2] Remove if_info->branch_cost
  2016-06-21 15:51                             ` [RFC: Patch 3/6 v2] Remove if_info->branch_cost James Greenhalgh
@ 2016-07-13 21:19                               ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-07-13 21:19 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven

On 06/21/2016 09:50 AM, James Greenhalgh wrote:
>
> Hi,
>
> This patch removes what is left of branch_cost uses, moving them to use
> the new hook and tagging each left over spot with a TODO to revisit them.
> All these uses are in rtx costs units, so we don't have more work to do at
> this point.
>
> Bootstrapped as part of the patch series on aarch64 and x86-64.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* ifcvt.c (noce_if_info): Remove branch_cost.
> 	(noce_try_store_flag_mask): Use max_seq_cost rather than
> 	branch_cost, tag as a TODO.
> 	(noce_try_cmove_arith): Likewise.
> 	(noce_convert_multiple_sets): Likewise.
> 	(bb_ok_for_noce_convert_multiple_sets): Likewise.
> 	(noce_find_if_block): Remove set of branch_cost.
>
LGTM

Jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith
  2016-06-21 15:53                             ` [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith James Greenhalgh
@ 2016-07-13 21:22                               ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-07-13 21:22 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven

On 06/21/2016 09:50 AM, James Greenhalgh wrote:
>
> Hi,
>
> This patch clears up the cost model for noce_try_cmove_arith. We lose
> the "??? FIXME: Magic number 5" comment, and gain a more realistic cost
> model for if-converting memory accesses.
>
> This is the patch that has the chance to cause the largest behavioural
> changes for most targets. The current heuristic does not take into
> consideration the cost of a conditional move; once we add that, the cost
> of the converted sequence often looks higher than we allowed before.
>
> I think that missing the cost of the conditional move from these sequences
> is not a good idea, and that the cost model should rely on the target giving
> back good information. A target that finds tests failing after this patch
> should consider either reducing the cost of a conditional move sequence, or
> increasing TARGET_MAX_NOCE_IFCVT_SEQ_COST.
>
> As this ups the cost of if-convert dramatically, I've used the new
> parameters to ensure that the tests in the testsuite continue to pass on
> all targets.
>
> Bootstrapped in series on aarch64 and x86-64.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* ifcvt.c (noce_try_cmove_arith): Check costs after constructing
> 	new sequence.
>
> gcc/testsuite/
>
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* gcc.dg/ifcvt-2.c: Use parameter to guide if-conversion heuristics.
> 	* gcc.dg/ifcvt-3.c: Use parameter to guide if-conversion heuristics.
> 	* gcc.dg/pr68435.c: Use parameter to guide if-conversion heuristics.
>
LGTM as well.  And yes, the cost of the cmove needs to be accounted for.
Thanks for doing something sensible on those tests.  I would support
similar tweaks if we find others after installing this series.

jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets
  2016-06-21 15:51                             ` [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets James Greenhalgh
@ 2016-07-13 21:23                               ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-07-13 21:23 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven

On 06/21/2016 09:50 AM, James Greenhalgh wrote:
>
> Hi,
>
> This patch rewrites the cost model for bb_ok_for_noce_convert_multiple_sets
> to use the max_seq_cost heuristic added in earlier patch revisions.
>
> As with the previous patch, I've used the new parameters to ensure that
> the testsuite is still testing the functionality rather than relying on
> the target setting the costs appropriately.
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* ifcvt.c (noce_convert_multiple_sets): Move cost model to here,
> 	check the sequence cost after constructing the converted sequence.
> 	(bb_ok_for_noce_convert_multiple_sets): Move cost model.
>
> gcc/testsuite/
>
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* gcc.dg/ifcvt-4.c: Use parameter to guide if-conversion heuristics.
> 	* gcc.dg/ifcvt-5.c: Use parameter to guide if-conversion heuristics.
>
OK.
jeff
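
Here is a source-level picture of what the multiple-sets conversion produces for the std::swap idiom that opened this thread. The function names are illustrative only. Each SET becomes a select into a fresh temporary, computed from the values as they were before the block. Only then are the results copied back. The reworked cost model applies seq_cost to that whole emitted sequence. Roughly, each SET in the block therefore adds one conditional move to the costed total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Branchy original: swap D and E when D > E.  */
static void
swap_if_greater_branchy (int *d, int *e)
{
  if (*d > *e)
    {
      int tmp = *d;
      *d = *e;
      *e = tmp;
    }
}

/* If-converted shape: every SET becomes a select whose inputs are the
   values from before the block, written to a fresh temporary.  The
   trailing copies are what register allocation is expected to remove.  */
static void
swap_if_greater_selects (int *d, int *e)
{
  assert (d != NULL && e != NULL);
  bool c = *d > *e;
  int d1 = c ? *e : *d;
  int e1 = c ? *d : *e;
  *d = d1;
  *e = e1;
}
```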

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask
  2016-06-21 15:53                             ` [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
@ 2016-07-13 21:24                               ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-07-13 21:24 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven

On 06/21/2016 09:50 AM, James Greenhalgh wrote:
>
> Hi,
>
> This transformation tries two cost models, one estimating the number
> of insns to use, one estimating the RTX cost of the transformed sequence.
> This is inconsistent with the other cost models used in ifcvt.c and
> unnecessary - eliminate the second cost model.
>
> Thanks,
> James
>
> ---
> 2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* ifcvt.c (noce_try_store_flag_mask): Delete redundant cost model.
OK.

At this point I think the series is fully ack'd.  Given folks have had 
nearly a month to object to the overall direction, I think you should go 
ahead and commit.  We can deal with any fallout at the target maintainer 
level.

jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost
  2016-07-13 21:16                             ` Jeff Law
@ 2016-07-20  9:52                               ` James Greenhalgh
  2016-07-20  9:52                                 ` [Patch RFC: 3/2 v3] Don't expand a conditional move between identical sources James Greenhalgh
                                                   ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-07-20  9:52 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven, rep.dot.nop, law

[-- Attachment #1: Type: text/plain, Size: 10391 bytes --]


Splicing replies to Bernd, Bernhard and Jeff.

Jeff, thanks for reviewing the patch set, I appreciate the ack, though I've
held off committing while I was working through Bernd's criticism of the
size cost model that this patch introduced and trying to get that right.
Sorry to cause extra reviewing work, but I have respun the patch set to
try to improve the consistency of how we're costing things, and to better
handle the size cases. I'm a bit happier with how it has turned out and I
think the approach is now a little easier to justify. Hopefully it will
still be acceptable for trunk.

There are essentially two families of cost models in this file. The true
before/after comparisons (noce_cmove_arith, noce_convert_multiple_sets),
and the "magic numbers" comparisons (noce_try_store_flag_constants,
noce_try_addcc, noce_try_store_flag_mask, noce_try_cmove). In the first
revisions of this patch set, I refactored the magic numbers comparisons,
but I didn't try to solve their "magic" as comparing two integers was
a suitably fast routine, and the comparison seemed accurate enough.
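For readers less familiar with ifcvt.c, the difference between the two families can be sketched in a few lines of C. This is a toy model under stated assumptions, not code from ifcvt.c: magic_number_ok and before_after_ok are hypothetical helpers, and COSTS_N_INSNS is redefined locally with the (N) * 4 scaling from GCC's rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for GCC's COSTS_N_INSNS from rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* "Magic number" family (hypothetical helper): the transform checks the
   target's BRANCH_COST against a fixed threshold and never looks at the
   instructions it is about to emit.  */
static bool
magic_number_ok (int branch_cost, int threshold)
{
  return branch_cost >= threshold;
}

/* "Before/after" family (hypothetical helper): build the candidate
   sequence first, cost it, then compare with the code it replaces.  */
static bool
before_after_ok (unsigned int new_seq_cost, unsigned int original_cost)
{
  return new_seq_cost <= original_cost;
}
```

With BRANCH_COST at 2 and a threshold of 2, the first check accepts the transform however long the emitted sequence is; the second rejects a sequence costing COSTS_N_INSNS (4) that replaces COSTS_N_INSNS (3) of original code.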

But the magic numbers are potentially inaccurate for a variable-length
instruction architecture, and given the number of times we actually manage to
spot these if-convert opportunities, the compile time overhead of moving
every cost model to a before/after comparison is probably not all that
high. Then we have everything going through one single function, making

Additionally, if we can rework most of the costs to actually calculate
the before/after costs, we can then drop the "size" case from this hook
entirely - we can just look at the size of the sequences directly rather
than asking the target to guess at an acceptable size growth.

This is good as it will completely remove magic numbers from ifcvt and
make everything dependent on a simple question to the target, when
compiling for speed: "What is the maximum cost of extra execution that
you'd like to see on the unconditional path?"

Unfortunately, disentangling this makes it harder to lay out the patch set
quite as neatly as before. The changes follow the same structure, but I've
had to squash all the cost changes into patch 2/2. Fortunately these now
look reasonably mechanical, and consequently the patch is not much more
difficult to review.

Patches 3/2 and 4/2 are not strictly needed as part of the cost model work,
but they do help the cost model by performing some simplifications early.
This reduces the chance of us rejecting if-conversion based on too many
simple moves that a future pass would have cleared up anyway. The csibe
numbers below rely on these two patches having been applied. Without them,
we get a couple of decisions wrong and some files from csibe increase
by < 3%.

On Tue, Jun 21, 2016 at 11:30:17PM +0200, Bernhard Reutner-Fischer wrote:
>
> >For the default implementation, if the parameters are not set, I just
> >multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
> >COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still
> >short
> >of ideas on how best to form the default implementation.
>
> How bad is it in e.g. CSiBE?

I'm not completely sure I've set it up right, but these are the >0.5% size
differences for an x86_64 compiler I built last Friday using -Os:

Smaller:

Relative size	Test name

93.33	flex-2.5.31,tables_shared
94.37	teem-1.6.0-src,src/limn/qn
97.27	teem-1.6.0-src,src/nrrd/kernel
98.31	teem-1.6.0-src,src/ten/miscTen
98.60	teem-1.6.0-src,src/ell/genmat
98.69	teem-1.6.0-src,src/nrrd/measure
99.03	teem-1.6.0-src,src/ten/mod
99.04	libpng-1.2.5,pngwtran
99.08	jpeg-6b,jdcoefct
99.14	teem-1.6.0-src,src/dye/convertDye
99.15	teem-1.6.0-src,src/ten/glyph
99.16	teem-1.6.0-src,src/bane/gkmsPvg
99.20	teem-1.6.0-src,src/limn/splineEval
99.25	teem-1.6.0-src,src/nrrd/accessors
99.28	teem-1.6.0-src,src/hest/parseHest
99.33	teem-1.6.0-src,src/limn/transform
99.40	teem-1.6.0-src,src/alan/coreAlan
99.48	teem-1.6.0-src,src/air/miscAir

Larger:

Relative size	Test name

101.43	teem-1.6.0-src,src/ten/tendEvec
101.57	teem-1.6.0-src,src/ten/tendEval

However, the total size difference is indistinguishable from noise
(< 0.08%).

Running the same experiment with an AArch64 cross compiler, I get the
following changes:

Smaller:

Relative size	Test name

97.78	libpng-1.2.5,pngrio
98.02	libpng-1.2.5,pngwio
98.82	replaypc-0.4.0.preproc,ReplayPC
99.21	lwip-0.5.3.preproc,src/core/inet
99.48	jpeg-6b,wrppm

Larger:

Relative size	Test name

100.52	jpeg-6b,wrbmp
100.82	libpng-1.2.5,pngwtran
100.91	zlib-1.1.4,infcodes

And the overall size difference was tiny (< 0.01%).

There were no >0.5% changes for the ARM port (expected as it doesn't use
noce).

I looked into each of the regressions, and generally they occur where
we are relying on a future pass to clean up after us. This is especially true
for the large x86_64 regressions, which as far as I can see are a
consequence of x86_64's floating-point conditional move expanding out to
bitwise operations. Taken individually, these look huge, but when you have
multiple conditional moves feeding each other some of the bitwise
expressions simplify and you get a size saving. We can't model that in our
cost model, and in many ways we just got lucky previously.
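A rough illustration of the x86_64 point above, assuming the usual SSE idiom of a compare that yields an all-ones or all-zeros mask followed by AND, AND-NOT and OR: the C below performs the same mask-and-blend on the bit patterns of two doubles. mask_select is a hypothetical name, and this is an analogy for the expanded sequence rather than what GCC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (double) == sizeof (uint64_t),
                "this sketch assumes 64-bit doubles");

/* Branch-free select: returns COND ? A : B using only bitwise operations
   on the 64-bit patterns of the doubles, the same shape as a
   compare/and/andn/or sequence.  */
static double
mask_select (int cond, double a, double b)
{
  uint64_t ua, ub;
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);

  /* All ones if COND is nonzero, all zeros otherwise.  */
  uint64_t mask = (uint64_t) 0 - (uint64_t) (cond != 0);

  /* Keep A's bits where the mask is set, B's bits elsewhere.  */
  uint64_t ur = (ua & mask) | (ub & ~mask);

  double r;
  memcpy (&r, &ur, sizeof r);
  return r;
}
```

One select needs the mask plus three bitwise operations, which is why it looks expensive in isolation; when several selects feed each other or share a condition, some of that work (for example, the mask) can be reused or simplified away later.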

> s/precitable/predictable/ ?

This, and all your other comments regarding spelling and grammar have been
fixed. Thanks.

On Thu, Jun 30, 2016 at 01:58:52PM +0200, Bernd Schmidt wrote:
> On 06/21/2016 05:50 PM, James Greenhalgh wrote:
> >For the default implementation, if the parameters are not set, I just
> >multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
> >COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
> >of ideas on how best to form the default implementation.
>
> Yeah, this does seem kind of arbitrary. It looks especially odd
> considering that BRANCH_COST is likely to already vary between the
> size/speed cases. What's wrong with just multiplying through by
> CNI(1)?
>
> I'm not sure we want params for this; targets should just eventually
> upgrade their cost models.

We've used params in the past as a migration path, they are particularly
handy in this case as they allow us to override target settings when in the
testsuite.

Removing these would be easy, but I've left them in for now, as I like
the three tiered flexibility:

  * Target does nothing - hook uses BRANCH_COST,
  * Target only needs a simple model so sets params - hook uses params
  * Target thinks it can do something very smart - target implements hook
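The three tiers can be modelled as a simple fallback chain. The sketch below is hypothetical (struct toy_target and the toy_* functions are not GCC names); it follows the speed-case logic of default_max_noce_ifcvt_seq_cost in this revision, where an explicitly set param wins and otherwise BRANCH_COST is scaled by COSTS_N_INSNS (3), and it lets a target-supplied hook override both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for GCC's COSTS_N_INSNS from rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of a target's settings.  */
struct toy_target
{
  int branch_cost;              /* Stand-in for BRANCH_COST.  */
  bool param_set;               /* Was the --param given explicitly?  */
  unsigned int param_value;     /* Its value, if so.  */
  /* Optional target override of the hook; NULL means "use default".  */
  unsigned int (*max_seq_cost_hook) (const struct toy_target *);
};

/* Tiers 1 and 2: prefer an explicitly set param, else scale
   BRANCH_COST into RTX cost units.  */
static unsigned int
toy_default_max_seq_cost (const struct toy_target *t)
{
  if (t->param_set)
    return t->param_value;
  return (unsigned int) (t->branch_cost * COSTS_N_INSNS (3));
}

/* Example tier-3 override for a hypothetical target that always allows
   up to five instructions' worth of cost.  */
static unsigned int
toy_smart_target_hook (const struct toy_target *t)
{
  (void) t;
  return COSTS_N_INSNS (5);
}

/* Entry point: a target with its own hook wins outright.  */
static unsigned int
toy_max_seq_cost (const struct toy_target *t)
{
  if (t->max_seq_cost_hook != NULL)
    return t->max_seq_cost_hook (t);
  return toy_default_max_seq_cost (t);
}
```

With BRANCH_COST of 2 and nothing else set, the limit is 2 * COSTS_N_INSNS (3) = 24; setting the param to 40 yields 40; a target hook returning COSTS_N_INSNS (5) yields 20 regardless of the param.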

> >The new default causes some changes in generated conditional move sequences
> >for x86_64. Whether these changes are for the better or not I can't say.
>
> How about arm/aarch64? I think some benchmark results might be good to have.

The ARM port isn't very interesting as it has conditional execution and
therefore mostly uses other paths through this file.

On AArch64 I get an increase in the number of CSEL and FCSEL instructions
generated when compiling Spec2006. With the current definition of
BRANCH_COST for AArch64 we lose some important if-cvt opportunities, but
these can be restored by setting the BRANCH_COST for predictable branches
higher. With this done I see some small improvements in Spec2006, but
nothing meaningful and no regressions. This is probably exactly where I want
to be with this patch set - no change is a good thing.

> Bernhard already pointed out some issues with the patch; I'll omit these.

As mentioned above, I've fixed Bernhard's issues in this patch revision.

> >+(max_noce_ifcvt_seq_cost,
> >+ "This hook should return a value in the same units as\n\
> >+@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for\n\
> >+a sequence generated by the RTL if-conversion pass when conditional\n\
> >+execution is not available.
>
> There's still the issue that we're also replacing instructions when
> doing if-conversion. Let's say in this case,
>
>  /* Convert "if (test) x = a; else x = b", for A and B constant.
>    Also allow A = y + c1, B = y + c2, with a common y between A
>    and B.  */
>
> we're removing two assignments for the purposes of optimizing for
> size, and one assignment when considering optimization for speed.
> This needs to factor into the cost calculations somehow if we want
> to do it properly. I think we can leave the hook as-is, but maybe
> add documentation to the effect of "The caller should increase the
> limit by the cost of whatever instructions are removed in the
> transformation."

Yes, I see what you mean. Hopefully I've addressed this in this patchset
revision.
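Bernd's suggestion can be sketched as follows, assuming all costs are in COSTS_N_INSNS units; toy_within_adjusted_limit is a hypothetical helper, not part of the patch. The limit returned by the hook is raised by the cost of the instructions that if-conversion deletes before the new sequence is compared against it.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for GCC's COSTS_N_INSNS from rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: accept the new sequence if it fits under the
   hook's limit once the cost of the removed instructions is credited
   back.  */
static bool
toy_within_adjusted_limit (unsigned int new_seq_cost,
                           unsigned int hook_limit,
                           unsigned int removed_cost)
{
  return new_seq_cost <= hook_limit + removed_cost;
}
```

For the speed case quoted above, where one assignment is removed, a sequence costing COSTS_N_INSNS (3) passes against a hook limit of COSTS_N_INSNS (2) only once the removed COSTS_N_INSNS (1) is credited back.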

>
> >+/* Default implementation of TARGET_RTX_BRANCH_COST.  */
>
> Wrong name for the hook.

Thanks, fixed.

> >+unsigned int
> >+default_max_noce_ifcvt_seq_cost (bool speed_p, edge e)
> >+{
> >+  bool predictable_p = predictable_edge_p (e);
> >+  /* For size, some targets like to set a BRANCH_COST of zero to disable
> >+     ifcvt, continue to allow that.  Then multiply through by
> >+     COSTS_N_INSNS (1) so we're in a comparable base.  */
> >+
> >+  if (!speed_p)
> >+    return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1);
>
> Blank line before the comment would be more readable.

Fixed by virtue of removing this code.

> >+  enum compiler_param param = predictable_p
> >+			      ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST
> >+			      : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST;
>
> When splitting expressions across multiple lines, wrap in parens so
> that emacs formats them automatically.

Fixed.

This patch, and all others in this series bootstrapped and tested on x86_64
and aarch64 with no issues.

OK?

Thanks,
James

---

2016-07-20  James Greenhalgh  <james.greenhalgh@arm.com>

	* target.def (max_noce_ifcvt_seq_cost): New.
	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
	* doc/invoke.texi: Document new params.


[-- Attachment #2: 0001-Re-RFC-Patch-1-2-v3-New-target-hook-max_noce_ifcvt_s.patch --]
[-- Type: text/x-patch;  name=0001-Re-RFC-Patch-1-2-v3-New-target-hook-max_noce_ifcvt_s.patch, Size: 7182 bytes --]

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9a4db38..94d2b48 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8865,6 +8865,17 @@ considered for if-conversion.  The default is 10, though the compiler will
 also use other heuristics to decide whether if-conversion is likely to be
 profitable.
 
+@item max-rtl-if-conversion-predictable-cost
+@itemx max-rtl-if-conversion-unpredictable-cost
+RTL if-conversion will try to remove conditional branches around a block
+and replace them with conditionally executed instructions.  These parameters
+give the maximum permissible cost for the sequence that would be generated
+by if-conversion depending on whether the branch is statically determined
+to be predictable or not.  The units for this parameter are the same as
+those for the GCC internal seq_cost metric.  The compiler will try to
+provide a reasonable default for this parameter using the BRANCH_COST
+target macro.
+
 @item max-crossjump-edges
 The maximum number of incoming edges to consider for cross-jumping.
 The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b318615..28fba6b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6526,6 +6526,26 @@ should probably only be given to addresses with different numbers of
 registers on machines with lots of registers.
 @end deftypefn
 
+@deftypefn {Target Hook} {unsigned int} TARGET_MAX_NOCE_IFCVT_SEQ_COST (edge @var{e})
+This hook returns a value in the same units as @code{TARGET_RTX_COSTS},
+giving the maximum acceptable cost for a sequence generated by the RTL
+if-conversion pass when conditional execution is not available.
+The RTL if-conversion pass attempts to convert conditional operations
+that would require a branch to a series of unconditional operations and
+@code{mov@var{mode}cc} insns.  This hook returns the maximum cost of the
+unconditional instructions and the @code{mov@var{mode}cc} insns.
+RTL if-conversion is cancelled if the cost of the converted sequence
+is greater than the value returned by this hook.
+
+@code{e} is the edge from the basic block containing the conditional
+branch to the basic block which would be executed if the condition
+were true.
+
+The default implementation of this hook uses the
+@code{max-rtl-if-conversion-[un]predictable-cost} parameters if they are set,
+and uses a multiple of @code{BRANCH_COST} otherwise.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
 This predicate controls the use of the eager delay slot filler to disallow
 speculatively executed instructions being placed in delay slots.  Targets
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1e8423c..d2b7f41 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4762,6 +4762,8 @@ Define this macro if a non-short-circuit operation produced by
 
 @hook TARGET_ADDRESS_COST
 
+@hook TARGET_MAX_NOCE_IFCVT_SEQ_COST
+
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 
 @node Scheduling
diff --git a/gcc/params.def b/gcc/params.def
index b86d592..166032e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1222,6 +1222,20 @@ DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_INSNS,
 	  "if-conversion.",
 	  10, 0, 99)
 
+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST,
+	  "max-rtl-if-conversion-predictable-cost",
+	  "Maximum permissible cost for the sequence that would be "
+	  "generated by the RTL if-conversion pass for a branch that "
+	  "is considered predictable.",
+	  20, 0, 200)
+
+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST,
+	  "max-rtl-if-conversion-unpredictable-cost",
+	  "Maximum permissible cost for the sequence that would be "
+	  "generated by the RTL if-conversion pass for a branch that "
+	  "is considered unpredictable.",
+	  40, 0, 200)
+
 DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
 	  "hsa-gen-debug-stores",
 	  "Level of hsa debug stores verbosity",
diff --git a/gcc/target.def b/gcc/target.def
index a4df363..b2139ce 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3572,6 +3572,30 @@ registers on machines with lots of registers.",
  int, (rtx address, machine_mode mode, addr_space_t as, bool speed),
  default_address_cost)
 
+/* Give a cost, in RTX Costs units, for an edge.  Like BRANCH_COST, but with
+   well defined units.  */
+DEFHOOK
+(max_noce_ifcvt_seq_cost,
+ "This hook returns a value in the same units as @code{TARGET_RTX_COSTS},\n\
+giving the maximum acceptable cost for a sequence generated by the RTL\n\
+if-conversion pass when conditional execution is not available.\n\
+The RTL if-conversion pass attempts to convert conditional operations\n\
+that would require a branch to a series of unconditional operations and\n\
+@code{mov@var{mode}cc} insns.  This hook returns the maximum cost of the\n\
+unconditional instructions and the @code{mov@var{mode}cc} insns.\n\
+RTL if-conversion is cancelled if the cost of the converted sequence\n\
+is greater than the value returned by this hook.\n\
+\n\
+@code{e} is the edge from the basic block containing the conditional\n\
+branch to the basic block which would be executed if the condition\n\
+were true.\n\
+\n\
+The default implementation of this hook uses the\n\
+@code{max-rtl-if-conversion-[un]predictable-cost} parameters if they are set,\n\
+and uses a multiple of @code{BRANCH_COST} otherwise.",
+unsigned int, (edge e),
+default_max_noce_ifcvt_seq_cost)
+
 /* Permit speculative instructions in delay slots during delayed-branch 
    scheduling.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 3e089e7..08136eb 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -74,6 +74,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "opts.h"
 #include "gimplify.h"
+#include "predict.h"
+#include "params.h"
 
 
 bool
@@ -1977,4 +1979,24 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Default implementation of TARGET_MAX_NOCE_IFCVT_SEQ_COST.  */
+
+unsigned int
+default_max_noce_ifcvt_seq_cost (edge e)
+{
+  bool predictable_p = predictable_edge_p (e);
+
+  enum compiler_param param
+    = (predictable_p
+       ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST
+       : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST);
+
+  /* If we have a parameter set, use that, otherwise take a guess using
+     BRANCH_COST.  */
+  if (global_options_set.x_param_values[param])
+    return PARAM_VALUE (param);
+  else
+    return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (3);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index d6581cf..b7b5ba3 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -255,4 +255,6 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern unsigned int default_max_noce_ifcvt_seq_cost (edge);
+
 #endif /* GCC_TARGHOOKS_H */


* [Patch RFC: 3/2 v3] Don't expand a conditional move between identical sources
  2016-07-20  9:52                               ` [Re: RFC: Patch 1/2 v3] " James Greenhalgh
@ 2016-07-20  9:52                                 ` James Greenhalgh
  2016-07-20  9:53                                 ` [RFC: Patch 2/2 v3] Introduce a new cost model for ifcvt James Greenhalgh
                                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-07-20  9:52 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven, rep.dot.nop, law

[-- Attachment #1: Type: text/plain, Size: 679 bytes --]


Hi,

This patch adds a short-circuit to optabs.c for the case where both
source operands are identical (i.e. we would be assigning the same
value in both branches).
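The short-circuit can be illustrated outside GCC with a toy instruction builder. In this sketch strings stand in for rtx operands and strcmp stands in for rtx_equal_p; toy_emit_conditional_move and struct toy_insn are hypothetical names. As in the patch, when both sources are identical the result is a plain move of the second source (op3).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy IR: operands are register names.  */
enum toy_insn_kind { TOY_MOVE, TOY_CMOVE };

struct toy_insn
{
  enum toy_insn_kind kind;
  const char *dest;
  const char *src_true;   /* Value when the condition holds.  */
  const char *src_false;  /* Value otherwise; NULL for TOY_MOVE.  */
};

/* Build "dest = cond ? op2 : op3", collapsing to a plain move when both
   sources are identical.  */
static struct toy_insn
toy_emit_conditional_move (const char *dest, const char *op2,
                           const char *op3)
{
  struct toy_insn insn;
  insn.dest = dest;
  if (strcmp (op2, op3) == 0)
    {
      /* Both arms assign the same value: that's just a move.  */
      insn.kind = TOY_MOVE;
      insn.src_true = op3;
      insn.src_false = NULL;
    }
  else
    {
      insn.kind = TOY_CMOVE;
      insn.src_true = op2;
      insn.src_false = op3;
    }
  return insn;
}
```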

This can show up for the memory optimisation in noce_cmove_arith in ifcvt.c,
if both branches would load from the same address. This is an odd situation
to arise. It showed up in my csibe runs, but I couldn't reproduce it in
a small test case.

Bootstrapped on x86_64-none-linux-gnu and aarch64-none-linux-gnu with no
issues.

OK?

Thanks,
James

---
2016-07-20  James Greenhalgh  <james.greenhalgh@arm.com>

	* optabs.c (emit_conditional_move): Short circuit for identical
	sources.


[-- Attachment #2: 0003-Patch-RFC-3-2-v3-Don-t-expand-a-conditional-move-bet.patch --]
[-- Type: text/x-patch;  name=0003-Patch-RFC-3-2-v3-Don-t-expand-a-conditional-move-bet.patch, Size: 644 bytes --]

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 51e10e2..87b4f97 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4214,6 +4214,17 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   enum insn_code icode;
   enum rtx_code reversed;
 
+  /* If the two source operands are identical, that's just a move.  */
+
+  if (rtx_equal_p (op2, op3))
+    {
+      if (!target)
+	target = gen_reg_rtx (mode);
+
+      emit_move_insn (target, op3);
+      return target;
+    }
+
   /* If one operand is constant, make it the second one.  Only do this
      if the other operand is not constant as well.  */
 


* [RFC: Patch 2/2 v3] Introduce a new cost model for ifcvt.
  2016-07-20  9:52                               ` [Re: RFC: Patch 1/2 v3] " James Greenhalgh
  2016-07-20  9:52                                 ` [Patch RFC: 3/2 v3] Don't expand a conditional move between identical sources James Greenhalgh
@ 2016-07-20  9:53                                 ` James Greenhalgh
  2016-07-20  9:53                                 ` [Patch RFC 4/2 v3] Refactor noce_try_cmove_arith James Greenhalgh
  2016-07-20 11:41                                 ` [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost Bernd Schmidt
  3 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-07-20  9:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven, rep.dot.nop, law

[-- Attachment #1: Type: text/plain, Size: 3813 bytes --]


Hi,

This patch modifies the way we calculate costs in ifcvt.c. Rather than
using a combination of magic numbers and approximations to decide if we
should perform the transformation before constructing the new RTL, we
instead construct the new RTL and use the cost of that to form our cost model.

We want slightly different behaviour when compiling for speed than what we
want when compiling for size.

For size, we just want to look at what the size of code would have been before
the transformation, and what we plan to generate now. We need a little bit of
guesswork to try to figure out what cost to assign to the compare (for which we
don't keep track of the full insn) and branch (which insn_rtx_cost won't
handle), but otherwise the cost model is easy to calculate.

For speed, we want to use the max_noce_ifcvt_seq_cost hook defined in
patch 1/2. Here we don't care about the original cost; our hook is defined
in terms of how expensive the instructions which are brought on to the
unconditional path are permitted to be. For speed then, we have a simple
numerical comparison between the new cost and the cost returned by the
hook.
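The speed and size rules above amount to one comparison each. A minimal sketch, assuming a cut-down struct toy_if_info (hypothetical, mirroring the speed_p, original_cost and max_seq_cost fields introduced in the patch below) and costs in COSTS_N_INSNS units:

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for GCC's COSTS_N_INSNS from rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical cut-down if_info holding only what the decision needs.  */
struct toy_if_info
{
  bool speed_p;                /* Optimizing this block for speed?  */
  unsigned int original_cost;  /* Cost of the code being replaced.  */
  unsigned int max_seq_cost;   /* Limit from the target hook.  */
};

/* For size: the new sequence must be no bigger than the original.
   For speed: the new sequence must fit under the target's limit, and
   the original cost is deliberately ignored.  */
static bool
toy_conversion_profitable_p (unsigned int new_seq_cost,
                             const struct toy_if_info *info)
{
  if (!info->speed_p)
    return new_seq_cost <= info->original_cost;
  return new_seq_cost <= info->max_seq_cost;
}
```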

To achieve this, first we abstract all the cost logic into
noce_conversion_profitable_p.  To get the size cost logic right, we need a few
modifications to the fields of noce_if_info. We're going to drop "then_cost"
and "else_cost", which will instead be covered by "original_cost" which is the
sum of these costs, plus an extra magic COSTS_N_INSNS (2) to cover a compare
and branch. We're going to drop branch_cost which was used by the old cost
model, and add max_seq_cost which is defined in the new model. Finally, we can
factor out the repeated calculation of speed_p, and just store it once in
noce_if_info. This last point fixes the inconsistency of which basic block
we check optimize_bb_for_speed_p against.

To build the sum for "original_cost" we need to update
bb_valid_for_noce_process_p such that it adds to the cost pointer it takes
rather than overwriting it.
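A sketch of how original_cost might be assembled under the new scheme, with hypothetical toy_* helpers standing in for bb_valid_for_noce_process_p and noce_find_if_block, and per-insn costs passed in as plain arrays. The key point is that the block-costing helper adds to *cost, so the THEN and ELSE blocks accumulate into one total on top of the COSTS_N_INSNS (2) guess for the compare and branch.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for GCC's COSTS_N_INSNS from rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical block-costing helper: ADD the block's insn costs to
   *COST rather than overwriting it.  */
static void
toy_add_bb_cost (const unsigned int *insn_costs, size_t n,
                 unsigned int *cost)
{
  for (size_t i = 0; i < n; i++)
    *cost += insn_costs[i];
}

/* original_cost = compare + branch guess + THEN_BB + ELSE_BB.
   ELSE_COSTS may be NULL when there is no else block.  */
static unsigned int
toy_original_cost (const unsigned int *then_costs, size_t n_then,
                   const unsigned int *else_costs, size_t n_else)
{
  unsigned int cost = COSTS_N_INSNS (2);
  toy_add_bb_cost (then_costs, n_then, &cost);
  if (else_costs != NULL)
    toy_add_bb_cost (else_costs, n_else, &cost);
  return cost;
}
```

Two single-insn-cost instructions in THEN_BB and one in ELSE_BB give 8 + 4 + 4 + 4 = 20, i.e. COSTS_N_INSNS (5).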

Having done that, we need to update all the cost models in the file to
check for profitability just after we check that if-conversion has
succeeded.

Finally, we use the params added in 1/2 to allow us to do something
sensible with the testcases that look for if-conversion. With these tests
we only care that the mechanics would work if the cost model were permissive
enough, not that a target has actually set the cost model high enough, so
we just set the parameters to their maximum values.
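As an illustration of the testsuite approach, a testcase might pin both new params at their maximum of 200 (the upper bound given for them in params.def in patch 1/2) via dg-options. The file below is hypothetical and is not one of the testcases touched by this patch; it only shows where the params would go.

```c
/* { dg-do compile } */
/* { dg-options "-O2 --param max-rtl-if-conversion-predictable-cost=200 --param max-rtl-if-conversion-unpredictable-cost=200" } */

/* Hypothetical testcase: with both limits at their maximum, the cost
   model should not veto the transform, so a test like this exercises
   only the if-conversion mechanics.  */
int
select_value (int a, int b, int x)
{
  if (a > b)
    x = 5;
  return x;
}
```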

Bootstrapped on x86-64 and aarch64.

OK?

Thanks,
James

---

gcc/

2016-07-20  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_if_info): New fields: speed_p, original_cost,
	max_seq_cost.  Removed fields: then_cost, else_cost, branch_cost.
	(noce_conversion_profitable_p): New.
	(noce_try_store_flag_constants): Use it.
	(noce_try_addcc): Likewise.
	(noce_try_store_flag_mask): Likewise.
	(noce_try_cmove): Likewise.
	(noce_try_cmove_arith): Likewise.
	(bb_valid_for_noce_process_p): Add to the cost parameter rather than
	overwriting it.
	(noce_convert_multiple_sets): Move cost model to here, from...
	(bb_ok_for_noce_convert_multiple_sets) ...here.
	(noce_process_if_block): Update calls for above changes.
	(noce_find_if_block): Record new noce_if_info parameters.

gcc/testsuite/

2016-07-18  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.dg/ifcvt-2.c: Use parameter to guide if-conversion heuristics.
	* gcc.dg/ifcvt-3.c: Use parameter to guide if-conversion heuristics.
	* gcc.dg/pr68435.c: Use parameter to guide if-conversion heuristics.
	* gcc.dg/ifcvt-4.c: Use parameter to guide if-conversion heuristics.
	* gcc.dg/ifcvt-5.c: Use parameter to guide if-conversion heuristics.


[-- Attachment #2: 0002-RFC-Patch-2-2-v3-Introduce-a-new-cost-model-for-ifcv.patch --]
[-- Type: text/x-patch;  name=0002-RFC-Patch-2-2-v3-Introduce-a-new-cost-model-for-ifcv.patch, Size: 17869 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a92ab6d..4e3d8f3 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -807,12 +807,17 @@ struct noce_if_info
   bool then_simple;
   bool else_simple;
 
-  /* The total rtx cost of the instructions in then_bb and else_bb.  */
-  unsigned int then_cost;
-  unsigned int else_cost;
+  /* True if we're optimizing the control block for speed, false if
+     we're optimizing for size.  */
+  bool speed_p;
 
-  /* Estimated cost of the particular branch instruction.  */
-  unsigned int branch_cost;
+  /* The combined cost of COND, JUMP and the costs for THEN_BB and
+     ELSE_BB.  */
+  unsigned int original_cost;
+
+  /* Maximum permissible cost for the unconditional sequence we should
+     generate to replace this branch.  */
+  unsigned int max_seq_cost;
 
   /* The name of the noce transform that succeeded in if-converting
      this structure.  Used for debugging.  */
@@ -835,6 +840,27 @@ static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
 
+/* Return TRUE if SEQ is a good candidate as a replacement for the
+   if-convertible sequence described in IF_INFO.  */
+
+inline static bool
+noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  bool speed_p = if_info->speed_p;
+
+  /* Cost up the new sequence.  */
+  unsigned int cost = seq_cost (seq, speed_p);
+
+  /* When compiling for size, we can make a reasonably accurate guess
+     at the size growth.  */
+  if (!speed_p)
+    {
+      return cost <= if_info->original_cost;
+    }
+  else
+    return cost <= if_info->max_seq_cost;
+}
+
 /* Helper function for noce_try_store_flag*.  */
 
 static rtx
@@ -1319,8 +1345,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
          registers where we handle overlap below.  */
       && (REG_P (XEXP (a, 0))
 	  || (noce_operand_ok (XEXP (a, 0))
-	      && ! reg_overlap_mentioned_p (if_info->x, XEXP (a, 0))))
-      && if_info->branch_cost >= 2)
+	      && ! reg_overlap_mentioned_p (if_info->x, XEXP (a, 0)))))
     {
       common = XEXP (a, 0);
       a = XEXP (a, 1);
@@ -1393,22 +1418,24 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  else
 	    gcc_unreachable ();
 	}
+      /* Is this (cond) ? 2^n : 0?  */
       else if (ifalse == 0 && exact_log2 (itrue) >= 0
-	       && (STORE_FLAG_VALUE == 1
-		   || if_info->branch_cost >= 2))
+	       && STORE_FLAG_VALUE == 1)
 	normalize = 1;
+      /* Is this (cond) ? 0 : 2^n?  */
       else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
-	       && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
+	       && STORE_FLAG_VALUE == 1)
 	{
 	  normalize = 1;
 	  reversep = true;
 	}
+      /* Is this (cond) ? -1 : x?  */
       else if (itrue == -1
-	       && (STORE_FLAG_VALUE == -1
-		   || if_info->branch_cost >= 2))
+	       && STORE_FLAG_VALUE == -1)
 	normalize = -1;
+      /* Is this (cond) ? x : -1?  */
       else if (ifalse == -1 && can_reverse
-	       && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
+	       && STORE_FLAG_VALUE == -1)
 	{
 	  normalize = -1;
 	  reversep = true;
@@ -1497,7 +1524,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	noce_emit_move_insn (if_info->x, target);
 
       seq = end_ifcvt_sequence (if_info);
-      if (!seq)
+      if (!seq || !noce_conversion_profitable_p (seq, if_info))
 	return FALSE;
 
       emit_insn_before_setloc (seq, if_info->jump,
@@ -1551,7 +1578,7 @@ noce_try_addcc (struct noce_if_info *if_info)
 		noce_emit_move_insn (if_info->x, target);
 
 	      seq = end_ifcvt_sequence (if_info);
-	      if (!seq)
+	      if (!seq || !noce_conversion_profitable_p (seq, if_info))
 		return FALSE;
 
 	      emit_insn_before_setloc (seq, if_info->jump,
@@ -1564,10 +1591,10 @@ noce_try_addcc (struct noce_if_info *if_info)
 	}
 
       /* If that fails, construct conditional increment or decrement using
-	 setcc.  */
-      if (if_info->branch_cost >= 2
-	  && (XEXP (if_info->a, 1) == const1_rtx
-	      || XEXP (if_info->a, 1) == constm1_rtx))
+	 setcc.  We're changing a branch and an increment to a comparison and
+	 an ADD/SUB.  */
+      if (XEXP (if_info->a, 1) == const1_rtx
+	  || XEXP (if_info->a, 1) == constm1_rtx)
         {
 	  start_sequence ();
 	  if (STORE_FLAG_VALUE == INTVAL (XEXP (if_info->a, 1)))
@@ -1593,7 +1620,7 @@ noce_try_addcc (struct noce_if_info *if_info)
 		noce_emit_move_insn (if_info->x, target);
 
 	      seq = end_ifcvt_sequence (if_info);
-	      if (!seq)
+	      if (!seq || !noce_conversion_profitable_p (seq, if_info))
 		return FALSE;
 
 	      emit_insn_before_setloc (seq, if_info->jump,
@@ -1621,15 +1648,14 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
     return FALSE;
 
   reversep = 0;
-  if ((if_info->branch_cost >= 2
-       || STORE_FLAG_VALUE == -1)
-      && ((if_info->a == const0_rtx
-	   && rtx_equal_p (if_info->b, if_info->x))
-	  || ((reversep = (reversed_comparison_code (if_info->cond,
-						     if_info->jump)
-			   != UNKNOWN))
-	      && if_info->b == const0_rtx
-	      && rtx_equal_p (if_info->a, if_info->x))))
+
+  if ((if_info->a == const0_rtx
+       && rtx_equal_p (if_info->b, if_info->x))
+      || ((reversep = (reversed_comparison_code (if_info->cond,
+						 if_info->jump)
+		       != UNKNOWN))
+	  && if_info->b == const0_rtx
+	  && rtx_equal_p (if_info->a, if_info->x)))
     {
       start_sequence ();
       target = noce_emit_store_flag (if_info,
@@ -1643,22 +1669,11 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
       if (target)
 	{
-	  int old_cost, new_cost, insn_cost;
-	  int speed_p;
-
 	  if (target != if_info->x)
 	    noce_emit_move_insn (if_info->x, target);
 
 	  seq = end_ifcvt_sequence (if_info);
-	  if (!seq)
-	    return FALSE;
-
-	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
-	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  old_cost = COSTS_N_INSNS (if_info->branch_cost) + insn_cost;
-	  new_cost = seq_cost (seq, speed_p);
-
-	  if (new_cost > old_cost)
+	  if (!seq || !noce_conversion_profitable_p (seq, if_info))
 	    return FALSE;
 
 	  emit_insn_before_setloc (seq, if_info->jump,
@@ -1827,9 +1842,7 @@ noce_try_cmove (struct noce_if_info *if_info)
 	 we don't know about, so give them a chance before trying this
 	 approach.  */
       else if (!targetm.have_conditional_execution ()
-		&& CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b)
-		&& ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
-		    || if_info->branch_cost >= 3))
+		&& CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b))
 	{
 	  machine_mode mode = GET_MODE (if_info->x);
 	  HOST_WIDE_INT ifalse = INTVAL (if_info->a);
@@ -1865,7 +1878,7 @@ noce_try_cmove (struct noce_if_info *if_info)
 		noce_emit_move_insn (if_info->x, target);
 
 	      seq = end_ifcvt_sequence (if_info);
-	      if (!seq)
+	      if (!seq || !noce_conversion_profitable_p (seq, if_info))
 		return FALSE;
 
 	      emit_insn_before_setloc (seq, if_info->jump,
@@ -2078,11 +2091,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
      conditional on their addresses followed by a load.  Don't do this
      early because it'll screw alias analysis.  Note that we've
      already checked for no side effects.  */
-  /* ??? FIXME: Magic number 5.  */
   if (cse_not_expected
       && MEM_P (a) && MEM_P (b)
-      && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
-      && if_info->branch_cost >= 5)
+      && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b))
     {
       machine_mode address_mode = get_address_mode (a);
 
@@ -2114,23 +2125,6 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!can_conditionally_move_p (x_mode))
     return FALSE;
 
-  unsigned int then_cost;
-  unsigned int else_cost;
-  if (insn_a)
-    then_cost = if_info->then_cost;
-  else
-    then_cost = 0;
-
-  if (insn_b)
-    else_cost = if_info->else_cost;
-  else
-    else_cost = 0;
-
-  /* We're going to execute one of the basic blocks anyway, so
-     bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
-    return FALSE;
-
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
     {
@@ -2319,7 +2313,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
     noce_emit_move_insn (x, target);
 
   ifcvt_seq = end_ifcvt_sequence (if_info);
-  if (!ifcvt_seq)
+  if (!ifcvt_seq || !noce_conversion_profitable_p (ifcvt_seq, if_info))
     return FALSE;
 
   emit_insn_before_setloc (ifcvt_seq, if_info->jump,
@@ -2805,7 +2799,7 @@ noce_try_sign_mask (struct noce_if_info *if_info)
      && (if_info->insn_b == NULL_RTX
 	 || BLOCK_FOR_INSN (if_info->insn_b) == if_info->test_bb));
   if (!(t_unconditional
-	|| (set_src_cost (t, mode, optimize_bb_for_speed_p (if_info->test_bb))
+	|| (set_src_cost (t, mode, if_info->speed_p)
 	    < COSTS_N_INSNS (2))))
     return FALSE;
 
@@ -3034,8 +3028,8 @@ contains_mem_rtx_p (rtx x)
    x := a and all previous computations
    in TEST_BB don't produce any values that are live after TEST_BB.
    In other words, all the insns in TEST_BB are there only
-   to compute a value for x.  Put the rtx cost of the insns
-   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   to compute a value for x.  Add the rtx cost of the insns
+   in TEST_BB to COST.  Record whether TEST_BB is a single simple
    set instruction in SIMPLE_P.  */
 
 static bool
@@ -3067,7 +3061,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   if (first_insn == last_insn)
     {
       *simple_p = noce_operand_ok (SET_DEST (first_set));
-      *cost = insn_rtx_cost (first_set, speed_p);
+      *cost += insn_rtx_cost (first_set, speed_p);
       return *simple_p;
     }
 
@@ -3114,7 +3108,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
     goto free_bitmap_and_fail;
 
   BITMAP_FREE (test_bb_temps);
-  *cost = potential_cost;
+  *cost += potential_cost;
   *simple_p = false;
   return true;
 
@@ -3290,9 +3284,15 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   for (int i = 0; i < count; i++)
     noce_emit_move_insn (targets[i], temporaries[i]);
 
-  /* Actually emit the sequence.  */
+  /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
 
+  if (!noce_conversion_profitable_p (seq, if_info))
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
   for (insn = seq; insn; insn = NEXT_INSN (insn))
     set_used_flags (insn);
 
@@ -3342,22 +3342,16 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
 /* Return true iff basic block TEST_BB is comprised of only
    (SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  FORNOW: Use II to find the expected cost of
-   the branch into/over TEST_BB.
-
-   TODO: This creates an implicit "magic number" for branch_cost.
-   II->branch_cost now guides the maximum number of set instructions in
-   a basic block which is considered profitable to completely
-   if-convert.  */
+   of conditional moves.  Also check that we have more than one set
+   (other routines can handle a single set better than we would), and
+   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
 
 static bool
-bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
-				      struct noce_if_info *ii)
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
 {
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS);
-  unsigned limit = MIN (ii->branch_cost, param);
 
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -3393,14 +3387,15 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
       if (!can_conditionally_move_p (GET_MODE (dest)))
 	return false;
 
-      /* FORNOW: Our cost model is a count of the number of instructions we
-	 would if-convert.  This is suboptimal, and should be improved as part
-	 of a wider rework of branch_cost.  */
-      if (++count > limit)
-	return false;
+      count++;
     }
 
-  return count > 1;
+  /* If we would only put out one conditional move, the other strategies
+     this pass tries are better optimized and will be more appropriate.
+     Some targets want to strictly limit the number of conditional moves
+     that are emitted; they set this through PARAM, and we need to respect
+     that.  */
+  return count > 1 && count <= param;
 }
 
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
@@ -3436,7 +3431,7 @@ noce_process_if_block (struct noce_if_info *if_info)
   if (!else_bb
       && HAVE_conditional_move
       && !HAVE_cc0
-      && bb_ok_for_noce_convert_multiple_sets (then_bb, if_info))
+      && bb_ok_for_noce_convert_multiple_sets (then_bb))
     {
       if (noce_convert_multiple_sets (if_info))
 	{
@@ -3447,12 +3442,12 @@ noce_process_if_block (struct noce_if_info *if_info)
 	}
     }
 
-  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->original_cost,
 				    &if_info->then_simple))
     return false;
 
   if (else_bb
-      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->original_cost,
 				      &if_info->else_simple))
     return false;
 
@@ -3983,6 +3978,7 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   rtx cond;
   rtx_insn *cond_earliest;
   struct noce_if_info if_info;
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
 
   /* We only ever should get here before reload.  */
   gcc_assert (!reload_completed);
@@ -4074,8 +4070,16 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if_info.cond_earliest = cond_earliest;
   if_info.jump = jump;
   if_info.then_else_reversed = then_else_reversed;
-  if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
-				     predictable_edge_p (then_edge));
+  if_info.speed_p = speed_p;
+  if_info.max_seq_cost
+    = targetm.max_noce_ifcvt_seq_cost (then_edge);
+  /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
+     that they are valid to transform.  We can't easily get back to the insn
+     for COND (and it may not exist if we had to canonicalize to get COND),
+     and jump_insns are always given a cost of 1 by seq_cost, so treat
+     both instructions as having cost COSTS_N_INSNS (1).  */
+  if_info.original_cost = COSTS_N_INSNS (2);
+
 
   /* Do the real work.  */
 
diff --git a/gcc/testsuite/gcc.dg/ifcvt-2.c b/gcc/testsuite/gcc.dg/ifcvt-2.c
index e0e1728..73e0dcc 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-2.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
-/* { dg-options "-fdump-rtl-ce1 -O2" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 
 typedef unsigned char uint8_t;
diff --git a/gcc/testsuite/gcc.dg/ifcvt-3.c b/gcc/testsuite/gcc.dg/ifcvt-3.c
index 44233d4..b250bc1 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-3.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { { aarch64*-*-* i?86-*-* x86_64-*-* } && lp64 } } } */
-/* { dg-options "-fdump-rtl-ce1 -O2" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 typedef long long s64;
 
diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index 319b583..0d1671c 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -1,4 +1,4 @@
-/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* visium-*-*" } }  */
 
diff --git a/gcc/testsuite/gcc.dg/ifcvt-5.c b/gcc/testsuite/gcc.dg/ifcvt-5.c
index 818099a..d2a9476 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-5.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-5.c
@@ -1,7 +1,8 @@
 /* Check that multi-insn if-conversion is not done if the override
-   parameter would not allow it.  */
+   parameter would not allow it.  Set the cost parameter very high
+   to ensure that the limiting factor is actually the count parameter.  */
 
-/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=1" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=1 --param max-rtl-if-conversion-unpredictable-cost=200" } */
 
 typedef int word __attribute__((mode(word)));
 
diff --git a/gcc/testsuite/gcc.dg/pr68435.c b/gcc/testsuite/gcc.dg/pr68435.c
index 765699a..f86b7f8 100644
--- a/gcc/testsuite/gcc.dg/pr68435.c
+++ b/gcc/testsuite/gcc.dg/pr68435.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
-/* { dg-options "-fdump-rtl-ce1 -O2 -w" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 -w --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 typedef struct cpp_reader cpp_reader;
 enum cpp_ttype

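The cost model in the patch above can be summarised with a small standalone C model. This is a sketch, not the ifcvt.c code. The names toy_if_info, toy_init_original_cost and toy_conversion_profitable_p are hypothetical, and COSTS_N_INSNS mirrors GCC's definition of (N) * 4. The size path (cost <= original_cost) follows the patch. The speed path, which compares against max_seq_cost, is an assumption about the rest of noce_conversion_profitable_p.

```c
#include <stdbool.h>

/* Mirrors GCC's rtl.h definition: one typical instruction costs 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, cut-down stand-in for struct noce_if_info that keeps
   only the cost-related fields touched by the patch.  */
struct toy_if_info
{
  bool speed_p;                /* optimize_bb_for_speed_p (test_bb) */
  unsigned int original_cost;  /* compare + jump + THEN_BB + ELSE_BB */
  unsigned int max_seq_cost;   /* targetm.max_noce_ifcvt_seq_cost () */
};

/* Start from the compare and the jump, as noce_find_if_block now does,
   then accumulate the block costs the way bb_valid_for_noce_process_p
   now does with "+=".  ELSE_COST is 0 when there is no ELSE_BB.  */
static void
toy_init_original_cost (struct toy_if_info *info,
                        unsigned int then_cost, unsigned int else_cost)
{
  info->original_cost = COSTS_N_INSNS (2);
  info->original_cost += then_cost;
  info->original_cost += else_cost;
}

/* SEQ_COST stands for seq_cost () of the if-converted sequence.  */
static bool
toy_conversion_profitable_p (unsigned int seq_cost,
                             const struct toy_if_info *info)
{
  /* For size: the new sequence must be no larger than the compare,
     branch and block(s) it replaces.  */
  if (!info->speed_p)
    return seq_cost <= info->original_cost;

  /* Assumption: for speed, the target-supplied ceiling decides.  */
  return seq_cost <= info->max_seq_cost;
}
```

Take a single-instruction THEN_BB with no ELSE_BB. Under this model, original_cost is COSTS_N_INSNS (2) + COSTS_N_INSNS (1) = 12. When optimizing for size, any converted sequence that costs more than 12 is therefore rejected.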

* [Patch RFC 4/2 v3] Refactor noce_try_cmove_arith
  2016-07-20  9:52                               ` [Re: RFC: Patch 1/2 v3] " James Greenhalgh
  2016-07-20  9:52                                 ` [Patch RFC: 3/2 v3] Don't expand a conditional move between identical sources James Greenhalgh
  2016-07-20  9:53                                 ` [RFC: Patch 2/2 v3] Introduce a new cost model for ifcvt James Greenhalgh
@ 2016-07-20  9:53                                 ` James Greenhalgh
  2016-07-20 11:41                                 ` [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost Bernd Schmidt
  3 siblings, 0 replies; 60+ messages in thread
From: James Greenhalgh @ 2016-07-20  9:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven, rep.dot.nop, law

[-- Attachment #1: Type: text/plain, Size: 696 bytes --]


Hi,

This patch pulls some duplicated logic out of noce_try_cmove_arith, in
order to make the code easier to reason about.

Some of the simplification that falls out of this process also improves
the generation of temporaries, which is good as it reduces the size and
speed costs of the generated sequence.  We want this because the more
useless register moves we can remove early, the more accurate our
profitability analysis will be.

Bootstrapped on x86_64 and aarch64 with no issues.

OK?

Thanks,
James

---
2016-07-20  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (noce_arith_helper): New.
	(noce_try_cmove_arith): Refactor.


[-- Attachment #2: 0004-Patch-RFC-4-2-v3-Refactor-noce_try_cmove_arith.patch --]
[-- Type: text/x-patch;  name=0004-Patch-RFC-4-2-v3-Refactor-noce_try_cmove_arith.patch, Size: 9682 bytes --]

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 4e3d8f3..f2e7ac6 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2068,23 +2068,127 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
   return true;
 }
 
-/* Try more complex cases involving conditional_move.  */
+/* Helper for noce_try_cmove_arith.  This gets called twice, once for the
+   then branch and once for the else branch.  X_BB gives the basic block
+   for the branch we are currently interested in.  *X is the value computed
+   by this branch.  If *X is complex, we need to move it into a register
+   first, possibly by copying INSN_X so that we preserve clobbers etc. from
+   the original instruction.  EMIT_X receives the SET (if any) that must be
+   emitted to compute the branch result.  ORIG_OTHER_DEST gives the original
+   destination from the opposite branch.  OTHER_BB_EXISTS_P is true if there
+   was an opposite branch for us to consider.  */
+
+bool
+noce_arith_helper (rtx *x, rtx *emit_x, rtx_insn *insn_x,
+		   basic_block x_bb, rtx orig_other_dest,
+		   bool other_bb_exists_p)
+{
+  rtx set_tmp = NULL_RTX;
+
+  machine_mode x_mode = GET_MODE (*x);
+
+  /* Two cases to catch here.  Either X is not yet a general operand, in
+     which case we need to move it to an appropriate register.  Or, the other
+     block is empty, in which case ORIG_OTHER_DEST came from the test block.
+     The non-empty complex block that we will emit might clobber the register
+     used by ORIG_OTHER_DEST, so move it to a pseudo first.  */
+  if (! general_operand (*x, x_mode)
+      || !other_bb_exists_p)
+    {
+      rtx reg = gen_reg_rtx (x_mode);
+      if (insn_x)
+	{
+	  rtx_insn *copy_of_x = as_a <rtx_insn *> (copy_rtx (insn_x));
+	  rtx set = single_set (copy_of_x);
+	  SET_DEST (set) = reg;
+	  set_tmp = PATTERN (copy_of_x);
+	}
+      else
+	{
+	  set_tmp = gen_rtx_SET (reg, *x);
+	}
+      *x = reg;
+    }
+
+  /* Check that our new insn isn't going to clobber ORIG_OTHER_DEST.  */
+  bool modified_in_x = (set_tmp != NULL_RTX)
+			&& modified_in_p (orig_other_dest, set_tmp);
+
+  /* If we have an X_BB to check, go through it and make sure the insns we'd
+     duplicate don't write ORIG_OTHER_DEST.  */
+  if (x_bb)
+    {
+      rtx_insn *tmp_insn = NULL;
+      FOR_BB_INSNS (x_bb, tmp_insn)
+	/* Don't check inside the destination insn, we will have changed
+	   it to use a register that doesn't conflict.  */
+	if (!(insn_x && tmp_insn == insn_x)
+	    && modified_in_p (orig_other_dest, tmp_insn))
+	  {
+	    modified_in_x = true;
+	    break;
+	  }
+    }
+
+  /* Store the SET back in EMIT_X.  */
+  *emit_x = set_tmp;
+  return modified_in_x;
+}
+
+/* Try more complex cases involving conditional_move.
+
+   We have:
+
+      if (test)
+	x = a + b;
+      else
+	x = c - d;
+
+    Make it:
+
+      t1 = a + b;
+      t2 = c - d;
+      x = (test) ? t1 : t2;
+
+   Alternatively, we have:
+
+      if (test)
+	x = *y;
+      else
+	x = *z;
+
+   Make it:
+
+     p1 = (test) ? y : z;
+     x = *p1;
+*/
 
 static int
 noce_try_cmove_arith (struct noce_if_info *if_info)
 {
+  /* SET_SRC from the two branches.  */
   rtx a = if_info->a;
   rtx b = if_info->b;
+  /* SET_DEST of both branches.  */
   rtx x = if_info->x;
-  rtx orig_a, orig_b;
-  rtx_insn *insn_a, *insn_b;
+  /* Full insns from the two branches.  */
+  rtx_insn *insn_a = if_info->insn_a;
+  rtx_insn *insn_b = if_info->insn_b;
+  /* Whether the branches are single set.  */
   bool a_simple = if_info->then_simple;
   bool b_simple = if_info->else_simple;
+  /* Our two basic blocks.  */
   basic_block then_bb = if_info->then_bb;
   basic_block else_bb = if_info->else_bb;
+  /* Whether we're handling the transformation of a load.  */
+  bool is_mem = false;
+  /* Copies of A and B before we modified them.  */
+  rtx orig_a = a, orig_b = b;
+  /* A new target to be used by the conditional select.  */
   rtx target;
-  int is_mem = 0;
-  enum rtx_code code;
+  /* The RTX code for the condition in the test block.  */
+  enum rtx_code code = GET_CODE (if_info->cond);
+  /* Our generated sequence.  */
   rtx_insn *ifcvt_seq;
 
   /* A conditional move from two memory sources is equivalent to a
@@ -2094,33 +2198,19 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (cse_not_expected
       && MEM_P (a) && MEM_P (b)
       && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b))
-    {
-      machine_mode address_mode = get_address_mode (a);
-
-      a = XEXP (a, 0);
-      b = XEXP (b, 0);
-      x = gen_reg_rtx (address_mode);
-      is_mem = 1;
-    }
-
+    is_mem = true;
   /* ??? We could handle this if we knew that a load from A or B could
      not trap or fault.  This is also true if we've already loaded
      from the address along the path from ENTRY.  */
   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
     return FALSE;
 
-  /* if (test) x = a + b; else x = c - d;
-     => y = a + b;
-        x = c - d;
-	if (test)
-	  x = y;
-  */
 
   code = GET_CODE (if_info->cond);
   insn_a = if_info->insn_a;
   insn_b = if_info->insn_b;
 
-  machine_mode x_mode = GET_MODE (x);
+  machine_mode x_mode = is_mem ? get_address_mode (a) : GET_MODE (x);
 
   if (!can_conditionally_move_p (x_mode))
     return FALSE;
@@ -2151,117 +2241,27 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   start_sequence ();
 
-  /* If one of the blocks is empty then the corresponding B or A value
-     came from the test block.  The non-empty complex block that we will
-     emit might clobber the register used by B or A, so move it to a pseudo
-     first.  */
-
-  rtx tmp_a = NULL_RTX;
-  rtx tmp_b = NULL_RTX;
-
-  if (b_simple || !else_bb)
-    tmp_b = gen_reg_rtx (x_mode);
-
-  if (a_simple || !then_bb)
-    tmp_a = gen_reg_rtx (x_mode);
-
   orig_a = a;
   orig_b = b;
 
-  rtx emit_a = NULL_RTX;
-  rtx emit_b = NULL_RTX;
-  rtx_insn *tmp_insn = NULL;
-  bool modified_in_a = false;
-  bool  modified_in_b = false;
-  /* If either operand is complex, load it into a register first.
-     The best way to do this is to copy the original insn.  In this
-     way we preserve any clobbers etc that the insn may have had.
-     This is of course not possible in the IS_MEM case.  */
-
-  if (! general_operand (a, GET_MODE (a)) || tmp_a)
-    {
-
-      if (is_mem)
-	{
-	  rtx reg = gen_reg_rtx (GET_MODE (a));
-	  emit_a = gen_rtx_SET (reg, a);
-	}
-      else
-	{
-	  if (insn_a)
-	    {
-	      a = tmp_a ? tmp_a : gen_reg_rtx (GET_MODE (a));
-
-	      rtx_insn *copy_of_a = as_a <rtx_insn *> (copy_rtx (insn_a));
-	      rtx set = single_set (copy_of_a);
-	      SET_DEST (set) = a;
-
-	      emit_a = PATTERN (copy_of_a);
-	    }
-	  else
-	    {
-	      rtx tmp_reg = tmp_a ? tmp_a : gen_reg_rtx (GET_MODE (a));
-	      emit_a = gen_rtx_SET (tmp_reg, a);
-	      a = tmp_reg;
-	    }
-	}
-    }
-
-  if (! general_operand (b, GET_MODE (b)) || tmp_b)
+  /* Get the addresses if this is a MEM.  */
+  if (is_mem)
     {
-      if (is_mem)
-	{
-          rtx reg = gen_reg_rtx (GET_MODE (b));
-	  emit_b = gen_rtx_SET (reg, b);
-	}
-      else
-	{
-	  if (insn_b)
-	    {
-	      b = tmp_b ? tmp_b : gen_reg_rtx (GET_MODE (b));
-	      rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b));
-	      rtx set = single_set (copy_of_b);
-
-	      SET_DEST (set) = b;
-	      emit_b = PATTERN (copy_of_b);
-	    }
-	  else
-	    {
-	      rtx tmp_reg = tmp_b ? tmp_b : gen_reg_rtx (GET_MODE (b));
-	      emit_b = gen_rtx_SET (tmp_reg, b);
-	      b = tmp_reg;
-	  }
-	}
+      a = XEXP (a, 0);
+      b = XEXP (b, 0);
     }
 
-  modified_in_a = emit_a != NULL_RTX && modified_in_p (orig_b, emit_a);
-  if (tmp_b && then_bb)
-    {
-      FOR_BB_INSNS (then_bb, tmp_insn)
-	/* Don't check inside insn_a.  We will have changed it to emit_a
-	   with a destination that doesn't conflict.  */
-	if (!(insn_a && tmp_insn == insn_a)
-	    && modified_in_p (orig_b, tmp_insn))
-	  {
-	    modified_in_a = true;
-	    break;
-	  }
-
-    }
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
 
-  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
-  if (tmp_a && else_bb)
-    {
-      FOR_BB_INSNS (else_bb, tmp_insn)
-      /* Don't check inside insn_b.  We will have changed it to emit_b
-	 with a destination that doesn't conflict.  */
-      if (!(insn_b && tmp_insn == insn_b)
-	  && modified_in_p (orig_a, tmp_insn))
-	{
-	  modified_in_b = true;
-	  break;
-	}
-    }
+  /* Sort out temporary registers and figure out whether either branch
+     would clobber the other.  */
+  bool modified_in_a
+    = noce_arith_helper (&a, &emit_a, insn_a, then_bb,
+			 orig_b, (else_bb != NULL));
+  bool modified_in_b
+    = noce_arith_helper (&b, &emit_b, insn_b, else_bb,
+			 orig_a, (then_bb != NULL));
 
   /* If insn to set up A clobbers any registers B depends on, try to
      swap insn that sets up A with the one that sets up B.  If even
@@ -2285,6 +2285,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   else
     goto end_seq_and_fail;
 
+  /* Emit the conditional move.  If our operands were MEMs, we
+     need to generate a temporary to hold the result of conditionally
+     selecting between the two possible addresses.  We'll fix this up
+     immediately afterwards.  */
+  if (is_mem)
+    x = gen_reg_rtx (x_mode);
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			    XEXP (if_info->cond, 1), a, b);
 

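The two transformations described in the new comment on noce_try_cmove_arith can be checked at the C source level. This is only an illustration of the semantics and assumes side-effect-free, non-trapping arms. The pass itself works on RTL, and the function names here are hypothetical. Unsigned arithmetic is used so that evaluating the unused arm can never be undefined behaviour in C, which loosely mirrors RTL's wrapping arithmetic.

```c
#include <stdbool.h>

/* Branchy form that noce_try_cmove_arith starts from.  */
static unsigned int
arith_branchy (bool test, unsigned int a, unsigned int b,
               unsigned int c, unsigned int d)
{
  unsigned int x;
  if (test)
    x = a + b;
  else
    x = c - d;
  return x;
}

/* If-converted form: compute both arms unconditionally, then select.
   Only valid because neither arm has side effects or can trap.  */
static unsigned int
arith_converted (bool test, unsigned int a, unsigned int b,
                 unsigned int c, unsigned int d)
{
  unsigned int t1 = a + b;
  unsigned int t2 = c - d;
  return test ? t1 : t2;
}

/* Load case: select between the two addresses first, then perform a
   single load, so neither possibly-invalid address is dereferenced
   speculatively.  */
static int
load_converted (bool test, const int *y, const int *z)
{
  const int *p1 = test ? y : z;
  return *p1;
}
```

For both values of TEST, arith_branchy and arith_converted return the same result. load_converted reads from exactly one of the two pointers.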

* Re: [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost
  2016-07-20  9:52                               ` [Re: RFC: Patch 1/2 v3] " James Greenhalgh
                                                   ` (2 preceding siblings ...)
  2016-07-20  9:53                                 ` [Patch RFC 4/2 v3] Refactor noce_try_cmove_arith James Greenhalgh
@ 2016-07-20 11:41                                 ` Bernd Schmidt
  2016-07-20 16:40                                   ` James Greenhalgh
  3 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-07-20 11:41 UTC (permalink / raw)
  To: James Greenhalgh, gcc-patches
  Cc: nd, richard.guenther, Ramana.Radhakrishnan, bernds_cb1,
	ebotcazou, steven, rep.dot.nop, law

On 07/20/2016 11:51 AM, James Greenhalgh wrote:

>
> 2016-07-20  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* target.def (max_noce_ifcvt_seq_cost): New.
> 	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
> 	* doc/tm.texi: Regenerate.
> 	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
> 	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
> 	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
> 	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
> 	* doc/invoke.texi: Document new params.

I think this is starting to look like a clear improvement, so I'll ack 
patches 1-3 with a few minor comments, and with the expectation that 
you'll address performance regressions on other targets if they occur. 
Number 4 I still need to figure out.

Minor details:

> +  if (!speed_p)
> +    {
> +      return cost <= if_info->original_cost;
> +    }

No braces around single statements in ifs. There's an instance of this 
in patch 4 as well.

> +  if (global_options_set.x_param_values[param])
> +    return PARAM_VALUE (param);

How about wrapping the param value into COSTS_N_INSNS, to make the value 
of the param less dependent on compiler internals?

In patch 4:

> +  /* Check that our new insn isn't going to clobber ORIG_OTHER_DEST.  */
> +  bool modified_in_x = (set_tmp != NULL_RTX)
> +			&& modified_in_p (orig_other_dest, set_tmp);

Watch line wrapping. No parens around the first subexpression (there are 
other examples of unnecessary ones in invocations of noce_arith_helper), 
but around the full one.
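As a rough illustration of the two GNU-style points above, here is a toy function. toy_check and toy_modified_in_p are hypothetical stand-ins and are not GCC's modified_in_p.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: "does the SET with destination *SET_DEST
   write REG?"  */
static bool
toy_modified_in_p (int reg, const int *set_dest)
{
  return *set_dest == reg;
}

static bool
toy_check (const int *set_tmp, int orig_other_dest, bool skip_p)
{
  /* No braces around a single-statement if body.  */
  if (skip_p)
    return false;

  /* Parenthesize the whole multi-line expression rather than its first
     operand, and break the line before the operator.  */
  bool modified_in_x = (set_tmp != NULL
                        && toy_modified_in_p (orig_other_dest, set_tmp));
  return modified_in_x;
}
```

Wrapping the full expression in parentheses lets editors indent the continuation line under the opening parenthesis, which is what GNU style expects.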


Bernd


* Re: [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost
  2016-07-20 11:41                                 ` [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost Bernd Schmidt
@ 2016-07-20 16:40                                   ` James Greenhalgh
  2016-07-21 11:32                                     ` Bernd Schmidt
  0 siblings, 1 reply; 60+ messages in thread
From: James Greenhalgh @ 2016-07-20 16:40 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: gcc-patches, nd, richard.guenther, Ramana.Radhakrishnan,
	bernds_cb1, ebotcazou, steven, rep.dot.nop, law

On Wed, Jul 20, 2016 at 01:41:39PM +0200, Bernd Schmidt wrote:
> On 07/20/2016 11:51 AM, James Greenhalgh wrote:
> 
> >
> >2016-07-20  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >	* target.def (max_noce_ifcvt_seq_cost): New.
> >	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
> >	* doc/tm.texi: Regenerate.
> >	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
> >	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
> >	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
> >	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
> >	* doc/invoke.texi: Document new params.
> 
> I think this is starting to look like a clear improvement, so I'll
> ack patches 1-3 with a few minor comments, and with the expectation
> that you'll address performance regressions on other targets if they
> occur.

I'll gladly take a look if I've caused anyone any trouble.

> Number 4 I still need to figure out.
> 
> Minor details:
> 
> >+  if (!speed_p)
> >+    {
> >+      return cost <= if_info->original_cost;
> >+    }
> 
> No braces around single statements in ifs. There's an instance of
> this in patch 4 as well.
> 
> >+  if (global_options_set.x_param_values[param])
> >+    return PARAM_VALUE (param);
> 
> How about wrapping the param value into COSTS_N_INSNS, to make the
> value of the param less dependent on compiler internals?

I did consider this, but found it hard to word for the user documentation.
I found it easier to understand when it was in the same units as
rtx_cost, particularly as the AArch64 backend prints RTX costs to most
dump files (including ce1, ce2 and ce3), so comparing directly was easy for me
to grok. I think going in either direction has the potential to confuse
users; the cost metrics of the RTL passes are very tightly coupled to
compiler internals.

I don't have a strong feeling either way, just a slight preference to keep
everything in the same units as rtx_cost where I can.

Let me know if you'd rather I follow this comment. There's some precedent
for wrapping it in COSTS_N_INSNS in GCSE_UNRESTRICTED_COST, but I find that
less clear than what I've done (well, I would say that :-) ).
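To make the units question concrete, here is a hypothetical model of the default hook that covers both options. The param is either taken as-is in rtx_cost units (as posted) or wrapped in COSTS_N_INSNS (as suggested in review). COSTS_N_INSNS mirrors GCC's (N) * 4. The BRANCH_COST-based fallback used when the param is unset, and its factor of 3, are assumptions for illustration.

```c
#include <stdbool.h>

/* Mirrors GCC's rtl.h: one typical instruction has cost 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of a max_noce_ifcvt_seq_cost default.
   PARAM_SET and PARAM_VALUE model an explicit --param on the command
   line; BRANCH_COST models the target's branch cost for this edge.
   WRAP_PARAM selects the unit convention under discussion.  */
static unsigned int
toy_max_seq_cost (bool param_set, unsigned int param_value,
                  unsigned int branch_cost, bool wrap_param)
{
  /* As posted, the param is already in rtx_cost units.  With the
     review suggestion, it counts instructions and is scaled here.  */
  if (param_set)
    return wrap_param ? COSTS_N_INSNS (param_value) : param_value;

  /* Assumed fallback: scale the branch cost into rtx_cost units.  */
  return branch_cost * COSTS_N_INSNS (3);
}
```

Under this model, the testsuite's --param max-rtl-if-conversion-unpredictable-cost=100 corresponds to COSTS_N_INSNS (25), roughly 25 typical instructions. With the wrapped variant, a user would write 25 to get the same limit.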

> In patch 4:
> 
> >+  /* Check that our new insn isn't going to clobber ORIG_OTHER_DEST.  */
> >+  bool modified_in_x = (set_tmp != NULL_RTX)
> >+			&& modified_in_p (orig_other_dest, set_tmp);
> 
> Watch line wrapping. No parens around the first subexpression (there
> are other examples of unnecessary ones in invocations of
> noce_arith_helper), but around the full one.

I'll catch these and others on commit, thanks for pointing them out.

Thanks,
James


* Re: [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost
  2016-07-20 16:40                                   ` James Greenhalgh
@ 2016-07-21 11:32                                     ` Bernd Schmidt
  0 siblings, 0 replies; 60+ messages in thread
From: Bernd Schmidt @ 2016-07-21 11:32 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: gcc-patches, nd, richard.guenther, Ramana.Radhakrishnan,
	bernds_cb1, ebotcazou, steven, rep.dot.nop, law

On 07/20/2016 06:39 PM, James Greenhalgh wrote:
> On Wed, Jul 20, 2016 at 01:41:39PM +0200, Bernd Schmidt wrote:
>> How about wrapping the param value into COSTS_N_INSNS, to make the
>> value of the param less dependent on compiler internals?
>
> I did consider this, but found it hard to word for the user documentation.
> I found it easier to understand when it was in the same units as
> rtx_cost, particularly as the AArch64 backend prints RTX costs to most
> dump files (including ce1, ce2, ce3) so comparing directly was easy for me
> to grok. I think going in either direction has the potential to confuse
> users, the cost metrics of the RTL passes are very tightly coupled to
> compiler internals.
>
> I don't have a strong feeling either way, just a slight preference to keep
> everything in the same units as rtx_cost where I can.

I guess it's ok.


Bernd


end of thread, other threads:[~2016-07-21 11:32 UTC | newest]

Thread overview: 60+ messages
2015-09-08 15:01 [Patch] Teach RTL ifcvt to handle multiple simple set instructions James Greenhalgh
2015-09-10 18:24 ` Bernd Schmidt
2015-09-10 21:34   ` Jeff Law
2015-09-11  8:51     ` Kyrill Tkachov
2015-09-11 21:49       ` Jeff Law
2015-09-11  9:04     ` Bernd Schmidt
2015-09-11  9:08       ` Ramana Radhakrishnan
2015-09-11 10:55         ` James Greenhalgh
2015-09-25 15:06           ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs James Greenhalgh
2015-09-25 15:06             ` [Patch ifcvt 1/3] Factor out cost calculations from noce cases James Greenhalgh
2015-09-25 15:08             ` [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h James Greenhalgh
2015-09-25 15:14             ` [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64 James Greenhalgh
2015-09-29 10:43               ` Richard Biener
2015-09-25 15:28             ` [Patch ifcvt 3/3] Create a new target hook for deciding profitability of noce if-conversion James Greenhalgh
2015-09-29 10:36             ` [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs Richard Biener
2015-09-29 15:28               ` James Greenhalgh
2015-09-29 19:52                 ` Mike Stump
2015-09-30  8:42                   ` Richard Biener
2015-09-30  8:48                 ` Richard Biener
2015-09-30 19:01                   ` Mike Stump
2015-10-01  9:37                 ` Bernd Schmidt
2015-10-09 11:28                   ` Bernd Schmidt
2015-10-09 15:28                     ` Jeff Law
2016-06-02 16:54                     ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models James Greenhalgh
2016-06-02 16:54                       ` [RFC: Patch 1/6] New target hook: rtx_branch_cost James Greenhalgh
2016-06-03 10:39                         ` Richard Biener
2016-06-21 15:51                           ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost James Greenhalgh
2016-06-21 15:51                             ` [RFC: Patch 5/6 v2] Improve the cost model for multiple-sets James Greenhalgh
2016-07-13 21:23                               ` Jeff Law
2016-06-21 15:51                             ` [RFC: Patch 3/6 v2] Remove if_info->branch_cost James Greenhalgh
2016-07-13 21:19                               ` Jeff Law
2016-06-21 15:51                             ` [RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
2016-07-13 21:18                               ` Jeff Law
2016-06-21 15:53                             ` [RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
2016-07-13 21:24                               ` Jeff Law
2016-06-21 15:53                             ` [RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith James Greenhalgh
2016-07-13 21:22                               ` Jeff Law
2016-06-21 21:31                             ` [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost Bernhard Reutner-Fischer
2016-06-30 12:01                             ` Bernd Schmidt
2016-07-13 21:16                             ` Jeff Law
2016-07-20  9:52                               ` [Re: RFC: Patch 1/2 v3] " James Greenhalgh
2016-07-20  9:52                                 ` [Patch RFC: 3/2 v3] Don't expand a conditional move between identical sources James Greenhalgh
2016-07-20  9:53                                 ` [RFC: Patch 2/2 v3] Introduce a new cost model for ifcvt James Greenhalgh
2016-07-20  9:53                                 ` [Patch RFC 4/2 v3] Refactor noce_try_cmove_arith James Greenhalgh
2016-07-20 11:41                                 ` [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost Bernd Schmidt
2016-07-20 16:40                                   ` James Greenhalgh
2016-07-21 11:32                                     ` Bernd Schmidt
2016-06-02 16:54                       ` [RFC: Patch 4/6] Modify cost model for noce_cmove_arith James Greenhalgh
2016-06-02 16:55                       ` [RFC: Patch 2/6] Factor out the comparisons against magic numbers in ifcvt James Greenhalgh
2016-06-02 16:55                       ` [RFC: Patch 5/6] Improve the cost model for multiple-sets James Greenhalgh
2016-06-02 16:56                       ` [RFC: Patch 6/6] Remove second cost model from noce_try_store_flag_mask James Greenhalgh
2016-06-02 16:56                       ` [RFC: Patch 3/6] Remove if_info->branch_cost James Greenhalgh
2016-06-03  9:32                       ` [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models Bernd Schmidt
2016-06-09 16:58                       ` Jeff Law
2016-06-10 10:45                         ` James Greenhalgh
2015-09-12 14:04     ` [Patch] Teach RTL ifcvt to handle multiple simple set instructions Eric Botcazou
2015-10-30 18:09   ` [Patch ifcvt] " James Greenhalgh
2015-11-04 11:04     ` Bernd Schmidt
2015-11-04 15:37       ` James Greenhalgh
2015-11-06  9:13         ` Christophe Lyon
