public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
@ 2015-07-10 12:31 Kyrill Tkachov
  2015-07-10 13:05 ` Kyrill Tkachov
  2015-07-10 23:00 ` Bernhard Reutner-Fischer
  0 siblings, 2 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-10 12:31 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4537 bytes --]

Hi all,

This patch makes if-conversion more aggressive when handling code of the form:
if (test)
   x := a  //THEN
else
   x := b  //ELSE

Currently, we can handle this case only if x:=a and x:=b are simple single set instructions.
With this patch we will be able to handle the cases where x:=a and x:=b take multiple instructions.
This can be done under the condition that all the instructions in the THEN and ELSE basic blocks are
only used to compute a value for x.  I suppose we could generalise even further (perhaps to handle
cases where multiple x's are being set) but that's out of the scope of this patch.

This was sparked by some cases in aarch64 where the THEN or ELSE branches contained an extra
zero_extend operation after an arithmetic instruction which prevented if-conversion.

To implement this approach noce_process_if_block in ifcvt.c is relaxed to allow multi-instruction
basic blocks when the intermediate values produced in them don't escape the basic block except
through x.  noce_process_if_block then calls a number of other functions to detect various
patterns and if-convert. Most of them don't actually make sense for multi-instruction basic blocks
so they are updated to reject them and operate only on the existing single-instruction case.

However, noce_try_cmove_arith can take advantage of multi-instruction basic blocks and is thus
updated to emit the whole basic blocks rather than just one instruction.

The transformation is, of course, guarded on a cost calculation.
The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't
exceed the branch cost. I don't think that's quite a right calculation.
We're going to be executing at least one of the basic blocks anyway.
This patch we instead check the *maximum* of the two blocks against the branch cost.
This should still catch cases where a high latency instruction appears in one or both of
the paths.


This transformation applies to targets with conditional move operations but no conditional
execution. Thus, it applies to aarch64 and x86_64, but not arm.

The effect of this patch is more noticeable if the backend branch cost is higher (like you'd expect).


Not increasing the branch cost we still get more aggressive if-conversion.
Across the whole of SPEC2006 I saw a 5.8% increase in the number of csel instructions generated
(from 41242 -> 43637)

Bootstrapped and tested on aarch64, x86_6, arm.
I've made the testcases aarch64-specific since they depend on backend branch costs that are hard
to predict across all platforms (we don't have a -mbranch-cost= option ;))
No performance regressions on SPEC2006 on aarch64 and x86_64.
On aarch64 I've seen 482.sphinx3 improve by 2.3% and 459.GemsFDTD by 2.1%

Some of the testcases in aarch64.exp now fail their scan-assembler patterns due to if-conversion.
I've updated those testcases to properly generate the pattern they expect. The changes are mostly
due to add+compare-style instructions now appearing in the same basic blocks as their result uses,
which, I think, scares combine away from combining them into one.

Does this approach look reasonable?
If so, ok for trunk?

Thanks,
Kyrill


2015-07-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
     then_cost, else_cost fields.
     (end_ifcvt_sequence): Call set_used_flags on each insn in the
     sequence.
     (noce_simple_bbs): New function.
     (noce_try_move): Bail if basic blocks are not simple.
     (noce_try_store_flag): Likewise.
     (noce_try_store_flag_constants): Likewise.
     (noce_try_addcc): Likewise.
     (noce_try_store_flag_mask): Likewise.
     (noce_try_cmove): Likewise.
     (noce_try_minmax): Likewise.
     (noce_try_abs): Likewise.
     (noce_try_sign_mask): Likewise.
     (noce_try_bitop): Likewise.
     (bbs_ok_for_cmove_arith): New function.
     (noce_emit_all_but_last): Likewise.
     (noce_emit_insn): Likewise.
     (noce_emit_bb): Likewise.
     (noce_try_cmove_arith): Handle non-simple basic blocks.
     (insn_valid_noce_process_p): New function.
     (bb_valid_for_noce_process_p): Likewise.
     (noce_process_if_block): Allow non-simple basic blocks
     where appropriate.


2015-07-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/ifcvt_csel_1.c: New test.
     * gcc.target/aarch64/ifcvt_csel_2.c: New test.
     * gcc.target/aarch64/ifcvt_csel_3.c: New test.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifcvt.patch --]
[-- Type: text/x-patch; name=ifcvt.patch, Size: 20381 bytes --]

commit b6fe0e0a5f64fdc11fbbd7c9e05caeeb23e21662
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 8 15:45:04 2015 +0100

    [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 31849ee..3d324257 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -815,6 +815,15 @@ struct noce_if_info
      form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  int then_cost;
+  int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
   int branch_cost;
 };
@@ -1036,6 +1045,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info->cond);
   set_used_flags (if_info->a);
   set_used_flags (if_info->b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1053,6 +1066,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info->then_simple)
+    return false;
+
+  if (if_info->else_bb)
+    return if_info->else_simple;
+
+  return true;
+}
+
 /* Convert "if (a != b) x = a; else x = b" into "x = a" and
    "if (a == b) x = a; else x = b" into "x = b".  */
 
@@ -1067,6 +1095,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE && code != EQ)
     return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
      or a signed zero.  */
   if (HONOR_NANS (if_info->x)
@@ -1115,6 +1146,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->b)
       && INTVAL (if_info->b) == STORE_FLAG_VALUE
       && if_info->a == const0_rtx)
@@ -1163,6 +1197,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   int normalize, can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->a)
       && CONST_INT_P (if_info->b))
     {
@@ -1291,6 +1328,9 @@ noce_try_addcc (struct noce_if_info *if_info)
   rtx_insn *seq;
   int subtract, normalize;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (GET_CODE (if_info->a) == PLUS
       && rtx_equal_p (XEXP (if_info->a, 0), if_info->b)
       && (reversed_comparison_code (if_info->cond, if_info->jump)
@@ -1382,6 +1422,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
   rtx_insn *seq;
   int reversep;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   reversep = 0;
   if ((if_info->branch_cost >= 2
        || STORE_FLAG_VALUE == -1)
@@ -1550,6 +1593,9 @@ noce_try_cmove (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if ((CONSTANT_P (if_info->a) || register_operand (if_info->a, VOIDmode))
       && (CONSTANT_P (if_info->b) || register_operand (if_info->b, VOIDmode)))
     {
@@ -1584,6 +1630,92 @@ noce_try_cmove (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Return iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  FOR_BB_INSNS (bb_a, a_insn)
+    {
+      if (!active_insn_p (a_insn))
+	continue;
+
+      rtx sset_a = single_set (a_insn);
+
+      if (!sset_a)
+	return false;
+
+      rtx dest_reg = SET_DEST (sset_a);
+      rtx_insn *b_insn;
+
+      FOR_BB_INSNS (bb_b, b_insn)
+	{
+	  if (!active_insn_p (b_insn))
+	    continue;
+
+	  rtx sset_b = single_set (b_insn);
+
+	  if (!sset_b)
+	    return false;
+
+	  if (reg_referenced_p (dest_reg, sset_b))
+	    return false;
+	}
+    }
+
+  return true;
+}
+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (insn != last && active_insn_p (insn))
+	{
+	  rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
+
+	  emit_insn (PATTERN (to_emit));
+	}
+    }
+}
+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn) < 0)
+    return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb && !simple)
+    noce_emit_all_but_last (bb);
+
+  if (last_insn && !noce_emit_insn (last_insn))
+    return false;
+
+  return true;
+}
+
 /* Try more complex cases involving conditional_move.  */
 
 static int
@@ -1594,9 +1726,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx x = if_info->x;
   rtx orig_a, orig_b;
   rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
   rtx target;
   int is_mem = 0;
-  int insn_cost;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
 
@@ -1635,27 +1770,23 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   insn_a = if_info->insn_a;
   insn_b = if_info->insn_b;
 
-  /* Total insn_rtx_cost should be smaller than branch cost.  Exit
-     if insn_rtx_cost can't be estimated.  */
+  int then_cost;
+  int else_cost;
   if (insn_a)
-    {
-      insn_cost
-	= insn_rtx_cost (PATTERN (insn_a),
-      			 optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_a)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-	return FALSE;
-    }
+    then_cost = if_info->then_cost;
   else
-    insn_cost = 0;
+    then_cost = 0;
 
   if (insn_b)
-    {
-      insn_cost
-	+= insn_rtx_cost (PATTERN (insn_b),
-      			  optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_b)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-        return FALSE;
-    }
+    else_cost = if_info->else_cost;
+  else
+    else_cost = 0;
+
+  /* We're going to execute one of the basic blocks anyway, so
+     bail out if the most expensive of the two blocks is unacceptable.  */
+  if (MAX (then_cost, else_cost)
+      > COSTS_N_INSNS (if_info->branch_cost))
+    return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
@@ -1671,26 +1802,35 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  code = reversed_comparison_code (if_info->cond, if_info->jump);
 	  std::swap (a, b);
 	  std::swap (insn_a, insn_b);
+	  std::swap (a_simple, b_simple);
+	  std::swap (then_bb, else_bb);
 	}
     }
 
+  if (!a_simple && then_bb && !b_simple && else_bb
+      && !bbs_ok_for_cmove_arith (then_bb, else_bb))
+    return FALSE;
+
   start_sequence ();
 
   orig_a = a;
   orig_b = b;
 
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
+
   /* If either operand is complex, load it into a register first.
      The best way to do this is to copy the original insn.  In this
      way we preserve any clobbers etc that the insn may have had.
      This is of course not possible in the IS_MEM case.  */
+
   if (! general_operand (a, GET_MODE (a)))
     {
-      rtx_insn *insn;
 
       if (is_mem)
 	{
 	  rtx reg = gen_reg_rtx (GET_MODE (a));
-	  insn = emit_insn (gen_rtx_SET (reg, a));
+	  emit_a = gen_rtx_SET (reg, a);
 	}
       else if (! insn_a)
 	goto end_seq_and_fail;
@@ -1700,21 +1840,17 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  rtx_insn *copy_of_a = as_a <rtx_insn *> (copy_rtx (insn_a));
 	  rtx set = single_set (copy_of_a);
 	  SET_DEST (set) = a;
-	  insn = emit_insn (PATTERN (copy_of_a));
+
+	  emit_a = PATTERN (copy_of_a);
 	}
-      if (recog_memoized (insn) < 0)
-	goto end_seq_and_fail;
     }
+
   if (! general_operand (b, GET_MODE (b)))
     {
-      rtx pat;
-      rtx_insn *last;
-      rtx_insn *new_insn;
-
       if (is_mem)
 	{
           rtx reg = gen_reg_rtx (GET_MODE (b));
-	  pat = gen_rtx_SET (reg, b);
+	  emit_b = gen_rtx_SET (reg, b);
 	}
       else if (! insn_b)
 	goto end_seq_and_fail;
@@ -1723,26 +1859,39 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
           b = gen_reg_rtx (GET_MODE (b));
 	  rtx_insn *copy_of_insn_b = as_a <rtx_insn *> (copy_rtx (insn_b));
 	  rtx set = single_set (copy_of_insn_b);
+
 	  SET_DEST (set) = b;
-	  pat = PATTERN (copy_of_insn_b);
+	  emit_b = PATTERN (copy_of_insn_b);
 	}
+    }
 
-      /* If insn to set up A clobbers any registers B depends on, try to
-	 swap insn that sets up A with the one that sets up B.  If even
-	 that doesn't help, punt.  */
-      last = get_last_insn ();
-      if (last && modified_in_p (orig_b, last))
-	{
-	  new_insn = emit_insn_before (pat, get_insns ());
-	  if (modified_in_p (orig_a, new_insn))
-	    goto end_seq_and_fail;
-	}
-      else
-	new_insn = emit_insn (pat);
+    /* If insn to set up A clobbers any registers B depends on, try to
+       swap insn that sets up A with the one that sets up B.  If even
+       that doesn't help, punt.  */
 
-      if (recog_memoized (new_insn) < 0)
-	goto end_seq_and_fail;
-    }
+    if (emit_a && modified_in_p (orig_b, emit_a))
+      {
+	if (modified_in_p (orig_a, emit_b))
+	  goto end_seq_and_fail;
+
+	if (else_bb && !b_simple)
+	  {
+	    if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	      goto end_seq_and_fail;
+	  }
+
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+      }
+    else
+      {
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+
+	if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	  goto end_seq_and_fail;
+
+      }
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			    XEXP (if_info->cond, 1), a, b);
@@ -1946,6 +2095,9 @@ noce_try_minmax (struct noce_if_info *if_info)
   enum rtx_code code, op;
   int unsignedp;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* ??? Reject modes with NaNs or signed zeros since we don't know how
      they will be resolved with an SMIN/SMAX.  It wouldn't be too hard
      to get the target to tell us...  */
@@ -2042,6 +2194,9 @@ noce_try_abs (struct noce_if_info *if_info)
   int negate;
   bool one_cmpl = false;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Reject modes with signed zeros.  */
   if (HONOR_SIGNED_ZEROS (if_info->x))
     return FALSE;
@@ -2190,6 +2345,9 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   enum rtx_code code;
   bool t_unconditional;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   cond = if_info->cond;
   code = GET_CODE (cond);
   m = XEXP (cond, 0);
@@ -2273,6 +2431,9 @@ noce_try_bitop (struct noce_if_info *if_info)
   cond = if_info->cond;
   code = GET_CODE (cond);
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Check for no else condition.  */
   if (! rtx_equal_p (x, if_info->b))
     return FALSE;
@@ -2523,6 +2684,121 @@ noce_can_store_speculate_p (basic_block top_bb, const_rtx mem)
   return false;
 }
 
+/* Helper for bb_valid_for_noce_process_p.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+      || !NONJUMP_INSN_P (insn)
+      || (cc && set_of (cc, insn)))
+      return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+      || !noce_operand_ok (SET_DEST (sset))
+      || !noce_operand_ok (SET_SRC (sset)))
+    return false;
+
+  return true;
+}
+
+/* Return true iff basic block TEST_BB followed by a join basic block
+   JOIN_BB is valid for noce if-conversion.
+   The condition used in this if-conversion is in COND.
+   In practice, check that TEST_BB ends with a single set
+   x := a and all previous computations
+   in TEST_BB don't produce any values that are live after TEST_BB.
+   In other words, all the insns in TEST_BB are there only
+   to compute a value for x.  Put the rtx cost of the insns
+   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   set instruction in SIMPLE_P.  If the bb is not simple place all insns
+   except the last insn into SEQ.  */
+
+static bool
+bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
+			      int *cost, bool *simple_p)
+{
+  if (!test_bb)
+    return false;
+
+  rtx_insn *last_insn = last_active_insn (test_bb, FALSE);
+  rtx last_set = NULL_RTX;
+
+  rtx cc = cc_in_cond (cond);
+
+  if (!insn_valid_noce_process_p (last_insn, cc))
+    return FALSE;
+  last_set = single_set (last_insn);
+
+  rtx x = SET_DEST (last_set);
+
+  rtx_insn *first_insn = first_active_insn (test_bb);
+  rtx first_set = single_set (first_insn);
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  *cost = insn_rtx_cost (last_set, speed_p);
+  if (!first_set)
+    return false;
+  /* We have a single simple set, that's okay.  */
+  else if (first_insn == last_insn)
+    {
+      *simple_p = noce_operand_ok (SET_DEST (first_set));
+      *cost = insn_rtx_cost (first_set, speed_p);
+      return *simple_p;
+    }
+
+  *simple_p = false;
+
+  rtx_insn *prev_last_insn = PREV_INSN (last_insn);
+  gcc_assert (prev_last_insn);
+
+  /* For now, disallow setting x multiple times in test_bb.  */
+  if (REG_P (x) && reg_set_between_p (x, first_insn, prev_last_insn))
+    return false;
+
+  bitmap test_bb_temps = BITMAP_ALLOC (&reg_obstack);
+
+  /* The regs that are live out of test_bb.  */
+  bitmap test_bb_live_out = df_get_live_out (test_bb);
+
+  rtx_insn *insn;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      if (insn != last_insn)
+	{
+	  if (!active_insn_p (insn))
+	    continue;
+
+	  if (!insn_valid_noce_process_p (insn, cc))
+	    goto free_bitmap_and_fail;
+
+	  rtx sset = single_set (insn);
+	  gcc_assert (sset);
+
+	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
+	    goto free_bitmap_and_fail;
+
+	  *cost += insn_rtx_cost (sset, speed_p);
+	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
+	}
+    }
+
+  /* If any of the intermediate results in test_bb are live after test_bb
+     then fail.  */
+  if (bitmap_intersect_p (test_bb_live_out, test_bb_temps))
+    goto free_bitmap_and_fail;
+
+  BITMAP_FREE (test_bb_temps);
+  return true;
+
+  free_bitmap_and_fail:
+    BITMAP_FREE (test_bb_temps);
+    return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -2539,7 +2815,6 @@ noce_process_if_block (struct noce_if_info *if_info)
   rtx_insn *insn_a, *insn_b;
   rtx set_a, set_b;
   rtx orig_x, x, a, b;
-  rtx cc;
 
   /* We're looking for patterns of the form
 
@@ -2548,16 +2823,31 @@ noce_process_if_block (struct noce_if_info *if_info)
      (3) if (...) x = a;   // as if with an initial x = x.
 
      The later patterns require jumps to be more expensive.
-
+     For the if (...) x = a; else x = b; case we allow multiple insns
+     inside the then and else blocks as long as their only effect is
+     to calculate a value for x.
      ??? For future expansion, look for multiple X in such patterns.  */
 
-  /* Look for one of the potential sets.  */
-  insn_a = first_active_insn (then_bb);
-  if (! insn_a
-      || insn_a != last_active_insn (then_bb, FALSE)
-      || (set_a = single_set (insn_a)) == NULL_RTX)
+  bool then_bb_valid
+    = bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+				    &if_info->then_simple);
+
+  bool else_bb_valid = false;
+  if (else_bb)
+    else_bb_valid
+      = bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+				      &if_info->else_simple);
+
+  if (!then_bb_valid)
+    return FALSE;
+
+  if (else_bb && !else_bb_valid)
     return FALSE;
 
+  insn_a = last_active_insn (then_bb, FALSE);
+  set_a = single_set (insn_a);
+  gcc_assert (set_a);
+
   x = SET_DEST (set_a);
   a = SET_SRC (set_a);
 
@@ -2571,12 +2861,12 @@ noce_process_if_block (struct noce_if_info *if_info)
   set_b = NULL_RTX;
   if (else_bb)
     {
-      insn_b = first_active_insn (else_bb);
-      if (! insn_b
-	  || insn_b != last_active_insn (else_bb, FALSE)
-	  || (set_b = single_set (insn_b)) == NULL_RTX
-	  || ! rtx_interchangeable_p (x, SET_DEST (set_b)))
-	return FALSE;
+      insn_b = last_active_insn (else_bb, FALSE);
+      set_b = single_set (insn_b);
+      gcc_assert (set_b);
+
+      if (!rtx_interchangeable_p (x, SET_DEST (set_b)))
+        return FALSE;
     }
   else
     {
@@ -2651,20 +2941,14 @@ noce_process_if_block (struct noce_if_info *if_info)
   if_info->a = a;
   if_info->b = b;
 
-  /* Skip it if the instruction to be moved might clobber CC.  */
-  cc = cc_in_cond (cond);
-  if (cc
-      && (set_of (cc, insn_a)
-	  || (insn_b && set_of (cc, insn_b))))
-    return FALSE;
-
   /* Try optimizations in some approximation of a useful order.  */
   /* ??? Should first look to see if X is live incoming at all.  If it
      isn't, we don't need anything but an unconditional set.  */
 
   /* Look and see if A and B are really the same.  Avoid creating silly
      cmove constructs that no one will fix up later.  */
-  if (rtx_interchangeable_p (a, b))
+  if (noce_simple_bbs (if_info)
+      && rtx_interchangeable_p (a, b))
     {
       /* If we have an INSN_B, we don't have to create any new rtl.  Just
 	 move the instruction that we already have.  If we don't have an
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
new file mode 100644
index 0000000..1836f57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+int
+foo (int x)
+{
+  return x > 100 ? x - 2 : x - 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
new file mode 100644
index 0000000..8c48270
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint16_t;
+
+uint8_t
+_xtime (const uint8_t byte, const uint16_t generator)
+{
+  if (byte & 0x80)
+    return byte ^ generator;
+  else
+    return byte << 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
new file mode 100644
index 0000000..1aecbc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+
+typedef long long s64;
+
+int
+foo (s64 a, s64 b, s64 c)
+{
+ s64 d = a - b;
+
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+/* This test can be reduced to just return a + c;  */
+/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" } } */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-10 12:31 [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive Kyrill Tkachov
@ 2015-07-10 13:05 ` Kyrill Tkachov
  2015-07-10 23:00 ` Bernhard Reutner-Fischer
  1 sibling, 0 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-10 13:05 UTC (permalink / raw)
  To: GCC Patches


On 10/07/15 13:31, Kyrill Tkachov wrote:
> +   to compute a value for x.  Put the rtx cost of the insns
> +   in TEST_BB into COST.  Record whether TEST_BB is a single simple
> +   set instruction in SIMPLE_P.  If the bb is not simple place all insns
> +   except the last insn into SEQ.  */
> +

That last sentence is stale. That function doesn't have a SEQ argument.
Consider that comment sentence removed.

Thanks,
Kyrill

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-10 12:31 [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive Kyrill Tkachov
  2015-07-10 13:05 ` Kyrill Tkachov
@ 2015-07-10 23:00 ` Bernhard Reutner-Fischer
  2015-07-10 23:08   ` Bernhard Reutner-Fischer
  2015-07-13  9:46   ` Kyrill Tkachov
  1 sibling, 2 replies; 31+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-07-10 23:00 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2234 bytes --]

On 10 July 2015 at 14:31, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi all,
>
> This patch makes if-conversion more aggressive when handling code of the
> form:
> if (test)
>   x := a  //THEN
> else
>   x := b  //ELSE

> The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't
> exceed the branch cost. I don't think that's quite a right calculation.
> We're going to be executing at least one of the basic blocks anyway.
> This patch we instead check the *maximum* of the two blocks against the branch cost.
> This should still catch cases where a high latency instruction appears in one or both of
> the paths.

Shouldn't this maximum also take probability into account? Or maybe
not, would have to think about it tomorrow.

$ contrib/check_GNU_style.sh rtl-ifcvt.00.patch

Blocks of 8 spaces should be replaced with tabs.
783:+        return FALSE;


Generally ifcvt.c (resp. the whole tree) could use a
sed -i -e "s/\([[:space:]]\)FALSE/\1false/g" gcc/ifcvt.c
Maybe some of the int predicates could then become bools.


+/* Return iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */

Return true iff


Did you include go in your testing?
I see:
Unexpected results in this build (new failures)
FAIL: encoding/json
FAIL: go/printer
FAIL: go/scanner
FAIL: html/template
FAIL: log
FAIL: net/http
FAIL: net/http/cgi
FAIL: net/http/cookiejar
FAIL: os
FAIL: text/template


bbs_ok_for_cmove_arith() looks costly but i guess you looked if
there's some pre-existing cleverness you could have used instead?

noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p().

insn_rtx_cost() should return an unsigned int, then_cost, else_cost
should thus be unsigned int too.

copy_of_a versus copy_of_insn_b; I'd shorten the latter.

bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param
but there is none?
Also should document the return value (and should not clobber the OUT
params upon failure, no?).

As for the testcases, it would be nice to have at least a tiny bit for
x86_64, too.

PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
PPS: attached meant to illustrate comments above. Untested.

cheers,

[-- Attachment #2: 0001-rtl-ifcvt-typos.patch --]
[-- Type: text/x-patch, Size: 6044 bytes --]

From 1b7d8f9b61eb538cc4338e2073d04a66518f13c2 Mon Sep 17 00:00:00 2001
From: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
Date: Fri, 10 Jul 2015 21:25:30 +0200
Subject: [PATCH] rtl-ifcvt typos

fix some typos on top of Kyrill's patch.
should mv the testcases to common ground.
---
 gcc/ifcvt.c                                     |   49 ++++++++++-------------
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c |    2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c |    2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c |    7 ++--
 4 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 3d324257..0bf6645 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1784,8 +1784,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* We're going to execute one of the basic blocks anyway, so
      bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost)
-      > COSTS_N_INSNS (if_info->branch_cost))
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
     return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
@@ -2730,28 +2729,26 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   rtx cc = cc_in_cond (cond);
 
   if (!insn_valid_noce_process_p (last_insn, cc))
-    return FALSE;
+    return false;
   last_set = single_set (last_insn);
 
   rtx x = SET_DEST (last_set);
-
   rtx_insn *first_insn = first_active_insn (test_bb);
   rtx first_set = single_set (first_insn);
-  bool speed_p = optimize_bb_for_speed_p (test_bb);
 
-  *cost = insn_rtx_cost (last_set, speed_p);
   if (!first_set)
     return false;
+
   /* We have a single simple set, that's okay.  */
-  else if (first_insn == last_insn)
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
     {
       *simple_p = noce_operand_ok (SET_DEST (first_set));
       *cost = insn_rtx_cost (first_set, speed_p);
       return *simple_p;
     }
 
-  *simple_p = false;
-
   rtx_insn *prev_last_insn = PREV_INSN (last_insn);
   gcc_assert (prev_last_insn);
 
@@ -2764,6 +2761,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   /* The regs that are live out of test_bb.  */
   bitmap test_bb_live_out = df_get_live_out (test_bb);
 
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
   rtx_insn *insn;
   FOR_BB_INSNS (test_bb, insn)
     {
@@ -2781,7 +2779,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
 	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
 	    goto free_bitmap_and_fail;
 
-	  *cost += insn_rtx_cost (sset, speed_p);
+	  potential_cost += insn_rtx_cost (sset, speed_p);
 	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
 	}
     }
@@ -2792,11 +2790,13 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
     goto free_bitmap_and_fail;
 
   BITMAP_FREE (test_bb_temps);
+  *cost = potential_cost;
+  *simple_p = false;
   return true;
 
-  free_bitmap_and_fail:
-    BITMAP_FREE (test_bb_temps);
-    return false;
+ free_bitmap_and_fail:
+  BITMAP_FREE (test_bb_temps);
+  return false;
 }
 
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
@@ -2828,21 +2828,14 @@ noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, look for multiple X in such patterns.  */
 
-  bool then_bb_valid
-    = bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
-				    &if_info->then_simple);
-
-  bool else_bb_valid = false;
-  if (else_bb)
-    else_bb_valid
-      = bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
-				      &if_info->else_simple);
-
-  if (!then_bb_valid)
-    return FALSE;
+  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+				    &if_info->then_simple))
+    return false;
 
-  if (else_bb && !else_bb_valid)
-    return FALSE;
+  if (else_bb
+      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+				      &if_info->else_simple))
+    return false;
 
   insn_a = last_active_insn (then_bb, FALSE);
   set_a = single_set (insn_a);
@@ -2866,7 +2859,7 @@ noce_process_if_block (struct noce_if_info *if_info)
       gcc_assert (set_b);
 
       if (!rtx_interchangeable_p (x, SET_DEST (set_b)))
-        return FALSE;
+	return FALSE;
     }
   else
     {
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
index 1836f57..92bc17a 100644
--- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
 /* { dg-options "-fdump-rtl-ce1 -O2" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
index 8c48270..e0e1728 100644
--- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
 /* { dg-options "-fdump-rtl-ce1 -O2" } */
 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
index 1aecbc9..0441357 100644
--- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c
@@ -1,5 +1,5 @@
-/* { dg-do compile } */
-/* { dg-options "-save-temps -O2" } */
+/* { dg-do assemble { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
 
 
 typedef long long s64;
@@ -16,4 +16,5 @@ foo (s64 a, s64 b, s64 c)
 }
 
 /* This test can be reduced to just return a + c;  */
-/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" } } */
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" { target { aarch64*-*-* } } } } */
-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-10 23:00 ` Bernhard Reutner-Fischer
@ 2015-07-10 23:08   ` Bernhard Reutner-Fischer
  2015-07-13  9:46   ` Kyrill Tkachov
  1 sibling, 0 replies; 31+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-07-10 23:08 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On 11 July 2015 at 01:00, Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
> On 10 July 2015 at 14:31, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)

err, arm and aarch64 have no -mbranch-cost, a couple of prominent
other arches do..

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-10 23:00 ` Bernhard Reutner-Fischer
  2015-07-10 23:08   ` Bernhard Reutner-Fischer
@ 2015-07-13  9:46   ` Kyrill Tkachov
  2015-07-13 10:00     ` Kyrill Tkachov
                       ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-13  9:46 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: GCC Patches

Hi Bernhard,

On 11/07/15 00:00, Bernhard Reutner-Fischer wrote:
> On 10 July 2015 at 14:31, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> Hi all,
>>
>> This patch makes if-conversion more aggressive when handling code of the
>> form:
>> if (test)
>>    x := a  //THEN
>> else
>>    x := b  //ELSE
>> The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't
>> exceed the branch cost. I don't think that's quite a right calculation.
>> We're going to be executing at least one of the basic blocks anyway.
>> This patch we instead check the *maximum* of the two blocks against the branch cost.
>> This should still catch cases where a high latency instruction appears in one or both of
>> the paths.
> Shouldn't this maximum also take probability into account? Or maybe
> not, would have to think about it tomorrow.

The branch cost that we test against (recorded in if_info earlier in the ifcvt.c
call chain) is either the predictable branch cost or the unpredictable branch
cost, depending on what the predictable_edge_p machinery returned.
I think checking against that should be enough, but it's an easy thing to experiment
with, so I'm open to arguments in any direction.

>
> $ contrib/check_GNU_style.sh rtl-ifcvt.00.patch
>
> Blocks of 8 spaces should be replaced with tabs.
> 783:+        return FALSE;
>
>
> Generally ifcvt.c (resp. the whole tree) could use a
> sed -i -e "s/\([[:space:]]\)FALSE/\1false/g" gcc/ifcvt.c
> Maybe some of the int predicates could then become bools.

Ok, will go over the style in the patch.

>
>
> +/* Return iff the registers that the insns in BB_A set do not
> +   get used in BB_B.  */
>
> Return true iff

I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if
I'll use a normal if here.

>
>
> Did you include go in your testing?
> I see:
> Unexpected results in this build (new failures)
> FAIL: encoding/json
> FAIL: go/printer
> FAIL: go/scanner
> FAIL: html/template
> FAIL: log
> FAIL: net/http
> FAIL: net/http/cgi
> FAIL: net/http/cookiejar
> FAIL: os
> FAIL: text/template

Hmmm. I don't see these failures. I double checked right now and they
appear as PASS in my configuration.

I tested make check-go on x86_64-unknown-linux-gnu configured with
--without-isl --disable-multilib --enable-languages=c,c++,fortran,go.

Are you sure this is not some other issue in your tree?

>
> bbs_ok_for_cmove_arith() looks costly but i guess you looked if
> there's some pre-existing cleverness you could have used instead?

I did have a look, but couldn't find any.
The bbs_ok_for_cmove_arith is done after the costs check
so I'd hope that the costs check would already discard
really long basic-blocks.

>
> noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p().
>
> insn_rtx_cost() should return an unsigned int, then_cost, else_cost
> should thus be unsigned int too.
>
> copy_of_a versus copy_of_insn_b; I'd shorten the latter.

Thanks, good suggestions.

>
> bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param
> but there is none?
> Also should document the return value (and should not clobber the OUT
> params upon failure, no?).

bah, I forgot to update the comment once I modified the function
during development of the patch. I'll fix those.

>
> As for the testcases, it would be nice to have at least a tiny bit for
> x86_64, too.

I can put the testcases in gcc.dg and enable them for x86 as well,
but I think a couple of the already pass as is because x86 doesn't
need to do an extra zero_extend inside the basic-block.

>
> PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
> PPS: attached meant to illustrate comments above. Untested.

Thanks a lot! This is all very helpful.
I'll respin the patch.

Thanks,
Kyrill


>
> cheers,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-13  9:46   ` Kyrill Tkachov
@ 2015-07-13 10:00     ` Kyrill Tkachov
  2015-07-13 10:48     ` Bernhard Reutner-Fischer
  2015-07-13 14:03     ` Kyrill Tkachov
  2 siblings, 0 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-13 10:00 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: GCC Patches


On 13/07/15 10:45, Kyrill Tkachov wrote:
>
>>
>> +/* Return iff the registers that the insns in BB_A set do not
>> +   get used in BB_B.  */
>>
>> Return true iff
> I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if
> I'll use a normal if here.

Err, of course you were talking about the missing 'true'.
Sorry, early morning.

Kyrill


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-13  9:46   ` Kyrill Tkachov
  2015-07-13 10:00     ` Kyrill Tkachov
@ 2015-07-13 10:48     ` Bernhard Reutner-Fischer
  2015-07-13 13:12       ` Kyrill Tkachov
  2015-07-13 14:03     ` Kyrill Tkachov
  2 siblings, 1 reply; 31+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-07-13 10:48 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>Hi Bernhard,
>

>> Did you include go in your testing?
>> I see:
>> Unexpected results in this build (new failures)
>> FAIL: encoding/json
>> FAIL: go/printer
>> FAIL: go/scanner
>> FAIL: html/template
>> FAIL: log
>> FAIL: net/http
>> FAIL: net/http/cgi
>> FAIL: net/http/cookiejar
>> FAIL: os
>> FAIL: text/template
>
>Hmmm. I don't see these failures. I double checked right now and they
>appear as PASS in my configuration.
>
>I tested make check-go on x86_64-unknown-linux-gnu configured with
>--without-isl --disable-multilib --enable-languages=c,c++,fortran,go.
>
>Are you sure this is not some other issue in your tree?

I have ISL enabled. I do have a couple of local stuff but that tested fine before your patch and should not really have impact on the parts your patch touches. So maybe it's ISL.

Thanks,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-13 10:48     ` Bernhard Reutner-Fischer
@ 2015-07-13 13:12       ` Kyrill Tkachov
  0 siblings, 0 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-13 13:12 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: GCC Patches


On 13/07/15 11:48, Bernhard Reutner-Fischer wrote:
> On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> Hi Bernhard,
>>
>>> Did you include go in your testing?
>>> I see:
>>> Unexpected results in this build (new failures)
>>> FAIL: encoding/json
>>> FAIL: go/printer
>>> FAIL: go/scanner
>>> FAIL: html/template
>>> FAIL: log
>>> FAIL: net/http
>>> FAIL: net/http/cgi
>>> FAIL: net/http/cookiejar
>>> FAIL: os
>>> FAIL: text/template
>> Hmmm. I don't see these failures. I double checked right now and they
>> appear as PASS in my configuration.
>>
>> I tested make check-go on x86_64-unknown-linux-gnu configured with
>> --without-isl --disable-multilib --enable-languages=c,c++,fortran,go.
>>
>> Are you sure this is not some other issue in your tree?
> I have ISL enabled. I do have a couple of local stuff but that tested fine before your patch and should not really have impact on the parts your patch touches. So maybe it's ISL.

I've rebuilt with ISL from scratch and the tests still pass for me.
I'm testing Ubuntu on a Haswell machine, don't know if that's relevant.

Kyrill

>
> Thanks,
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-13  9:46   ` Kyrill Tkachov
  2015-07-13 10:00     ` Kyrill Tkachov
  2015-07-13 10:48     ` Bernhard Reutner-Fischer
@ 2015-07-13 14:03     ` Kyrill Tkachov
  2015-07-20 10:59       ` Kyrill Tkachov
  2015-07-23 20:57       ` Jeff Law
  2 siblings, 2 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-13 14:03 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1832 bytes --]

Hi Bernhard,

On 13/07/15 10:45, Kyrill Tkachov wrote:
>> PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
>> PPS: attached meant to illustrate comments above. Untested.
> Thanks a lot! This is all very helpful.
> I'll respin the patch.

Here it is. I've expanded the comments in the functions you mentioned,
moved the tests to gcc.dg and enabled them for aarch64 and x86 and changed
the types of the costs used to unsigned int.


Bootstrapped on aarch64 and x86_64.
The go testsuite passes on x86_64-unknown-linux-gnu for me...

Thanks,
Kyrill

2015-07-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
	then_cost, else_cost fields.  Change branch_cost field to unsigned int.
	(end_ifcvt_sequence): Call set_used_flags on each insn in the
	sequence.
	(noce_simple_bbs): New function.
	(noce_try_move): Bail if basic blocks are not simple.
	(noce_try_store_flag): Likewise.
	(noce_try_store_flag_constants): Likewise.
	(noce_try_addcc): Likewise.
	(noce_try_store_flag_mask): Likewise.
	(noce_try_cmove): Likewise.
	(noce_try_minmax): Likewise.
	(noce_try_abs): Likewise.
	(noce_try_sign_mask): Likewise.
	(noce_try_bitop): Likewise.
	(bbs_ok_for_cmove_arith): New function.
	(noce_emit_all_but_last): Likewise.
	(noce_emit_insn): Likewise.
	(noce_emit_bb): Likewise.
	(noce_try_cmove_arith): Handle non-simple basic blocks.
	(insn_valid_noce_process_p): New function.
	(bb_valid_for_noce_process_p): Likewise.
	(noce_process_if_block): Allow non-simple basic blocks
	where appropriate.


2015-07-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* gcc.dg/ifcvt-1.c: New test.
	* gcc.dg/ifcvt-2.c: Likewise.
	* gcc.dg/ifcvt-3.c: Likewise.




>
> Thanks,
> Kyrill
>
>
>> cheers,


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifcvt.patch --]
[-- Type: text/x-patch; name=ifcvt.patch, Size: 20835 bytes --]

commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 8 15:45:04 2015 +0100

    [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 31849ee..2f0a228 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -815,8 +815,17 @@ struct noce_if_info
      form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
-  int branch_cost;
+  unsigned int branch_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1036,6 +1045,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info->cond);
   set_used_flags (if_info->a);
   set_used_flags (if_info->b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1053,6 +1066,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info->then_simple)
+    return false;
+
+  if (if_info->else_bb)
+    return if_info->else_simple;
+
+  return true;
+}
+
 /* Convert "if (a != b) x = a; else x = b" into "x = a" and
    "if (a == b) x = a; else x = b" into "x = b".  */
 
@@ -1067,6 +1095,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE && code != EQ)
     return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
      or a signed zero.  */
   if (HONOR_NANS (if_info->x)
@@ -1115,6 +1146,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->b)
       && INTVAL (if_info->b) == STORE_FLAG_VALUE
       && if_info->a == const0_rtx)
@@ -1163,6 +1197,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   int normalize, can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->a)
       && CONST_INT_P (if_info->b))
     {
@@ -1291,6 +1328,9 @@ noce_try_addcc (struct noce_if_info *if_info)
   rtx_insn *seq;
   int subtract, normalize;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (GET_CODE (if_info->a) == PLUS
       && rtx_equal_p (XEXP (if_info->a, 0), if_info->b)
       && (reversed_comparison_code (if_info->cond, if_info->jump)
@@ -1382,6 +1422,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
   rtx_insn *seq;
   int reversep;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   reversep = 0;
   if ((if_info->branch_cost >= 2
        || STORE_FLAG_VALUE == -1)
@@ -1550,6 +1593,9 @@ noce_try_cmove (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if ((CONSTANT_P (if_info->a) || register_operand (if_info->a, VOIDmode))
       && (CONSTANT_P (if_info->b) || register_operand (if_info->b, VOIDmode)))
     {
@@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  FOR_BB_INSNS (bb_a, a_insn)
+    {
+      if (!active_insn_p (a_insn))
+	continue;
+
+      rtx sset_a = single_set (a_insn);
+
+      if (!sset_a)
+	return false;
+
+      rtx dest_reg = SET_DEST (sset_a);
+      rtx_insn *b_insn;
+
+      FOR_BB_INSNS (bb_b, b_insn)
+	{
+	  if (!active_insn_p (b_insn))
+	    continue;
+
+	  rtx sset_b = single_set (b_insn);
+
+	  if (!sset_b)
+	    return false;
+
+	  if (reg_referenced_p (dest_reg, sset_b))
+	    return false;
+	}
+    }
+
+  return true;
+}
+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (insn != last && active_insn_p (insn))
+	{
+	  rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
+
+	  emit_insn (PATTERN (to_emit));
+	}
+    }
+}
+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn) < 0)
+    return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb && !simple)
+    noce_emit_all_but_last (bb);
+
+  if (last_insn && !noce_emit_insn (last_insn))
+    return false;
+
+  return true;
+}
+
 /* Try more complex cases involving conditional_move.  */
 
 static int
@@ -1594,9 +1730,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx x = if_info->x;
   rtx orig_a, orig_b;
   rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
   rtx target;
   int is_mem = 0;
-  int insn_cost;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
 
@@ -1635,27 +1774,22 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   insn_a = if_info->insn_a;
   insn_b = if_info->insn_b;
 
-  /* Total insn_rtx_cost should be smaller than branch cost.  Exit
-     if insn_rtx_cost can't be estimated.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
   if (insn_a)
-    {
-      insn_cost
-	= insn_rtx_cost (PATTERN (insn_a),
-      			 optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_a)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-	return FALSE;
-    }
+    then_cost = if_info->then_cost;
   else
-    insn_cost = 0;
+    then_cost = 0;
 
   if (insn_b)
-    {
-      insn_cost
-	+= insn_rtx_cost (PATTERN (insn_b),
-      			  optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_b)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-        return FALSE;
-    }
+    else_cost = if_info->else_cost;
+  else
+    else_cost = 0;
+
+  /* We're going to execute one of the basic blocks anyway, so
+     bail out if the most expensive of the two blocks is unacceptable.  */
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+    return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
@@ -1671,26 +1805,35 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  code = reversed_comparison_code (if_info->cond, if_info->jump);
 	  std::swap (a, b);
 	  std::swap (insn_a, insn_b);
+	  std::swap (a_simple, b_simple);
+	  std::swap (then_bb, else_bb);
 	}
     }
 
+  if (!a_simple && then_bb && !b_simple && else_bb
+      && !bbs_ok_for_cmove_arith (then_bb, else_bb))
+    return FALSE;
+
   start_sequence ();
 
   orig_a = a;
   orig_b = b;
 
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
+
   /* If either operand is complex, load it into a register first.
      The best way to do this is to copy the original insn.  In this
      way we preserve any clobbers etc that the insn may have had.
      This is of course not possible in the IS_MEM case.  */
+
   if (! general_operand (a, GET_MODE (a)))
     {
-      rtx_insn *insn;
 
       if (is_mem)
 	{
 	  rtx reg = gen_reg_rtx (GET_MODE (a));
-	  insn = emit_insn (gen_rtx_SET (reg, a));
+	  emit_a = gen_rtx_SET (reg, a);
 	}
       else if (! insn_a)
 	goto end_seq_and_fail;
@@ -1700,49 +1843,58 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  rtx_insn *copy_of_a = as_a <rtx_insn *> (copy_rtx (insn_a));
 	  rtx set = single_set (copy_of_a);
 	  SET_DEST (set) = a;
-	  insn = emit_insn (PATTERN (copy_of_a));
+
+	  emit_a = PATTERN (copy_of_a);
 	}
-      if (recog_memoized (insn) < 0)
-	goto end_seq_and_fail;
     }
+
   if (! general_operand (b, GET_MODE (b)))
     {
-      rtx pat;
-      rtx_insn *last;
-      rtx_insn *new_insn;
-
       if (is_mem)
 	{
           rtx reg = gen_reg_rtx (GET_MODE (b));
-	  pat = gen_rtx_SET (reg, b);
+	  emit_b = gen_rtx_SET (reg, b);
 	}
       else if (! insn_b)
 	goto end_seq_and_fail;
       else
 	{
           b = gen_reg_rtx (GET_MODE (b));
-	  rtx_insn *copy_of_insn_b = as_a <rtx_insn *> (copy_rtx (insn_b));
-	  rtx set = single_set (copy_of_insn_b);
+	  rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b));
+	  rtx set = single_set (copy_of_b);
+
 	  SET_DEST (set) = b;
-	  pat = PATTERN (copy_of_insn_b);
+	  emit_b = PATTERN (copy_of_b);
 	}
+    }
 
-      /* If insn to set up A clobbers any registers B depends on, try to
-	 swap insn that sets up A with the one that sets up B.  If even
-	 that doesn't help, punt.  */
-      last = get_last_insn ();
-      if (last && modified_in_p (orig_b, last))
-	{
-	  new_insn = emit_insn_before (pat, get_insns ());
-	  if (modified_in_p (orig_a, new_insn))
-	    goto end_seq_and_fail;
-	}
-      else
-	new_insn = emit_insn (pat);
+    /* If insn to set up A clobbers any registers B depends on, try to
+       swap insn that sets up A with the one that sets up B.  If even
+       that doesn't help, punt.  */
 
-      if (recog_memoized (new_insn) < 0)
-	goto end_seq_and_fail;
-    }
+    if (emit_a && modified_in_p (orig_b, emit_a))
+      {
+	if (modified_in_p (orig_a, emit_b))
+	  goto end_seq_and_fail;
+
+	if (else_bb && !b_simple)
+	  {
+	    if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	      goto end_seq_and_fail;
+	  }
+
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+      }
+    else
+      {
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+
+	if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	  goto end_seq_and_fail;
+
+      }
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			    XEXP (if_info->cond, 1), a, b);
@@ -1946,6 +2098,9 @@ noce_try_minmax (struct noce_if_info *if_info)
   enum rtx_code code, op;
   int unsignedp;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* ??? Reject modes with NaNs or signed zeros since we don't know how
      they will be resolved with an SMIN/SMAX.  It wouldn't be too hard
      to get the target to tell us...  */
@@ -2042,6 +2197,9 @@ noce_try_abs (struct noce_if_info *if_info)
   int negate;
   bool one_cmpl = false;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Reject modes with signed zeros.  */
   if (HONOR_SIGNED_ZEROS (if_info->x))
     return FALSE;
@@ -2190,6 +2348,9 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   enum rtx_code code;
   bool t_unconditional;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   cond = if_info->cond;
   code = GET_CODE (cond);
   m = XEXP (cond, 0);
@@ -2273,6 +2434,9 @@ noce_try_bitop (struct noce_if_info *if_info)
   cond = if_info->cond;
   code = GET_CODE (cond);
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Check for no else condition.  */
   if (! rtx_equal_p (x, if_info->b))
     return FALSE;
@@ -2523,6 +2687,123 @@ noce_can_store_speculate_p (basic_block top_bb, const_rtx mem)
   return false;
 }
 
+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+      || !NONJUMP_INSN_P (insn)
+      || (cc && set_of (cc, insn)))
+      return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+      || !noce_operand_ok (SET_DEST (sset))
+      || !noce_operand_ok (SET_SRC (sset)))
+    return false;
+
+  return true;
+}
+
+/* Return true iff basic block TEST_BB is valid for noce if-conversion.
+   The condition used in this if-conversion is in COND.
+   In practice, check that TEST_BB ends with a single set
+   x := a and all previous computations
+   in TEST_BB don't produce any values that are live after TEST_BB.
+   In other words, all the insns in TEST_BB are there only
+   to compute a value for x.  Put the rtx cost of the insns
+   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   set instruction in SIMPLE_P.  */
+
+static bool
+bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
+			      unsigned int *cost, bool *simple_p)
+{
+  if (!test_bb)
+    return false;
+
+  rtx_insn *last_insn = last_active_insn (test_bb, FALSE);
+  rtx last_set = NULL_RTX;
+
+  rtx cc = cc_in_cond (cond);
+
+  if (!insn_valid_noce_process_p (last_insn, cc))
+    return false;
+  last_set = single_set (last_insn);
+
+  rtx x = SET_DEST (last_set);
+  rtx_insn *first_insn = first_active_insn (test_bb);
+  rtx first_set = single_set (first_insn);
+
+  if (!first_set)
+    return false;
+
+  /* We have a single simple set, that's okay.  */
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
+    {
+      *simple_p = noce_operand_ok (SET_DEST (first_set));
+      *cost = insn_rtx_cost (first_set, speed_p);
+      return *simple_p;
+    }
+
+  rtx_insn *prev_last_insn = PREV_INSN (last_insn);
+  gcc_assert (prev_last_insn);
+
+  /* For now, disallow setting x multiple times in test_bb.  */
+  if (REG_P (x) && reg_set_between_p (x, first_insn, prev_last_insn))
+    return false;
+
+  bitmap test_bb_temps = BITMAP_ALLOC (&reg_obstack);
+
+  /* The regs that are live out of test_bb.  */
+  bitmap test_bb_live_out = df_get_live_out (test_bb);
+
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
+  rtx_insn *insn;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      if (insn != last_insn)
+	{
+	  if (!active_insn_p (insn))
+	    continue;
+
+	  if (!insn_valid_noce_process_p (insn, cc))
+	    goto free_bitmap_and_fail;
+
+	  rtx sset = single_set (insn);
+	  gcc_assert (sset);
+
+	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
+	    goto free_bitmap_and_fail;
+
+	  potential_cost += insn_rtx_cost (sset, speed_p);
+	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
+	}
+    }
+
+  /* If any of the intermediate results in test_bb are live after test_bb
+     then fail.  */
+  if (bitmap_intersect_p (test_bb_live_out, test_bb_temps))
+    goto free_bitmap_and_fail;
+
+  BITMAP_FREE (test_bb_temps);
+  *cost = potential_cost;
+  *simple_p = false;
+  return true;
+
+ free_bitmap_and_fail:
+  BITMAP_FREE (test_bb_temps);
+  return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -2539,7 +2820,6 @@ noce_process_if_block (struct noce_if_info *if_info)
   rtx_insn *insn_a, *insn_b;
   rtx set_a, set_b;
   rtx orig_x, x, a, b;
-  rtx cc;
 
   /* We're looking for patterns of the form
 
@@ -2548,15 +2828,23 @@ noce_process_if_block (struct noce_if_info *if_info)
      (3) if (...) x = a;   // as if with an initial x = x.
 
      The later patterns require jumps to be more expensive.
-
+     For the if (...) x = a; else x = b; case we allow multiple insns
+     inside the then and else blocks as long as their only effect is
+     to calculate a value for x.
      ??? For future expansion, look for multiple X in such patterns.  */
 
-  /* Look for one of the potential sets.  */
-  insn_a = first_active_insn (then_bb);
-  if (! insn_a
-      || insn_a != last_active_insn (then_bb, FALSE)
-      || (set_a = single_set (insn_a)) == NULL_RTX)
-    return FALSE;
+  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+				    &if_info->then_simple))
+    return false;
+
+  if (else_bb
+      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+				      &if_info->else_simple))
+    return false;
+
+  insn_a = last_active_insn (then_bb, FALSE);
+  set_a = single_set (insn_a);
+  gcc_assert (set_a);
 
   x = SET_DEST (set_a);
   a = SET_SRC (set_a);
@@ -2571,11 +2859,11 @@ noce_process_if_block (struct noce_if_info *if_info)
   set_b = NULL_RTX;
   if (else_bb)
     {
-      insn_b = first_active_insn (else_bb);
-      if (! insn_b
-	  || insn_b != last_active_insn (else_bb, FALSE)
-	  || (set_b = single_set (insn_b)) == NULL_RTX
-	  || ! rtx_interchangeable_p (x, SET_DEST (set_b)))
+      insn_b = last_active_insn (else_bb, FALSE);
+      set_b = single_set (insn_b);
+      gcc_assert (set_b);
+
+      if (!rtx_interchangeable_p (x, SET_DEST (set_b)))
 	return FALSE;
     }
   else
@@ -2651,20 +2939,14 @@ noce_process_if_block (struct noce_if_info *if_info)
   if_info->a = a;
   if_info->b = b;
 
-  /* Skip it if the instruction to be moved might clobber CC.  */
-  cc = cc_in_cond (cond);
-  if (cc
-      && (set_of (cc, insn_a)
-	  || (insn_b && set_of (cc, insn_b))))
-    return FALSE;
-
   /* Try optimizations in some approximation of a useful order.  */
   /* ??? Should first look to see if X is live incoming at all.  If it
      isn't, we don't need anything but an unconditional set.  */
 
   /* Look and see if A and B are really the same.  Avoid creating silly
      cmove constructs that no one will fix up later.  */
-  if (rtx_interchangeable_p (a, b))
+  if (noce_simple_bbs (if_info)
+      && rtx_interchangeable_p (a, b))
     {
       /* If we have an INSN_B, we don't have to create any new rtl.  Just
 	 move the instruction that we already have.  If we don't have an
diff --git a/gcc/testsuite/gcc.dg/ifcvt-1.c b/gcc/testsuite/gcc.dg/ifcvt-1.c
new file mode 100644
index 0000000..92bc17a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+int
+foo (int x)
+{
+  return x > 100 ? x - 2 : x - 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-2.c b/gcc/testsuite/gcc.dg/ifcvt-2.c
new file mode 100644
index 0000000..e0e1728
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint16_t;
+
+uint8_t
+_xtime (const uint8_t byte, const uint16_t generator)
+{
+  if (byte & 0x80)
+    return byte ^ generator;
+  else
+    return byte << 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-3.c b/gcc/testsuite/gcc.dg/ifcvt-3.c
new file mode 100644
index 0000000..2e104a4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* }  } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+typedef long long s64;
+
+int
+foo (s64 a, s64 b, s64 c)
+{
+ s64 d = a - b;
+
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+/* This test can be reduced to just return a + c;  */
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" { target { aarch64*-*-* } } } } */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-13 14:03     ` Kyrill Tkachov
@ 2015-07-20 10:59       ` Kyrill Tkachov
  2015-07-23 20:57       ` Jeff Law
  1 sibling, 0 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-20 10:59 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: GCC Patches

Ping.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01047.html

The go testsuite passes for me on x86_64-unknown-linux-gnu for me.
A third data point on testing would be appreciated...

Thanks,
Kyrill

On 13/07/15 15:03, Kyrill Tkachov wrote:
> Hi Bernhard,
>
> On 13/07/15 10:45, Kyrill Tkachov wrote:
>>> PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
>>> PPS: attached meant to illustrate comments above. Untested.
>> Thanks a lot! This is all very helpful.
>> I'll respin the patch.
> Here it is. I've expanded the comments in the functions you mentioned,
> moved the tests to gcc.dg and enabled them for aarch64 and x86 and changed
> the types of the costs used to unsigned int.
>
>
> Bootstrapped on aarch64 and x86_64.
> The go testsuite passes on x86_64-unknown-linux-gnu for me...
>
> Thanks,
> Kyrill
>
> 2015-07-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
> 	* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
> 	then_cost, else_cost fields.  Change branch_cost field to unsigned int.
> 	(end_ifcvt_sequence): Call set_used_flags on each insn in the
> 	sequence.
> 	(noce_simple_bbs): New function.
> 	(noce_try_move): Bail if basic blocks are not simple.
> 	(noce_try_store_flag): Likewise.
> 	(noce_try_store_flag_constants): Likewise.
> 	(noce_try_addcc): Likewise.
> 	(noce_try_store_flag_mask): Likewise.
> 	(noce_try_cmove): Likewise.
> 	(noce_try_minmax): Likewise.
> 	(noce_try_abs): Likewise.
> 	(noce_try_sign_mask): Likewise.
> 	(noce_try_bitop): Likewise.
> 	(bbs_ok_for_cmove_arith): New function.
> 	(noce_emit_all_but_last): Likewise.
> 	(noce_emit_insn): Likewise.
> 	(noce_emit_bb): Likewise.
> 	(noce_try_cmove_arith): Handle non-simple basic blocks.
> 	(insn_valid_noce_process_p): New function.
> 	(bb_valid_for_noce_process_p): Likewise.
> 	(noce_process_if_block): Allow non-simple basic blocks
> 	where appropriate.
>
>
> 2015-07-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
> 	* gcc.dg/ifcvt-1.c: New test.
> 	* gcc.dg/ifcvt-2.c: Likewise.
> 	* gcc.dg/ifcvt-3.c: Likewise.
>
>
>
>
>> Thanks,
>> Kyrill
>>
>>
>>> cheers,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-13 14:03     ` Kyrill Tkachov
  2015-07-20 10:59       ` Kyrill Tkachov
@ 2015-07-23 20:57       ` Jeff Law
  2015-07-24  9:44         ` Kyrill Tkachov
  1 sibling, 1 reply; 31+ messages in thread
From: Jeff Law @ 2015-07-23 20:57 UTC (permalink / raw)
  To: Kyrill Tkachov, Bernhard Reutner-Fischer; +Cc: GCC Patches

On 07/13/2015 08:03 AM, Kyrill Tkachov wrote:
>
> 2015-07-13  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>      then_cost, else_cost fields.  Change branch_cost field to unsigned
> int.
>      (end_ifcvt_sequence): Call set_used_flags on each insn in the
>      sequence.
>      (noce_simple_bbs): New function.
>      (noce_try_move): Bail if basic blocks are not simple.
>      (noce_try_store_flag): Likewise.
>      (noce_try_store_flag_constants): Likewise.
>      (noce_try_addcc): Likewise.
>      (noce_try_store_flag_mask): Likewise.
>      (noce_try_cmove): Likewise.
>      (noce_try_minmax): Likewise.
>      (noce_try_abs): Likewise.
>      (noce_try_sign_mask): Likewise.
>      (noce_try_bitop): Likewise.
>      (bbs_ok_for_cmove_arith): New function.
>      (noce_emit_all_but_last): Likewise.
>      (noce_emit_insn): Likewise.
>      (noce_emit_bb): Likewise.
>      (noce_try_cmove_arith): Handle non-simple basic blocks.
>      (insn_valid_noce_process_p): New function.
>      (bb_valid_for_noce_process_p): Likewise.
>      (noce_process_if_block): Allow non-simple basic blocks
>      where appropriate.
>
>
> 2015-07-13  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * gcc.dg/ifcvt-1.c: New test.
>      * gcc.dg/ifcvt-2.c: Likewise.
>      * gcc.dg/ifcvt-3.c: Likewise.
>
>
>
>
>>
>> Thanks,
>> Kyrill
>>
>>
>>> cheers,
>
>
> ifcvt.patch
>
>
> commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd
> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
> Date:   Wed Jul 8 15:45:04 2015 +0100
>
>      [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 31849ee..2f0a228 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info)
>     return FALSE;
>   }
>
> +/* Return true iff the registers that the insns in BB_A set do not
> +   get used in BB_B.  */
> +
> +static bool
> +bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
> +{
> +  rtx_insn *a_insn;
> +  FOR_BB_INSNS (bb_a, a_insn)
> +    {
> +      if (!active_insn_p (a_insn))
> +	continue;
> +
> +      rtx sset_a = single_set (a_insn);
> +
> +      if (!sset_a)
> +	return false;
> +
> +      rtx dest_reg = SET_DEST (sset_a);
> +      rtx_insn *b_insn;
> +
> +      FOR_BB_INSNS (bb_b, b_insn)
> +	{
> +	  if (!active_insn_p (b_insn))
> +	    continue;
> +
> +	  rtx sset_b = single_set (b_insn);
> +
> +	  if (!sset_b)
> +	    return false;
> +
> +	  if (reg_referenced_p (dest_reg, sset_b))
> +	    return false;
> +	}
> +    }
> +
> +  return true;
> +}
Doesn't this have the potential to get very expensive since you end up 
looking at every insn in BB_B for every insn in BB_A?

Wouldn't it be better to walk BB_A, gathering the set of all the 
registers modified, then do a single walk through BB testing for uses of 
those registers?

Don't you have to be careful with MEMs in both blocks?



> +
> +/* Emit copies of all the active instructions in BB except the last.
> +   This is a helper for noce_try_cmove_arith.  */
> +
> +static void
> +noce_emit_all_but_last (basic_block bb)
> +{
> +  rtx_insn *last = last_active_insn (bb, FALSE);
> +  rtx_insn *insn;
> +  FOR_BB_INSNS (bb, insn)
> +    {
> +      if (insn != last && active_insn_p (insn))
> +	{
> +	  rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
> +
> +	  emit_insn (PATTERN (to_emit));
> +	}
> +    }
> +}
Won't this create invalid RTL sharing?  RTL has strict rules about nodes 
can and can not be shared and I'm pretty sure this blindly shares 
everything.

Now we may get away with that because you're going to delete all the 
insns from BB.  But that begs the question why not just move the insns 
from BB to their new location rather than re-emiting them?




> +
> +/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
> +   the resulting insn or NULL if it's not a valid insn.  */
> +
> +static rtx_insn *
> +noce_emit_insn (rtx to_emit)
> +{
> +  gcc_assert (to_emit);
> +  rtx_insn *insn = emit_insn (to_emit);
> +
> +  if (recog_memoized (insn) < 0)
> +    return NULL;
> +
> +  return insn;
> +}
> +
> +/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
> +   and including the penultimate one in BB if it is not simple
> +   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
> +   insn in the block.  The reason for that is that LAST_INSN may
> +   have been modified by the preparation in noce_try_cmove_arith.  */
> +
> +static bool
> +noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
> +{
> +  if (bb && !simple)
> +    noce_emit_all_but_last (bb);
> +
> +  if (last_insn && !noce_emit_insn (last_insn))
> +    return false;
> +
> +  return true;
> +}
Under what conditions can noce_emit_insn fail and what happens to the 
insn stream if it does?  It seems to me like the insn stream would be 
bogus and we should stop compilation.  Which argues that rather than 
returning a bool, we should just assert that the insn is memoized and 
remove the check in noce_emit_bb.

Or is it the case that we're emitting onto a sequence that we can just 
throw away in the event of a failure?


What is the meaning of the return value from noce_emit_bb?

> +
> +  /* We're going to execute one of the basic blocks anyway, so
> +     bail out if the most expensive of the two blocks is unacceptable.  */
> +  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
> +    return FALSE;
I believe you or someone else mentioned that this may not be the right 
heuristic since you're paying for one arm regardless.  But we can refine 
as necessary.

  +  FOR_BB_INSNS (test_bb, insn)
> +    {
> +      if (insn != last_insn)
> +	{
> +	  if (!active_insn_p (insn))
> +	    continue;
> +
> +	  if (!insn_valid_noce_process_p (insn, cc))
> +	    goto free_bitmap_and_fail;
> +
> +	  rtx sset = single_set (insn);
> +	  gcc_assert (sset);
> +
> +	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
> +	    goto free_bitmap_and_fail;
So don't you need to look deeper into SET_SRC to see if it's a mem?   It 
might be something like (plus (reg) (mem) no?

Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-23 20:57       ` Jeff Law
@ 2015-07-24  9:44         ` Kyrill Tkachov
  2015-07-24 18:44           ` Jeff Law
  0 siblings, 1 reply; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-24  9:44 UTC (permalink / raw)
  To: Jeff Law, Bernhard Reutner-Fischer; +Cc: GCC Patches


On 23/07/15 21:38, Jeff Law wrote:
> On 07/13/2015 08:03 AM, Kyrill Tkachov wrote:
>> 2015-07-13  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>       * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>>       then_cost, else_cost fields.  Change branch_cost field to unsigned
>> int.
>>       (end_ifcvt_sequence): Call set_used_flags on each insn in the
>>       sequence.
>>       (noce_simple_bbs): New function.
>>       (noce_try_move): Bail if basic blocks are not simple.
>>       (noce_try_store_flag): Likewise.
>>       (noce_try_store_flag_constants): Likewise.
>>       (noce_try_addcc): Likewise.
>>       (noce_try_store_flag_mask): Likewise.
>>       (noce_try_cmove): Likewise.
>>       (noce_try_minmax): Likewise.
>>       (noce_try_abs): Likewise.
>>       (noce_try_sign_mask): Likewise.
>>       (noce_try_bitop): Likewise.
>>       (bbs_ok_for_cmove_arith): New function.
>>       (noce_emit_all_but_last): Likewise.
>>       (noce_emit_insn): Likewise.
>>       (noce_emit_bb): Likewise.
>>       (noce_try_cmove_arith): Handle non-simple basic blocks.
>>       (insn_valid_noce_process_p): New function.
>>       (bb_valid_for_noce_process_p): Likewise.
>>       (noce_process_if_block): Allow non-simple basic blocks
>>       where appropriate.
>>
>>
>> 2015-07-13  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>       * gcc.dg/ifcvt-1.c: New test.
>>       * gcc.dg/ifcvt-2.c: Likewise.
>>       * gcc.dg/ifcvt-3.c: Likewise.
>>
>>
>>
>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>> cheers,
>>
>> ifcvt.patch
>>
>>
>> commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd
>> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
>> Date:   Wed Jul 8 15:45:04 2015 +0100
>>
>>       [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive
>>
>> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
>> index 31849ee..2f0a228 100644
>> --- a/gcc/ifcvt.c
>> +++ b/gcc/ifcvt.c
>> @@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info)
>>      return FALSE;
>>    }
>>
>> +/* Return true iff the registers that the insns in BB_A set do not
>> +   get used in BB_B.  */
>> +
>> +static bool
>> +bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
>> +{
>> +  rtx_insn *a_insn;
>> +  FOR_BB_INSNS (bb_a, a_insn)
>> +    {
>> +      if (!active_insn_p (a_insn))
>> +	continue;
>> +
>> +      rtx sset_a = single_set (a_insn);
>> +
>> +      if (!sset_a)
>> +	return false;
>> +
>> +      rtx dest_reg = SET_DEST (sset_a);
>> +      rtx_insn *b_insn;
>> +
>> +      FOR_BB_INSNS (bb_b, b_insn)
>> +	{
>> +	  if (!active_insn_p (b_insn))
>> +	    continue;
>> +
>> +	  rtx sset_b = single_set (b_insn);
>> +
>> +	  if (!sset_b)
>> +	    return false;
>> +
>> +	  if (reg_referenced_p (dest_reg, sset_b))
>> +	    return false;
>> +	}
>> +    }
>> +
>> +  return true;
>> +}
> Doesn't this have the potential to get very expensive since you end up
> looking at every insn in BB_B for every insn in BB_A?

Yes, the saving grace is that bbs_ok_for_cmove_arith is called after the costs checks,
so that the costs in practice reject blocks more than a few instructions long.

>
> Wouldn't it be better to walk BB_A, gathering the set of all the
> registers modified, then do a single walk through BB testing for uses of
> those registers?

I think so, yes. I'll try that.

> Don't you have to be careful with MEMs in both blocks?

bb_valid_for_noce_process_p called earlier in the chain should have
rejected memory operands already.

>
>
>
>> +
>> +/* Emit copies of all the active instructions in BB except the last.
>> +   This is a helper for noce_try_cmove_arith.  */
>> +
>> +static void
>> +noce_emit_all_but_last (basic_block bb)
>> +{
>> +  rtx_insn *last = last_active_insn (bb, FALSE);
>> +  rtx_insn *insn;
>> +  FOR_BB_INSNS (bb, insn)
>> +    {
>> +      if (insn != last && active_insn_p (insn))
>> +	{
>> +	  rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
>> +
>> +	  emit_insn (PATTERN (to_emit));
>> +	}
>> +    }
>> +}
> Won't this create invalid RTL sharing?  RTL has strict rules about nodes
> can and can not be shared and I'm pretty sure this blindly shares
> everything.

All the if-conversion functions end up calling end_ifcvt_sequence
at the end which seems to unshare the stuff that needs to be unshared.
Is that not enough? (I don't have much experience with this part)

>
> Now we may get away with that because you're going to delete all the
> insns from BB.  But that begs the question why not just move the insns
> from BB to their new location rather than re-emiting them?

I'll try it out.


>
>
>
>> +
>> +/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
>> +   the resulting insn or NULL if it's not a valid insn.  */
>> +
>> +static rtx_insn *
>> +noce_emit_insn (rtx to_emit)
>> +{
>> +  gcc_assert (to_emit);
>> +  rtx_insn *insn = emit_insn (to_emit);
>> +
>> +  if (recog_memoized (insn) < 0)
>> +    return NULL;
>> +
>> +  return insn;
>> +}
>> +
>> +/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
>> +   and including the penultimate one in BB if it is not simple
>> +   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
>> +   insn in the block.  The reason for that is that LAST_INSN may
>> +   have been modified by the preparation in noce_try_cmove_arith.  */
>> +
>> +static bool
>> +noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
>> +{
>> +  if (bb && !simple)
>> +    noce_emit_all_but_last (bb);
>> +
>> +  if (last_insn && !noce_emit_insn (last_insn))
>> +    return false;
>> +
>> +  return true;
>> +}
> Under what conditions can noce_emit_insn fail and what happens to the
> insn stream if it does?  It seems to me like the insn stream would be
> bogus and we should stop compilation.  Which argues that rather than
> returning a bool, we should just assert that the insn is memoized and
> remove the check in noce_emit_bb.
>
> Or is it the case that we're emitting onto a sequence that we can just
> throw away in the event of a failure?

It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have
not seen it cause any trouble. If it fails then back in
noce_try_cmove_arith we goto end_seq_and_fail which ends
the sequence and throws it away (and cancels if-conversion
down that path), so it should be safe.

>
> What is the meaning of the return value from noce_emit_bb?
>
>> +
>> +  /* We're going to execute one of the basic blocks anyway, so
>> +     bail out if the most expensive of the two blocks is unacceptable.  */
>> +  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
>> +    return FALSE;
> I believe you or someone else mentioned that this may not be the right
> heuristic since you're paying for one arm regardless.  But we can refine
> as necessary.

The current code compares the sum of both the arms against the
branch cost. Now that I think about it a bit more, I think the most
  accurate heuristic would be to compare
the cost of both blocks (in the event we do if-conversion) against the
cost of doing the branch plus the cost of the block that is predicted to
be taken, but we can't get that information accurately, I think.
But you're right, it's easy to implement the heuristic once we decide on one.

>
>    +  FOR_BB_INSNS (test_bb, insn)
>> +    {
>> +      if (insn != last_insn)
>> +	{
>> +	  if (!active_insn_p (insn))
>> +	    continue;
>> +
>> +	  if (!insn_valid_noce_process_p (insn, cc))
>> +	    goto free_bitmap_and_fail;
>> +
>> +	  rtx sset = single_set (insn);
>> +	  gcc_assert (sset);
>> +
>> +	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
>> +	    goto free_bitmap_and_fail;
> So don't you need to look deeper into SET_SRC to see if it's a mem?   It
> might be something like (plus (reg) (mem) no?

I think you're right (though this didn't cause me any problems in testing or benchmarking).
I'll recurse into it looking for a MEM.

Thanks for the feedback, I'll respin the patch.

Kyrill


>
> Jeff
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-24  9:44         ` Kyrill Tkachov
@ 2015-07-24 18:44           ` Jeff Law
  2015-07-27 10:30             ` Kyrill Tkachov
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Law @ 2015-07-24 18:44 UTC (permalink / raw)
  To: Kyrill Tkachov, Bernhard Reutner-Fischer; +Cc: GCC Patches

On 07/24/2015 03:31 AM, Kyrill Tkachov wrote:

>> Wouldn't it be better to walk BB_A, gathering the set of all the
>> registers modified, then do a single walk through BB testing for uses of
>> those registers?
>
> I think so, yes. I'll try that.
You might look at resource.c -- I haven't looked at it in a long time, 
but it might have many of the interfaces you need to do this kind of 
book keeping.

>
>> Don't you have to be careful with MEMs in both blocks?
>
> bb_valid_for_noce_process_p called earlier in the chain should have
> rejected memory operands already.
>
>>
>>
>>

>>
>>
>>
>>> +
>>> +/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and
>>> return
>>> +   the resulting insn or NULL if it's not a valid insn.  */
>>> +
>>> +static rtx_insn *
>>> +noce_emit_insn (rtx to_emit)
>>> +{
>>> +  gcc_assert (to_emit);
>>> +  rtx_insn *insn = emit_insn (to_emit);
>>> +
>>> +  if (recog_memoized (insn) < 0)
>>> +    return NULL;
>>> +
>>> +  return insn;
>>> +}
>>> +
>>> +/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
>>> +   and including the penultimate one in BB if it is not simple
>>> +   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
>>> +   insn in the block.  The reason for that is that LAST_INSN may
>>> +   have been modified by the preparation in noce_try_cmove_arith.  */
>>> +
>>> +static bool
>>> +noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
>>> +{
>>> +  if (bb && !simple)
>>> +    noce_emit_all_but_last (bb);
>>> +
>>> +  if (last_insn && !noce_emit_insn (last_insn))
>>> +    return false;
>>> +
>>> +  return true;
>>> +}
>> Under what conditions can noce_emit_insn fail and what happens to the
>> insn stream if it does?  It seems to me like the insn stream would be
>> bogus and we should stop compilation.  Which argues that rather than
>> returning a bool, we should just assert that the insn is memoized and
>> remove the check in noce_emit_bb.
>>
>> Or is it the case that we're emitting onto a sequence that we can just
>> throw away in the event of a failure?
>
> It fails when the last insn is not recognised, because
> noce_try_cmove_arith can modify the last insn, but I have
> not seen it cause any trouble. If it fails then back in
> noce_try_cmove_arith we goto end_seq_and_fail which ends
> the sequence and throws it away (and cancels if-conversion
> down that path), so it should be safe.
OK, I was working for the assumption that memoization ought not fail, 
but it seems that was a bad assumption on my part.    So given 
noce_try_cmove_arith can change the last insn and make it 
no-recognizable this code seems reasoanble.

So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.

2. Investigate a more efficient means to find set/use conflicts
    between the two blocks, possibly using resource.[ch]

jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-24 18:44           ` Jeff Law
@ 2015-07-27 10:30             ` Kyrill Tkachov
  2015-07-27 16:14               ` Jeff Law
  0 siblings, 1 reply; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-27 10:30 UTC (permalink / raw)
  To: Jeff Law, Bernhard Reutner-Fischer; +Cc: GCC Patches

Hi Jeff,

On 24/07/15 19:43, Jeff Law wrote:
> On 07/24/2015 03:31 AM, Kyrill Tkachov wrote:
>
>>> Wouldn't it be better to walk BB_A, gathering the set of all the
>>> registers modified, then do a single walk through BB testing for uses of
>>> those registers?
>> I think so, yes. I'll try that.
> You might look at resource.c -- I haven't looked at it in a long time,
> but it might have many of the interfaces you need to do this kind of
> book keeping.

I experimented with resource.c and the roadblock I hit is that it seems
to have an assumption that it operates on hard regs (in fact the struct it uses to
describe the resources has a HARD_REG_SET for the regs) and so it triggers various
HARD_REGISTER_P asserts when I try to use the functions there.
if-conversion runs before register allocation, so we're dealing with pseudos here.

My other attempt was to go over BB_A and mark the set registers in a bitmap then
go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is
a reg that is set in the bitmap from BB_A we return false. This seemed to do the job
and testing worked out ok. That would require one walk over BB_A, one walk over BB_B
but I don't know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?

<snip>
> It fails when the last insn is not recognised, because
> noce_try_cmove_arith can modify the last insn, but I have
> not seen it cause any trouble. If it fails then back in
> noce_try_cmove_arith we goto end_seq_and_fail which ends
> the sequence and throws it away (and cancels if-conversion
> down that path), so it should be safe.
> OK, I was working for the assumption that memoization ought not fail,
> but it seems that was a bad assumption on my part.    So given
> noce_try_cmove_arith can change the last insn and make it
> no-recognizable this code seems reasoanble.
>
> So I think the only outstanding issues are:
>
> 1. Investigate moving rather than re-emitting insns.

I'll look into that, but what is the machinery by which one moves insns?

Thanks,
Kyrill

>
> 2. Investigate a more efficient means to find set/use conflicts
>      between the two blocks, possibly using resource.[ch]
>
> jeff
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-27 10:30             ` Kyrill Tkachov
@ 2015-07-27 16:14               ` Jeff Law
  2015-07-27 17:54                 ` Kyrill Tkachov
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Law @ 2015-07-27 16:14 UTC (permalink / raw)
  To: Kyrill Tkachov, Bernhard Reutner-Fischer; +Cc: GCC Patches

On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:
> I experimented with resource.c and the roadblock I hit is that it
> seems to have an assumption that it operates on hard regs (in fact
> the struct it uses to describe the resources has a HARD_REG_SET for
> the regs) and so it triggers various HARD_REGISTER_P asserts when I
> try to use the functions there. if-conversion runs before register
> allocation, so we're dealing with pseudos here.
Sigh.  resource.c probably isn't going to be useful then.

> My other attempt was to go over BB_A and mark the set registers in a
>  bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
> each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
> we return false. This seemed to do the job and testing worked out ok.
> That would require one walk over BB_A, one walk over BB_B but I don't
> know how expensive FOR_EACH_SUBRTX walks are...
>
> Would that be an acceptable solution?
I think the latter is reasonable.  Ultimately we have to do a full look 
at those rtxs, so it's unavoidable to some extent.

The only other possibility would be to use the DF framework.  I'm not 
sure if it's even initialized for the ifcvt code.  If it is, then you 
might be able to avoid some of the walking of the insns and instead walk 
the DF structures.


>
> <snip>
>> It fails when the last insn is not recognised, because
>> noce_try_cmove_arith can modify the last insn, but I have not seen
>> it cause any trouble. If it fails then back in noce_try_cmove_arith
>> we goto end_seq_and_fail which ends the sequence and throws it away
>> (and cancels if-conversion down that path), so it should be safe.
>> OK, I was working for the assumption that memoization ought not
>> fail, but it seems that was a bad assumption on my part.    So
>> given noce_try_cmove_arith can change the last insn and make it
>> no-recognizable this code seems reasoanble.
>>
>> So I think the only outstanding issues are:
>>
>> 1. Investigate moving rather than re-emitting insns.
>
> I'll look into that, but what is the machinery by which one moves
> insns?
I don't think we have any good generic machinery for this.  I think 
every pass that needs this capability unlinks the insn from the chain 
and patches it back in at the new location.

jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-27 16:14               ` Jeff Law
@ 2015-07-27 17:54                 ` Kyrill Tkachov
  2015-07-28 10:21                   ` Kyrill Tkachov
  2015-07-31 17:05                   ` Jeff Law
  0 siblings, 2 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-27 17:54 UTC (permalink / raw)
  To: Jeff Law, Bernhard Reutner-Fischer; +Cc: GCC Patches


On 27/07/15 17:09, Jeff Law wrote:
> On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:
>> I experimented with resource.c and the roadblock I hit is that it
>> seems to have an assumption that it operates on hard regs (in fact
>> the struct it uses to describe the resources has a HARD_REG_SET for
>> the regs) and so it triggers various HARD_REGISTER_P asserts when I
>> try to use the functions there. if-conversion runs before register
>> allocation, so we're dealing with pseudos here.
> Sigh.  resource.c probably isn't going to be useful then.
>
>> My other attempt was to go over BB_A and mark the set registers in a
>>   bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
>> each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
>> we return false. This seemed to do the job and testing worked out ok.
>> That would require one walk over BB_A, one walk over BB_B but I don't
>> know how expensive FOR_EACH_SUBRTX walks are...
>>
>> Would that be an acceptable solution?
> I think the latter is reasonable.  Ultimately we have to do a full look
> at those rtxs, so it's unavoidable to some extent.
>
> The only other possibility would be to use the DF framework.  I'm not
> sure if it's even initialized for the ifcvt code.  If it is, then you
> might be able to avoid some of the walking of the insns and instead walk
> the DF structures.

I think it is initialized (I look at df_get_live_out earlier on
in the call chain). I suppose what we want is for the live in regs for BB_B
to not include any of the set regs in BB_A?


>
>
>> <snip>
>>> It fails when the last insn is not recognised, because
>>> noce_try_cmove_arith can modify the last insn, but I have not seen
>>> it cause any trouble. If it fails then back in noce_try_cmove_arith
>>> we goto end_seq_and_fail which ends the sequence and throws it away
>>> (and cancels if-conversion down that path), so it should be safe.
>>> OK, I was working for the assumption that memoization ought not
>>> fail, but it seems that was a bad assumption on my part.    So
>>> given noce_try_cmove_arith can change the last insn and make it
>>> no-recognizable this code seems reasoanble.
>>>
>>> So I think the only outstanding issues are:
>>>
>>> 1. Investigate moving rather than re-emitting insns.
>> I'll look into that, but what is the machinery by which one moves
>> insns?
> I don't think we have any good generic machinery for this.  I think
> every pass that needs this capability unlinks the insn from the chain
> and patches it back in at the new location.

That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?

The current way the top-level noce_process_if_block is structured
it expects the various ifcvt functions (like noce_try_cmove_arith)
to generate a sequence, then it takes it, unshares it and removes
the empty basic blocks.

If we're to instead move insns around we'd need to further modify
  noce_process_if_block to handle differently
  this one case where we move insns instead of re-emitting them.
I think this would make that function more convoluted than it needs to be.
With the current approach we always call unshare_all_rtl_in_chain on the
emitted sequence which should take care of any RTL sharing issues and in
practice I don't expect to have more than 3-4 insns in these sequences since
they will be guarded by the branch cost.

So I would rather argue for re-emitting insns in this case to keep consistent
with the dozen or so similar functions in ifcvt.c that already work that way.

Thanks,
Kyrill

>
> jeff
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-27 17:54                 ` Kyrill Tkachov
@ 2015-07-28 10:21                   ` Kyrill Tkachov
  2015-07-31 18:08                     ` Jeff Law
  2015-07-31 17:05                   ` Jeff Law
  1 sibling, 1 reply; 31+ messages in thread
From: Kyrill Tkachov @ 2015-07-28 10:21 UTC (permalink / raw)
  To: Jeff Law, Bernhard Reutner-Fischer; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5698 bytes --]

Hi Jeff,

On 27/07/15 17:40, Kyrill Tkachov wrote:
> On 27/07/15 17:09, Jeff Law wrote:
>> On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:
>>> I experimented with resource.c and the roadblock I hit is that it
>>> seems to have an assumption that it operates on hard regs (in fact
>>> the struct it uses to describe the resources has a HARD_REG_SET for
>>> the regs) and so it triggers various HARD_REGISTER_P asserts when I
>>> try to use the functions there. if-conversion runs before register
>>> allocation, so we're dealing with pseudos here.
>> Sigh.  resource.c probably isn't going to be useful then.
>>
>>> My other attempt was to go over BB_A and mark the set registers in a
>>>    bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
>>> each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
>>> we return false. This seemed to do the job and testing worked out ok.
>>> That would require one walk over BB_A, one walk over BB_B but I don't
>>> know how expensive FOR_EACH_SUBRTX walks are...
>>>
>>> Would that be an acceptable solution?
>> I think the latter is reasonable.  Ultimately we have to do a full look
>> at those rtxs, so it's unavoidable to some extent.
>>
>> The only other possibility would be to use the DF framework.  I'm not
>> sure if it's even initialized for the ifcvt code.  If it is, then you
>> might be able to avoid some of the walking of the insns and instead walk
>> the DF structures.
> I think it is initialized (I look at df_get_live_out earlier on
> in the call chain). I suppose what we want is for the live in regs for BB_B
> to not include any of the set regs in BB_A?
>
>
>>
>>> <snip>
>>>> It fails when the last insn is not recognised, because
>>>> noce_try_cmove_arith can modify the last insn, but I have not seen
>>>> it cause any trouble. If it fails then back in noce_try_cmove_arith
>>>> we goto end_seq_and_fail which ends the sequence and throws it away
>>>> (and cancels if-conversion down that path), so it should be safe.
>>>> OK, I was working for the assumption that memoization ought not
>>>> fail, but it seems that was a bad assumption on my part.    So
>>>> given noce_try_cmove_arith can change the last insn and make it
>>>> no-recognizable this code seems reasoanble.
>>>>
>>>> So I think the only outstanding issues are:
>>>>
>>>> 1. Investigate moving rather than re-emitting insns.
>>> I'll look into that, but what is the machinery by which one moves
>>> insns?
>> I don't think we have any good generic machinery for this.  I think
>> every pass that needs this capability unlinks the insn from the chain
>> and patches it back in at the new location.
> That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?
>
> The current way the top-level noce_process_if_block is structured
> it expects the various ifcvt functions (like noce_try_cmove_arith)
> to generate a sequence, then it takes it, unshares it and removes
> the empty basic blocks.
>
> If we're to instead move insns around we'd need to further modify
>    noce_process_if_block to handle differently
>    this one case where we move insns instead of re-emitting them.
> I think this would make that function more convoluted than it needs to be.
> With the current approach we always call unshare_all_rtl_in_chain on the
> emitted sequence which should take care of any RTL sharing issues and in
> practice I don't expect to have more than 3-4 insns in these sequences since
> they will be guarded by the branch cost.
>
> So I would rather argue for re-emitting insns in this case to keep consistent
> with the dozen or so similar functions in ifcvt.c that already work that way.

Here's a respin.
I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record
the set registers then go over BB_B once and look inside the SET_SRC
of each insn for those registers. How does this look? Would you like
me to investigate the data-flow infrastructure approach?

Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes
looking for a MEM with the FOR_EACH_SUBRTX machinery.

As I said above, I think moving the insns rather than re-emitting them
would make the function more convoluted than I think it needs to be.

Bootstrapped and tested on arm, aarch64, x86_64.



2015-07-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
     then_cost, else_cost fields.  Change branch_cost field to unsigned int.
     (end_ifcvt_sequence): Call set_used_flags on each insn in the
     sequence.
     (noce_simple_bbs): New function.
     (noce_try_move): Bail if basic blocks are not simple.
     (noce_try_store_flag): Likewise.
     (noce_try_store_flag_constants): Likewise.
     (noce_try_addcc): Likewise.
     (noce_try_store_flag_mask): Likewise.
     (noce_try_cmove): Likewise.
     (noce_try_minmax): Likewise.
     (noce_try_abs): Likewise.
     (noce_try_sign_mask): Likewise.
     (noce_try_bitop): Likewise.
     (bbs_ok_for_cmove_arith): New function.
     (noce_emit_all_but_last): Likewise.
     (noce_emit_insn): Likewise.
     (noce_emit_bb): Likewise.
     (noce_try_cmove_arith): Handle non-simple basic blocks.
     (insn_valid_noce_process_p): New function.
     (contains_mem_rtx_p): Likewise.
     (bb_valid_for_noce_process_p): Likewise.
     (noce_process_if_block): Allow non-simple basic blocks
     where appropriate.

2015-07-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.dg/ifcvt-1.c: New test.
     * gcc.dg/ifcvt-2.c: Likewise.
     * gcc.dg/ifcvt-3.c: Likewise.
> Thanks,
> Kyrill
>
>> jeff
>>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifcvt.patch --]
[-- Type: text/x-patch; name=ifcvt.patch, Size: 21768 bytes --]

commit a02417f015b70a0e47170ac7c41a5569392085a5
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 8 15:45:04 2015 +0100

    [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2d97cc5..dae9b41 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -53,6 +53,7 @@
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "shrink-wrap.h"
+#include "rtl-iter.h"
 #include "ifcvt.h"
 
 #ifndef HAVE_incscc
@@ -816,8 +817,17 @@ struct noce_if_info
      form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
-  int branch_cost;
+  unsigned int branch_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1037,6 +1047,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info->cond);
   set_used_flags (if_info->a);
   set_used_flags (if_info->b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1054,6 +1068,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info->then_simple)
+    return false;
+
+  if (if_info->else_bb)
+    return if_info->else_simple;
+
+  return true;
+}
+
 /* Convert "if (a != b) x = a; else x = b" into "x = a" and
    "if (a == b) x = a; else x = b" into "x = b".  */
 
@@ -1068,6 +1097,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE && code != EQ)
     return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
      or a signed zero.  */
   if (HONOR_NANS (if_info->x)
@@ -1116,6 +1148,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->b)
       && INTVAL (if_info->b) == STORE_FLAG_VALUE
       && if_info->a == const0_rtx)
@@ -1164,6 +1199,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   int normalize, can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->a)
       && CONST_INT_P (if_info->b))
     {
@@ -1292,6 +1330,9 @@ noce_try_addcc (struct noce_if_info *if_info)
   rtx_insn *seq;
   int subtract, normalize;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (GET_CODE (if_info->a) == PLUS
       && rtx_equal_p (XEXP (if_info->a, 0), if_info->b)
       && (reversed_comparison_code (if_info->cond, if_info->jump)
@@ -1383,6 +1424,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
   rtx_insn *seq;
   int reversep;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   reversep = 0;
   if ((if_info->branch_cost >= 2
        || STORE_FLAG_VALUE == -1)
@@ -1551,6 +1595,9 @@ noce_try_cmove (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if ((CONSTANT_P (if_info->a) || register_operand (if_info->a, VOIDmode))
       && (CONSTANT_P (if_info->b) || register_operand (if_info->b, VOIDmode)))
     {
@@ -1585,6 +1632,116 @@ noce_try_cmove (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
+
+  FOR_BB_INSNS (bb_a, a_insn)
+    {
+      if (!active_insn_p (a_insn))
+	continue;
+
+      rtx sset_a = single_set (a_insn);
+
+      if (!sset_a)
+	{
+	  BITMAP_FREE (bba_sets);
+	  return false;
+	}
+
+      rtx dest_reg = SET_DEST (sset_a);
+      bitmap_set_bit (bba_sets, REGNO (dest_reg));
+    }
+
+  rtx_insn *b_insn;
+  subrtx_iterator::array_type array;
+
+  FOR_BB_INSNS (bb_b, b_insn)
+    {
+      if (!active_insn_p (b_insn))
+	continue;
+
+      rtx sset_b = single_set (b_insn);
+
+      if (!sset_b)
+	{
+	  BITMAP_FREE (bba_sets);
+	  return false;
+	}
+
+      FOR_EACH_SUBRTX (iter, array, SET_SRC (sset_b), NONCONST)
+	{
+	  const_rtx x = *iter;
+	  if (REG_P (x) && bitmap_bit_p (bba_sets, REGNO (x)))
+	    {
+	      BITMAP_FREE (bba_sets);
+	      return true;
+	    }
+	}
+
+    }
+
+  BITMAP_FREE (bba_sets);
+  return true;
+}
+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (insn != last && active_insn_p (insn))
+	{
+	  rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
+
+	  emit_insn (PATTERN (to_emit));
+	}
+    }
+}
+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn) < 0)
+    return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb && !simple)
+    noce_emit_all_but_last (bb);
+
+  if (last_insn && !noce_emit_insn (last_insn))
+    return false;
+
+  return true;
+}
+
 /* Try more complex cases involving conditional_move.  */
 
 static int
@@ -1595,9 +1752,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx x = if_info->x;
   rtx orig_a, orig_b;
   rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
   rtx target;
   int is_mem = 0;
-  int insn_cost;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
 
@@ -1636,27 +1796,22 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   insn_a = if_info->insn_a;
   insn_b = if_info->insn_b;
 
-  /* Total insn_rtx_cost should be smaller than branch cost.  Exit
-     if insn_rtx_cost can't be estimated.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
   if (insn_a)
-    {
-      insn_cost
-	= insn_rtx_cost (PATTERN (insn_a),
-      			 optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_a)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-	return FALSE;
-    }
+    then_cost = if_info->then_cost;
   else
-    insn_cost = 0;
+    then_cost = 0;
 
   if (insn_b)
-    {
-      insn_cost
-	+= insn_rtx_cost (PATTERN (insn_b),
-      			  optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_b)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-        return FALSE;
-    }
+    else_cost = if_info->else_cost;
+  else
+    else_cost = 0;
+
+  /* We're going to execute one of the basic blocks anyway, so
+     bail out if the most expensive of the two blocks is unacceptable.  */
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+    return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
@@ -1672,26 +1827,36 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  code = reversed_comparison_code (if_info->cond, if_info->jump);
 	  std::swap (a, b);
 	  std::swap (insn_a, insn_b);
+	  std::swap (a_simple, b_simple);
+	  std::swap (then_bb, else_bb);
 	}
     }
 
+  if (!a_simple && then_bb && !b_simple && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
+    return FALSE;
+
   start_sequence ();
 
   orig_a = a;
   orig_b = b;
 
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
+
   /* If either operand is complex, load it into a register first.
      The best way to do this is to copy the original insn.  In this
      way we preserve any clobbers etc that the insn may have had.
      This is of course not possible in the IS_MEM case.  */
+
   if (! general_operand (a, GET_MODE (a)))
     {
-      rtx_insn *insn;
 
       if (is_mem)
 	{
 	  rtx reg = gen_reg_rtx (GET_MODE (a));
-	  insn = emit_insn (gen_rtx_SET (reg, a));
+	  emit_a = gen_rtx_SET (reg, a);
 	}
       else if (! insn_a)
 	goto end_seq_and_fail;
@@ -1701,49 +1866,58 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  rtx_insn *copy_of_a = as_a <rtx_insn *> (copy_rtx (insn_a));
 	  rtx set = single_set (copy_of_a);
 	  SET_DEST (set) = a;
-	  insn = emit_insn (PATTERN (copy_of_a));
+
+	  emit_a = PATTERN (copy_of_a);
 	}
-      if (recog_memoized (insn) < 0)
-	goto end_seq_and_fail;
     }
+
   if (! general_operand (b, GET_MODE (b)))
     {
-      rtx pat;
-      rtx_insn *last;
-      rtx_insn *new_insn;
-
       if (is_mem)
 	{
           rtx reg = gen_reg_rtx (GET_MODE (b));
-	  pat = gen_rtx_SET (reg, b);
+	  emit_b = gen_rtx_SET (reg, b);
 	}
       else if (! insn_b)
 	goto end_seq_and_fail;
       else
 	{
           b = gen_reg_rtx (GET_MODE (b));
-	  rtx_insn *copy_of_insn_b = as_a <rtx_insn *> (copy_rtx (insn_b));
-	  rtx set = single_set (copy_of_insn_b);
+	  rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b));
+	  rtx set = single_set (copy_of_b);
+
 	  SET_DEST (set) = b;
-	  pat = PATTERN (copy_of_insn_b);
+	  emit_b = PATTERN (copy_of_b);
 	}
+    }
 
-      /* If insn to set up A clobbers any registers B depends on, try to
-	 swap insn that sets up A with the one that sets up B.  If even
-	 that doesn't help, punt.  */
-      last = get_last_insn ();
-      if (last && modified_in_p (orig_b, last))
-	{
-	  new_insn = emit_insn_before (pat, get_insns ());
-	  if (modified_in_p (orig_a, new_insn))
-	    goto end_seq_and_fail;
-	}
-      else
-	new_insn = emit_insn (pat);
+    /* If insn to set up A clobbers any registers B depends on, try to
+       swap insn that sets up A with the one that sets up B.  If even
+       that doesn't help, punt.  */
 
-      if (recog_memoized (new_insn) < 0)
-	goto end_seq_and_fail;
-    }
+    if (emit_a && modified_in_p (orig_b, emit_a))
+      {
+	if (modified_in_p (orig_a, emit_b))
+	  goto end_seq_and_fail;
+
+	if (else_bb && !b_simple)
+	  {
+	    if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	      goto end_seq_and_fail;
+	  }
+
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+      }
+    else
+      {
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+
+	if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	  goto end_seq_and_fail;
+
+      }
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			    XEXP (if_info->cond, 1), a, b);
@@ -1947,6 +2121,9 @@ noce_try_minmax (struct noce_if_info *if_info)
   enum rtx_code code, op;
   int unsignedp;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* ??? Reject modes with NaNs or signed zeros since we don't know how
      they will be resolved with an SMIN/SMAX.  It wouldn't be too hard
      to get the target to tell us...  */
@@ -2043,6 +2220,9 @@ noce_try_abs (struct noce_if_info *if_info)
   int negate;
   bool one_cmpl = false;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Reject modes with signed zeros.  */
   if (HONOR_SIGNED_ZEROS (if_info->x))
     return FALSE;
@@ -2191,6 +2371,9 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   enum rtx_code code;
   bool t_unconditional;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   cond = if_info->cond;
   code = GET_CODE (cond);
   m = XEXP (cond, 0);
@@ -2274,6 +2457,9 @@ noce_try_bitop (struct noce_if_info *if_info)
   cond = if_info->cond;
   code = GET_CODE (cond);
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Check for no else condition.  */
   if (! rtx_equal_p (x, if_info->b))
     return FALSE;
@@ -2524,6 +2710,137 @@ noce_can_store_speculate_p (basic_block top_bb, const_rtx mem)
   return false;
 }
 
+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+      || !NONJUMP_INSN_P (insn)
+      || (cc && set_of (cc, insn)))
+      return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+      || !noce_operand_ok (SET_DEST (sset))
+      || !noce_operand_ok (SET_SRC (sset)))
+    return false;
+
+  return true;
+}
+
+/* Return true if X contains a MEM subrtx.  */
+
+static bool
+contains_mem_rtx_p (rtx x)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, x, ALL)
+    if (MEM_P (*iter))
+      return true;
+
+  return false;
+}
+
+/* Return true iff basic block TEST_BB is valid for noce if-conversion.
+   The condition used in this if-conversion is in COND.
+   In practice, check that TEST_BB ends with a single set
+   x := a and all previous computations
+   in TEST_BB don't produce any values that are live after TEST_BB.
+   In other words, all the insns in TEST_BB are there only
+   to compute a value for x.  Put the rtx cost of the insns
+   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   set instruction in SIMPLE_P.  */
+
+static bool
+bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
+			      unsigned int *cost, bool *simple_p)
+{
+  if (!test_bb)
+    return false;
+
+  rtx_insn *last_insn = last_active_insn (test_bb, FALSE);
+  rtx last_set = NULL_RTX;
+
+  rtx cc = cc_in_cond (cond);
+
+  if (!insn_valid_noce_process_p (last_insn, cc))
+    return false;
+  last_set = single_set (last_insn);
+
+  rtx x = SET_DEST (last_set);
+  rtx_insn *first_insn = first_active_insn (test_bb);
+  rtx first_set = single_set (first_insn);
+
+  if (!first_set)
+    return false;
+
+  /* We have a single simple set, that's okay.  */
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
+    {
+      *simple_p = noce_operand_ok (SET_DEST (first_set));
+      *cost = insn_rtx_cost (first_set, speed_p);
+      return *simple_p;
+    }
+
+  rtx_insn *prev_last_insn = PREV_INSN (last_insn);
+  gcc_assert (prev_last_insn);
+
+  /* For now, disallow setting x multiple times in test_bb.  */
+  if (REG_P (x) && reg_set_between_p (x, first_insn, prev_last_insn))
+    return false;
+
+  bitmap test_bb_temps = BITMAP_ALLOC (&reg_obstack);
+
+  /* The regs that are live out of test_bb.  */
+  bitmap test_bb_live_out = df_get_live_out (test_bb);
+
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
+  rtx_insn *insn;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      if (insn != last_insn)
+	{
+	  if (!active_insn_p (insn))
+	    continue;
+
+	  if (!insn_valid_noce_process_p (insn, cc))
+	    goto free_bitmap_and_fail;
+
+	  rtx sset = single_set (insn);
+	  gcc_assert (sset);
+
+	  if (contains_mem_rtx_p (SET_SRC (sset))
+	      || contains_mem_rtx_p (SET_DEST (sset)))
+	    goto free_bitmap_and_fail;
+
+	  potential_cost += insn_rtx_cost (sset, speed_p);
+	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
+	}
+    }
+
+  /* If any of the intermediate results in test_bb are live after test_bb
+     then fail.  */
+  if (bitmap_intersect_p (test_bb_live_out, test_bb_temps))
+    goto free_bitmap_and_fail;
+
+  BITMAP_FREE (test_bb_temps);
+  *cost = potential_cost;
+  *simple_p = false;
+  return true;
+
+ free_bitmap_and_fail:
+  BITMAP_FREE (test_bb_temps);
+  return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -2540,7 +2857,6 @@ noce_process_if_block (struct noce_if_info *if_info)
   rtx_insn *insn_a, *insn_b;
   rtx set_a, set_b;
   rtx orig_x, x, a, b;
-  rtx cc;
 
   /* We're looking for patterns of the form
 
@@ -2549,15 +2865,23 @@ noce_process_if_block (struct noce_if_info *if_info)
      (3) if (...) x = a;   // as if with an initial x = x.
 
      The later patterns require jumps to be more expensive.
-
+     For the if (...) x = a; else x = b; case we allow multiple insns
+     inside the then and else blocks as long as their only effect is
+     to calculate a value for x.
      ??? For future expansion, look for multiple X in such patterns.  */
 
-  /* Look for one of the potential sets.  */
-  insn_a = first_active_insn (then_bb);
-  if (! insn_a
-      || insn_a != last_active_insn (then_bb, FALSE)
-      || (set_a = single_set (insn_a)) == NULL_RTX)
-    return FALSE;
+  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+				    &if_info->then_simple))
+    return false;
+
+  if (else_bb
+      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+				      &if_info->else_simple))
+    return false;
+
+  insn_a = last_active_insn (then_bb, FALSE);
+  set_a = single_set (insn_a);
+  gcc_assert (set_a);
 
   x = SET_DEST (set_a);
   a = SET_SRC (set_a);
@@ -2572,11 +2896,11 @@ noce_process_if_block (struct noce_if_info *if_info)
   set_b = NULL_RTX;
   if (else_bb)
     {
-      insn_b = first_active_insn (else_bb);
-      if (! insn_b
-	  || insn_b != last_active_insn (else_bb, FALSE)
-	  || (set_b = single_set (insn_b)) == NULL_RTX
-	  || ! rtx_interchangeable_p (x, SET_DEST (set_b)))
+      insn_b = last_active_insn (else_bb, FALSE);
+      set_b = single_set (insn_b);
+      gcc_assert (set_b);
+
+      if (!rtx_interchangeable_p (x, SET_DEST (set_b)))
 	return FALSE;
     }
   else
@@ -2652,20 +2976,14 @@ noce_process_if_block (struct noce_if_info *if_info)
   if_info->a = a;
   if_info->b = b;
 
-  /* Skip it if the instruction to be moved might clobber CC.  */
-  cc = cc_in_cond (cond);
-  if (cc
-      && (set_of (cc, insn_a)
-	  || (insn_b && set_of (cc, insn_b))))
-    return FALSE;
-
   /* Try optimizations in some approximation of a useful order.  */
   /* ??? Should first look to see if X is live incoming at all.  If it
      isn't, we don't need anything but an unconditional set.  */
 
   /* Look and see if A and B are really the same.  Avoid creating silly
      cmove constructs that no one will fix up later.  */
-  if (rtx_interchangeable_p (a, b))
+  if (noce_simple_bbs (if_info)
+      && rtx_interchangeable_p (a, b))
     {
       /* If we have an INSN_B, we don't have to create any new rtl.  Just
 	 move the instruction that we already have.  If we don't have an
diff --git a/gcc/testsuite/gcc.dg/ifcvt-1.c b/gcc/testsuite/gcc.dg/ifcvt-1.c
new file mode 100644
index 0000000..92bc17a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+int
+foo (int x)
+{
+  return x > 100 ? x - 2 : x - 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-2.c b/gcc/testsuite/gcc.dg/ifcvt-2.c
new file mode 100644
index 0000000..e0e1728
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint16_t;
+
+uint8_t
+_xtime (const uint8_t byte, const uint16_t generator)
+{
+  if (byte & 0x80)
+    return byte ^ generator;
+  else
+    return byte << 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-3.c b/gcc/testsuite/gcc.dg/ifcvt-3.c
new file mode 100644
index 0000000..2e104a4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* }  } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+typedef long long s64;
+
+int
+foo (s64 a, s64 b, s64 c)
+{
+ s64 d = a - b;
+
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+/* This test can be reduced to just return a + c;  */
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" { target { aarch64*-*-* } } } } */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-27 17:54                 ` Kyrill Tkachov
  2015-07-28 10:21                   ` Kyrill Tkachov
@ 2015-07-31 17:05                   ` Jeff Law
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff Law @ 2015-07-31 17:05 UTC (permalink / raw)
  To: Kyrill Tkachov, Bernhard Reutner-Fischer; +Cc: GCC Patches

On 07/27/2015 10:40 AM, Kyrill Tkachov wrote:
>
> On 27/07/15 17:09, Jeff Law wrote:
>> On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:
>>> I experimented with resource.c and the roadblock I hit is that it
>>> seems to have an assumption that it operates on hard regs (in fact
>>> the struct it uses to describe the resources has a HARD_REG_SET for
>>> the regs) and so it triggers various HARD_REGISTER_P asserts when I
>>> try to use the functions there. if-conversion runs before register
>>> allocation, so we're dealing with pseudos here.
>> Sigh.  resource.c probably isn't going to be useful then.
>>
>>> My other attempt was to go over BB_A and mark the set registers in a
>>>   bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
>>> each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
>>> we return false. This seemed to do the job and testing worked out ok.
>>> That would require one walk over BB_A, one walk over BB_B but I don't
>>> know how expensive FOR_EACH_SUBRTX walks are...
>>>
>>> Would that be an acceptable solution?
>> I think the latter is reasonable.  Ultimately we have to do a full look
>> at those rtxs, so it's unavoidable to some extent.
>>
>> The only other possibility would be to use the DF framework.  I'm not
>> sure if it's even initialized for the ifcvt code.  If it is, then you
>> might be able to avoid some of the walking of the insns and instead walk
>> the DF structures.
>
> I think it is initialized (I look at df_get_live_out earlier on
> in the call chain). I suppose what we want is for the live in regs for BB_B
> to not include any of the set regs in BB_A?
I was thinking more that you'd walk BB_A, which would give you a set of 
registers changed in BB_A.  Then you could walk the uses of those regs 
using the DF structures and see if any are in BB_B.

That may not be any better in practice.  It'd depend on things like how 
many uses exist outside BB_B vs the size of BB_B.  Perhaps it isn't 
worth the trouble and we should stick with a single walk of both blocks?


>> I don't think we have any good generic machinery for this.  I think
>> every pass that needs this capability unlinks the insn from the chain
>> and patches it back in at the new location.
>
> That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?
Right.  Passes would just unlink the insn from the chain, then link it 
back in at a new location.  It's always been fairly ad-hoc at the RTL level.

>
> The current way the top-level noce_process_if_block is structured
> it expects the various ifcvt functions (like noce_try_cmove_arith)
> to generate a sequence, then it takes it, unshares it and removes
> the empty basic blocks.
>
> If we're to instead move insns around we'd need to further modify
>   noce_process_if_block to handle differently
>   this one case where we move insns instead of re-emitting them.
> I think this would make that function more convoluted than it needs to be.
> With the current approach we always call unshare_all_rtl_in_chain on the
> emitted sequence which should take care of any RTL sharing issues and in
> practice I don't expect to have more than 3-4 insns in these sequences
> since
> they will be guarded by the branch cost.
>
> So I would rather argue for re-emitting insns in this case to keep
> consistent
> with the dozen or so similar functions in ifcvt.c that already work that
> way.
OK, then let's stay with re-emitting since that's the structure 
generally expected elsewhere in ifcvt.c.  I just get worried anytime I 
spot the potential for invalid RTL sharing.


Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-28 10:21                   ` Kyrill Tkachov
@ 2015-07-31 18:08                     ` Jeff Law
  2015-08-09 21:21                       ` Steven Bosscher
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Law @ 2015-07-31 18:08 UTC (permalink / raw)
  To: Kyrill Tkachov, Bernhard Reutner-Fischer; +Cc: GCC Patches

On 07/28/2015 04:14 AM, Kyrill Tkachov wrote:
[ Snip ]


> Here's a respin.
> I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record
> the set registers then go over BB_B once and look inside the SET_SRC
> of each insn for those registers. How does this look? Would you like
> me to investigate the data-flow infrastructure approach?
>
> Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes
> looking for a MEM with the FOR_EACH_SUBRTX machinery.
>
> As I said above, I think moving the insns rather than re-emitting them
> would make the function more convoluted than I think it needs to be.
>
> Bootstrapped and tested on arm, aarch64, x86_64.
>
>
>
> 2015-07-28  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>      then_cost, else_cost fields.  Change branch_cost field to unsigned
> int.
>      (end_ifcvt_sequence): Call set_used_flags on each insn in the
>      sequence.
>      (noce_simple_bbs): New function.
>      (noce_try_move): Bail if basic blocks are not simple.
>      (noce_try_store_flag): Likewise.
>      (noce_try_store_flag_constants): Likewise.
>      (noce_try_addcc): Likewise.
>      (noce_try_store_flag_mask): Likewise.
>      (noce_try_cmove): Likewise.
>      (noce_try_minmax): Likewise.
>      (noce_try_abs): Likewise.
>      (noce_try_sign_mask): Likewise.
>      (noce_try_bitop): Likewise.
>      (bbs_ok_for_cmove_arith): New function.
>      (noce_emit_all_but_last): Likewise.
>      (noce_emit_insn): Likewise.
>      (noce_emit_bb): Likewise.
>      (noce_try_cmove_arith): Handle non-simple basic blocks.
>      (insn_valid_noce_process_p): New function.
>      (contains_mem_rtx_p): Likewise.
>      (bb_valid_for_noce_process_p): Likewise.
>      (noce_process_if_block): Allow non-simple basic blocks
>      where appropriate.
>
> 2015-07-28  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * gcc.dg/ifcvt-1.c: New test.
>      * gcc.dg/ifcvt-2.c: Likewise.
>      * gcc.dg/ifcvt-3.c: Likewise.
>> Thanks,
>> Kyrill
>>
>>> jeff
>>>
>
>
> ifcvt.patch
>
>
> commit a02417f015b70a0e47170ac7c41a5569392085a5
> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
> Date:   Wed Jul 8 15:45:04 2015 +0100
>
>      [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 2d97cc5..dae9b41 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -53,6 +53,7 @@
>   #include "tree-pass.h"
>   #include "dbgcnt.h"
>   #include "shrink-wrap.h"
> +#include "rtl-iter.h"
>   #include "ifcvt.h"
Needs to be mentioned in the ChangeLog.

> @@ -1585,6 +1632,116 @@ noce_try_cmove (struct noce_if_info *if_info)
>     return FALSE;
>   }
>
> +/* Return true iff the registers that the insns in BB_A set do not
> +   get used in BB_B.  */
> +
> +static bool
> +bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
> +{
> +  rtx_insn *a_insn;
> +  bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
> +
> +  FOR_BB_INSNS (bb_a, a_insn)
> +    {
> +      if (!active_insn_p (a_insn))
> +	continue;
> +
> +      rtx sset_a = single_set (a_insn);
> +
> +      if (!sset_a)
> +	{
> +	  BITMAP_FREE (bba_sets);
> +	  return false;
> +	}
> +
> +      rtx dest_reg = SET_DEST (sset_a);
> +      bitmap_set_bit (bba_sets, REGNO (dest_reg));
> +    }
So there's a tight relationship between the implementation of 
bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, 
then we'd probably be looking to use note_stores and note_uses.

With that tight relationship, I think the implementations should be 
close together in the file and a comment in insn_valid_noce_process_p 
indicating that if insn_valid_noce_process_p is changed that the 
implementation of bbs_ok_for_cmove_arith may need to change as well.

One things I am a bit worried about is a SET_DEST that is a ZERO_EXTRACT 
or SUBREG.  Those are usually a read and a write.  So you'd need to 
filter them out earlier (noce_operand_ok?) or handle them in the second 
half of bbs_ok_for_cmove_arith.

Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-07-31 18:08                     ` Jeff Law
@ 2015-08-09 21:21                       ` Steven Bosscher
  2015-08-11 17:05                         ` Jeff Law
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Bosscher @ 2015-08-09 21:21 UTC (permalink / raw)
  To: Jeff Law; +Cc: Kyrill Tkachov, Bernhard Reutner-Fischer, GCC Patches

On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law <law@redhat.com> wrote:
>
> So there's a tight relationship between the implementation of
> bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
> we'd probably be looking to use note_stores and note_uses.

Perhaps I'm missing something, but what is wrong with using DF here
instead of note_stores/note_uses? All the info on refs/mods of
registers is available in the DF caches.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-09 21:21                       ` Steven Bosscher
@ 2015-08-11 17:05                         ` Jeff Law
  2015-08-11 17:09                           ` Kyrill Tkachov
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Law @ 2015-08-11 17:05 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Kyrill Tkachov, Bernhard Reutner-Fischer, GCC Patches

On 08/09/2015 03:20 PM, Steven Bosscher wrote:
> On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law <law@redhat.com> wrote:
>>
>> So there's a tight relationship between the implementation of
>> bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
>> we'd probably be looking to use note_stores and note_uses.
>
> Perhaps I'm missing something, but what is wrong with using DF here
> instead of note_stores/note_uses? All the info on refs/mods of
> registers is available in the DF caches.
Nothing inherently wrong with using DF here.

jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-11 17:05                         ` Jeff Law
@ 2015-08-11 17:09                           ` Kyrill Tkachov
  2015-08-12 14:31                             ` Kyrill Tkachov
  0 siblings, 1 reply; 31+ messages in thread
From: Kyrill Tkachov @ 2015-08-11 17:09 UTC (permalink / raw)
  To: Jeff Law, Steven Bosscher; +Cc: Bernhard Reutner-Fischer, GCC Patches


On 11/08/15 18:05, Jeff Law wrote:
> On 08/09/2015 03:20 PM, Steven Bosscher wrote:
>> On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law <law@redhat.com> wrote:
>>> So there's a tight relationship between the implementation of
>>> bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
>>> we'd probably be looking to use note_stores and note_uses.
>> Perhaps I'm missing something, but what is wrong with using DF here
>> instead of note_stores/note_uses? All the info on refs/mods of
>> registers is available in the DF caches.
> Nothing inherently wrong with using DF here.

I have reworked the patch to use FOR_EACH_INSN_DEF and FOR_EACH_INSN_USE
in bbs_ok_for_cmove_arith to extracts the refs/mods and it seems to work.
Is that what you mean by DF?
I'm doing some more testing and hope to post the updated version soon.

Thanks,
Kyrill

> jeff
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-11 17:09                           ` Kyrill Tkachov
@ 2015-08-12 14:31                             ` Kyrill Tkachov
  2015-08-19 12:59                               ` Kyrill Tkachov
  2015-08-19 16:59                               ` Jeff Law
  0 siblings, 2 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-08-12 14:31 UTC (permalink / raw)
  To: Jeff Law, Steven Bosscher; +Cc: Bernhard Reutner-Fischer, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]


On 11/08/15 18:09, Kyrill Tkachov wrote:
> On 11/08/15 18:05, Jeff Law wrote:
>> On 08/09/2015 03:20 PM, Steven Bosscher wrote:
>>> On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law <law@redhat.com> wrote:
>>>> So there's a tight relationship between the implementation of
>>>> bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
>>>> we'd probably be looking to use note_stores and note_uses.
>>> Perhaps I'm missing something, but what is wrong with using DF here
>>> instead of note_stores/note_uses? All the info on refs/mods of
>>> registers is available in the DF caches.
>> Nothing inherently wrong with using DF here.
> I have reworked the patch to use FOR_EACH_INSN_DEF and FOR_EACH_INSN_USE
> in bbs_ok_for_cmove_arith to extracts the refs/mods and it seems to work.
> Is that what you mean by DF?
> I'm doing some more testing and hope to post the updated version soon.

Here it is, I've used the FOR_EACH* macros from dataflow
to gather the uses and sets.

Bootstrapped and tested on x86_64 and aarch64.
How does this look?

Thanks,
Kyrill

2015-08-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
     then_cost, else_cost fields.  Change branch_cost field to unsigned int.
     (end_ifcvt_sequence): Call set_used_flags on each insn in the
     sequence.
     Include rtl-iter.h.
     (noce_simple_bbs): New function.
     (noce_try_move): Bail if basic blocks are not simple.
     (noce_try_store_flag): Likewise.
     (noce_try_store_flag_constants): Likewise.
     (noce_try_addcc): Likewise.
     (noce_try_store_flag_mask): Likewise.
     (noce_try_cmove): Likewise.
     (noce_try_minmax): Likewise.
     (noce_try_abs): Likewise.
     (noce_try_sign_mask): Likewise.
     (noce_try_bitop): Likewise.
     (bbs_ok_for_cmove_arith): New function.
     (noce_emit_all_but_last): Likewise.
     (noce_emit_insn): Likewise.
     (noce_emit_bb): Likewise.
     (noce_try_cmove_arith): Handle non-simple basic blocks.
     (insn_valid_noce_process_p): New function.
     (contains_mem_rtx_p): Likewise.
     (bb_valid_for_noce_process_p): Likewise.
     (noce_process_if_block): Allow non-simple basic blocks
     where appropriate.

2015-08-11  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.dg/ifcvt-1.c: New test.
     * gcc.dg/ifcvt-2.c: Likewise.
     * gcc.dg/ifcvt-3.c: Likewise.

> Thanks,
> Kyrill
>
>> jeff
>>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifcvt.patch --]
[-- Type: text/x-patch; name=ifcvt.patch, Size: 22019 bytes --]

commit 76813a58e40d067bb3e64151617a975581bc81cb
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Jul 8 15:45:04 2015 +0100

    [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1f29646..c33fe24 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -53,6 +53,7 @@
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "shrink-wrap.h"
+#include "rtl-iter.h"
 #include "ifcvt.h"
 
 #ifndef HAVE_incscc
@@ -816,8 +817,17 @@ struct noce_if_info
      form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
-  int branch_cost;
+  unsigned int branch_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1037,6 +1047,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info->cond);
   set_used_flags (if_info->a);
   set_used_flags (if_info->b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1054,6 +1068,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info->then_simple)
+    return false;
+
+  if (if_info->else_bb)
+    return if_info->else_simple;
+
+  return true;
+}
+
 /* Convert "if (a != b) x = a; else x = b" into "x = a" and
    "if (a == b) x = a; else x = b" into "x = b".  */
 
@@ -1068,6 +1097,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE && code != EQ)
     return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
      or a signed zero.  */
   if (HONOR_NANS (if_info->x)
@@ -1116,6 +1148,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->b)
       && INTVAL (if_info->b) == STORE_FLAG_VALUE
       && if_info->a == const0_rtx)
@@ -1165,6 +1200,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   bool can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->a)
       && CONST_INT_P (if_info->b))
     {
@@ -1332,6 +1370,9 @@ noce_try_addcc (struct noce_if_info *if_info)
   rtx_insn *seq;
   int subtract, normalize;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (GET_CODE (if_info->a) == PLUS
       && rtx_equal_p (XEXP (if_info->a, 0), if_info->b)
       && (reversed_comparison_code (if_info->cond, if_info->jump)
@@ -1423,6 +1464,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
   rtx_insn *seq;
   int reversep;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   reversep = 0;
   if ((if_info->branch_cost >= 2
        || STORE_FLAG_VALUE == -1)
@@ -1591,6 +1635,9 @@ noce_try_cmove (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if ((CONSTANT_P (if_info->a) || register_operand (if_info->a, VOIDmode))
       && (CONSTANT_P (if_info->b) || register_operand (if_info->b, VOIDmode)))
     {
@@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+      || !NONJUMP_INSN_P (insn)
+      || (cc && set_of (cc, insn)))
+      return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+      || !noce_operand_ok (SET_DEST (sset))
+      || !noce_operand_ok (SET_SRC (sset)))
+    return false;
+
+  return true;
+}
+
+
+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
+
+  df_ref def;
+  df_ref use;
+
+  FOR_BB_INSNS (bb_a, a_insn)
+    {
+      if (!active_insn_p (a_insn))
+	continue;
+
+      rtx sset_a = single_set (a_insn);
+
+      if (!sset_a)
+	{
+	  BITMAP_FREE (bba_sets);
+	  return false;
+	}
+
+      /* Record all registers that BB_A sets.  */
+      FOR_EACH_INSN_DEF (def, a_insn)
+	bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
+    }
+
+  rtx_insn *b_insn;
+
+  FOR_BB_INSNS (bb_b, b_insn)
+    {
+      if (!active_insn_p (b_insn))
+	continue;
+
+      rtx sset_b = single_set (b_insn);
+
+      if (!sset_b)
+	{
+	  BITMAP_FREE (bba_sets);
+	  return false;
+	}
+
+      /* Make sure this is a REG and not some instance
+	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
+      if (!REG_P (SET_DEST (sset_b)))
+	{
+	  BITMAP_FREE (bba_sets);
+	  return false;
+	}
+
+      /* If the insn uses a reg set in BB_A return false.  */
+      FOR_EACH_INSN_USE (use, b_insn)
+	{
+	  if (bitmap_bit_p (bba_sets, DF_REF_REGNO (use)))
+	    {
+	      BITMAP_FREE (bba_sets);
+	      return false;
+	    }
+	}
+
+    }
+
+  BITMAP_FREE (bba_sets);
+  return true;
+}
+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (insn != last && active_insn_p (insn))
+	{
+	  rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
+
+	  emit_insn (PATTERN (to_emit));
+	}
+    }
+}
+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn) < 0)
+    return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb && !simple)
+    noce_emit_all_but_last (bb);
+
+  if (last_insn && !noce_emit_insn (last_insn))
+    return false;
+
+  return true;
+}
+
 /* Try more complex cases involving conditional_move.  */
 
 static int
@@ -1635,9 +1828,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx x = if_info->x;
   rtx orig_a, orig_b;
   rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
   rtx target;
   int is_mem = 0;
-  int insn_cost;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
 
@@ -1676,27 +1872,22 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   insn_a = if_info->insn_a;
   insn_b = if_info->insn_b;
 
-  /* Total insn_rtx_cost should be smaller than branch cost.  Exit
-     if insn_rtx_cost can't be estimated.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
   if (insn_a)
-    {
-      insn_cost
-	= insn_rtx_cost (PATTERN (insn_a),
-      			 optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_a)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-	return FALSE;
-    }
+    then_cost = if_info->then_cost;
   else
-    insn_cost = 0;
+    then_cost = 0;
 
   if (insn_b)
-    {
-      insn_cost
-	+= insn_rtx_cost (PATTERN (insn_b),
-      			  optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn_b)));
-      if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
-        return FALSE;
-    }
+    else_cost = if_info->else_cost;
+  else
+    else_cost = 0;
+
+  /* We're going to execute one of the basic blocks anyway, so
+     bail out if the most expensive of the two blocks is unacceptable.  */
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+    return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
@@ -1712,26 +1903,36 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  code = reversed_comparison_code (if_info->cond, if_info->jump);
 	  std::swap (a, b);
 	  std::swap (insn_a, insn_b);
+	  std::swap (a_simple, b_simple);
+	  std::swap (then_bb, else_bb);
 	}
     }
 
+  if (!a_simple && then_bb && !b_simple && else_bb
+      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
+	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
+    return FALSE;
+
   start_sequence ();
 
   orig_a = a;
   orig_b = b;
 
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
+
   /* If either operand is complex, load it into a register first.
      The best way to do this is to copy the original insn.  In this
      way we preserve any clobbers etc that the insn may have had.
      This is of course not possible in the IS_MEM case.  */
+
   if (! general_operand (a, GET_MODE (a)))
     {
-      rtx_insn *insn;
 
       if (is_mem)
 	{
 	  rtx reg = gen_reg_rtx (GET_MODE (a));
-	  insn = emit_insn (gen_rtx_SET (reg, a));
+	  emit_a = gen_rtx_SET (reg, a);
 	}
       else if (! insn_a)
 	goto end_seq_and_fail;
@@ -1741,49 +1942,58 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  rtx_insn *copy_of_a = as_a <rtx_insn *> (copy_rtx (insn_a));
 	  rtx set = single_set (copy_of_a);
 	  SET_DEST (set) = a;
-	  insn = emit_insn (PATTERN (copy_of_a));
+
+	  emit_a = PATTERN (copy_of_a);
 	}
-      if (recog_memoized (insn) < 0)
-	goto end_seq_and_fail;
     }
+
   if (! general_operand (b, GET_MODE (b)))
     {
-      rtx pat;
-      rtx_insn *last;
-      rtx_insn *new_insn;
-
       if (is_mem)
 	{
           rtx reg = gen_reg_rtx (GET_MODE (b));
-	  pat = gen_rtx_SET (reg, b);
+	  emit_b = gen_rtx_SET (reg, b);
 	}
       else if (! insn_b)
 	goto end_seq_and_fail;
       else
 	{
           b = gen_reg_rtx (GET_MODE (b));
-	  rtx_insn *copy_of_insn_b = as_a <rtx_insn *> (copy_rtx (insn_b));
-	  rtx set = single_set (copy_of_insn_b);
+	  rtx_insn *copy_of_b = as_a <rtx_insn *> (copy_rtx (insn_b));
+	  rtx set = single_set (copy_of_b);
+
 	  SET_DEST (set) = b;
-	  pat = PATTERN (copy_of_insn_b);
+	  emit_b = PATTERN (copy_of_b);
 	}
+    }
 
-      /* If insn to set up A clobbers any registers B depends on, try to
-	 swap insn that sets up A with the one that sets up B.  If even
-	 that doesn't help, punt.  */
-      last = get_last_insn ();
-      if (last && modified_in_p (orig_b, last))
-	{
-	  new_insn = emit_insn_before (pat, get_insns ());
-	  if (modified_in_p (orig_a, new_insn))
-	    goto end_seq_and_fail;
-	}
-      else
-	new_insn = emit_insn (pat);
+    /* If insn to set up A clobbers any registers B depends on, try to
+       swap insn that sets up A with the one that sets up B.  If even
+       that doesn't help, punt.  */
 
-      if (recog_memoized (new_insn) < 0)
-	goto end_seq_and_fail;
-    }
+    if (emit_a && modified_in_p (orig_b, emit_a))
+      {
+	if (modified_in_p (orig_a, emit_b))
+	  goto end_seq_and_fail;
+
+	if (else_bb && !b_simple)
+	  {
+	    if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	      goto end_seq_and_fail;
+	  }
+
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+      }
+    else
+      {
+	if (!noce_emit_bb (emit_a, then_bb, a_simple))
+	  goto end_seq_and_fail;
+
+	if (!noce_emit_bb (emit_b, else_bb, b_simple))
+	  goto end_seq_and_fail;
+
+      }
 
   target = noce_emit_cmove (if_info, x, code, XEXP (if_info->cond, 0),
 			    XEXP (if_info->cond, 1), a, b);
@@ -1987,6 +2197,9 @@ noce_try_minmax (struct noce_if_info *if_info)
   enum rtx_code code, op;
   int unsignedp;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* ??? Reject modes with NaNs or signed zeros since we don't know how
      they will be resolved with an SMIN/SMAX.  It wouldn't be too hard
      to get the target to tell us...  */
@@ -2083,6 +2296,9 @@ noce_try_abs (struct noce_if_info *if_info)
   int negate;
   bool one_cmpl = false;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Reject modes with signed zeros.  */
   if (HONOR_SIGNED_ZEROS (if_info->x))
     return FALSE;
@@ -2231,6 +2447,9 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   enum rtx_code code;
   bool t_unconditional;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   cond = if_info->cond;
   code = GET_CODE (cond);
   m = XEXP (cond, 0);
@@ -2314,6 +2533,9 @@ noce_try_bitop (struct noce_if_info *if_info)
   cond = if_info->cond;
   code = GET_CODE (cond);
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* Check for no else condition.  */
   if (! rtx_equal_p (x, if_info->b))
     return FALSE;
@@ -2564,6 +2786,113 @@ noce_can_store_speculate_p (basic_block top_bb, const_rtx mem)
   return false;
 }
 
+/* Return true if X contains a MEM subrtx.  */
+
+static bool
+contains_mem_rtx_p (rtx x)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, x, ALL)
+    if (MEM_P (*iter))
+      return true;
+
+  return false;
+}
+
+/* Return true iff basic block TEST_BB is valid for noce if-conversion.
+   The condition used in this if-conversion is in COND.
+   In practice, check that TEST_BB ends with a single set
+   x := a and all previous computations
+   in TEST_BB don't produce any values that are live after TEST_BB.
+   In other words, all the insns in TEST_BB are there only
+   to compute a value for x.  Put the rtx cost of the insns
+   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   set instruction in SIMPLE_P.  */
+
+static bool
+bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
+			      unsigned int *cost, bool *simple_p)
+{
+  if (!test_bb)
+    return false;
+
+  rtx_insn *last_insn = last_active_insn (test_bb, FALSE);
+  rtx last_set = NULL_RTX;
+
+  rtx cc = cc_in_cond (cond);
+
+  if (!insn_valid_noce_process_p (last_insn, cc))
+    return false;
+  last_set = single_set (last_insn);
+
+  rtx x = SET_DEST (last_set);
+  rtx_insn *first_insn = first_active_insn (test_bb);
+  rtx first_set = single_set (first_insn);
+
+  if (!first_set)
+    return false;
+
+  /* We have a single simple set, that's okay.  */
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
+    {
+      *simple_p = noce_operand_ok (SET_DEST (first_set));
+      *cost = insn_rtx_cost (first_set, speed_p);
+      return *simple_p;
+    }
+
+  rtx_insn *prev_last_insn = PREV_INSN (last_insn);
+  gcc_assert (prev_last_insn);
+
+  /* For now, disallow setting x multiple times in test_bb.  */
+  if (REG_P (x) && reg_set_between_p (x, first_insn, prev_last_insn))
+    return false;
+
+  bitmap test_bb_temps = BITMAP_ALLOC (&reg_obstack);
+
+  /* The regs that are live out of test_bb.  */
+  bitmap test_bb_live_out = df_get_live_out (test_bb);
+
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
+  rtx_insn *insn;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      if (insn != last_insn)
+	{
+	  if (!active_insn_p (insn))
+	    continue;
+
+	  if (!insn_valid_noce_process_p (insn, cc))
+	    goto free_bitmap_and_fail;
+
+	  rtx sset = single_set (insn);
+	  gcc_assert (sset);
+
+	  if (contains_mem_rtx_p (SET_SRC (sset))
+	      || !REG_P (SET_DEST (sset)))
+	    goto free_bitmap_and_fail;
+
+	  potential_cost += insn_rtx_cost (sset, speed_p);
+	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
+	}
+    }
+
+  /* If any of the intermediate results in test_bb are live after test_bb
+     then fail.  */
+  if (bitmap_intersect_p (test_bb_live_out, test_bb_temps))
+    goto free_bitmap_and_fail;
+
+  BITMAP_FREE (test_bb_temps);
+  *cost = potential_cost;
+  *simple_p = false;
+  return true;
+
+ free_bitmap_and_fail:
+  BITMAP_FREE (test_bb_temps);
+  return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -2580,7 +2909,6 @@ noce_process_if_block (struct noce_if_info *if_info)
   rtx_insn *insn_a, *insn_b;
   rtx set_a, set_b;
   rtx orig_x, x, a, b;
-  rtx cc;
 
   /* We're looking for patterns of the form
 
@@ -2589,15 +2917,23 @@ noce_process_if_block (struct noce_if_info *if_info)
      (3) if (...) x = a;   // as if with an initial x = x.
 
      The later patterns require jumps to be more expensive.
-
+     For the if (...) x = a; else x = b; case we allow multiple insns
+     inside the then and else blocks as long as their only effect is
+     to calculate a value for x.
      ??? For future expansion, look for multiple X in such patterns.  */
 
-  /* Look for one of the potential sets.  */
-  insn_a = first_active_insn (then_bb);
-  if (! insn_a
-      || insn_a != last_active_insn (then_bb, FALSE)
-      || (set_a = single_set (insn_a)) == NULL_RTX)
-    return FALSE;
+  if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+				    &if_info->then_simple))
+    return false;
+
+  if (else_bb
+      && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost,
+				      &if_info->else_simple))
+    return false;
+
+  insn_a = last_active_insn (then_bb, FALSE);
+  set_a = single_set (insn_a);
+  gcc_assert (set_a);
 
   x = SET_DEST (set_a);
   a = SET_SRC (set_a);
@@ -2612,11 +2948,11 @@ noce_process_if_block (struct noce_if_info *if_info)
   set_b = NULL_RTX;
   if (else_bb)
     {
-      insn_b = first_active_insn (else_bb);
-      if (! insn_b
-	  || insn_b != last_active_insn (else_bb, FALSE)
-	  || (set_b = single_set (insn_b)) == NULL_RTX
-	  || ! rtx_interchangeable_p (x, SET_DEST (set_b)))
+      insn_b = last_active_insn (else_bb, FALSE);
+      set_b = single_set (insn_b);
+      gcc_assert (set_b);
+
+      if (!rtx_interchangeable_p (x, SET_DEST (set_b)))
 	return FALSE;
     }
   else
@@ -2692,20 +3028,14 @@ noce_process_if_block (struct noce_if_info *if_info)
   if_info->a = a;
   if_info->b = b;
 
-  /* Skip it if the instruction to be moved might clobber CC.  */
-  cc = cc_in_cond (cond);
-  if (cc
-      && (set_of (cc, insn_a)
-	  || (insn_b && set_of (cc, insn_b))))
-    return FALSE;
-
   /* Try optimizations in some approximation of a useful order.  */
   /* ??? Should first look to see if X is live incoming at all.  If it
      isn't, we don't need anything but an unconditional set.  */
 
   /* Look and see if A and B are really the same.  Avoid creating silly
      cmove constructs that no one will fix up later.  */
-  if (rtx_interchangeable_p (a, b))
+  if (noce_simple_bbs (if_info)
+      && rtx_interchangeable_p (a, b))
     {
       /* If we have an INSN_B, we don't have to create any new rtl.  Just
 	 move the instruction that we already have.  If we don't have an
diff --git a/gcc/testsuite/gcc.dg/ifcvt-1.c b/gcc/testsuite/gcc.dg/ifcvt-1.c
new file mode 100644
index 0000000..92bc17a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+int
+foo (int x)
+{
+  return x > 100 ? x - 2 : x - 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-2.c b/gcc/testsuite/gcc.dg/ifcvt-2.c
new file mode 100644
index 0000000..e0e1728
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint16_t;
+
+uint8_t
+_xtime (const uint8_t byte, const uint16_t generator)
+{
+  if (byte & 0x80)
+    return byte ^ generator;
+  else
+    return byte << 1;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
diff --git a/gcc/testsuite/gcc.dg/ifcvt-3.c b/gcc/testsuite/gcc.dg/ifcvt-3.c
new file mode 100644
index 0000000..2e104a4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target aarch64*-*-* x86_64-*-* }  } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+typedef long long s64;
+
+int
+foo (s64 a, s64 b, s64 c)
+{
+ s64 d = a - b;
+
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+/* This test can be reduced to just return a + c;  */
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" { target { aarch64*-*-* } } } } */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-12 14:31                             ` Kyrill Tkachov
@ 2015-08-19 12:59                               ` Kyrill Tkachov
  2015-08-19 16:59                               ` Jeff Law
  1 sibling, 0 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-08-19 12:59 UTC (permalink / raw)
  To: Jeff Law, Steven Bosscher; +Cc: Bernhard Reutner-Fischer, GCC Patches

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00609.html

Thanks,
Kyrill

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Kyrill Tkachov
> Sent: 12 August 2015 15:32
> To: Jeff Law; Steven Bosscher
> Cc: Bernhard Reutner-Fischer; GCC Patches
> Subject: Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-
> conversion more aggressive
> 
> 
> On 11/08/15 18:09, Kyrill Tkachov wrote:
> > On 11/08/15 18:05, Jeff Law wrote:
> >> On 08/09/2015 03:20 PM, Steven Bosscher wrote:
> >>> On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law <law@redhat.com> wrote:
> >>>> So there's a tight relationship between the implementation of
> >>>> bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there
> >>>> wasn't, then we'd probably be looking to use note_stores and
> note_uses.
> >>> Perhaps I'm missing something, but what is wrong with using DF here
> >>> instead of note_stores/note_uses? All the info on refs/mods of
> >>> registers is available in the DF caches.
> >> Nothing inherently wrong with using DF here.
> > I have reworked the patch to use FOR_EACH_INSN_DEF and
> > FOR_EACH_INSN_USE in bbs_ok_for_cmove_arith to extracts the
> refs/mods and it seems to work.
> > Is that what you mean by DF?
> > I'm doing some more testing and hope to post the updated version soon.
> 
> Here it is, I've used the FOR_EACH* macros from dataflow to gather the uses
> and sets.
> 
> Bootstrapped and tested on x86_64 and aarch64.
> How does this look?
> 
> Thanks,
> Kyrill
> 
> 2015-08-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>      then_cost, else_cost fields.  Change branch_cost field to unsigned int.
>      (end_ifcvt_sequence): Call set_used_flags on each insn in the
>      sequence.
>      Include rtl-iter.h.
>      (noce_simple_bbs): New function.
>      (noce_try_move): Bail if basic blocks are not simple.
>      (noce_try_store_flag): Likewise.
>      (noce_try_store_flag_constants): Likewise.
>      (noce_try_addcc): Likewise.
>      (noce_try_store_flag_mask): Likewise.
>      (noce_try_cmove): Likewise.
>      (noce_try_minmax): Likewise.
>      (noce_try_abs): Likewise.
>      (noce_try_sign_mask): Likewise.
>      (noce_try_bitop): Likewise.
>      (bbs_ok_for_cmove_arith): New function.
>      (noce_emit_all_but_last): Likewise.
>      (noce_emit_insn): Likewise.
>      (noce_emit_bb): Likewise.
>      (noce_try_cmove_arith): Handle non-simple basic blocks.
>      (insn_valid_noce_process_p): New function.
>      (contains_mem_rtx_p): Likewise.
>      (bb_valid_for_noce_process_p): Likewise.
>      (noce_process_if_block): Allow non-simple basic blocks
>      where appropriate.
> 
> 2015-08-11  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * gcc.dg/ifcvt-1.c: New test.
>      * gcc.dg/ifcvt-2.c: Likewise.
>      * gcc.dg/ifcvt-3.c: Likewise.
> 
> > Thanks,
> > Kyrill
> >
> >> jeff
> >>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-12 14:31                             ` Kyrill Tkachov
  2015-08-19 12:59                               ` Kyrill Tkachov
@ 2015-08-19 16:59                               ` Jeff Law
  2015-08-19 17:20                                 ` Kyrill Tkachov
  1 sibling, 1 reply; 31+ messages in thread
From: Jeff Law @ 2015-08-19 16:59 UTC (permalink / raw)
  To: Kyrill Tkachov, Steven Bosscher; +Cc: Bernhard Reutner-Fischer, GCC Patches

On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
>
> 2015-08-10  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>      then_cost, else_cost fields.  Change branch_cost field to unsigned
> int.
>      (end_ifcvt_sequence): Call set_used_flags on each insn in the
>      sequence.
>      Include rtl-iter.h.
>      (noce_simple_bbs): New function.
>      (noce_try_move): Bail if basic blocks are not simple.
>      (noce_try_store_flag): Likewise.
>      (noce_try_store_flag_constants): Likewise.
>      (noce_try_addcc): Likewise.
>      (noce_try_store_flag_mask): Likewise.
>      (noce_try_cmove): Likewise.
>      (noce_try_minmax): Likewise.
>      (noce_try_abs): Likewise.
>      (noce_try_sign_mask): Likewise.
>      (noce_try_bitop): Likewise.
>      (bbs_ok_for_cmove_arith): New function.
>      (noce_emit_all_but_last): Likewise.
>      (noce_emit_insn): Likewise.
>      (noce_emit_bb): Likewise.
>      (noce_try_cmove_arith): Handle non-simple basic blocks.
>      (insn_valid_noce_process_p): New function.
>      (contains_mem_rtx_p): Likewise.
>      (bb_valid_for_noce_process_p): Likewise.
>      (noce_process_if_block): Allow non-simple basic blocks
>      where appropriate.
>
> 2015-08-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      * gcc.dg/ifcvt-1.c: New test.
>      * gcc.dg/ifcvt-2.c: Likewise.
>      * gcc.dg/ifcvt-3.c: Likewise.
Thanks for pinging -- I thought I'd already approved this a few days 
ago!  But I can't find it in my outbox...  So clearly I didn't finish 
the final review.



> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 1f29646..c33fe24 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info)
>     return FALSE;
>   }
>
> +/* Helper for bb_valid_for_noce_process_p.  Validate that
> +   the rtx insn INSN is a single set that does not set
> +   the conditional register CC and is in general valid for
> +   if-conversion.  */
> +
> +static bool
> +insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
> +{
> +  if (!insn
> +      || !NONJUMP_INSN_P (insn)
> +      || (cc && set_of (cc, insn)))
> +      return false;
> +
> +  rtx sset = single_set (insn);
> +
> +  /* Currently support only simple single sets in test_bb.  */
> +  if (!sset
> +      || !noce_operand_ok (SET_DEST (sset))
> +      || !noce_operand_ok (SET_SRC (sset)))
> +    return false;
> +
> +  return true;
> +}
> +
> +

> +      /* Make sure this is a REG and not some instance
> +	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
> +      if (!REG_P (SET_DEST (sset_b)))
> +	{
> +	  BITMAP_FREE (bba_sets);
> +	  return false;
> +	}
BTW, this is overly conservative.  You're working with pseudos here, so 
you can just treat a ZERO_EXTRACT or SUBREG as a read of the full 
underlying register.  If this comes up in practice you might consider 
handling them as a follow-up patch.  I don't think you need to handle 
that case immediately though.

I also can't remember if we discussed what happens if blocks A & B write 
to the same register, do we handle that situation correctly?

That's the only issue left in my mind.  If we're handling that case 
correctly, then this is Ok for the trunk as-is.  Else we'll need another 
iteration.

Thanks,
Jeff


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-19 16:59                               ` Jeff Law
@ 2015-08-19 17:20                                 ` Kyrill Tkachov
  2015-08-19 17:38                                   ` Jeff Law
  2015-09-02 15:18                                   ` Zamyatin, Igor
  0 siblings, 2 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2015-08-19 17:20 UTC (permalink / raw)
  To: Jeff Law, Steven Bosscher; +Cc: Bernhard Reutner-Fischer, GCC Patches


On 19/08/15 17:57, Jeff Law wrote:
> On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
>> 2015-08-10  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>       * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>>       then_cost, else_cost fields.  Change branch_cost field to unsigned
>> int.
>>       (end_ifcvt_sequence): Call set_used_flags on each insn in the
>>       sequence.
>>       Include rtl-iter.h.
>>       (noce_simple_bbs): New function.
>>       (noce_try_move): Bail if basic blocks are not simple.
>>       (noce_try_store_flag): Likewise.
>>       (noce_try_store_flag_constants): Likewise.
>>       (noce_try_addcc): Likewise.
>>       (noce_try_store_flag_mask): Likewise.
>>       (noce_try_cmove): Likewise.
>>       (noce_try_minmax): Likewise.
>>       (noce_try_abs): Likewise.
>>       (noce_try_sign_mask): Likewise.
>>       (noce_try_bitop): Likewise.
>>       (bbs_ok_for_cmove_arith): New function.
>>       (noce_emit_all_but_last): Likewise.
>>       (noce_emit_insn): Likewise.
>>       (noce_emit_bb): Likewise.
>>       (noce_try_cmove_arith): Handle non-simple basic blocks.
>>       (insn_valid_noce_process_p): New function.
>>       (contains_mem_rtx_p): Likewise.
>>       (bb_valid_for_noce_process_p): Likewise.
>>       (noce_process_if_block): Allow non-simple basic blocks
>>       where appropriate.
>>
>> 2015-08-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>       * gcc.dg/ifcvt-1.c: New test.
>>       * gcc.dg/ifcvt-2.c: Likewise.
>>       * gcc.dg/ifcvt-3.c: Likewise.
> Thanks for pinging -- I thought I'd already approved this a few days
> ago!  But I can't find it in my outbox...  So clearly I didn't finish
> the final review.
>
>
>
>> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
>> index 1f29646..c33fe24 100644
>> --- a/gcc/ifcvt.c
>> +++ b/gcc/ifcvt.c
>> @@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info)
>>      return FALSE;
>>    }
>>
>> +/* Helper for bb_valid_for_noce_process_p.  Validate that
>> +   the rtx insn INSN is a single set that does not set
>> +   the conditional register CC and is in general valid for
>> +   if-conversion.  */
>> +
>> +static bool
>> +insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
>> +{
>> +  if (!insn
>> +      || !NONJUMP_INSN_P (insn)
>> +      || (cc && set_of (cc, insn)))
>> +      return false;
>> +
>> +  rtx sset = single_set (insn);
>> +
>> +  /* Currently support only simple single sets in test_bb.  */
>> +  if (!sset
>> +      || !noce_operand_ok (SET_DEST (sset))
>> +      || !noce_operand_ok (SET_SRC (sset)))
>> +    return false;
>> +
>> +  return true;
>> +}
>> +
>> +
>> +      /* Make sure this is a REG and not some instance
>> +	 of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
>> +      if (!REG_P (SET_DEST (sset_b)))
>> +	{
>> +	  BITMAP_FREE (bba_sets);
>> +	  return false;
>> +	}
> BTW, this is overly conservative.  You're working with pseudos here, so
> you can just treat a ZERO_EXTRACT or SUBREG as a read of the full
> underlying register.  If this comes up in practice you might consider
> handling them as a follow-up patch.  I don't think you need to handle
> that case immediately though.

I agree, from my testing and investigation this patch strictly
increases the available opportunities, so being conservative here
should not cause any regressions against existing behaviour.
Making it more aggressive can be done as a follow up though, as you said,
I'm not sure how frequently this comes up in practice.

>
> I also can't remember if we discussed what happens if blocks A & B write
> to the same register, do we handle that situation correctly?

Hmmm...
The function bb_valid_for_noce_process_p that we call early on
in noce_process_if_block makes sure that the only live reg out
of each basic block is the final common destination ('x' in the
noce_if_info struct definition). Since both basic blocks satisfy
that check I suppose that means that even if A and B do write to
the same intermediate pseudo that should not affect correctness
since the written-to common register would have to be read within
A and B (and nowhere outside A and B), which would cause it to
fail the bbs_ok_for_cmove_arith check that checks that no regs
written in A are read in B (and vice versa).

>
> That's the only issue left in my mind.  If we're handling that case
> correctly, then this is Ok for the trunk as-is.  Else we'll need another
> iteration.

Does the above explanation look ok to you?
If so, I'll be away for a week from Monday so I'd rather commit
the patch when I get back so I can deal with any fallout...

Thanks for the reviews!
Kyrill

>
> Thanks,
> Jeff
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-19 17:20                                 ` Kyrill Tkachov
@ 2015-08-19 17:38                                   ` Jeff Law
  2015-09-02 15:18                                   ` Zamyatin, Igor
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff Law @ 2015-08-19 17:38 UTC (permalink / raw)
  To: Kyrill Tkachov, Steven Bosscher; +Cc: Bernhard Reutner-Fischer, GCC Patches

On 08/19/2015 11:20 AM, Kyrill Tkachov wrote:
>
> Hmmm...
> The function bb_valid_for_noce_process_p that we call early on
> in noce_process_if_block makes sure that the only live reg out
> of each basic block is the final common destination ('x' in the
> noce_if_info struct definition). Since both basic blocks satisfy
> that check I suppose that means that even if A and B do write to
> the same intermediate pseudo that should not affect correctness
> since the written-to common register would have to be read within
> A and B (and nowhere outside A and B), which would cause it to
> fail the bbs_ok_for_cmove_arith check that checks that no regs
> written in A are read in B (and vice versa).
Excellent.  That answers things quite well.  In retrospect, I could have 
figured that out myself :-)

>
>>
>> That's the only issue left in my mind.  If we're handling that case
>> correctly, then this is Ok for the trunk as-is.  Else we'll need another
>> iteration.
>
> Does the above explanation look ok to you?
> If so, I'll be away for a week from Monday so I'd rather commit
> the patch when I get back so I can deal with any fallout...
That's fine with me.  Commit at your leisure.

Thanks,
jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-08-19 17:20                                 ` Kyrill Tkachov
  2015-08-19 17:38                                   ` Jeff Law
@ 2015-09-02 15:18                                   ` Zamyatin, Igor
  2015-09-02 16:02                                     ` Kyrill Tkachov
  2015-09-02 21:01                                     ` Jeff Law
  1 sibling, 2 replies; 31+ messages in thread
From: Zamyatin, Igor @ 2015-09-02 15:18 UTC (permalink / raw)
  To: Kyrill Tkachov, Jeff Law, Steven Bosscher
  Cc: Bernhard Reutner-Fischer, GCC Patches

> 
> 
> On 19/08/15 17:57, Jeff Law wrote:
> > On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
> >> 2015-08-10  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> >>
> >>       * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
> >>       then_cost, else_cost fields.  Change branch_cost field to
> >> unsigned int.
> >>       (end_ifcvt_sequence): Call set_used_flags on each insn in the
> >>       sequence.
> >>       Include rtl-iter.h.
> >>       (noce_simple_bbs): New function.
> >>       (noce_try_move): Bail if basic blocks are not simple.
> >>       (noce_try_store_flag): Likewise.
> >>       (noce_try_store_flag_constants): Likewise.
> >>       (noce_try_addcc): Likewise.
> >>       (noce_try_store_flag_mask): Likewise.
> >>       (noce_try_cmove): Likewise.
> >>       (noce_try_minmax): Likewise.
> >>       (noce_try_abs): Likewise.
> >>       (noce_try_sign_mask): Likewise.
> >>       (noce_try_bitop): Likewise.
> >>       (bbs_ok_for_cmove_arith): New function.
> >>       (noce_emit_all_but_last): Likewise.
> >>       (noce_emit_insn): Likewise.
> >>       (noce_emit_bb): Likewise.
> >>       (noce_try_cmove_arith): Handle non-simple basic blocks.
> >>       (insn_valid_noce_process_p): New function.
> >>       (contains_mem_rtx_p): Likewise.
> >>       (bb_valid_for_noce_process_p): Likewise.
> >>       (noce_process_if_block): Allow non-simple basic blocks
> >>       where appropriate.
> >>
> >> 2015-08-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> >>
> >>       * gcc.dg/ifcvt-1.c: New test.
> >>       * gcc.dg/ifcvt-2.c: Likewise.
> >>       * gcc.dg/ifcvt-3.c: Likewise.

Looks like ifcvt-3.c fails on x86_64. I see

New failures:
FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"

Could you please take a look?

Thanks,
Igor


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-09-02 15:18                                   ` Zamyatin, Igor
@ 2015-09-02 16:02                                     ` Kyrill Tkachov
  2015-09-05 15:22                                       ` H.J. Lu
  2015-09-02 21:01                                     ` Jeff Law
  1 sibling, 1 reply; 31+ messages in thread
From: Kyrill Tkachov @ 2015-09-02 16:02 UTC (permalink / raw)
  To: Zamyatin, Igor, Jeff Law, Steven Bosscher
  Cc: Bernhard Reutner-Fischer, GCC Patches


On 02/09/15 16:18, Zamyatin, Igor wrote:
>>
>> On 19/08/15 17:57, Jeff Law wrote:
>>> On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
>>>> 2015-08-10  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>
>>>>        * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>>>>        then_cost, else_cost fields.  Change branch_cost field to
>>>> unsigned int.
>>>>        (end_ifcvt_sequence): Call set_used_flags on each insn in the
>>>>        sequence.
>>>>        Include rtl-iter.h.
>>>>        (noce_simple_bbs): New function.
>>>>        (noce_try_move): Bail if basic blocks are not simple.
>>>>        (noce_try_store_flag): Likewise.
>>>>        (noce_try_store_flag_constants): Likewise.
>>>>        (noce_try_addcc): Likewise.
>>>>        (noce_try_store_flag_mask): Likewise.
>>>>        (noce_try_cmove): Likewise.
>>>>        (noce_try_minmax): Likewise.
>>>>        (noce_try_abs): Likewise.
>>>>        (noce_try_sign_mask): Likewise.
>>>>        (noce_try_bitop): Likewise.
>>>>        (bbs_ok_for_cmove_arith): New function.
>>>>        (noce_emit_all_but_last): Likewise.
>>>>        (noce_emit_insn): Likewise.
>>>>        (noce_emit_bb): Likewise.
>>>>        (noce_try_cmove_arith): Handle non-simple basic blocks.
>>>>        (insn_valid_noce_process_p): New function.
>>>>        (contains_mem_rtx_p): Likewise.
>>>>        (bb_valid_for_noce_process_p): Likewise.
>>>>        (noce_process_if_block): Allow non-simple basic blocks
>>>>        where appropriate.
>>>>
>>>> 2015-08-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>
>>>>        * gcc.dg/ifcvt-1.c: New test.
>>>>        * gcc.dg/ifcvt-2.c: Likewise.
>>>>        * gcc.dg/ifcvt-3.c: Likewise.
> Looks like ifcvt-3.c fails on x86_64. I see
>
> New failures:
> FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"
>
> Could you please take a look?

Hmm, these pass for me on x86_64-pc-linux-gnu.
The test is most probably failing due to branch costs being too low for the
transformation to kick in. The test passes for me with -mtune=intel and -mtune=generic.
Do you know what the default tuning CPU is used for that failing test?

Thanks,
Kyrill

>
> Thanks,
> Igor
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-09-02 15:18                                   ` Zamyatin, Igor
  2015-09-02 16:02                                     ` Kyrill Tkachov
@ 2015-09-02 21:01                                     ` Jeff Law
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff Law @ 2015-09-02 21:01 UTC (permalink / raw)
  To: Zamyatin, Igor, Kyrill Tkachov, Steven Bosscher
  Cc: Bernhard Reutner-Fischer, GCC Patches

On 09/02/2015 09:18 AM, Zamyatin, Igor wrote:
>>
>>
>> On 19/08/15 17:57, Jeff Law wrote:
>>> On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
>>>> 2015-08-10  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>
>>>>        * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>>>>        then_cost, else_cost fields.  Change branch_cost field to
>>>> unsigned int.
>>>>        (end_ifcvt_sequence): Call set_used_flags on each insn in the
>>>>        sequence.
>>>>        Include rtl-iter.h.
>>>>        (noce_simple_bbs): New function.
>>>>        (noce_try_move): Bail if basic blocks are not simple.
>>>>        (noce_try_store_flag): Likewise.
>>>>        (noce_try_store_flag_constants): Likewise.
>>>>        (noce_try_addcc): Likewise.
>>>>        (noce_try_store_flag_mask): Likewise.
>>>>        (noce_try_cmove): Likewise.
>>>>        (noce_try_minmax): Likewise.
>>>>        (noce_try_abs): Likewise.
>>>>        (noce_try_sign_mask): Likewise.
>>>>        (noce_try_bitop): Likewise.
>>>>        (bbs_ok_for_cmove_arith): New function.
>>>>        (noce_emit_all_but_last): Likewise.
>>>>        (noce_emit_insn): Likewise.
>>>>        (noce_emit_bb): Likewise.
>>>>        (noce_try_cmove_arith): Handle non-simple basic blocks.
>>>>        (insn_valid_noce_process_p): New function.
>>>>        (contains_mem_rtx_p): Likewise.
>>>>        (bb_valid_for_noce_process_p): Likewise.
>>>>        (noce_process_if_block): Allow non-simple basic blocks
>>>>        where appropriate.
>>>>
>>>> 2015-08-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>
>>>>        * gcc.dg/ifcvt-1.c: New test.
>>>>        * gcc.dg/ifcvt-2.c: Likewise.
>>>>        * gcc.dg/ifcvt-3.c: Likewise.
>
> Looks like ifcvt-3.c fails on x86_64. I see
>
> New failures:
> FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"
>
> Could you please take a look?
Hmm, FWIW, these are passing in my builds/tests:
[law@localhost c-family]$ grep ifcvt-3 /tmp/gcc.sum
PASS: gcc.dg/ifcvt-3.c (test for excess errors)
PASS: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"
PASS: gcc.dg/vect/vect-ifcvt-3.c (test for excess errors)
PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects 
scan-tree-dump-times vect "vectorized 1 loops" 1
PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects (test for 
excess errors)
PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects execution test
PASS: gcc.dg/vect/vect-ifcvt-3.c execution test
PASS: gcc.dg/vect/vect-ifcvt-3.c scan-tree-dump-times vect "vectorized 1 
loops" 1
Jeff


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
  2015-09-02 16:02                                     ` Kyrill Tkachov
@ 2015-09-05 15:22                                       ` H.J. Lu
  0 siblings, 0 replies; 31+ messages in thread
From: H.J. Lu @ 2015-09-05 15:22 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Zamyatin, Igor, Jeff Law, Steven Bosscher,
	Bernhard Reutner-Fischer, GCC Patches

On Wed, Sep 2, 2015 at 9:01 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
> On 02/09/15 16:18, Zamyatin, Igor wrote:
>>>
>>>
>>> On 19/08/15 17:57, Jeff Law wrote:
>>>>
>>>> On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
>>>>>
>>>>> 2015-08-10  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>>
>>>>>        * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>>>>>        then_cost, else_cost fields.  Change branch_cost field to
>>>>> unsigned int.
>>>>>        (end_ifcvt_sequence): Call set_used_flags on each insn in the
>>>>>        sequence.
>>>>>        Include rtl-iter.h.
>>>>>        (noce_simple_bbs): New function.
>>>>>        (noce_try_move): Bail if basic blocks are not simple.
>>>>>        (noce_try_store_flag): Likewise.
>>>>>        (noce_try_store_flag_constants): Likewise.
>>>>>        (noce_try_addcc): Likewise.
>>>>>        (noce_try_store_flag_mask): Likewise.
>>>>>        (noce_try_cmove): Likewise.
>>>>>        (noce_try_minmax): Likewise.
>>>>>        (noce_try_abs): Likewise.
>>>>>        (noce_try_sign_mask): Likewise.
>>>>>        (noce_try_bitop): Likewise.
>>>>>        (bbs_ok_for_cmove_arith): New function.
>>>>>        (noce_emit_all_but_last): Likewise.
>>>>>        (noce_emit_insn): Likewise.
>>>>>        (noce_emit_bb): Likewise.
>>>>>        (noce_try_cmove_arith): Handle non-simple basic blocks.
>>>>>        (insn_valid_noce_process_p): New function.
>>>>>        (contains_mem_rtx_p): Likewise.
>>>>>        (bb_valid_for_noce_process_p): Likewise.
>>>>>        (noce_process_if_block): Allow non-simple basic blocks
>>>>>        where appropriate.
>>>>>
>>>>> 2015-08-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>>
>>>>>        * gcc.dg/ifcvt-1.c: New test.
>>>>>        * gcc.dg/ifcvt-2.c: Likewise.
>>>>>        * gcc.dg/ifcvt-3.c: Likewise.
>>
>> Looks like ifcvt-3.c fails on x86_64. I see
>>
>> New failures:
>> FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"
>>
>> Could you please take a look?
>
>
> Hmm, these pass for me on x86_64-pc-linux-gnu.
> The test is most probably failing due to branch costs being too low for the
> transformation to kick in. The test passes for me with -mtune=intel and
> -mtune=generic.
> Do you know what the default tuning CPU is used for that failing test?

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462


-- 
H.J.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-09-05 12:53 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-10 12:31 [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive Kyrill Tkachov
2015-07-10 13:05 ` Kyrill Tkachov
2015-07-10 23:00 ` Bernhard Reutner-Fischer
2015-07-10 23:08   ` Bernhard Reutner-Fischer
2015-07-13  9:46   ` Kyrill Tkachov
2015-07-13 10:00     ` Kyrill Tkachov
2015-07-13 10:48     ` Bernhard Reutner-Fischer
2015-07-13 13:12       ` Kyrill Tkachov
2015-07-13 14:03     ` Kyrill Tkachov
2015-07-20 10:59       ` Kyrill Tkachov
2015-07-23 20:57       ` Jeff Law
2015-07-24  9:44         ` Kyrill Tkachov
2015-07-24 18:44           ` Jeff Law
2015-07-27 10:30             ` Kyrill Tkachov
2015-07-27 16:14               ` Jeff Law
2015-07-27 17:54                 ` Kyrill Tkachov
2015-07-28 10:21                   ` Kyrill Tkachov
2015-07-31 18:08                     ` Jeff Law
2015-08-09 21:21                       ` Steven Bosscher
2015-08-11 17:05                         ` Jeff Law
2015-08-11 17:09                           ` Kyrill Tkachov
2015-08-12 14:31                             ` Kyrill Tkachov
2015-08-19 12:59                               ` Kyrill Tkachov
2015-08-19 16:59                               ` Jeff Law
2015-08-19 17:20                                 ` Kyrill Tkachov
2015-08-19 17:38                                   ` Jeff Law
2015-09-02 15:18                                   ` Zamyatin, Igor
2015-09-02 16:02                                     ` Kyrill Tkachov
2015-09-05 15:22                                       ` H.J. Lu
2015-09-02 21:01                                     ` Jeff Law
2015-07-31 17:05                   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).