[PATCH] Remove some restrictions on loop shape in tree-if-conv.c

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
@ 2015-04-28 13:57 Alan Lawrence
  2015-04-28 14:22 ` Alan Lawrence
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Alan Lawrence @ 2015-04-28 13:57 UTC (permalink / raw)
  To: gcc-patches

Tree if-conversion currently bails out for loops that (a) contain nested loops; 
(b) have more than one exit; (c) where the exit block (source of the exit edge) 
does not dominate the loop latch; (d) where the exit block is the loop header, 
or there are statements after the exit.

This patch removes restrictions (c) and (d). The intuition is that, for (c), "if 
(P) {... if (Q) break;}" is equivalent to "if (P) {...}; if (P&&Q) break;" and 
this is mostly handled by existing code for propagating conditions. For (d), "if 
(P) break; stmts" is equivalent to "if (!P) stmts; if (P) break;" - this 
requires inserting the predicated stmts before the branch rather than after.

Mostly thus this patch is just removing assumptions about when we do/don't need 
to store predicates. One 'gotcha' was in some test cases the latch block passed 
into if-conversion is non-empty; in such cases, if-conversion will now restore 
"good form" by moving the statement into the exit block (predicated with 
!exit-condition).

The condition on dominance in add_to_predicate_list, I haven't quite managed to 
convince myself is right; we _do_ want to store a predicate for the latch block 
to handle the above case, but I'm not totally sure of the postdominance 
condition - I think it may store conditions in cases where we don't really need 
to (e.g. "for (;;) { ... if (P) { for (;;) ; } }" which might look nested but 
isn't, and has no route to the function exit). However, storing conditions when 
we don't need to, is OK, unlike failing to store when we do need to ;).

A simple example of the patch at work:

int
foo ()
{
   for (int i = 0; i < N ; i++)
   {
     int m = (a[i] & i) ? 5 : 4;
     b[i] = a[i] * m;
   }
}

compiled at -O3, -fdump-tree-ivcanon shows this immediately before 
tree-if-conversion:

...function entry, variables, etc...
   <bb 2>:
   _10 = a[0];
   goto <bb 6>;

   <bb 3>:
   _5 = a[i_9];
   _6 = _5 & i_9;
   if (_6 != 0)
     goto <bb 5>;
   else
     goto <bb 4>;

   <bb 4>:

   <bb 5>:
   # m_14 = PHI <5(3), 4(4)>

   <bb 6>:
   # m_2 = PHI <m_14(5), 4(2)>
   # _15 = PHI <_5(5), _10(2)>
   # i_16 = PHI <i_9(5), 0(2)>
   # ivtmp_13 = PHI <ivtmp_3(5), 32(2)>
   _7 = m_2 * _15;
   b[i_16] = _7;
   i_9 = i_16 + 1;
   ivtmp_3 = ivtmp_13 - 1;
   if (ivtmp_3 != 0)
     goto <bb 3>;
   else
     goto <bb 7>;

which previously was not if-converted. With this patch:

   <bb 2>:
   _10 = a[0];
   goto <bb 4>;

   <bb 3>:

   <bb 4>:
   # m_2 = PHI <m_14(3), 4(2)>
   # _15 = PHI <_5(3), _10(2)>
   # i_16 = PHI <i_9(3), 0(2)>
   # ivtmp_13 = PHI <ivtmp_3(3), 32(2)>
   _7 = m_2 * _15;
   b[i_16] = _7;
   i_9 = i_16 + 1;
   ivtmp_3 = ivtmp_13 - 1;
   _5 = a[i_9];
   _6 = _5 & i_9;
   m_14 = _6 != 0 ? 5 : 4;
   if (ivtmp_3 != 0)
     goto <bb 3>;
   else
     goto <bb 5>;

   <bb 5>:
   return;

(Unfortunately the vectorizer still doesn't handle this loop either, but that's 
another issue/patch...)

Bootstrapped + check-gcc on x86_64-unknown-linux-gnu and aarch64-none-linux-gnu.
Cross-tested check-gcc on aarch64-none-elf.
I'm investigating impact on benchmarks - on AArch64 Spec2k6, this touches a 
number of object files, leading to an overall slight decrease in the number of 
instructions, but no change that looks significant (specifically, no more or 
less vectorization).

Is this OK for trunk?

Cheers, Alan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
  2015-04-28 13:57 [PATCH] Remove some restrictions on loop shape in tree-if-conv.c Alan Lawrence
@ 2015-04-28 14:22 ` Alan Lawrence
  2015-04-29  5:01 ` Jeff Law
  2015-04-30 10:26 ` Richard Biener
  2 siblings, 0 replies; 7+ messages in thread
From: Alan Lawrence @ 2015-04-28 14:22 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3734 bytes --]

Alan Lawrence wrote:
> Tree if-conversion currently bails out for loops that (a) contain nested loops; 
> (b) have more than one exit; (c) where the exit block (source of the exit edge) 
> does not dominate the loop latch; (d) where the exit block is the loop header, 
> or there are statements after the exit.
> 
> This patch removes restrictions (c) and (d). The intuition is that, for (c), "if 
> (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if (P&&Q) break;" and 
> this is mostly handled by existing code for propagating conditions. For (d), "if 
> (P) break; stmts" is equivalent to "if (!P) stmts; if (P) break;" - this 
> requires inserting the predicated stmts before the branch rather than after.
> 
> Mostly thus this patch is just removing assumptions about when we do/don't need 
> to store predicates. One 'gotcha' was in some test cases the latch block passed 
> into if-conversion is non-empty; in such cases, if-conversion will now restore 
> "good form" by moving the statement into the exit block (predicated with 
> !exit-condition).
> 
> The condition on dominance in add_to_predicate_list, I haven't quite managed to 
> convince myself is right; we _do_ want to store a predicate for the latch block 
> to handle the above case, but I'm not totally sure of the postdominance 
> condition - I think it may store conditions in cases where we don't really need 
> to (e.g. "for (;;) { ... if (P) { for (;;) ; } }" which might look nested but 
> isn't, and has no route to the function exit). However, storing conditions when 
> we don't need to, is OK, unlike failing to store when we do need to ;).
> 
> A simple example of the patch at work:
> 
> int
> foo ()
> {
>    for (int i = 0; i < N ; i++)
>    {
>      int m = (a[i] & i) ? 5 : 4;
>      b[i] = a[i] * m;
>    }
> }
> 
> compiled at -O3, -fdump-tree-ivcanon shows this immediately before 
> tree-if-conversion:
> 
> ...function entry, variables, etc...
>    <bb 2>:
>    _10 = a[0];
>    goto <bb 6>;
> 
>    <bb 3>:
>    _5 = a[i_9];
>    _6 = _5 & i_9;
>    if (_6 != 0)
>      goto <bb 5>;
>    else
>      goto <bb 4>;
> 
>    <bb 4>:
> 
>    <bb 5>:
>    # m_14 = PHI <5(3), 4(4)>
> 
>    <bb 6>:
>    # m_2 = PHI <m_14(5), 4(2)>
>    # _15 = PHI <_5(5), _10(2)>
>    # i_16 = PHI <i_9(5), 0(2)>
>    # ivtmp_13 = PHI <ivtmp_3(5), 32(2)>
>    _7 = m_2 * _15;
>    b[i_16] = _7;
>    i_9 = i_16 + 1;
>    ivtmp_3 = ivtmp_13 - 1;
>    if (ivtmp_3 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 7>;
> 
> which previously was not if-converted. With this patch:
> 
>    <bb 2>:
>    _10 = a[0];
>    goto <bb 4>;
> 
>    <bb 3>:
> 
>    <bb 4>:
>    # m_2 = PHI <m_14(3), 4(2)>
>    # _15 = PHI <_5(3), _10(2)>
>    # i_16 = PHI <i_9(3), 0(2)>
>    # ivtmp_13 = PHI <ivtmp_3(3), 32(2)>
>    _7 = m_2 * _15;
>    b[i_16] = _7;
>    i_9 = i_16 + 1;
>    ivtmp_3 = ivtmp_13 - 1;
>    _5 = a[i_9];
>    _6 = _5 & i_9;
>    m_14 = _6 != 0 ? 5 : 4;
>    if (ivtmp_3 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 5>;
> 
>    <bb 5>:
>    return;
> 
> (Unfortunately the vectorizer still doesn't handle this loop either, but that's 
> another issue/patch...)
> 
> Bootstrapped + check-gcc on x86_64-unknown-linux-gnu and aarch64-none-linux-gnu.
> Cross-tested check-gcc on aarch64-none-elf.
> I'm investigating impact on benchmarks - on AArch64 Spec2k6, this touches a 
> number of object files, leading to an overall slight decrease in the number of 
> instructions, but no change that looks significant (specifically, no more or 
> less vectorization).
> 
> Is this OK for trunk?
> 
> Cheers, Alan

Attached!

--Alan


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ifconv.patch --]
[-- Type: text/x-patch; name=ifconv.patch, Size: 7571 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 400ee01a5122c0c74b21a9b100f36b2951e45a46..6bd642e54848ed2faf72aa99987ab9f838634ca6 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -455,8 +455,8 @@ add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
     return;
 
   /* If dominance tells us this basic block is always executed,
-     don't record any predicates for it.  */
-  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+     i.e. it postdominates the header, don't record any predicates for it.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, loop->header, bb))
     return;
 
   dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
@@ -1025,11 +1025,10 @@ has_pred_critical_p (basic_block bb)
 
    Last restriction is valid if aggressive_if_conv is false.
 
-   EXIT_BB is the basic block containing the exit of the LOOP.  BB is
-   inside LOOP.  */
+   BB is inside LOOP.  */
 
 static bool
-if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
+if_convertible_bb_p (struct loop *loop, basic_block bb)
 {
   edge e;
   edge_iterator ei;
@@ -1044,30 +1043,6 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
       && !aggressive_if_conv)
     return false;
 
-  if (exit_bb)
-    {
-      if (bb != loop->latch)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "basic block after exit bb but before latch\n");
-	  return false;
-	}
-      else if (!empty_block_p (bb))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "non empty basic block after exit bb\n");
-	  return false;
-	}
-      else if (bb == loop->latch
-	       && bb != exit_bb
-	       && !dominated_by_p (CDI_DOMINATORS, bb, exit_bb))
-	  {
-	    if (dump_file && (dump_flags & TDF_DETAILS))
-	      fprintf (dump_file, "latch is not dominated by exit_block\n");
-	    return false;
-	  }
-    }
-
   /* Be less adventurous and handle only normal edges.  */
   FOR_EACH_EDGE (e, ei, bb->succs)
     if (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP))
@@ -1199,15 +1174,6 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch and loop exit block are always executed and
-	 have no extra conditions to be processed: skip them.  */
-      if (bb == loop->latch
-	  || bb_with_exit_edge_p (loop, bb))
-	{
-	  reset_bb_predicate (bb);
-	  continue;
-	}
-
       cond = bb_predicate (bb);
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
@@ -1271,7 +1237,6 @@ if_convertible_loop_p_1 (struct loop *loop,
 {
   bool res;
   unsigned int i;
-  basic_block exit_bb = NULL;
 
   /* Don't if-convert the loop when the data dependences cannot be
      computed: the loop won't be vectorized in that case.  */
@@ -1295,11 +1260,8 @@ if_convertible_loop_p_1 (struct loop *loop,
     {
       basic_block bb = ifc_bbs[i];
 
-      if (!if_convertible_bb_p (loop, bb, exit_bb))
+      if (!if_convertible_bb_p (loop, bb))
 	return false;
-
-      if (bb_with_exit_edge_p (loop, bb))
-	exit_bb = bb;
     }
 
   for (i = 0; i < loop->num_nodes; i++)
@@ -1375,14 +1337,11 @@ if_convertible_loop_p_1 (struct loop *loop,
    - it is innermost,
    - it has two or more basic blocks,
    - it has only one exit,
-   - loop header is not the exit edge,
    - if its basic blocks and phi nodes are if convertible.  */
 
 static bool
 if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
 {
-  edge e;
-  edge_iterator ei;
   bool res = false;
   vec<data_reference_p> refs;
   vec<ddr_p> ddrs;
@@ -1411,12 +1370,6 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
       return false;
     }
 
-  /* If one of the loop header's edge is an exit edge then do not
-     apply if-conversion.  */
-  FOR_EACH_EDGE (e, ei, loop->header->succs)
-    if (loop_exit_edge_p (loop, e))
-      return false;
-
   refs.create (5);
   ddrs.create (25);
   auto_vec<loop_p, 3> loop_nest;
@@ -2249,7 +2202,6 @@ remove_conditions_and_labels (loop_p loop)
 static void
 combine_blocks (struct loop *loop, bool any_mask_load_store)
 {
-  basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
   edge e;
@@ -2265,10 +2217,10 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
-  exit_bb = NULL;
+  basic_block exit_bb = NULL;
   for (i = 0; i < orig_loop_num_nodes; i++)
     {
-      bb = ifc_bbs[i];
+      basic_block bb = ifc_bbs[i];
       free_bb_predicate (bb);
       if (bb_with_exit_edge_p (loop, bb))
 	{
@@ -2280,7 +2232,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
-      bb = ifc_bbs[i];
+      basic_block bb = ifc_bbs[i];
 
       for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei));)
 	{
@@ -2315,37 +2267,53 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
       set_immediate_dominator (CDI_DOMINATORS, loop->latch, loop->header);
     }
 
-  merge_target_bb = loop->header;
+  basic_block merge_target_bb = loop->header;
+  /* Indicates that any further statements must be inserted
+     _before_ the final jump.  */
+  bool seen_exit = (loop->header == exit_bb);
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
-      gimple_stmt_iterator gsi;
-      gimple_stmt_iterator last;
-
-      bb = ifc_bbs[i];
+      basic_block bb = ifc_bbs[i];
 
-      if (bb == exit_bb || bb == loop->latch)
-	continue;
+      if (bb == exit_bb)
+	{
+	  /* If possible, merge loop header to the block with the exit edge.
+	     This reduces the number of basic blocks to two, to please the
+	     vectorizer that handles only loops with two nodes.  */
+	  if (can_merge_blocks_p (loop->header, bb))
+	    merge_blocks (loop->header, bb);
+	  else
+	    {
+	      /* Must leave stmts in this BB; move following stmts here too.  */
+	      merge_target_bb = bb;
+	    }
+	  seen_exit = true;
+	  continue;
+	}
 
       /* Make stmts member of loop->header.  */
-      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi);
+	   gsi_next (&gsi))
 	gimple_set_bb (gsi_stmt (gsi), merge_target_bb);
 
       /* Update stmt list.  */
-      last = gsi_last_bb (merge_target_bb);
-      gsi_insert_seq_after (&last, bb_seq (bb), GSI_NEW_STMT);
+      gimple_stmt_iterator last = gsi_last_bb (merge_target_bb);
+
+      if (seen_exit)
+	{
+	  gcc_assert (is_ctrl_stmt (gsi_stmt (last)));
+	  gsi_insert_seq_before (&last, bb_seq(bb), GSI_SAME_STMT);
+	}
+      else
+	  gsi_insert_seq_after (&last, bb_seq (bb), GSI_NEW_STMT);
+
       set_bb_seq (bb, NULL);
 
-      delete_basic_block (bb);
+      if (bb != loop->latch)
+	delete_basic_block (bb);
     }
 
-  /* If possible, merge loop header to the block with the exit edge.
-     This reduces the number of basic blocks to two, to please the
-     vectorizer that handles only loops with two nodes.  */
-  if (exit_bb
-      && exit_bb != loop->header
-      && can_merge_blocks_p (loop->header, exit_bb))
-    merge_blocks (loop->header, exit_bb);
-
   free (ifc_bbs);
   ifc_bbs = NULL;
 }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
  2015-04-28 13:57 [PATCH] Remove some restrictions on loop shape in tree-if-conv.c Alan Lawrence
  2015-04-28 14:22 ` Alan Lawrence
@ 2015-04-29  5:01 ` Jeff Law
  2015-04-29  9:27   ` Alan Lawrence
  2015-04-30 10:26 ` Richard Biener
  2 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2015-04-29  5:01 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On 04/28/2015 07:55 AM, Alan Lawrence wrote:
> Tree if-conversion currently bails out for loops that (a) contain nested
> loops; (b) have more than one exit; (c) where the exit block (source of
> the exit edge) does not dominate the loop latch; (d) where the exit
> block is the loop header, or there are statements after the exit.
>
> This patch removes restrictions (c) and (d). The intuition is that, for
> (c), "if (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if
> (P&&Q) break;" and this is mostly handled by existing code for
> propagating conditions. For (d), "if (P) break; stmts" is equivalent to
> "if (!P) stmts; if (P) break;" - this requires inserting the predicated
> stmts before the branch rather than after.
>
> Mostly thus this patch is just removing assumptions about when we
> do/don't need to store predicates. One 'gotcha' was in some test cases
> the latch block passed into if-conversion is non-empty; in such cases,
> if-conversion will now restore "good form" by moving the statement into
> the exit block (predicated with !exit-condition).
>
> The condition on dominance in add_to_predicate_list, I haven't quite
> managed to convince myself is right; we _do_ want to store a predicate
> for the latch block to handle the above case, but I'm not totally sure
> of the postdominance condition - I think it may store conditions in
> cases where we don't really need to (e.g. "for (;;) { ... if (P) { for
> (;;) ; } }" which might look nested but isn't, and has no route to the
> function exit). However, storing conditions when we don't need to, is
> OK, unlike failing to store when we do need to ;).
>
> A simple example of the patch at work:
>
> int
> foo ()
> {
>    for (int i = 0; i < N ; i++)
>    {
>      int m = (a[i] & i) ? 5 : 4;
>      b[i] = a[i] * m;
>    }
> }
>
> compiled at -O3, -fdump-tree-ivcanon shows this immediately before
> tree-if-conversion:
>
> ...function entry, variables, etc...
>    <bb 2>:
>    _10 = a[0];
>    goto <bb 6>;
>
>    <bb 3>:
>    _5 = a[i_9];
>    _6 = _5 & i_9;
>    if (_6 != 0)
>      goto <bb 5>;
>    else
>      goto <bb 4>;
>
>    <bb 4>:
>
>    <bb 5>:
>    # m_14 = PHI <5(3), 4(4)>
>
>    <bb 6>:
>    # m_2 = PHI <m_14(5), 4(2)>
>    # _15 = PHI <_5(5), _10(2)>
>    # i_16 = PHI <i_9(5), 0(2)>
>    # ivtmp_13 = PHI <ivtmp_3(5), 32(2)>
>    _7 = m_2 * _15;
>    b[i_16] = _7;
>    i_9 = i_16 + 1;
>    ivtmp_3 = ivtmp_13 - 1;
>    if (ivtmp_3 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 7>;
>
> which previously was not if-converted. With this patch:
>
>    <bb 2>:
>    _10 = a[0];
>    goto <bb 4>;
>
>    <bb 3>:
>
>    <bb 4>:
>    # m_2 = PHI <m_14(3), 4(2)>
>    # _15 = PHI <_5(3), _10(2)>
>    # i_16 = PHI <i_9(3), 0(2)>
>    # ivtmp_13 = PHI <ivtmp_3(3), 32(2)>
>    _7 = m_2 * _15;
>    b[i_16] = _7;
>    i_9 = i_16 + 1;
>    ivtmp_3 = ivtmp_13 - 1;
>    _5 = a[i_9];
>    _6 = _5 & i_9;
>    m_14 = _6 != 0 ? 5 : 4;
>    if (ivtmp_3 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 5>;
>
>    <bb 5>:
>    return;
>
> (Unfortunately the vectorizer still doesn't handle this loop either, but
> that's another issue/patch...)
>
> Bootstrapped + check-gcc on x86_64-unknown-linux-gnu and
> aarch64-none-linux-gnu.
> Cross-tested check-gcc on aarch64-none-elf.
> I'm investigating impact on benchmarks - on AArch64 Spec2k6, this
> touches a number of object files, leading to an overall slight decrease
> in the number of instructions, but no change that looks significant
> (specifically, no more or less vectorization).
>
> Is this OK for trunk?
ENOPATCH
jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
  2015-04-29  5:01 ` Jeff Law
@ 2015-04-29  9:27   ` Alan Lawrence
  0 siblings, 0 replies; 7+ messages in thread
From: Alan Lawrence @ 2015-04-29  9:27 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Sorry, I realize I forgot to attach the patch to the original email, this 
followed a couple of minutes later in message <553F91B9.7050703@arm.com> at 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01745.html .

Cheers, Alan

Jeff Law wrote:
> On 04/28/2015 07:55 AM, Alan Lawrence wrote:
>> Tree if-conversion currently bails out for loops that (a) contain nested
>> loops; (b) have more than one exit; (c) where the exit block (source of
>> the exit edge) does not dominate the loop latch; (d) where the exit
>> block is the loop header, or there are statements after the exit.
>>
>> This patch removes restrictions (c) and (d). The intuition is that, for
>> (c), "if (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if
>> (P&&Q) break;" and this is mostly handled by existing code for
>> propagating conditions. For (d), "if (P) break; stmts" is equivalent to
>> "if (!P) stmts; if (P) break;" - this requires inserting the predicated
>> stmts before the branch rather than after.
>>
>> Mostly thus this patch is just removing assumptions about when we
>> do/don't need to store predicates. One 'gotcha' was in some test cases
>> the latch block passed into if-conversion is non-empty; in such cases,
>> if-conversion will now restore "good form" by moving the statement into
>> the exit block (predicated with !exit-condition).
>>
>> The condition on dominance in add_to_predicate_list, I haven't quite
>> managed to convince myself is right; we _do_ want to store a predicate
>> for the latch block to handle the above case, but I'm not totally sure
>> of the postdominance condition - I think it may store conditions in
>> cases where we don't really need to (e.g. "for (;;) { ... if (P) { for
>> (;;) ; } }" which might look nested but isn't, and has no route to the
>> function exit). However, storing conditions when we don't need to, is
>> OK, unlike failing to store when we do need to ;).
>>
>> A simple example of the patch at work:
>>
>> int
>> foo ()
>> {
>>    for (int i = 0; i < N ; i++)
>>    {
>>      int m = (a[i] & i) ? 5 : 4;
>>      b[i] = a[i] * m;
>>    }
>> }
>>
>> compiled at -O3, -fdump-tree-ivcanon shows this immediately before
>> tree-if-conversion:
>>
>> ...function entry, variables, etc...
>>    <bb 2>:
>>    _10 = a[0];
>>    goto <bb 6>;
>>
>>    <bb 3>:
>>    _5 = a[i_9];
>>    _6 = _5 & i_9;
>>    if (_6 != 0)
>>      goto <bb 5>;
>>    else
>>      goto <bb 4>;
>>
>>    <bb 4>:
>>
>>    <bb 5>:
>>    # m_14 = PHI <5(3), 4(4)>
>>
>>    <bb 6>:
>>    # m_2 = PHI <m_14(5), 4(2)>
>>    # _15 = PHI <_5(5), _10(2)>
>>    # i_16 = PHI <i_9(5), 0(2)>
>>    # ivtmp_13 = PHI <ivtmp_3(5), 32(2)>
>>    _7 = m_2 * _15;
>>    b[i_16] = _7;
>>    i_9 = i_16 + 1;
>>    ivtmp_3 = ivtmp_13 - 1;
>>    if (ivtmp_3 != 0)
>>      goto <bb 3>;
>>    else
>>      goto <bb 7>;
>>
>> which previously was not if-converted. With this patch:
>>
>>    <bb 2>:
>>    _10 = a[0];
>>    goto <bb 4>;
>>
>>    <bb 3>:
>>
>>    <bb 4>:
>>    # m_2 = PHI <m_14(3), 4(2)>
>>    # _15 = PHI <_5(3), _10(2)>
>>    # i_16 = PHI <i_9(3), 0(2)>
>>    # ivtmp_13 = PHI <ivtmp_3(3), 32(2)>
>>    _7 = m_2 * _15;
>>    b[i_16] = _7;
>>    i_9 = i_16 + 1;
>>    ivtmp_3 = ivtmp_13 - 1;
>>    _5 = a[i_9];
>>    _6 = _5 & i_9;
>>    m_14 = _6 != 0 ? 5 : 4;
>>    if (ivtmp_3 != 0)
>>      goto <bb 3>;
>>    else
>>      goto <bb 5>;
>>
>>    <bb 5>:
>>    return;
>>
>> (Unfortunately the vectorizer still doesn't handle this loop either, but
>> that's another issue/patch...)
>>
>> Bootstrapped + check-gcc on x86_64-unknown-linux-gnu and
>> aarch64-none-linux-gnu.
>> Cross-tested check-gcc on aarch64-none-elf.
>> I'm investigating impact on benchmarks - on AArch64 Spec2k6, this
>> touches a number of object files, leading to an overall slight decrease
>> in the number of instructions, but no change that looks significant
>> (specifically, no more or less vectorization).
>>
>> Is this OK for trunk?
> ENOPATCH
> jeff
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
  2015-04-28 13:57 [PATCH] Remove some restrictions on loop shape in tree-if-conv.c Alan Lawrence
  2015-04-28 14:22 ` Alan Lawrence
  2015-04-29  5:01 ` Jeff Law
@ 2015-04-30 10:26 ` Richard Biener
  2015-04-30 10:45   ` Alan Lawrence
  2 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2015-04-30 10:26 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On Tue, Apr 28, 2015 at 3:55 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Tree if-conversion currently bails out for loops that (a) contain nested
> loops; (b) have more than one exit; (c) where the exit block (source of the
> exit edge) does not dominate the loop latch; (d) where the exit block is the
> loop header, or there are statements after the exit.
>
> This patch removes restrictions (c) and (d). The intuition is that, for (c),
> "if (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if (P&&Q)
> break;" and this is mostly handled by existing code for propagating
> conditions. For (d), "if (P) break; stmts" is equivalent to "if (!P) stmts;
> if (P) break;" - this requires inserting the predicated stmts before the
> branch rather than after.

Hum - so you empty the latch by conditionalizing code on the exit condition?

> Mostly thus this patch is just removing assumptions about when we do/don't
> need to store predicates. One 'gotcha' was in some test cases the latch
> block passed into if-conversion is non-empty; in such cases, if-conversion
> will now restore "good form" by moving the statement into the exit block
> (predicated with !exit-condition).

Indeed.

> The condition on dominance in add_to_predicate_list, I haven't quite managed
> to convince myself is right; we _do_ want to store a predicate for the latch
> block to handle the above case, but I'm not totally sure of the
> postdominance condition - I think it may store conditions in cases where we
> don't really need to (e.g. "for (;;) { ... if (P) { for (;;) ; } }" which
> might look nested but isn't, and has no route to the function exit).
> However, storing conditions when we don't need to, is OK, unlike failing to
> store when we do need to ;).

So you still restrict loop form to two blocks - just the latch may now be
non-empty?  Thus I'd say keeping the existing check but amending it by
&& bb != loop->latch would be better.

Otherwise the patch looks good to me.

Can you please add at least one testcase for c) and d) where we now
vectorize something after the patch but not before?

Thanks,
Richard.

> A simple example of the patch at work:
>
> int
> foo ()
> {
>   for (int i = 0; i < N ; i++)
>   {
>     int m = (a[i] & i) ? 5 : 4;
>     b[i] = a[i] * m;
>   }
> }
>
> compiled at -O3, -fdump-tree-ivcanon shows this immediately before
> tree-if-conversion:
>
> ...function entry, variables, etc...
>   <bb 2>:
>   _10 = a[0];
>   goto <bb 6>;
>
>   <bb 3>:
>   _5 = a[i_9];
>   _6 = _5 & i_9;
>   if (_6 != 0)
>     goto <bb 5>;
>   else
>     goto <bb 4>;
>
>   <bb 4>:
>
>   <bb 5>:
>   # m_14 = PHI <5(3), 4(4)>
>
>   <bb 6>:
>   # m_2 = PHI <m_14(5), 4(2)>
>   # _15 = PHI <_5(5), _10(2)>
>   # i_16 = PHI <i_9(5), 0(2)>
>   # ivtmp_13 = PHI <ivtmp_3(5), 32(2)>
>   _7 = m_2 * _15;
>   b[i_16] = _7;
>   i_9 = i_16 + 1;
>   ivtmp_3 = ivtmp_13 - 1;
>   if (ivtmp_3 != 0)
>     goto <bb 3>;
>   else
>     goto <bb 7>;
>
> which previously was not if-converted. With this patch:
>
>   <bb 2>:
>   _10 = a[0];
>   goto <bb 4>;
>
>   <bb 3>:
>
>   <bb 4>:
>   # m_2 = PHI <m_14(3), 4(2)>
>   # _15 = PHI <_5(3), _10(2)>
>   # i_16 = PHI <i_9(3), 0(2)>
>   # ivtmp_13 = PHI <ivtmp_3(3), 32(2)>
>   _7 = m_2 * _15;
>   b[i_16] = _7;
>   i_9 = i_16 + 1;
>   ivtmp_3 = ivtmp_13 - 1;
>   _5 = a[i_9];
>   _6 = _5 & i_9;
>   m_14 = _6 != 0 ? 5 : 4;
>   if (ivtmp_3 != 0)
>     goto <bb 3>;
>   else
>     goto <bb 5>;
>
>   <bb 5>:
>   return;
>
> (Unfortunately the vectorizer still doesn't handle this loop either, but
> that's another issue/patch...)
>
> Bootstrapped + check-gcc on x86_64-unknown-linux-gnu and
> aarch64-none-linux-gnu.
> Cross-tested check-gcc on aarch64-none-elf.
> I'm investigating impact on benchmarks - on AArch64 Spec2k6, this touches a
> number of object files, leading to an overall slight decrease in the number
> of instructions, but no change that looks significant (specifically, no more
> or less vectorization).
>
> Is this OK for trunk?
>
> Cheers, Alan
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
  2015-04-30 10:26 ` Richard Biener
@ 2015-04-30 10:45   ` Alan Lawrence
  2015-04-30 11:07     ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Lawrence @ 2015-04-30 10:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, law

Richard Biener wrote:
> On Tue, Apr 28, 2015 at 3:55 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> Tree if-conversion currently bails out for loops that (a) contain nested
>> loops; (b) have more than one exit; (c) where the exit block (source of the
>> exit edge) does not dominate the loop latch; (d) where the exit block is the
>> loop header, or there are statements after the exit.
>>
>> This patch removes restrictions (c) and (d). The intuition is that, for (c),
>> "if (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if (P&&Q)
>> break;" and this is mostly handled by existing code for propagating
>> conditions. For (d), "if (P) break; stmts" is equivalent to "if (!P) stmts;
>> if (P) break;" - this requires inserting the predicated stmts before the
>> branch rather than after.
> 
> Hum - so you empty the latch by conditionalizing code on the exit condition?

Well, !(exit condition), but yes.

> So you still restrict loop form to two blocks - just the latch may now be
> non-empty?  Thus I'd say keeping the existing check but amending it by
> && bb != loop->latch would be better.

The idea was to try to end up with a loop with exactly two blocks, a main block 
with a condition at the end, and an empty latch; but to convert more 
bad-loop-form loops into this form.

> Otherwise the patch looks good to me.
> 
> Can you please add at least one testcase for c) and d) where we now
> vectorize something after the patch but not before?

So I think I have made an inconsistency, by changing the logic for 
dominance-of-latch to postdominance-of-header, in one place but not another 
(where it deals with conditional stores) - but I haven't managed to tickle that yet.

However, I'm struggling to find a case where this patch enables vectorization; 
the fancy if-converted loops tend to have other problems preventing 
vectorization, e.g. location of PHI nodes. In contrast, your suggestion of 
putting in another loop-header-copying pass, enables both if-conversion (with 
the existing tree-if-conv.c) and vectorization of a bunch of things (including 
the example I posted of this patch if-converting but still not vectorizing). So 
(short of massive changes to the vectorizer) that approach now feels more 
promising, although there are a good bunch of scan-tree-dump test failures that 
I need to look into...

Where (something like) this patch might be useful, could be as a first step to 
handling loops with multiple exits, that is, changing

for (;;)
{
  S1;
  if (P) break;
  S2;
  if (Q) break;
  S3;
}

into the equivalent of

for (;;)
{
   S1;
   if (!P) S2;
   if (!P && !Q) S3;
   if (P || Q) break;
}

But that's more work, and another patch, and I'm not yet clear how many loops of 
that form the vectorizer would do anything with anyway (let alone profitably!)...

Cheers, Alan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Remove some restrictions on loop shape in tree-if-conv.c
  2015-04-30 10:45   ` Alan Lawrence
@ 2015-04-30 11:07     ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2015-04-30 11:07 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, law

On Thu, Apr 30, 2015 at 12:34 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Richard Biener wrote:
>>
>> On Tue, Apr 28, 2015 at 3:55 PM, Alan Lawrence <alan.lawrence@arm.com>
>> wrote:
>>>
>>> Tree if-conversion currently bails out for loops that (a) contain nested
>>> loops; (b) have more than one exit; (c) where the exit block (source of
>>> the
>>> exit edge) does not dominate the loop latch; (d) where the exit block is
>>> the
>>> loop header, or there are statements after the exit.
>>>
>>> This patch removes restrictions (c) and (d). The intuition is that, for
>>> (c),
>>> "if (P) {... if (Q) break;}" is equivalent to "if (P) {...}; if (P&&Q)
>>> break;" and this is mostly handled by existing code for propagating
>>> conditions. For (d), "if (P) break; stmts" is equivalent to "if (!P)
>>> stmts;
>>> if (P) break;" - this requires inserting the predicated stmts before the
>>> branch rather than after.
>>
>>
>> Hum - so you empty the latch by conditionalizing code on the exit
>> condition?
>
>
> Well, !(exit condition), but yes.
>
>> So you still restrict loop form to two blocks - just the latch may now be
>> non-empty?  Thus I'd say keeping the existing check but amending it by
>> && bb != loop->latch would be better.
>
>
> The idea was to try to end up with a loop with exactly two blocks, a main
> block with a condition at the end, and an empty latch; but to convert more
> bad-loop-form loops into this form.
>
>> Otherwise the patch looks good to me.
>>
>> Can you please add at least one testcase for c) and d) where we now
>> vectorize something after the patch but not before?
>
>
> So I think I have made an inconsistency, by changing the logic for
> dominance-of-latch to postdominance-of-header, in one place but not another
> (where it deals with conditional stores) - but I haven't managed to tickle
> that yet.
>
> However, I'm struggling to find a case where this patch enables
> vectorization; the fancy if-converted loops tend to have other problems
> preventing vectorization, e.g. location of PHI nodes. In contrast, your
> suggestion of putting in another loop-header-copying pass, enables both
> if-conversion (with the existing tree-if-conv.c) and vectorization of a
> bunch of things (including the example I posted of this patch if-converting
> but still not vectorizing). So (short of massive changes to the vectorizer)
> that approach now feels more promising, although there are a good bunch of
> scan-tree-dump test failures that I need to look into...

Heh.  I would have said the loop header copying could be done by the
vectorizer itself when it detects the non-empty latch simply call a factored
function from loop header copying that forcefully duplicates the header
(well, or simply inline the short relevant portion).

We'd eventually want to undo this if vectorization fails but that's possibly
a secondary concern.

Another possibility is to piggy-back that onto the if-conversion pass and
force the use of versioning for that.

Moving the CH pass is sth entirely different and I'd rather not do that.

Richard.


> Where (something like) this patch might be useful, could be as a first step
> to handling loops with multiple exits, that is, changing
>
> for (;;)
> {
>  S1;
>  if (P) break;
>  S2;
>  if (Q) break;
>  S3;
> }
>
> into the equivalent of
>
> for (;;)
> {
>   S1;
>   if (!P) S2;
>   if (!P && !Q) S3;
>   if (P || Q) break;
> }
>
> But that's more work, and another patch, and I'm not yet clear how many
> loops of that form the vectorizer would do anything with anyway (let alone
> profitably!)...
>
> Cheers, Alan
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-04-30 10:45 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-28 13:57 [PATCH] Remove some restrictions on loop shape in tree-if-conv.c Alan Lawrence
2015-04-28 14:22 ` Alan Lawrence
2015-04-29  5:01 ` Jeff Law
2015-04-29  9:27   ` Alan Lawrence
2015-04-30 10:26 ` Richard Biener
2015-04-30 10:45   ` Alan Lawrence
2015-04-30 11:07     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).