public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
@ 2014-10-13 10:00 Yuri Rumyantsev
From: Yuri Rumyantsev @ 2014-10-13 10:00 UTC
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 22557 bytes --]

Richard,

Here is the updated patch (part 1) for extended if-conversion.

The second part of the patch will be sent later.

ChangeLog:

2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (cgraph.h): Add include file to detect function clone.
(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(add_to_predicate_list): Check unconditionally whether bb is always
executed and exit early if so. Use predicate of cd-equivalent block
for join blocks if it exists.
(add_to_dst_predicate_list): Invoke add_to_predicate_list if
destination block of edge is not always executed. Set up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE is set.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb to have more than two predecessors if
FLAG_FORCE_VECTORIZE is set. Call all_edges_are_critical to reject
if-conversion of a block whose incoming edges are all critical, but
only if FLAG_FORCE_VECTORIZE is not set.
(predicate_bbs): Skip loop exit block also. If fold_build2 produces
a bool conversion, recompute predicate using build2_loc. Zero the
edge 'aux' field under FLAG_FORCE_VECTORIZE.
(if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
FLAG_FORCE_VECTORIZE is set, to calculate cd-equivalent bb's.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If the phi block has exactly two
predecessors and at least one incoming edge is not critical, the
original algorithm is used.
(get_predicate_for_edge): New function.
(find_insertion_point): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_all_scalar_phis): Introduce new variable BEFORE.
Invoke find_insertion_point to initialize gsi, and invoke
predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
that extended predication must be applied).
(insert_gimplified_predicates): Add test for non-predicated basic
blocks that there are no gimplified statements to insert. Insert
predicates at the block beginning for extended if-conversion.
(tree_if_conversion): Initialize flag_force_vectorize from current
loop or outer loop (to support pragma omp declare). Do loop versioning
for innermost loop marked with pragma omp simd when
FLAG_TREE_LOOP_IF_CONVERT is not set. Nullify 'aux' field of edges
for blocks with two successors.
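
To illustrate the cd-equivalence entry for add_to_predicate_list above, here is a minimal standalone sketch. It is not code from the patch: it assumes plain booleans in place of GCC predicate trees, and all function names are hypothetical. It checks that a join block reached under p1 & p2 and under p1 & !p2 can simply take p1 as its predicate.

```c
#include <assert.h>
#include <stdbool.h>

/* Predicate a join block would get by OR-ing the predicates of its two
   incoming paths, (p1 && p2) and (p1 && !p2).  */
bool
join_predicate_by_disjunction (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Predicate copied from the control-dependence-equivalent block: the
   join block runs exactly when the block guarded by p1 runs.  */
bool
join_predicate_by_cd_equivalence (bool p1, bool p2)
{
  (void) p2;  /* p2 no longer matters once both arms rejoin.  */
  return p1;
}

/* Checks exhaustively that both formulations agree.  */
void
check_cd_equivalence (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      assert (join_predicate_by_disjunction (p1, p2)
              == join_predicate_by_cd_equivalence (p1, p2));
}
```

Calling check_cd_equivalence () walks all four input combinations; the shorter predicate is what keeps the gimplified condition small.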




2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> here is the reduced patch (part 1), which is now almost half its previous size.
> Let me also answer your comments.
>
> 1. I do use the edge field 'aux' to keep the predicate for critical edges.
> My previous code was not correct; it now looks like:
>
>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>     /* Edge E is not critical,  use predicate of edge source bb. */
>     c = bb_predicate (b);
>   else
>     /* Edge E is critical and its aux field contains predicate.  */
>     c = edge_predicate (e);
>
> 2. I completely deleted all code related to the creation of conditional
> expressions and now rely entirely on bool pattern recognition in the
> vectorizer. But we need to delete all dead predicate computations
> which are not used, since they prevent vectorization. I will add this
> local-dce function in the next patch.
> 3. I also did not include in this patch the recognition of general
> phi-nodes with only two arguments, for which the conversion of
> conditional scalar reduction can also be applied.
> Note that all these changes apply only to loops marked with pragma
> omp simd.
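
A minimal sketch of the predicate selection described in point 1, assuming a toy CFG. The types struct toy_bb and struct toy_edge and both functions are hypothetical stand-ins for basic_block, edge, bb_predicate and edge_predicate, and predicates are strings here rather than trees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's basic_block.  */
struct toy_bb
{
  int n_succs;            /* Number of outgoing edges.  */
  int n_preds;            /* Number of incoming edges.  */
  const char *predicate;  /* Predicate under which the block executes.  */
};

/* Hypothetical stand-in for GCC's edge.  */
struct toy_edge
{
  struct toy_bb *src;
  struct toy_bb *dest;
  const char *aux;        /* Predicate stored on a critical edge.  */
};

/* An edge is critical when its source has several successors and its
   destination has several predecessors.  */
int
toy_edge_critical_p (const struct toy_edge *e)
{
  return e->src->n_succs > 1 && e->dest->n_preds > 1;
}

/* Mirrors the selection quoted above: a non-critical edge takes the
   predicate of its source block, a critical edge carries its own.  */
const char *
toy_predicate_for_edge (const struct toy_edge *e)
{
  if (e->src->n_succs == 1 || e->dest->n_preds == 1)
    return e->src->predicate;
  assert (e->aux != NULL);
  return e->aux;
}
```

The assert plays the role of the gcc_assert in edge_predicate: a critical edge without a stored predicate indicates that predicate_bbs has not visited its source block yet.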
>
> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (convert_name_to_cmp): New function.
> (add_to_predicate_list): Check unconditionally that bb is always
> executed to early exit. Use predicate of cd-equivalent block
> for join blocks if it exists.
> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
> destination block of edge is not always executed. Set-up predicate
> for critical edge.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if FLAG_FORCE_VECTORIZE was set-up.
> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
> (if_convertible_stmt_p): Fix up pre-function comments.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Allow bb has more than two predecessors if
> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
> to reject block if-conversion with incoming critical edges only if
> FLAG_FORCE_VECTORIZE was not set-up.
> (predicate_bbs): Skip loop exit block also. Add check that if
> fold_build2 produces bool conversion, recompute predicate using
> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal 2 and at least one incoming edge is not critical original
> algorithm is used.
> (get_predicate_for_edge): New function.
> (find_insertion_point): New function.
> (predicate_arbitrary_scalar_phi): New function.
> (predicate_all_scalar_phis): Introduce new variable BEFORE.
> Invoke find_insertion_point to initialize gsi and
> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
> that extended predication must be applied).
> (insert_gimplified_predicates): Add test for non-predicated basic
> blocks that there are no gimplified statements to insert. Insert
> predicates at the block beginning for extended if-conversion.
> (tree_if_conversion): Initialize flag_force_vectorize from current
> loop or outer loop (to support pragma omp declare). Do loop versioning
> for innermost loop marked with pragma omp simd and
> FLAG_TREE_LOOP_IF_CONVERT was not set up. Nullify 'aux' field of edges
> for blocks with two successors.
>
>
>
>
> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard!
>>> Here is updated patch with the following changes:
>>>
>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>> negate_predicate was deleted.
>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>> be critical.
>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>> blocks to simplify it.
>>> 5. I decided not to design a pre-pass, since it would lead to generating
>>> a chain of cond expressions for phi-node if-conversion, whereas for a phi
>>> of the kind
>>>   x = PHI <1(2), 1(3), 2(4)>
>>> only one cond expression is required, and this is treated as a simple
>>> optimization for an arbitrary phi-function. More precisely,
>>> if a phi-function has only two different arguments and one of them has
>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>> arguments.
>>> For an arbitrary phi function a chain of cond expressions is produced.
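
The two strategies in point 5 can be sketched in standalone C. This is an illustration under stated assumptions, not the patch's implementation: PHI arguments are plain ints, each incoming edge has a boolean saying whether it was taken (exactly one is expected to be true), and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Evaluates a PHI with N arguments as a chain of selects.  ARGS[i] is
   the value arriving on incoming edge i and PRED[i] tells whether that
   edge was taken.  */
int
phi_as_cond_chain (const int *args, const bool *pred, int n)
{
  assert (n >= 1);
  int result = args[n - 1];          /* Last argument needs no test.  */
  for (int i = n - 2; i >= 0; i--)
    result = pred[i] ? args[i] : result;
  return result;
}

/* Returns the index of an argument that occurs exactly once when ARGS
   holds exactly two distinct values and one of them is unique;
   otherwise returns -1.  */
int
phi_single_occurrence_arg (const int *args, int n)
{
  int other_idx = -1;   /* First index holding a value != args[0].  */
  int n_first = 0, n_other = 0;

  for (int i = 0; i < n; i++)
    {
      if (args[i] == args[0])
        n_first++;
      else if (other_idx < 0 || args[i] == args[other_idx])
        {
          if (other_idx < 0)
            other_idx = i;
          n_other++;
        }
      else
        return -1;      /* A third distinct value.  */
    }
  if (other_idx < 0)
    return -1;          /* All arguments are equal.  */
  if (n_other == 1)
    return other_idx;
  if (n_first == 1)
    return 0;
  return -1;
}

/* Single-select form: one test on the edge of the unique argument.  */
int
phi_as_single_cond (const int *args, const bool *pred, int n)
{
  int u = phi_single_occurrence_arg (args, n);
  assert (u >= 0);
  int common = args[u == 0 ? 1 : 0];
  return pred[u] ? args[u] : common;
}
```

For x = PHI <1(2), 1(3), 2(4)> the unique argument is the one arriving from block 4, so a single select on that edge's predicate is enough; a PHI such as <1(2), 2(3), 3(4)> has no such argument and falls back to the chain.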
>>>
>>> Updated patch is attached.
>>>
>>> Any comments will be appreciated.
>>
>> The patch is still very big and does multiple things at once which makes
>> it hard to review.
>>
>> In addition to that it changes function signatures without updating
>> the function comments.  For example what is the convert_bool
>> argument doing to add_to_dst_predicate_list?  Why do we need
>> all this added logic?
>>
>> You duplicate operand_equal_for_phi_arg_p.
>>
>> I think the code handling PHIs with more than two operands but
>> only two unequal operands is useful generally, so that's an obvious
>> candidate for splitting out into a separate patch.
>>
>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>> +   which is not supported by vectorizer to int type through creating of
>> +   conditional expressions.  */
>>
>> Example?  The vectorizer has patterns for bool predicate computations.
>> This seems to be another feature that needs splitting out.
>>
>> The way you get around the critical edge parts looks awkward to me.
>> Please either do _all_ predicates as edge predicates or simply
>> split critical edges (of the respective loop body).
>>
>> I still think that a utility doing the same PHI arg merging by introducing
>> forwarder blocks would be nicer to have.
>>
>> I'd restructure the main tree_if_conversion function to apply these
>> CFG pre-transforms when we are going to version the loop
>> for if conversion (eventually transitioning to always doing that).
>>
>> So - please split up the patch.  It's way too big.
>>
>> Thanks,
>> Richard.
>>
>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>> (flag_force_vectorize): New variable.
>>> (edge_predicate): New function.
>>> (set_edge_predicate): New function.
>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>> (convert_name_to_cmp): New function.
>>> (get_type_for_cond): New function.
>>> (convert_bool_predicate): New function.
>>> (predicate_disjunction): New function.
>>> (predicate_conjunction): New function.
>>> (add_to_predicate_list): Add convert_bool argument.
>>> Use predicate of cd-equivalent block if convert_bool is true and
>>> such bb exists; save it in static variable for further possible use.
>>> Add call of predicate_disjunction if convert_bool argument is true.
>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>> Add early function exit if edge target block is always executed.
>>> Add call of predicate_conjunction if convert_bool argument is true.
>>> Pass convert_bool argument for add_to_predicate_list.
>>> Set-up predicate for critical edge if convert_bool is true.
>>> (equal_phi_args): New function.
>>> (phi_has_two_different_args): New function.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> if flag_force_vectorize was set-up.
>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>> (if_convertible_stmt_p): Allow calls of function clones if
>>> flag_force_vectorize was set-up.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>> to reject block if-conversion with incoming critical edges only if
>>> flag_force_vectorize was not set-up.
>>> (walk_cond_tree): New function.
>>> (vect_bool_pattern_is_applicable): New function.
>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>> comparison expressions of boolean type into conditional expressions
>>> with integral operands. If convert_bool argument was set-up and
>>> vect bool pattern can be applied perform the following transformation:
>>> (bool) x != 0  --> y = (int) x; x != 0;
>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>> was set-up, recompute predicate using build2_loc. Additional argument
>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>> add_to_predicate_list.
>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>> Call predicate_bbs with additional argument equal to false.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal 2 and at least one incoming edge is not critical original
>>> algorithm is used.
>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>> phi arguments must be evaluated through phi_has_two_different_args.
>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>> (get_predicate_for_edge): New function.
>>> (find_insertion_point): New function.
>>> (predicate_arbitrary_phi): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>> iterator for predication of extended scalar phi's for insertion.
>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>> blocks that there are no gimplified statements to insert. Insert
>>> predicates at the block beginning for extended if-conversion.
>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>> predication to build mask.
>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>> for innermost loop marked with pragma omp simd.
>>>
>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> We implemented additional support for pragma omp simd in part of
>>>>> extended if-conversion loops with such pragma. These extensions
>>>>> include:
>>>>>
>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>    loops behavior was not changed.
>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>    predecessors.
>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>
>>>> How is that so?  If the PHI is predicated then its result will be used
>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>
>>>> No?
>>>>
>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>> with some limitations:
>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>    arguments are different and one of them has the only occurrence,
>>>>> transformation to  single COND_EXPR can be done.
>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>    will be generated for it. In current design very simple check is used:
>>>>>    check starting from end that two edges correspondent to neighbor
>>>>> arguments have common predecessor which is used for further check
>>>>> with next edge.
>>>>>  These guarantee that phi predication will produce the correct result.
>>>>
>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>> inserting forwarder blocks.  Thus
>>>>
>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>
>>>> becomes
>>>>
>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>
>>>>   x = PHI <1(5), 2(4)>
>>>>
>>>> and
>>>>
>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>
>>>> becomes
>>>>
>>>>   bb 5:
>>>>   x' = PHI <1(2), 2(3)>
>>>>
>>>>   b = PHI<x'(5), 3(4)>
>>>>
>>>> which means that 3) has to work.  Note that we want this kind of
>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>> copies we need to insert on edges.
>>>>
>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>> And make 3) work properly if it doesn't already.
>>>>
>>>> It looks like you introduce a "negate predicate" to work around the
>>>> critical edge limitation?  Please instead change if-conversion to
>>>> work with edge predicates (as opposed to BB predicates).
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>
>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>> #pragma omp simd safelen(8)
>>>>>   for (i=0; i<512; i++)
>>>>>   {
>>>>>     float t = a[i];
>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>       if (c[i] != 0)
>>>>>         res += 1;
>>>>>   }
>>>>>   <bb 4>:
>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>   t_5 = a[i_16];
>>>>>   _6 = t_5 > 0.0;
>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>   _8 = _7 & _6;
>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>   _10 = &c[i_16];
>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>   res_1 = res_15 + _ifc__35;
>>>>>   i_11 = i_16 + 1;
>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>   if (ivtmp_14 != 0)
>>>>>     goto <bb 4>;
>>>>>
>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>
>>>>> gcc/ChangeLog
>>>>>
>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>> (bb_negate_predicate): New function.
>>>>> (set_bb_negate_predicate): New function.
>>>>> (bb_copy_predicate): New function.
>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>> (convert_name_to_cmp): New function.
>>>>> (get_type_for_cond): New function.
>>>>> (convert_bool_predicate): New function.
>>>>> (predicate_disjunction): New function.
>>>>> (predicate_conjunction): New function.
>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>> Add early function exit if edge target block is always executed.
>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>> (equal_phi_args): New function.
>>>>> (phi_has_two_different_args): New function.
>>>>> (phi_args_disjoint): New function.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>> in non-predicated basic blocks.
>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>> (all_edges_are_critical): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>> to reject block if-conversion with incoming critical edges only if
>>>>> flag_force_vectorize was not setup.
>>>>> (walk_cond_tree): New function.
>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>> comparison expressions of boolean type into conditional expressions
>>>>> with integral operands. If bool_conv argument is false or both
>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>> is used, otherwise the following code was added: check on applicable
>>>>> of vect-bool-pattern recognition and transformation of
>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>> compute predicates for both outgoing edges one of which is critical
>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>> but generated gimplified statements are stored in their destination
>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>> equal to false.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>> algorithm is used.
>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>> (get_predicate_for_edge): New function.
>>>>> (find_insertion_point): New function.
>>>>> (predicate_phi_disjoint_args): New function.
>>>>> (predicate_extended_scalar_phi): New function.
>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>> predicates at the block beginning for extended if-conversion.
>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>> predication to build mask.
>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>> (split_crit_edge): New function.
>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>> innermost loop marked with pragma omp simd.

[-- Attachment #2: patch.part-1 --]
[-- Type: application/octet-stream, Size: 21959 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
old mode 100644
new mode 100755
index 1f8ef03..f213506
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -120,6 +120,9 @@ along with GCC; see the file COPYING3.  If not see
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Copy of 'force_vectorize' field of loop.  */
+static bool flag_force_vectorize;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -149,6 +152,16 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns predicate for critical edge E.  */
+
+static inline tree
+edge_predicate (edge e)
+{
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (e->aux != NULL);
+  return (tree) e->aux;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -160,6 +173,16 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets predicate COND for critical edge E.  */
+
+static inline void
+set_edge_predicate (edge e, tree cond)
+{
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (cond != NULL_TREE);
+  e->aux = cond;
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -396,25 +419,51 @@ fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
 }
 
 /* Add condition NC to the predicate list of basic block BB.  LOOP is
-   the loop to be if-converted.  */
+   the loop to be if-converted. Use predicate of cd-equivalent block
+   if it exists for join bb.  */
 
 static inline void
 add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
 {
   tree bc, *tp;
+  basic_block dom_bb;
+  static basic_block join_bb = NULL;
 
   if (is_true_predicate (nc))
     return;
 
-  if (!is_predicated (bb))
+  /* If dominance tells us this basic block is always executed,
+     don't record any predicates for it.  */
+  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+    return;
+
+  /* If predicate has been already set up for given bb using cd-equivalent
+     block predicate, simply escape. Post-dominator tree was built under
+     flag_force_vectorize only.  */
+  if (flag_force_vectorize)
     {
-      /* If dominance tells us this basic block is always executed, don't
-	 record any predicates for it.  */
-      if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+      if (join_bb == bb)
 	return;
+      dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+      /* We use notion of cd equivalence to get simplier predicate for
+	 join block, e.g. if join block has 2 predecessors with predicates
+	 p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
+	 p1 & p2 | p1 & !p2.  */
+      if (dom_bb != loop->header
+	  && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
+	{
+	  gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
+	  bc = bb_predicate (dom_bb);
+	  gcc_assert (!is_true_predicate (bc));
+	  set_bb_predicate (bb, bc);
 
-      bc = nc;
+	  /* Save bb in join_bb to not handle it once more.  */
+	  join_bb = bb;
+	  return;
+	}
     }
+  if (!is_predicated (bb))
+    bc = nc;
   else
     {
       bc = bb_predicate (bb);
@@ -455,10 +504,15 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
+
+  /* If edge E is critical save predicate on it.  */
+  if (EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -482,7 +536,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the flag_force_vectorize is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
@@ -494,11 +550,18 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!flag_force_vectorize)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -728,7 +791,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -865,7 +928,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is builtins call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -912,6 +976,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 2 predecessors.
+   Returns false if at least one successor is not on critical edge
+   and true otherwise.  */
+
+static inline bool
+all_edges_are_critical (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -920,6 +1000,8 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction is not applicable for loops marked with simd pragma.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -932,9 +1014,13 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!flag_force_vectorize)
+	return false;
+    }
 
   if (exit_bb)
     {
@@ -971,18 +1057,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction is not valid for loops marked with
+     simd pragma.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!flag_force_vectorize && all_edges_are_critical (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1064,6 +1149,7 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
   return blocks;
 }
 
+
 /* Returns true when the analysis of the predicates for all the basic
    blocks in LOOP succeeded.
 
@@ -1096,9 +1182,10 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
 	  reset_bb_predicate (loop->latch);
 	  continue;
@@ -1108,27 +1195,41 @@ predicate_bbs (loop_p loop)
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
 	{
-	  tree c2;
+	  tree c, c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
-				    boolean_type_node,
-				    gimple_cond_lhs (stmt),
-				    gimple_cond_rhs (stmt));
-
-	  /* Add new condition into destination's predicate list.  */
-	  extract_true_false_edges_from_block (gimple_bb (stmt),
-					       &true_edge, &false_edge);
+	  tree lopnd = gimple_cond_lhs (stmt);
+	  enum tree_code code = gimple_cond_code (stmt);
+
+	  /* Compute predicates for true and false edges.  */
+	  c = fold_build2_loc (loc, code,
+			       boolean_type_node,
+			       lopnd,
+			       gimple_cond_rhs (stmt));
+	  /* Fold_build2 can produce bool conversion which is not
+             supported by vectorizer, so re-build it without folding.
+	     For example, such conversion is generated for sequence:
+		_Bool _7, _8, _9;
+		_7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
+		if (_9 != 0)  --> (bool)_9.  */
+
+	  if (CONVERT_EXPR_P (c)
+	      && TREE_CODE_CLASS (code) == tcc_comparison)
+	    c = build2_loc (loc, code, boolean_type_node,
+			    lopnd, gimple_cond_rhs (stmt));
+	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+			   unshare_expr (c));
 
+	  extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+	  if (flag_force_vectorize)
+	    true_edge->aux = false_edge->aux = NULL;
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
 				     unshare_expr (c));
 
 	  /* If C is false, then FALSE_EDGE is taken.  */
-	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
-			   unshare_expr (c));
-	  add_to_dst_predicate_list (loop, false_edge,
-				     unshare_expr (cond), c2);
+	  add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
+				     unshare_expr (c2));
 
 	  cond = NULL_TREE;
 	}
@@ -1176,6 +1277,8 @@ if_convertible_loop_p_1 (struct loop *loop,
     return false;
 
   calculate_dominance_info (CDI_DOMINATORS);
+  if (flag_force_vectorize)
+    calculate_dominance_info (CDI_POST_DOMINATORS);
 
   /* Allow statements that can be handled during if-conversion.  */
   ifc_bbs = get_loop_body_in_if_conv_order (loop);
@@ -1337,7 +1440,9 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
    replacement.  Return the true block whose phi arguments are
    selected when cond is true.  LOOP is the loop containing the
    if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
+   if-conversion.
+   Returns NULL if given phi node must be handled by means of extended
+   phi node predication.  */
 
 static basic_block
 find_phi_replacement_condition (basic_block bb, tree *cond,
@@ -1346,7 +1451,13 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   edge first_edge, second_edge;
   tree tmp_cond;
 
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
+  if (EDGE_COUNT (bb->preds) != 2
+      || all_edges_are_critical (bb))
+    {
+      gcc_assert (flag_force_vectorize);
+      return NULL;
+    }
+
   first_edge = EDGE_PRED (bb, 0);
   second_edge = EDGE_PRED (bb, 1);
 
@@ -1624,6 +1735,237 @@ predicate_scalar_phi (gimple phi, tree cond,
     }
 }
 
+/* Returns predicate of edge associated with argument of phi node.  */
+
+static tree
+get_predicate_for_edge (edge e)
+{
+  tree c;
+  basic_block b = e->src;
+
+  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
+    /* Edge E is not critical, use predicate of edge source bb.  */
+    c = bb_predicate (b);
+  else
+    /* Edge E is critical and its aux field contains predicate.  */
+    c = edge_predicate (e);
+  return c;
+}
+
+/* This is enhancement for predication of a phi node with arbitrary
+   number of arguments, i.e. for
+	x = phi (x_1, x_2, ..., x_k)
+   a chain of recurrent cond expressions will be produced.
+   For example,
+	bb_0
+	if (_5 != 0) goto bb_1 else goto bb_2
+	end_bb_0
+
+	bb_1
+	res_2 = some computations;
+	goto bb_5
+	end_bb_1
+
+	bb_2
+	if (_9 != 0) goto bb_3 else goto bb_4
+	end_bb_2
+
+	bb_3
+	res_3 = ...;
+	goto bb_5
+	end_bb_3
+
+	bb4
+	res_4 = ...;
+	end_bb_4
+
+	bb_5
+	# res_1 = PHI <res_2(1), res_3(3), res_4(4)>
+
+    will be if-converted into chain of unconditional assignments:
+	_ifc__42 = <PRD_3> ? res_3 : res_4;
+	res_1 = _5 != 0 ? res_2 : _ifc__42;
+
+    where <PRD_3> is predicate of <bb_3>.
+
+    All created intermediate statements are inserted at GSI point.  */
+
+static void
+predicate_arbitrary_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+				bool before)
+{
+  int i;
+  int num = (int) gimple_phi_num_args (phi);
+  tree last = gimple_phi_arg_def (phi, num - 1);
+  tree type = TREE_TYPE (gimple_phi_result (phi));
+  tree curr;
+  gimple stmt;
+  tree lhs;
+  tree rhs;
+  tree res;
+  tree cond;
+  bool swap = false;
+
+  res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+    return;
+
+  for (i = num - 2; i > 0; i--)
+    {
+      curr = gimple_phi_arg_def (phi, i);
+      lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+      cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, i));
+      swap = false;
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  cond = TREE_OPERAND (cond, 0);
+	  swap = true;
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      if (before)
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   true, GSI_SAME_STMT);
+      else
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   false, GSI_CONTINUE_LINKING);
+
+      stmt = gimple_build_assign_with_ops (COND_EXPR, lhs,
+					   unshare_expr (cond),
+					   swap? last : curr,
+					   swap? curr : last);
+
+      if (before)
+	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+      else
+	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+      update_stmt (stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create new assign stmt for phi arg#%d\n", i);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      last = lhs;
+    }
+  curr = gimple_phi_arg_def (phi, 0);
+  cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, 0));
+  swap = false;
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    {
+      cond = TREE_OPERAND (cond, 0);
+      swap = true;
+    }
+  if (before)
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, true,
+				       GSI_SAME_STMT);
+  else
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, false,
+				       GSI_CONTINUE_LINKING);
+  rhs = fold_build_cond_expr (type,
+			      unshare_expr (cond),
+			      swap? last : curr,
+			      swap? curr : last);
+  stmt = gimple_build_assign (res, rhs);
+  if (before)
+    gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+  update_stmt (stmt);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "new phi replacement stmt\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+    }
+}
+
+/* Returns gimple statement iterator to insert code for predicated phi.  */
+
+static gimple_stmt_iterator
+find_insertion_point (basic_block bb, bool* before)
+{
+  edge e;
+  edge_iterator ei;
+  tree cond;
+  gimple last = NULL;
+  gimple curr;
+  int num_opnd;
+  tree opnd1, opnd2;
+
+  /* Find the last statement in bb after which code for predicated phi can
+     be inserted using edge predicates.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == SSA_NAME)
+	{
+	  opnd1 = cond;
+	  opnd2 = NULL_TREE;
+	}
+      else if (TREE_CONSTANT (cond))
+	continue;
+      else if ((num_opnd = TREE_OPERAND_LENGTH (cond)) == 2)
+	{
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = TREE_OPERAND (cond, 1);
+	}
+      else
+	{
+	  gcc_assert (num_opnd == 1);
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = NULL_TREE;
+	}
+      /* Process each operand of cond to determine the latest definition.  */
+      while (true)
+	{
+	  if (TREE_CODE (opnd1) == SSA_NAME)
+	    {
+	      curr = SSA_NAME_DEF_STMT (opnd1);
+	      /* Skip definitions in other bb's.  */
+	      if (gimple_bb (curr) == bb)
+		{
+		  if (last == NULL)
+		    last = curr;
+		  else
+		    {
+		      /* Determine what stmt is latest in bb.  */
+		      gimple_stmt_iterator gsi;
+		      gimple stmt;
+		      for (gsi = gsi_last_bb (bb);
+			   !gsi_end_p (gsi);
+			    gsi_prev (&gsi))
+			if ((stmt = gsi_stmt (gsi)) == last)
+			  break;
+			else if (stmt == curr)
+			  {
+			    last = curr;
+			    break;
+			  }
+		    }
+		}
+	    }
+	    if (opnd2 != NULL_TREE)
+	      {
+		opnd1 = opnd2;
+		opnd2 = NULL_TREE;
+	      }
+	    else
+	      break;
+	}
+    }
+
+  if (last == NULL)
+    {
+      *before = true;
+      return gsi_after_labels (bb);
+    }
+  *before = false;
+  return gsi_for_stmt (last);
+}
+
 /* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
@@ -1633,6 +1975,7 @@ predicate_all_scalar_phis (struct loop *loop)
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
+  bool before = false;
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1653,11 +1996,17 @@ predicate_all_scalar_phis (struct loop *loop)
 	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
       true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+      if (!true_bb)
+	/* Will use extended predication, find out insertion point.  */
+	gsi = find_insertion_point (bb, &before);
 
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = gsi_stmt (phi_gsi);
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  if (true_bb)
+	    predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  else
+	    predicate_arbitrary_scalar_phi (phi, &gsi, before);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1673,13 +2022,12 @@ static void
 insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
 {
   unsigned int i;
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      if (!is_predicated (bb) && bb_predicate_gimplified_stmts (bb) == NULL)
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1692,7 +2040,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       if (stmts)
 	{
 	  if (flag_tree_loop_if_convert_stores
-	      || any_mask_load_store)
+	      || any_mask_load_store
+	      || flag_force_vectorize)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -1849,7 +2198,6 @@ predicate_mem_writes (loop_p loop)
 	  swap = true;
 	  cond = TREE_OPERAND (cond, 0);
 	}
-
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!gimple_assign_single_p (stmt = gsi_stmt (gsi)))
 	  continue;
@@ -2102,6 +2450,7 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2113,6 +2462,15 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  flag_force_vectorize = loop->force_vectorize;
+  /* Check whether the outer loop was marked with simd pragma.  */
+  if (!flag_force_vectorize)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	flag_force_vectorize = true;
+    }
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2122,7 +2480,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || (loop->force_vectorize && flag_tree_loop_if_convert != 1))
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2143,7 +2503,15 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	  if (EDGE_COUNT (bb->succs) == 2)
+	    {
+	      EDGE_SUCC (bb, 0)->aux = NULL;
+	      EDGE_SUCC (bb, 1)->aux = NULL;
+	    }
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;
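
As a scalar illustration of the COND_EXPR chain described in the comment
before predicate_arbitrary_scalar_phi, here is a minimal sketch in plain C.
It is not GCC code: select_phi_chain, args and preds are hypothetical names,
and it assumes that exactly one incoming-edge predicate holds on each
iteration. As in the patch, the chain is built from the last argument
backwards and the predicate of the last argument is never evaluated; the
TRUTH_NOT_EXPR operand swap is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of a k-argument PHI node.  ARGS[i] is the
   value arriving on incoming edge i and PREDS[i] is the predicate of
   that edge.  Assumption: exactly one predicate is true.  */
static int
select_phi_chain (const int *args, const bool *preds, size_t k)
{
  assert (k >= 1);

  /* Start from the last argument; its predicate is implied by all the
     others being false, so it is never tested.  */
  int last = args[k - 1];

  /* Walk arguments k-2 .. 0, wrapping the running value in one more
     conditional select per argument.  */
  for (size_t i = k - 1; i-- > 0; )
    last = preds[i] ? args[i] : last;

  return last;
}
```

For PHI <res_2(1), res_3(3), res_4(4)> this models
res_1 = p1 ? res_2 : (p3 ? res_3 : res_4), where p1 and p3 stand for the
predicates of the edges from bb_1 and bb_3.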


* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-13 10:00 [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd Yuri Rumyantsev
@ 2014-10-15 10:09 ` Richard Biener
  2014-10-16 15:52   ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-15 10:09 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Here is updated patch (part1) for extended if conversion.
>
> Second part of patch will be sent later.

Ok, I'm starting to look at this.  I'd still like you to split things up
more.

 static inline void
 add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
 {
...

+      /* We use notion of cd equivalence to get simpler predicate for
+        join block, e.g. if join block has 2 predecessors with predicates
+        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
+        p1 & p2 | p1 & !p2.  */
+      if (dom_bb != loop->header
+         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
+       {
+         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
+         bc = bb_predicate (dom_bb);
+         gcc_assert (!is_true_predicate (bc));

these changes look worthwhile even for !flag_force_vectorize.  So please
split the change to add_to_predicate_list out and compute post-dominators
unconditionally.  Note that you should call free_dominance_info
(CDI_POST_DOMINATORS) at the end of if-conversion.
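
The simplification named in the quoted comment is plain boolean algebra:
(p1 & p2) | (p1 & !p2) reduces to p1. A minimal check in C, where the
function names are hypothetical and only illustrate the identity, not the
GCC implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Predicate of the join block built as the disjunction of the
   predicates of its two predecessors.  */
static bool
join_predicate_naive (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Predicate taken from the cd-equivalent block instead.  */
static bool
join_predicate_simplified (bool p1)
{
  return p1;
}

/* Exhaustively compare both forms over all four input combinations.  */
static void
check_join_predicates (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      assert (join_predicate_naive (p1, p2)
	      == join_predicate_simplified (p1));
}
```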

+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
+
+  /* If edge E is critical save predicate on it.  */
+  if (EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);

how do we know the edge is critical by this simple check?  Why not
simply always save edge predicates (well, you kind of do but omit
the case where e->src dominates e->dest).
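
For reference, a minimal sketch of the usual definition: an edge is critical
only if its source has more than one successor and its destination has more
than one predecessor. The toy_* types and functions below are hypothetical
stand-ins for the CFG structures, not GCC's own; toy_dest_only_check mirrors
the quoted test, which looks at the destination alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down CFG types: only the edge counts matter.  */
struct toy_block
{
  unsigned n_preds;
  unsigned n_succs;
};

struct toy_edge
{
  const struct toy_block *src;
  const struct toy_block *dest;
};

/* Full criticality test: both ends of the edge must be shared.  */
static bool
toy_edge_critical_p (const struct toy_edge *e)
{
  assert (e && e->src && e->dest);
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* The test as written in the quoted hunk: destination side only.  */
static bool
toy_dest_only_check (const struct toy_edge *e)
{
  assert (e && e->dest);
  return e->dest->n_preds >= 2;
}
```

An edge from a block with a single successor into a join block passes the
destination-only test even though it is not critical.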

Btw, you can rely on edge->aux being NULL at the start of the
pass but need to clear it at the end (best use clear_aux_for_edges ()
for that).  So stuff like

+         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+         if (flag_force_vectorize)
+           true_edge->aux = false_edge->aux = NULL;

shouldn't be necessary.

I think the edge predicate handling should also be done unconditionally
and not depend on flag_force_vectorize.

+      /* The loop latch and loop exit block are always executed and
+        have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+         || bb_with_exit_edge_p (loop, bb))

I don't think the edge stuff is true - given you still only reset the
loop->latch bb predicate the change looks broken.

+         /* Fold_build2 can produce bool conversion which is not
+             supported by vectorizer, so re-build it without folding.
+            For example, such conversion is generated for sequence:
+               _Bool _7, _8, _9;
+               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
+               if (_9 != 0)  --> (bool)_9.  */
+
+         if (CONVERT_EXPR_P (c)
+             && TREE_CODE_CLASS (code) == tcc_comparison)

I think you should simply use canonicalize_cond_expr_cond on the
folding result.  Or rather _not_ fold at all - we are taking the
operands from the GIMPLE condition unmodified after all.

-         add_to_dst_predicate_list (loop, false_edge,
-                                    unshare_expr (cond), c2);
+         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
+                                    unshare_expr (c2));

why is it necessary to unshare c2?

Please split out the PHI-with-multi-arg handling  (I have not looked at
that in detail).

Thanks,
Richard.


> Changelog.
>
> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (add_to_predicate_list): Check unconditionally that bb is always
> executed to early exit. Use predicate of cd-equivalent block
> for join blocks if it exists.
> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
> destination block of edge is not always executed. Set-up predicate
> for critical edge.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if FLAG_FORCE_VECTORIZE was set-up.
> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
> (if_convertible_stmt_p): Fix up pre-function comments.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Allow bb has more than two predecessors if
> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
> to reject block if-conversion with incoming critical edges only if
> FLAG_FORCE_VECTORIZE was not set-up.
> (predicate_bbs): Skip loop exit block also. Add check that if
> fold_build2 produces bool conversion, recompute predicate using
> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal to 2 and at least one incoming edge is not critical, original
> algorithm is used.
> (get_predicate_for_edge): New function.
> (find_insertion_point): New function.
> (predicate_arbitrary_scalar_phi): New function.
> (predicate_all_scalar_phis): Introduce new variable BEFORE.
> Invoke find_insertion_point to initialize gsi and
> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
> that extended predication must be applied).
> (insert_gimplified_predicates): Add test for non-predicated basic
> blocks that there are no gimplified statements to insert. Insert
> predicates at the block beginning for extended if-conversion.
> (tree_if_conversion): Initialize flag_force_vectorize from current
> loop or outer loop (to support pragma omp declare). Do loop versioning
> for innermost loop marked with pragma omp simd and
> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
> for blocks with two successors.
>
>
>
>
> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Richard,
>>
>> Here is the reduced patch (part 1), which is now almost half the size.
>> Let me also answer your comments.
>>
>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>> My previous code was not correct and now it looks like:
>>
>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>     c = bb_predicate (b);
>>   else
>>     /* Edge E is critical and its aux field contains predicate.  */
>>     c = edge_predicate (e);
>>
>> 2. I completely delete all code related to creation of conditional
>> expressions and completely rely on bool pattern recognition in
>> vectorizer. But we need to delete all dead predicate computations
>> which are not used since they prevent vectorization. I will add this
>> local-dce function in next patch.
>> 3. I also did not include in this patch recognition of general
>> phi-nodes with two arguments only for which conversion of conditional
>> scalar reduction can be applied also.
>> Note that all these changes are applied for loop marked with pragma
>> omp simd only.
>>
>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (convert_name_to_cmp): New function.
>> (add_to_predicate_list): Check unconditionally that bb is always
>> executed to early exit. Use predicate of cd-equivalent block
>> for join blocks if it exists.
>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>> destination block of edge is not always executed. Set-up predicate
>> for critical edge.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if FLAG_FORCE_VECTORIZE was set-up.
>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>> (if_convertible_stmt_p): Fix up pre-function comments.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>> to reject block if-conversion with incoming critical edges only if
>> FLAG_FORCE_VECTORIZE was not set-up.
>> (predicate_bbs): Skip loop exit block also. Add check that if
>> fold_build2 produces bool conversion, recompute predicate using
>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal to 2 and at least one incoming edge is not critical, original
>> algorithm is used.
>> (get_predicate_for_edge): New function.
>> (find_insertion_point): New function.
>> (predicate_arbitrary_scalar_phi): New function.
>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>> Invoke find_insertion_point to initialize gsi and
>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>> that extended predication must be applied).
>> (insert_gimplified_predicates): Add test for non-predicated basic
>> blocks that there are no gimplified statements to insert. Insert
>> predicates at the block beginning for extended if-conversion.
>> (tree_if_conversion): Initialize flag_force_vectorize from current
>> loop or outer loop (to support pragma omp declare). Do loop versioning
>> for innermost loop marked with pragma omp simd and
>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>> for blocks with two successors.
>>
>>
>>
>>
>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard!
>>>> Here is updated patch with the following changes:
>>>>
>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>> negate_predicate was deleted.
>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>> be critical.
>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>> blocks to simplify it.
>>>> 5. I decided to not design pre-pass since it will lead generating
>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>> of kind
>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>> only one cond expression is required and this is considered as simple
>>>> optimization for arbitrary phi-function. More precisely,
>>>> if a phi-function has only two different arguments and one of them has
>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>> arguments.
>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>
>>>> Updated patch is attached.
>>>>
>>>> Any comments will be appreciated.
>>>
>>> The patch is still very big and does multiple things at once which makes
>>> it hard to review.
>>>
>>> In addition to that it changes function signatures without updating
>>> the function comments.  For example what is the convert_bool
>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>> all this added logic?
>>>
>>> You duplicate operand_equal_for_phi_arg_p.
>>>
>>> I think the code handling PHIs with more than two operands but
>>> only two unequal operands is useful generally, so that's an obvious
>>> candidate for splitting out into a separate patch.
>>>
>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>> +   which is not supported by vectorizer to int type through creating of
>>> +   conditional expressions.  */
>>>
>>> Example?  The vectorizer has patterns for bool predicate computations.
>>> This seems to be another feature that needs splitting out.
>>>
>>> The way you get around the critical edge parts looks awkward to me.
>>> Please either do _all_ predicates as edge predicates or simply
>>> split critical edges (of the respective loop body).
>>>
>>> I still think that an utility doing same PHI arg merging by introducing
>>> forwarder blocks would be nicer to have.
>>>
>>> I'd restructure the main tree_if_conversion function to apply these
>>> CFG pre-transforms when we are going to version the loop
>>> for if conversion (eventually transitioning to always doing that).
>>>
>>> So - please split up the patch.  It's way too big.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>> (flag_force_vectorize): New variable.
>>>> (edge_predicate): New function.
>>>> (set_edge_predicate): New function.
>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>> (convert_name_to_cmp): New function.
>>>> (get_type_for_cond): New function.
>>>> (convert_bool_predicate): New function.
>>>> (predicate_disjunction): New function.
>>>> (predicate_conjunction): New function.
>>>> (add_to_predicate_list): Add convert_bool argument.
>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>> such bb exists; save it in static variable for further possible use.
>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>> Add early function exit if edge target block is always executed.
>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>> Pass convert_bool argument for add_to_predicate_list.
>>>> Set-up predicate for critical edge if convert_bool is true.
>>>> (equal_phi_args): New function.
>>>> (phi_has_two_different_args): New function.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> if flag_force_vectorize was set-up.
>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>> flag_force_vectorize was set-up.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>> to reject block if-conversion with incoming critical edges only if
>>>> flag_force_vectorize was not set-up.
>>>> (walk_cond_tree): New function.
>>>> (vect_bool_pattern_is_applicable): New function.
>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>> comparison expressions of boolean type into conditional expressions
>>>> with integral operands. If convert_bool argument was set-up and
>>>> vect bool pattern can be applied perform the following transformation:
>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>> add_to_predicate_list.
>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>> Call predicate_bbs with additional argument equal to false.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>> algorithm is used.
>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>> (get_predicate_for_edge): New function.
>>>> (find_insertion_point): New function.
>>>> (predicate_arbitrary_phi): New function.
>>>> (predicate_extended_scalar_phi): New function.
>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>> iterator for predication of extended scalar phi's for insertion.
>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>> blocks that there are no gimplified statements to insert. Insert
>>>> predicates at the block beginning for extended if-conversion.
>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>> predication to build mask.
>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>> for innermost loop marked with pragma omp simd.
>>>>
>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>> include:
>>>>>>
>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>    loops behavior was not changed.
>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>    predecessors.
>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>
>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>
>>>>> No?
>>>>>
>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>> with some limitations:
>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>    arguments are different and one of them has the only occurrence,
>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>> arguments have common predecessor which is used for further check
>>>>>> with next edge.
>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>
>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>> inserting forwarder blocks.  Thus
>>>>>
>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>
>>>>> becomes
>>>>>
>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>
>>>>>   x = PHI <1(5), 2(4)>
>>>>>
>>>>> and
>>>>>
>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>
>>>>> becomes
>>>>>
>>>>>   bb 5:
>>>>>   x' = PHI <1(2), 2(3)>
>>>>>
>>>>>   b = PHI<x'(5), 3(4)>
>>>>>
>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>> copies we need to insert on edges.
>>>>>
>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>> And make 3) work properly if it doesn't already.
>>>>>
>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>> #pragma omp simd safelen(8)
>>>>>>   for (i=0; i<512; i++)
>>>>>>   {
>>>>>>     float t = a[i];
>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>       if (c[i] != 0)
>>>>>>         res += 1;
>>>>>>   }
>>>>>>   <bb 4>:
>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>   t_5 = a[i_16];
>>>>>>   _6 = t_5 > 0.0;
>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>   _8 = _7 & _6;
>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>   _10 = &c[i_16];
>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>   i_11 = i_16 + 1;
>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>   if (ivtmp_14 != 0)
>>>>>>     goto <bb 4>;
>>>>>>
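As a scalar illustration of what the dump above computes, here is a minimal sketch in plain C. It is not taken from the patch: the function names are hypothetical, and the conditional read of c[i] only models the effect of MASK_LOAD rather than reproducing the GIMPLE.

```c
#include <stddef.h>

/* Original control flow: two nested conditions guard the increment.  */
int
count_branchy (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      if ((t > 0) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* If-converted form: the conditions become a mask, c[i] is read only
   when the mask is set (modelling MASK_LOAD), and the increment is
   selected instead of branched over.  */
int
count_predicated (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      int mask = (t > 0) & (t < 1.0e+17f);
      int ci = mask ? c[i] : 0;
      res += (mask & (ci != 0)) ? 1 : 0;
    }
  return res;
}
```

Both functions should return the same count for any input; the second has a single straight-line loop body, which is the shape the vectorizer needs.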
>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>
>>>>>> gcc/ChangeLog
>>>>>>
>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>> (bb_negate_predicate): New function.
>>>>>> (set_bb_negate_predicate): New function.
>>>>>> (bb_copy_predicate): New function.
>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>> (convert_name_to_cmp): New function.
>>>>>> (get_type_for_cond): New function.
>>>>>> (convert_bool_predicate): New function.
>>>>>> (predicate_disjunction): New function.
>>>>>> (predicate_conjunction): New function.
>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>> Add early function exit if edge target block is always executed.
>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>> (equal_phi_args): New function.
>>>>>> (phi_has_two_different_args): New function.
>>>>>> (phi_args_disjoint): New function.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>> in non-predicated basic blocks.
>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> flag_force_vectorize was not setup.
>>>>>> (walk_cond_tree): New function.
>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>> with integral operands. If convert_bool argument is false or both
>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>> is used, otherwise the following code was added: check on applicability
>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>> but generated gimplified statements are stored in their destination
>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>> equal to false.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>> algorithm is used.
>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>> predication to build mask.
>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>> (split_crit_edge): New function.
>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>> innermost loop marked with pragma omp simd.


* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-15 10:09 ` Richard Biener
@ 2014-10-16 15:52   ` Yuri Rumyantsev
  2014-10-17  9:11     ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-16 15:52 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 28737 bytes --]

Richard,

Here is the reduced patch, as you requested. All your remarks have been
addressed. Could you please take a look at it? (I have already sent the
patch with the changes to add_to_predicate_list for review.)

Thanks.
Yuri.
ChangeLog
2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
if destination block of edge is not always executed. Set up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE is set.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb to have more than two predecessors if
FLAG_FORCE_VECTORIZE is set. Use all_edges_are_critical to reject
if-conversion of a block whose incoming edges are all critical, unless
FLAG_FORCE_VECTORIZE is set.
(predicate_bbs): Skip loop exit block also. Invoke build2_loc
to compute predicate instead of fold_build2_loc.
Add zeroing of edge 'aux' field.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If the number of predecessors of the
phi-block equals 2 and at least one incoming edge is not critical, the
original algorithm is used.
(tree_if_conversion): Temporarily set FLAG_FORCE_VECTORIZE to false.
Nullify 'aux' field of edges for blocks with two successors.



2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Here is updated patch (part1) for extended if conversion.
>>
>> Second part of patch will be sent later.
>
> Ok, I'm starting to look at this.  I'd still like you to split things up
> more.
>
>  static inline void
>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>  {
> ...
>
> +      /* We use notion of cd equivalence to get simplier predicate for
> +        join block, e.g. if join block has 2 predecessors with predicates
> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
> +        p1 & p2 | p1 & !p2.  */
> +      if (dom_bb != loop->header
> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
> +       {
> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
> +         bc = bb_predicate (dom_bb);
> +         gcc_assert (!is_true_predicate (bc));
>
> these changes look worthwhile even for !flag_force_vectorize.  So please
> split the change to add_to_predicate_list out and compute post-dominators
> unconditionally.  Note that you should call free_dominance_info
> (CDI_POST_DOMINATORS) at the end of if-conversion.
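The simplification described in the quoted comment rests on the identity (p1 & p2) | (p1 & !p2) == p1. A minimal sketch that checks it exhaustively follows; the function names are hypothetical and nothing here is taken from tree-if-conv.c.

```c
#include <stdbool.h>

/* Join-block predicate built naively as the OR of the two incoming
   edge predicates p1 & p2 and p1 & !p2.  */
bool
join_predicate_naive (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Join-block predicate taken from the control-dependence-equivalent
   block, whose predicate is just p1.  */
bool
join_predicate_cd_equiv (bool p1, bool p2)
{
  (void) p2;  /* The inner condition no longer matters at the join.  */
  return p1;
}

/* Return true if both forms agree on all four input combinations.  */
bool
predicates_agree (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      if (join_predicate_naive (p1, p2) != join_predicate_cd_equiv (p1, p2))
        return false;
  return true;
}
```

Reusing the shorter predicate keeps dead boolean computations out of the loop body.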
>
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
> +    add_to_predicate_list (loop, e->dest, cond);
> +
> +  /* If edge E is critical save predicate on it.  */
> +  if (EDGE_COUNT (e->dest->preds) >= 2)
> +    set_edge_predicate (e, cond);
>
> how do we know the edge is critical by this simple check?  Why not
> simply always save edge predicates (well, you kind of do but omit
> the case where e->src dominates e->dest).
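For reference, an edge is usually called critical when its source has more than one successor and its destination has more than one predecessor; GCC's EDGE_CRITICAL_P macro tests both conditions. The sketch below uses toy structures (hypothetical, not GCC's basic_block and edge types) to show why counting only the destination's predecessors is a weaker test.

```c
#include <stdbool.h>

/* Toy stand-ins for a basic block and an edge; only the counts
   needed for the check are modelled.  */
struct toy_bb
{
  unsigned n_preds;
  unsigned n_succs;
};

struct toy_edge
{
  const struct toy_bb *src;
  const struct toy_bb *dest;
};

/* Critical: source has several successors AND destination has
   several predecessors.  */
bool
toy_edge_critical_p (const struct toy_edge *e)
{
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* The check in the quoted hunk looks only at the destination.  */
bool
toy_dest_has_multiple_preds_p (const struct toy_edge *e)
{
  return e->dest->n_preds >= 2;
}
```

A fall-through edge from a single-successor block into a join block passes the second test but is not critical.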
>
> Btw, you can rely on edge->aux being NULL at the start of the
> pass but need to clear it at the end (best use clear_aux_for_edges ()
> for that).  So stuff like
>
> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
> +         if (flag_force_vectorize)
> +           true_edge->aux = false_edge->aux = NULL;
>
> shouldn't be necessary.
>
> I think the edge predicate handling should also be unconditional
> and not depend on flag_force_vectorize.
>
> +      /* The loop latch and loop exit block are always executed and
> +        have no extra conditions to be processed: skip them.  */
> +      if (bb == loop->latch
> +         || bb_with_exit_edge_p (loop, bb))
>
> I don't think the edge stuff is true - given you still only reset the
> loop->latch bb predicate the change looks broken.
>
> +         /* Fold_build2 can produce bool conversion which is not
> +             supported by vectorizer, so re-build it without folding.
> +            For example, such conversion is generated for sequence:
> +               _Bool _7, _8, _9;
> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
> +               if (_9 != 0)  --> (bool)_9.  */
> +
> +         if (CONVERT_EXPR_P (c)
> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>
> I think you should simply use canonicalize_cond_expr_cond on the
> folding result.  Or rather _not_ fold at all - we are taking the
> operands from the GIMPLE condition unmodified after all.
>
> -         add_to_dst_predicate_list (loop, false_edge,
> -                                    unshare_expr (cond), c2);
> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
> +                                    unshare_expr (c2));
>
> why is it necessary to unshare c2?
>
> Please split out the PHI-with-multi-arg handling  (I have not looked at
> that in detail).
>
> Thanks,
> Richard.
>
>
>> Changelog.
>>
>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (add_to_predicate_list): Check unconditionally that bb is always
>> executed to early exit. Use predicate of cd-equivalent block
>> for join blocks if it exists.
>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>> destination block of edge is not always executed. Set-up predicate
>> for critical edge.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if FLAG_FORCE_VECTORIZE was set-up.
>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>> (if_convertible_stmt_p): Fix up pre-function comments.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>> to reject block if-conversion with incoming critical edges only if
>> FLAG_FORCE_VECTORIZE was not set-up.
>> (predicate_bbs): Skip loop exit block also. Add check that if
>> fold_build2 produces bool conversion, recompute predicate using
>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal to 2 and at least one incoming edge is not critical, original
>> algorithm is used.
>> (get_predicate_for_edge): New function.
>> (find_insertion_point): New function.
>> (predicate_arbitrary_scalar_phi): New function.
>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>> Invoke find_insertion_point to initialize gsi and
>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>> that extended predication must be applied).
>> (insert_gimplified_predicates): Add test for non-predicated basic
>> blocks that there are no gimplified statements to insert. Insert
>> predicates at the block beginning for extended if-conversion.
>> (tree_if_conversion): Initialize flag_force_vectorize from current
>> loop or outer loop (to support pragma omp declare). Do loop versioning
>> for innermost loop marked with pragma omp simd and
>> FLAG_TREE_LOOP_IF_CONVERT was not set. Nullify 'aux' field of edges
>> for blocks with two successors.
>>
>>
>>
>>
>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>> Richard,
>>>
>>> here is the reduced patch (part 1), which is now almost half its previous size.
>>> Let me also answer your comments.
>>>
>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>> My previous code was not correct and now it looks like:
>>>
>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>     c = bb_predicate (b);
>>>   else
>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>     c = edge_predicate (e);
>>>
>>> 2. I completely delete all code related to creation of conditional
>>> expressions and completely rely on bool pattern recognition in
>>> vectorizer. But we need to delete all dead predicate computations
>>> which are not used since they prevent vectorization. I will add this
>>> local-dce function in next patch.
>>> 3. I also did not include in this patch recognition of general
>>> phi-nodes with two arguments only for which conversion of conditional
>>> scalar reduction can be applied also.
>>> Note that all these changes are applied for loop marked with pragma
>>> omp simd only.
>>>
>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>> (flag_force_vectorize): New variable.
>>> (edge_predicate): New function.
>>> (set_edge_predicate): New function.
>>> (convert_name_to_cmp): New function.
>>> (add_to_predicate_list): Check unconditionally that bb is always
>>> executed to early exit. Use predicate of cd-equivalent block
>>> for join blocks if it exists.
>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>> destination block of edge is not always executed. Set-up predicate
>>> for critical edge.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> if FLAG_FORCE_VECTORIZE was set-up.
>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>> to reject block if-conversion with incoming critical edges only if
>>> FLAG_FORCE_VECTORIZE was not set-up.
>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>> fold_build2 produces bool conversion, recompute predicate using
>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal to 2 and at least one incoming edge is not critical, original
>>> algorithm is used.
>>> (get_predicate_for_edge): New function.
>>> (find_insertion_point): New function.
>>> (predicate_arbitrary_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>> Invoke find_insertion_point to initialize gsi and
>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>> that extended predication must be applied).
>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>> blocks that there are no gimplified statements to insert. Insert
>>> predicates at the block beginning for extended if-conversion.
>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>> for innermost loop marked with pragma omp simd and
>>> FLAG_TREE_LOOP_IF_CONVERT was not set. Nullify 'aux' field of edges
>>> for blocks with two successors.
>>>
>>>
>>>
>>>
>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard!
>>>>> Here is updated patch with the following changes:
>>>>>
>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>> negate_predicate was deleted.
>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>> be critical.
>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>> blocks to simplify it.
>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>> of kind
>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>> only one cond expression is required and this is considered as simple
>>>>> optimization for arbitrary phi-function. More precisely,
>>>>> if phi-function has only two different arguments and one of them has
>>>>> single occurrence, if-conversion is performed as if phi has only 2
>>>>> arguments.
>>>>> For arbitrary phi function a chain of cond expressions is produced.
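The two cases in item 5 can be sketched as scalar selects. This is only an illustration under stated assumptions: the function names and the boolean parameters standing for incoming edge predicates are hypothetical, and the edge predicates are assumed to be disjoint with exactly one of them true.

```c
#include <stdbool.h>

/* x = PHI <1(2), 1(3), 2(4)>: only two distinct values and the value 2
   occurs once, so a single select on the predicate of the edge from
   bb 4 (p4) is enough.  */
int
phi_two_values (bool p4)
{
  return p4 ? 2 : 1;
}

/* x = PHI <1(2), 2(3), 3(4)>: three distinct values, so a chain of
   selects is needed.  p2 and p3 are the predicates of the edges from
   bb 2 and bb 3; the edge from bb 4 is the remaining case.  */
int
phi_chain (bool p2, bool p3)
{
  return p2 ? 1 : (p3 ? 2 : 3);
}
```

The first form corresponds to one COND_EXPR, the second to a chain of two.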
>>>>>
>>>>> Updated patch is attached.
>>>>>
>>>>> Any comments will be appreciated.
>>>>
>>>> The patch is still very big and does multiple things at once which makes
>>>> it hard to review.
>>>>
>>>> In addition to that it changes function signatures without updating
>>>> the function comments.  For example what is the convert_bool
>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>> all this added logic?
>>>>
>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>
>>>> I think the code handling PHIs with more than two operands but
>>>> only two unequal operands is useful generally, so that's an obvious
>>>> candidate for splitting out into a separate patch.
>>>>
>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>> +   which is not supported by vectorizer to int type through creating of
>>>> +   conditional expressions.  */
>>>>
>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>> This seems to be another feature that needs splitting out.
>>>>
>>>> The way you get around the critical edge parts looks awkward to me.
>>>> Please either do _all_ predicates as edge predicates or simply
>>>> split critical edges (of the respective loop body).
>>>>
>>>> I still think that an utility doing same PHI arg merging by introducing
>>>> forwarder blocks would be nicer to have.
>>>>
>>>> I'd restructure the main tree_if_conversion function to apply these
>>>> CFG pre-transforms when we are going to version the loop
>>>> for if conversion (eventually transitioning to always doing that).
>>>>
>>>> So - please split up the patch.  It's way too big.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>> (flag_force_vectorize): New variable.
>>>>> (edge_predicate): New function.
>>>>> (set_edge_predicate): New function.
>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>> (convert_name_to_cmp): New function.
>>>>> (get_type_for_cond): New function.
>>>>> (convert_bool_predicate): New function.
>>>>> (predicate_disjunction): New function.
>>>>> (predicate_conjunction): New function.
>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>> such bb exists; save it in static variable for further possible use.
>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>> Add early function exit if edge target block is always executed.
>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>> (equal_phi_args): New function.
>>>>> (phi_has_two_different_args): New function.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> if flag_force_vectorize was set-up.
>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>> flag_force_vectorize was set-up.
>>>>> (all_edges_are_critical): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>> to reject block if-conversion with incoming critical edges only if
>>>>> flag_force_vectorize was not set-up.
>>>>> (walk_cond_tree): New function.
>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>> comparison expressions of boolean type into conditional expressions
>>>>> with integral operands. If convert_bool argument was set-up and
>>>>> vect bool pattern can be applied, perform the following transformation:
>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>> add_to_predicate_list.
>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>> Call predicate_bbs with additional argument equal to false.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>> algorithm is used.
>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>> (get_predicate_for_edge): New function.
>>>>> (find_insertion_point): New function.
>>>>> (predicate_arbitrary_phi): New function.
>>>>> (predicate_extended_scalar_phi): New function.
>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>> predicates at the block beginning for extended if-conversion.
>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>> predication to build mask.
>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>> for innermost loop marked with pragma omp simd.
>>>>>
>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>> include:
>>>>>>>
>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>    loops behavior was not changed.
>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>    predecessors.
>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>
>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>
>>>>>> No?
>>>>>>
>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>> with some limitations:
>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>    arguments are different and one of them occurs only once,
>>>>>>>    transformation to a single COND_EXPR can be done.
>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>    corresponding to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>    check starting from end that two edges corresponding to neighbor
>>>>>>>    arguments have common predecessor which is used for further check
>>>>>>>    with next edge.
>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>
>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>> inserting forwarder blocks.  Thus
>>>>>>
>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>
>>>>>> becomes
>>>>>>
>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>
>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>
>>>>>> and
>>>>>>
>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>
>>>>>> becomes
>>>>>>
>>>>>>   bb 5:
>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>
>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>
>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>> copies we need to insert on edges.
>>>>>>
>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>> And make 3) work properly if it doesn't already.
>>>>>>
>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>> #pragma omp simd safelen(8)
>>>>>>>   for (i=0; i<512; i++)
>>>>>>>   {
>>>>>>>     float t = a[i];
>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>       if (c[i] != 0)
>>>>>>>         res += 1;
>>>>>>>   }
>>>>>>>   <bb 4>:
>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>   t_5 = a[i_16];
>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>   _8 = _7 & _6;
>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>   _10 = &c[i_16];
>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>   i_11 = i_16 + 1;
>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>     goto <bb 4>;
>>>>>>>
>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>
>>>>>>> gcc/ChangeLog
>>>>>>>
>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>> (bb_negate_predicate): New function.
>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>> (bb_copy_predicate): New function.
>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>> (convert_name_to_cmp): New function.
>>>>>>> (get_type_for_cond): New function.
>>>>>>> (convert_bool_predicate): New function.
>>>>>>> (predicate_disjunction): New function.
>>>>>>> (predicate_conjunction): New function.
>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>> (equal_phi_args): New function.
>>>>>>> (phi_has_two_different_args): New function.
>>>>>>> (phi_args_disjoint): New function.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>> in non-predicated basic blocks.
>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>> (all_edges_are_critical): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>> flag_force_vectorize was not setup.
>>>>>>> (walk_cond_tree): New function.
>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>> equal to false.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (find_insertion_point): New function.
>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>> predication to build mask.
>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>> (split_crit_edge): New function.
>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>> innermost loop marked with pragma omp simd.

[-- Attachment #2: if-conv.patch2 --]
[-- Type: application/octet-stream, Size: 10585 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 3453292..081e379
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -120,6 +120,9 @@ along with GCC; see the file COPYING3.  If not see
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Copy of 'force_vectorize' field of loop.  */
+static bool flag_force_vectorize;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -149,6 +152,17 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns predicate for critical edge E.  */
+
+static inline tree
+edge_predicate (edge e)
+{
+  gcc_assert (EDGE_COUNT (e->src->succs) >= 2);
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (e->aux != NULL);
+  return (tree) e->aux;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -160,6 +174,16 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets predicate COND for critical edge E.
+   Assumes that #(E->src->succs) >=2 & #(E->dest->preds) >= 2.  */
+
+static inline void
+set_edge_predicate (edge e, tree cond)
+{
+  gcc_assert (cond != NULL_TREE);
+  e->aux = cond;
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -481,10 +505,16 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
+
+  /* If edge E is critical save predicate on it.
+     Assume that #(e->src->succs) >= 2.  */
+  if (EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -508,7 +538,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the flag_force_vectorize is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
@@ -520,11 +552,18 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!flag_force_vectorize)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -754,7 +793,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -891,7 +930,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is builtins call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -938,6 +978,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 2 predecessors.
+   Returns false if at least one successor is not on critical edge
+   and true otherwise.  */
+
+static inline bool
+all_edges_are_critical (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -946,6 +1002,8 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction is not applicable for loops marked with simd pragma.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -958,9 +1016,13 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!flag_force_vectorize)
+	return false;
+    }
 
   if (exit_bb)
     {
@@ -997,18 +1059,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction is not valid for loops marked with
+     simd pragma.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!flag_force_vectorize && all_edges_are_critical (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1090,6 +1151,7 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
   return blocks;
 }
 
+
 /* Returns true when the analysis of the predicates for all the basic
    blocks in LOOP succeeded.
 
@@ -1122,11 +1184,12 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
-	  reset_bb_predicate (loop->latch);
+	  reset_bb_predicate (bb);
 	  continue;
 	}
 
@@ -1134,25 +1197,27 @@ predicate_bbs (loop_p loop)
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
 	{
-	  tree c2;
+	  tree c, c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
-				    boolean_type_node,
-				    gimple_cond_lhs (stmt),
-				    gimple_cond_rhs (stmt));
-
-	  /* Add new condition into destination's predicate list.  */
-	  extract_true_false_edges_from_block (gimple_bb (stmt),
-					       &true_edge, &false_edge);
+	  tree lopnd = gimple_cond_lhs (stmt);
+	  enum tree_code code = gimple_cond_code (stmt);
+
+	  /* Compute predicates for true and false edges.  */
+	  c = build2_loc (loc, code,
+			  boolean_type_node,
+			  lopnd,
+			  gimple_cond_rhs (stmt));
+	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+			   unshare_expr (c));
 
+	  extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+	  true_edge->aux = false_edge->aux = NULL;
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
 				     unshare_expr (c));
 
 	  /* If C is false, then FALSE_EDGE is taken.  */
-	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
-			   unshare_expr (c));
 	  add_to_dst_predicate_list (loop, false_edge,
 				     unshare_expr (cond), c2);
 
@@ -1364,7 +1429,9 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
    replacement.  Return the true block whose phi arguments are
    selected when cond is true.  LOOP is the loop containing the
    if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
+   if-conversion.
+   Returns NULL if given phi node must be handled by means of extended
+   phi node predication.  */
 
 static basic_block
 find_phi_replacement_condition (basic_block bb, tree *cond,
@@ -1373,7 +1440,13 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   edge first_edge, second_edge;
   tree tmp_cond;
 
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
+  if (EDGE_COUNT (bb->preds) != 2
+      || all_edges_are_critical (bb))
+    {
+      gcc_assert (flag_force_vectorize);
+      return NULL;
+    }
+
   first_edge = EDGE_PRED (bb, 0);
   second_edge = EDGE_PRED (bb, 1);
 
@@ -2140,6 +2213,9 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  /* Temporary set up this flag to false.  */
+  flag_force_vectorize = false;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2149,7 +2225,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || (loop->force_vectorize && flag_tree_loop_if_convert != 1))
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2170,7 +2248,15 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	  if (EDGE_COUNT (bb->succs) == 2)
+	    {
+	      EDGE_SUCC (bb, 0)->aux = NULL;
+	      EDGE_SUCC (bb, 1)->aux = NULL;
+	    }
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-16 15:52   ` Yuri Rumyantsev
@ 2014-10-17  9:11     ` Richard Biener
  2014-10-17 14:15       ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-17  9:11 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Here is the reduced patch, as you requested. All your remarks have been
> addressed. Could you please look at it? (I have already sent the patch with
> the changes to add_to_predicate_list for review.)

+             if (dump_file && (dump_flags & TDF_DETAILS))
+               fprintf (dump_file, "More than two phi node args.\n");
+             return false;
+           }
+
+        }

Excess vertical space.


+/* Assumes that BB has more than 2 predecessors.

More than 1 predecessor?

+   Returns false if at least one successor is not on critical edge
+   and true otherwise.  */
+
+static inline bool
+all_edges_are_critical (basic_block bb)
+{

"all_preds_critical_p" would be a better name
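
To make the check concrete, here is a minimal standalone sketch on a toy CFG. It is not GCC code: struct toy_bb, toy_add_edge and the other toy_ names are assumptions invented for illustration, and an edge is taken to be critical when its source has several successors and its destination has several predecessors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a basic block; not GCC's basic_block.  */
#define TOY_MAX_PREDS 4

struct toy_bb
{
  int n_succs;                          /* Number of outgoing edges.  */
  int n_preds;                          /* Number of incoming edges.  */
  struct toy_bb *preds[TOY_MAX_PREDS];  /* Sources of incoming edges.  */
};

/* Record an edge SRC -> DEST in the toy CFG.  */
static void
toy_add_edge (struct toy_bb *src, struct toy_bb *dest)
{
  assert (dest->n_preds < TOY_MAX_PREDS);
  src->n_succs++;
  dest->preds[dest->n_preds++] = src;
}

/* An edge is critical when its source has two or more successors
   and its destination has two or more predecessors.  */
static bool
toy_edge_critical_p (const struct toy_bb *src, const struct toy_bb *dest)
{
  return src->n_succs >= 2 && dest->n_preds >= 2;
}

/* Return true if every incoming edge of BB is critical, and false as
   soon as one non-critical incoming edge is found.  */
static bool
toy_all_preds_critical_p (const struct toy_bb *bb)
{
  for (int i = 0; i < bb->n_preds; i++)
    if (!toy_edge_critical_p (bb->preds[i], bb))
      return false;
  return true;
}
```

In this model a join block reached from one branching block and one fall-through block reports false, while a join block reached only from branching blocks reports true.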

+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!flag_force_vectorize)
+       return false;
+    }

As I said in the last review, I don't think we should restrict edge
predicates to flag_force_vectorize.  At least I can't see how
if-conversion is magically more expensive for that case.

So please rework the patch so critical edges are always handled
correctly.
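
One possible reading of this request, sketched outside GCC: compute a predicate for every outgoing edge in the same way, whether or not the edge is critical and independent of any force-vectorize flag. The toy_block struct and both toy_ functions below are hypothetical illustrations under that assumption, not code from tree-if-conv.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy block, evaluated for a single loop iteration.  */
struct toy_block
{
  bool predicate;  /* True when the block executes.  */
  bool cond;       /* Outcome of the block's final condition, if any.  */
  int n_succs;     /* 1 for a fall-through block, 2 for a branch.  */
};

/* Predicate of the edge leaving SRC on the arm taken when the
   condition equals ARM.  The computation does not depend on whether
   the edge is critical.  */
static bool
toy_edge_predicate (const struct toy_block *src, bool arm)
{
  assert (src->n_succs == 1 || src->n_succs == 2);
  if (src->n_succs == 1)
    return src->predicate;
  return src->predicate && (src->cond == arm);
}

/* Predicate of a join block: the disjunction of the predicates of
   its N incoming edges.  */
static bool
toy_join_predicate (const bool *edge_preds, int n)
{
  bool p = false;
  for (int i = 0; i < n; i++)
    p = p || edge_preds[i];
  return p;
}
```

With a branching header whose false arm falls through a middle block into the join, and whose true arm enters the join directly (a critical edge), the join predicate in this model equals the header predicate for either outcome of the condition.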

Ok with that and the above suggested changes.

Thanks,
Richard.


> Thanks.
> Yuri.
> ChangeLog
> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
> if destination block of edge is not always executed. Set-up predicate
> for critical edge.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if FLAG_FORCE_VECTORIZE was set-up.
> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
> (if_convertible_stmt_p): Fix up pre-function comments.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Allow bb has more than two predecessors if
> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
> to reject block if-conversion with incoming critical edges only if
> FLAG_FORCE_VECTORIZE was not set-up.
> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
> to compute predicate instead of fold_build2_loc.
> Add zeroing of edge 'aux' field.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal 2 and at least one incoming edge is not critical original
> algorithm is used.
> (tree_if_conversion): Temporarily set FLAG_FORCE_VECTORIZE to false.
> Nullify 'aux' field of edges for blocks with two successors.
>
>
>
> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is updated patch (part1) for extended if conversion.
>>>
>>> Second part of patch will be sent later.
>>
>> Ok, I'm starting to look at this.  I'd still like you to split things up
>> more.
>>
>>  static inline void
>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>  {
>> ...
>>
>> +      /* We use notion of cd equivalence to get simplier predicate for
>> +        join block, e.g. if join block has 2 predecessors with predicates
>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>> +        p1 & p2 | p1 & !p2.  */
>> +      if (dom_bb != loop->header
>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>> +       {
>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>> +         bc = bb_predicate (dom_bb);
>> +         gcc_assert (!is_true_predicate (bc));
>>
>> these changes look worthwhile even for !flag_force_vectorize.  So please
>> split the change to add_to_predicate_list out and compute post-dominators
>> unconditionally.  Note that you should call free_dominance_info
>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>
>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>> +    add_to_predicate_list (loop, e->dest, cond);
>> +
>> +  /* If edge E is critical save predicate on it.  */
>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>> +    set_edge_predicate (e, cond);
>>
>> how do we know the edge is critical by this simple check?  Why not
>> simply always save edge predicates (well, you kind of do but omit
>> the case where e->src dominates e->dest).
>>
>> Btw, you can rely on edge->aux being NULL at the start of the
>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>> for that).  So stuff like
>>
>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>> +         if (flag_force_vectorize)
>> +           true_edge->aux = false_edge->aux = NULL;
>>
>> shouldn't be necessary.
>>
>> I think the edge predicate handling should also be unconditionally
>> and not depend on flag_force_vectorize.
>>
>> +      /* The loop latch and loop exit block are always executed and
>> +        have no extra conditions to be processed: skip them.  */
>> +      if (bb == loop->latch
>> +         || bb_with_exit_edge_p (loop, bb))
>>
>> I don't think the edge stuff is true - given you still only reset the
>> loop->latch bb predicate the change looks broken.
>>
>> +         /* Fold_build2 can produce bool conversion which is not
>> +             supported by vectorizer, so re-build it without folding.
>> +            For example, such conversion is generated for sequence:
>> +               _Bool _7, _8, _9;
>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>> +               if (_9 != 0)  --> (bool)_9.  */
>> +
>> +         if (CONVERT_EXPR_P (c)
>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>
>> I think you should simply use canonicalize_cond_expr_cond on the
>> folding result.  Or rather _not_ fold at all - we are taking the
>> operands from the GIMPLE condition unmodified after all.
>>
>> -         add_to_dst_predicate_list (loop, false_edge,
>> -                                    unshare_expr (cond), c2);
>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>> +                                    unshare_expr (c2));
>>
>> why is it necessary to unshare c2?
>>
>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>> that in detail).
>>
>> Thanks,
>> Richard.
>>
>>
>>> Changelog.
>>>
>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>> (flag_force_vectorize): New variable.
>>> (edge_predicate): New function.
>>> (set_edge_predicate): New function.
>>> (add_to_predicate_list): Check unconditionally that bb is always
>>> executed to early exit. Use predicate of cd-equivalent block
>>> for join blocks if it exists.
>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>> destination block of edge is not always executed. Set-up predicate
>>> for critical edge.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> if FLAG_FORCE_VECTORIZE was set-up.
>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>> to reject block if-conversion with incoming critical edges only if
>>> FLAG_FORCE_VECTORIZE was not set-up.
>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>> fold_build2 produces bool conversion, recompute predicate using
>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal 2 and at least one incoming edge is not critical original
>>> algorithm is used.
>>> (get_predicate_for_edge): New function.
>>> (find_insertion_point): New function.
>>> (predicate_arbitrary_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>> Invoke find_insertion_point to initialize gsi and
>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>> that extended predication must be applied).
>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>> blocks that there are no gimplified statements to insert. Insert
>>> predicates at the block beginning for extended if-conversion.
>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>> for innermost loop marked with pragma omp simd and
>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>> for blocks with two successors.
>>>
>>>
>>>
>>>
>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>> Richard,
>>>>
>>>> here is the reduced patch (part 1), which is now almost half its previous size.
>>>> Let me also answer your comments.
>>>>
>>>> 1. I do use the edge field 'aux' to keep the predicate for critical edges.
>>>> My previous code was not correct, and it now looks like this:
>>>>
>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>     c = bb_predicate (b);
>>>>   else
>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>     c = edge_predicate (e);
>>>>
>>>> 2. I completely deleted all code related to the creation of conditional
>>>> expressions and now rely entirely on bool pattern recognition in the
>>>> vectorizer. But we need to delete all dead predicate computations,
>>>> since unused ones prevent vectorization. I will add this
>>>> local-dce function in the next patch.
>>>> 3. I also did not include in this patch the recognition of general
>>>> phi-nodes with two arguments only, for which conversion of conditional
>>>> scalar reduction can also be applied.
>>>> Note that all these changes are applied to loops marked with pragma
>>>> omp simd only.
>>>>
>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>> (flag_force_vectorize): New variable.
>>>> (edge_predicate): New function.
>>>> (set_edge_predicate): New function.
>>>> (convert_name_to_cmp): New function.
>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>> executed to early exit. Use predicate of cd-equivalent block
>>>> for join blocks if it exists.
>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>> destination block of edge is not always executed. Set-up predicate
>>>> for critical edge.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>> to reject block if-conversion with incoming critical edges only if
>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>> fold_build2 produces bool conversion, recompute predicate using
>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal 2 and at least one incoming edge is not critical original
>>>> algorithm is used.
>>>> (get_predicate_for_edge): New function.
>>>> (find_insertion_point): New function.
>>>> (predicate_arbitrary_scalar_phi): New function.
>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>> Invoke find_insertion_point to initialize gsi and
>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>> that extended predication must be applied).
>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>> blocks that there are no gimplified statements to insert. Insert
>>>> predicates at the block beginning for extended if-conversion.
>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>> for innermost loop marked with pragma omp simd and
>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>> for blocks with two successors.
>>>>
>>>>
>>>>
>>>>
>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard!
>>>>>> Here is the updated patch with the following changes:
>>>>>>
>>>>>> 1. All restrictions on phi-functions were eliminated for extended conversion.
>>>>>> 2. The predicate for a critical edge is stored in the 'aux' field of the edge,
>>>>>> i.e. negate_predicate was deleted.
>>>>>> 3. Splitting of critical edges was deleted, i.e. both outgoing edges can
>>>>>> be critical.
>>>>>> 4. The notion of cd-equivalence is used to set up a simpler predicate for
>>>>>> join basic blocks.
>>>>>> 5. I decided not to design a pre-pass, since it would lead to generating a
>>>>>> chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>> of the kind
>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>> only one cond expression is required, and this is treated as a simple
>>>>>> optimization for an arbitrary phi-function. More precisely,
>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>> arguments.
>>>>>> For an arbitrary phi-function a chain of cond expressions is produced.
>>>>>>
>>>>>> Updated patch is attached.
>>>>>>
>>>>>> Any comments will be appreciated.
>>>>>
>>>>> The patch is still very big and does multiple things at once which makes
>>>>> it hard to review.
>>>>>
>>>>> In addition to that it changes function signatures without updating
>>>>> the function comments.  For example what is the convert_bool
>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>> all this added logic?
>>>>>
>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>
>>>>> I think the code handling PHIs with more than two operands but
>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>> candidate for splitting out into a separate patch.
>>>>>
>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>> +   conditional expressions.  */
>>>>>
>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>> This seems to be another feature that needs splitting out.
>>>>>
>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>> split critical edges (of the respective loop body).
>>>>>
>>>>> I still think that a utility doing same PHI arg merging by introducing
>>>>> forwarder blocks would be nicer to have.
>>>>>
>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>> CFG pre-transforms when we are going to version the loop
>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>
>>>>> So - please split up the patch.  It's way too big.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>> (convert_name_to_cmp): New function.
>>>>>> (get_type_for_cond): New function.
>>>>>> (convert_bool_predicate): New function.
>>>>>> (predicate_disjunction): New function.
>>>>>> (predicate_conjunction): New function.
>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>> Add early function exit if edge target block is always executed.
>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>> (equal_phi_args): New function.
>>>>>> (phi_has_two_different_args): New function.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if flag_force_vectorize was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>> flag_force_vectorize was set-up.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> flag_force_vectorize was not set-up.
>>>>>> (walk_cond_tree): New function.
>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>> vect bool pattern can be applied, perform the following transformation:
>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>> add_to_predicate_list.
>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>> algorithm is used.
>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_arbitrary_phi): New function.
>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>> predication to build mask.
>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>
>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>> include:
>>>>>>>>
>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>    loops behavior was not changed.
>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>    predecessors.
>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>
>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>
>>>>>>> No?
>>>>>>>
>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>> with some limitations:
>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>    arguments are different and one of them has the only occurrence,
>>>>>>>> transformation to a single COND_EXPR can be done.
>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>> with next edge.
>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>
>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>
>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>
>>>>>>> becomes
>>>>>>>
>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>
>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>
>>>>>>> becomes
>>>>>>>
>>>>>>>   bb 5:
>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>
>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>
>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>> copies we need to insert on edges.
>>>>>>>
>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>
>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>   {
>>>>>>>>     float t = a[i];
>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>       if (c[i] != 0)
>>>>>>>> res += 1;
>>>>>>>>   }
>>>>>>>>   <bb 4>:
>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>   t_5 = a[i_16];
>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>   _8 = _7 & _6;
>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>   _10 = &c[i_16];
>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>     goto <bb 4>;
>>>>>>>>
>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>
>>>>>>>> gcc/ChangeLog
>>>>>>>>
>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>> (get_type_for_cond): New function.
>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>> (predicate_disjunction): New function.
>>>>>>>> (predicate_conjunction): New function.
>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>> (equal_phi_args): New function.
>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>> in non-predicated basic blocks.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>> (walk_cond_tree): New function.
>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>> equal to false.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>> algorithm is used.
>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (find_insertion_point): New function.
>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>> predication to build mask.
>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>> (split_crit_edge): New function.
>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>> innermost loop marked with pragma omp simd.
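
For readers following the thread: the two PHI shapes discussed in the
quoted text above, x = PHI <1(2), 1(3), 2(4)> (two distinct values, one
of them occurring once) and x = PHI <1(2), 2(3), 3(4)> (all values
distinct), can be modeled outside of GCC roughly as follows. This is
only a sketch under stated assumptions: the toy_* names and the
array-based PHI model are hypothetical and are not taken from
tree-if-conv.c, and exactly one incoming edge predicate is assumed to
be true at run time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PHI node: ARGS[i] is the value arriving over
   incoming edge i and PREDS[i] tells whether that edge was taken.
   None of this is GCC code.  */

/* If ARGS holds exactly two distinct values and one of them occurs
   exactly once, return the index of that argument; otherwise -1.  */
static int
toy_single_occurrence_arg (const int *args, size_t n)
{
  size_t first_other = n;  /* first index whose value differs from args[0] */
  size_t n_other = 0;      /* number of arguments differing from args[0] */

  for (size_t i = 1; i < n; i++)
    if (args[i] != args[0])
      {
        if (first_other == n)
          first_other = i;
        else if (args[i] != args[first_other])
          return -1;       /* a third distinct value was found */
        n_other++;
      }

  if (n_other == 1)
    return (int) first_other;
  if (n >= 2 && n_other == n - 1)
    return 0;              /* args[0] is the value that occurs once */
  return -1;
}

/* Two distinct values, K from toy_single_occurrence_arg: one select,
   the analogue of a single COND_EXPR.  */
static int
toy_phi_as_single_select (const int *args, const bool *preds, int k)
{
  int other = args[k == 0 ? 1 : 0];
  return preds[k] ? args[k] : other;
}

/* General case (N >= 1, disjoint predicates): a chain of selects,
   with the last argument as the fall-through value.  */
static int
toy_phi_as_select_chain (const int *args, const bool *preds, size_t n)
{
  int result = args[n - 1];
  for (size_t i = n - 1; i-- > 0;)
    result = preds[i] ? args[i] : result;
  return result;
}
```

The single-select form needs only the predicate of the edge carrying
the odd value, which is why it is cheaper than the chain that the
general case requires.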

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-17  9:11     ` Richard Biener
@ 2014-10-17 14:15       ` Yuri Rumyantsev
  2014-10-20  8:02         ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-17 14:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 32973 bytes --]

Richard,

I reworked the patch as you proposed, but I didn't understand what
you meant by:

>So please rework the patch so critical edges are always handled
>correctly.

In current patch flag_force_vectorize is used (1) to reject phi nodes
with more than 2 arguments; (2) to reject basic blocks with only
critical incoming edges since support for extended predication of phi
nodes will be in next patch.

Could you please clarify your statement?
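
To make the question concrete, the notion of a critical edge used in
this discussion (source block with several successors, destination
block with several predecessors) can be sketched on a toy CFG as
below. The toy_* types and functions are hypothetical illustrations
and are not the GCC basic_block/edge structures or the functions in
the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CFG types; these are NOT GCC's basic_block/edge.  */
struct toy_block
{
  size_t n_preds;   /* number of incoming edges */
  size_t n_succs;   /* number of outgoing edges */
};

struct toy_edge
{
  const struct toy_block *src;
  const struct toy_block *dest;
};

/* An edge is critical when its source has more than one successor and
   its destination has more than one predecessor.  */
static bool
toy_edge_critical_p (const struct toy_edge *e)
{
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* Return true if every one of the N_EDGES incoming edges in PREDS is
   critical, false as soon as one non-critical edge is found.  */
static bool
toy_all_preds_critical_p (const struct toy_edge *preds, size_t n_edges)
{
  for (size_t i = 0; i < n_edges; i++)
    if (!toy_edge_critical_p (&preds[i]))
      return false;
  return true;
}
```

Note that checking only the predecessor count of the destination is
not enough in this model: the successor count of the source has to be
tested as well.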

I attached modified patch.

ChangeLog:

2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>

(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
if destination block of edge is not always executed. Set-up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Use call of all_preds_critical_p
to reject block if-conversion with incoming critical edges only if
FLAG_FORCE_VECTORIZE was not set-up.
(predicate_bbs): Skip loop exit block also. Invoke build2_loc
to compute predicate instead of fold_build2_loc.
Add zeroing of edge 'aux' field.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal to 2 and at least one incoming edge is not critical, the original
algorithm is used.
(tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
Nullify 'aux' field of edges for blocks with two successors.




2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Here is reduced patch as you requested. All your remarks have been fixed.
>> Could you please look at it (I have already sent the patch with
>> changes in add_to_predicate_list for review).
>
> +             if (dump_file && (dump_flags & TDF_DETAILS))
> +               fprintf (dump_file, "More than two phi node args.\n");
> +             return false;
> +           }
> +
> +        }
>
> Excess vertical space.
>
>
> +/* Assumes that BB has more than 2 predecessors.
>
> More than 1 predecessor?
>
> +   Returns false if at least one successor is not on critical edge
> +   and true otherwise.  */
> +
> +static inline bool
> +all_edges_are_critical (basic_block bb)
> +{
>
> "all_preds_critical_p" would be a better name
>
> +  if (EDGE_COUNT (bb->preds) > 2)
> +    {
> +      if (!flag_force_vectorize)
> +       return false;
> +    }
>
> as I said in the last review I don't think we should restrict edge
> predicates to flag_force_vectorize.  At least I can't see how
> if-conversion is magically more expensive for that case?
>
> So please rework the patch so critical edges are always handled
> correctly.
>
> Ok with that and the above suggested changes.
>
> Thanks,
> Richard.
>
>
>> Thanks.
>> Yuri.
>> ChangeLog
>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>> if destination block of edge is not always executed. Set-up predicate
>> for critical edge.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if FLAG_FORCE_VECTORIZE was set-up.
>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>> (if_convertible_stmt_p): Fix up pre-function comments.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>> to reject block if-conversion with incoming critical edges only if
>> FLAG_FORCE_VECTORIZE was not set-up.
>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>> to compute predicate instead of fold_build2_loc.
>> Add zeroing of edge 'aux' field.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal to 2 and at least one incoming edge is not critical, the original
>> algorithm is used.
>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>> Nullify 'aux' field of edges for blocks with two successors.
>>
>>
>>
>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Here is updated patch (part1) for extended if conversion.
>>>>
>>>> Second part of patch will be sent later.
>>>
>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>> more.
>>>
>>>  static inline void
>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>  {
>>> ...
>>>
>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>> +        p1 & p2 | p1 & !p2.  */
>>> +      if (dom_bb != loop->header
>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>> +       {
>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>> +         bc = bb_predicate (dom_bb);
>>> +         gcc_assert (!is_true_predicate (bc));
>>>
>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>> split the change to add_to_predicate_list out and compute post-dominators
>>> unconditionally.  Note that you should call free_dominance_info
>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>
>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>> +    add_to_predicate_list (loop, e->dest, cond);
>>> +
>>> +  /* If edge E is critical save predicate on it.  */
>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>> +    set_edge_predicate (e, cond);
>>>
>>> how do we know the edge is critical by this simple check?  Why not
>>> simply always save edge predicates (well, you kind of do but omit
>>> the case where e->src dominates e->dest).
>>>
>>> Btw, you can rely on edge->aux being NULL at the start of the
>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>> for that).  So stuff like
>>>
>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>> +         if (flag_force_vectorize)
>>> +           true_edge->aux = false_edge->aux = NULL;
>>>
>>> shouldn't be necessary.
>>>
>>> I think the edge predicate handling should also be unconditionally
>>> and not depend on flag_force_vectorize.
>>>
>>> +      /* The loop latch and loop exit block are always executed and
>>> +        have no extra conditions to be processed: skip them.  */
>>> +      if (bb == loop->latch
>>> +         || bb_with_exit_edge_p (loop, bb))
>>>
>>> I don't think the edge stuff is true - given you still only reset the
>>> loop->latch bb predicate the change looks broken.
>>>
>>> +         /* Fold_build2 can produce bool conversion which is not
>>> +             supported by vectorizer, so re-build it without folding.
>>> +            For example, such conversion is generated for sequence:
>>> +               _Bool _7, _8, _9;
>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>> +               if (_9 != 0)  --> (bool)_9.  */
>>> +
>>> +         if (CONVERT_EXPR_P (c)
>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>
>>> I think you should simply use canonicalize_cond_expr_cond on the
>>> folding result.  Or rather _not_ fold at all - we are taking the
>>> operands from the GIMPLE condition unmodified after all.
>>>
>>> -         add_to_dst_predicate_list (loop, false_edge,
>>> -                                    unshare_expr (cond), c2);
>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>> +                                    unshare_expr (c2));
>>>
>>> why is it necessary to unshare c2?
>>>
>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>> that in detail).
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
>>>> Changelog.
>>>>
>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>> (flag_force_vectorize): New variable.
>>>> (edge_predicate): New function.
>>>> (set_edge_predicate): New function.
>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>> executed to early exit. Use predicate of cd-equivalent block
>>>> for join blocks if it exists.
>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>> destination block of edge is not always executed. Set-up predicate
>>>> for critical edge.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>> to reject block if-conversion with incoming critical edges only if
>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>> fold_build2 produces bool conversion, recompute predicate using
>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>> algorithm is used.
>>>> (get_predicate_for_edge): New function.
>>>> (find_insertion_point): New function.
>>>> (predicate_arbitrary_scalar_phi): New function.
>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>> Invoke find_insertion_point to initialize gsi and
>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>> that extended predication must be applied).
>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>> blocks that there are no gimplified statements to insert. Insert
>>>> predicates at the block beginning for extended if-conversion.
>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>> for innermost loop marked with pragma omp simd and
>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>> for blocks with two successors.
>>>>
>>>>
>>>>
>>>>
>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>> Richard,
>>>>>
>>>>> here is the reduced patch (part 1), which is now almost half the size.
>>>>> Let me also answer your comments.
>>>>>
>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>> My previous code was not correct and now it looks like:
>>>>>
>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>     c = bb_predicate (b);
>>>>>   else
>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>     c = edge_predicate (e);
>>>>>
>>>>> 2. I completely delete all code related to creation of conditional
>>>>> expressions and completely rely on bool pattern recognition in
>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>> which are not used since they prevent vectorization. I will add this
>>>>> local-dce function in next patch.
>>>>> 3. I also did not include in this patch recognition of general
>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>> scalar reduction can be applied also.
>>>>> Note that all these changes are applied for loop marked with pragma
>>>>> omp simd only.
>>>>>
>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>> (flag_force_vectorize): New variable.
>>>>> (edge_predicate): New function.
>>>>> (set_edge_predicate): New function.
>>>>> (convert_name_to_cmp): New function.
>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>> for join blocks if it exists.
>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>> destination block of edge is not always executed. Set-up predicate
>>>>> for critical edge.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>> (all_edges_are_critical): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>> to reject block if-conversion with incoming critical edges only if
>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>> algorithm is used.
>>>>> (get_predicate_for_edge): New function.
>>>>> (find_insertion_point): New function.
>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>> Invoke find_insertion_point to initialize gsi and
>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>> that extended predication must be applied).
>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>> predicates at the block beginning for extended if-conversion.
>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>> for innermost loop marked with pragma omp simd and
>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>> for blocks with two successors.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard!
>>>>>>> Here is updated patch with the following changes:
>>>>>>>
>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>> negate_predicate was deleted.
>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>> be critical.
>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>> blocks to simplify it.
>>>>>>> 5. I decided not to design a pre-pass since it would lead to generating
>>>>>>> a chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>> of the kind
>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>> only one cond expression is required, and this is treated as a simple
>>>>>>> optimization for an arbitrary phi-function. More precisely,
>>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>> arguments.
>>>>>>> For an arbitrary phi function a chain of cond expressions is produced.
>>>>>>>
>>>>>>> Updated patch is attached.
>>>>>>>
>>>>>>> Any comments will be appreciated.
>>>>>>
>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>> it hard to review.
>>>>>>
>>>>>> In addition to that it changes function signatures without updating
>>>>>> the function comments.  For example what is the convert_bool
>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>> all this added logic?
>>>>>>
>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>
>>>>>> I think the code handling PHIs with more than two operands but
>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>> candidate for splitting out into a separate patch.
>>>>>>
>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>> +   conditional expressions.  */
>>>>>>
>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>> This seems to be another feature that needs splitting out.
>>>>>>
>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>> split critical edges (of the respective loop body).
>>>>>>
>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>> forwarder blocks would be nicer to have.
>>>>>>
>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>
>>>>>> So - please split up the patch.  It's way too big.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>> (convert_name_to_cmp): New function.
>>>>>>> (get_type_for_cond): New function.
>>>>>>> (convert_bool_predicate): New function.
>>>>>>> (predicate_disjunction): New function.
>>>>>>> (predicate_conjunction): New function.
>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>> (equal_phi_args): New function.
>>>>>>> (phi_has_two_different_args): New function.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if flag_force_vectorize was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>> flag_force_vectorize was set-up.
>>>>>>> (all_edges_are_critical): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>> flag_force_vectorize was not set-up.
>>>>>>> (walk_cond_tree): New function.
>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>> vect bool pattern can be applied, perform the following transformation:
>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>> add_to_predicate_list.
>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>> algorithm is used.
>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (find_insertion_point): New function.
>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>> predication to build mask.
>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>
>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>> include:
>>>>>>>>>
>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>    loops behavior was not changed.
>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>    predecessors.
>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>
>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>
>>>>>>>> No?
>>>>>>>>
>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>> with some limitations:
>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>    arguments are different and one of them occurs only once,
>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>> with next edge.
>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>
>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>
>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>
>>>>>>>> becomes
>>>>>>>>
>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>
>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>
>>>>>>>> becomes
>>>>>>>>
>>>>>>>>   bb 5:
>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>
>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>
>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>> copies we need to insert on edges.
>>>>>>>>
>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>> And make 3) work properly if it doesn't already.
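
The PHI unfactoring described above can be sketched in plain C as a chain of selects. This is only an illustration and not code from the patch: `select_phi3` and its parameters are hypothetical names, and the sketch assumes the three incoming-edge predicates are mutually exclusive with exactly one of them true.

```c
#include <assert.h>
#include <stdbool.h>

/* Models  x = PHI <v2(2), v3(3), v4(4)>  as two nested selects.
   P2, P3 and P4 stand for the predicates of the edges coming from
   blocks 2, 3 and 4; exactly one of them is assumed to hold.  */
int
select_phi3 (bool p2, bool p3, bool p4, int v2, int v3, int v4)
{
  assert ((int) p2 + (int) p3 + (int) p4 == 1);

  /* x' = PHI <v2(2), v3(3)> in the (hypothetical) forwarder block 5.  */
  int inner = p2 ? v2 : v3;

  /* x = PHI <x'(5), v4(4)>: block 5 is reached when p2 or p3 holds.  */
  return (p2 || p3) ? inner : v4;
}
```

Calling `select_phi3 (false, true, false, 1, 2, 3)` selects the value that arrives over the edge from block 3. Each select corresponds to one COND_EXPR in the if-converted code.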
>>>>>>>>
>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>   {
>>>>>>>>>     float t = a[i];
>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>       if (c[i] != 0)
>>>>>>>>> res += 1;
>>>>>>>>>   }
>>>>>>>>>   <bb 4>:
>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>     goto <bb 4>;
>>>>>>>>>
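
The effect of the predication shown in the dump above can be mirrored by a scalar C sketch. It is a hand-written approximation, not compiler output: `count_branchy` and `count_predicated` are hypothetical names, the arrays are passed as parameters rather than being globals, and the masked load is emulated with a conditional read.

```c
#include <assert.h>
#include <stddef.h>

/* Original control flow, as in the source loop.  */
int
count_branchy (const float *a, const int *c, size_t n)
{
  assert (a != NULL && c != NULL);
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      if ((t > 0) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* Predicated form: no nested control flow on the data, only selects.  */
int
count_predicated (const float *a, const int *c, size_t n)
{
  assert (a != NULL && c != NULL);
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      int mask = (t > 0) & (t < 1.0e+17f);
      /* Scalar stand-in for MASK_LOAD: read c[i] only when mask is set.  */
      int v = mask ? c[i] : 0;
      /* Add 1 only when both the outer and the inner condition hold.  */
      res += mask & (v != 0);
    }
  return res;
}
```

Both functions should return the same count for any input; the second form is the shape the vectorizer can map onto masked vector operations.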
>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog
>>>>>>>>>
>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>> is used, otherwise the following code was added: check on applicability
>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>> equal to false.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>> algorithm is used.
>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>> predication to build mask.
>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>> innermost loop marked with pragma omp simd.

[-- Attachment #2: if-conv.patch2.new --]
[-- Type: application/octet-stream, Size: 10502 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
old mode 100644
new mode 100755
index 3453292..8791a53
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -120,6 +120,9 @@ along with GCC; see the file COPYING3.  If not see
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Copy of 'force_vectorize' field of loop.  */
+static bool flag_force_vectorize;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -149,6 +152,17 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns predicate for critical edge E.  */
+
+static inline tree
+edge_predicate (edge e)
+{
+  gcc_assert (EDGE_COUNT (e->src->succs) >= 2);
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (e->aux != NULL);
+  return (tree) e->aux;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -160,6 +174,16 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets predicate COND for critical edge E.
+   Assumes that #(E->src->succs) >=2 & #(E->dest->preds) >= 2.  */
+
+static inline void
+set_edge_predicate (edge e, tree cond)
+{
+  gcc_assert (cond != NULL_TREE);
+  e->aux = cond;
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -481,10 +505,16 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
+
+  /* If edge E is critical save predicate on it.
+     Assume that #(e->src->succs) >= 2.  */
+  if (EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -508,7 +538,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the flag_force_vectorize is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
@@ -520,11 +552,17 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!flag_force_vectorize)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -754,7 +792,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -891,7 +929,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is a builtin call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -938,6 +977,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 1 predecessor.
+   Returns false if at least one incoming edge is not critical
+   and true otherwise.  */
+
+static inline bool
+all_preds_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -946,6 +1001,8 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction is not applicable for loops marked with simd pragma.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -958,8 +1015,7 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
   if (exit_bb)
@@ -997,18 +1053,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction is not valid for loops marked with
+     simd pragma.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!flag_force_vectorize && all_preds_critical_p (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1090,6 +1145,7 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
   return blocks;
 }
 
+
 /* Returns true when the analysis of the predicates for all the basic
    blocks in LOOP succeeded.
 
@@ -1122,11 +1178,12 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
-	  reset_bb_predicate (loop->latch);
+	  reset_bb_predicate (bb);
 	  continue;
 	}
 
@@ -1134,25 +1191,27 @@ predicate_bbs (loop_p loop)
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
 	{
-	  tree c2;
+	  tree c, c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
-				    boolean_type_node,
-				    gimple_cond_lhs (stmt),
-				    gimple_cond_rhs (stmt));
-
-	  /* Add new condition into destination's predicate list.  */
-	  extract_true_false_edges_from_block (gimple_bb (stmt),
-					       &true_edge, &false_edge);
+	  tree lopnd = gimple_cond_lhs (stmt);
+	  enum tree_code code = gimple_cond_code (stmt);
+
+	  /* Compute predicates for true and false edges.  */
+	  c = build2_loc (loc, code,
+			  boolean_type_node,
+			  lopnd,
+			  gimple_cond_rhs (stmt));
+	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+			   unshare_expr (c));
 
+	  extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+	  true_edge->aux = false_edge->aux = NULL;
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
 				     unshare_expr (c));
 
 	  /* If C is false, then FALSE_EDGE is taken.  */
-	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
-			   unshare_expr (c));
 	  add_to_dst_predicate_list (loop, false_edge,
 				     unshare_expr (cond), c2);
 
@@ -1364,7 +1423,9 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
    replacement.  Return the true block whose phi arguments are
    selected when cond is true.  LOOP is the loop containing the
    if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
+   if-conversion.
+   Returns NULL if given phi node must be handled by means of extended
+   phi node predication.  */
 
 static basic_block
 find_phi_replacement_condition (basic_block bb, tree *cond,
@@ -1373,7 +1434,13 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   edge first_edge, second_edge;
   tree tmp_cond;
 
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
+  if (EDGE_COUNT (bb->preds) != 2
+      || all_preds_critical_p (bb))
+    {
+      gcc_assert (flag_force_vectorize);
+      return NULL;
+    }
+
   first_edge = EDGE_PRED (bb, 0);
   second_edge = EDGE_PRED (bb, 1);
 
@@ -2140,6 +2207,9 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  /* Temporarily set this flag to false.  */
+  flag_force_vectorize = false;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2149,7 +2219,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || (loop->force_vectorize && flag_tree_loop_if_convert != 1))
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2170,7 +2242,15 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	  if (EDGE_COUNT (bb->succs) == 2)
+	    {
+	      EDGE_SUCC (bb, 0)->aux = NULL;
+	      EDGE_SUCC (bb, 1)->aux = NULL;
+	    }
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-17 14:15       ` Yuri Rumyantsev
@ 2014-10-20  8:02         ` Richard Biener
  2014-10-20 14:11           ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-20  8:02 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I reworked the patch as you proposed, but I didn't understand what
> you meant by:
>
>>So please rework the patch so critical edges are always handled
>>correctly.
>
> In current patch flag_force_vectorize is used (1) to reject phi nodes
> with more than 2 arguments; (2) to reject basic blocks with only
> critical incoming edges since support for extended predication of phi
> nodes will be in next patch.

I mean that (2) should not be rejected dependent on flag_force_vectorize.
It was rejected because if-cvt couldn't handle it correctly before but with
this patch this is fixed.  I see no reason to still reject this then even
for !flag_force_vectorize.
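
For reference, the property under discussion can be sketched on a toy CFG. This is an illustration only and not code from the patch: `toy_block`, `toy_edge_critical_p` and `toy_all_preds_critical_p` are hypothetical names, and the structure is a simplified stand-in for GCC's basic_block and edge types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PREDS 4

/* Toy block: records only the successor count and the source block
   of each incoming edge.  */
struct toy_block
{
  size_t n_succs;
  size_t n_preds;
  const struct toy_block *preds[MAX_PREDS];
};

/* An edge is critical when its source has several successors and its
   destination has several predecessors.  */
bool
toy_edge_critical_p (const struct toy_block *src,
                     const struct toy_block *dest)
{
  return src->n_succs >= 2 && dest->n_preds >= 2;
}

/* True when every incoming edge of BB is critical, i.e. no predecessor
   has BB as its only successor.  BB must have at least 2 predecessors.  */
bool
toy_all_preds_critical_p (const struct toy_block *bb)
{
  assert (bb->n_preds >= 2 && bb->n_preds <= MAX_PREDS);
  for (size_t i = 0; i < bb->n_preds; i++)
    if (!toy_edge_critical_p (bb->preds[i], bb))
      return false;
  return true;
}
```

When this returns true for a join block, no predecessor's block predicate equals the predicate of its edge into the join, which is why per-edge predicates are needed there.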

Rejecting PHIs with more than two arguments with flag_force_vectorize
is ok.

Richard.

> Could you please clarify your statement.
>
> I attached modified patch.
>
> ChangeLog:
>
> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
> if destination block of edge is not always executed. Set-up predicate
> for critical edge.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if FLAG_FORCE_VECTORIZE was set-up.
> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
> (if_convertible_stmt_p): Fix up pre-function comments.
> (all_preds_critical_p): New function.
> (if_convertible_bb_p): Use call of all_preds_critical_p
> to reject block if-conversion with incoming critical edges only if
> FLAG_FORCE_VECTORIZE was not set-up.
> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
> to compute predicate instead of fold_build2_loc.
> Add zeroing of edge 'aux' field.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal to 2 and at least one incoming edge is not critical, the original
> algorithm is used.
> (tree_if_conversion): Temporarily set FLAG_FORCE_VECTORIZE to false.
> Nullify 'aux' field of edges for blocks with two successors.
>
>
>
>
> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>> Could you please look at it (I have already sent the patch with
>>> changes in add_to_predicate_list for review).
>>
>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>> +               fprintf (dump_file, "More than two phi node args.\n");
>> +             return false;
>> +           }
>> +
>> +        }
>>
>> Excess vertical space.
>>
>>
>> +/* Assumes that BB has more than 2 predecessors.
>>
>> More than 1 predecessor?
>>
>> +   Returns false if at least one successor is not on critical edge
>> +   and true otherwise.  */
>> +
>> +static inline bool
>> +all_edges_are_critical (basic_block bb)
>> +{
>>
>> "all_preds_critical_p" would be a better name
>>
>> +  if (EDGE_COUNT (bb->preds) > 2)
>> +    {
>> +      if (!flag_force_vectorize)
>> +       return false;
>> +    }
>>
>> as I said in the last review I don't think we should restrict edge
>> predicates to flag_force_vectorize.  At least I can't see how
>> if-conversion is magically more expensive for that case?
>>
>> So please rework the patch so critical edges are always handled
>> correctly.
>>
>> Ok with that and the above suggested changes.
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks.
>>> Yuri.
>>> ChangeLog
>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> (flag_force_vectorize): New variable.
>>> (edge_predicate): New function.
>>> (set_edge_predicate): New function.
>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>> if destination block of edge is not always executed. Set-up predicate
>>> for critical edge.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> if FLAG_FORCE_VECTORIZE was set-up.
>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>> to reject block if-conversion with incoming critical edges only if
>>> FLAG_FORCE_VECTORIZE was not set-up.
>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>> to compute predicate instead of fold_build2_loc.
>>> Add zeroing of edge 'aux' field.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal to 2 and at least one incoming edge is not critical, the original
>>> algorithm is used.
>>> (tree_if_conversion): Temporarily set FLAG_FORCE_VECTORIZE to false.
>>> Nullify 'aux' field of edges for blocks with two successors.
>>>
>>>
>>>
>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>
>>>>> Second part of patch will be sent later.
>>>>
>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>> more.
>>>>
>>>>  static inline void
>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>  {
>>>> ...
>>>>
>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>> +        p1 & p2 | p1 & !p2.  */
>>>> +      if (dom_bb != loop->header
>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>> +       {
>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>> +         bc = bb_predicate (dom_bb);
>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>
>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>> unconditionally.  Note that you should call free_dominance_info
>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
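
The simplification mentioned in the quoted comment, using p1 instead of p1 & p2 | p1 & !p2 for the join block, can be checked exhaustively with a small C sketch. The function names are hypothetical and the code only models the boolean identity, not the dominator queries that find the equivalent block.

```c
#include <assert.h>
#include <stdbool.h>

/* Join-block predicate built by OR-ing the two incoming predicates.  */
bool
join_pred_disjunction (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Join-block predicate taken from the control-dependence-equivalent
   block, whose predicate is just P1.  */
bool
join_pred_cd_equivalent (bool p1)
{
  return p1;
}

/* Asserts that both forms agree on every combination of inputs.  */
void
check_join_preds (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      assert (join_pred_disjunction (p1, p2)
              == join_pred_cd_equivalent (p1));
}
```

The two forms are logically identical; the shorter one simply leaves fewer statements for the vectorizer to process.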
>>>>
>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>> +
>>>> +  /* If edge E is critical save predicate on it.  */
>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>> +    set_edge_predicate (e, cond);
>>>>
>>>> how do we know the edge is critical by this simple check?  Why not
>>>> simply always save edge predicates (well, you kind of do but omit
>>>> the case where e->src dominates e->dest).
>>>>
>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>> for that).  So stuff like
>>>>
>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>> +         if (flag_force_vectorize)
>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>
>>>> shouldn't be necessary.
>>>>
>>>> I think the edge predicate handling should also be unconditionally
>>>> and not depend on flag_force_vectorize.
>>>>
>>>> +      /* The loop latch and loop exit block are always executed and
>>>> +        have no extra conditions to be processed: skip them.  */
>>>> +      if (bb == loop->latch
>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>
>>>> I don't think the edge stuff is true - given you still only reset the
>>>> loop->latch bb predicate the change looks broken.
>>>>
>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>> +             supported by vectorizer, so re-build it without folding.
>>>> +            For example, such conversion is generated for sequence:
>>>> +               _Bool _7, _8, _9;
>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _7 & _8;
>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>> +
>>>> +         if (CONVERT_EXPR_P (c)
>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>
>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>> operands from the GIMPLE condition unmodified after all.
>>>>
>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>> -                                    unshare_expr (cond), c2);
>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>> +                                    unshare_expr (c2));
>>>>
>>>> why is it necessary to unshare c2?
>>>>
>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>> that in detail).
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>
>>>>> Changelog.
>>>>>
>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>> (flag_force_vectorize): New variable.
>>>>> (edge_predicate): New function.
>>>>> (set_edge_predicate): New function.
>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>> for join blocks if it exists.
>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>> destination block of edge is not always executed. Set-up predicate
>>>>> for critical edge.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>> (all_edges_are_critical): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>> to reject block if-conversion with incoming critical edges only if
>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>> algorithm is used.
>>>>> (get_predicate_for_edge): New function.
>>>>> (find_insertion_point): New function.
>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>> Invoke find_insertion_point to initialize gsi and
>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>> that extended predication must be applied).
>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>> predicates at the block beginning for extended if-conversion.
>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>> for innermost loop marked with pragma omp simd and
>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set up. Nullify 'aux' field of edges
>>>>> for blocks with two successors.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>> Richard,
>>>>>>
>>>>>> Here is the reduced patch (part 1), which is now almost half its
>>>>>> previous size. Let me also answer your comments.
>>>>>>
>>>>>> 1. I do use the edge field 'aux' to keep the predicate for critical edges.
>>>>>> My previous code was not correct; it now looks like:
>>>>>>
>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>     c = bb_predicate (b);
>>>>>>   else
>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>     c = edge_predicate (e);
>>>>>>
>>>>>> 2. I completely deleted all code related to creation of conditional
>>>>>> expressions and now rely entirely on bool pattern recognition in the
>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>> which are not used, since they prevent vectorization. I will add this
>>>>>> local-dce function in the next patch.
>>>>>> 3. I also did not include in this patch the recognition of general
>>>>>> phi nodes with two arguments only, for which conversion of conditional
>>>>>> scalar reduction can also be applied.
>>>>>> Note that all these changes are applied only to loops marked with pragma
>>>>>> omp simd.
>>>>>>
>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (convert_name_to_cmp): New function.
>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>> for join blocks if it exists.
>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>> algorithm is used.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>> that extended predication must be applied).
>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>> for innermost loop marked with pragma omp simd and
>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set up. Nullify 'aux' field of edges
>>>>>> for blocks with two successors.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard!
>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>
>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>> negate_predicate was deleted.
>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>> be critical.
>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>> blocks to simplify it.
>>>>>>>> 5. I decided not to design a pre-pass, since it would lead to generating
>>>>>>>> a chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>> of the kind
>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>> only one cond expression is required; this is treated as a simple
>>>>>>>> optimization for an arbitrary phi function. More precisely,
>>>>>>>> if a phi function has only two different arguments and one of them has
>>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>> arguments.
>>>>>>>> For an arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>
>>>>>>>> Updated patch is attached.
>>>>>>>>
>>>>>>>> Any comments will be appreciated.
>>>>>>>
>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>> it hard to review.
>>>>>>>
>>>>>>> In addition to that it changes function signatures without updating
>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>> all this added logic?
>>>>>>>
>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>
>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>
>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>> +   conditional expressions.  */
>>>>>>>
>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>
>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>> split critical edges (of the respective loop body).
>>>>>>>
>>>>>>> I still think that a utility doing the same PHI arg merging by introducing
>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>
>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>
>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>> (get_type_for_cond): New function.
>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>> (predicate_disjunction): New function.
>>>>>>>> (predicate_conjunction): New function.
>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>> (equal_phi_args): New function.
>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>> (walk_cond_tree): New function.
>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>> add_to_predicate_list.
>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>> algorithm is used.
>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (find_insertion_point): New function.
>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>> predication to build mask.
>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>
>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>> include:
>>>>>>>>>>
>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>    predecessors.
>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>
>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>
>>>>>>>>> No?
>>>>>>>>>
>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>> with some limitations:
>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>    arguments are different and one of them occurs only once,
>>>>>>>>>>    transformation to a single COND_EXPR can be done.
>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>> with next edge.
>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>
>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>
>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>
>>>>>>>>> becomes
>>>>>>>>>
>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>
>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>
>>>>>>>>> becomes
>>>>>>>>>
>>>>>>>>>   bb 5:
>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>
>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>
>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>
>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>
>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>   {
>>>>>>>>>>     float t = a[i];
>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>         res += 1;
>>>>>>>>>>   }
>>>>>>>>>>   <bb 4>:
>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>
>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>
>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>
>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>> equal to false.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>> predication to build mask.
>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>> innermost loop marked with pragma omp simd.


* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-20  8:02         ` Richard Biener
@ 2014-10-20 14:11           ` Yuri Rumyantsev
  2014-10-21 12:29             ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-20 14:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

Thanks for your answer!

In the current implementation, phi node conversion assumes that the bb
containing the given phi has at least one non-critical incoming edge,
and chooses it to insert the predicated code. But if we choose a
critical edge, we need to determine the insertion point and insertion
direction (before/after), since otherwise we can get invalid SSA
form (use before def). This is done by my new function, which is not in
the current patch (I will present that patch later). So I assume that we
need to leave this patch as it is, to avoid introducing new bugs.
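
For readers following the discussion, the notion of a critical edge used
above can be sketched in plain C. This is only a minimal illustration under
stated assumptions, not code from the patch: the toy_bb and toy_edge types
and all toy_* functions are hypothetical stand-ins for GCC's basic_block and
edge, which carry far more state. It also shows why a check on the
destination's predecessor count alone flags more edges than are critical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy CFG types for illustration only (hypothetical, not GCC's own).  */
struct toy_bb
{
  size_t n_succs;  /* number of outgoing edges */
  size_t n_preds;  /* number of incoming edges */
};

struct toy_edge
{
  struct toy_bb *src;
  struct toy_bb *dest;
};

/* Connect SRC to DEST through E and update both edge counts.  */
static void
toy_connect (struct toy_edge *e, struct toy_bb *src, struct toy_bb *dest)
{
  assert (e && src && dest);
  e->src = src;
  e->dest = dest;
  src->n_succs++;
  dest->n_preds++;
}

/* An edge is critical when its source has more than one successor
   and its destination has more than one predecessor.  */
static bool
toy_edge_critical_p (const struct toy_edge *e)
{
  return e->src->n_succs > 1 && e->dest->n_preds > 1;
}

/* Weaker check that looks only at the destination: true for every
   edge entering a join block, critical or not.  */
static bool
toy_dest_is_join_p (const struct toy_edge *e)
{
  return e->dest->n_preds >= 2;
}
```

With blocks 2, 3 and 4 and edges 2->3, 2->4 and 3->4, only 2->4 is critical,
while the destination-only check is also true for 3->4.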

Thanks.
Yuri.

2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I reworked the patch as you proposed, but I didn't understand what
>> you meant by:
>>
>>>So please rework the patch so critical edges are always handled
>>>correctly.
>>
>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>> with more than 2 arguments; (2) to reject basic blocks with only
>> critical incoming edges since support for extended predication of phi
>> nodes will be in next patch.
>
> I mean that (2) should not be rejected dependent on flag_force_vectorize.
> It was rejected because if-cvt couldn't handle it correctly before but with
> this patch this is fixed.  I see no reason to still reject this then even
> for !flag_force_vectorize.
>
> Rejecting PHIs with more than two arguments with flag_force_vectorize
> is ok.
>
> Richard.
>
>> Could you please clarify your statement?
>>
>> I attached modified patch.
>>
>> ChangeLog:
>>
>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>> if destination block of edge is not always executed. Set-up predicate
>> for critical edge.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if FLAG_FORCE_VECTORIZE was set-up.
>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>> (if_convertible_stmt_p): Fix up pre-function comments.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Use call of all_preds_critical_p
>> to reject block if-conversion with incoming critical edges only if
>> FLAG_FORCE_VECTORIZE was not set-up.
>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>> to compute predicate instead of fold_build2_loc.
>> Add zeroing of edge 'aux' field.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal to 2 and at least one incoming edge is not critical, the original
>> algorithm is used.
>> (tree_if_conversion): Temporarily set FLAG_FORCE_VECTORIZE to false.
>> Nullify 'aux' field of edges for blocks with two successors.
>>
>>
>>
>>
>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>> Could you please look at it? (I have already sent the patch with
>>>> changes in add_to_predicate_list for review.)
>>>
>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>> +             return false;
>>> +           }
>>> +
>>> +        }
>>>
>>> Excess vertical space.
>>>
>>>
>>> +/* Assumes that BB has more than 2 predecessors.
>>>
>>> More than 1 predecessor?
>>>
>>> +   Returns false if at least one successor is not on critical edge
>>> +   and true otherwise.  */
>>> +
>>> +static inline bool
>>> +all_edges_are_critical (basic_block bb)
>>> +{
>>>
>>> "all_preds_critical_p" would be a better name
>>>
>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>> +    {
>>> +      if (!flag_force_vectorize)
>>> +       return false;
>>> +    }
>>>
>>> as I said in the last review I don't think we should restrict edge
>>> predicates to flag_force_vectorize.  At least I can't see how
>>> if-conversion is magically more expensive for that case?
>>>
>>> So please rework the patch so critical edges are always handled
>>> correctly.
>>>
>>> Ok with that and the above suggested changes.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
>>>> Thanks.
>>>> Yuri.
>>>> ChangeLog
>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> (flag_force_vectorize): New variable.
>>>> (edge_predicate): New function.
>>>> (set_edge_predicate): New function.
>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>> if destination block of edge is not always executed. Set-up predicate
>>>> for critical edge.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>> to reject block if-conversion with incoming critical edges only if
>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>> to compute predicate instead of fold_build2_loc.
>>>> Add zeroing of edge 'aux' field.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>> algorithm is used.
>>>> (tree_if_conversion): Temporarily set FLAG_FORCE_VECTORIZE to false.
>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>
>>>>
>>>>
>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>
>>>>>> Second part of patch will be sent later.
>>>>>
>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>> more.
>>>>>
>>>>>  static inline void
>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>  {
>>>>> ...
>>>>>
>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>> +      if (dom_bb != loop->header
>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>> +       {
>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>> +         bc = bb_predicate (dom_bb);
>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>
>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>
>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>> +
>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>> +    set_edge_predicate (e, cond);
>>>>>
>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>> the case where e->src dominates e->dest).
>>>>>
>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>> for that).  So stuff like
>>>>>
>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>> +         if (flag_force_vectorize)
>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>
>>>>> shouldn't be necessary.
>>>>>
>>>>> I think the edge predicate handling should also be unconditionally
>>>>> and not depend on flag_force_vectorize.
>>>>>
>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>> +      if (bb == loop->latch
>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>
>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>> loop->latch bb predicate the change looks broken.
>>>>>
>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>> +            For example, such conversion is generated for sequence:
>>>>> +               _Bool _7, _8, _9;
>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>> +
>>>>> +         if (CONVERT_EXPR_P (c)
>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>
>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>
>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>> -                                    unshare_expr (cond), c2);
>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>> +                                    unshare_expr (c2));
>>>>>
>>>>> why is it necessary to unshare c2?
>>>>>
>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>> that in detail).
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>
>>>>>> Changelog.
>>>>>>
>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>> for join blocks if it exists.
>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal to 2 and at least one incoming edge is not critical, the
>>>>>> original algorithm is used.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>> that extended predication must be applied).
>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>> for innermost loop marked with pragma omp simd and
>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>> for blocks with two successors.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Here is the reduced patch (part 1), which is now almost half the size.
>>>>>>> Let me also answer your comments.
>>>>>>>
>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>
>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>     c = bb_predicate (b);
>>>>>>>   else
>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>     c = edge_predicate (e);
>>>>>>>
>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>> local-dce function in next patch.
>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>> scalar reduction can be applied also.
>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>> omp simd only.
>>>>>>>
>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (convert_name_to_cmp): New function.
>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>> for join blocks if it exists.
>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_edges_are_critical): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal to 2 and at least one incoming edge is not critical, the
>>>>>>> original algorithm is used.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (find_insertion_point): New function.
>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>> that extended predication must be applied).
>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>> for blocks with two successors.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard!
>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>
>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>> negate_predicate was deleted.
>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>> be critical.
>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>> blocks to simplify it.
>>>>>>>>> 5. I decided not to design a pre-pass since it would lead to generating
>>>>>>>>> a chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>>> of the kind
>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>> only one cond expression is required, and this is treated as a simple
>>>>>>>>> optimization for an arbitrary phi-function. More precisely,
>>>>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>> arguments.
>>>>>>>>> For an arbitrary phi-function a chain of cond expressions is produced.
>>>>>>>>>
>>>>>>>>> Updated patch is attached.
>>>>>>>>>
>>>>>>>>> Any comments will be appreciated.
>>>>>>>>
>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>> it hard to review.
>>>>>>>>
>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>> all this added logic?
>>>>>>>>
>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>
>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>
>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>> +   conditional expressions.  */
>>>>>>>>
>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>
>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>
>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>
>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>
>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>> add_to_predicate_list.
>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the
>>>>>>>>> original algorithm is used.
>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>> predication to build mask.
>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>
>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>> include:
>>>>>>>>>>>
>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>    predecessors.
>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>
>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>
>>>>>>>>>> No?
>>>>>>>>>>
>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>> with some limitations:
>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>    arguments are different and one of them occurs only once,
>>>>>>>>>>>    transformation to a single COND_EXPR can be done.
>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>    corresponding to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>    will be generated for it. In current design a very simple check is used:
>>>>>>>>>>>    check starting from the end that two edges corresponding to neighbor
>>>>>>>>>>>    arguments have a common predecessor which is used for further check
>>>>>>>>>>>    with the next edge.
>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>
>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>
>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>
>>>>>>>>>> becomes
>>>>>>>>>>
>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>
>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>
>>>>>>>>>> becomes
>>>>>>>>>>
>>>>>>>>>>   bb 5:
>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>
>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>
>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>
>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>
>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>   {
>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>         res += 1;
>>>>>>>>>>>   }
>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>
>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>
>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>
>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>> equal to false.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the
>>>>>>>>>>> original algorithm is used.
>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>> predication to build mask.
>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>> innermost loop marked with pragma omp simd.
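
The PHI case discussed in the quoted thread (more than two arguments, but
only two distinct values, one of which occurs exactly once, as in
x = PHI <1(2), 1(3), 2(4)>) can be sketched on a toy model as follows. This
is only an illustration, not the code from the patch: PHI arguments are
modelled as plain ints, incoming edges as array indices, and the names
toy_phi_singleton_arg and toy_phi_as_cond_expr are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: ARGS[i] is the value arriving over the i-th incoming edge.
   If ARGS holds exactly two distinct values and one of them occurs
   exactly once, return the index of an argument that occurs once;
   otherwise return -1.  */
int
toy_phi_singleton_arg (const int *args, size_t n)
{
  if (n < 2)
    return -1;

  int first = args[0], second = 0;
  size_t count_first = 1, count_second = 0, idx_second = 0;
  bool have_second = false;

  for (size_t i = 1; i < n; i++)
    {
      if (args[i] == first)
        count_first++;
      else if (!have_second)
        {
          have_second = true;
          second = args[i];
          idx_second = i;
          count_second = 1;
        }
      else if (args[i] == second)
        count_second++;
      else
        return -1;              /* A third distinct value.  */
    }

  if (!have_second)
    return -1;                  /* All arguments are equal.  */
  if (count_first == 1)
    return 0;
  if (count_second == 1)
    return (int) idx_second;
  return -1;
}

/* Evaluate the PHI as one conditional expression: if control arrived over
   the edge carrying the singleton argument, take it, else take the other
   value.  TAKEN is the index of the edge actually taken.  */
int
toy_phi_as_cond_expr (const int *args, size_t n, size_t taken)
{
  int k = toy_phi_singleton_arg (args, n);
  assert (k >= 0 && taken < n);
  int other = args[k == 0 ? 1 : 0];  /* Every non-singleton arg is equal.  */
  return taken == (size_t) k ? args[k] : other;
}
```

For the arguments {1, 1, 2} the singleton is the third argument, so a single
condition on the third incoming edge's predicate is enough; a PHI with three
distinct arguments does not qualify and would need a chain of conditional
expressions instead.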

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-20 14:11           ` Yuri Rumyantsev
@ 2014-10-21 12:29             ` Yuri Rumyantsev
  2014-10-21 12:56               ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-21 12:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 37928 bytes --]

Richard,

I made some changes to the patch and ChangeLog to note that support for
if-conversion of blocks with only critical incoming edges will be added
in the future (more precisely, in patch 4).

Could you please review it?

Thanks.

ChangeLog:

2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>

(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
if destination block of edge is not always executed. Set-up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_preds_critical_p): New function.
(if_convertible_bb_p): Use call of all_preds_critical_p
to temporarily reject if-conversion of blocks with only critical incoming
edges if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be
removed after adding support for extended predication.
(predicate_bbs): Skip loop exit block also. Invoke build2_loc
to compute predicate instead of fold_build2_loc.
Add zeroing of edge 'aux' field.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal to 2 and at least one incoming edge is not critical, the
original algorithm is used.
(tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
Nullify 'aux' field of edges for blocks with two successors.
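
For readers following the thread, the notion of a critical edge that this
ChangeLog relies on can be sketched as below. This is a minimal sketch on a
toy model, not GCC's CFG API and not the code in the patch: an edge is
described only by successor and predecessor counts, and the names
toy_edge_critical_p and toy_all_preds_critical_p are hypothetical. An edge
is critical when its source block has more than one successor and its
destination block has more than one predecessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: an edge is described only by the number of successors of
   its source block and the number of predecessors of its destination
   block.  */
bool
toy_edge_critical_p (size_t src_succs, size_t dest_preds)
{
  return src_succs >= 2 && dest_preds >= 2;
}

/* SRC_SUCCS[i] is the successor count of the source block of the i-th
   incoming edge of a block with N_PREDS predecessors.  Return true if
   the block has at least one incoming edge and every incoming edge is
   critical.  */
bool
toy_all_preds_critical_p (const size_t *src_succs, size_t n_preds)
{
  assert (src_succs != NULL || n_preds == 0);
  if (n_preds == 0)
    return false;
  for (size_t i = 0; i < n_preds; i++)
    if (!toy_edge_critical_p (src_succs[i], n_preds))
      return false;
  return true;
}
```

In this model, a join block whose two predecessors both end in a
conditional branch has only critical incoming edges, which is the case the
ChangeLog above says is still rejected for now.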

2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> Thanks for your answer!
>
> In the current implementation, phi node conversion assumes that the bb
> containing the given phi has at least one non-critical incoming edge
> and chooses it to insert predicated code. But if we choose a critical
> edge, we need to determine the insertion point and insertion direction
> (before/after), since otherwise we can get invalid SSA form (use before
> def). This is done by my new function, which is not in the current
> patch (I will present that patch later). So I assume that we need to
> leave this patch as it is, to avoid introducing new bugs.
>
> Thanks.
> Yuri.
>
> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I reworked the patch as you proposed, but I didn't understand what
>>> you meant by:
>>>
>>>>So please rework the patch so critical edges are always handled
>>>>correctly.
>>>
>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>> with more than 2 arguments; (2) to reject basic blocks with only
>>> critical incoming edges since support for extended predication of phi
>>> nodes will be in next patch.
>>
>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>> It was rejected because if-cvt couldn't handle it correctly before but with
>> this patch this is fixed.  I see no reason to still reject this then even
>> for !flag_force_vectorize.
>>
>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>> is ok.
>>
>> Richard.
>>
>>> Could you please clarify your statement.
>>>
>>> I attached modified patch.
>>>
>>> ChangeLog:
>>>
>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> (flag_force_vectorize): New variable.
>>> (edge_predicate): New function.
>>> (set_edge_predicate): New function.
>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>> if destination block of edge is not always executed. Set-up predicate
>>> for critical edge.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> if FLAG_FORCE_VECTORIZE was set-up.
>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>> to reject block if-conversion with incoming critical edges only if
>>> FLAG_FORCE_VECTORIZE was not set-up.
>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>> to compute predicate instead of fold_build2_loc.
>>> Add zeroing of edge 'aux' field.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal to 2 and at least one incoming edge is not critical, the original
>>> algorithm is used.
>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>> Nullify 'aux' field of edges for blocks with two successors.
>>>
>>>
>>>
>>>
>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>> Could you please look at it ( I have already sent the patch with
>>>>> changes in add_to_predicate_list for review).
>>>>
>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>> +             return false;
>>>> +           }
>>>> +
>>>> +        }
>>>>
>>>> Excess vertical space.
>>>>
>>>>
>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>
>>>> More than 1 predecessor?
>>>>
>>>> +   Returns false if at least one successor is not on critical edge
>>>> +   and true otherwise.  */
>>>> +
>>>> +static inline bool
>>>> +all_edges_are_critical (basic_block bb)
>>>> +{
>>>>
>>>> "all_preds_critical_p" would be a better name
>>>>
>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>> +    {
>>>> +      if (!flag_force_vectorize)
>>>> +       return false;
>>>> +    }
>>>>
>>>> as I said in the last review I don't think we should restrict edge
>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>> if-conversion is magically more expensive for that case?
>>>>
>>>> So please rework the patch so critical edges are always handled
>>>> correctly.
>>>>
>>>> Ok with that and the above suggested changes.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>> ChangeLog
>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> (flag_force_vectorize): New variable.
>>>>> (edge_predicate): New function.
>>>>> (set_edge_predicate): New function.
>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>> for critical edge.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>> (all_edges_are_critical): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>> to reject block if-conversion with incoming critical edges only if
>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>> to compute predicate instead of fold_build2_loc.
>>>>> Add zeroing of edge 'aux' field.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>> algorithm is used.
>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>
>>>>>
>>>>>
>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>
>>>>>>> Second part of patch will be sent later.
>>>>>>
>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>> more.
>>>>>>
>>>>>>  static inline void
>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>  {
>>>>>> ...
>>>>>>
>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>> +      if (dom_bb != loop->header
>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>> +       {
>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>
>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>
>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>> +
>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>> +    set_edge_predicate (e, cond);
>>>>>>
>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>> the case where e->src dominates e->dest).
>>>>>>
>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>> for that).  So stuff like
>>>>>>
>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>> +         if (flag_force_vectorize)
>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>
>>>>>> shouldn't be necessary.
>>>>>>
>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>> and not depend on flag_force_vectorize.
>>>>>>
>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>> +      if (bb == loop->latch
>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>
>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>
>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>> +            For example, such conversion is generated for sequence:
>>>>>> +               _Bool _7, _8, _9;
>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>> +
>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>
>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>
>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>> -                                    unshare_expr (cond), c2);
>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>> +                                    unshare_expr (c2));
>>>>>>
>>>>>> why is it necessary to unshare c2?
>>>>>>
>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>> that in detail).
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>
>>>>>>> Changelog.
>>>>>>>
>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>> for join blocks if it exists.
>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_edges_are_critical): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>> algorithm is used.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (find_insertion_point): New function.
>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>> that extended predication must be applied).
>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>> for blocks with two successors.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Here is the reduced patch (part 1), which is now almost half the size.
>>>>>>>> Let me also answer your comments.
>>>>>>>>
>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>
>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>     c = bb_predicate (b);
>>>>>>>>   else
>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>     c = edge_predicate (e);
>>>>>>>>
>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>> local-dce function in next patch.
>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>> scalar reduction can be applied also.
>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>> omp simd only.
>>>>>>>>
>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>> for join blocks if it exists.
>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>> for critical edge.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>> algorithm is used.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (find_insertion_point): New function.
>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>> that extended predication must be applied).
>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>> for blocks with two successors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard!
>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>
>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>> be critical.
>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>> blocks to simplify it.
>>>>>>>>>> 5. I decided not to design a pre-pass, since it would lead to generating
>>>>>>>>>> a chain of cond expressions for phi-node if-conversion, whereas for a
>>>>>>>>>> phi of the kind
>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>> only one cond expression is required, and this is treated as a simple
>>>>>>>>>> optimization for arbitrary phi-functions. More precisely,
>>>>>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>> arguments.
>>>>>>>>>> For an arbitrary phi-function a chain of cond expressions is produced.
>>>>>>>>>>
>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>
>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>
>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>> it hard to review.
>>>>>>>>>
>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>> all this added logic?
>>>>>>>>>
>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>
>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>
>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>
>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>
>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>
>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>
>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>
>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>> predication to build mask.
>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>
>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> We implemented additional support for pragma omp simd in the form of
>>>>>>>>>>>> extended if-conversion for loops with such pragma. These extensions
>>>>>>>>>>>> include:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>
>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>
>>>>>>>>>>> No?
>>>>>>>>>>>
>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>    with some limitations:
>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>      arguments are different and one of them has a single occurrence,
>>>>>>>>>>>>      transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>    - if a phi node has more different arguments and all edge predicates
>>>>>>>>>>>>      corresponding to phi-arguments are disjoint, a chain of COND_EXPRs
>>>>>>>>>>>>      will be generated for it. In the current design a very simple check
>>>>>>>>>>>>      is used: check, starting from the end, that two edges corresponding
>>>>>>>>>>>>      to neighboring arguments have a common predecessor, which is used
>>>>>>>>>>>>      for the further check with the next edge.
>>>>>>>>>>>>    These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>
>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>
>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>
>>>>>>>>>>> becomes
>>>>>>>>>>>
>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>
>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>
>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>
>>>>>>>>>>> becomes
>>>>>>>>>>>
>>>>>>>>>>>   bb 5:
>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>
>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>
>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>
>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>
>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here is an example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>   {
>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>   }
>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>
>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>> is used, otherwise the following code was added: check on applicability
>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>> equal to false.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the
>>>>>>>>>>>> original algorithm is used.
>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>> innermost loop marked with pragma omp simd.

[-- Attachment #2: if-conv.patch2.final --]
[-- Type: application/octet-stream, Size: 9629 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
old mode 100644
new mode 100755
index 3453292..74c17d9
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -120,6 +120,9 @@ along with GCC; see the file COPYING3.  If not see
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Copy of 'force_vectorize' field of loop.  */
+static bool flag_force_vectorize;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -149,6 +152,17 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns predicate for critical edge E.  */
+
+static inline tree
+edge_predicate (edge e)
+{
+  gcc_assert (EDGE_COUNT (e->src->succs) >= 2);
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (e->aux != NULL);
+  return (tree) e->aux;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -160,6 +174,16 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets predicate COND for critical edge E.
+   Assumes that #(E->src->succs) >=2 & #(E->dest->preds) >= 2.  */
+
+static inline void
+set_edge_predicate (edge e, tree cond)
+{
+  gcc_assert (cond != NULL_TREE);
+  e->aux = cond;
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -481,10 +505,16 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
+
+  /* If edge E is critical save predicate on it.
+     Assume that #(e->src->succs) >= 2.  */
+  if (EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -508,7 +538,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the flag_force_vectorize is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
@@ -520,11 +552,17 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!flag_force_vectorize)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -754,7 +792,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -891,7 +929,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is a builtin call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -938,6 +977,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 1 predecessor.
+   Returns false if at least one predecessor is not on critical edge
+   and true otherwise.  */
+
+static inline bool
+all_preds_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -946,6 +1001,9 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction will be deleted after adding support for extended
+   predication.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -958,8 +1016,7 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
   if (exit_bb)
@@ -997,18 +1054,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction will be removed after adding support for
+     extended predication.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!flag_force_vectorize && all_preds_critical_p (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1122,11 +1178,12 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
-	  reset_bb_predicate (loop->latch);
+	  reset_bb_predicate (bb);
 	  continue;
 	}
 
@@ -1137,7 +1194,7 @@ predicate_bbs (loop_p loop)
 	  tree c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
+	  tree c = build2_loc (loc, gimple_cond_code (stmt),
 				    boolean_type_node,
 				    gimple_cond_lhs (stmt),
 				    gimple_cond_rhs (stmt));
@@ -1146,6 +1203,8 @@ predicate_bbs (loop_p loop)
 	  extract_true_false_edges_from_block (gimple_bb (stmt),
 					       &true_edge, &false_edge);
 
+          true_edge->aux = false_edge->aux = NULL;
+
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
 				     unshare_expr (c));
@@ -1364,7 +1423,9 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
    replacement.  Return the true block whose phi arguments are
    selected when cond is true.  LOOP is the loop containing the
    if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
+   if-conversion.
+   Returns NULL if given phi node must be handled by means of extended
+   phi node predication.  */
 
 static basic_block
 find_phi_replacement_condition (basic_block bb, tree *cond,
@@ -1373,7 +1434,13 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   edge first_edge, second_edge;
   tree tmp_cond;
 
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
+  if (EDGE_COUNT (bb->preds) > 2
+      || all_preds_critical_p (bb))
+    {
+      gcc_assert (flag_force_vectorize);
+      return NULL;
+    }
+
   first_edge = EDGE_PRED (bb, 0);
   second_edge = EDGE_PRED (bb, 1);
 
@@ -2140,6 +2207,9 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  /* Temporarily set this flag to false.  */
+  flag_force_vectorize = false;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2149,7 +2219,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || (loop->force_vectorize && flag_tree_loop_if_convert != 1))
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2170,7 +2242,15 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	  if (EDGE_COUNT (bb->succs) == 2)
+	    {
+	      EDGE_SUCC (bb, 0)->aux = NULL;
+	      EDGE_SUCC (bb, 1)->aux = NULL;
+	    }
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 12:29             ` Yuri Rumyantsev
@ 2014-10-21 12:56               ` Richard Biener
  2014-10-21 13:26                 ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-21 12:56 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I made some changes in the patch and ChangeLog to mark that support for
> if-conversion of blocks with only critical incoming edges will be added
> in the future (more precisely, in patch.4).

But the same reasoning applies to this version of the patch when
flag_force_vectorize is true!?  (insertion point and invalid SSA form)

Which means the patch doesn't make sense in isolation?

Btw, I think for the case you should simply do gsi_insert_on_edge ()
and commit_edge_insertions () before the call to combine_blocks
(pushing the edge predicate to the newly created block).
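
To illustrate the idea, here is a minimal, self-contained sketch on a toy
CFG. It does not use GCC's internal API: the names toy_cfg, toy_edge,
edge_critical_p, split_edge, count_critical and build_example are all
hypothetical and exist only for this example. It shows the usual
definition of a critical edge (source has more than one successor and
destination has more than one predecessor) and how splitting such an edge
gives the edge predicate a basic block of its own to live in.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 8
#define MAX_EDGES 16

/* Toy CFG edge.  PRED is the condition under which the edge is taken;
   NULL means the edge is taken unconditionally.  */
struct toy_edge { int src; int dest; const char *pred; };

struct toy_cfg {
  int n_blocks;
  int n_edges;
  struct toy_edge edges[MAX_EDGES];
  const char *bb_pred[MAX_BLOCKS];  /* NULL means "always executed".  */
};

int count_succs (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    if (g->edges[i].src == bb)
      n++;
  return n;
}

int count_preds (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    if (g->edges[i].dest == bb)
      n++;
  return n;
}

/* An edge is critical if its source has several successors AND its
   destination has several predecessors.  */
int edge_critical_p (const struct toy_cfg *g, int i)
{
  return count_succs (g, g->edges[i].src) > 1
	 && count_preds (g, g->edges[i].dest) > 1;
}

int count_critical (const struct toy_cfg *g)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    n += edge_critical_p (g, i);
  return n;
}

/* Split edge I by routing it through a fresh block.  The new block
   inherits the edge predicate, so code guarded by that predicate has a
   well-defined place to go.  Returns the index of the new block.  */
int split_edge (struct toy_cfg *g, int i)
{
  assert (g->n_blocks < MAX_BLOCKS && g->n_edges < MAX_EDGES);
  int new_bb = g->n_blocks++;
  struct toy_edge *e = &g->edges[i];
  struct toy_edge *fall = &g->edges[g->n_edges++];

  fall->src = new_bb;
  fall->dest = e->dest;
  fall->pred = NULL;		/* Unconditional fall-through.  */
  e->dest = new_bb;
  g->bb_pred[new_bb] = e->pred;
  return new_bb;
}

/* bb0: if (c) goto bb1; else goto bb2;   bb1: goto bb2;
   Edge 1 (bb0 -> bb2) is the only critical edge.  */
void build_example (struct toy_cfg *g)
{
  g->n_blocks = 3;
  g->n_edges = 3;
  g->edges[0] = (struct toy_edge) { 0, 1, "c" };
  g->edges[1] = (struct toy_edge) { 0, 2, "!c" };
  g->edges[2] = (struct toy_edge) { 1, 2, NULL };
  g->bb_pred[0] = NULL;
  g->bb_pred[1] = "c";
  g->bb_pred[2] = NULL;
}
```

Calling build_example and then split_edge on edge 1 leaves the toy graph
without critical edges, and the new block carries the predicate that was
previously attached to the edge. In this model, every remaining edge
predicate equals the predicate of the edge's source block, which is the
property the existing phi predication relies on.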

Richard.

> Could you please review it.
>
> Thanks.
>
> ChangeLog:
>
> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
> if destination block of edge is not always executed. Set-up predicate
> for critical edge.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if FLAG_FORCE_VECTORIZE was set-up.
> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
> (if_convertible_stmt_p): Fix up pre-function comments.
> (all_preds_critical_p): New function.
> (if_convertible_bb_p): Use call of all_preds_critical_p
> to reject temporarily block if-conversion with incoming critical edges
> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
> after adding support for extended predication.
> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
> to compute predicate instead of fold_build2_loc.
> Add zeroing of edge 'aux' field.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal to 2 and at least one incoming edge is not critical, the original
> algorithm is used.
> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
> Nullify 'aux' field of edges for blocks with two successors.
>
> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Richard,
>>
>> Thanks for your answer!
>>
>> In the current implementation, phi node conversion assumes that the bb
>> containing the given phi has at least one non-critical incoming
>> edge, and chooses it to insert predicated code. But if we choose a
>> critical edge we need to determine the insertion point and insertion
>> direction (before/after), since otherwise we can get invalid SSA
>> form (use before def). This is done by my new function which is not in
>> the current patch (I will present this patch later). So I assume that we
>> need to leave this patch as it is to not introduce new bugs.
>>
>> Thanks.
>> Yuri.
>>
>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I reworked the patch as you proposed, but I didn't understand what
>>>> you meant by:
>>>>
>>>>>So please rework the patch so critical edges are always handled
>>>>>correctly.
>>>>
>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>> critical incoming edges since support for extended predication of phi
>>>> nodes will be in next patch.
>>>
>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>> this patch this is fixed.  I see no reason to still reject this then even
>>> for !flag_force_vectorize.
>>>
>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>> is ok.
>>>
>>> Richard.
>>>
>>>> Could you please clarify your statement.
>>>>
>>>> I attached modified patch.
>>>>
>>>> ChangeLog:
>>>>
>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> (flag_force_vectorize): New variable.
>>>> (edge_predicate): New function.
>>>> (set_edge_predicate): New function.
>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>> if destination block of edge is not always executed. Set-up predicate
>>>> for critical edge.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>> to reject block if-conversion with incoming critical edges only if
>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>> to compute predicate instead of fold_build2_loc.
>>>> Add zeroing of edge 'aux' field.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>> algorithm is used.
>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>
>>>>
>>>>
>>>>
>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>> Could you please look at it (I have already sent the patch with
>>>>>> changes in add_to_predicate_list for review).
>>>>>
>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>> +             return false;
>>>>> +           }
>>>>> +
>>>>> +        }
>>>>>
>>>>> Excess vertical space.
>>>>>
>>>>>
>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>
>>>>> More than 1 predecessor?
>>>>>
>>>>> +   Returns false if at least one successor is not on critical edge
>>>>> +   and true otherwise.  */
>>>>> +
>>>>> +static inline bool
>>>>> +all_edges_are_critical (basic_block bb)
>>>>> +{
>>>>>
>>>>> "all_preds_critical_p" would be a better name
>>>>>
>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>> +    {
>>>>> +      if (!flag_force_vectorize)
>>>>> +       return false;
>>>>> +    }
>>>>>
>>>>> as I said in the last review I don't think we should restrict edge
>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>> if-conversion is magically more expensive for that case?
>>>>>
>>>>> So please rework the patch so critical edges are always handled
>>>>> correctly.
>>>>>
>>>>> Ok with that and the above suggested changes.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>> ChangeLog
>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>> Add zeroing of edge 'aux' field.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>> algorithm is used.
>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>
>>>>>>>> Second part of patch will be sent later.
>>>>>>>
>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>> more.
>>>>>>>
>>>>>>>  static inline void
>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>  {
>>>>>>> ...
>>>>>>>
>>>>>>> +      /* We use notion of cd equivalence to get simpler predicate for
>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>> +      if (dom_bb != loop->header
>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>> +       {
>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>
>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>
>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>> +
>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>
>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>> the case where e->src dominates e->dest).
>>>>>>>
>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>> for that).  So stuff like
>>>>>>>
>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>> +         if (flag_force_vectorize)
>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>
>>>>>>> shouldn't be necessary.
>>>>>>>
>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>
>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>> +      if (bb == loop->latch
>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>
>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>
>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>> +               _Bool _7, _8, _9;
>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _7 & _8;
>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>> +
>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>
>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>
>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>> +                                    unshare_expr (c2));
>>>>>>>
>>>>>>> why is it necessary to unshare c2?
>>>>>>>
>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>> that in detail).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>
>>>>>>>> Changelog.
>>>>>>>>
>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>> for join blocks if it exists.
>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>> for critical edge.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>> algorithm is used.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (find_insertion_point): New function.
>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>> that extended predication must be applied).
>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>> for blocks with two successors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Here is the reduced patch (part.1), which is now almost half the size.
>>>>>>>>> Let me also answer your comments.
>>>>>>>>>
>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>
>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>   else
>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>
>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>> local-dce function in next patch.
>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>> omp simd only.
>>>>>>>>>
>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>> for join blocks if it exists.
>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>> algorithm is used.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>> that extended predication must be applied).
>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>> for blocks with two successors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard!
>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>
>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>> be critical.
>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>> of kind
>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>> arguments.
>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>
>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>
>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>
>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>> it hard to review.
>>>>>>>>>>
>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>> all this added logic.
>>>>>>>>>>
>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>
>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>
>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>
>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>
>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>
>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>
>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>
>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>> predication to build mask.
>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>
>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>> include:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>
>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>
>>>>>>>>>>>> No?
>>>>>>>>>>>>
>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>    arguments are different and one of them has the only occurrence,
>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>
>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>
>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>
>>>>>>>>>>>> becomes
>>>>>>>>>>>>
>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>
>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>
>>>>>>>>>>>> and
>>>>>>>>>>>>
>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>
>>>>>>>>>>>> becomes
>>>>>>>>>>>>
>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>
>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>
>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>
>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>
>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
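
Editorial note on the multi-argument PHI predication discussed in the quoted thread. The following is only a minimal sketch in plain C, not code from the patch: the type phi_arg and the functions predicate_phi_chain and predicate_phi_single are hypothetical names introduced for illustration. It assumes each PHI argument carries the predicate of its incoming edge and that exactly one edge predicate is true on any execution. It shows that a PHI such as x = PHI <1(2), 1(3), 2(4)> can be lowered to one conditional expression, while an arbitrary PHI needs a chain of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one PHI argument plus the predicate of the edge
   it arrives on.  Exactly one edge_pred is assumed to be true.  */
struct phi_arg
{
  int value;
  bool edge_pred;
};

/* General case: a chain of conditional expressions,
   x = p0 ? v0 : (p1 ? v1 : ... : v_last).  The last argument needs no
   test because the predicates are assumed to be exhaustive.  */
static int
predicate_phi_chain (const struct phi_arg *args, size_t n)
{
  assert (n >= 1);
  int result = args[n - 1].value;
  for (size_t i = n - 1; i-- > 0;)
    result = args[i].edge_pred ? args[i].value : result;
  return result;
}

/* Special case: only two distinct values and one of them occurs once,
   at index UNIQUE_IDX.  A single conditional expression is enough.  */
static int
predicate_phi_single (const struct phi_arg *args, size_t n, size_t unique_idx)
{
  assert (n >= 2 && unique_idx < n);
  size_t other = unique_idx == 0 ? 1 : 0;
  return args[unique_idx].edge_pred ? args[unique_idx].value
                                    : args[other].value;
}
```

For the PHI above, predicate_phi_single would be called with unique_idx = 2 (the argument coming from block 4), and under the stated assumptions both functions select the same value.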

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 12:56               ` Richard Biener
@ 2014-10-21 13:26                 ` Yuri Rumyantsev
  2014-10-21 13:45                   ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-21 13:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

Yes, this patch does not make sense in isolation, since phi node
predication for a bb with only critical incoming edges is performed by
another function (predicate_extended_scalar_phi), which is absent here.

BTW I see that commit_edge_insertions() is used for rtx instructions
only but you propose to use it for tree also.
Did I miss something?

Thanks ahead.
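
Editorial note on the "critical incoming edges" condition used throughout this thread. The following is a minimal sketch in plain C under the usual definition of a critical edge (the source block has at least two successors and the destination block has at least two predecessors). The types block and cfg_edge and the functions edge_is_critical and all_preds_critical are hypothetical and only model the idea; they are not GCC's data structures or the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified CFG model: only the edge counts matter.  */
struct block
{
  int n_preds;
  int n_succs;
};

struct cfg_edge
{
  const struct block *src;
  const struct block *dest;
};

/* An edge is critical when its source has several successors and its
   destination has several predecessors.  Testing only the destination
   is not sufficient.  */
static bool
edge_is_critical (const struct cfg_edge *e)
{
  assert (e && e->src && e->dest);
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* Return true only if every incoming edge in PREDS is critical, i.e.
   false as soon as one predecessor edge is not critical.  */
static bool
all_preds_critical (const struct cfg_edge *preds, int n)
{
  assert (preds && n >= 1);
  for (int i = 0; i < n; i++)
    if (!edge_is_critical (&preds[i]))
      return false;
  return true;
}
```

In this model a join block reached from one conditional block and one fall-through block has a non-critical incoming edge, so all_preds_critical returns false for it.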


2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I did some changes in patch and ChangeLog to mark that support for
>> if-convert of blocks with only critical incoming edges will be added
>> in the future (more precisely, in patch.4).
>
> But the same reasoning applies to this version of the patch when
> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>
> Which means the patch doesn't make sense in isolation?
>
> Btw, I think for the case you should simply do gsi_insert_on_edge ()
> and commit_edge_insertions () before the call to combine_blocks
> (pushing the edge predicate to the newly created block).
>
> Richard.
>
>> Could you please review it.
>>
>> Thanks.
>>
>> ChangeLog:
>>
>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>> if destination block of edge is not always executed. Set-up predicate
>> for critical edge.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if FLAG_FORCE_VECTORIZE was set-up.
>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>> (if_convertible_stmt_p): Fix up pre-function comments.
>> (all_preds_critical_p): New function.
>> (if_convertible_bb_p): Use call of all_preds_critical_p
>> to reject temporarily block if-conversion with incoming critical edges
>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>> after adding support for extended predication.
>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>> to compute predicate instead of fold_build2_loc.
>> Add zeroing of edge 'aux' field.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal 2 and at least one incoming edge is not critical original
>> algorithm is used.
>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>> Nullify 'aux' field of edges for blocks with two successors.
>>
>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>> Richard,
>>>
>>> Thanks for your answer!
>>>
>>> In current implementation phi node conversion assume that one of
>>> incoming edge to bb containing given phi has at least one non-critical
>>> edge and choose it to insert predicated code. But if we choose
>>> critical edge we need to determine insert point and insertion
>>> direction (before/after) since in other case we can get invalid ssa
>>> form (use before def). This is done by my new function which is not in
>>> current patch (I will present this patch later). So I assume that we
>>> need to leave this patch as it is to not introduce new bugs.
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>> did you mean by:
>>>>>
>>>>>>So please rework the patch so critical edges are always handled
>>>>>>correctly.
>>>>>
>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>> critical incoming edges since support for extended predication of phi
>>>>> nodes will be in next patch.
>>>>
>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>> for !flag_force_vectorize.
>>>>
>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>> is ok.
>>>>
>>>> Richard.
>>>>
>>>>> Could you please clarify your statement.
>>>>>
>>>>> I attached modified patch.
>>>>>
>>>>> ChangeLog:
>>>>>
>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> (flag_force_vectorize): New variable.
>>>>> (edge_predicate): New function.
>>>>> (set_edge_predicate): New function.
>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>> for critical edge.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>> (all_edges_are_critical): New function.
>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>> to reject block if-conversion with incoming critical edges only if
>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>> to compute predicate instead of fold_build2_loc.
>>>>> Add zeroing of edge 'aux' field.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>> algorithm is used.
>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>> changes in add_to_predicate_list for review).
>>>>>>
>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>> +             return false;
>>>>>> +           }
>>>>>> +
>>>>>> +        }
>>>>>>
>>>>>> Excess vertical space.
>>>>>>
>>>>>>
>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>
>>>>>> More than 1 predecessor?
>>>>>>
>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>> +   and true otherwise.  */
>>>>>> +
>>>>>> +static inline bool
>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>> +{
>>>>>>
>>>>>> "all_preds_critical_p" would be a better name
>>>>>>
>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>> +    {
>>>>>> +      if (!flag_force_vectorize)
>>>>>> +       return false;
>>>>>> +    }
>>>>>>
>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>> if-conversion is magically more expensive for that case?
>>>>>>
>>>>>> So please rework the patch so critical edges are always handled
>>>>>> correctly.
>>>>>>
>>>>>> Ok with that and the above suggested changes.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>> ChangeLog
>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_edges_are_critical): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>
>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>
>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>> more.
>>>>>>>>
>>>>>>>>  static inline void
>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>  {
>>>>>>>> ...
>>>>>>>>
>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>> +       {
>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>
>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>
>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>> +
>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>
>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>
>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>> for that).  So stuff like
>>>>>>>>
>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>
>>>>>>>> shouldn't be necessary.
>>>>>>>>
>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>
>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>> +      if (bb == loop->latch
>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>
>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>
>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>> +
>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>
>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>
>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>
>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>
>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>> that in detail).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Changelog.
>>>>>>>>>
>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>> for join blocks if it exists.
>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>> that extended predication must be applied).
>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>> for blocks with two successors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Here is a reduced patch (part 1), which is now almost half the size.
>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>
>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>
>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>   else
>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>
>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>> omp simd only.
>>>>>>>>>>
>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard!
>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>> be critical.
>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>> of kind
>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>> arguments.
>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>
>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>
>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>> it hard to review.
>>>>>>>>>>>
>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>> all this added logic.
>>>>>>>>>>>
>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>
>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>
>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>
>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>
>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>
>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>
>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>
>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> No?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>    arguments are different and one of them has a single occurrence,
>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>
>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>
>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>
>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> and
>>>>>>>>>>>>>
>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>
>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>
>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 13:26                 ` Yuri Rumyantsev
@ 2014-10-21 13:45                   ` Richard Biener
  2014-10-21 14:01                     ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-21 13:45 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Yes, this patch does not make sense on its own, since phi node predication
> for a bb with only critical incoming edges is performed by another function,
> which is absent (predicate_extended_scalar_phi).
>
> BTW I see that commit_edge_insertions() is used for rtx instructions
> only but you propose to use it for tree also.
> Did I miss something?

Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
if you want easy access to the newly created basic block to push
the predicate to - see gsi_commit_edge_inserts implementation).
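
For illustration only, here is a standalone toy model of what the suggestion
above amounts to. It is not GCC code and does not use the GCC CFG API; the
toy_* names, the edge-array representation and the string predicates are all
assumptions made for this sketch. It shows the usual definition of a critical
edge (source with several successors, destination with several predecessors)
and how splitting such an edge yields a new block that can own the edge
predicate.

```c
#include <assert.h>

#define TOY_MAX_EDGES 16

/* Toy CFG: blocks are small integers, edges are (src, dest) pairs, and
   each edge carries its predicate as a plain string.  Hypothetical types,
   not GCC's basic_block/edge.  */
struct toy_edge { int src; int dest; const char *pred; };
struct toy_cfg
{
  struct toy_edge edges[TOY_MAX_EDGES];
  int n_edges;
  int n_blocks;
};

/* Number of outgoing edges of block BB.  */
int
toy_count_succs (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    if (g->edges[i].src == bb)
      n++;
  return n;
}

/* Number of incoming edges of block BB.  */
int
toy_count_preds (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    if (g->edges[i].dest == bb)
      n++;
  return n;
}

/* Edge E is critical when its source has more than one successor and
   its destination has more than one predecessor.  */
int
toy_edge_critical_p (const struct toy_cfg *g, int e)
{
  return toy_count_succs (g, g->edges[e].src) > 1
         && toy_count_preds (g, g->edges[e].dest) > 1;
}

/* Split edge E by routing it through a fresh block.  The new block runs
   exactly when E was taken, so its single outgoing edge carries the same
   predicate.  Returns the new block index, or -1 if the table is full.  */
int
toy_split_edge (struct toy_cfg *g, int e)
{
  if (g->n_edges >= TOY_MAX_EDGES)
    return -1;
  int new_bb = g->n_blocks++;
  struct toy_edge *fallthru = &g->edges[g->n_edges++];
  fallthru->src = new_bb;
  fallthru->dest = g->edges[e].dest;
  fallthru->pred = g->edges[e].pred;
  g->edges[e].dest = new_bb;
  return new_bb;
}
```

In this model, once the critical edge is split the predicate can live on the
new block like any other bb predicate, so no separate per-edge storage is
needed for it.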

Richard.

> Thanks ahead.
>
>
> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I did some changes in patch and ChangeLog to mark that support for
>>> if-convert of blocks with only critical incoming edges will be added
>>> in the future (more precise in patch.4).
>>
>> But the same reasoning applies to this version of the patch when
>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>
>> Which means the patch doesn't make sense in isolation?
>>
>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>> and commit_edge_insertions () before the call to combine_blocks
>> (pushing the edge predicate to the newly created block).
>>
>> Richard.
>>
>>> Could you please review it.
>>>
>>> Thanks.
>>>
>>> ChangeLog:
>>>
>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> (flag_force_vectorize): New variable.
>>> (edge_predicate): New function.
>>> (set_edge_predicate): New function.
>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>> if destination block of edge is not always executed. Set-up predicate
>>> for critical edge.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> if FLAG_FORCE_VECTORIZE was set-up.
>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>> (all_preds_critical_p): New function.
>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>> to reject temporarily block if-conversion with incoming critical edges
>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>> after adding support for extended predication.
>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>> to compute predicate instead of fold_build2_loc.
>>> Add zeroing of edge 'aux' field.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal 2 and at least one incoming edge is not critical original
>>> algorithm is used.
>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>> Nullify 'aux' field of edges for blocks with two successors.
>>>
>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>> Richard,
>>>>
>>>> Thanks for your answer!
>>>>
>>>> In the current implementation, phi node conversion assumes that the bb
>>>> containing the given phi has at least one non-critical incoming edge,
>>>> and chooses it to insert predicated code. But if we choose a
>>>> critical edge we need to determine the insertion point and insertion
>>>> direction (before/after), since otherwise we can get invalid SSA
>>>> form (use before def). This is done by my new function, which is not in
>>>> the current patch (I will present that patch later). So I assume that we
>>>> need to leave this patch as it is to avoid introducing new bugs.
>>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>> you meant by:
>>>>>>
>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>correctly.
>>>>>>
>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>> critical incoming edges since support for extended predication of phi
>>>>>> nodes will be in next patch.
>>>>>
>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>> for !flag_force_vectorize.
>>>>>
>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>> is ok.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Could you please clarify your statement?
>>>>>>
>>>>>> I attached modified patch.
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>> Add zeroing of edge 'aux' field.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>> algorithm is used.
>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>
>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>> +             return false;
>>>>>>> +           }
>>>>>>> +
>>>>>>> +        }
>>>>>>>
>>>>>>> Excess vertical space.
>>>>>>>
>>>>>>>
>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>
>>>>>>> More than 1 predecessor?
>>>>>>>
>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>> +   and true otherwise.  */
>>>>>>> +
>>>>>>> +static inline bool
>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>> +{
>>>>>>>
>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>
>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>> +    {
>>>>>>> +      if (!flag_force_vectorize)
>>>>>>> +       return false;
>>>>>>> +    }
>>>>>>>
>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>
>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>> correctly.
>>>>>>>
>>>>>>> Ok with that and the above suggested changes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>> ChangeLog
>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>> for critical edge.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>> algorithm is used.
>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>
>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>
>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>> more.
>>>>>>>>>
>>>>>>>>>  static inline void
>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>  {
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>> +       {
>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>
>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
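The simplification named in the quoted comment (taking p1 for the join block instead of p1 & p2 | p1 & !p2) can be checked exhaustively. The following is only a minimal sketch of that boolean identity; the function names are hypothetical and nothing here is taken from tree-if-conv.c.

```c
#include <stdbool.h>

/* Join-block predicate built as the OR of the two incoming
   predicates p1 & p2 and p1 & !p2.  */
bool
join_predicate_naive (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Join-block predicate taken directly from the control-equivalent
   block, whose predicate is p1.  */
bool
join_predicate_cd_equiv (bool p1, bool p2)
{
  (void) p2;  /* p2 does not matter once both arms are merged.  */
  return p1;
}

/* Return true if both forms agree for every input combination.  */
bool
join_predicates_agree (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      if (join_predicate_naive (p1, p2) != join_predicate_cd_equiv (p1, p2))
        return false;
  return true;
}
```

The shorter form avoids emitting the extra AND/OR statements for the join block, which appears to be the motivation stated in the hunk.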
>>>>>>>>>
>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>> +
>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>
>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>> the case where e->src dominates e->dest).
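To illustrate the question about the check: an edge is usually called critical when its source has more than one successor and its destination has more than one predecessor, so looking at the destination alone is not enough. The sketch below uses toy structures (toy_block and toy_edge are hypothetical stand-ins, not GCC's basic_block and edge) to contrast the two tests.

```c
#include <stdbool.h>

/* Toy stand-ins for a CFG block and edge; only the counts matter here.  */
struct toy_block
{
  int n_preds;
  int n_succs;
};

struct toy_edge
{
  const struct toy_block *src;
  const struct toy_block *dest;
};

/* Critical edge: source has several successors AND destination has
   several predecessors.  */
bool
toy_edge_critical_p (const struct toy_edge *e)
{
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* The test from the quoted hunk: looks only at the destination.  */
bool
toy_dest_has_many_preds_p (const struct toy_edge *e)
{
  return e->dest->n_preds >= 2;
}
```

For an edge from a block with a single successor into a join block, the second test holds while the first does not, which is the case the review comment points at.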
>>>>>>>>>
>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>> for that).  So stuff like
>>>>>>>>>
>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>
>>>>>>>>> shouldn't be necessary.
>>>>>>>>>
>>>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>
>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>
>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>
>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>> +
>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>
>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>
>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>
>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>
>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>> that in detail).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Changelog.
>>>>>>>>>>
>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> here is the reduced patch (part 1), which is now almost half the size.
>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>
>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>
>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>   else
>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>
>>>>>>>>>>> 2. I completely deleted all code related to creation of conditional
>>>>>>>>>>> expressions and completely rely on bool pattern recognition in the
>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>> local-dce function in the next patch.
>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>> phi-nodes with two arguments only, for which conversion of conditional
>>>>>>>>>>> scalar reduction can also be applied.
>>>>>>>>>>> Note that all these changes are applied to loops marked with pragma
>>>>>>>>>>> omp simd only.
>>>>>>>>>>>
>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>> 5. I decided not to design a pre-pass since it would lead to generating
>>>>>>>>>>>>> a chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>>>>>>> of the kind
>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>> only one cond expression is required, and this is considered a simple
>>>>>>>>>>>>> optimization for an arbitrary phi-function. More precisely,
>>>>>>>>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>>>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>> For an arbitrary phi function a chain of cond expressions is produced.
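The "two different arguments, one with a single occurrence" test described in item 5 could be sketched roughly as follows. This is an illustration only: PHI arguments are modelled as plain ints, and phi_two_values_one_single is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if ARGS (N entries) holds exactly two distinct values and
   one of them occurs exactly once.  On success store the index of the
   singly-occurring argument in *SINGLE_IDX.  */
bool
phi_two_values_one_single (const int *args, size_t n, size_t *single_idx)
{
  assert (single_idx != NULL);
  if (n < 2)
    return false;

  int v0 = args[0], v1 = 0;
  bool have_v1 = false;
  size_t cnt0 = 0, cnt1 = 0, idx0 = 0, idx1 = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == v0)
        {
          cnt0++;
          idx0 = i;
        }
      else if (!have_v1)
        {
          v1 = args[i];
          have_v1 = true;
          cnt1 = 1;
          idx1 = i;
        }
      else if (args[i] == v1)
        {
          cnt1++;
          idx1 = i;
        }
      else
        return false;  /* A third distinct value: needs a cond chain.  */
    }

  if (!have_v1)
    return false;      /* All arguments are equal.  */
  if (cnt0 == 1)
    {
      *single_idx = idx0;
      return true;
    }
  if (cnt1 == 1)
    {
      *single_idx = idx1;
      return true;
    }
  return false;
}
```

For x = PHI <1(2), 1(3), 2(4)> this reports the third argument as the single one, so a single cond expression keyed on that edge's predicate would suffice.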
>>>>>>>>>>>>>
>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>
>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>> all this added logic?
>>>>>>>>>>>>
>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>
>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>
>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>
>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>
>>>>>>>>>>>> I still think that a utility doing same PHI arg merging by introducing
>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>
>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in the form of
>>>>>>>>>>>>>>> extended if-conversion for loops with such a pragma. These extensions
>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>    with some limitations:
>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>    arguments are different and one of them has a single occurrence,
>>>>>>>>>>>>>>>    transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>    corresponding to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>    check starting from end that two edges corresponding to neighbor
>>>>>>>>>>>>>>>    arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>    with next edge.
>>>>>>>>>>>>>>>    These guarantee that phi predication will produce the correct result.
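The chain of COND_EXPRs described in item 4 can be pictured as a nested select over the edge predicates. The sketch below is only a scalar model under the assumption that the predicates are pairwise disjoint and exactly one of them holds; phi_as_cond_chain is a hypothetical name and the arguments are modelled as ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model x = PHI <args[0](e0), ..., args[n-1](e(n-1))> as
     preds[0] ? args[0] : (preds[1] ? args[1] : ... : args[n-1]).
   The last argument acts as the default, so its predicate is never
   tested.  Assumes the predicates are disjoint and one of them holds.  */
int
phi_as_cond_chain (const int *args, const bool *preds, size_t n)
{
  assert (n >= 1);
  int result = args[n - 1];
  /* Build the chain from the innermost select outwards.  */
  for (size_t i = n - 1; i-- > 0;)
    result = preds[i] ? args[i] : result;
  return result;
}
```

With arguments {1, 2, 3} and only the second predicate true, the chain selects 2. The disjointness requirement in the text is what makes the order of the selects irrelevant.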
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>     goto <bb 4>;
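As a rough scalar model of what the dump above does, the nested conditions can be rewritten as straight-line selects. The sketch below is an assumption-laden illustration, not the compiler's output: count_branchy and count_predicated are hypothetical names, arrays are passed as parameters, and a conditional read stands in for MASK_LOAD.

```c
#include <stddef.h>

/* The loop body as written in the source example, with branches.  */
int
count_branchy (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      if ((t > 0) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* The same computation with the control flow replaced by selects.  */
int
count_predicated (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      int p = (t > 0) & (t < 1.0e+17f);  /* Predicate of the outer if.  */
      int ci = p ? c[i] : 0;             /* Stand-in for MASK_LOAD.  */
      res += (ci != 0) ? 1 : 0;          /* Predicated increment.  */
    }
  return res;
}
```

Both functions should return the same count for any input; the second form has a single basic block in the loop body, which is the shape the vectorizer needs.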
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>> with integral operands. If convert_bool argument is false or both
>>>>>>>>>>>>>>> outgoing edges are not critical, old algorithm of predicate assignments
>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicability
>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>> compute predicates for both outgoing edges, one of which is critical,
>>>>>>>>>>>>>>> using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.


* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 13:45                   ` Richard Biener
@ 2014-10-21 14:01                     ` Yuri Rumyantsev
  2014-10-21 14:11                       ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-21 14:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

I looked at the sources of these functions, but I can't understand why I
should use something else. Note that all predicate computations are
located in basic blocks (by design of if-conv) and there is a special
function that puts these computations into the bb
(insert_gimplified_predicates). An edge contains only the predicate, not
its computations. The new function, find_insertion_point(), does a very
simple search: it finds the latest (in the current bb) operand def-stmt
of the predicates taken from all incoming edges.
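
To make the search concrete, here is a minimal standalone sketch. It is
not the code from the patch and does not use GCC's data structures: the
type operand_def and the function latest_def_in_bb are hypothetical names
invented for illustration, and statements are modelled simply as
(block, position) pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SSA operand of a predicate: where it is defined.
   Hypothetical type, not a GCC structure.  */
struct operand_def
{
  int bb;   /* index of the block holding the defining statement */
  int pos;  /* position of that statement inside its block */
};

/* Return the position of the latest defining statement located in block BB
   among the N operands in DEFS, or -1 if no operand is defined in BB
   (meaning: insert at the start of the block).  */
static int
latest_def_in_bb (const struct operand_def *defs, size_t n, int bb)
{
  int latest = -1;

  assert (defs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (defs[i].bb == bb && defs[i].pos > latest)
      latest = defs[i].pos;
  return latest;
}
```

Given operands defined at positions 2, 5 and 1 of block 4 and one more
operand defined in block 3, the search for block 4 yields position 5, so
the predicated code would go after that statement. For a block that holds
none of the definitions the result is -1, i.e. the block start.
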
In the original algorithm the predicate of a non-critical edge is taken
to perform phi-node predication, since for a critical edge it does not
work properly.
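
The choice between the two kinds of edges can be sketched as follows. This
is a toy model that mirrors the fragment quoted further down in this
thread (bb_predicate for a non-critical edge, the edge 'aux' field for a
critical one); toy_bb, toy_edge, edge_critical_p and predicate_for_edge
are hypothetical names, and predicates are plain strings rather than
trees. An edge is treated as critical when its source has several
successors and its destination has several predecessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified block: only the data the check needs.  */
struct toy_bb
{
  int n_succs;            /* number of outgoing edges */
  int n_preds;            /* number of incoming edges */
  const char *predicate;  /* predicate of the block itself */
};

/* Hypothetical, simplified edge; AUX plays the role of the edge 'aux'
   field that keeps the predicate of a critical edge.  */
struct toy_edge
{
  const struct toy_bb *src;
  const struct toy_bb *dest;
  const char *aux;
};

/* An edge is critical if its source has more than one successor and its
   destination has more than one predecessor.  */
static int
edge_critical_p (const struct toy_edge *e)
{
  return e->src->n_succs > 1 && e->dest->n_preds > 1;
}

/* Non-critical edge: use the predicate of the source block.
   Critical edge: use the predicate saved on the edge itself.  */
static const char *
predicate_for_edge (const struct toy_edge *e)
{
  assert (e != NULL && e->src != NULL && e->dest != NULL);
  if (!edge_critical_p (e))
    return e->src->predicate;
  return e->aux;
}
```

In this model an edge from a two-successor block into a two-predecessor
join block takes its predicate from 'aux', while an edge into a block
with a single predecessor falls back to the predicate of the source
block.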

My question is: do your comments mean that I should re-design my extensions?

Thanks.
Yuri.

BTW Jeff did an initial review of my changes related to predicate
computation for join blocks. I sent him an updated patch with a
test-case and some minor changes, but I still have not got any
feedback on it. Could you please take a look at it as well?


2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Yes, this patch does not make sense, since phi node predication for a bb
>> with only critical incoming edges is performed by another function, which
>> is absent (predicate_extended_scalar_phi).
>>
>> BTW I see that commit_edge_insertions() is used for rtx instructions
>> only but you propose to use it for tree also.
>> Did I miss something?
>
> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
> if you want easy access to the newly created basic block to push
> the predicate to - see gsi_commit_edge_inserts implementation).
>
> Richard.
>
>> Thanks ahead.
>>
>>
>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I made some changes in the patch and ChangeLog to mark that support for
>>>> if-conversion of blocks with only critical incoming edges will be added
>>>> in the future (more precisely, in patch.4).
>>>
>>> But the same reasoning applies to this version of the patch when
>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>
>>> Which means the patch doesn't make sense in isolation?
>>>
>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>> and commit_edge_insertions () before the call to combine_blocks
>>> (pushing the edge predicate to the newly created block).
>>>
>>> Richard.
>>>
>>>> Could you please review it.
>>>>
>>>> Thanks.
>>>>
>>>> ChangeLog:
>>>>
>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> (flag_force_vectorize): New variable.
>>>> (edge_predicate): New function.
>>>> (set_edge_predicate): New function.
>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>> if destination block of edge is not always executed. Set-up predicate
>>>> for critical edge.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>> (all_preds_critical_p): New function.
>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>> to reject temporarily block if-conversion with incoming critical edges
>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>> after adding support for extended predication.
>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>> to compute predicate instead of fold_build2_loc.
>>>> Add zeroing of edge 'aux' field.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal 2 and at least one incoming edge is not critical original
>>>> algorithm is used.
>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>
>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>> Richard,
>>>>>
>>>>> Thanks for your answer!
>>>>>
>>>>> In the current implementation, phi node conversion assumes that the bb
>>>>> containing the given phi has at least one non-critical incoming edge
>>>>> and chooses it to insert predicated code. But if we choose a
>>>>> critical edge we need to determine the insertion point and insertion
>>>>> direction (before/after), since otherwise we can get invalid SSA
>>>>> form (use before def). This is done by my new function, which is not in
>>>>> the current patch (I will present this patch later). So I assume that we
>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>> you meant by:
>>>>>>>
>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>correctly.
>>>>>>>
>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>> nodes will be in next patch.
>>>>>>
>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>> for !flag_force_vectorize.
>>>>>>
>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>> is ok.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Could you please clarify your statement.
>>>>>>>
>>>>>>> I attached modified patch.
>>>>>>>
>>>>>>> ChangeLog:
>>>>>>>
>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_edges_are_critical): New function.
>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>> Could you please look at it (I have already sent the patch with
>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>
>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>> +             return false;
>>>>>>>> +           }
>>>>>>>> +
>>>>>>>> +        }
>>>>>>>>
>>>>>>>> Excess vertical space.
>>>>>>>>
>>>>>>>>
>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>
>>>>>>>> More than 1 predecessor?
>>>>>>>>
>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>> +   and true otherwise.  */
>>>>>>>> +
>>>>>>>> +static inline bool
>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>> +{
>>>>>>>>
>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>
>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>> +    {
>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>> +       return false;
>>>>>>>> +    }
>>>>>>>>
>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>
>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>> correctly.
>>>>>>>>
>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Yuri.
>>>>>>>>> ChangeLog
>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>
>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>
>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>> more.
>>>>>>>>>>
>>>>>>>>>>  static inline void
>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>  {
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>> +       {
>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>
>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>
>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>> +
>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>
>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>
>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>
>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>
>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>
>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>
>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>
>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>
>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>> +
>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>
>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>
>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>
>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>
>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>> that in detail).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Changelog.
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the reduced patch (part.1), which has shrunk to almost half its size.
>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>
>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>   else
>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>
>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precisely,
>>>>>>>>>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>>>>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>
>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>    arguments are different and one of them has only one occurrence,
>>>>>>>>>>>>>>>> transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.


* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 14:01                     ` Yuri Rumyantsev
@ 2014-10-21 14:11                       ` Richard Biener
  2014-10-21 14:20                         ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-21 14:11 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I saw the sources of these functions, but I can't understand why I
> should use something else? Note that all predicate computations are
> located in basic blocks (by design of if-conv) and there is a special
> function that puts these computations in bb
> (insert_gimplified_predicates). Edge contains only predicate not its
> computations. New function - find_insertion_point() does very simple
> search - it finds out the latest (in current bb) operand def-stmt of
> predicates taken from all incoming edges.
> In original algorithm the predicate of non-critical edge is taken to
> perform phi-node predication since for critical edge it does not work
> properly.
>
> My question is: do your comments mean that I should re-design my extensions?

Well, we have infrastructure for inserting code on edges and you've
made critical edges predicated correctly.  So why re-invent the wheel?
I realize this is very similar to my initial suggestion to simply split
critical edges in loops you want to if-convert but delays splitting
until it turns out to be necessary (which might be good for the
!force_vect case).
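
To make the "split critical edges" option concrete, here is a minimal
sketch on a toy CFG.  It does not use GCC's CFG data structures: struct
toy_cfg, toy_edge_critical_p and toy_split_edge are hypothetical names
made up for illustration only.  It only shows the usual definition (the
source has more than one successor and the destination has more than one
predecessor) and that routing the edge through a forwarder block removes
the criticality.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Toy CFG (an assumption for illustration, not GCC's representation):
   edge[s][d] is true when there is an edge s -> d.  */
struct toy_cfg
{
  int n_blocks;
  bool edge[MAX_BLOCKS][MAX_BLOCKS];
};

/* Number of outgoing edges of block BB.  */
static int
succ_count (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int d = 0; d < g->n_blocks; d++)
    n += g->edge[bb][d];
  return n;
}

/* Number of incoming edges of block BB.  */
static int
pred_count (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int s = 0; s < g->n_blocks; s++)
    n += g->edge[s][bb];
  return n;
}

/* An edge is critical when its source has several successors and its
   destination has several predecessors.  Counting the predecessors of
   the destination alone is not enough.  */
static bool
toy_edge_critical_p (const struct toy_cfg *g, int src, int dest)
{
  return g->edge[src][dest]
	 && succ_count (g, src) > 1
	 && pred_count (g, dest) > 1;
}

/* Split SRC -> DEST by routing it through a new forwarder block and
   return the index of that block.  */
static int
toy_split_edge (struct toy_cfg *g, int src, int dest)
{
  assert (g->edge[src][dest] && g->n_blocks < MAX_BLOCKS);
  int fwd = g->n_blocks++;
  g->edge[src][dest] = false;
  g->edge[src][fwd] = true;
  g->edge[fwd][dest] = true;
  return fwd;
}
```

With blocks 0 -> 1, 0 -> 2 and 1 -> 2, only 0 -> 2 is critical; 1 -> 2
is not, although block 2 has two predecessors.  After splitting 0 -> 2
neither of the two new edges is critical, so a predicate computation
would have a block of its own to live in.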

For edge predicates you simply can emit their computation on the
edge, no?

Btw, I very originally suggested to rework if-conversion to only
record edge predicates - having both block and edge predicates
somewhat complicates the code and makes it harder to
maintain (thus also the suggestion to simply split critical edges
if necessary to make BB predicates work always).
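
As a rough illustration of recording only edge predicates, the sketch
below derives the predicate of a join block as the disjunction of the
predicates on its incoming edges.  It is a toy model, not tree-if-conv.c
code: pred_t, the PRED_* masks and join_block_predicate are hypothetical,
and predicates are encoded as truth tables over two conditions p1 and p2
instead of trees.  It reproduces the join-block case from the patch
comment quoted below, where p1 & p2 | p1 & !p2 is just p1.

```c
#include <assert.h>

/* A predicate over two conditions p1 and p2, stored as a 4-bit truth
   table: bit (2 * p1 + p2) is set when the predicate holds.  */
typedef unsigned pred_t;

#define PRED_TRUE 0xFu
#define PRED_P1   0xCu	/* Holds when p1 is true: bits 2 and 3.  */
#define PRED_P2   0xAu	/* Holds when p2 is true: bits 1 and 3.  */

static pred_t
pred_not (pred_t a)
{
  return ~a & PRED_TRUE;
}

static pred_t
pred_and (pred_t a, pred_t b)
{
  return a & b;
}

static pred_t
pred_or (pred_t a, pred_t b)
{
  return a | b;
}

/* The predicate of a join block is the disjunction of the predicates
   recorded on its N incoming edges.  */
static pred_t
join_block_predicate (const pred_t *edge_preds, int n)
{
  pred_t p = 0;
  for (int i = 0; i < n; i++)
    p = pred_or (p, edge_preds[i]);
  return p;
}
```

Feeding it the two edge predicates p1 & p2 and p1 & !p2 yields the same
truth table as p1, so in this model no separate block predicate has to
be stored.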

Your patches add a lot of code and to me it seems we can avoid
doing so much special casing.

Richard.

> Thanks.
> Yuri.
>
> BTW Jeff did initial review of my changes related to predicate
> computation for join blocks. I presented him updated patch with
> test-case and some minor changes in patch. But still did not get any
> feedback on it. Could you please take a look also on it?
>
>
> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Yes, This patch does not make sense since phi node predication for bb
>>> with critical incoming edges only performs another function which is
>>> absent (predicate_extended_scalar_phi).
>>>
>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>> only but you propose to use it for tree also.
>>> Did I miss something?
>>
>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>> if you want easy access to the newly created basic block to push
>> the predicate to - see gsi_commit_edge_inserts implementation).
>>
>> Richard.
>>
>>> Thanks ahead.
>>>
>>>
>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>> in the future (more precisely, in patch.4).
>>>>
>>>> But the same reasoning applies to this version of the patch when
>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>
>>>> Which means the patch doesn't make sense in isolation?
>>>>
>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>> and commit_edge_insertions () before the call to combine_blocks
>>>> (pushing the edge predicate to the newly created block).
>>>>
>>>> Richard.
>>>>
>>>>> Could you please review it.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> ChangeLog:
>>>>>
>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> (flag_force_vectorize): New variable.
>>>>> (edge_predicate): New function.
>>>>> (set_edge_predicate): New function.
>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>> for critical edge.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>> (all_preds_critical_p): New function.
>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>> after adding support for extended predication.
>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>> to compute predicate instead of fold_build2_loc.
>>>>> Add zeroing of edge 'aux' field.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>> algorithm is used.
>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>
>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>> Richard,
>>>>>>
>>>>>> Thanks for your answer!
>>>>>>
>>>>>> In current implementation phi node conversion assume that one of
>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>> critical edge we need to determine insert point and insertion
>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>> form (use before def). This is done by my new function which is not in
>>>>>> current patch (I will present this patch later). So I assume that we
>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>> did you mean by:
>>>>>>>>
>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>correctly.
>>>>>>>>
>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>> nodes will be in next patch.
>>>>>>>
>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>> for !flag_force_vectorize.
>>>>>>>
>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>> is ok.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Could you please clarify your statement.
>>>>>>>>
>>>>>>>> I attached modified patch.
>>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>>
>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>> for critical edge.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>> algorithm is used.
>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>> Could you please look at it (I have already sent the patch with
>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>
>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>> +             return false;
>>>>>>>>> +           }
>>>>>>>>> +
>>>>>>>>> +        }
>>>>>>>>>
>>>>>>>>> Excess vertical space.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>
>>>>>>>>> More than 1 predecessor?
>>>>>>>>>
>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>> +   and true otherwise.  */
>>>>>>>>> +
>>>>>>>>> +static inline bool
>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>> +{
>>>>>>>>>
>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>
>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>> +    {
>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>> +       return false;
>>>>>>>>> +    }
>>>>>>>>>
>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>
>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>> correctly.
>>>>>>>>>
>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Yuri.
>>>>>>>>>> ChangeLog
>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>
>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>
>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>> more.
>>>>>>>>>>>
>>>>>>>>>>>  static inline void
>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>  {
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> +      /* We use notion of cd equivalence to get simpler predicate for
>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>> +       {
>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>
>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>
>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>> +
>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>
>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>
>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>
>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>
>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>
>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>
>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>
>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>
>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>> +
>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>
>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>
>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>
>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>
>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>> that in detail).
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> here is the reduced patch (part 1), which is now almost half the size.
>>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>   else
>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>> all this added logic?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurrence,
>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
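
The PHI handling discussed in this thread (a single COND_EXPR when only two
distinct arguments occur and one of them occurs once, otherwise a chain of
COND_EXPRs) can be illustrated outside GCC.  The sketch below is standalone
C, not code from the patch: struct phi_arg and the phi_* helpers are
hypothetical names, and it assumes the edge predicates are disjoint with
exactly one of them true.

```c
#include <assert.h>
#include <stddef.h>

/* One PHI argument: the incoming value and the predicate of its edge.
   Hypothetical type, used only for this illustration.  */
struct phi_arg
{
  int value;
  int pred;
};

/* General case: build the result as a chain of selects, starting from
   the last argument, e.g. p0 ? v0 : (p1 ? v1 : v2).  */
static int
phi_select_chain (const struct phi_arg *args, size_t n)
{
  assert (n > 0);
  int result = args[n - 1].value;
  for (size_t i = n - 1; i-- > 0;)
    result = args[i].pred ? args[i].value : result;
  return result;
}

/* If the PHI has exactly two distinct values and one of them occurs
   exactly once, return the index of that single occurrence, else -1.  */
static int
phi_single_occurrence_arg (const struct phi_arg *args, size_t n)
{
  if (n < 2)
    return -1;
  int a = args[0].value, b = 0, have_b = 0;
  size_t count_a = 0, count_b = 0, last_a = 0, last_b = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (args[i].value == a)
        {
          count_a++;
          last_a = i;
        }
      else if (!have_b || args[i].value == b)
        {
          b = args[i].value;
          have_b = 1;
          count_b++;
          last_b = i;
        }
      else
        return -1;              /* A third distinct value.  */
    }
  if (!have_b)
    return -1;
  if (count_b == 1)
    return (int) last_b;
  if (count_a == 1)
    return (int) last_a;
  return -1;
}

/* Special case: one select on the predicate of the unique argument.  */
static int
phi_select_single (const struct phi_arg *args, size_t n, int unique)
{
  assert (n >= 2 && unique >= 0 && (size_t) unique < n);
  size_t other = unique == 0 ? 1 : 0;
  return args[unique].pred ? args[unique].value : args[other].value;
}
```

For x = PHI <1(2), 1(3), 2(4)> the unique argument is the third one, so a
single select on the predicate of edge 4 is enough; for
PHI <1(2), 2(3), 3(4)> the helper returns -1 and the chain form is needed.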

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 14:11                       ` Richard Biener
@ 2014-10-21 14:20                         ` Richard Biener
  2014-10-21 14:36                           ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-21 14:20 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I saw the sources of these functions, but I can't understand why I
>> should use something else? Note that all predicate computations are
>> located in basic blocks (by design of if-conv) and there is a special
>> function that puts these computations in a bb
>> (insert_gimplified_predicates). An edge contains only a predicate, not its
>> computations. New function - find_insertion_point() does very simple
>> search - it finds out the latest (in current bb) operand def-stmt of
>> predicates taken from all incoming edges.
>> In original algorithm the predicate of non-critical edge is taken to
>> perform phi-node predication since for critical edge it does not work
>> properly.
>>
>> My question is: do your comments mean that I should re-design my extensions?
>
> Well, we have infrastructure for inserting code on edges and you've
> made critical edges predicated correctly.  So why re-invent the wheel?
> I realize this is very similar to my initial suggestion to simply split
> critical edges in loops you want to if-convert but delays splitting
> until it turns out to be necessary (which might be good for the
> !force_vect case).
>
> For edge predicates you simply can emit their computation on the
> edge, no?
>
> Btw, I very originally suggested to rework if-conversion to only
> record edge predicates - having both block and edge predicates
> somewhat complicates the code and makes it harder to
> maintain (thus also the suggestion to simply split critical edges
> if necessary to make BB predicates work always).
>
> Your patches add a lot of code and to me it seems we can avoid
> doing so much special casing.

For example attacking the critical edge issue by a simple

Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c      (revision 216508)
+++ tree-if-conv.c      (working copy)
@@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
        if (EDGE_COUNT (e->src->succs) == 1)
          found = true;
       if (!found)
-       {
-         if (dump_file && (dump_flags & TDF_DETAILS))
-           fprintf (dump_file, "only critical predecessors\n");
-         return false;
-       }
+       split_edge (EDGE_PRED (bb, 0));
     }

   return true;

it changes the number of blocks in the loop, so
get_loop_body_in_if_conv_order should probably be re-done with the
above eventually signalling that it created a new block.  Or the above
should populate a vector of edges to split and do that after the
loop calling if_convertible_bb_p.
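
The second option (collect the edges first, split them after the scan) can
be sketched on a toy CFG.  This is a standalone illustration under
simplifying assumptions, not GCC code: toy_cfg, toy_split_edge and the
other names are hypothetical stand-ins for GCC's basic_block, edge and
split_edge.

```c
#include <assert.h>

#define MAX_BLOCKS 32
#define MAX_EDGES 64

/* Toy CFG: blocks are numbered 0..n_blocks-1, edges live in a flat list.  */
struct toy_edge { int src, dest; };

struct toy_cfg
{
  int n_blocks;
  int n_edges;
  struct toy_edge edges[MAX_EDGES];
};

static int
succ_count (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    n += g->edges[i].src == bb;
  return n;
}

static int
pred_count (const struct toy_cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    n += g->edges[i].dest == bb;
  return n;
}

/* An edge is critical if its source has more than one successor and
   its destination has more than one predecessor.  */
static int
edge_critical_p (const struct toy_cfg *g, int e)
{
  return succ_count (g, g->edges[e].src) > 1
         && pred_count (g, g->edges[e].dest) > 1;
}

/* Nonzero if BB has predecessors and every incoming edge is critical.  */
static int
all_preds_critical_p (const struct toy_cfg *g, int bb)
{
  int seen = 0;
  for (int i = 0; i < g->n_edges; i++)
    if (g->edges[i].dest == bb)
      {
        if (!edge_critical_p (g, i))
          return 0;
        seen = 1;
      }
  return seen;
}

/* Route edge E through a fresh block and return that block.  */
static int
toy_split_edge (struct toy_cfg *g, int e)
{
  assert (g->n_blocks < MAX_BLOCKS && g->n_edges < MAX_EDGES);
  int new_bb = g->n_blocks++;
  int old_dest = g->edges[e].dest;
  g->edges[e].dest = new_bb;
  g->edges[g->n_edges].src = new_bb;
  g->edges[g->n_edges].dest = old_dest;
  g->n_edges++;
  return new_bb;
}

/* Phase 1 scans the unmodified CFG and records one incoming edge per
   block that has only critical predecessors; phase 2 splits them.
   Returns the number of blocks created.  */
static int
split_only_critical_preds (struct toy_cfg *g)
{
  int to_split[MAX_BLOCKS];
  int n = 0;
  for (int bb = 0; bb < g->n_blocks; bb++)
    if (all_preds_critical_p (g, bb))
      for (int i = 0; i < g->n_edges; i++)
        if (g->edges[i].dest == bb)
          {
            to_split[n++] = i;
            break;
          }
  for (int i = 0; i < n; i++)
    toy_split_edge (g, to_split[i]);
  return n;
}

/* Example: block 3 is reached only via the critical edges 0->3 and 1->3.  */
static struct toy_cfg
example_cfg (void)
{
  struct toy_cfg g
    = { 5, 6, { {0, 1}, {0, 3}, {1, 2}, {1, 3}, {2, 4}, {3, 4} } };
  return g;
}
```

Because the scan finishes before any edge is split, the block order computed
for the scan is never invalidated mid-walk; the caller only has to recompute
it once if the returned count is nonzero.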

Richard.

> Richard.
>
>> Thanks.
>> Yuri.
>>
>> BTW Jeff did initial review of my changes related to predicate
>> computation for join blocks. I presented him updated patch with
>> test-case and some minor changes in patch. But still did not get any
>> feedback on it. Could you please take a look also on it?
>>
>>
>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Yes, this patch does not make sense since phi node predication for bb
>>>> with critical incoming edges only performs another function which is
>>>> absent (predicate_extended_scalar_phi).
>>>>
>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>> only but you propose to use it for tree also.
>>>> Did I miss something?
>>>
>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>> if you want easy access to the newly created basic block to push
>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>
>>> Richard.
>>>
>>>> Thanks ahead.
>>>>
>>>>
>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I made some changes to the patch and ChangeLog to mark that support for
>>>>>> if-conversion of blocks with only critical incoming edges will be added
>>>>>> in the future (more precisely, in patch.4).
>>>>>
>>>>> But the same reasoning applies to this version of the patch when
>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>
>>>>> Which means the patch doesn't make sense in isolation?
>>>>>
>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>> (pushing the edge predicate to the newly created block).
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Could you please review it.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_preds_critical_p): New function.
>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>> after adding support for extended predication.
>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>> Add zeroing of edge 'aux' field.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>> algorithm is used.
>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>
>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Thanks for your answer!
>>>>>>>
>>>>>>> In the current implementation, phi node conversion assumes that the bb
>>>>>>> containing the given phi has at least one non-critical incoming edge
>>>>>>> and chooses it to insert predicated code. But if we choose a
>>>>>>> critical edge we need to determine the insertion point and insertion
>>>>>>> direction (before/after), since otherwise we can get invalid SSA
>>>>>>> form (use before def). This is done by my new function, which is not in
>>>>>>> the current patch (I will present that patch later). So I assume that we
>>>>>>> need to leave this patch as it is so as not to introduce new bugs.
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>> you meant by:
>>>>>>>>>
>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>correctly.
>>>>>>>>>
>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>> nodes will be in next patch.
>>>>>>>>
>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>> for !flag_force_vectorize.
>>>>>>>>
>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>> is ok.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>
>>>>>>>>> I attached modified patch.
>>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Here is the reduced patch as you requested. All your remarks have been addressed.
>>>>>>>>>>> Could you please look at it (I have already sent the patch with
>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>
>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>> +             return false;
>>>>>>>>>> +           }
>>>>>>>>>> +
>>>>>>>>>> +        }
>>>>>>>>>>
>>>>>>>>>> Excess vertical space.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>
>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>
>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>> +
>>>>>>>>>> +static inline bool
>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>> +{
>>>>>>>>>>
>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>
>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>> +    {
>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>> +       return false;
>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>
>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>> correctly.
>>>>>>>>>>
>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Yuri.
>>>>>>>>>>> ChangeLog
>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>> more.
>>>>>>>>>>>>
>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>  {
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>> +       {
>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>
>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>> +
>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>
>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>
>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>
>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>
>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>
>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>
>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>
>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>
>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>
>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> here is the reduced patch (part.1), which is now almost half the size.
>>>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>> 5. I decided not to design a pre-pass since it would lead to generating a
>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>>>>>>>>>> of the kind
>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>> only one cond expression is required; this is treated as a simple
>>>>>>>>>>>>>>>> optimization for arbitrary phi-functions. More precisely,
>>>>>>>>>>>>>>>> if a phi-function has only two different arguments and one of them has a
>>>>>>>>>>>>>>>> single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>> For an arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in the form of
>>>>>>>>>>>>>>>>>> extended if-conversion for loops marked with that pragma. These
>>>>>>>>>>>>>>>>>> extensions include:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if the considered loop or its outer
>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize). For ordinary
>>>>>>>>>>>>>>>>>>    loops the behavior is unchanged.
>>>>>>>>>>>>>>>>>> 2. Removed the cfg restriction on basic blocks, which can now have more
>>>>>>>>>>>>>>>>>>    than 2 predecessors.
>>>>>>>>>>>>>>>>>> 3. Added a restriction on phi nodes which was missing in the current design:
>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic blocks to conform to the
>>>>>>>>>>>>>>>>>>    semantics of COND_EXPR, which is used for the transformation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: a phi may have more than 2 arguments,
>>>>>>>>>>>>>>>>>>    with some limitations:
>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>      different arguments, one of which occurs only once, the
>>>>>>>>>>>>>>>>>>      transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>    - if the phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>      corresponding to the phi arguments are disjoint, a chain of COND_EXPRs
>>>>>>>>>>>>>>>>>>      will be generated for it. In the current design a very simple check is
>>>>>>>>>>>>>>>>>>      used: starting from the end, check that the two edges corresponding to
>>>>>>>>>>>>>>>>>>      neighboring arguments have a common predecessor, which is then used
>>>>>>>>>>>>>>>>>>      for the further check with the next edge.
>>>>>>>>>>>>>>>>>>    These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is an example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check for applicability
>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 14:20                         ` Richard Biener
@ 2014-10-21 14:36                           ` Yuri Rumyantsev
  2014-10-24  9:14                             ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-21 14:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

In my initial design I did such splitting before starting the real
if-conversion, but I decided not to perform it since the code size of the
if-converted loop grows (the number of phi nodes increases). It is
also worth noting that for a phi with more than 2 arguments we need to get
all predicates (except one) to do phi-predication, which means that the
block containing such a phi can have only 1 critical edge.
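
To make the last point concrete, here is a minimal stand-alone sketch in C.
It is a model for illustration only, not code from the patch: the function
name phi_as_select_chain and the plain-array representation of phi arguments
and edge predicates are assumptions. It shows that a phi with n arguments can
be lowered to a chain of n - 1 selects, so n - 1 predicates are needed and the
predicate of one incoming edge may stay unknown, provided the predicates are
disjoint and exactly one incoming edge is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not GCC code: evaluate an n-argument phi as a
   chain of selects.  preds[i] is the predicate of the edge that carries
   args[i].  Only preds[0] .. preds[n - 2] are read; the last argument
   acts as the default, so its edge predicate is never needed.
   Assumes the predicates are disjoint.  */
int
phi_as_select_chain (const int *args, const bool *preds, size_t n)
{
  assert (n > 0);

  /* Start from the argument whose predicate we do not know.  */
  int result = args[n - 1];

  /* Wrap it in one select per remaining argument, innermost first:
     p0 ? a0 : (p1 ? a1 : ... a[n-1]).  */
  for (size_t i = n - 1; i-- > 0; )
    result = preds[i] ? args[i] : result;

  return result;
}
```

With args = {1, 2, 3} the call reads only preds[0] and preds[1]; when both
are false, the value of the last argument is produced.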

Thanks.
Yuri.

2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I saw the sources of these functions, but I can't understand why I
>>> should use something else. Note that all predicate computations are
>>> located in basic blocks (by design of if-conv) and there is a special
>>> function that puts these computations in the bb
>>> (insert_gimplified_predicates). An edge contains only the predicate, not
>>> its computations. The new function find_insertion_point() does a very
>>> simple search: it finds the latest (in the current bb) operand def-stmt
>>> of the predicates taken from all incoming edges.
>>> In the original algorithm the predicate of a non-critical edge is taken to
>>> perform phi-node predication, since for a critical edge it does not work
>>> properly.
>>>
>>> My question is: do your comments mean that I should re-design my extensions?
>>
>> Well, we have infrastructure for inserting code on edges and you've
>> made critical edges predicated correctly.  So why re-invent the wheel?
>> I realize this is very similar to my initial suggestion to simply split
>> critical edges in loops you want to if-convert but delays splitting
>> until it turns out to be necessary (which might be good for the
>> !force_vect case).
>>
>> For edge predicates you simply can emit their computation on the
>> edge, no?
>>
>> Btw, I very originally suggested to rework if-conversion to only
>> record edge predicates - having both block and edge predicates
>> somewhat complicates the code and makes it harder to
>> maintain (thus also the suggestion to simply split critical edges
>> if necessary to make BB predicates work always).
>>
>> Your patches add a lot of code and to me it seems we can avoid
>> doing so much special casing.
>
> For example attacking the critical edge issue by a simple
>
> Index: tree-if-conv.c
> ===================================================================
> --- tree-if-conv.c      (revision 216508)
> +++ tree-if-conv.c      (working copy)
> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>         if (EDGE_COUNT (e->src->succs) == 1)
>           found = true;
>        if (!found)
> -       {
> -         if (dump_file && (dump_flags & TDF_DETAILS))
> -           fprintf (dump_file, "only critical predecessors\n");
> -         return false;
> -       }
> +       split_edge (EDGE_PRED (bb, 0));
>      }
>
>    return true;
>
> it changes the number of blocks in the loop, so
> get_loop_body_in_if_conv_order should probably be re-done with the
> above eventually signalling that it created a new block.  Or the above
> should populate a vector of edges to split and do that after the
> loop calling if_convertible_bb_p.
>
> Richard.
>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> BTW Jeff did an initial review of my changes related to predicate
>>> computation for join blocks. I sent him an updated patch with a
>>> test-case and some minor changes, but I still have not received any
>>> feedback on it. Could you please take a look at it as well?
>>>
>>>
>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Yes, this patch does not make sense, since phi node predication for a bb
>>>>> with only critical incoming edges is performed by another function which
>>>>> is absent (predicate_extended_scalar_phi).
>>>>>
>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>> only, but you propose to use it for tree as well.
>>>>> Did I miss something?
>>>>
>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>> if you want easy access to the newly created basic block to push
>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>
>>>> Richard.
>>>>
>>>>> Thanks ahead.
>>>>>
>>>>>
>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I made some changes to the patch and ChangeLog to note that support for
>>>>>>> if-conversion of blocks with only critical incoming edges will be added
>>>>>>> in the future (more precisely, in patch.4).
>>>>>>
>>>>>> But the same reasoning applies to this version of the patch when
>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>
>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>
>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Could you please review it.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> ChangeLog:
>>>>>>>
>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_preds_critical_p): New function.
>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>> after adding support for extended predication.
>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>
>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Thanks for your answer!
>>>>>>>>
>>>>>>>> In the current implementation, phi node conversion assumes that the bb
>>>>>>>> containing the given phi has at least one non-critical incoming
>>>>>>>> edge and chooses it to insert predicated code. But if we choose a
>>>>>>>> critical edge we need to determine the insertion point and insertion
>>>>>>>> direction (before/after), since otherwise we can get invalid ssa
>>>>>>>> form (use before def). This is done by my new function, which is not in
>>>>>>>> the current patch (I will present that patch later). So I assume that we
>>>>>>>> need to leave this patch as it is to avoid introducing new bugs.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>> you meant by:
>>>>>>>>>>
>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>correctly.
>>>>>>>>>>
>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>
>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>
>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>> is ok.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>
>>>>>>>>>> I attached modified patch.
>>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the reduced patch as you requested. All your remarks have been addressed.
>>>>>>>>>>>> Could you please look at it? (I have already sent the patch with
>>>>>>>>>>>> changes in add_to_predicate_list for review.)
>>>>>>>>>>>
>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>> +             return false;
>>>>>>>>>>> +           }
>>>>>>>>>>> +
>>>>>>>>>>> +        }
>>>>>>>>>>>
>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>
>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>
>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>> +
>>>>>>>>>>> +static inline bool
>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>> +{
>>>>>>>>>>>
>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>
>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>> +       return false;
>>>>>>>>>>> +    }
>>>>>>>>>>>
>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>
>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>> correctly.
>>>>>>>>>>>
>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>> more.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>> +       {
>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>
>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>
>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>
>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>
>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>
>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd when
>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the reduced patch (part 1), which was cut almost in half.
>>>>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I do use the edge field 'aux' to keep the predicate for critical edges.
>>>>>>>>>>>>>>> My previous code was not correct; it now looks like this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. I completely deleted all code related to the creation of conditional
>>>>>>>>>>>>>>> expressions and now rely entirely on bool pattern recognition in the
>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations,
>>>>>>>>>>>>>>> which are not used, since they prevent vectorization. I will add this
>>>>>>>>>>>>>>> local-dce function in the next patch.
>>>>>>>>>>>>>>> 3. I also did not include in this patch the recognition of general
>>>>>>>>>>>>>>> phi-nodes with only two arguments, for which conversion of conditional
>>>>>>>>>>>>>>> scalar reduction can also be applied.
>>>>>>>>>>>>>>> Note that all these changes are applied only to loops marked with
>>>>>>>>>>>>>>> pragma omp simd.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd when
>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>> 5. I decided not to design a pre-pass since it would lead to generating a
>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>>>>>>>>>>> of the kind
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>> only one cond expression is required, and this is considered a simple
>>>>>>>>>>>>>>>>> optimization for an arbitrary phi-function. More precisely,
>>>>>>>>>>>>>>>>> if the phi-function has only two different arguments and one of them has a
>>>>>>>>>>>>>>>>> single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>> For an arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>> all this added logic?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I still think that a utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in the form of
>>>>>>>>>>>>>>>>>>> extended if-conversion for loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if the considered loop or its outer
>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); for ordinary
>>>>>>>>>>>>>>>>>>>    loops the behavior was not changed.
>>>>>>>>>>>>>>>>>>> 2. Removed the cfg restriction on basic blocks, which can now have more
>>>>>>>>>>>>>>>>>>>    than 2 predecessors.
>>>>>>>>>>>>>>>>>>> 3. Put an additional restriction on phi nodes which was missing in the
>>>>>>>>>>>>>>>>>>>    current design: all phi nodes must be in non-predicated basic blocks to
>>>>>>>>>>>>>>>>>>>    conform to the semantics of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>    with some limitations:
>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>      arguments are different and one of them occurs only once,
>>>>>>>>>>>>>>>>>>>      transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>      corresponding to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>      will be generated for it. In the current design a very simple check is
>>>>>>>>>>>>>>>>>>>      used: check starting from the end that two edges corresponding to
>>>>>>>>>>>>>>>>>>>      neighbor arguments have a common predecessor which is used for the
>>>>>>>>>>>>>>>>>>>      further check with the next edge.
>>>>>>>>>>>>>>>>>>>    These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument is false or both
>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicability
>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

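The two-distinct-argument PHI case discussed in the quoted thread above
(x = PHI <1(2), 1(3), 2(4)> needing only one COND_EXPR) can be illustrated
with a minimal sketch. It is not code from the patch: PHI arguments are
modelled as plain ints, and the helper name phi_two_values_one_single is
hypothetical. It only shows the classification step, assuming that "two
different arguments, one of them occurring once" is the whole condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch: given the N argument values of
   a PHI node (modelled as plain ints), report whether the PHI has exactly
   two distinct values and one of them occurs exactly once.  On success,
   *SINGLE receives the index of the argument that occurs once.  */
static bool
phi_two_values_one_single (const int *args, size_t n, size_t *single)
{
  assert (args != NULL && single != NULL);
  if (n < 2)
    return false;

  int first = args[0];
  int other = 0;
  size_t n_first = 0, n_other = 0;
  size_t idx_first = 0, idx_other = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == first)
        {
          n_first++;
          idx_first = i;
        }
      else if (n_other == 0 || args[i] == other)
        {
          other = args[i];
          n_other++;
          idx_other = i;
        }
      else
        return false;   /* Third distinct value: a COND_EXPR chain is needed.  */
    }

  if (n_other == 0)
    return false;       /* All arguments are equal.  */
  if (n_other == 1)
    {
      *single = idx_other;
      return true;
    }
  if (n_first == 1)
    {
      *single = idx_first;
      return true;
    }
  return false;         /* Both values occur more than once.  */
}
```

For the arguments {1, 1, 2} the helper reports index 2 as the single
occurrence, so the predicate of that one edge would be enough to select
between the two values; for {1, 2, 3} it returns false.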

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-21 14:36                           ` Yuri Rumyantsev
@ 2014-10-24  9:14                             ` Richard Biener
  2014-10-24 10:23                               ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2014-10-24  9:14 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> In my initial design I did such splitting before starting the real
> if-conversion, but I decided not to perform it since the code size of the
> if-converted loop grows (the number of phi nodes increases). It is
> also worth noting that for a phi with #nodes > 2 we need to get all
> predicates (except one) to do phi-predication, which means that the block
> containing such a phi can have only 1 critical edge.

Can you point me to the patch with the special insertion code then?
I definitely want to avoid the mess we ran into with the reassoc
"clever" insertion code.

Richard.
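
The deferred-splitting idea from the 2014-10-21 message quoted below
(record the edges to split while scanning, split them after the scan) can be
sketched on a toy CFG. This is only an illustration under stated
assumptions: struct toy_cfg, edge_critical_p and collect_edges_to_split are
hypothetical names, and none of GCC's real data structures or split_edge
are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 16

/* Toy CFG edge: source and destination block indices only.  */
struct toy_edge { int src, dest; };

struct toy_cfg
{
  struct toy_edge edges[MAX_EDGES];
  size_t n_edges;
};

static size_t
count_succs (const struct toy_cfg *g, int bb)
{
  size_t n = 0;
  for (size_t i = 0; i < g->n_edges; i++)
    n += (g->edges[i].src == bb);
  return n;
}

static size_t
count_preds (const struct toy_cfg *g, int bb)
{
  size_t n = 0;
  for (size_t i = 0; i < g->n_edges; i++)
    n += (g->edges[i].dest == bb);
  return n;
}

/* An edge is critical when its source has several successors and its
   destination has several predecessors.  */
static bool
edge_critical_p (const struct toy_cfg *g, const struct toy_edge *e)
{
  return count_succs (g, e->src) > 1 && count_preds (g, e->dest) > 1;
}

/* For every block whose incoming edges are all critical, record the index
   of its first incoming edge in TO_SPLIT.  Nothing is split here, so the
   block count stays stable during the scan.  Returns the number of
   recorded edges.  */
static size_t
collect_edges_to_split (const struct toy_cfg *g, int n_blocks,
                        size_t *to_split)
{
  assert (g->n_edges <= MAX_EDGES);
  size_t n = 0;
  for (int bb = 0; bb < n_blocks; bb++)
    {
      bool has_pred = false, found_normal = false;
      size_t first = 0;
      for (size_t i = 0; i < g->n_edges; i++)
        {
          if (g->edges[i].dest != bb)
            continue;
          if (!has_pred)
            {
              has_pred = true;
              first = i;
            }
          if (!edge_critical_p (g, &g->edges[i]))
            found_normal = true;
        }
      if (has_pred && !found_normal)
        to_split[n++] = first;
    }
  return n;
}
```

With edges 0->1, 0->3, 1->2 and 1->3, only block 3 has exclusively critical
predecessors, so a single edge (0->3) is recorded for later splitting.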

> Thanks.
> Yuri.
>
> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I saw the sources of these functions, but I can't understand why I
>>>> should use something else. Note that all predicate computations are
>>>> located in basic blocks (by design of if-conv) and there is a special
>>>> function that puts these computations in the bb
>>>> (insert_gimplified_predicates). An edge contains only the predicate, not
>>>> its computations. The new function find_insertion_point() does a very
>>>> simple search: it finds the latest (in the current bb) operand def-stmt
>>>> of the predicates taken from all incoming edges.
>>>> In the original algorithm the predicate of the non-critical edge is taken to
>>>> perform phi-node predication since for a critical edge it does not work
>>>> properly.
>>>>
>>>> My question is: do your comments mean that I should re-design my extensions?
>>>
>>> Well, we have infrastructure for inserting code on edges and you've
>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>> I realize this is very similar to my initial suggestion to simply split
>>> critical edges in loops you want to if-convert but delays splitting
>>> until it turns out to be necessary (which might be good for the
>>> !force_vect case).
>>>
>>> For edge predicates you simply can emit their computation on the
>>> edge, no?
>>>
>>> Btw, I very originally suggested to rework if-conversion to only
>>> record edge predicates - having both block and edge predicates
>>> somewhat complicates the code and makes it harder to
>>> maintain (thus also the suggestion to simply split critical edges
>>> if necessary to make BB predicates work always).
>>>
>>> Your patches add a lot of code and to me it seems we can avoid
>>> doing so much special casing.
>>
>> For example attacking the critical edge issue by a simple
>>
>> Index: tree-if-conv.c
>> ===================================================================
>> --- tree-if-conv.c      (revision 216508)
>> +++ tree-if-conv.c      (working copy)
>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>         if (EDGE_COUNT (e->src->succs) == 1)
>>           found = true;
>>        if (!found)
>> -       {
>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>> -           fprintf (dump_file, "only critical predecessors\n");
>> -         return false;
>> -       }
>> +       split_edge (EDGE_PRED (bb, 0));
>>      }
>>
>>    return true;
>>
>> it changes the number of blocks in the loop, so
>> get_loop_body_in_if_conv_order should probably be re-done with the
>> above eventually signalling that it created a new block.  Or the above
>> should populate a vector of edges to split and do that after the
>> loop calling if_convertible_bb_p.
>>
>> Richard.
>>
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> BTW Jeff did initial review of my changes related to predicate
>>>> computation for join blocks. I presented him updated patch with
>>>> test-case and some minor changes in patch. But still did not get any
>>>> feedback on it. Could you please take a look also on it?
>>>>
>>>>
>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Yes, this patch does not make sense on its own, since phi node
>>>>>> predication for a bb with only critical incoming edges is performed by
>>>>>> another function which is absent (predicate_extended_scalar_phi).
>>>>>>
>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>> only but you propose to use it for tree also.
>>>>>> Did I miss something?
>>>>>
>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>> if you want easy access to the newly created basic block to push
>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks ahead.
>>>>>>
>>>>>>
>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I made some changes in the patch and ChangeLog to mark that support for
>>>>>>>> if-conversion of blocks with only critical incoming edges will be added
>>>>>>>> in the future (more precisely, in patch 4).
>>>>>>>
>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>
>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>
>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Could you please review it?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>>
>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>> for critical edge.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>> to temporarily reject block if-conversion with incoming critical edges
>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>> after adding support for extended predication.
>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>> algorithm is used.
>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>
>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Thanks for your answer!
>>>>>>>>>
>>>>>>>>> In the current implementation, phi node conversion assumes that the
>>>>>>>>> bb containing the given phi has at least one non-critical incoming
>>>>>>>>> edge and chooses it to insert predicated code. But if we choose a
>>>>>>>>> critical edge, we need to determine the insertion point and insertion
>>>>>>>>> direction (before/after), since otherwise we can get invalid SSA
>>>>>>>>> form (use before def). This is done by my new function, which is not in
>>>>>>>>> the current patch (I will present that patch later). So I assume that we
>>>>>>>>> need to leave this patch as it is, so as not to introduce new bugs.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>> you meant by:
>>>>>>>>>>>
>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>correctly.
>>>>>>>>>>>
>>>>>>>>>>> In the current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>> critical incoming edges, since support for extended predication of phi
>>>>>>>>>>> nodes will be in the next patch.
>>>>>>>>>>
>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>
>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>> is ok.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Could you please clarify your statement?
>>>>>>>>>>>
>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>
>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the reduced patch as you requested. All your remarks have been addressed.
>>>>>>>>>>>>> Could you please look at it? (I have already sent the patch with
>>>>>>>>>>>>> changes in add_to_predicate_list for review.)
>>>>>>>>>>>>
>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>> +             return false;
>>>>>>>>>>>> +           }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        }
>>>>>>>>>>>>
>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>
>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>
>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>> +{
>>>>>>>>>>>>
>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>> +       return false;
>>>>>>>>>>>> +    }
>>>>>>>>>>>>
>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>
>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>> correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>>
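The simplification described in the quoted patch comment (a join block whose incoming paths carry `p1 & p2` and `p1 & !p2` can simply use `p1`) can be checked exhaustively. The sketch below is only an illustration with hypothetical helper names; it models predicates as plain booleans and is not how tree-if-conv.c represents them.

```c
#include <stdbool.h>

/* Join-block predicate built naively as the OR of the predicates of
   its two incoming paths.  */
static bool
join_predicate_naive (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Join-block predicate taken from the control-dependence-equivalent
   block, which executes under P1 alone.  */
static bool
join_predicate_cd_equiv (bool p1, bool p2)
{
  (void) p2;
  return p1;
}

/* Compare both forms over all four truth assignments.  */
static bool
join_predicates_agree_p (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      if (join_predicate_naive (p1, p2) != join_predicate_cd_equiv (p1, p2))
        return false;
  return true;
}
```

The two forms agree for every input, so using the predicate of the cd-equivalent block loses nothing and avoids emitting the redundant `p2` computations.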
>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>
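For reference, an edge is usually called critical when its source has more than one successor and its destination has more than one predecessor. The toy sketch below (hypothetical names, not GCC code) contrasts that definition with a destination-only test like the one in the quoted hunk, to show where the two can disagree.

```c
#include <stdbool.h>

/* Toy edge; NOT GCC's `edge` type.  */
struct toy_edge { int src; int dest; };

/* Number of edges in EDGES[0..N) leaving block BB.  */
static int
toy_count_succs (const struct toy_edge *edges, int n, int bb)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (edges[i].src == bb)
      count++;
  return count;
}

/* Number of edges in EDGES[0..N) entering block BB.  */
static int
toy_count_preds (const struct toy_edge *edges, int n, int bb)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (edges[i].dest == bb)
      count++;
  return count;
}

/* Destination-only test, mirroring the shape of the quoted check.  */
static bool
toy_dest_has_multiple_preds_p (const struct toy_edge *edges, int n, int e)
{
  return toy_count_preds (edges, n, edges[e].dest) >= 2;
}

/* Full test: both ends of the edge must branch or join.  */
static bool
toy_edge_critical_p (const struct toy_edge *edges, int n, int e)
{
  return toy_count_succs (edges, n, edges[e].src) >= 2
         && toy_count_preds (edges, n, edges[e].dest) >= 2;
}
```

With edges 0->2, 1->2 and 1->3, the edge 0->2 passes the destination-only test but is not critical, because block 0 has a single successor.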
>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> here is the reduced patch (part.1), which is almost half the previous size.
>>>>>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. I completely deleted all code related to creation of conditional
>>>>>>>>>>>>>>>> expressions and now rely entirely on bool pattern recognition in the
>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations,
>>>>>>>>>>>>>>>> since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>> local-dce function in the next patch.
>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>> phi-nodes with only two arguments, for which conversion of
>>>>>>>>>>>>>>>> conditional scalar reduction can also be applied.
>>>>>>>>>>>>>>>> Note that all these changes are applied only to loops marked with pragma
>>>>>>>>>>>>>>>> omp simd.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. All restrictions on phi-functions were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>> 2. Put the predicate for critical edges into the 'aux' field of the edge, i.e.
>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>> 4. Use the notion of cd-equivalence to set up a simpler predicate for
>>>>>>>>>>>>>>>>>> join basic blocks.
>>>>>>>>>>>>>>>>>> 5. I decided not to design a pre-pass, since it would lead to generating a
>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>>>>>>>>>>>> of the kind
>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>> only one cond expression is required; this is treated as a simple
>>>>>>>>>>>>>>>>>> optimization for arbitrary phi-functions. More precisely,
>>>>>>>>>>>>>>>>>> if a phi-function has only two different arguments and one of them has a
>>>>>>>>>>>>>>>>>> single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>> For an arbitrary phi-function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>>
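The special case in item 5 (a phi such as `x = PHI <1(2), 1(3), 2(4)>` needing only one cond expression) can be sketched as follows. This is a hedged illustration only: phi arguments are modelled as plain ints, and `phi_two_values_one_single_p` and `phi_as_single_cond` are hypothetical names, not functions from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if ARGS[0..N) holds exactly two distinct values and one
   of them occurs exactly once; *SINGLE then receives the index of that
   lone occurrence.  */
static bool
phi_two_values_one_single_p (const int *args, size_t n, size_t *single)
{
  if (n < 2)
    return false;

  int a = args[0], b = 0;
  bool have_b = false;
  size_t count_a = 0, count_b = 0, last_a = 0, last_b = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == a)
        {
          count_a++;
          last_a = i;
        }
      else if (!have_b || args[i] == b)
        {
          b = args[i];
          have_b = true;
          count_b++;
          last_b = i;
        }
      else
        return false;   /* A third distinct value: needs a chain.  */
    }

  if (!have_b)
    return false;       /* All arguments are equal.  */
  if (count_b == 1)
    {
      *single = last_b;
      return true;
    }
  if (count_a == 1)
    {
      *single = last_a;
      return true;
    }
  return false;
}

/* Evaluate the phi as one conditional: the lone argument is selected
   when its incoming edge is taken, the shared value otherwise.  */
static int
phi_as_single_cond (const int *args, size_t single, bool single_edge_taken)
{
  size_t other = (single == 0) ? 1 : 0;
  return single_edge_taken ? args[single] : args[other];
}
```

For the arguments `{1, 1, 2}` the lone occurrence is at index 2, so the phi reduces to "predicate of that edge ? 2 : 1"; a phi with three distinct values is rejected and would need the chain of cond expressions instead.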
>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>> all this added logic?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I still think that a utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>> vect bool pattern can be applied perform the following transformation:
>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in the form of
>>>>>>>>>>>>>>>>>>>> extended if-conversion of loops marked with this pragma. These extensions
>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if the considered loop or its outer
>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize). For ordinary
>>>>>>>>>>>>>>>>>>>>    loops the behavior is unchanged.
>>>>>>>>>>>>>>>>>>>> 2. Removed the CFG restriction so that a basic block can have more than 2
>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>> 3. Added a restriction on phi nodes which was missing in the current design:
>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic blocks to conform to the
>>>>>>>>>>>>>>>>>>>>    semantics of COND_EXPR, which is used for the transformation.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 4. Extended predication of phi nodes: a phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>>    with some limitations:
>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>      arguments are different and one of them occurs only once,
>>>>>>>>>>>>>>>>>>>>      transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>    - if a phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>      corresponding to phi arguments are disjoint, a chain of COND_EXPRs
>>>>>>>>>>>>>>>>>>>>      will be generated for it. In the current design a very simple check
>>>>>>>>>>>>>>>>>>>>      is used: starting from the end, check that the two edges corresponding
>>>>>>>>>>>>>>>>>>>>      to neighboring arguments have a common predecessor, which is then
>>>>>>>>>>>>>>>>>>>>      used for the check with the next edge.
>>>>>>>>>>>>>>>>>>>>    These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here is an example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two predecessors if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set up.
>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>> with integral operands. If the bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical, the old algorithm of predicate assignment
>>>>>>>>>>>>>>>>>>>> is used; otherwise the following code was added: check the applicability
>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and the transformation of
>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges, one of which is critical,
>>>>>>>>>>>>>>>>>>>> using the 'normal' edge, i.e. compute true and false predicates using the
>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in the
>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s, and
>>>>>>>>>>>>>>>>>>>> negate_predicate of the normal edge contains the predicate of the critical
>>>>>>>>>>>>>>>>>>>> edge, but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-24  9:14                             ` Richard Biener
@ 2014-10-24 10:23                               ` Yuri Rumyantsev
  2014-11-07 14:08                                 ` Yuri Rumyantsev
  0 siblings, 1 reply; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-10-24 10:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 53857 bytes --]

Richard,

The patch containing the new code related to extended predication is attached.
Here are a few comments that explain the main goals of the design.

1. I don't want to insert any critical edge splitting since it may
lead to less efficient binaries (I remember a performance issue
when we designed the lazy code motion algorithm in the SPARC compiler).
2. One special case of extended PHI node predication was introduced:
the PHI has more than 2 arguments, but only two of them are different
and one of those occurs only once. For such a PHI, conditional
scalar reduction is applied.
This corresponds to the following:
    if (q1 && q2 && q3) var++
A new function, phi_has_two_different_args, was introduced to detect such PHIs.
3. The original algorithm for PHI predication assumed that at
least one incoming edge of a block containing a PHI is not critical. This
guarantees that all computations related to the predicate of the normal edge
are already inserted above this block, so the
code for PHI predication can be inserted at the beginning of the
block. But this is not true for critical edges, whose predicate
computations are in the same block where the code for PHI predication must be
inserted. So a new function, find_insertion_point, is introduced. It
simply finds the last statement in the block that defines a predicate
corresponding to any incoming edge and inserts the PHI predication code
after it (with some minor exceptions).
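
To make points 2 and 3 more concrete, here is a minimal standalone sketch in C.
It is not the code from the attached patch: plain ints stand in for GCC trees,
statement positions are plain array indices, and the names
toy_phi_has_two_different_args and toy_find_insertion_point are made up for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a PHI node: ARGS holds one value per incoming edge.
   Return true if ARGS contains exactly two distinct values and one of
   them occurs exactly once; *SINGLE_IDX then receives the index of an
   argument that occurs once.  (Hypothetical helper, not GCC code.)  */
bool
toy_phi_has_two_different_args (const int *args, size_t n, size_t *single_idx)
{
  assert (args != NULL && single_idx != NULL);
  if (n < 2)
    return false;

  int first = args[0];
  int second = 0;
  bool have_second = false;
  size_t first_count = 0, second_count = 0;
  size_t first_pos = 0, second_pos = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == first)
        {
          first_count++;
          first_pos = i;
        }
      else if (!have_second || args[i] == second)
        {
          have_second = true;
          second = args[i];
          second_count++;
          second_pos = i;
        }
      else
        return false;  /* A third distinct value was found.  */
    }

  if (!have_second)
    return false;      /* All arguments are equal.  */
  if (first_count == 1)
    {
      *single_idx = first_pos;
      return true;
    }
  if (second_count == 1)
    {
      *single_idx = second_pos;
      return true;
    }
  return false;        /* Both values occur more than once.  */
}

/* DEF_POS[i] is the statement index, inside the block holding the PHI,
   at which the predicate of incoming edge I is defined, or -1 if it is
   defined in some other block.  Return the index of the statement after
   which PHI predication code can be placed, or -1 meaning "at the
   beginning of the block".  (Hypothetical helper, not GCC code.)  */
int
toy_find_insertion_point (const int *def_pos, size_t n_edges)
{
  int last = -1;
  for (size_t i = 0; i < n_edges; i++)
    if (def_pos[i] > last)
      last = def_pos[i];
  return last;
}
```

For the arguments {1, 1, 2} the first helper accepts the PHI and reports the
third argument as the one that occurs once; for {1, 2, 3} it rejects the PHI,
which then has to go through the general chain of COND_EXPRs. The second helper
ignores the "minor exceptions" mentioned in point 3.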

If you need more comments or something is unclear, please let me know.

Thanks.
Yuri.

ChangeLog:

2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
FLAG_FORCE_VECTORIZE instead of loop flag.
(if_convertible_bb_p): Allow bb to have more than 2 predecessors if
FLAG_FORCE_VECTORIZE is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(phi_has_two_different_args): New function.
(is_cond_scalar_reduction): Add argument EXTENDED to choose access
to phi arguments. Invoke phi_has_two_different_args to get phi
arguments if EXTENDED is true. Change check that block
containing reduction statement candidate is predecessor
of phi-block since phi may have more than two arguments.
(convert_scalar_cond_reduction): Add argument BEFORE to insert
statement before/after gsi point.
(predicate_scalar_phi): Add argument false (which means non-extended
predication) to call of is_cond_scalar_reduction. Add argument
true (which corresponds to argument BEFORE) to call of
convert_scalar_cond_reduction.
(get_predicate_for_edge): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_extended_scalar_phi): New function.
(find_insertion_point): New function.
(predicate_all_scalar_phis): Add two boolean variables EXTENDED and
BEFORE. Initialize EXTENDED to true if BB containing phi has more
than 2 predecessors or both incoming edges are critical. Invoke
find_phi_replacement_condition and predicate_scalar_phi or
find_insertion_point and predicate_extended_scalar_phi depending on
EXTENDED value.
(insert_gimplified_predicates): Add check that non-predicated block
may have statements to insert. Insert predicate of BB just after label
if FLAG_FORCE_VECTORIZE is true.
(tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
is a copy of the force_vectorize field of the inner or outer loop.


2014-10-24 13:12 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> In my initial design I did such splitting before starting the real
>> if-conversion, but I decided not to perform it since the code size of the
>> if-converted loop grows (the number of phi nodes increases). It is
>> worth noting also that for a phi with #nodes > 2 we need to get all
>> predicates (except one) to do phi-predication, which means that the block
>> containing such a phi can have only 1 critical edge.
>
> Can you point me to the patch with the special insertion code then?
> I definitely want to avoid the mess we ran into with the reassoc
> code "clever" insertion code.
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I saw the sources of these functions, but I can't understand why I
>>>>> should use something else. Note that all predicate computations are
>>>>> located in basic blocks (by design of if-conv) and there is a special
>>>>> function that puts these computations in a bb
>>>>> (insert_gimplified_predicates). An edge contains only the predicate, not
>>>>> its computations. The new function find_insertion_point() does a very
>>>>> simple search: it finds the latest (in the current bb) operand def-stmt of
>>>>> the predicates taken from all incoming edges.
>>>>> In the original algorithm the predicate of the non-critical edge is taken to
>>>>> perform phi-node predication since for a critical edge it does not work
>>>>> properly.
>>>>>
>>>>> My question is: do your comments mean that I should re-design my extensions?
>>>>
>>>> Well, we have infrastructure for inserting code on edges and you've
>>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>>> I realize this is very similar to my initial suggestion to simply split
>>>> critical edges in loops you want to if-convert but delays splitting
>>>> until it turns out to be necessary (which might be good for the
>>>> !force_vect case).
>>>>
>>>> For edge predicates you simply can emit their computation on the
>>>> edge, no?
>>>>
>>>> Btw, I very originally suggested to rework if-conversion to only
>>>> record edge predicates - having both block and edge predicates
>>>> somewhat complicates the code and makes it harder to
>>>> maintain (thus also the suggestion to simply split critical edges
>>>> if necessary to make BB predicates work always).
>>>>
>>>> Your patches add a lot of code and to me it seems we can avoid
>>>> doing so much special casing.
>>>
>>> For example attacking the critical edge issue by a simple
>>>
>>> Index: tree-if-conv.c
>>> ===================================================================
>>> --- tree-if-conv.c      (revision 216508)
>>> +++ tree-if-conv.c      (working copy)
>>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>>         if (EDGE_COUNT (e->src->succs) == 1)
>>>           found = true;
>>>        if (!found)
>>> -       {
>>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>>> -           fprintf (dump_file, "only critical predecessors\n");
>>> -         return false;
>>> -       }
>>> +       split_edge (EDGE_PRED (bb, 0));
>>>      }
>>>
>>>    return true;
>>>
>>> it changes the number of blocks in the loop, so
>>> get_loop_body_in_if_conv_order should probably be re-done with the
>>> above eventually signalling that it created a new block.  Or the above
>>> should populate a vector of edges to split and do that after the
>>> loop calling if_convertible_bb_p.
>>>
>>> Richard.
>>>
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> BTW Jeff did initial review of my changes related to predicate
>>>>> computation for join blocks. I presented him updated patch with
>>>>> test-case and some minor changes in patch. But still did not get any
>>>>> feedback on it. Could you please take a look also on it?
>>>>>
>>>>>
>>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Yes, this patch does not make sense on its own, since phi node predication
>>>>>>> for a bb with only critical incoming edges is performed by another function
>>>>>>> which is absent (predicate_extended_scalar_phi).
>>>>>>>
>>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>>> only but you propose to use it for tree also.
>>>>>>> Did I miss something?
>>>>>>
>>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>>> if you want easy access to the newly created basic block to push
>>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks ahead.
>>>>>>>
>>>>>>>
>>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I made some changes to the patch and ChangeLog to note that support for
>>>>>>>>> if-conversion of blocks with only critical incoming edges will be added
>>>>>>>>> in the future (more precisely, in patch.4).
>>>>>>>>
>>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>>
>>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>>
>>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Could you please review it.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>>> after adding support for extended predication.
>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>
>>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Thanks for your answer!
>>>>>>>>>>
>>>>>>>>>> In the current implementation phi node conversion assumes that the
>>>>>>>>>> bb containing the given phi has at least one non-critical incoming
>>>>>>>>>> edge and chooses it to insert predicated code. But if we choose a
>>>>>>>>>> critical edge we need to determine the insertion point and insertion
>>>>>>>>>> direction (before/after), since otherwise we can get invalid SSA
>>>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>>>> the current patch (I will present this patch later). So I assume that we
>>>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>>> you meant by:
>>>>>>>>>>>>
>>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>>>
>>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>>
>>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>>> is ok.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>>>
>>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>>>
>>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>>> +             return false;
>>>>>>>>>>>>> +           }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>>
>>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>>
>>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>>
>>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>>> +    {
>>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>>> +       return false;
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>
>>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>>
>>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
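
The simplification named in the quoted patch comment (p1 & p2 | p1 & !p2 reduces to p1) can be checked exhaustively. The following is only a minimal sketch in plain C; the function names are hypothetical and are not part of tree-if-conv.c.

```c
#include <stdbool.h>

/* Predicate of a join block formed naively as the OR of the
   predicates of its two incoming paths (hypothetical helper).  */
bool
join_predicate_naive (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Predicate taken directly from the cd-equivalent block: P2 drops
   out entirely (hypothetical helper).  */
bool
join_predicate_cd_equiv (bool p1, bool p2)
{
  (void) p2;
  return p1;
}

/* Return true if both forms agree on all four input combinations.  */
bool
predicates_agree (void)
{
  for (int p1 = 0; p1 <= 1; p1++)
    for (int p2 = 0; p2 <= 1; p2++)
      if (join_predicate_naive (p1, p2) != join_predicate_cd_equiv (p1, p2))
        return false;
  return true;
}
```

Since the two forms agree everywhere, reusing the predicate of the cd-equivalent block loses nothing and avoids building the longer expression.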
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd when
>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT is not set. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> here is a reduced patch (part 1), which is now almost half the size.
>>>>>>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>>>> omp simd only.
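
The criticality test in point 1 above can be sketched on a toy CFG. This is a minimal sketch only: struct toy_block, struct toy_edge and toy_edge_critical_p are hypothetical stand-ins, not GCC's basic_block, edge or EDGE_COUNT.

```c
#include <stdbool.h>

/* Toy stand-ins for a basic block and an edge; only the predecessor
   and successor counts are kept (hypothetical types).  */
struct toy_block
{
  int n_preds;
  int n_succs;
};

struct toy_edge
{
  const struct toy_block *src;
  const struct toy_block *dest;
};

/* An edge is critical when its source has more than one successor
   and its destination has more than one predecessor.  This is the
   negation of the "not critical" condition in point 1.  */
bool
toy_edge_critical_p (const struct toy_edge *e)
{
  return e->src->n_succs > 1 && e->dest->n_preds > 1;
}
```

In a triangle-shaped CFG the fall-through edge into the join block has a destination with two predecessors but is not critical, because its source has a single successor. Looking at the predecessor count of the destination alone therefore does not decide criticality.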
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd when
>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT is not set. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>>>> 5. I decided not to design a pre-pass since it would lead to generating a
>>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>> only one cond expression is required, and this is treated as a simple
>>>>>>>>>>>>>>>>>>> optimization for an arbitrary phi-function. More precisely,
>>>>>>>>>>>>>>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>>>>>>>>>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
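
The special case in point 5 above (only two distinct phi arguments, one of which occurs exactly once) can be sketched as follows. This is a minimal sketch that models phi arguments as plain int values; toy_phi_single_arg_index is a hypothetical name and is not the phi_has_two_different_args function from the patch.

```c
#include <stddef.h>

/* Return the index of the argument that occurs exactly once when
   ARGS holds exactly two distinct values and at least one of them
   occurs once; return -1 otherwise (hypothetical helper).  */
int
toy_phi_single_arg_index (const int *args, size_t n)
{
  if (n < 2)
    return -1;

  int first = args[0];
  int second = 0;
  size_t n_first = 0, n_second = 0;
  size_t first_idx = 0, second_idx = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == first)
        {
          n_first++;
          first_idx = i;
        }
      else if (n_second == 0 || args[i] == second)
        {
          second = args[i];
          n_second++;
          second_idx = i;
        }
      else
        return -1;              /* A third distinct value.  */
    }

  if (n_second == 0)
    return -1;                  /* All arguments are equal.  */
  if (n_first == 1)
    return (int) first_idx;
  if (n_second == 1)
    return (int) second_idx;
  return -1;                    /* Both values occur more than once.  */
}
```

For the arguments of x = PHI <1(2), 1(3), 2(4)> the helper returns index 2, so a single cond expression selecting on the predicate of that one edge would suffice in this model.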
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>>> all this added logic?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I still think that a utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>>> vect bool pattern can be applied, perform the following transformation:
>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in the form of
>>>>>>>>>>>>>>>>>>>>> extended if-conversion for loops with such a pragma. These extensions
>>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has only one occurrence,
>>>>>>>>>>>>>>>>>>>>>    transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>>    corresponding to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>>>    check starting from end that two edges corresponding to neighbor
>>>>>>>>>>>>>>>>>>>>>    arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>>>>    with next edge.
>>>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   x = PHI <x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>>
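
The effect of if-conversion on the loop quoted above can be sketched in scalar C. This is only a minimal sketch under stated assumptions: count_branchy and count_predicated are hypothetical names, and the conditional expression on c[i] merely models MASK_LOAD; it is not what the compiler emits.

```c
#include <stddef.h>

/* Branchy form, following the source loop quoted above.  */
int
count_branchy (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      if ((t > 0) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* If-converted form: the guard becomes a mask, c[i] is read only
   under the mask (a scalar model of MASK_LOAD), and the increment
   becomes a select instead of a branch.  */
int
count_predicated (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      int mask = (t > 0) & (t < 1.0e+17f);
      int loaded = mask ? c[i] : 0;
      res += (loaded != 0) ? 1 : 0;
    }
  return res;
}
```

Both functions return the same count for any input. The second has a single basic block in the loop body, which is the shape the vectorizer needs.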
>>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicability
>>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, the original
>>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

[-- Attachment #2: ChangeLog.patch-3 --]
[-- Type: application/octet-stream, Size: 1870 bytes --]

2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>

	* tree-if-conv.c (ifcvt_can_use_mask_load_store): Use 
	FLAG_FORCE_VECTORIZE instead of loop flag.
	(if_convertible_bb_p): Allow bb to have more than 2 predecessors if
	FLAG_FORCE_VECTORIZE is true.
	(if_convertible_bb_p): Delete check that bb has at least one
	non-critical incoming edge.
	(phi_has_two_different_args): New function.
	(is_cond_scalar_reduction): Add argument EXTENDED to choose access
	to phi arguments. Invoke phi_has_two_different_args to get phi
	arguments if EXTENDED is true. Change check that block
	containing reduction statement candidate is predecessor
	of phi-block since phi may have more than two arguments.
	(convert_scalar_cond_reduction): Add argument BEFORE to insert
	statement before/after gsi point.
	(predicate_scalar_phi): Add argument false (which means non-extended
	predication) to call of is_cond_scalar_reduction. Add argument
	true (which corresponds to argument BEFORE) to call of
	convert_scalar_cond_reduction.
	(get_predicate_for_edge): New function.
	(predicate_arbitrary_scalar_phi): New function.
	(predicate_extended_scalar_phi): New function.
	(find_insertion_point): New function.
	(predicate_all_scalar_phis): Add two boolean variables EXTENDED and
	BEFORE. Initialize EXTENDED to true if BB containing phi has more
	than 2 predecessors or both incoming edges are critical. Invoke
	find_phi_replacement_condition and predicate_scalar_phi or
	find_insertion_point and predicate_extended_scalar_phi depending on
	EXTENDED value.
	(insert_gimplified_predicates): Add check that non-predicated block
	may have statements to insert. Insert predicate of BB just after label
	if FLAG_FORCE_VECTORIZE is true.
	(tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
	is copy of inner or outer loop field force_vectorize.


* Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
  2014-10-24 10:23                               ` Yuri Rumyantsev
@ 2014-11-07 14:08                                 ` Yuri Rumyantsev
  0 siblings, 0 replies; 18+ messages in thread
From: Yuri Rumyantsev @ 2014-11-07 14:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

Did you have a chance to look at it?

Thanks.
Yuri.

2014-10-24 14:21 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> Patch containing new code related to extended predication is attached.
> Here are a few comments that explain the main goals of the design.
>
> 1. I don't want to insert any critical edge splitting since it may
> lead to less efficient binaries (I remember some performance issue
> when we designed lazy code motion algorithm in SPARC compiler).
> 2. One special case of extended PHI node predication was introduced:
> the PHI has more than 2 arguments, but only two of them are different
> and one of those occurs exactly once. Conditional scalar reduction is
> applied to such a PHI.
> This corresponds to the following:
>     if (q1 && q2 && q3) var++
> New function phi_has_two_different_args was introduced to detect such a phi.
> 3. The original algorithm for PHI predication assumed that at least
> one incoming edge of a block containing a PHI is not critical. This
> guarantees that all computations related to the predicate of the
> normal edge are already inserted above this block, so the code for
> PHI predication can be inserted at the beginning of the block. This
> is not true for critical edges, whose predicate computations are in
> the same block where the code for phi predication must be inserted.
> So a new function find_insertion_point is introduced, which simply
> finds the last statement in the block that defines predicates
> corresponding to the incoming edges and inserts the phi predication
> code after it (with some minor exceptions).
>
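The special case in point 2 can be illustrated on a plain array of argument ids. This is only a sketch under stated assumptions: `two_different_args_p` and its integer-array model are hypothetical, and the signature of the patch's phi_has_two_different_args, which works on GIMPLE PHI nodes, is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHI node: ARGS holds N argument ids.
   Return true if ARGS contains exactly two distinct values and one of
   them occurs exactly once; store that value in *SINGLE and the other
   value in *OTHER.  */
static bool
two_different_args_p (const int *args, size_t n, int *single, int *other)
{
  assert (args != NULL && single != NULL && other != NULL);
  if (n < 2)
    return false;

  int a = args[0], b = 0;
  size_t count_a = 0, count_b = 0;
  bool have_b = false;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == a)
        count_a++;
      else if (!have_b)
        {
          b = args[i];
          have_b = true;
          count_b = 1;
        }
      else if (args[i] == b)
        count_b++;
      else
        return false;   /* a third distinct value */
    }

  if (!have_b)
    return false;       /* all arguments are equal */
  if (count_a == 1)
    {
      *single = a;
      *other = b;
      return true;
    }
  if (count_b == 1)
    {
      *single = b;
      *other = a;
      return true;
    }
  return false;         /* both values occur more than once */
}
```

For the `if (q1 && q2 && q3) var++` example, three incoming edges would carry the old value of var and one edge the incremented value, so the check succeeds with the incremented value reported as the single occurrence.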
> If you need more comments or something is unclear, please let me know.
>
> Thanks.
> Yuri.
>
> ChangeLog:
>
> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
> FLAG_FORCE_VECTORIZE instead of loop flag.
> (if_convertible_bb_p): Allow bb to have more than 2 predecessors if
> FLAG_FORCE_VECTORIZE is true.
> (if_convertible_bb_p): Delete check that bb has at least one
> non-critical incoming edge.
> (phi_has_two_different_args): New function.
> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
> to phi arguments. Invoke phi_has_two_different_args to get phi
> arguments if EXTENDED is true. Change check that block
> containing reduction statement candidate is predecessor
> of phi-block since phi may have more than two arguments.
> (convert_scalar_cond_reduction): Add argument BEFORE to insert
> statement before/after gsi point.
> (predicate_scalar_phi): Add argument false (which means non-extended
> predication) to call of is_cond_scalar_reduction. Add argument
> true (which corresponds to argument BEFORE) to call of
> convert_scalar_cond_reduction.
> (get_predicate_for_edge): New function.
> (predicate_arbitrary_scalar_phi): New function.
> (predicate_extended_scalar_phi): New function.
> (find_insertion_point): New function.
> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
> BEFORE. Initialize EXTENDED to true if BB containing phi has more
> than 2 predecessors or both incoming edges are critical. Invoke
> find_phi_replacement_condition and predicate_scalar_phi or
> find_insertion_point and predicate_extended_scalar_phi depending on
> EXTENDED value.
> (insert_gimplified_predicates): Add check that non-predicated block
> may have statements to insert. Insert predicate of BB just after label
> if FLAG_FORCE_VECTORIZE is true.
> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
> is copy of inner or outer loop field force_vectorize.
>
>
> 2014-10-24 13:12 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> In my initial design I did such splitting before starting the real
>>> if-conversion, but I decided not to perform it since the code size of
>>> the if-converted loop grows (the number of phi nodes increases). It is
>>> also worth noting that for a phi with #nodes > 2 we need to get all
>>> predicates (except one) to do phi-predication, which means that the
>>> block containing such a phi can have only 1 critical edge.
>>
>> Can you point me to the patch with the special insertion code then?
>> I definitely want to avoid the mess we ran into with the reassoc
>> code "clever" insertion code.
>>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I looked at the sources of these functions, but I can't understand
>>>>>> why I should use something else. Note that all predicate computations
>>>>>> are located in basic blocks (by design of if-conv) and there is a
>>>>>> special function that puts these computations into the bb
>>>>>> (insert_gimplified_predicates). An edge contains only the predicate,
>>>>>> not its computations. The new function find_insertion_point() does a
>>>>>> very simple search: it finds the latest (in the current bb) operand
>>>>>> def-stmt of the predicates taken from all incoming edges.
>>>>>> In the original algorithm the predicate of the non-critical edge is
>>>>>> taken to perform phi-node predication, since for a critical edge it
>>>>>> does not work properly.
>>>>>>
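The search described above can be sketched on a toy model in which each statement of the block is identified by the single name it defines. The helper `last_predicate_def` and the integer ids are assumptions made for illustration only; this is not the patch's find_insertion_point, which walks GIMPLE statements.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: DEFS[i] is the id of the name defined by the I-th
   statement of the block; OPERANDS lists the names used by the
   predicates of all incoming edges.  Return the index of the last
   statement that defines any operand, or -1 if no operand is defined
   in this block (the new code can then go at the block start).  */
static long
last_predicate_def (const int *defs, size_t n_defs,
                    const int *operands, size_t n_operands)
{
  assert (n_defs == 0 || defs != NULL);
  assert (n_operands == 0 || operands != NULL);

  long last = -1;
  for (size_t i = 0; i < n_defs; i++)
    for (size_t j = 0; j < n_operands; j++)
      if (defs[i] == operands[j])
        {
          last = (long) i;
          break;
        }
  return last;
}
```

Inserting the phi predication code after the returned statement keeps every use below its definition, which is the "use before def" concern raised for critical edges elsewhere in this thread.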
>>>>>> My question is: do your comments mean that I should re-design my extensions?
>>>>>
>>>>> Well, we have infrastructure for inserting code on edges and you've
>>>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>>>> I realize this is very similar to my initial suggestion to simply split
>>>>> critical edges in loops you want to if-convert but delays splitting
>>>>> until it turns out to be necessary (which might be good for the
>>>>> !force_vect case).
>>>>>
>>>>> For edge predicates you simply can emit their computation on the
>>>>> edge, no?
>>>>>
>>>>> Btw, I very originally suggested to rework if-conversion to only
>>>>> record edge predicates - having both block and edge predicates
>>>>> somewhat complicates the code and makes it harder to
>>>>> maintain (thus also the suggestion to simply split critical edges
>>>>> if necessary to make BB predicates work always).
>>>>>
>>>>> Your patches add a lot of code and to me it seems we can avoid
>>>>> doing so much special casing.
>>>>
>>>> For example attacking the critical edge issue by a simple
>>>>
>>>> Index: tree-if-conv.c
>>>> ===================================================================
>>>> --- tree-if-conv.c      (revision 216508)
>>>> +++ tree-if-conv.c      (working copy)
>>>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>>>         if (EDGE_COUNT (e->src->succs) == 1)
>>>>           found = true;
>>>>        if (!found)
>>>> -       {
>>>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>>>> -           fprintf (dump_file, "only critical predecessors\n");
>>>> -         return false;
>>>> -       }
>>>> +       split_edge (EDGE_PRED (bb, 0));
>>>>      }
>>>>
>>>>    return true;
>>>>
>>>> it changes the number of blocks in the loop, so
>>>> get_loop_body_in_if_conv_order should probably be re-done with the
>>>> above eventually signalling that it created a new block.  Or the above
>>>> should populate a vector of edges to split and do that after the
>>>> loop calling if_convertible_bb_p.
>>>>
>>>> Richard.
>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> BTW Jeff did an initial review of my changes related to predicate
>>>>>> computation for join blocks. I sent him an updated patch with a
>>>>>> test-case and some minor changes, but have not received any
>>>>>> feedback on it yet. Could you please take a look at it as well?
>>>>>>
>>>>>>
>>>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Yes, this patch does not make sense in isolation, since phi node
>>>>>>>> predication for a bb with only critical incoming edges is performed by
>>>>>>>> another function which is absent here (predicate_extended_scalar_phi).
>>>>>>>>
>>>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>>>> only but you propose to use it for tree also.
>>>>>>>> Did I miss something?
>>>>>>>
>>>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>>>> if you want easy access to the newly created basic block to push
>>>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks ahead.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I made some changes to the patch and ChangeLog to indicate that support
>>>>>>>>>> for if-conversion of blocks with only critical incoming edges will be
>>>>>>>>>> added in the future (more precisely, in patch.4).
>>>>>>>>>
>>>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>>>
>>>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>>>
>>>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Could you please review it.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>>>> after adding support for extended predication.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your answer!
>>>>>>>>>>>
>>>>>>>>>>> In the current implementation phi node conversion assumes that the bb
>>>>>>>>>>> containing a given phi has at least one non-critical incoming edge and
>>>>>>>>>>> chooses it to insert predicated code. But if we choose a critical edge
>>>>>>>>>>> we need to determine the insertion point and insertion direction
>>>>>>>>>>> (before/after), since otherwise we can get invalid SSA form (use
>>>>>>>>>>> before def). This is done by my new function, which is not in the
>>>>>>>>>>> current patch (I will present that patch later). So I assume that we
>>>>>>>>>>> need to leave this patch as it is in order not to introduce new bugs.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>>>> you meant by:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>>correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>>>>
>>>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>>>
>>>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>>>> is ok.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>>>> +             return false;
>>>>>>>>>>>>>> +           }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>>>> +       return false;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Invoke build2_loc
>>>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>>>>
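The simplification mentioned in the quoted comment rests on the identity (p1 & p2) | (p1 & !p2) == p1. A small self-contained check in plain C, with function names invented for this illustration and no relation to the if-conversion sources:

```c
#include <assert.h>
#include <stdbool.h>

/* Predicate of the join block built naively as the OR of the
   predicates of its two predecessors, p1 & p2 and p1 & !p2.  */
static bool
join_predicate_naive (bool p1, bool p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Check all four input combinations: the naive join predicate is
   always equal to p1 alone.  */
static void
check_join_predicate_identity (void)
{
  for (int i = 0; i < 4; i++)
    {
      bool p1 = (i & 1) != 0;
      bool p2 = (i & 2) != 0;
      assert (join_predicate_naive (p1, p2) == p1);
    }
}
```

Using the predicate of the control-dependence-equivalent block directly yields p1 without building and then simplifying the longer expression.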
>>>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>>>
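For reference, an edge is usually called critical when its source has more than one successor and its destination has more than one predecessor, so a test on the predecessor count of the destination alone covers only half of the definition. A minimal sketch with hypothetical toy types (`struct block` and `struct edge` below are not GCC's basic_block and edge); GCC's EDGE_CRITICAL_P macro expresses the same two-sided test on the real CFG:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CFG types; these are NOT GCC's basic_block/edge.  */
struct block
{
  size_t n_preds;   /* number of incoming edges */
  size_t n_succs;   /* number of outgoing edges */
};

struct edge
{
  const struct block *src;
  const struct block *dest;
};

/* Return true if E is critical: its source has several successors
   and its destination has several predecessors.  */
static bool
edge_critical_p (const struct edge *e)
{
  assert (e != NULL && e->src != NULL && e->dest != NULL);
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}
```

With this definition, an edge from a single-successor block into a join block is not critical even though the join block has two or more predecessors.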
>>>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditional
>>>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two predecessors if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL (it signals
>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> here is the reduced patch (part.1), which is now almost half the size.
>>>>>>>>>>>>>>>>>> Let me also answer your comments.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. I completely deleted all code related to creation of conditional
>>>>>>>>>>>>>>>>>> expressions and now rely entirely on bool pattern recognition in the
>>>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>>>>> which are not used, since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>>>> local-dce function in the next patch.
>>>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not set-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>>>>> 5. I decided not to design a pre-pass, since it would lead to generating a
>>>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if-conversion, whereas for a phi
>>>>>>>>>>>>>>>>>>>> of the kind
>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>> only one cond expression is required, and this is treated as a simple
>>>>>>>>>>>>>>>>>>>> optimization for an arbitrary phi-function. More precisely,
>>>>>>>>>>>>>>>>>>>> if a phi-function has only two different arguments and one of them has
>>>>>>>>>>>>>>>>>>>> a single occurrence, if-conversion is performed as if the phi had only 2
>>>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>>>> For an arbitrary phi-function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition to that it changes function signatures without updating
>>>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>>>> all this added logic?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>> if flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>>>> vect bool pattern can be applied, perform the following transformation:
>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop versioning
>>>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has only one occurrence,
>>>>>>>>>>>>>>>>>>>>>>    transformation to a single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> gcc/ChangeLog
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicability
>>>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of
>>>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of critical edge,
>>>>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed to
>>>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical, original
>>>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion.
>>>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
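
For reference, the critical-edge test quoted above (an edge is not critical
when its source block has a single successor or its destination block has a
single predecessor) can be modelled outside GCC roughly as follows. This is
only a sketch under assumptions: struct toy_bb, struct toy_edge and
toy_edge_is_critical are hypothetical stand-ins, not GCC's basic_block and
edge types or any function from tree-if-conv.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's basic_block and edge.  Only the
   successor and predecessor counts matter for this test.  */
struct toy_bb
{
  int n_succs;
  int n_preds;
};

struct toy_edge
{
  const struct toy_bb *src;
  const struct toy_bb *dest;
};

/* Return true if E is critical: its source has more than one successor
   and its destination has more than one predecessor.  This is the
   negation of the "== 1 || == 1" test quoted in the thread.  */
bool
toy_edge_is_critical (const struct toy_edge *e)
{
  assert (e != NULL && e->src != NULL && e->dest != NULL);
  return e->src->n_succs > 1 && e->dest->n_preds > 1;
}
```

In the quoted scheme, a non-critical edge would take the predicate of its
source block, while a critical edge would take the predicate stored in its
own 'aux' field.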

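The special case discussed in the thread, a phi with more than two arguments
of which only two are different and one occurs exactly once, e.g.
x = PHI <1(2), 1(3), 2(4)>, can be illustrated with a small standalone
check. This is a sketch only: phi arguments are simplified to plain int
constants, and toy_phi_single_occurrence is a hypothetical name, not the
phi_has_two_different_args function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* ARGS models the N arguments of a phi node as integer constants.
   If ARGS holds exactly two distinct values and one of them occurs
   exactly once, return the index of that single occurrence (when both
   occur once, the index of the second value is returned).  Otherwise
   return -1.  */
int
toy_phi_single_occurrence (const int *args, size_t n)
{
  assert (args != NULL);
  if (n < 2)
    return -1;

  int first = args[0];
  int second = 0;
  int have_second = 0;
  size_t n_first = 0, n_second = 0;
  size_t idx_first = 0, idx_second = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (args[i] == first)
        {
          n_first++;
          idx_first = i;
        }
      else if (!have_second)
        {
          second = args[i];
          have_second = 1;
          n_second = 1;
          idx_second = i;
        }
      else if (args[i] == second)
        {
          n_second++;
          idx_second = i;
        }
      else
        return -1;  /* A third distinct value: needs a chain instead.  */
    }

  if (!have_second)
    return -1;
  if (n_second == 1)
    return (int) idx_second;
  if (n_first == 1)
    return (int) idx_first;
  return -1;
}
```

For PHI <1(2), 1(3), 2(4)> the function reports index 2, so a single
conditional on the predicate of that one edge would be enough; for
PHI <1(2), 2(3), 3(4)> it reports -1, the case where the thread describes a
chain of cond expressions.
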

end of thread, other threads:[~2014-11-07 14:08 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-13 10:00 [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd Yuri Rumyantsev
2014-10-15 10:09 ` Richard Biener
2014-10-16 15:52   ` Yuri Rumyantsev
2014-10-17  9:11     ` Richard Biener
2014-10-17 14:15       ` Yuri Rumyantsev
2014-10-20  8:02         ` Richard Biener
2014-10-20 14:11           ` Yuri Rumyantsev
2014-10-21 12:29             ` Yuri Rumyantsev
2014-10-21 12:56               ` Richard Biener
2014-10-21 13:26                 ` Yuri Rumyantsev
2014-10-21 13:45                   ` Richard Biener
2014-10-21 14:01                     ` Yuri Rumyantsev
2014-10-21 14:11                       ` Richard Biener
2014-10-21 14:20                         ` Richard Biener
2014-10-21 14:36                           ` Yuri Rumyantsev
2014-10-24  9:14                             ` Richard Biener
2014-10-24 10:23                               ` Yuri Rumyantsev
2014-11-07 14:08                                 ` Yuri Rumyantsev
