public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
@ 2016-05-13 16:02 Bin Cheng
  2016-05-13 16:54 ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Bin Cheng @ 2016-05-13 16:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1998 bytes --]

Hi,
As PR69848 reported, GCC vectorizer now generates comparison outside of VEC_COND_EXPR for COND_REDUCTION case, as below:

  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;

This results in inefficient expanding.  With IR like:

  vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, vect_c_2.7_13>;
  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;

We can do:
1) Expanding time optimization, for example, reverting comparison operator by switching VEC_COND_EXPR operands.  This is useful when backend only supports some comparison operators.
2) For backend not supporting vcond_mask patterns, saving one LT_EXPR instruction which introduced by expand_vec_cond_expr.

This patch fixes this by propagating comparison into VEC_COND_EXPR even if it's used multiple times.  For now, GCC does single_use_only propagation.  Ideally, we may duplicate the comparison before each use statement just before expanding, so that TER can successfully backtrack it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to do this.  Tree-vect-generic.c looks like a good candidate, but it's so early that following CSE could undo the transform.  Another possible fix is to generate comparison inside VEC_COND_EXPR directly in function vectorizable_reduction.

As for possible comparison CSE opportunities, I checked that it's simple enough to be handled by RTL CSE.

Bootstrap and test on x86_64 and AArch64.  Any comments?

Thanks,
bin

2016-05-12  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/69848
	* optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
	(expand_vec_cmp_expr_p): Call above functions.
	* optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
	* tree-ssa-forwprop.c (optabs-tree.h): Include header file.
	(forward_propagate_into_cond): Propgate multiple uses for
	VEC_COND_EXPR.

[-- Attachment #2: pr69848-1-20160512.txt --]
[-- Type: text/plain, Size: 3734 bytes --]

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index faac087..0ccdbdb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -314,25 +314,50 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type)
 }
 
 /* Return TRUE iff, appropriate vector insns are available
-   for vector cond expr with vector type VALUE_TYPE and a comparison
+   for VCOND_MASK pattern with vector type VALUE_TYPE and a comparison
    with operand vector types in CMP_OP_TYPE.  */
 
 bool
-expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+expand_vcond_mask_p (tree value_type, tree cmp_op_type)
 {
-  machine_mode value_mode = TYPE_MODE (value_type);
-  machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
   if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
       && get_vcond_mask_icode (TYPE_MODE (value_type),
 			       TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
     return true;
 
-  if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
-      || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
-      || get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
-			  TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing)
-    return false;
-  return true;
+  return false;
+}
+
+/* Return TRUE iff, appropriate vector insns are available
+   for VCOND pattern with vector type VALUE_TYPE and a comparison
+   with operand vector types in CMP_OP_TYPE.  */
+
+bool
+expand_vcond_p (tree value_type, tree cmp_op_type)
+{
+  machine_mode value_mode = TYPE_MODE (value_type);
+  machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
+  if (GET_MODE_SIZE (value_mode) == GET_MODE_SIZE (cmp_op_mode)
+      && GET_MODE_NUNITS (value_mode) == GET_MODE_NUNITS (cmp_op_mode)
+      && get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
+			  TYPE_UNSIGNED (cmp_op_type)) != CODE_FOR_nothing)
+    return true;
+
+  return false;
+}
+
+/* Return TRUE iff, appropriate vector insns are available
+   for vector cond expr with vector type VALUE_TYPE and a comparison
+   with operand vector types in CMP_OP_TYPE.  */
+
+bool
+expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+{
+  if (expand_vcond_mask_p (value_type, cmp_op_type)
+      || expand_vcond_p (value_type, cmp_op_type))
+    return true;
+
+  return false;
 }
 
 /* Use the current target and options to initialize
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3b9280..feab40f 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -39,6 +39,8 @@ optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
 bool supportable_convert_operation (enum tree_code, tree, tree, tree *,
 				    enum tree_code *);
 bool expand_vec_cmp_expr_p (tree, tree);
+bool expand_vcond_mask_p (tree, tree);
+bool expand_vcond_p (tree, tree);
 bool expand_vec_cond_expr_p (tree, tree);
 void init_tree_optimization_optabs (tree);
 
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index c40f9e2..40f023f 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "optabs-tree.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "fold-const.h"
@@ -585,7 +586,10 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
     {
       enum tree_code def_code;
       tree name = cond;
-      gimple *def_stmt = get_prop_source_stmt (name, true, NULL);
+      bool single_use_only = (code != VEC_COND_EXPR
+			      || !expand_vcond_p (gimple_expr_type (stmt),
+						  TREE_TYPE (cond)));
+      gimple *def_stmt = get_prop_source_stmt (name, single_use_only, NULL);
       if (!def_stmt || !can_propagate_from (def_stmt))
 	return 0;
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
  2016-05-13 16:02 [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports Bin Cheng
@ 2016-05-13 16:54 ` Richard Biener
  2016-05-16  8:09   ` Bin.Cheng
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2016-05-13 16:54 UTC (permalink / raw)
  To: Bin Cheng, gcc-patches; +Cc: nd

On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>Hi,
>As PR69848 reported, GCC vectorizer now generates comparison outside of
>VEC_COND_EXPR for COND_REDUCTION case, as below:
>
>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>
>This results in inefficient expanding.  With IR like:
>
>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>0, 0 }, vect_c_2.7_13>;
>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>
>We can do:
>1) Expanding time optimization, for example, reverting comparison
>operator by switching VEC_COND_EXPR operands.  This is useful when
>backend only supports some comparison operators.
>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>instruction which introduced by expand_vec_cond_expr.
>
>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>if it's used multiple times.  For now, GCC does single_use_only
>propagation.  Ideally, we may duplicate the comparison before each use
>statement just before expanding, so that TER can successfully backtrack
>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>early that following CSE could undo the transform.  Another possible
>fix is to generate comparison inside VEC_COND_EXPR directly in function
>vectorizable_reduction.

I prefer this for now.

Richard.

>As for possible comparison CSE opportunities, I checked that it's
>simple enough to be handled by RTL CSE.
>
>Bootstrap and test on x86_64 and AArch64.  Any comments?
>
>Thanks,
>bin
>
>2016-05-12  Bin Cheng  <bin.cheng@arm.com>
>
>	PR tree-optimization/69848
>	* optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>	(expand_vec_cmp_expr_p): Call above functions.
>	* optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>	* tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>	(forward_propagate_into_cond): Propgate multiple uses for
>	VEC_COND_EXPR.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
  2016-05-13 16:54 ` Richard Biener
@ 2016-05-16  8:09   ` Bin.Cheng
  2016-05-17 11:08     ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2016-05-16  8:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]

On Fri, May 13, 2016 at 5:53 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>Hi,
>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>
>>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>
>>This results in inefficient expanding.  With IR like:
>>
>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>0, 0 }, vect_c_2.7_13>;
>>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>
>>We can do:
>>1) Expanding time optimization, for example, reverting comparison
>>operator by switching VEC_COND_EXPR operands.  This is useful when
>>backend only supports some comparison operators.
>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>instruction which introduced by expand_vec_cond_expr.
>>
>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>if it's used multiple times.  For now, GCC does single_use_only
>>propagation.  Ideally, we may duplicate the comparison before each use
>>statement just before expanding, so that TER can successfully backtrack
>>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>>early that following CSE could undo the transform.  Another possible
>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>vectorizable_reduction.
>
> I prefer this for now.
Hi Richard, you mean this patch, or the possible fix before your comment?
Here is an updated patch addressing comment issue pointed out by
Bernhard Reutner-Fischer.  Thanks.

Thanks,
bin
>
> Richard.
>
>>As for possible comparison CSE opportunities, I checked that it's
>>simple enough to be handled by RTL CSE.
>>
>>Bootstrap and test on x86_64 and AArch64.  Any comments?
>>
>>Thanks,
>>bin
>>
>>2016-05-12  Bin Cheng  <bin.cheng@arm.com>
>>
>>       PR tree-optimization/69848
>>       * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>>       (expand_vec_cmp_expr_p): Call above functions.
>>       * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>>       * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>>       (forward_propagate_into_cond): Propgate multiple uses for
>>       VEC_COND_EXPR.
>
>

[-- Attachment #2: pr69848-1-20160515.txt --]
[-- Type: text/plain, Size: 3830 bytes --]

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index faac087..13538e5 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -313,26 +313,51 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type)
   return (icode != CODE_FOR_nothing);
 }
 
-/* Return TRUE iff, appropriate vector insns are available
-   for vector cond expr with vector type VALUE_TYPE and a comparison
+/* Return TRUE iff appropriate vector insns are available
+   for VCOND_MASK pattern with vector type VALUE_TYPE and a comparison
    with operand vector types in CMP_OP_TYPE.  */
 
 bool
-expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+expand_vcond_mask_p (tree value_type, tree cmp_op_type)
 {
-  machine_mode value_mode = TYPE_MODE (value_type);
-  machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
   if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
       && get_vcond_mask_icode (TYPE_MODE (value_type),
 			       TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
     return true;
 
-  if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
-      || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
-      || get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
-			  TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing)
-    return false;
-  return true;
+  return false;
+}
+
+/* Return TRUE iff appropriate vector insns are available
+   for VCOND pattern with vector type VALUE_TYPE and a comparison
+   with operand vector types in CMP_OP_TYPE.  */
+
+bool
+expand_vcond_p (tree value_type, tree cmp_op_type)
+{
+  machine_mode value_mode = TYPE_MODE (value_type);
+  machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
+  if (GET_MODE_SIZE (value_mode) == GET_MODE_SIZE (cmp_op_mode)
+      && GET_MODE_NUNITS (value_mode) == GET_MODE_NUNITS (cmp_op_mode)
+      && get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
+			  TYPE_UNSIGNED (cmp_op_type)) != CODE_FOR_nothing)
+    return true;
+
+  return false;
+}
+
+/* Return TRUE iff appropriate vector insns are available
+   for vector cond expr with vector type VALUE_TYPE and a comparison
+   with operand vector types in CMP_OP_TYPE.  */
+
+bool
+expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+{
+  if (expand_vcond_mask_p (value_type, cmp_op_type)
+      || expand_vcond_p (value_type, cmp_op_type))
+    return true;
+
+  return false;
 }
 
 /* Use the current target and options to initialize
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3b9280..feab40f 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -39,6 +39,8 @@ optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
 bool supportable_convert_operation (enum tree_code, tree, tree, tree *,
 				    enum tree_code *);
 bool expand_vec_cmp_expr_p (tree, tree);
+bool expand_vcond_mask_p (tree, tree);
+bool expand_vcond_p (tree, tree);
 bool expand_vec_cond_expr_p (tree, tree);
 void init_tree_optimization_optabs (tree);
 
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index c40f9e2..40f023f 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "optabs-tree.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "fold-const.h"
@@ -585,7 +586,10 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
     {
       enum tree_code def_code;
       tree name = cond;
-      gimple *def_stmt = get_prop_source_stmt (name, true, NULL);
+      bool single_use_only = (code != VEC_COND_EXPR
+			      || !expand_vcond_p (gimple_expr_type (stmt),
+						  TREE_TYPE (cond)));
+      gimple *def_stmt = get_prop_source_stmt (name, single_use_only, NULL);
       if (!def_stmt || !can_propagate_from (def_stmt))
 	return 0;
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
  2016-05-16  8:09   ` Bin.Cheng
@ 2016-05-17 11:08     ` Richard Biener
  2016-05-18 15:06       ` Bin.Cheng
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2016-05-17 11:08 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer

On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>Hi,
>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>
>>>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>
>>>This results in inefficient expanding.  With IR like:
>>>
>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>>0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>
>>>We can do:
>>>1) Expanding time optimization, for example, reverting comparison
>>>operator by switching VEC_COND_EXPR operands.  This is useful when
>>>backend only supports some comparison operators.
>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>instruction which introduced by expand_vec_cond_expr.
>>>
>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>if it's used multiple times.  For now, GCC does single_use_only
>>>propagation.  Ideally, we may duplicate the comparison before each use
>>>statement just before expanding, so that TER can successfully backtrack
>>>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>>>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>>>early that following CSE could undo the transform.  Another possible
>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>vectorizable_reduction.
>>
>> I prefer this for now.
> Hi Richard, you mean this patch, or the possible fix before your comment?

The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
with embedded comparison.

Thanks,
Richard.

> Here is an updated patch addressing comment issue pointed out by
> Bernhard Reutner-Fischer.  Thanks.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>>As for possible comparison CSE opportunities, I checked that it's
>>>simple enough to be handled by RTL CSE.
>>>
>>>Bootstrap and test on x86_64 and AArch64.  Any comments?
>>>
>>>Thanks,
>>>bin
>>>
>>>2016-05-12  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>       PR tree-optimization/69848
>>>       * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>>>       (expand_vec_cmp_expr_p): Call above functions.
>>>       * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>>>       * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>>>       (forward_propagate_into_cond): Propgate multiple uses for
>>>       VEC_COND_EXPR.
>>
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
  2016-05-17 11:08     ` Richard Biener
@ 2016-05-18 15:06       ` Bin.Cheng
  2016-05-19  7:54         ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2016-05-18 15:06 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer

[-- Attachment #1: Type: text/plain, Size: 2307 bytes --]

On Tue, May 17, 2016 at 12:08 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>Hi,
>>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>>
>>>>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>>
>>>>This results in inefficient expanding.  With IR like:
>>>>
>>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>>>0, 0 }, vect_c_2.7_13>;
>>>>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>>
>>>>We can do:
>>>>1) Expanding time optimization, for example, reverting comparison
>>>>operator by switching VEC_COND_EXPR operands.  This is useful when
>>>>backend only supports some comparison operators.
>>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>>instruction which introduced by expand_vec_cond_expr.
>>>>
>>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>>if it's used multiple times.  For now, GCC does single_use_only
>>>>propagation.  Ideally, we may duplicate the comparison before each use
>>>>statement just before expanding, so that TER can successfully backtrack
>>>>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>>>>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>>>>early that following CSE could undo the transform.  Another possible
>>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>>vectorizable_reduction.
>>>
>>> I prefer this for now.
>> Hi Richard, you mean this patch, or the possible fix before your comment?
>
> The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
> with embedded comparison.
Hi,
Here is updated patch doing that.  It's definitely clearer than the
original version.
Bootstrap and test on x86_64.  Also checked the expanding time
optimization still happens.  Is it OK?

Thanks,
bin
>
> Thanks,
> Richard.
>

[-- Attachment #2: pr69848-1-20160517.txt --]
[-- Type: text/plain, Size: 1544 bytes --]

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index d673c67..67053af 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6159,21 +6159,14 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
 	     Finally, we update the phi (NEW_PHI_TREE) to take the value of
 	     the new cond_expr (INDEX_COND_EXPR).  */
 
-	  /* Turn the condition from vec_stmt into an ssa name.  */
-	  gimple_stmt_iterator vec_stmt_gsi = gsi_for_stmt (*vec_stmt);
-	  tree ccompare = gimple_assign_rhs1 (*vec_stmt);
-	  tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
-	  gimple *ccompare_stmt = gimple_build_assign (ccompare_name,
-						       ccompare);
-	  gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
-	  gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
-	  update_stmt (*vec_stmt);
+	  /* Duplicate the condition from vec_stmt.  */
+	  tree ccompare = unshare_expr (gimple_assign_rhs1 (*vec_stmt));
 
 	  /* Create a conditional, where the condition is taken from vec_stmt
-	     (CCOMPARE_NAME), then is the induction index (INDEX_BEFORE_INCR)
-	     and else is the phi (NEW_PHI_TREE).  */
+	     (CCOMPARE), then is the induction index (INDEX_BEFORE_INCR) and
+	     else is the phi (NEW_PHI_TREE).  */
 	  tree index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
-					 ccompare_name, indx_before_incr,
+					 ccompare, indx_before_incr,
 					 new_phi_tree);
 	  cond_name = make_ssa_name (cr_index_vector_type);
 	  gimple *index_condition = gimple_build_assign (cond_name,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
  2016-05-18 15:06       ` Bin.Cheng
@ 2016-05-19  7:54         ` Richard Biener
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2016-05-19  7:54 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer

On Wed, May 18, 2016 at 5:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, May 17, 2016 at 12:08 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>>Hi,
>>>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>>>
>>>>>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>>>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>>>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>>>
>>>>>This results in inefficient expanding.  With IR like:
>>>>>
>>>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>>>>0, 0 }, vect_c_2.7_13>;
>>>>>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>>>
>>>>>We can do:
>>>>>1) Expanding time optimization, for example, reverting comparison
>>>>>operator by switching VEC_COND_EXPR operands.  This is useful when
>>>>>backend only supports some comparison operators.
>>>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>>>instruction which introduced by expand_vec_cond_expr.
>>>>>
>>>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>>>if it's used multiple times.  For now, GCC does single_use_only
>>>>>propagation.  Ideally, we may duplicate the comparison before each use
>>>>>statement just before expanding, so that TER can successfully backtrack
>>>>>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>>>>>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>>>>>early that following CSE could undo the transform.  Another possible
>>>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>>>vectorizable_reduction.
>>>>
>>>> I prefer this for now.
>>> Hi Richard, you mean this patch, or the possible fix before your comment?
>>
>> The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
>> with embedded comparison.
> Hi,
> Here is updated patch doing that.  It's definitely clearer than the
> original version.
> Bootstrap and test on x86_64.  Also checked the expanding time
> optimization still happens.  Is it OK?

Yes.

Thanks,
Richard.

> Thanks,
> bin
>>
>> Thanks,
>> Richard.
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-05-19  7:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-13 16:02 [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports Bin Cheng
2016-05-13 16:54 ` Richard Biener
2016-05-16  8:09   ` Bin.Cheng
2016-05-17 11:08     ` Richard Biener
2016-05-18 15:06       ` Bin.Cheng
2016-05-19  7:54         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).