* [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
@ 2016-05-13 16:02 Bin Cheng
2016-05-13 16:54 ` Richard Biener
0 siblings, 1 reply; 6+ messages in thread
From: Bin Cheng @ 2016-05-13 16:02 UTC (permalink / raw)
To: gcc-patches; +Cc: nd
[-- Attachment #1: Type: text/plain, Size: 1998 bytes --]
Hi,
As PR69848 reported, GCC vectorizer now generates comparison outside of VEC_COND_EXPR for COND_REDUCTION case, as below:
_20 = vect__1.6_8 != { 0, 0, 0, 0 };
vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
_21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
This results in inefficient expanding. With IR like:
vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, vect_c_2.7_13>;
_21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
We can do:
1) Expanding time optimization, for example, reverting comparison operator by switching VEC_COND_EXPR operands. This is useful when backend only supports some comparison operators.
2) For backend not supporting vcond_mask patterns, saving one LT_EXPR instruction which introduced by expand_vec_cond_expr.
This patch fixes this by propagating comparison into VEC_COND_EXPR even if it's used multiple times. For now, GCC does single_use_only propagation. Ideally, we may duplicate the comparison before each use statement just before expanding, so that TER can successfully backtrack it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to do this. Tree-vect-generic.c looks like a good candidate, but it's so early that following CSE could undo the transform. Another possible fix is to generate comparison inside VEC_COND_EXPR directly in function vectorizable_reduction.
As for possible comparison CSE opportunities, I checked that it's simple enough to be handled by RTL CSE.
Bootstrap and test on x86_64 and AArch64. Any comments?
Thanks,
bin
2016-05-12 Bin Cheng <bin.cheng@arm.com>
PR tree-optimization/69848
* optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
(expand_vec_cmp_expr_p): Call above functions.
* optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
* tree-ssa-forwprop.c (optabs-tree.h): Include header file.
(forward_propagate_into_cond): Propgate multiple uses for
VEC_COND_EXPR.
[-- Attachment #2: pr69848-1-20160512.txt --]
[-- Type: text/plain, Size: 3734 bytes --]
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index faac087..0ccdbdb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -314,25 +314,50 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type)
}
/* Return TRUE iff, appropriate vector insns are available
- for vector cond expr with vector type VALUE_TYPE and a comparison
+ for VCOND_MASK pattern with vector type VALUE_TYPE and a comparison
with operand vector types in CMP_OP_TYPE. */
bool
-expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+expand_vcond_mask_p (tree value_type, tree cmp_op_type)
{
- machine_mode value_mode = TYPE_MODE (value_type);
- machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
&& get_vcond_mask_icode (TYPE_MODE (value_type),
TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
return true;
- if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
- || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
- || get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
- TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing)
- return false;
- return true;
+ return false;
+}
+
+/* Return TRUE iff, appropriate vector insns are available
+ for VCOND pattern with vector type VALUE_TYPE and a comparison
+ with operand vector types in CMP_OP_TYPE. */
+
+bool
+expand_vcond_p (tree value_type, tree cmp_op_type)
+{
+ machine_mode value_mode = TYPE_MODE (value_type);
+ machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
+ if (GET_MODE_SIZE (value_mode) == GET_MODE_SIZE (cmp_op_mode)
+ && GET_MODE_NUNITS (value_mode) == GET_MODE_NUNITS (cmp_op_mode)
+ && get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
+ TYPE_UNSIGNED (cmp_op_type)) != CODE_FOR_nothing)
+ return true;
+
+ return false;
+}
+
+/* Return TRUE iff, appropriate vector insns are available
+ for vector cond expr with vector type VALUE_TYPE and a comparison
+ with operand vector types in CMP_OP_TYPE. */
+
+bool
+expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+{
+ if (expand_vcond_mask_p (value_type, cmp_op_type)
+ || expand_vcond_p (value_type, cmp_op_type))
+ return true;
+
+ return false;
}
/* Use the current target and options to initialize
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3b9280..feab40f 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -39,6 +39,8 @@ optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
bool supportable_convert_operation (enum tree_code, tree, tree, tree *,
enum tree_code *);
bool expand_vec_cmp_expr_p (tree, tree);
+bool expand_vcond_mask_p (tree, tree);
+bool expand_vcond_p (tree, tree);
bool expand_vec_cond_expr_p (tree, tree);
void init_tree_optimization_optabs (tree);
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index c40f9e2..40f023f 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-pass.h"
#include "ssa.h"
#include "expmed.h"
+#include "optabs-tree.h"
#include "optabs-query.h"
#include "gimple-pretty-print.h"
#include "fold-const.h"
@@ -585,7 +586,10 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
{
enum tree_code def_code;
tree name = cond;
- gimple *def_stmt = get_prop_source_stmt (name, true, NULL);
+ bool single_use_only = (code != VEC_COND_EXPR
+ || !expand_vcond_p (gimple_expr_type (stmt),
+ TREE_TYPE (cond)));
+ gimple *def_stmt = get_prop_source_stmt (name, single_use_only, NULL);
if (!def_stmt || !can_propagate_from (def_stmt))
return 0;
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
2016-05-13 16:02 [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports Bin Cheng
@ 2016-05-13 16:54 ` Richard Biener
2016-05-16 8:09 ` Bin.Cheng
0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2016-05-13 16:54 UTC (permalink / raw)
To: Bin Cheng, gcc-patches; +Cc: nd
On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>Hi,
>As PR69848 reported, GCC vectorizer now generates comparison outside of
>VEC_COND_EXPR for COND_REDUCTION case, as below:
>
> _20 = vect__1.6_8 != { 0, 0, 0, 0 };
> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>
>This results in inefficient expanding. With IR like:
>
>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>0, 0 }, vect_c_2.7_13>;
> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>
>We can do:
>1) Expanding time optimization, for example, reverting comparison
>operator by switching VEC_COND_EXPR operands. This is useful when
>backend only supports some comparison operators.
>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>instruction which introduced by expand_vec_cond_expr.
>
>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>if it's used multiple times. For now, GCC does single_use_only
>propagation. Ideally, we may duplicate the comparison before each use
>statement just before expanding, so that TER can successfully backtrack
>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to
>do this. Tree-vect-generic.c looks like a good candidate, but it's so
>early that following CSE could undo the transform. Another possible
>fix is to generate comparison inside VEC_COND_EXPR directly in function
>vectorizable_reduction.
I prefer this for now.
Richard.
>As for possible comparison CSE opportunities, I checked that it's
>simple enough to be handled by RTL CSE.
>
>Bootstrap and test on x86_64 and AArch64. Any comments?
>
>Thanks,
>bin
>
>2016-05-12 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/69848
> * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
> (expand_vec_cmp_expr_p): Call above functions.
> * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
> * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
> (forward_propagate_into_cond): Propgate multiple uses for
> VEC_COND_EXPR.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
2016-05-13 16:54 ` Richard Biener
@ 2016-05-16 8:09 ` Bin.Cheng
2016-05-17 11:08 ` Richard Biener
0 siblings, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2016-05-16 8:09 UTC (permalink / raw)
To: Richard Biener; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer
[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]
On Fri, May 13, 2016 at 5:53 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>Hi,
>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>
>> _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>
>>This results in inefficient expanding. With IR like:
>>
>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>0, 0 }, vect_c_2.7_13>;
>> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>
>>We can do:
>>1) Expanding time optimization, for example, reverting comparison
>>operator by switching VEC_COND_EXPR operands. This is useful when
>>backend only supports some comparison operators.
>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>instruction which introduced by expand_vec_cond_expr.
>>
>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>if it's used multiple times. For now, GCC does single_use_only
>>propagation. Ideally, we may duplicate the comparison before each use
>>statement just before expanding, so that TER can successfully backtrack
>>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to
>>do this. Tree-vect-generic.c looks like a good candidate, but it's so
>>early that following CSE could undo the transform. Another possible
>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>vectorizable_reduction.
>
> I prefer this for now.
Hi Richard, you mean this patch, or the possible fix before your comment?
Here is an updated patch addressing comment issue pointed out by
Bernhard Reutner-Fischer. Thanks.
Thanks,
bin
>
> Richard.
>
>>As for possible comparison CSE opportunities, I checked that it's
>>simple enough to be handled by RTL CSE.
>>
>>Bootstrap and test on x86_64 and AArch64. Any comments?
>>
>>Thanks,
>>bin
>>
>>2016-05-12 Bin Cheng <bin.cheng@arm.com>
>>
>> PR tree-optimization/69848
>> * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>> (expand_vec_cmp_expr_p): Call above functions.
>> * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>> * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>> (forward_propagate_into_cond): Propgate multiple uses for
>> VEC_COND_EXPR.
>
>
[-- Attachment #2: pr69848-1-20160515.txt --]
[-- Type: text/plain, Size: 3830 bytes --]
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index faac087..13538e5 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -313,26 +313,51 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type)
return (icode != CODE_FOR_nothing);
}
-/* Return TRUE iff, appropriate vector insns are available
- for vector cond expr with vector type VALUE_TYPE and a comparison
+/* Return TRUE iff appropriate vector insns are available
+ for VCOND_MASK pattern with vector type VALUE_TYPE and a comparison
with operand vector types in CMP_OP_TYPE. */
bool
-expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+expand_vcond_mask_p (tree value_type, tree cmp_op_type)
{
- machine_mode value_mode = TYPE_MODE (value_type);
- machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
&& get_vcond_mask_icode (TYPE_MODE (value_type),
TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
return true;
- if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
- || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
- || get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
- TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing)
- return false;
- return true;
+ return false;
+}
+
+/* Return TRUE iff appropriate vector insns are available
+ for VCOND pattern with vector type VALUE_TYPE and a comparison
+ with operand vector types in CMP_OP_TYPE. */
+
+bool
+expand_vcond_p (tree value_type, tree cmp_op_type)
+{
+ machine_mode value_mode = TYPE_MODE (value_type);
+ machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
+ if (GET_MODE_SIZE (value_mode) == GET_MODE_SIZE (cmp_op_mode)
+ && GET_MODE_NUNITS (value_mode) == GET_MODE_NUNITS (cmp_op_mode)
+ && get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
+ TYPE_UNSIGNED (cmp_op_type)) != CODE_FOR_nothing)
+ return true;
+
+ return false;
+}
+
+/* Return TRUE iff appropriate vector insns are available
+ for vector cond expr with vector type VALUE_TYPE and a comparison
+ with operand vector types in CMP_OP_TYPE. */
+
+bool
+expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
+{
+ if (expand_vcond_mask_p (value_type, cmp_op_type)
+ || expand_vcond_p (value_type, cmp_op_type))
+ return true;
+
+ return false;
}
/* Use the current target and options to initialize
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3b9280..feab40f 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -39,6 +39,8 @@ optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
bool supportable_convert_operation (enum tree_code, tree, tree, tree *,
enum tree_code *);
bool expand_vec_cmp_expr_p (tree, tree);
+bool expand_vcond_mask_p (tree, tree);
+bool expand_vcond_p (tree, tree);
bool expand_vec_cond_expr_p (tree, tree);
void init_tree_optimization_optabs (tree);
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index c40f9e2..40f023f 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-pass.h"
#include "ssa.h"
#include "expmed.h"
+#include "optabs-tree.h"
#include "optabs-query.h"
#include "gimple-pretty-print.h"
#include "fold-const.h"
@@ -585,7 +586,10 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
{
enum tree_code def_code;
tree name = cond;
- gimple *def_stmt = get_prop_source_stmt (name, true, NULL);
+ bool single_use_only = (code != VEC_COND_EXPR
+ || !expand_vcond_p (gimple_expr_type (stmt),
+ TREE_TYPE (cond)));
+ gimple *def_stmt = get_prop_source_stmt (name, single_use_only, NULL);
if (!def_stmt || !can_propagate_from (def_stmt))
return 0;
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
2016-05-16 8:09 ` Bin.Cheng
@ 2016-05-17 11:08 ` Richard Biener
2016-05-18 15:06 ` Bin.Cheng
0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2016-05-17 11:08 UTC (permalink / raw)
To: Bin.Cheng; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer
On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>Hi,
>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>
>>> _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>
>>>This results in inefficient expanding. With IR like:
>>>
>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>>0, 0 }, vect_c_2.7_13>;
>>> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>
>>>We can do:
>>>1) Expanding time optimization, for example, reverting comparison
>>>operator by switching VEC_COND_EXPR operands. This is useful when
>>>backend only supports some comparison operators.
>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>instruction which introduced by expand_vec_cond_expr.
>>>
>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>if it's used multiple times. For now, GCC does single_use_only
>>>propagation. Ideally, we may duplicate the comparison before each use
>>>statement just before expanding, so that TER can successfully backtrack
>>>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to
>>>do this. Tree-vect-generic.c looks like a good candidate, but it's so
>>>early that following CSE could undo the transform. Another possible
>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>vectorizable_reduction.
>>
>> I prefer this for now.
> Hi Richard, you mean this patch, or the possible fix before your comment?
The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
with embedded comparison.
Thanks,
Richard.
> Here is an updated patch addressing comment issue pointed out by
> Bernhard Reutner-Fischer. Thanks.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>>As for possible comparison CSE opportunities, I checked that it's
>>>simple enough to be handled by RTL CSE.
>>>
>>>Bootstrap and test on x86_64 and AArch64. Any comments?
>>>
>>>Thanks,
>>>bin
>>>
>>>2016-05-12 Bin Cheng <bin.cheng@arm.com>
>>>
>>> PR tree-optimization/69848
>>> * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>>> (expand_vec_cmp_expr_p): Call above functions.
>>> * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>>> * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>>> (forward_propagate_into_cond): Propgate multiple uses for
>>> VEC_COND_EXPR.
>>
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
2016-05-17 11:08 ` Richard Biener
@ 2016-05-18 15:06 ` Bin.Cheng
2016-05-19 7:54 ` Richard Biener
0 siblings, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2016-05-18 15:06 UTC (permalink / raw)
To: Richard Biener; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer
[-- Attachment #1: Type: text/plain, Size: 2307 bytes --]
On Tue, May 17, 2016 at 12:08 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>Hi,
>>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>>
>>>> _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>>
>>>>This results in inefficient expanding. With IR like:
>>>>
>>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>>>0, 0 }, vect_c_2.7_13>;
>>>> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>>
>>>>We can do:
>>>>1) Expanding time optimization, for example, reverting comparison
>>>>operator by switching VEC_COND_EXPR operands. This is useful when
>>>>backend only supports some comparison operators.
>>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>>instruction which introduced by expand_vec_cond_expr.
>>>>
>>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>>if it's used multiple times. For now, GCC does single_use_only
>>>>propagation. Ideally, we may duplicate the comparison before each use
>>>>statement just before expanding, so that TER can successfully backtrack
>>>>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to
>>>>do this. Tree-vect-generic.c looks like a good candidate, but it's so
>>>>early that following CSE could undo the transform. Another possible
>>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>>vectorizable_reduction.
>>>
>>> I prefer this for now.
>> Hi Richard, you mean this patch, or the possible fix before your comment?
>
> The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
> with embedded comparison.
Hi,
Here is updated patch doing that. It's definitely clearer than the
original version.
Bootstrap and test on x86_64. Also checked the expanding time
optimization still happens. Is it OK?
Thanks,
bin
>
> Thanks,
> Richard.
>
[-- Attachment #2: pr69848-1-20160517.txt --]
[-- Type: text/plain, Size: 1544 bytes --]
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index d673c67..67053af 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6159,21 +6159,14 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
Finally, we update the phi (NEW_PHI_TREE) to take the value of
the new cond_expr (INDEX_COND_EXPR). */
- /* Turn the condition from vec_stmt into an ssa name. */
- gimple_stmt_iterator vec_stmt_gsi = gsi_for_stmt (*vec_stmt);
- tree ccompare = gimple_assign_rhs1 (*vec_stmt);
- tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
- gimple *ccompare_stmt = gimple_build_assign (ccompare_name,
- ccompare);
- gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
- gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
- update_stmt (*vec_stmt);
+ /* Duplicate the condition from vec_stmt. */
+ tree ccompare = unshare_expr (gimple_assign_rhs1 (*vec_stmt));
/* Create a conditional, where the condition is taken from vec_stmt
- (CCOMPARE_NAME), then is the induction index (INDEX_BEFORE_INCR)
- and else is the phi (NEW_PHI_TREE). */
+ (CCOMPARE), then is the induction index (INDEX_BEFORE_INCR) and
+ else is the phi (NEW_PHI_TREE). */
tree index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
- ccompare_name, indx_before_incr,
+ ccompare, indx_before_incr,
new_phi_tree);
cond_name = make_ssa_name (cr_index_vector_type);
gimple *index_condition = gimple_build_assign (cond_name,
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports
2016-05-18 15:06 ` Bin.Cheng
@ 2016-05-19 7:54 ` Richard Biener
0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2016-05-19 7:54 UTC (permalink / raw)
To: Bin.Cheng; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer
On Wed, May 18, 2016 at 5:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, May 17, 2016 at 12:08 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>>Hi,
>>>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>>>
>>>>> _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>>> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>>> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>>>
>>>>>This results in inefficient expanding. With IR like:
>>>>>
>>>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0,
>>>>>0, 0 }, vect_c_2.7_13>;
>>>>> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>>>
>>>>>We can do:
>>>>>1) Expanding time optimization, for example, reverting comparison
>>>>>operator by switching VEC_COND_EXPR operands. This is useful when
>>>>>backend only supports some comparison operators.
>>>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>>>instruction which introduced by expand_vec_cond_expr.
>>>>>
>>>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>>>if it's used multiple times. For now, GCC does single_use_only
>>>>>propagation. Ideally, we may duplicate the comparison before each use
>>>>>statement just before expanding, so that TER can successfully backtrack
>>>>>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to
>>>>>do this. Tree-vect-generic.c looks like a good candidate, but it's so
>>>>>early that following CSE could undo the transform. Another possible
>>>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>>>vectorizable_reduction.
>>>>
>>>> I prefer this for now.
>>> Hi Richard, you mean this patch, or the possible fix before your comment?
>>
>> The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
>> with embedded comparison.
> Hi,
> Here is updated patch doing that. It's definitely clearer than the
> original version.
> Bootstrap and test on x86_64. Also checked the expanding time
> optimization still happens. Is it OK?
Yes.
Thanks,
Richard.
> Thanks,
> bin
>>
>> Thanks,
>> Richard.
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-05-19 7:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-13 16:02 [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports Bin Cheng
2016-05-13 16:54 ` Richard Biener
2016-05-16 8:09 ` Bin.Cheng
2016-05-17 11:08 ` Richard Biener
2016-05-18 15:06 ` Bin.Cheng
2016-05-19 7:54 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).