* Help with PR97872
@ 2020-12-01 10:59 Prathamesh Kulkarni
2020-12-01 11:09 ` Richard Biener
0 siblings, 1 reply; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-01 10:59 UTC (permalink / raw)
To: GCC Development, Richard Biener
Hi,
For the test mentioned in PR, I was trying to see if we could do
specialized expansion for vcond in target when operands are -1 and 0.
arm_expand_vcond gets the following operands:
(reg:V8QI 113 [ _2 ])
(reg:V8QI 117)
(reg:V8QI 118)
(lt (reg/v:V8QI 115 [ a ])
(reg/v:V8QI 116 [ b ]))
(reg/v:V8QI 115 [ a ])
(reg/v:V8QI 116 [ b ])
where r117 and r118 are set to vector constants -1 and 0 respectively.
However, I am not sure if there's a way to check if the register is
constant during expansion time (since we don't have df analysis yet) ?
Alternatively, should we add a target hook that returns true if the
result of vector comparison is set to all-ones or all-zeros, and then
use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
Thanks,
Prathamesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-01 10:59 Help with PR97872 Prathamesh Kulkarni
@ 2020-12-01 11:09 ` Richard Biener
2020-12-03 9:06 ` Prathamesh Kulkarni
0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2020-12-01 11:09 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: GCC Development
On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> Hi,
> For the test mentioned in PR, I was trying to see if we could do
> specialized expansion for vcond in target when operands are -1 and 0.
> arm_expand_vcond gets the following operands:
> (reg:V8QI 113 [ _2 ])
> (reg:V8QI 117)
> (reg:V8QI 118)
> (lt (reg/v:V8QI 115 [ a ])
> (reg/v:V8QI 116 [ b ]))
> (reg/v:V8QI 115 [ a ])
> (reg/v:V8QI 116 [ b ])
>
> where r117 and r118 are set to vector constants -1 and 0 respectively.
> However, I am not sure if there's a way to check if the register is
> constant during expansion time (since we don't have df analysis yet) ?
>
> Alternatively, should we add a target hook that returns true if the
> result of vector comparison is set to all-ones or all-zeros, and then
> use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
Would everything match-up for a .VEC_CMP IFN producing a non-mask
vector type? ISEL could special case the a ? -1 : 0 case this way.
> Thanks,
> Prathamesh
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-01 11:09 ` Richard Biener
@ 2020-12-03 9:06 ` Prathamesh Kulkarni
2020-12-03 11:05 ` Richard Biener
0 siblings, 1 reply; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-03 9:06 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Development
[-- Attachment #1: Type: text/plain, Size: 2140 bytes --]
On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
>
> > Hi,
> > For the test mentioned in PR, I was trying to see if we could do
> > specialized expansion for vcond in target when operands are -1 and 0.
> > arm_expand_vcond gets the following operands:
> > (reg:V8QI 113 [ _2 ])
> > (reg:V8QI 117)
> > (reg:V8QI 118)
> > (lt (reg/v:V8QI 115 [ a ])
> > (reg/v:V8QI 116 [ b ]))
> > (reg/v:V8QI 115 [ a ])
> > (reg/v:V8QI 116 [ b ])
> >
> > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > However, I am not sure if there's a way to check if the register is
> > constant during expansion time (since we don't have df analysis yet) ?
> >
> > Alternatively, should we add a target hook that returns true if the
> > result of vector comparison is set to all-ones or all-zeros, and then
> > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
>
> Would everything match-up for a .VEC_CMP IFN producing a non-mask
> vector type? ISEL could special case the a ? -1 : 0 case this way.
I think the vec_cmp pattern matches but it produces a masked vector type.
In the attached patch, I simply replaced:
_1 = a < b
x = _1 ? -1 : 0
with
x = view_convert_expr<_1>
For the test-case, isel generates:
vector(8) <signed-boolean:8> _1;
vector(8) signed char _2;
uint8x8_t _5;
<bb 2> [local count: 1073741824]:
_1 = a_3(D) < b_4(D);
_2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
_5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
return _5;
and results in desired code-gen:
f1:
vcgt.s8 d0, d1, d0
bx lr
Altho I guess, we should remove the redundant conversions during isel itself ?
and result in:
_1 = a_3(D) < b_4(D)
_5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
(Patch is lightly tested with only vect.exp)
Thanks,
Prathamesh
>
> > Thanks,
> > Prathamesh
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
[-- Attachment #2: pr97282-1.diff --]
[-- Type: application/octet-stream, Size: 1761 bytes --]
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 048b407bd11..56cb9dfc310 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -135,6 +135,26 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
lhs = gimple_assign_lhs (stmt);
machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+ /* For targets where result of comparison is all-ones or all-zeros,
+ a < b ? -1 : 0 can be reduced to a < b. */
+
+ if (integer_minus_onep (op1) && integer_zerop (op2))
+ {
+ gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
+ tree op0a = gimple_assign_rhs1 (def_stmt);
+ tree op0_type = TREE_TYPE (op0);
+ tree op0a_type = TREE_TYPE (op0a);
+ enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
+
+ if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
+ {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+ }
+ }
+
/* Lower mask typed, non-vector mode VEC_COND_EXPRs to bitwise operations.
Those can end up generated by folding and at least for integer mode masks
we cannot expect vcond expanders to exist. We lower a ? b : c
diff --git a/gcc/testsuite/gcc.target/arm/pr97282.c b/gcc/testsuite/gcc.target/arm/pr97282.c
new file mode 100644
index 00000000000..eeb4dd9d6bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr97282.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint8x8_t f1(int8x8_t a, int8x8_t b) {
+ return a < b;
+}
+
+/* { dg-final { scan-assembler-not "vbsl" } } */
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-03 9:06 ` Prathamesh Kulkarni
@ 2020-12-03 11:05 ` Richard Biener
2020-12-04 9:22 ` Prathamesh Kulkarni
0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2020-12-03 11:05 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: GCC Development
On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> >
> > > Hi,
> > > For the test mentioned in PR, I was trying to see if we could do
> > > specialized expansion for vcond in target when operands are -1 and 0.
> > > arm_expand_vcond gets the following operands:
> > > (reg:V8QI 113 [ _2 ])
> > > (reg:V8QI 117)
> > > (reg:V8QI 118)
> > > (lt (reg/v:V8QI 115 [ a ])
> > > (reg/v:V8QI 116 [ b ]))
> > > (reg/v:V8QI 115 [ a ])
> > > (reg/v:V8QI 116 [ b ])
> > >
> > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > However, I am not sure if there's a way to check if the register is
> > > constant during expansion time (since we don't have df analysis yet) ?
> > >
> > > Alternatively, should we add a target hook that returns true if the
> > > result of vector comparison is set to all-ones or all-zeros, and then
> > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> >
> > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > vector type? ISEL could special case the a ? -1 : 0 case this way.
> I think the vec_cmp pattern matches but it produces a masked vector type.
> In the attached patch, I simply replaced:
> _1 = a < b
> x = _1 ? -1 : 0
> with
> x = view_convert_expr<_1>
>
> For the test-case, isel generates:
> vector(8) <signed-boolean:8> _1;
> vector(8) signed char _2;
> uint8x8_t _5;
>
> <bb 2> [local count: 1073741824]:
> _1 = a_3(D) < b_4(D);
> _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> return _5;
>
> and results in desired code-gen:
> f1:
> vcgt.s8 d0, d1, d0
> bx lr
>
> Altho I guess, we should remove the redundant conversions during isel itself ?
> and result in:
> _1 = a_3(D) < b_4(D)
> _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
>
> (Patch is lightly tested with only vect.exp)
+ /* For targets where result of comparison is all-ones or all-zeros,
+ a < b ? -1 : 0 can be reduced to a < b. */
+
+ if (integer_minus_onep (op1) && integer_zerop (op2))
+ {
So this really belongs here:
tree op0_type = TREE_TYPE (op0);
tree op0a_type = TREE_TYPE (op0a);
<---
if (used_vec_cond_exprs >= 2
&& (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
!= CODE_FOR_nothing)
&& expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
{
+ gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
+ tree op0a = gimple_assign_rhs1 (def_stmt);
+ tree op0_type = TREE_TYPE (op0);
+ tree op0a_type = TREE_TYPE (op0a);
+ enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
+
+ if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
+ {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
and you need to verify that the mode of the lhs and the mode of op0
agree and that the target can actually expand_vec_cmp_expr_p
Richard.
> Thanks,
> Prathamesh
> >
> > > Thanks,
> > > Prathamesh
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-03 11:05 ` Richard Biener
@ 2020-12-04 9:22 ` Prathamesh Kulkarni
2020-12-04 11:48 ` Richard Biener
0 siblings, 1 reply; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-04 9:22 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Development
[-- Attachment #1: Type: text/plain, Size: 4323 bytes --]
On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
>
> On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
>
> > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > >
> > > > Hi,
> > > > For the test mentioned in PR, I was trying to see if we could do
> > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > arm_expand_vcond gets the following operands:
> > > > (reg:V8QI 113 [ _2 ])
> > > > (reg:V8QI 117)
> > > > (reg:V8QI 118)
> > > > (lt (reg/v:V8QI 115 [ a ])
> > > > (reg/v:V8QI 116 [ b ]))
> > > > (reg/v:V8QI 115 [ a ])
> > > > (reg/v:V8QI 116 [ b ])
> > > >
> > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > However, I am not sure if there's a way to check if the register is
> > > > constant during expansion time (since we don't have df analysis yet) ?
> > > >
> > > > Alternatively, should we add a target hook that returns true if the
> > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > >
> > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > I think the vec_cmp pattern matches but it produces a masked vector type.
> > In the attached patch, I simply replaced:
> > _1 = a < b
> > x = _1 ? -1 : 0
> > with
> > x = view_convert_expr<_1>
> >
> > For the test-case, isel generates:
> > vector(8) <signed-boolean:8> _1;
> > vector(8) signed char _2;
> > uint8x8_t _5;
> >
> > <bb 2> [local count: 1073741824]:
> > _1 = a_3(D) < b_4(D);
> > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > return _5;
> >
> > and results in desired code-gen:
> > f1:
> > vcgt.s8 d0, d1, d0
> > bx lr
> >
> > Altho I guess, we should remove the redundant conversions during isel itself ?
> > and result in:
> > _1 = a_3(D) < b_4(D)
> > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> >
> > (Patch is lightly tested with only vect.exp)
>
> + /* For targets where result of comparison is all-ones or all-zeros,
> + a < b ? -1 : 0 can be reduced to a < b. */
> +
> + if (integer_minus_onep (op1) && integer_zerop (op2))
> + {
>
> So this really belongs here:
>
> tree op0_type = TREE_TYPE (op0);
> tree op0a_type = TREE_TYPE (op0a);
>
> <---
>
> if (used_vec_cond_exprs >= 2
> && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> != CODE_FOR_nothing)
> && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> {
>
>
> + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> + tree op0a = gimple_assign_rhs1 (def_stmt);
> + tree op0_type = TREE_TYPE (op0);
> + tree op0a_type = TREE_TYPE (op0a);
> + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> +
> + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> + {
> + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> + gsi_replace (gsi, new_stmt, true);
>
> and you need to verify that the mode of the lhs and the mode of op0
> agree and that the target can actually expand_vec_cmp_expr_p
Thanks for the suggestions, does the attached patch look OK ?
Sorry, I am not sure how to check if target can actually expand vec_cmp ?
I assume that since expand_vec_cmp_expr_p queries optab and if it gets
a valid cmp icode, that
should be sufficient ?
Thanks,
Prathamesh
>
> Richard.
>
>
>
> > Thanks,
> > Prathamesh
> > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
[-- Attachment #2: pr97282-2.diff --]
[-- Type: application/octet-stream, Size: 1463 bytes --]
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 048b407bd11..1f9a17105ea 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -184,6 +184,21 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
tree op0_type = TREE_TYPE (op0);
tree op0a_type = TREE_TYPE (op0a);
+
+ /* For targets where result of comparison is all-ones or all-zeros,
+ a < b ? -1 : 0 can be reduced to a < b. */
+
+ if (integer_minus_onep (op1)
+ && integer_zerop (op2)
+ && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0))
+ && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
+ {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+ }
+
if (used_vec_cond_exprs >= 2
&& (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
!= CODE_FOR_nothing)
diff --git a/gcc/testsuite/gcc.target/arm/pr97872.c b/gcc/testsuite/gcc.target/arm/pr97872.c
new file mode 100644
index 00000000000..eeb4dd9d6bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr97872.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint8x8_t f1(int8x8_t a, int8x8_t b) {
+ return a < b;
+}
+
+/* { dg-final { scan-assembler-not "vbsl" } } */
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-04 9:22 ` Prathamesh Kulkarni
@ 2020-12-04 11:48 ` Richard Biener
2020-12-07 5:53 ` Prathamesh Kulkarni
0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2020-12-04 11:48 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: GCC Development
On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> >
> > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > >
> > > > > Hi,
> > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > arm_expand_vcond gets the following operands:
> > > > > (reg:V8QI 113 [ _2 ])
> > > > > (reg:V8QI 117)
> > > > > (reg:V8QI 118)
> > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > (reg/v:V8QI 116 [ b ]))
> > > > > (reg/v:V8QI 115 [ a ])
> > > > > (reg/v:V8QI 116 [ b ])
> > > > >
> > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > However, I am not sure if there's a way to check if the register is
> > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > >
> > > > > Alternatively, should we add a target hook that returns true if the
> > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > >
> > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > In the attached patch, I simply replaced:
> > > _1 = a < b
> > > x = _1 ? -1 : 0
> > > with
> > > x = view_convert_expr<_1>
> > >
> > > For the test-case, isel generates:
> > > vector(8) <signed-boolean:8> _1;
> > > vector(8) signed char _2;
> > > uint8x8_t _5;
> > >
> > > <bb 2> [local count: 1073741824]:
> > > _1 = a_3(D) < b_4(D);
> > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > return _5;
> > >
> > > and results in desired code-gen:
> > > f1:
> > > vcgt.s8 d0, d1, d0
> > > bx lr
> > >
> > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > and result in:
> > > _1 = a_3(D) < b_4(D)
> > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > >
> > > (Patch is lightly tested with only vect.exp)
> >
> > + /* For targets where result of comparison is all-ones or all-zeros,
> > + a < b ? -1 : 0 can be reduced to a < b. */
> > +
> > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > + {
> >
> > So this really belongs here:
> >
> > tree op0_type = TREE_TYPE (op0);
> > tree op0a_type = TREE_TYPE (op0a);
> >
> > <---
> >
> > if (used_vec_cond_exprs >= 2
> > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > != CODE_FOR_nothing)
> > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > {
> >
> >
> > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > + tree op0_type = TREE_TYPE (op0);
> > + tree op0a_type = TREE_TYPE (op0a);
> > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > +
> > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > + {
> > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > + gsi_replace (gsi, new_stmt, true);
> >
> > and you need to verify that the mode of the lhs and the mode of op0
> > agree and that the target can actually expand_vec_cmp_expr_p
> Thanks for the suggestions, does the attached patch look OK ?
> Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> a valid cmp icode, that
> should be sufficient ?
Yes
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-04 11:48 ` Richard Biener
@ 2020-12-07 5:53 ` Prathamesh Kulkarni
2020-12-07 7:31 ` Richard Biener
0 siblings, 1 reply; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-07 5:53 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Development
[-- Attachment #1: Type: text/plain, Size: 6321 bytes --]
On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
>
> > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > Hi,
> > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > arm_expand_vcond gets the following operands:
> > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > (reg:V8QI 117)
> > > > > > (reg:V8QI 118)
> > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > (reg/v:V8QI 116 [ b ])
> > > > > >
> > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > > >
> > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > >
> > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > In the attached patch, I simply replaced:
> > > > _1 = a < b
> > > > x = _1 ? -1 : 0
> > > > with
> > > > x = view_convert_expr<_1>
> > > >
> > > > For the test-case, isel generates:
> > > > vector(8) <signed-boolean:8> _1;
> > > > vector(8) signed char _2;
> > > > uint8x8_t _5;
> > > >
> > > > <bb 2> [local count: 1073741824]:
> > > > _1 = a_3(D) < b_4(D);
> > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > return _5;
> > > >
> > > > and results in desired code-gen:
> > > > f1:
> > > > vcgt.s8 d0, d1, d0
> > > > bx lr
> > > >
> > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > and result in:
> > > > _1 = a_3(D) < b_4(D)
> > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > >
> > > > (Patch is lightly tested with only vect.exp)
> > >
> > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > +
> > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > + {
> > >
> > > So this really belongs here:
> > >
> > > tree op0_type = TREE_TYPE (op0);
> > > tree op0a_type = TREE_TYPE (op0a);
> > >
> > > <---
> > >
> > > if (used_vec_cond_exprs >= 2
> > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > != CODE_FOR_nothing)
> > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > {
> > >
> > >
> > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > + tree op0_type = TREE_TYPE (op0);
> > > + tree op0a_type = TREE_TYPE (op0a);
> > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > +
> > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > + {
> > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > + gsi_replace (gsi, new_stmt, true);
> > >
> > > and you need to verify that the mode of the lhs and the mode of op0
> > > agree and that the target can actually expand_vec_cmp_expr_p
> > Thanks for the suggestions, does the attached patch look OK ?
> > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > a valid cmp icode, that
> > should be sufficient ?
>
> Yes
Hi Richard,
I tested the patch, and it shows one regression for pr78102.c, because
of extra pcmpeqq in code-gen for x != y on x86.
For the test-case:
__v2di
baz (const __v2di x, const __v2di y)
{
return x != y;
}
Before patch:
baz:
pcmpeqq %xmm1, %xmm0
pcmpeqd %xmm1, %xmm1
pandn %xmm1, %xmm0
ret
After patch,
Before ISEL:
vector(2) <signed-boolean:64> _1;
__v2di _4;
<bb 2> [local count: 1073741824]:
_1 = x_2(D) != y_3(D);
_4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
return _4;
After ISEL:
vector(2) <signed-boolean:64> _1;
__v2di _4;
<bb 2> [local count: 1073741824]:
_1 = x_2(D) != y_3(D);
_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
return _4;
which results in:
pcmpeqq %xmm1, %xmm0
pxor %xmm1, %xmm1
pcmpeqq %xmm1, %xmm0
ret
IIUC, the new code-gen is essentially comparing two args for equality, and then
comparing the result against zero to invert it, so it looks correct ?
I am not sure which of the above two sequences is better tho ?
If the new code-gen is OK, would it be OK to adjust the test-case ?
Thanks,
Prathamesh
>
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > >
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
[-- Attachment #2: pr97872-2.diff --]
[-- Type: application/octet-stream, Size: 1463 bytes --]
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 048b407bd11..1f9a17105ea 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -184,6 +184,21 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
tree op0_type = TREE_TYPE (op0);
tree op0a_type = TREE_TYPE (op0a);
+
+ /* For targets where result of comparison is all-ones or all-zeros,
+ a < b ? -1 : 0 can be reduced to a < b. */
+
+ if (integer_minus_onep (op1)
+ && integer_zerop (op2)
+ && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0))
+ && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
+ {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+ }
+
if (used_vec_cond_exprs >= 2
&& (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
!= CODE_FOR_nothing)
diff --git a/gcc/testsuite/gcc.target/arm/pr97872.c b/gcc/testsuite/gcc.target/arm/pr97872.c
new file mode 100644
index 00000000000..eeb4dd9d6bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr97872.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint8x8_t f1(int8x8_t a, int8x8_t b) {
+ return a < b;
+}
+
+/* { dg-final { scan-assembler-not "vbsl" } } */
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-07 5:53 ` Prathamesh Kulkarni
@ 2020-12-07 7:31 ` Richard Biener
2020-12-07 9:33 ` Prathamesh Kulkarni
0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2020-12-07 7:31 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: GCC Development, hongtao.liu
On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> >
> > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > (reg:V8QI 117)
> > > > > > > (reg:V8QI 118)
> > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > >
> > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > > > >
> > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > >
> > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > In the attached patch, I simply replaced:
> > > > > _1 = a < b
> > > > > x = _1 ? -1 : 0
> > > > > with
> > > > > x = view_convert_expr<_1>
> > > > >
> > > > > For the test-case, isel generates:
> > > > > vector(8) <signed-boolean:8> _1;
> > > > > vector(8) signed char _2;
> > > > > uint8x8_t _5;
> > > > >
> > > > > <bb 2> [local count: 1073741824]:
> > > > > _1 = a_3(D) < b_4(D);
> > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > return _5;
> > > > >
> > > > > and results in desired code-gen:
> > > > > f1:
> > > > > vcgt.s8 d0, d1, d0
> > > > > bx lr
> > > > >
> > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > and result in:
> > > > > _1 = a_3(D) < b_4(D)
> > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > >
> > > > > (Patch is lightly tested with only vect.exp)
> > > >
> > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > +
> > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > + {
> > > >
> > > > So this really belongs here:
> > > >
> > > > tree op0_type = TREE_TYPE (op0);
> > > > tree op0a_type = TREE_TYPE (op0a);
> > > >
> > > > <---
> > > >
> > > > if (used_vec_cond_exprs >= 2
> > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > != CODE_FOR_nothing)
> > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > {
> > > >
> > > >
> > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > + tree op0_type = TREE_TYPE (op0);
> > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > +
> > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > + {
> > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > + gsi_replace (gsi, new_stmt, true);
> > > >
> > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > Thanks for the suggestions, does the attached patch look OK ?
> > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > a valid cmp icode, that
> > > should be sufficient ?
> >
> > Yes
> Hi Richard,
> I tested the patch, and it shows one regression for pr78102.c, because
> of extra pcmpeqq in code-gen for x != y on x86.
> For the test-case:
> __v2di
> baz (const __v2di x, const __v2di y)
> {
> return x != y;
> }
>
> Before patch:
> baz:
> pcmpeqq %xmm1, %xmm0
> pcmpeqd %xmm1, %xmm1
> pandn %xmm1, %xmm0
> ret
>
> After patch,
> Before ISEL:
> vector(2) <signed-boolean:64> _1;
> __v2di _4;
>
> <bb 2> [local count: 1073741824]:
> _1 = x_2(D) != y_3(D);
> _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> return _4;
>
> After ISEL:
> vector(2) <signed-boolean:64> _1;
> __v2di _4;
>
> <bb 2> [local count: 1073741824]:
> _1 = x_2(D) != y_3(D);
> _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> return _4;
>
> which results in:
> pcmpeqq %xmm1, %xmm0
> pxor %xmm1, %xmm1
> pcmpeqq %xmm1, %xmm0
> ret
>
> IIUC, the new code-gen is essentially comparing two args for equality, and then
> comparing the result against zero to invert it, so it looks correct ?
> I am not sure which of the above two sequences is better tho ?
> If the new code-gen is OK, would it be OK to adjust the test-case ?
In case pcmpeqq is double-issue the first variant might be faster while
the second variant has the advantage of the "free" pxor, but back-to-back
pcmpeqq might have an issue.
I think on GIMPLE the new code is preferable and adjustments are
target business. I wouldn't be surprised if the x86 backend
special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
variant.
Did you check how
a = x != y ? { -1, -1 } : {0, 0 };
b = x != y ? { 1, 2 } : { 3, 4 };
is handled before/after your patch? That is, make the comparison
CSEd between two VEC_COND_EXPRs?
Thanks,
Richard.
> Thanks,
> Prathamesh
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Richard.
> > > >
> > > >
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-07 7:31 ` Richard Biener
@ 2020-12-07 9:33 ` Prathamesh Kulkarni
2020-12-07 9:45 ` Richard Biener
0 siblings, 1 reply; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-07 9:33 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Development, hongtao.liu
On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
>
> > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > >
> > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > (reg:V8QI 117)
> > > > > > > > (reg:V8QI 118)
> > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > >
> > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > > > > >
> > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > >
> > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > In the attached patch, I simply replaced:
> > > > > > _1 = a < b
> > > > > > x = _1 ? -1 : 0
> > > > > > with
> > > > > > x = view_convert_expr<_1>
> > > > > >
> > > > > > For the test-case, isel generates:
> > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > vector(8) signed char _2;
> > > > > > uint8x8_t _5;
> > > > > >
> > > > > > <bb 2> [local count: 1073741824]:
> > > > > > _1 = a_3(D) < b_4(D);
> > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > return _5;
> > > > > >
> > > > > > and results in desired code-gen:
> > > > > > f1:
> > > > > > vcgt.s8 d0, d1, d0
> > > > > > bx lr
> > > > > >
> > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > and result in:
> > > > > > _1 = a_3(D) < b_4(D)
> > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > >
> > > > > > (Patch is lightly tested with only vect.exp)
> > > > >
> > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > +
> > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > + {
> > > > >
> > > > > So this really belongs here:
> > > > >
> > > > > tree op0_type = TREE_TYPE (op0);
> > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > >
> > > > > <---
> > > > >
> > > > > if (used_vec_cond_exprs >= 2
> > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > != CODE_FOR_nothing)
> > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > {
> > > > >
> > > > >
> > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > +
> > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > + {
> > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > + gsi_replace (gsi, new_stmt, true);
> > > > >
> > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > a valid cmp icode, that
> > > > should be sufficient ?
> > >
> > > Yes
> > Hi Richard,
> > I tested the patch, and it shows one regression for pr78102.c, because
> > of extra pcmpeqq in code-gen for x != y on x86.
> > For the test-case:
> > __v2di
> > baz (const __v2di x, const __v2di y)
> > {
> > return x != y;
> > }
> >
> > Before patch:
> > baz:
> > pcmpeqq %xmm1, %xmm0
> > pcmpeqd %xmm1, %xmm1
> > pandn %xmm1, %xmm0
> > ret
> >
> > After patch,
> > Before ISEL:
> > vector(2) <signed-boolean:64> _1;
> > __v2di _4;
> >
> > <bb 2> [local count: 1073741824]:
> > _1 = x_2(D) != y_3(D);
> > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > return _4;
> >
> > After ISEL:
> > vector(2) <signed-boolean:64> _1;
> > __v2di _4;
> >
> > <bb 2> [local count: 1073741824]:
> > _1 = x_2(D) != y_3(D);
> > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > return _4;
> >
> > which results in:
> > pcmpeqq %xmm1, %xmm0
> > pxor %xmm1, %xmm1
> > pcmpeqq %xmm1, %xmm0
> > ret
> >
> > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > comparing the result against zero to invert it, so it looks correct ?
> > I am not sure which of the above two sequences is better tho ?
> > If the new code-gen is OK, would it be OK to adjust the test-case ?
>
> In case pcmpeqq is double-issue the first variant might be faster while
> the second variant has the advantage of the "free" pxor, but back-to-back
> pcmpeqq might have an issue.
>
> I think on GIMPLE the new code is preferable and adjustments are
> target business. I wouldn't be surprised if the x86 backend
> special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> variant.
>
> Did you check how
>
> a = x != y ? { -1, -1 } : {0, 0 };
> b = x != y ? { 1, 2 } : { 3, 4 };
>
> is handled before/after your patch? That is, make the comparison
> CSEd between two VEC_COND_EXPRs?
For the test-case:
__v2di f(__v2di, __v2di);
__v2di
baz (const __v2di x, const __v2di y)
{
__v2di a = (x != y);
__v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
return f (a, b);
}
Before patch, isel converts both to .vcondeq:
__v2di b;
__v2di a;
__v2di _8;
<bb 2> [local count: 1073741824]:
a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
_8 = f (a_4, b_5); [tail call]
return _8;
and results in following code-gen:
_Z3bazDv2_xS_:
.LFB5666:
pcmpeqq %xmm1, %xmm0
pcmpeqd %xmm1, %xmm1
movdqa %xmm0, %xmm2
pandn %xmm1, %xmm2
movdqa .LC0(%rip), %xmm1
pblendvb %xmm0, .LC1(%rip), %xmm1
movdqa %xmm2, %xmm0
jmp _Z1fDv2_xS_
With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
view_convert_expr and the other
to vcondeq:
__v2di b;
__v2di a;
vector(2) <signed-boolean:64> _1;
__v2di _8;
<bb 2> [local count: 1073741824]:
_1 = x_2(D) != y_3(D);
a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
_8 = f (a_4, b_5); [tail call]
return _8;
which results in following code-gen:
_Z3bazDv2_xS_:
.LFB5666:
pcmpeqq %xmm1, %xmm0
pxor %xmm2, %xmm2
movdqa .LC0(%rip), %xmm1
pblendvb %xmm0, .LC1(%rip), %xmm1
pcmpeqq %xmm0, %xmm2
movdqa %xmm2, %xmm0
jmp _Z1fDv2_xS_
Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>
> > Thanks,
> > Prathamesh
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Richard.
> > > > >
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-07 9:33 ` Prathamesh Kulkarni
@ 2020-12-07 9:45 ` Richard Biener
2020-12-07 10:47 ` Hongtao Liu
0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2020-12-07 9:45 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: GCC Development, hongtao.liu
On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> >
> > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > (reg:V8QI 117)
> > > > > > > > > (reg:V8QI 118)
> > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > >
> > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > > > > > >
> > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > >
> > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > In the attached patch, I simply replaced:
> > > > > > > _1 = a < b
> > > > > > > x = _1 ? -1 : 0
> > > > > > > with
> > > > > > > x = view_convert_expr<_1>
> > > > > > >
> > > > > > > For the test-case, isel generates:
> > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > vector(8) signed char _2;
> > > > > > > uint8x8_t _5;
> > > > > > >
> > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > return _5;
> > > > > > >
> > > > > > > and results in desired code-gen:
> > > > > > > f1:
> > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > bx lr
> > > > > > >
> > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > and result in:
> > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > >
> > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > >
> > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > +
> > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > + {
> > > > > >
> > > > > > So this really belongs here:
> > > > > >
> > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > >
> > > > > > <---
> > > > > >
> > > > > > if (used_vec_cond_exprs >= 2
> > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > != CODE_FOR_nothing)
> > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > {
> > > > > >
> > > > > >
> > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > +
> > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > + {
> > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > >
> > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > a valid cmp icode, that
> > > > > should be sufficient ?
> > > >
> > > > Yes
> > > Hi Richard,
> > > I tested the patch, and it shows one regression for pr78102.c, because
> > > of extra pcmpeqq in code-gen for x != y on x86.
> > > For the test-case:
> > > __v2di
> > > baz (const __v2di x, const __v2di y)
> > > {
> > > return x != y;
> > > }
> > >
> > > Before patch:
> > > baz:
> > > pcmpeqq %xmm1, %xmm0
> > > pcmpeqd %xmm1, %xmm1
> > > pandn %xmm1, %xmm0
> > > ret
> > >
> > > After patch,
> > > Before ISEL:
> > > vector(2) <signed-boolean:64> _1;
> > > __v2di _4;
> > >
> > > <bb 2> [local count: 1073741824]:
> > > _1 = x_2(D) != y_3(D);
> > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > return _4;
> > >
> > > After ISEL:
> > > vector(2) <signed-boolean:64> _1;
> > > __v2di _4;
> > >
> > > <bb 2> [local count: 1073741824]:
> > > _1 = x_2(D) != y_3(D);
> > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > return _4;
> > >
> > > which results in:
> > > pcmpeqq %xmm1, %xmm0
> > > pxor %xmm1, %xmm1
> > > pcmpeqq %xmm1, %xmm0
> > > ret
> > >
> > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > comparing the result against zero to invert it, so it looks correct ?
> > > I am not sure which of the above two sequences is better tho ?
> > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> >
> > In case pcmpeqq is double-issue the first variant might be faster while
> > the second variant has the advantage of the "free" pxor, but back-to-back
> > pcmpeqq might have an issue.
> >
> > I think on GIMPLE the new code is preferable and adjustments are
> > target business. I wouldn't be surprised if the x86 backend
> > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > variant.
> >
> > Did you check how
> >
> > a = x != y ? { -1, -1 } : {0, 0 };
> > b = x != y ? { 1, 2 } : { 3, 4 };
> >
> > is handled before/after your patch? That is, make the comparison
> > CSEd between two VEC_COND_EXPRs?
> For the test-case:
> __v2di f(__v2di, __v2di);
>
> __v2di
> baz (const __v2di x, const __v2di y)
> {
> __v2di a = (x != y);
> __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> return f (a, b);
> }
>
> Before patch, isel converts both to .vcondeq:
> __v2di b;
> __v2di a;
> __v2di _8;
>
> <bb 2> [local count: 1073741824]:
> a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> _8 = f (a_4, b_5); [tail call]
> return _8;
>
> and results in following code-gen:
> _Z3bazDv2_xS_:
> .LFB5666:
> pcmpeqq %xmm1, %xmm0
> pcmpeqd %xmm1, %xmm1
> movdqa %xmm0, %xmm2
> pandn %xmm1, %xmm2
> movdqa .LC0(%rip), %xmm1
> pblendvb %xmm0, .LC1(%rip), %xmm1
> movdqa %xmm2, %xmm0
> jmp _Z1fDv2_xS_
>
> With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> view_convert_expr and the other
> to vcondeq:
> __v2di b;
> __v2di a;
> vector(2) <signed-boolean:64> _1;
> __v2di _8;
>
> <bb 2> [local count: 1073741824]:
> _1 = x_2(D) != y_3(D);
> a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> _8 = f (a_4, b_5); [tail call]
> return _8;
>
> which results in following code-gen:
> _Z3bazDv2_xS_:
> .LFB5666:
> pcmpeqq %xmm1, %xmm0
> pxor %xmm2, %xmm2
> movdqa .LC0(%rip), %xmm1
> pblendvb %xmm0, .LC1(%rip), %xmm1
> pcmpeqq %xmm0, %xmm2
> movdqa %xmm2, %xmm0
> jmp _Z1fDv2_xS_
Ok, thanks for checking. I think the patch is OK but please let
Hongtao the chance to comment.
Richard.
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-07 9:45 ` Richard Biener
@ 2020-12-07 10:47 ` Hongtao Liu
2020-12-07 11:10 ` Prathamesh Kulkarni
0 siblings, 1 reply; 18+ messages in thread
From: Hongtao Liu @ 2020-12-07 10:47 UTC (permalink / raw)
To: Richard Biener; +Cc: Prathamesh Kulkarni, GCC Development, Liu, Hongtao
On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
>
> > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > >
> > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > >
> > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
It seems to me that all you need to do is relax the predicates of op1
and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
debugged it, but I see that vcondmn in neon.md only accepts
s_register_operand.
(define_expand "vcond<mode><mode>"
[(set (match_operand:VDQW 0 "s_register_operand")
(if_then_else:VDQW
(match_operator 3 "comparison_operator"
[(match_operand:VDQW 4 "s_register_operand")
(match_operand:VDQW 5 "reg_or_zero_operand")])
(match_operand:VDQW 1 "s_register_operand")
(match_operand:VDQW 2 "s_register_operand")))]
"TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
{
arm_expand_vcond (operands, <V_cmp_result>mode);
DONE;
})
in sse.md it's defined as
(define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
[(set (match_operand:V_512 0 "register_operand")
(if_then_else:V_512
(match_operator 3 ""
[(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
(match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
(match_operand:V_512 1 "general_operand")
(match_operand:V_512 2 "general_operand")))]
"TARGET_AVX512F
&& (GET_MODE_NUNITS (<V_512:MODE>mode)
== GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
{
bool ok = ix86_expand_int_vcond (operands);
gcc_assert (ok);
DONE;
})
then we can get operands[1] and operands[2] as
(gdb) p debug_rtx (operands[1])
(const_vector:V16QI [
(const_int -1 [0xffffffffffffffff]) repeated x16
])
(gdb) p debug_rtx (operands[2])
(reg:V16QI 82 [ _2 ])
(const_vector:V16QI [
(const_int 0 [0]) repeated x16
])
> > > > > > > > > >
> > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > >
> > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > _1 = a < b
> > > > > > > > x = _1 ? -1 : 0
> > > > > > > > with
> > > > > > > > x = view_convert_expr<_1>
> > > > > > > >
> > > > > > > > For the test-case, isel generates:
> > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > vector(8) signed char _2;
> > > > > > > > uint8x8_t _5;
> > > > > > > >
> > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > return _5;
> > > > > > > >
> > > > > > > > and results in desired code-gen:
> > > > > > > > f1:
> > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > bx lr
> > > > > > > >
> > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > and result in:
> > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > >
> > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > >
> > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > +
> > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > + {
> > > > > > >
> > > > > > > So this really belongs here:
> > > > > > >
> > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > >
> > > > > > > <---
> > > > > > >
> > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > != CODE_FOR_nothing)
> > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > {
> > > > > > >
> > > > > > >
> > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > +
> > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > + {
> > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > >
> > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > a valid cmp icode, that
> > > > > > should be sufficient ?
> > > > >
> > > > > Yes
> > > > Hi Richard,
> > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > For the test-case:
> > > > __v2di
> > > > baz (const __v2di x, const __v2di y)
> > > > {
> > > > return x != y;
> > > > }
> > > >
> > > > Before patch:
> > > > baz:
> > > > pcmpeqq %xmm1, %xmm0
> > > > pcmpeqd %xmm1, %xmm1
> > > > pandn %xmm1, %xmm0
> > > > ret
> > > >
> > > > After patch,
> > > > Before ISEL:
> > > > vector(2) <signed-boolean:64> _1;
> > > > __v2di _4;
> > > >
> > > > <bb 2> [local count: 1073741824]:
> > > > _1 = x_2(D) != y_3(D);
> > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > return _4;
> > > >
> > > > After ISEL:
> > > > vector(2) <signed-boolean:64> _1;
> > > > __v2di _4;
> > > >
> > > > <bb 2> [local count: 1073741824]:
> > > > _1 = x_2(D) != y_3(D);
> > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > return _4;
> > > >
> > > > which results in:
> > > > pcmpeqq %xmm1, %xmm0
> > > > pxor %xmm1, %xmm1
> > > > pcmpeqq %xmm1, %xmm0
> > > > ret
> > > >
> > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > comparing the result against zero to invert it, so it looks correct ?
> > > > I am not sure which of the above two sequences is better tho ?
> > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > >
> > > In case pcmpeqq is double-issue the first variant might be faster while
> > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > pcmpeqq might have an issue.
> > >
> > > I think on GIMPLE the new code is preferable and adjustments are
> > > target business. I wouldn't be surprised if the x86 backend
> > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > variant.
> > >
> > > Did you check how
> > >
> > > a = x != y ? { -1, -1 } : {0, 0 };
> > > b = x != y ? { 1, 2 } : { 3, 4 };
> > >
> > > is handled before/after your patch? That is, make the comparison
> > > CSEd between two VEC_COND_EXPRs?
> > For the test-case:
> > __v2di f(__v2di, __v2di);
> >
> > __v2di
> > baz (const __v2di x, const __v2di y)
> > {
> > __v2di a = (x != y);
> > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > return f (a, b);
> > }
> >
> > Before patch, isel converts both to .vcondeq:
> > __v2di b;
> > __v2di a;
> > __v2di _8;
> >
> > <bb 2> [local count: 1073741824]:
> > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > _8 = f (a_4, b_5); [tail call]
> > return _8;
> >
> > and results in following code-gen:
> > _Z3bazDv2_xS_:
> > .LFB5666:
> > pcmpeqq %xmm1, %xmm0
> > pcmpeqd %xmm1, %xmm1
> > movdqa %xmm0, %xmm2
> > pandn %xmm1, %xmm2
> > movdqa .LC0(%rip), %xmm1
> > pblendvb %xmm0, .LC1(%rip), %xmm1
> > movdqa %xmm2, %xmm0
> > jmp _Z1fDv2_xS_
> >
> > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > view_convert_expr and the other
> > to vcondeq:
> > __v2di b;
> > __v2di a;
> > vector(2) <signed-boolean:64> _1;
> > __v2di _8;
> >
> > <bb 2> [local count: 1073741824]:
> > _1 = x_2(D) != y_3(D);
> > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > _8 = f (a_4, b_5); [tail call]
> > return _8;
> >
> > which results in following code-gen:
> > _Z3bazDv2_xS_:
> > .LFB5666:
> > pcmpeqq %xmm1, %xmm0
> > pxor %xmm2, %xmm2
> > movdqa .LC0(%rip), %xmm1
> > pblendvb %xmm0, .LC1(%rip), %xmm1
> > pcmpeqq %xmm0, %xmm2
> > movdqa %xmm2, %xmm0
> > jmp _Z1fDv2_xS_
>
> Ok, thanks for checking. I think the patch is OK but please let
> Hongtao the chance to comment.
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard.
> > >
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-07 10:47 ` Hongtao Liu
@ 2020-12-07 11:10 ` Prathamesh Kulkarni
2020-12-07 12:09 ` Hongtao Liu
0 siblings, 1 reply; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-07 11:10 UTC (permalink / raw)
To: Hongtao Liu; +Cc: Richard Biener, GCC Development, Liu, Hongtao
On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> >
> > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > >
> > > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
>
> It seems to me that all you need to do is relax the predicates of op1
> and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> debugged it, but I see that vcondmn in neon.md only accepts
> s_register_operand.
>
> (define_expand "vcond<mode><mode>"
> [(set (match_operand:VDQW 0 "s_register_operand")
> (if_then_else:VDQW
> (match_operator 3 "comparison_operator"
> [(match_operand:VDQW 4 "s_register_operand")
> (match_operand:VDQW 5 "reg_or_zero_operand")])
> (match_operand:VDQW 1 "s_register_operand")
> (match_operand:VDQW 2 "s_register_operand")))]
> "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> {
> arm_expand_vcond (operands, <V_cmp_result>mode);
> DONE;
> })
>
> in sse.md it's defined as
> (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> [(set (match_operand:V_512 0 "register_operand")
> (if_then_else:V_512
> (match_operator 3 ""
> [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> (match_operand:V_512 1 "general_operand")
> (match_operand:V_512 2 "general_operand")))]
> "TARGET_AVX512F
> && (GET_MODE_NUNITS (<V_512:MODE>mode)
> == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> {
> bool ok = ix86_expand_int_vcond (operands);
> gcc_assert (ok);
> DONE;
> })
>
> then we can get operands[1] and operands[2] as
>
> (gdb) p debug_rtx (operands[1])
> (const_vector:V16QI [
> (const_int -1 [0xffffffffffffffff]) repeated x16
> ])
> (gdb) p debug_rtx (operands[2])
> (reg:V16QI 82 [ _2 ])
> (const_vector:V16QI [
> (const_int 0 [0]) repeated x16
> ])
Hi Hongtao,
Thanks for the suggestions!
However IIUC from vector extensions doc page, the result of vector
comparison is defined to be 0
or -1, so would it be better to canonicalize
x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
adjust targets if required ?
Alternatively, I could try fixing this in backend as you suggest above.
Thanks,
Prathamesh
>
> > > > > > > > > > >
> > > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > > >
> > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > _1 = a < b
> > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > with
> > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > >
> > > > > > > > > For the test-case, isel generates:
> > > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > > vector(8) signed char _2;
> > > > > > > > > uint8x8_t _5;
> > > > > > > > >
> > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > return _5;
> > > > > > > > >
> > > > > > > > > and results in desired code-gen:
> > > > > > > > > f1:
> > > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > > bx lr
> > > > > > > > >
> > > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > > and result in:
> > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > >
> > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > >
> > > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > > +
> > > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > + {
> > > > > > > >
> > > > > > > > So this really belongs here:
> > > > > > > >
> > > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > > >
> > > > > > > > <---
> > > > > > > >
> > > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > > != CODE_FOR_nothing)
> > > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > {
> > > > > > > >
> > > > > > > >
> > > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > > +
> > > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > + {
> > > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > > >
> > > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > > a valid cmp icode, that
> > > > > > > should be sufficient ?
> > > > > >
> > > > > > Yes
> > > > > Hi Richard,
> > > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > For the test-case:
> > > > > __v2di
> > > > > baz (const __v2di x, const __v2di y)
> > > > > {
> > > > > return x != y;
> > > > > }
> > > > >
> > > > > Before patch:
> > > > > baz:
> > > > > pcmpeqq %xmm1, %xmm0
> > > > > pcmpeqd %xmm1, %xmm1
> > > > > pandn %xmm1, %xmm0
> > > > > ret
> > > > >
> > > > > After patch,
> > > > > Before ISEL:
> > > > > vector(2) <signed-boolean:64> _1;
> > > > > __v2di _4;
> > > > >
> > > > > <bb 2> [local count: 1073741824]:
> > > > > _1 = x_2(D) != y_3(D);
> > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > return _4;
> > > > >
> > > > > After ISEL:
> > > > > vector(2) <signed-boolean:64> _1;
> > > > > __v2di _4;
> > > > >
> > > > > <bb 2> [local count: 1073741824]:
> > > > > _1 = x_2(D) != y_3(D);
> > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > return _4;
> > > > >
> > > > > which results in:
> > > > > pcmpeqq %xmm1, %xmm0
> > > > > pxor %xmm1, %xmm1
> > > > > pcmpeqq %xmm1, %xmm0
> > > > > ret
> > > > >
> > > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > > comparing the result against zero to invert it, so it looks correct ?
> > > > > I am not sure which of the above two sequences is better tho ?
> > > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > > >
> > > > In case pcmpeqq is double-issue the first variant might be faster while
> > > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > > pcmpeqq might have an issue.
> > > >
> > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > target business. I wouldn't be surprised if the x86 backend
> > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > variant.
> > > >
> > > > Did you check how
> > > >
> > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > >
> > > > is handled before/after your patch? That is, make the comparison
> > > > CSEd between two VEC_COND_EXPRs?
> > > For the test-case:
> > > __v2di f(__v2di, __v2di);
> > >
> > > __v2di
> > > baz (const __v2di x, const __v2di y)
> > > {
> > > __v2di a = (x != y);
> > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > return f (a, b);
> > > }
> > >
> > > Before patch, isel converts both to .vcondeq:
> > > __v2di b;
> > > __v2di a;
> > > __v2di _8;
> > >
> > > <bb 2> [local count: 1073741824]:
> > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > _8 = f (a_4, b_5); [tail call]
> > > return _8;
> > >
> > > and results in following code-gen:
> > > _Z3bazDv2_xS_:
> > > .LFB5666:
> > > pcmpeqq %xmm1, %xmm0
> > > pcmpeqd %xmm1, %xmm1
> > > movdqa %xmm0, %xmm2
> > > pandn %xmm1, %xmm2
> > > movdqa .LC0(%rip), %xmm1
> > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > movdqa %xmm2, %xmm0
> > > jmp _Z1fDv2_xS_
> > >
> > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > view_convert_expr and the other
> > > to vcondeq:
> > > __v2di b;
> > > __v2di a;
> > > vector(2) <signed-boolean:64> _1;
> > > __v2di _8;
> > >
> > > <bb 2> [local count: 1073741824]:
> > > _1 = x_2(D) != y_3(D);
> > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > _8 = f (a_4, b_5); [tail call]
> > > return _8;
> > >
> > > which results in following code-gen:
> > > _Z3bazDv2_xS_:
> > > .LFB5666:
> > > pcmpeqq %xmm1, %xmm0
> > > pxor %xmm2, %xmm2
> > > movdqa .LC0(%rip), %xmm1
> > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > pcmpeqq %xmm0, %xmm2
> > > movdqa %xmm2, %xmm0
> > > jmp _Z1fDv2_xS_
> >
> > Ok, thanks for checking. I think the patch is OK but please let
> > Hongtao the chance to comment.
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prathamesh
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
>
>
> --
> BR,
> Hongtao
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-07 11:10 ` Prathamesh Kulkarni
@ 2020-12-07 12:09 ` Hongtao Liu
2020-12-08 9:06 ` Prathamesh Kulkarni
0 siblings, 1 reply; 18+ messages in thread
From: Hongtao Liu @ 2020-12-07 12:09 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Richard Biener, GCC Development, Liu, Hongtao
On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > >
> > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > >
> > > > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> >
> > It seems to me that all you need to do is relax the predicates of op1
> > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > debugged it, but I see that vcondmn in neon.md only accepts
> > s_register_operand.
> >
> > (define_expand "vcond<mode><mode>"
> > [(set (match_operand:VDQW 0 "s_register_operand")
> > (if_then_else:VDQW
> > (match_operator 3 "comparison_operator"
> > [(match_operand:VDQW 4 "s_register_operand")
> > (match_operand:VDQW 5 "reg_or_zero_operand")])
> > (match_operand:VDQW 1 "s_register_operand")
> > (match_operand:VDQW 2 "s_register_operand")))]
> > "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > {
> > arm_expand_vcond (operands, <V_cmp_result>mode);
> > DONE;
> > })
> >
> > in sse.md it's defined as
> > (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> > [(set (match_operand:V_512 0 "register_operand")
> > (if_then_else:V_512
> > (match_operator 3 ""
> > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > (match_operand:V_512 1 "general_operand")
> > (match_operand:V_512 2 "general_operand")))]
> > "TARGET_AVX512F
> > && (GET_MODE_NUNITS (<V_512:MODE>mode)
> > == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> > {
> > bool ok = ix86_expand_int_vcond (operands);
> > gcc_assert (ok);
> > DONE;
> > })
> >
> > then we can get operands[1] and operands[2] as
> >
> > (gdb) p debug_rtx (operands[1])
> > (const_vector:V16QI [
> > (const_int -1 [0xffffffffffffffff]) repeated x16
> > ])
> > (gdb) p debug_rtx (operands[2])
> > (reg:V16QI 82 [ _2 ])
> > (const_vector:V16QI [
> > (const_int 0 [0]) repeated x16
> > ])
> Hi Hongtao,
> Thanks for the suggestions!
> However IIUC from vector extensions doc page, the result of vector
> comparison is defined to be 0
> or -1, so would it be better to canonicalize
> x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> adjust targets if required ?
Yes, it would be more straightforward to handle it in gimple isel, I
would adjust the backend and testcase after you check in the patch.
> Alternatively, I could try fixing this in backend as you suggest above.
>
> Thanks,
> Prathamesh
> >
> > > > > > > > > > > >
> > > > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > > > >
> > > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > > _1 = a < b
> > > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > > with
> > > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > > >
> > > > > > > > > > For the test-case, isel generates:
> > > > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > > > vector(8) signed char _2;
> > > > > > > > > > uint8x8_t _5;
> > > > > > > > > >
> > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > > return _5;
> > > > > > > > > >
> > > > > > > > > > and results in desired code-gen:
> > > > > > > > > > f1:
> > > > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > > > bx lr
> > > > > > > > > >
> > > > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > > > and result in:
> > > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > > >
> > > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > > >
> > > > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > > > +
> > > > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > > + {
> > > > > > > > >
> > > > > > > > > So this really belongs here:
> > > > > > > > >
> > > > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > >
> > > > > > > > > <---
> > > > > > > > >
> > > > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > > > != CODE_FOR_nothing)
> > > > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > {
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > > > +
> > > > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > + {
> > > > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > > > >
> > > > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > > > a valid cmp icode, that
> > > > > > > > should be sufficient ?
> > > > > > >
> > > > > > > Yes
> > > > > > Hi Richard,
> > > > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > > For the test-case:
> > > > > > __v2di
> > > > > > baz (const __v2di x, const __v2di y)
> > > > > > {
> > > > > > return x != y;
> > > > > > }
> > > > > >
> > > > > > Before patch:
> > > > > > baz:
> > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > pandn %xmm1, %xmm0
> > > > > > ret
> > > > > >
> > > > > > After patch,
> > > > > > Before ISEL:
> > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > __v2di _4;
> > > > > >
> > > > > > <bb 2> [local count: 1073741824]:
> > > > > > _1 = x_2(D) != y_3(D);
> > > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > > return _4;
> > > > > >
> > > > > > After ISEL:
> > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > __v2di _4;
> > > > > >
> > > > > > <bb 2> [local count: 1073741824]:
> > > > > > _1 = x_2(D) != y_3(D);
> > > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > return _4;
> > > > > >
> > > > > > which results in:
> > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > pxor %xmm1, %xmm1
> > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > ret
seems better to be
pcmpeqq %xmm1, %xmm0
pxor %xmm1, %xmm1
pxor %xmm1, %xmm0
ret
Anyway, it needs backend adjustment.
> > > > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > > > comparing the result against zero to invert it, so it looks correct ?
> > > > > > I am not sure which of the above two sequences is better tho ?
> > > > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > > > >
> > > > > In case pcmpeqq is double-issue the first variant might be faster while
> > > > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > > > pcmpeqq might have an issue.
> > > > >
> > > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > > target business. I wouldn't be surprised if the x86 backend
> > > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > > variant.
> > > > >
> > > > > Did you check how
> > > > >
> > > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > > >
> > > > > is handled before/after your patch? That is, make the comparison
> > > > > CSEd between two VEC_COND_EXPRs?
> > > > For the test-case:
> > > > __v2di f(__v2di, __v2di);
> > > >
> > > > __v2di
> > > > baz (const __v2di x, const __v2di y)
> > > > {
> > > > __v2di a = (x != y);
> > > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > > return f (a, b);
> > > > }
> > > >
> > > > Before patch, isel converts both to .vcondeq:
> > > > __v2di b;
> > > > __v2di a;
> > > > __v2di _8;
> > > >
> > > > <bb 2> [local count: 1073741824]:
> > > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > _8 = f (a_4, b_5); [tail call]
> > > > return _8;
> > > >
> > > > and results in following code-gen:
> > > > _Z3bazDv2_xS_:
> > > > .LFB5666:
> > > > pcmpeqq %xmm1, %xmm0
> > > > pcmpeqd %xmm1, %xmm1
> > > > movdqa %xmm0, %xmm2
> > > > pandn %xmm1, %xmm2
> > > > movdqa .LC0(%rip), %xmm1
> > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > movdqa %xmm2, %xmm0
> > > > jmp _Z1fDv2_xS_
> > > >
> > > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > > view_convert_expr and the other
> > > > to vcondeq:
> > > > __v2di b;
> > > > __v2di a;
> > > > vector(2) <signed-boolean:64> _1;
> > > > __v2di _8;
> > > >
> > > > <bb 2> [local count: 1073741824]:
> > > > _1 = x_2(D) != y_3(D);
> > > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > _8 = f (a_4, b_5); [tail call]
> > > > return _8;
> > > >
> > > > which results in following code-gen:
> > > > _Z3bazDv2_xS_:
> > > > .LFB5666:
> > > > pcmpeqq %xmm1, %xmm0
> > > > pxor %xmm2, %xmm2
> > > > movdqa .LC0(%rip), %xmm1
> > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > pcmpeqq %xmm0, %xmm2
> > > > movdqa %xmm2, %xmm0
> > > > jmp _Z1fDv2_xS_
> > >
> > > Ok, thanks for checking. I think the patch is OK but please let
> > > Hongtao the chance to comment.
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prathamesh
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prathamesh
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
> >
> >
> > --
> > BR,
> > Hongtao
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-07 12:09 ` Hongtao Liu
@ 2020-12-08 9:06 ` Prathamesh Kulkarni
2020-12-09 11:47 ` Prathamesh Kulkarni
0 siblings, 1 reply; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-08 9:06 UTC (permalink / raw)
To: Hongtao Liu; +Cc: Richard Biener, GCC Development, Liu, Hongtao, gcc Patches
[-- Attachment #1: Type: text/plain, Size: 16081 bytes --]
On Mon, 7 Dec 2020 at 17:37, Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > >
> > > > > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > >
> > > It seems to me that all you need to do is relax the predicates of op1
> > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > debugged it, but I see that vcondmn in neon.md only accepts
> > > s_register_operand.
> > >
> > > (define_expand "vcond<mode><mode>"
> > > [(set (match_operand:VDQW 0 "s_register_operand")
> > > (if_then_else:VDQW
> > > (match_operator 3 "comparison_operator"
> > > [(match_operand:VDQW 4 "s_register_operand")
> > > (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > (match_operand:VDQW 1 "s_register_operand")
> > > (match_operand:VDQW 2 "s_register_operand")))]
> > > "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > {
> > > arm_expand_vcond (operands, <V_cmp_result>mode);
> > > DONE;
> > > })
> > >
> > > in sse.md it's defined as
> > > (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> > > [(set (match_operand:V_512 0 "register_operand")
> > > (if_then_else:V_512
> > > (match_operator 3 ""
> > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > > (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > > (match_operand:V_512 1 "general_operand")
> > > (match_operand:V_512 2 "general_operand")))]
> > > "TARGET_AVX512F
> > > && (GET_MODE_NUNITS (<V_512:MODE>mode)
> > > == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> > > {
> > > bool ok = ix86_expand_int_vcond (operands);
> > > gcc_assert (ok);
> > > DONE;
> > > })
> > >
> > > then we can get operands[1] and operands[2] as
> > >
> > > (gdb) p debug_rtx (operands[1])
> > > (const_vector:V16QI [
> > > (const_int -1 [0xffffffffffffffff]) repeated x16
> > > ])
> > > (gdb) p debug_rtx (operands[2])
> > > (reg:V16QI 82 [ _2 ])
> > > (const_vector:V16QI [
> > > (const_int 0 [0]) repeated x16
> > > ])
> > Hi Hongtao,
> > Thanks for the suggestions!
> > However IIUC from vector extensions doc page, the result of vector
> > comparison is defined to be 0
> > or -1, so would it be better to canonicalize
> > x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> > adjust targets if required ?
>
> Yes, it would be more straightforward to handle it in gimple isel, I
> would adjust the backend and testcase after you check in the patch.
Thanks! I have committed the attached patch in
3a6e3ad38a17a03ee0139b49a0946e7b9ded1eb1.
Regards,
Prathamesh
>
> > Alternatively, I could try fixing this in backend as you suggest above.
> >
> > Thanks,
> > Prathamesh
> > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > > > > >
> > > > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > > > _1 = a < b
> > > > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > > > with
> > > > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > > > >
> > > > > > > > > > > For the test-case, isel generates:
> > > > > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > > > > vector(8) signed char _2;
> > > > > > > > > > > uint8x8_t _5;
> > > > > > > > > > >
> > > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > > > return _5;
> > > > > > > > > > >
> > > > > > > > > > > and results in desired code-gen:
> > > > > > > > > > > f1:
> > > > > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > > > > bx lr
> > > > > > > > > > >
> > > > > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > > > > and result in:
> > > > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > > > >
> > > > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > > > >
> > > > > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > > > > +
> > > > > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > > > + {
> > > > > > > > > >
> > > > > > > > > > So this really belongs here:
> > > > > > > > > >
> > > > > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > >
> > > > > > > > > > <---
> > > > > > > > > >
> > > > > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > > > > != CODE_FOR_nothing)
> > > > > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > {
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > > > > +
> > > > > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > + {
> > > > > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > > > > >
> > > > > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > > > > a valid cmp icode, that
> > > > > > > > > should be sufficient ?
> > > > > > > >
> > > > > > > > Yes
> > > > > > > Hi Richard,
> > > > > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > > > For the test-case:
> > > > > > > __v2di
> > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > {
> > > > > > > return x != y;
> > > > > > > }
> > > > > > >
> > > > > > > Before patch:
> > > > > > > baz:
> > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > pandn %xmm1, %xmm0
> > > > > > > ret
> > > > > > >
> > > > > > > After patch,
> > > > > > > Before ISEL:
> > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > __v2di _4;
> > > > > > >
> > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > > > return _4;
> > > > > > >
> > > > > > > After ISEL:
> > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > __v2di _4;
> > > > > > >
> > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > return _4;
> > > > > > >
> > > > > > > which results in:
> > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > pxor %xmm1, %xmm1
> > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > ret
> seems better to be
>
> pcmpeqq %xmm1, %xmm0
> pxor %xmm1, %xmm1
> pxor %xmm1, %xmm0
> ret
>
> Anyway, it needs backend adjustment.
>
> > > > > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > > > > comparing the result against zero to invert it, so it looks correct ?
> > > > > > > I am not sure which of the above two sequences is better tho ?
> > > > > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > > > > >
> > > > > > In case pcmpeqq is double-issue the first variant might be faster while
> > > > > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > > > > pcmpeqq might have an issue.
> > > > > >
> > > > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > > > target business. I wouldn't be surprised if the x86 backend
> > > > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > > > variant.
> > > > > >
> > > > > > Did you check how
> > > > > >
> > > > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > > > >
> > > > > > is handled before/after your patch? That is, make the comparison
> > > > > > CSEd between two VEC_COND_EXPRs?
> > > > > For the test-case:
> > > > > __v2di f(__v2di, __v2di);
> > > > >
> > > > > __v2di
> > > > > baz (const __v2di x, const __v2di y)
> > > > > {
> > > > > __v2di a = (x != y);
> > > > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > > > return f (a, b);
> > > > > }
> > > > >
> > > > > Before patch, isel converts both to .vcondeq:
> > > > > __v2di b;
> > > > > __v2di a;
> > > > > __v2di _8;
> > > > >
> > > > > <bb 2> [local count: 1073741824]:
> > > > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > _8 = f (a_4, b_5); [tail call]
> > > > > return _8;
> > > > >
> > > > > and results in following code-gen:
> > > > > _Z3bazDv2_xS_:
> > > > > .LFB5666:
> > > > > pcmpeqq %xmm1, %xmm0
> > > > > pcmpeqd %xmm1, %xmm1
> > > > > movdqa %xmm0, %xmm2
> > > > > pandn %xmm1, %xmm2
> > > > > movdqa .LC0(%rip), %xmm1
> > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > movdqa %xmm2, %xmm0
> > > > > jmp _Z1fDv2_xS_
> > > > >
> > > > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > > > view_convert_expr and the other
> > > > > to vcondeq:
> > > > > __v2di b;
> > > > > __v2di a;
> > > > > vector(2) <signed-boolean:64> _1;
> > > > > __v2di _8;
> > > > >
> > > > > <bb 2> [local count: 1073741824]:
> > > > > _1 = x_2(D) != y_3(D);
> > > > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > _8 = f (a_4, b_5); [tail call]
> > > > > return _8;
> > > > >
> > > > > which results in following code-gen:
> > > > > _Z3bazDv2_xS_:
> > > > > .LFB5666:
> > > > > pcmpeqq %xmm1, %xmm0
> > > > > pxor %xmm2, %xmm2
> > > > > movdqa .LC0(%rip), %xmm1
> > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > pcmpeqq %xmm0, %xmm2
> > > > > movdqa %xmm2, %xmm0
> > > > > jmp _Z1fDv2_xS_
> > > >
> > > > Ok, thanks for checking. I think the patch is OK but please let
> > > > Hongtao the chance to comment.
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > Richard.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prathamesh
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao
[-- Attachment #2: pr97872-3.diff --]
[-- Type: application/octet-stream, Size: 1392 bytes --]
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index d79c212748f..1ab75e72c2e 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -184,6 +184,20 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
tree op0_type = TREE_TYPE (op0);
tree op0a_type = TREE_TYPE (op0a);
+
+ /* Try to fold x CMP y ? -1 : 0 to x CMP y. */
+
+ if (integer_minus_onep (op1)
+ && integer_zerop (op2)
+ && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0))
+ && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
+ {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+ }
+
if (used_vec_cond_exprs >= 2
&& (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
!= CODE_FOR_nothing)
diff --git a/gcc/testsuite/gcc.target/arm/pr97872.c b/gcc/testsuite/gcc.target/arm/pr97872.c
new file mode 100644
index 00000000000..eeb4dd9d6bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr97872.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint8x8_t f1(int8x8_t a, int8x8_t b) {
+ return a < b;
+}
+
+/* { dg-final { scan-assembler-not "vbsl" } } */
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-08 9:06 ` Prathamesh Kulkarni
@ 2020-12-09 11:47 ` Prathamesh Kulkarni
2020-12-10 1:30 ` Hongtao Liu
2020-12-10 7:15 ` Richard Biener
0 siblings, 2 replies; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-09 11:47 UTC (permalink / raw)
To: Hongtao Liu; +Cc: Richard Biener, GCC Development, Liu, Hongtao, gcc Patches
On Tue, 8 Dec 2020 at 14:36, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 7 Dec 2020 at 17:37, Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > >
> > > > It seems to me that all you need to do is relax the predicates of op1
> > > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > > debugged it, but I see that vcondmn in neon.md only accepts
> > > > s_register_operand.
> > > >
> > > > (define_expand "vcond<mode><mode>"
> > > > [(set (match_operand:VDQW 0 "s_register_operand")
> > > > (if_then_else:VDQW
> > > > (match_operator 3 "comparison_operator"
> > > > [(match_operand:VDQW 4 "s_register_operand")
> > > > (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > > (match_operand:VDQW 1 "s_register_operand")
> > > > (match_operand:VDQW 2 "s_register_operand")))]
> > > > "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > > {
> > > > arm_expand_vcond (operands, <V_cmp_result>mode);
> > > > DONE;
> > > > })
> > > >
> > > > in sse.md it's defined as
> > > > (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> > > > [(set (match_operand:V_512 0 "register_operand")
> > > > (if_then_else:V_512
> > > > (match_operator 3 ""
> > > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > > > (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > > > (match_operand:V_512 1 "general_operand")
> > > > (match_operand:V_512 2 "general_operand")))]
> > > > "TARGET_AVX512F
> > > > && (GET_MODE_NUNITS (<V_512:MODE>mode)
> > > > == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> > > > {
> > > > bool ok = ix86_expand_int_vcond (operands);
> > > > gcc_assert (ok);
> > > > DONE;
> > > > })
> > > >
> > > > then we can get operands[1] and operands[2] as
> > > >
> > > > (gdb) p debug_rtx (operands[1])
> > > > (const_vector:V16QI [
> > > > (const_int -1 [0xffffffffffffffff]) repeated x16
> > > > ])
> > > > (gdb) p debug_rtx (operands[2])
> > > > (reg:V16QI 82 [ _2 ])
> > > > (const_vector:V16QI [
> > > > (const_int 0 [0]) repeated x16
> > > > ])
> > > Hi Hongtao,
> > > Thanks for the suggestions!
> > > However IIUC from vector extensions doc page, the result of vector
> > > comparison is defined to be 0
> > > or -1, so would it be better to canonicalize
> > > x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> > > adjust targets if required ?
> >
> > Yes, it would be more straightforward to handle it in gimple isel, I
> > would adjust the backend and testcase after you check in the patch.
> Thanks! I have committed the attached patch in
> 3a6e3ad38a17a03ee0139b49a0946e7b9ded1eb1.
Hi,
I was looking at similar issue in PR97903 and wondering what will be the
right approach to lower (a & b) != 0 to vtst ?
For test-case:
#include <arm_neon.h>
int8x8_t f1(int8x8_t a, int8x8_t b) {
return (a & b) != 0;
}
input to ISEL:
int8x8_t _1;
vector(8) <signed-boolean:8> _2;
int8x8_t _5;
<bb 2> [local count: 1073741824]:
_1 = a_3(D) & b_4(D);
_2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
_5 = VEC_COND_EXPR <_2, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0,
0, 0, 0, 0, 0, 0 }>;
return _5;
and after the PR97872 fix, ISEL now generates:
int8x8_t _1;
vector(8) <signed-boolean:8> _2;
int8x8_t _5;
<bb 2> [local count: 1073741824]:
_1 = a_3(D) & b_4(D);
_2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
_5 = VIEW_CONVERT_EXPR<int8x8_t>(_2);
return _5;
while before patch it lowered to:
_1 = a_3(D) & b_4(D);
_5 = .VCOND (_1, { 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1,
-1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 114);
Should we fold
_1 = a & b
_2 = _1 != 0
to .VCOND (dest, -1, 0, _1, 0, AND_EXR) and as Hongtao suggested,
relax predicates for op1, op2 operands to match for -1, 0 respectively
in vcond and do specialized expansion there ?
In that case, I am not sure if folding x cmp y ? -1 : 0 to x cmp y
offers much benefit in the first place ?
On GIMPLE, the comparisons can be lowered to .VCOND, and the target
does specialized expansion depending on the operands.
Thanks,
Prathamesh
>
> Regards,
> Prathamesh
> >
> > > Alternatively, I could try fixing this in backend as you suggest above.
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > > > > _1 = a < b
> > > > > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > > > > with
> > > > > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > > > > >
> > > > > > > > > > > > For the test-case, isel generates:
> > > > > > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > > > > > vector(8) signed char _2;
> > > > > > > > > > > > uint8x8_t _5;
> > > > > > > > > > > >
> > > > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > > > > return _5;
> > > > > > > > > > > >
> > > > > > > > > > > > and results in desired code-gen:
> > > > > > > > > > > > f1:
> > > > > > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > > > > > bx lr
> > > > > > > > > > > >
> > > > > > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > > > > > and result in:
> > > > > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > > > > >
> > > > > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > > > > >
> > > > > > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > > > > > +
> > > > > > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > > > > + {
> > > > > > > > > > >
> > > > > > > > > > > So this really belongs here:
> > > > > > > > > > >
> > > > > > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > >
> > > > > > > > > > > <---
> > > > > > > > > > >
> > > > > > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > > > > > != CODE_FOR_nothing)
> > > > > > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > {
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > > > > > +
> > > > > > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > + {
> > > > > > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > > > > > >
> > > > > > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > > > > > a valid cmp icode, that
> > > > > > > > > > should be sufficient ?
> > > > > > > > >
> > > > > > > > > Yes
> > > > > > > > Hi Richard,
> > > > > > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > > > > For the test-case:
> > > > > > > > __v2di
> > > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > > {
> > > > > > > > return x != y;
> > > > > > > > }
> > > > > > > >
> > > > > > > > Before patch:
> > > > > > > > baz:
> > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > > pandn %xmm1, %xmm0
> > > > > > > > ret
> > > > > > > >
> > > > > > > > After patch,
> > > > > > > > Before ISEL:
> > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > __v2di _4;
> > > > > > > >
> > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > > > > return _4;
> > > > > > > >
> > > > > > > > After ISEL:
> > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > __v2di _4;
> > > > > > > >
> > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > > return _4;
> > > > > > > >
> > > > > > > > which results in:
> > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > pxor %xmm1, %xmm1
> > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > ret
> > seems better to be
> >
> > pcmpeqq %xmm1, %xmm0
> > pxor %xmm1, %xmm1
> > pxor %xmm1, %xmm0
> > ret
> >
> > Anyway, it needs backend adjustment.
> >
> > > > > > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > > > > > comparing the result against zero to invert it, so it looks correct ?
> > > > > > > > I am not sure which of the above two sequences is better tho ?
> > > > > > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > > > > > >
> > > > > > > In case pcmpeqq is double-issue the first variant might be faster while
> > > > > > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > > > > > pcmpeqq might have an issue.
> > > > > > >
> > > > > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > > > > target business. I wouldn't be surprised if the x86 backend
> > > > > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > > > > variant.
> > > > > > >
> > > > > > > Did you check how
> > > > > > >
> > > > > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > > > > >
> > > > > > > is handled before/after your patch? That is, make the comparison
> > > > > > > CSEd between two VEC_COND_EXPRs?
> > > > > > For the test-case:
> > > > > > __v2di f(__v2di, __v2di);
> > > > > >
> > > > > > __v2di
> > > > > > baz (const __v2di x, const __v2di y)
> > > > > > {
> > > > > > __v2di a = (x != y);
> > > > > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > > > > return f (a, b);
> > > > > > }
> > > > > >
> > > > > > Before patch, isel converts both to .vcondeq:
> > > > > > __v2di b;
> > > > > > __v2di a;
> > > > > > __v2di _8;
> > > > > >
> > > > > > <bb 2> [local count: 1073741824]:
> > > > > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > return _8;
> > > > > >
> > > > > > and results in following code-gen:
> > > > > > _Z3bazDv2_xS_:
> > > > > > .LFB5666:
> > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > movdqa %xmm0, %xmm2
> > > > > > pandn %xmm1, %xmm2
> > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > movdqa %xmm2, %xmm0
> > > > > > jmp _Z1fDv2_xS_
> > > > > >
> > > > > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > > > > view_convert_expr and the other
> > > > > > to vcondeq:
> > > > > > __v2di b;
> > > > > > __v2di a;
> > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > __v2di _8;
> > > > > >
> > > > > > <bb 2> [local count: 1073741824]:
> > > > > > _1 = x_2(D) != y_3(D);
> > > > > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > return _8;
> > > > > >
> > > > > > which results in following code-gen:
> > > > > > _Z3bazDv2_xS_:
> > > > > > .LFB5666:
> > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > pxor %xmm2, %xmm2
> > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > pcmpeqq %xmm0, %xmm2
> > > > > > movdqa %xmm2, %xmm0
> > > > > > jmp _Z1fDv2_xS_
> > > > >
> > > > > Ok, thanks for checking. I think the patch is OK but please let
> > > > > Hongtao the chance to comment.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prathamesh
> > > > > > > > > > >
> > > > > > > > > > > Richard.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-09 11:47 ` Prathamesh Kulkarni
@ 2020-12-10 1:30 ` Hongtao Liu
2020-12-10 7:15 ` Richard Biener
1 sibling, 0 replies; 18+ messages in thread
From: Hongtao Liu @ 2020-12-10 1:30 UTC (permalink / raw)
To: Prathamesh Kulkarni
Cc: Richard Biener, GCC Development, Liu, Hongtao, gcc Patches
It seems better with your PR97872 fix on i386.
Cat test.c
typedef char v16qi __attribute__ ((vector_size(16)));
v16qi f1(v16qi a, v16qi b) {
return (a & b) != 0;
}
before
f1(char __vector(16), char __vector(16)):
pand %xmm1, %xmm0
pxor %xmm1, %xmm1
pcmpeqb %xmm1, %xmm0
pcmpeqd %xmm1, %xmm1
pandn %xmm1, %xmm0
ret
After the pr97872 fix
f1(char __vector(16), char __vector(16)):
pand xmm0, xmm1
pxor xmm1, xmm1
pcmpeqb xmm0, xmm1
pcmpeqb xmm0, xmm1
ret
On Wed, Dec 9, 2020 at 7:47 PM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Tue, 8 Dec 2020 at 14:36, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 7 Dec 2020 at 17:37, Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > >
> > > > > It seems to me that all you need to do is relax the predicates of op1
> > > > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > > > debugged it, but I see that vcondmn in neon.md only accepts
> > > > > s_register_operand.
> > > > >
> > > > > (define_expand "vcond<mode><mode>"
> > > > > [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > (if_then_else:VDQW
> > > > > (match_operator 3 "comparison_operator"
> > > > > [(match_operand:VDQW 4 "s_register_operand")
> > > > > (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > > > (match_operand:VDQW 1 "s_register_operand")
> > > > > (match_operand:VDQW 2 "s_register_operand")))]
> > > > > "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > > > {
> > > > > arm_expand_vcond (operands, <V_cmp_result>mode);
> > > > > DONE;
> > > > > })
> > > > >
> > > > > in sse.md it's defined as
> > > > > (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> > > > > [(set (match_operand:V_512 0 "register_operand")
> > > > > (if_then_else:V_512
> > > > > (match_operator 3 ""
> > > > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > > > > (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > > > > (match_operand:V_512 1 "general_operand")
> > > > > (match_operand:V_512 2 "general_operand")))]
> > > > > "TARGET_AVX512F
> > > > > && (GET_MODE_NUNITS (<V_512:MODE>mode)
> > > > > == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> > > > > {
> > > > > bool ok = ix86_expand_int_vcond (operands);
> > > > > gcc_assert (ok);
> > > > > DONE;
> > > > > })
> > > > >
> > > > > then we can get operands[1] and operands[2] as
> > > > >
> > > > > (gdb) p debug_rtx (operands[1])
> > > > > (const_vector:V16QI [
> > > > > (const_int -1 [0xffffffffffffffff]) repeated x16
> > > > > ])
> > > > > (gdb) p debug_rtx (operands[2])
> > > > > (reg:V16QI 82 [ _2 ])
> > > > > (const_vector:V16QI [
> > > > > (const_int 0 [0]) repeated x16
> > > > > ])
> > > > Hi Hongtao,
> > > > Thanks for the suggestions!
> > > > However IIUC from vector extensions doc page, the result of vector
> > > > comparison is defined to be 0
> > > > or -1, so would it be better to canonicalize
> > > > x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> > > > adjust targets if required ?
> > >
> > > Yes, it would be more straightforward to handle it in gimple isel, I
> > > would adjust the backend and testcase after you check in the patch.
> > Thanks! I have committed the attached patch in
> > 3a6e3ad38a17a03ee0139b49a0946e7b9ded1eb1.
> Hi,
> I was looking at similar issue in PR97903 and wondering what will be the
> right approach to lower (a & b) != 0 to vtst ?
> For test-case:
> #include <arm_neon.h>
>
> int8x8_t f1(int8x8_t a, int8x8_t b) {
> return (a & b) != 0;
> }
>
> input to ISEL:
> int8x8_t _1;
> vector(8) <signed-boolean:8> _2;
> int8x8_t _5;
>
> <bb 2> [local count: 1073741824]:
> _1 = a_3(D) & b_4(D);
> _2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
> _5 = VEC_COND_EXPR <_2, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0,
> 0, 0, 0, 0, 0, 0 }>;
> return _5;
>
> and after the PR97872 fix, ISEL now generates:
> int8x8_t _1;
> vector(8) <signed-boolean:8> _2;
> int8x8_t _5;
>
> <bb 2> [local count: 1073741824]:
> _1 = a_3(D) & b_4(D);
> _2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
> _5 = VIEW_CONVERT_EXPR<int8x8_t>(_2);
> return _5;
>
> while before patch it lowered to:
> _1 = a_3(D) & b_4(D);
> _5 = .VCOND (_1, { 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1,
> -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 114);
>
> Should we fold
> _1 = a & b
> _2 = _1 != 0
> to .VCOND (dest, -1, 0, _1, 0, AND_EXR) and as Hongtao suggested,
> relax predicates for op1, op2 operands to match for -1, 0 respectively
> in vcond and do specialized expansion there ?
>
> In that case, I am not sure if folding x cmp y ? -1 : 0 to x cmp y
> offers much benefit in the first place ?
> On GIMPLE, the comparisons can be lowered to .VCOND, and the target
> does specialized expansion depending on the operands.
>
> Thanks,
> Prathamesh
> >
> > Regards,
> > Prathamesh
> > >
> > > > Alternatively, I could try fixing this in backend as you suggest above.
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > > > > > _1 = a < b
> > > > > > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > > > > > with
> > > > > > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > > > > > >
> > > > > > > > > > > > > For the test-case, isel generates:
> > > > > > > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > > > > > > vector(8) signed char _2;
> > > > > > > > > > > > > uint8x8_t _5;
> > > > > > > > > > > > >
> > > > > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > > > > > return _5;
> > > > > > > > > > > > >
> > > > > > > > > > > > > and results in desired code-gen:
> > > > > > > > > > > > > f1:
> > > > > > > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > > > > > > bx lr
> > > > > > > > > > > > >
> > > > > > > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > > > > > > and result in:
> > > > > > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > > > > > >
> > > > > > > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > > > > > > +
> > > > > > > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > > > > > + {
> > > > > > > > > > > >
> > > > > > > > > > > > So this really belongs here:
> > > > > > > > > > > >
> > > > > > > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > > >
> > > > > > > > > > > > <---
> > > > > > > > > > > >
> > > > > > > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > > > > > > != CODE_FOR_nothing)
> > > > > > > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > > {
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > > > > > > +
> > > > > > > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > > + {
> > > > > > > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > > > > > > >
> > > > > > > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > > > > > > a valid cmp icode, that
> > > > > > > > > > > should be sufficient ?
> > > > > > > > > >
> > > > > > > > > > Yes
> > > > > > > > > Hi Richard,
> > > > > > > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > > > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > > > > > For the test-case:
> > > > > > > > > __v2di
> > > > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > > > {
> > > > > > > > > return x != y;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Before patch:
> > > > > > > > > baz:
> > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > > > pandn %xmm1, %xmm0
> > > > > > > > > ret
> > > > > > > > >
> > > > > > > > > After patch,
> > > > > > > > > Before ISEL:
> > > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > > __v2di _4;
> > > > > > > > >
> > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > > > > > return _4;
> > > > > > > > >
> > > > > > > > > After ISEL:
> > > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > > __v2di _4;
> > > > > > > > >
> > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > > > return _4;
> > > > > > > > >
> > > > > > > > > which results in:
> > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > pxor %xmm1, %xmm1
> > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > ret
> > > seems better to be
> > >
> > > pcmpeqq %xmm1, %xmm0
> > > pxor %xmm1, %xmm1
> > > pxor %xmm1, %xmm0
> > > ret
> > >
> > > Anyway, it needs backend adjustment.
> > >
> > > > > > > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > > > > > > comparing the result against zero to invert it, so it looks correct ?
> > > > > > > > > I am not sure which of the above two sequences is better tho ?
> > > > > > > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > > > > > > >
> > > > > > > > In case pcmpeqq is double-issue the first variant might be faster while
> > > > > > > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > > > > > > pcmpeqq might have an issue.
> > > > > > > >
> > > > > > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > > > > > target business. I wouldn't be surprised if the x86 backend
> > > > > > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > > > > > variant.
> > > > > > > >
> > > > > > > > Did you check how
> > > > > > > >
> > > > > > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > > > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > > > > > >
> > > > > > > > is handled before/after your patch? That is, make the comparison
> > > > > > > > CSEd between two VEC_COND_EXPRs?
> > > > > > > For the test-case:
> > > > > > > __v2di f(__v2di, __v2di);
> > > > > > >
> > > > > > > __v2di
> > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > {
> > > > > > > __v2di a = (x != y);
> > > > > > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > > > > > return f (a, b);
> > > > > > > }
> > > > > > >
> > > > > > > Before patch, isel converts both to .vcondeq:
> > > > > > > __v2di b;
> > > > > > > __v2di a;
> > > > > > > __v2di _8;
> > > > > > >
> > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > > return _8;
> > > > > > >
> > > > > > > and results in following code-gen:
> > > > > > > _Z3bazDv2_xS_:
> > > > > > > .LFB5666:
> > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > movdqa %xmm0, %xmm2
> > > > > > > pandn %xmm1, %xmm2
> > > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > > movdqa %xmm2, %xmm0
> > > > > > > jmp _Z1fDv2_xS_
> > > > > > >
> > > > > > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > > > > > view_convert_expr and the other
> > > > > > > to vcondeq:
> > > > > > > __v2di b;
> > > > > > > __v2di a;
> > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > __v2di _8;
> > > > > > >
> > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > > return _8;
> > > > > > >
> > > > > > > which results in following code-gen:
> > > > > > > _Z3bazDv2_xS_:
> > > > > > > .LFB5666:
> > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > pxor %xmm2, %xmm2
> > > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > > pcmpeqq %xmm0, %xmm2
> > > > > > > movdqa %xmm2, %xmm0
> > > > > > > jmp _Z1fDv2_xS_
> > > > > >
> > > > > > Ok, thanks for checking. I think the patch is OK but please let
> > > > > > Hongtao the chance to comment.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prathamesh
> > > > > > > > > > > >
> > > > > > > > > > > > Richard.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-09 11:47 ` Prathamesh Kulkarni
2020-12-10 1:30 ` Hongtao Liu
@ 2020-12-10 7:15 ` Richard Biener
2020-12-10 13:00 ` Prathamesh Kulkarni
1 sibling, 1 reply; 18+ messages in thread
From: Richard Biener @ 2020-12-10 7:15 UTC (permalink / raw)
To: Prathamesh Kulkarni
Cc: Hongtao Liu, GCC Development, Liu, Hongtao, gcc Patches
On Wed, 9 Dec 2020, Prathamesh Kulkarni wrote:
> On Tue, 8 Dec 2020 at 14:36, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 7 Dec 2020 at 17:37, Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > >
> > > > > It seems to me that all you need to do is relax the predicates of op1
> > > > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > > > debugged it, but I see that vcondmn in neon.md only accepts
> > > > > s_register_operand.
> > > > >
> > > > > (define_expand "vcond<mode><mode>"
> > > > > [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > (if_then_else:VDQW
> > > > > (match_operator 3 "comparison_operator"
> > > > > [(match_operand:VDQW 4 "s_register_operand")
> > > > > (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > > > (match_operand:VDQW 1 "s_register_operand")
> > > > > (match_operand:VDQW 2 "s_register_operand")))]
> > > > > "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > > > {
> > > > > arm_expand_vcond (operands, <V_cmp_result>mode);
> > > > > DONE;
> > > > > })
> > > > >
> > > > > in sse.md it's defined as
> > > > > (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> > > > > [(set (match_operand:V_512 0 "register_operand")
> > > > > (if_then_else:V_512
> > > > > (match_operator 3 ""
> > > > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > > > > (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > > > > (match_operand:V_512 1 "general_operand")
> > > > > (match_operand:V_512 2 "general_operand")))]
> > > > > "TARGET_AVX512F
> > > > > && (GET_MODE_NUNITS (<V_512:MODE>mode)
> > > > > == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> > > > > {
> > > > > bool ok = ix86_expand_int_vcond (operands);
> > > > > gcc_assert (ok);
> > > > > DONE;
> > > > > })
> > > > >
> > > > > then we can get operands[1] and operands[2] as
> > > > >
> > > > > (gdb) p debug_rtx (operands[1])
> > > > > (const_vector:V16QI [
> > > > > (const_int -1 [0xffffffffffffffff]) repeated x16
> > > > > ])
> > > > > (gdb) p debug_rtx (operands[2])
> > > > > (reg:V16QI 82 [ _2 ])
> > > > > (const_vector:V16QI [
> > > > > (const_int 0 [0]) repeated x16
> > > > > ])
> > > > Hi Hongtao,
> > > > Thanks for the suggestions!
> > > > However IIUC from vector extensions doc page, the result of vector
> > > > comparison is defined to be 0
> > > > or -1, so would it be better to canonicalize
> > > > x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> > > > adjust targets if required ?
> > >
> > > Yes, it would be more straightforward to handle it in gimple isel, I
> > > would adjust the backend and testcase after you check in the patch.
> > Thanks! I have committed the attached patch in
> > 3a6e3ad38a17a03ee0139b49a0946e7b9ded1eb1.
> Hi,
> I was looking at similar issue in PR97903 and wondering what will be the
> right approach to lower (a & b) != 0 to vtst ?
> For test-case:
> #include <arm_neon.h>
>
> int8x8_t f1(int8x8_t a, int8x8_t b) {
> return (a & b) != 0;
> }
>
> input to ISEL:
> int8x8_t _1;
> vector(8) <signed-boolean:8> _2;
> int8x8_t _5;
>
> <bb 2> [local count: 1073741824]:
> _1 = a_3(D) & b_4(D);
> _2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
> _5 = VEC_COND_EXPR <_2, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0,
> 0, 0, 0, 0, 0, 0 }>;
> return _5;
>
> and after the PR97872 fix, ISEL now generates:
> int8x8_t _1;
> vector(8) <signed-boolean:8> _2;
> int8x8_t _5;
>
> <bb 2> [local count: 1073741824]:
> _1 = a_3(D) & b_4(D);
> _2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
> _5 = VIEW_CONVERT_EXPR<int8x8_t>(_2);
> return _5;
>
> while before patch it lowered to:
> _1 = a_3(D) & b_4(D);
> _5 = .VCOND (_1, { 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1,
> -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 114);
>
> Should we fold
> _1 = a & b
> _2 = _1 != 0
> to .VCOND (dest, -1, 0, _1, 0, AND_EXR) and as Hongtao suggested,
> relax predicates for op1, op2 operands to match for -1, 0 respectively
> in vcond and do specialized expansion there ?
>
> In that case, I am not sure if folding x cmp y ? -1 : 0 to x cmp y
> offers much benefit in the first place ?
> On GIMPLE, the comparisons can be lowered to .VCOND, and the target
> does specialized expansion depending on the operands.
So vtst can do (a & b) != 0 directly? It looks like that's a job
for combine then and the feeding GIMPLE shouldn't matter too much
since independent of the choice above you'll get RTL to compute the
condition [mask] from the result of a vector AND.
Richard.
> Thanks,
> Prathamesh
> >
> > Regards,
> > Prathamesh
> > >
> > > > Alternatively, I could try fixing this in backend as you suggest above.
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > > > > > _1 = a < b
> > > > > > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > > > > > with
> > > > > > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > > > > > >
> > > > > > > > > > > > > For the test-case, isel generates:
> > > > > > > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > > > > > > vector(8) signed char _2;
> > > > > > > > > > > > > uint8x8_t _5;
> > > > > > > > > > > > >
> > > > > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > > > > > return _5;
> > > > > > > > > > > > >
> > > > > > > > > > > > > and results in desired code-gen:
> > > > > > > > > > > > > f1:
> > > > > > > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > > > > > > bx lr
> > > > > > > > > > > > >
> > > > > > > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > > > > > > and result in:
> > > > > > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > > > > > >
> > > > > > > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > > > > > > +
> > > > > > > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > > > > > + {
> > > > > > > > > > > >
> > > > > > > > > > > > So this really belongs here:
> > > > > > > > > > > >
> > > > > > > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > > >
> > > > > > > > > > > > <---
> > > > > > > > > > > >
> > > > > > > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > > > > > > != CODE_FOR_nothing)
> > > > > > > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > > {
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > > > > > > +
> > > > > > > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > > + {
> > > > > > > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > > > > > > >
> > > > > > > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > > > > > > a valid cmp icode, that
> > > > > > > > > > > should be sufficient ?
> > > > > > > > > >
> > > > > > > > > > Yes
> > > > > > > > > Hi Richard,
> > > > > > > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > > > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > > > > > For the test-case:
> > > > > > > > > __v2di
> > > > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > > > {
> > > > > > > > > return x != y;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Before patch:
> > > > > > > > > baz:
> > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > > > pandn %xmm1, %xmm0
> > > > > > > > > ret
> > > > > > > > >
> > > > > > > > > After patch,
> > > > > > > > > Before ISEL:
> > > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > > __v2di _4;
> > > > > > > > >
> > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > > > > > return _4;
> > > > > > > > >
> > > > > > > > > After ISEL:
> > > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > > __v2di _4;
> > > > > > > > >
> > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > > > return _4;
> > > > > > > > >
> > > > > > > > > which results in:
> > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > pxor %xmm1, %xmm1
> > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > ret
> > > seems better to be
> > >
> > > pcmpeqq %xmm1, %xmm0
> > > pxor %xmm1, %xmm1
> > > pxor %xmm1, %xmm0
> > > ret
> > >
> > > Anyway, it needs backend adjustment.
> > >
> > > > > > > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > > > > > > comparing the result against zero to invert it, so it looks correct ?
> > > > > > > > > I am not sure which of the above two sequences is better tho ?
> > > > > > > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > > > > > > >
> > > > > > > > In case pcmpeqq is double-issue the first variant might be faster while
> > > > > > > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > > > > > > pcmpeqq might have an issue.
> > > > > > > >
> > > > > > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > > > > > target business. I wouldn't be surprised if the x86 backend
> > > > > > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > > > > > variant.
> > > > > > > >
> > > > > > > > Did you check how
> > > > > > > >
> > > > > > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > > > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > > > > > >
> > > > > > > > is handled before/after your patch? That is, make the comparison
> > > > > > > > CSEd between two VEC_COND_EXPRs?
> > > > > > > For the test-case:
> > > > > > > __v2di f(__v2di, __v2di);
> > > > > > >
> > > > > > > __v2di
> > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > {
> > > > > > > __v2di a = (x != y);
> > > > > > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > > > > > return f (a, b);
> > > > > > > }
> > > > > > >
> > > > > > > Before patch, isel converts both to .vcondeq:
> > > > > > > __v2di b;
> > > > > > > __v2di a;
> > > > > > > __v2di _8;
> > > > > > >
> > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > > return _8;
> > > > > > >
> > > > > > > and results in following code-gen:
> > > > > > > _Z3bazDv2_xS_:
> > > > > > > .LFB5666:
> > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > movdqa %xmm0, %xmm2
> > > > > > > pandn %xmm1, %xmm2
> > > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > > movdqa %xmm2, %xmm0
> > > > > > > jmp _Z1fDv2_xS_
> > > > > > >
> > > > > > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > > > > > view_convert_expr and the other
> > > > > > > to vcondeq:
> > > > > > > __v2di b;
> > > > > > > __v2di a;
> > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > __v2di _8;
> > > > > > >
> > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > > return _8;
> > > > > > >
> > > > > > > which results in following code-gen:
> > > > > > > _Z3bazDv2_xS_:
> > > > > > > .LFB5666:
> > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > pxor %xmm2, %xmm2
> > > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > > pcmpeqq %xmm0, %xmm2
> > > > > > > movdqa %xmm2, %xmm0
> > > > > > > jmp _Z1fDv2_xS_
> > > > > >
> > > > > > Ok, thanks for checking. I think the patch is OK but please let
> > > > > > Hongtao the chance to comment.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prathamesh
> > > > > > > > > > > >
> > > > > > > > > > > > Richard.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with PR97872
2020-12-10 7:15 ` Richard Biener
@ 2020-12-10 13:00 ` Prathamesh Kulkarni
0 siblings, 0 replies; 18+ messages in thread
From: Prathamesh Kulkarni @ 2020-12-10 13:00 UTC (permalink / raw)
To: Richard Biener; +Cc: Hongtao Liu, GCC Development, Liu, Hongtao, gcc Patches
On Thu, 10 Dec 2020 at 17:11, Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 9 Dec 2020, Prathamesh Kulkarni wrote:
>
> > On Tue, 8 Dec 2020 at 14:36, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Mon, 7 Dec 2020 at 17:37, Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > >
> > > > > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > For the test mentioned in PR, I was trying to see if we could do
> > > > > > > > > > > > > > > > specialized expansion for vcond in target when operands are -1 and 0.
> > > > > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > where r117 and r118 are set to vector constants -1 and 0 respectively.
> > > > > > > > > > > > > > > > However, I am not sure if there's a way to check if the register is
> > > > > > > > > > > > > > > > constant during expansion time (since we don't have df analysis yet) ?
> > > > > >
> > > > > > It seems to me that all you need to do is relax the predicates of op1
> > > > > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > > > > debugged it, but I see that vcondmn in neon.md only accepts
> > > > > > s_register_operand.
> > > > > >
> > > > > > (define_expand "vcond<mode><mode>"
> > > > > > [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > > (if_then_else:VDQW
> > > > > > (match_operator 3 "comparison_operator"
> > > > > > [(match_operand:VDQW 4 "s_register_operand")
> > > > > > (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > > > > (match_operand:VDQW 1 "s_register_operand")
> > > > > > (match_operand:VDQW 2 "s_register_operand")))]
> > > > > > "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > > > > {
> > > > > > arm_expand_vcond (operands, <V_cmp_result>mode);
> > > > > > DONE;
> > > > > > })
> > > > > >
> > > > > > in sse.md it's defined as
> > > > > > (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> > > > > > [(set (match_operand:V_512 0 "register_operand")
> > > > > > (if_then_else:V_512
> > > > > > (match_operator 3 ""
> > > > > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > > > > > (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > > > > > (match_operand:V_512 1 "general_operand")
> > > > > > (match_operand:V_512 2 "general_operand")))]
> > > > > > "TARGET_AVX512F
> > > > > > && (GET_MODE_NUNITS (<V_512:MODE>mode)
> > > > > > == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> > > > > > {
> > > > > > bool ok = ix86_expand_int_vcond (operands);
> > > > > > gcc_assert (ok);
> > > > > > DONE;
> > > > > > })
> > > > > >
> > > > > > then we can get operands[1] and operands[2] as
> > > > > >
> > > > > > (gdb) p debug_rtx (operands[1])
> > > > > > (const_vector:V16QI [
> > > > > > (const_int -1 [0xffffffffffffffff]) repeated x16
> > > > > > ])
> > > > > > (gdb) p debug_rtx (operands[2])
> > > > > > (reg:V16QI 82 [ _2 ])
> > > > > > (const_vector:V16QI [
> > > > > > (const_int 0 [0]) repeated x16
> > > > > > ])
> > > > > Hi Hongtao,
> > > > > Thanks for the suggestions!
> > > > > However IIUC from vector extensions doc page, the result of vector
> > > > > comparison is defined to be 0
> > > > > or -1, so would it be better to canonicalize
> > > > > x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> > > > > adjust targets if required ?
> > > >
> > > > Yes, it would be more straightforward to handle it in gimple isel, I
> > > > would adjust the backend and testcase after you check in the patch.
> > > Thanks! I have committed the attached patch in
> > > 3a6e3ad38a17a03ee0139b49a0946e7b9ded1eb1.
> > Hi,
> > I was looking at similar issue in PR97903 and wondering what will be the
> > right approach to lower (a & b) != 0 to vtst ?
> > For test-case:
> > #include <arm_neon.h>
> >
> > int8x8_t f1(int8x8_t a, int8x8_t b) {
> > return (a & b) != 0;
> > }
> >
> > input to ISEL:
> > int8x8_t _1;
> > vector(8) <signed-boolean:8> _2;
> > int8x8_t _5;
> >
> > <bb 2> [local count: 1073741824]:
> > _1 = a_3(D) & b_4(D);
> > _2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
> > _5 = VEC_COND_EXPR <_2, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0,
> > 0, 0, 0, 0, 0, 0 }>;
> > return _5;
> >
> > and after the PR97872 fix, ISEL now generates:
> > int8x8_t _1;
> > vector(8) <signed-boolean:8> _2;
> > int8x8_t _5;
> >
> > <bb 2> [local count: 1073741824]:
> > _1 = a_3(D) & b_4(D);
> > _2 = _1 != { 0, 0, 0, 0, 0, 0, 0, 0 };
> > _5 = VIEW_CONVERT_EXPR<int8x8_t>(_2);
> > return _5;
> >
> > while before patch it lowered to:
> > _1 = a_3(D) & b_4(D);
> > _5 = .VCOND (_1, { 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1,
> > -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 114);
> >
> > Should we fold
> > _1 = a & b
> > _2 = _1 != 0
> > to .VCOND (dest, -1, 0, _1, 0, AND_EXR) and as Hongtao suggested,
> > relax predicates for op1, op2 operands to match for -1, 0 respectively
> > in vcond and do specialized expansion there ?
> >
> > In that case, I am not sure if folding x cmp y ? -1 : 0 to x cmp y
> > offers much benefit in the first place ?
> > On GIMPLE, the comparisons can be lowered to .VCOND, and the target
> > does specialized expansion depending on the operands.
>
> So vtst can do (a & b) != 0 directly? It looks like that's a job
> for combine then and the feeding GIMPLE shouldn't matter too much
> since independent of the choice above you'll get RTL to compute the
> condition [mask] from the result of a vector AND.
OK, thanks for the suggestions. So let's keep the PR97872 fix, since
it also improves
code-gen for x86. I will try to fix this on RTL for arm to generate vtst.
Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Regards,
> > > Prathamesh
> > > >
> > > > > Alternatively, I could try fixing this in backend as you suggest above.
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Alternatively, should we add a target hook that returns true if the
> > > > > > > > > > > > > > > > result of vector comparison is set to all-ones or all-zeros, and then
> > > > > > > > > > > > > > > > use this hook in gimple ISEL to effectively turn VEC_COND_EXPR into nop ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing a non-mask
> > > > > > > > > > > > > > > vector type? ISEL could special case the a ? -1 : 0 case this way.
> > > > > > > > > > > > > > I think the vec_cmp pattern matches but it produces a masked vector type.
> > > > > > > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > > > > > > _1 = a < b
> > > > > > > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For the test-case, isel generates:
> > > > > > > > > > > > > > vector(8) <signed-boolean:8> _1;
> > > > > > > > > > > > > > vector(8) signed char _2;
> > > > > > > > > > > > > > uint8x8_t _5;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > > > > > _1 = a_3(D) < b_4(D);
> > > > > > > > > > > > > > _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > > > > > > return _5;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and results in desired code-gen:
> > > > > > > > > > > > > > f1:
> > > > > > > > > > > > > > vcgt.s8 d0, d1, d0
> > > > > > > > > > > > > > bx lr
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Altho I guess, we should remove the redundant conversions during isel itself ?
> > > > > > > > > > > > > > and result in:
> > > > > > > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > > > > > > >
> > > > > > > > > > > > > + /* For targets where result of comparison is all-ones or all-zeros,
> > > > > > > > > > > > > + a < b ? -1 : 0 can be reduced to a < b. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > > > > > > + {
> > > > > > > > > > > > >
> > > > > > > > > > > > > So this really belongs here:
> > > > > > > > > > > > >
> > > > > > > > > > > > > tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > > > tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > > > >
> > > > > > > > > > > > > <---
> > > > > > > > > > > > >
> > > > > > > > > > > > > if (used_vec_cond_exprs >= 2
> > > > > > > > > > > > > && (get_vcond_mask_icode (mode, TYPE_MODE (op0_type))
> > > > > > > > > > > > > != CODE_FOR_nothing)
> > > > > > > > > > > > > && expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > > > {
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > + gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > > > > > > > > > > > > + tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > > > > > > + tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > > > > + tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > > > > + enum tree_code tcode = gimple_assign_rhs_code (def_stmt);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (expand_vec_cmp_expr_p (op0a_type, op0_type, tcode))
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
> > > > > > > > > > > > > + gassign *new_stmt = gimple_build_assign (lhs, conv_op);
> > > > > > > > > > > > > + gsi_replace (gsi, new_stmt, true);
> > > > > > > > > > > > >
> > > > > > > > > > > > > and you need to verify that the mode of the lhs and the mode of op0
> > > > > > > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > > > > > > Sorry, I am not sure how to check if target can actually expand vec_cmp ?
> > > > > > > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it gets
> > > > > > > > > > > > a valid cmp icode, that
> > > > > > > > > > > > should be sufficient ?
> > > > > > > > > > >
> > > > > > > > > > > Yes
> > > > > > > > > > Hi Richard,
> > > > > > > > > > I tested the patch, and it shows one regression for pr78102.c, because
> > > > > > > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > > > > > > For the test-case:
> > > > > > > > > > __v2di
> > > > > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > > > > {
> > > > > > > > > > return x != y;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > Before patch:
> > > > > > > > > > baz:
> > > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > > > > pandn %xmm1, %xmm0
> > > > > > > > > > ret
> > > > > > > > > >
> > > > > > > > > > After patch,
> > > > > > > > > > Before ISEL:
> > > > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > > > __v2di _4;
> > > > > > > > > >
> > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > > > > > > return _4;
> > > > > > > > > >
> > > > > > > > > > After ISEL:
> > > > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > > > __v2di _4;
> > > > > > > > > >
> > > > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > > > > return _4;
> > > > > > > > > >
> > > > > > > > > > which results in:
> > > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > > pxor %xmm1, %xmm1
> > > > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > > > ret
> > > > seems better to be
> > > >
> > > > pcmpeqq %xmm1, %xmm0
> > > > pxor %xmm1, %xmm1
> > > > pxor %xmm1, %xmm0
> > > > ret
> > > >
> > > > Anyway, it needs backend adjustment.
> > > >
> > > > > > > > > > IIUC, the new code-gen is essentially comparing two args for equality, and then
> > > > > > > > > > comparing the result against zero to invert it, so it looks correct ?
> > > > > > > > > > I am not sure which of the above two sequences is better tho ?
> > > > > > > > > > If the new code-gen is OK, would it be OK to adjust the test-case ?
> > > > > > > > >
> > > > > > > > > In case pcmpeqq is double-issue the first variant might be faster while
> > > > > > > > > the second variant has the advantage of the "free" pxor, but back-to-back
> > > > > > > > > pcmpeqq might have an issue.
> > > > > > > > >
> > > > > > > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > > > > > > target business. I wouldn't be surprised if the x86 backend
> > > > > > > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > > > > > > variant.
> > > > > > > > >
> > > > > > > > > Did you check how
> > > > > > > > >
> > > > > > > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > > > > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > > > > > > >
> > > > > > > > > is handled before/after your patch? That is, make the comparison
> > > > > > > > > CSEd between two VEC_COND_EXPRs?
> > > > > > > > For the test-case:
> > > > > > > > __v2di f(__v2di, __v2di);
> > > > > > > >
> > > > > > > > __v2di
> > > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > > {
> > > > > > > > __v2di a = (x != y);
> > > > > > > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > > > > > > return f (a, b);
> > > > > > > > }
> > > > > > > >
> > > > > > > > Before patch, isel converts both to .vcondeq:
> > > > > > > > __v2di b;
> > > > > > > > __v2di a;
> > > > > > > > __v2di _8;
> > > > > > > >
> > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > > > return _8;
> > > > > > > >
> > > > > > > > and results in following code-gen:
> > > > > > > > _Z3bazDv2_xS_:
> > > > > > > > .LFB5666:
> > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > pcmpeqd %xmm1, %xmm1
> > > > > > > > movdqa %xmm0, %xmm2
> > > > > > > > pandn %xmm1, %xmm2
> > > > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > > > movdqa %xmm2, %xmm0
> > > > > > > > jmp _Z1fDv2_xS_
> > > > > > > >
> > > > > > > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > > > > > > view_convert_expr and the other
> > > > > > > > to vcondeq:
> > > > > > > > __v2di b;
> > > > > > > > __v2di a;
> > > > > > > > vector(2) <signed-boolean:64> _1;
> > > > > > > > __v2di _8;
> > > > > > > >
> > > > > > > > <bb 2> [local count: 1073741824]:
> > > > > > > > _1 = x_2(D) != y_3(D);
> > > > > > > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > > > > > _8 = f (a_4, b_5); [tail call]
> > > > > > > > return _8;
> > > > > > > >
> > > > > > > > which results in following code-gen:
> > > > > > > > _Z3bazDv2_xS_:
> > > > > > > > .LFB5666:
> > > > > > > > pcmpeqq %xmm1, %xmm0
> > > > > > > > pxor %xmm2, %xmm2
> > > > > > > > movdqa .LC0(%rip), %xmm1
> > > > > > > > pblendvb %xmm0, .LC1(%rip), %xmm1
> > > > > > > > pcmpeqq %xmm0, %xmm2
> > > > > > > > movdqa %xmm2, %xmm0
> > > > > > > > jmp _Z1fDv2_xS_
> > > > > > >
> > > > > > > Ok, thanks for checking. I think the patch is OK but please let
> > > > > > > Hongtao the chance to comment.
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prathamesh
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > >
> > > > > > > > > > > > > Richard.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > BR,
> > > > > > Hongtao
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2020-12-10 13:00 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-01 10:59 Help with PR97872 Prathamesh Kulkarni
2020-12-01 11:09 ` Richard Biener
2020-12-03 9:06 ` Prathamesh Kulkarni
2020-12-03 11:05 ` Richard Biener
2020-12-04 9:22 ` Prathamesh Kulkarni
2020-12-04 11:48 ` Richard Biener
2020-12-07 5:53 ` Prathamesh Kulkarni
2020-12-07 7:31 ` Richard Biener
2020-12-07 9:33 ` Prathamesh Kulkarni
2020-12-07 9:45 ` Richard Biener
2020-12-07 10:47 ` Hongtao Liu
2020-12-07 11:10 ` Prathamesh Kulkarni
2020-12-07 12:09 ` Hongtao Liu
2020-12-08 9:06 ` Prathamesh Kulkarni
2020-12-09 11:47 ` Prathamesh Kulkarni
2020-12-10 1:30 ` Hongtao Liu
2020-12-10 7:15 ` Richard Biener
2020-12-10 13:00 ` Prathamesh Kulkarni
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).