public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
@ 2022-09-26  6:56 Liwei Xu
  2022-09-26  8:21 ` Richard Biener
  2022-10-12 13:51 ` Xi Ruoyao
  0 siblings, 2 replies; 13+ messages in thread
From: Liwei Xu @ 2022-09-26  6:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, wilson, admin

        This patch implemented the optimization in PR 54346, which Merges

	c = VEC_PERM_EXPR <a, b, VCST0>;
        d = VEC_PERM_EXPR <c, c, VCST1>;
                to
        d = VEC_PERM_EXPR <a, b, NEW_VCST>;

	Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
        tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
        is ok to removed it.

gcc/ChangeLog:

	PR target/54346
	* match.pd: Merge the index of VCST then generates the new vec_perm.

gcc/testsuite/ChangeLog:

	PR target/54346
	* gcc.dg/pr54346.c: New test.

Co-authored-by: liuhongt <hongtao.liu@intel.com>
---
 gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
 2 files changed, 54 insertions(+)
 create mode 100755 gcc/testsuite/gcc.dg/pr54346.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 345bcb701a5..9219b0a10e1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8086,6 +8086,47 @@ and,
   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
 
 
+/* (PR54346) Merge 
+   c = VEC_PERM_EXPR <a, b, VCST0>; 
+   d = VEC_PERM_EXPR <c, c, VCST1>;
+   to
+   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
+   
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (with
+  {
+    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
+      return NULL_TREE;
+
+    tree op0;
+    machine_mode result_mode = TYPE_MODE (type);
+    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
+    vec_perm_builder builder0;
+    vec_perm_builder builder1;
+    vec_perm_builder builder2 (nelts, nelts, 1);
+
+    if (!tree_to_vec_perm_builder (&builder0, @3) 
+    || !tree_to_vec_perm_builder (&builder1, @4))
+      return NULL_TREE;
+
+    vec_perm_indices sel0 (builder0, 2, nelts);
+    vec_perm_indices sel1 (builder1, 1, nelts);
+   
+    for (int i = 0; i < nelts; i++)
+      builder2.quick_push (sel0[sel1[i].to_constant()]);
+
+    vec_perm_indices sel2 (builder2, 2, nelts);
+
+    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
+      return NULL_TREE;
+
+    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+  }
+  (vec_perm @1 @2 { op0; })))
+
+
 /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
    constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
new file mode 100755
index 00000000000..d87dc3a79a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54346.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+
+void fun (veci a, veci b, veci *i)
+{
+  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
+  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
\ No newline at end of file
-- 
2.18.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-09-26  6:56 [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346] Liwei Xu
@ 2022-09-26  8:21 ` Richard Biener
  2022-09-30  9:17   ` Richard Sandiford
  2022-10-12 13:51 ` Xi Ruoyao
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Biener @ 2022-09-26  8:21 UTC (permalink / raw)
  To: Liwei Xu, Richard Sandiford; +Cc: gcc-patches, wilson, admin

On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>
>         This patch implemented the optimization in PR 54346, which Merges
>
>         c = VEC_PERM_EXPR <a, b, VCST0>;
>         d = VEC_PERM_EXPR <c, c, VCST1>;
>                 to
>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>
>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>         is ok to removed it.

Looks good, but leave Richard a chance to ask for VLA vector support which
might be trivial to do.

Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
seem to handle VLA vectors.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         PR target/54346
>         * match.pd: Merge the index of VCST then generates the new vec_perm.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/54346
>         * gcc.dg/pr54346.c: New test.
>
> Co-authored-by: liuhongt <hongtao.liu@intel.com>
> ---
>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>  2 files changed, 54 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 345bcb701a5..9219b0a10e1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8086,6 +8086,47 @@ and,
>    (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>
>
> +/* (PR54346) Merge
> +   c = VEC_PERM_EXPR <a, b, VCST0>;
> +   d = VEC_PERM_EXPR <c, c, VCST1>;
> +   to
> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
> +
> +(simplify
> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> + (with
> +  {
> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
> +      return NULL_TREE;
> +
> +    tree op0;
> +    machine_mode result_mode = TYPE_MODE (type);
> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
> +    vec_perm_builder builder0;
> +    vec_perm_builder builder1;
> +    vec_perm_builder builder2 (nelts, nelts, 1);
> +
> +    if (!tree_to_vec_perm_builder (&builder0, @3)
> +    || !tree_to_vec_perm_builder (&builder1, @4))
> +      return NULL_TREE;
> +
> +    vec_perm_indices sel0 (builder0, 2, nelts);
> +    vec_perm_indices sel1 (builder1, 1, nelts);
> +
> +    for (int i = 0; i < nelts; i++)
> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
> +
> +    vec_perm_indices sel2 (builder2, 2, nelts);
> +
> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
> +      return NULL_TREE;
> +
> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
> +  }
> +  (vec_perm @1 @2 { op0; })))
> +
> +
>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>     constant which when multiplied by a power of 2 contains a unique value
> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
> new file mode 100755
> index 00000000000..d87dc3a79a5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr54346.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-dse1" } */
> +
> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> +
> +void fun (veci a, veci b, veci *i)
> +{
> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
> \ No newline at end of file
> --
> 2.18.2
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-09-26  8:21 ` Richard Biener
@ 2022-09-30  9:17   ` Richard Sandiford
  2022-09-30  9:36     ` Xu, Liwei
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Sandiford @ 2022-09-30  9:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: Liwei Xu, gcc-patches, wilson, admin

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>>
>>         This patch implemented the optimization in PR 54346, which Merges
>>
>>         c = VEC_PERM_EXPR <a, b, VCST0>;
>>         d = VEC_PERM_EXPR <c, c, VCST1>;
>>                 to
>>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>
>>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>>         is ok to removed it.
>
> Looks good, but leave Richard a chance to ask for VLA vector support which
> might be trivial to do.

Sorry for the slow reply.  It might be tricky to handle the general case,
so I'd suggest going with this for now and dealing with VLA as a follow-up.
(Probably after Prathamesh's changes to fold_vec_perm_expr.)

Thanks,
Richard

> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
> seem to handle VLA vectors.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>         PR target/54346
>>         * match.pd: Merge the index of VCST then generates the new vec_perm.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR target/54346
>>         * gcc.dg/pr54346.c: New test.
>>
>> Co-authored-by: liuhongt <hongtao.liu@intel.com>
>> ---
>>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>  2 files changed, 54 insertions(+)
>>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 345bcb701a5..9219b0a10e1 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8086,6 +8086,47 @@ and,
>>    (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>
>>
>> +/* (PR54346) Merge
>> +   c = VEC_PERM_EXPR <a, b, VCST0>;
>> +   d = VEC_PERM_EXPR <c, c, VCST1>;
>> +   to
>> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
>> +
>> +(simplify
>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>> + (with
>> +  {
>> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
>> +      return NULL_TREE;
>> +
>> +    tree op0;
>> +    machine_mode result_mode = TYPE_MODE (type);
>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
>> +    vec_perm_builder builder0;
>> +    vec_perm_builder builder1;
>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>> +
>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>> +    || !tree_to_vec_perm_builder (&builder1, @4))
>> +      return NULL_TREE;
>> +
>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>> +
>> +    for (int i = 0; i < nelts; i++)
>> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
>> +
>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>> +
>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>> +      return NULL_TREE;
>> +
>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>> +  }
>> +  (vec_perm @1 @2 { op0; })))
>> +
>> +
>>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>     constant which when multiplied by a power of 2 contains a unique value
>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>> new file mode 100755
>> index 00000000000..d87dc3a79a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O -fdump-tree-dse1" } */
>> +
>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>> +
>> +void fun (veci a, veci b, veci *i)
>> +{
>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>> \ No newline at end of file
>> --
>> 2.18.2
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-09-30  9:17   ` Richard Sandiford
@ 2022-09-30  9:36     ` Xu, Liwei
  0 siblings, 0 replies; 13+ messages in thread
From: Xu, Liwei @ 2022-09-30  9:36 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener, gcc-patches, wilson, admin

Hi Richard 

Sure, I’ll also check the VLA case once I’m back from vacation, thank you for your help.

Best Regards
Liwei

> On 30 Sep 2022, at 5:17 pm, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>>> 
>>>        This patch implemented the optimization in PR 54346, which Merges
>>> 
>>>        c = VEC_PERM_EXPR <a, b, VCST0>;
>>>        d = VEC_PERM_EXPR <c, c, VCST1>;
>>>                to
>>>        d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>> 
>>>        Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>>>        tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>>>        is ok to removed it.
>> 
>> Looks good, but leave Richard a chance to ask for VLA vector support which
>> might be trivial to do.
> 
> Sorry for the slow reply.  It might be tricky to handle the general case,
> so I'd suggest going with this for now and dealing with VLA as a follow-up.
> (Probably after Prathamesh's changes to fold_vec_perm_expr.)
> 
> Thanks,
> Richard
> 
>> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
>> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
>> seem to handle VLA vectors.
>> 
>> Thanks,
>> Richard.
>> 
>>> gcc/ChangeLog:
>>> 
>>>        PR target/54346
>>>        * match.pd: Merge the index of VCST then generates the new vec_perm.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>        PR target/54346
>>>        * gcc.dg/pr54346.c: New test.
>>> 
>>> Co-authored-by: liuhongt <hongtao.liu@intel.com>
>>> ---
>>> gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>> gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>> 2 files changed, 54 insertions(+)
>>> create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>> 
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 345bcb701a5..9219b0a10e1 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -8086,6 +8086,47 @@ and,
>>>   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>> 
>>> 
>>> +/* (PR54346) Merge
>>> +   c = VEC_PERM_EXPR <a, b, VCST0>;
>>> +   d = VEC_PERM_EXPR <c, c, VCST1>;
>>> +   to
>>> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
>>> +
>>> +(simplify
>>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>>> + (with
>>> +  {
>>> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
>>> +      return NULL_TREE;
>>> +
>>> +    tree op0;
>>> +    machine_mode result_mode = TYPE_MODE (type);
>>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
>>> +    vec_perm_builder builder0;
>>> +    vec_perm_builder builder1;
>>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>>> +
>>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>>> +    || !tree_to_vec_perm_builder (&builder1, @4))
>>> +      return NULL_TREE;
>>> +
>>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>>> +
>>> +    for (int i = 0; i < nelts; i++)
>>> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
>>> +
>>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>>> +
>>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>>> +      return NULL_TREE;
>>> +
>>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>>> +  }
>>> +  (vec_perm @1 @2 { op0; })))
>>> +
>>> +
>>> /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>>    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>>    constant which when multiplied by a power of 2 contains a unique value
>>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>>> new file mode 100755
>>> index 00000000000..d87dc3a79a5
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O -fdump-tree-dse1" } */
>>> +
>>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>>> +
>>> +void fun (veci a, veci b, veci *i)
>>> +{
>>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>>> \ No newline at end of file
>>> --
>>> 2.18.2
>>> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-09-26  6:56 [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346] Liwei Xu
  2022-09-26  8:21 ` Richard Biener
@ 2022-10-12 13:51 ` Xi Ruoyao
  2022-10-13  6:15   ` Levy
  1 sibling, 1 reply; 13+ messages in thread
From: Xi Ruoyao @ 2022-10-12 13:51 UTC (permalink / raw)
  To: Liwei Xu, gcc-patches; +Cc: wilson, admin, Lulu Cheng

On Mon, 2022-09-26 at 14:56 +0800, Liwei Xu via Gcc-patches wrote:
>         This patch implemented the optimization in PR 54346, which Merges
> 
>         c = VEC_PERM_EXPR <a, b, VCST0>;
>         d = VEC_PERM_EXPR <c, c, VCST1>;
>                 to
>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
> 
>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>         is ok to removed it.

I'm getting:

FAIL: gcc.dg/pr54346.c scan-tree-dump dse1 "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }"
FAIL: gcc.dg/pr54346.c scan-tree-dump-times dse1 "VEC_PERM_EXPR" 1

on loongarch64-linux-gnu.  Not sure why.


-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-12 13:51 ` Xi Ruoyao
@ 2022-10-13  6:15   ` Levy
  2022-10-13  6:44     ` Xi Ruoyao
  0 siblings, 1 reply; 13+ messages in thread
From: Levy @ 2022-10-13  6:15 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: Liwei Xu, gcc-patches, wilson, Lulu Cheng

Hi RuoYao

It’s probably because loongarch64 doesn’t support 
can_vec_perm_const_p(result_mode, op_mode, sel2, false)

I’m not sure whether if loongarch will support it or should I just limit the test target for pr54346.c?

Best Regards
Levy

> On 12 Oct 2022, at 9:51 pm, Xi Ruoyao <xry111@xry111.site> wrote:
> 
> pr54346.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-13  6:15   ` Levy
@ 2022-10-13  6:44     ` Xi Ruoyao
  2022-10-13  8:15       ` Lulu Cheng
  0 siblings, 1 reply; 13+ messages in thread
From: Xi Ruoyao @ 2022-10-13  6:44 UTC (permalink / raw)
  To: Levy; +Cc: Liwei Xu, gcc-patches, wilson, Lulu Cheng

On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> Hi RuoYao
> 
> It’s probably because loongarch64 doesn’t support 
> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> 
> I’m not sure whether if loongarch will support it or should I just
> limit the test target for pr54346.c?

I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
actually support vector.  (LoongArch has SIMD instructions but the
support in GCC won't be added in a very recent future.)

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-13  6:44     ` Xi Ruoyao
@ 2022-10-13  8:15       ` Lulu Cheng
  2022-10-13 11:10         ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: Lulu Cheng @ 2022-10-13  8:15 UTC (permalink / raw)
  To: Xi Ruoyao, Levy; +Cc: Liwei Xu, gcc-patches, wilson


在 2022/10/13 下午2:44, Xi Ruoyao 写道:
> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>> Hi RuoYao
>>
>> It’s probably because loongarch64 doesn’t support
>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>
>> I’m not sure whether if loongarch will support it or should I just
>> limit the test target for pr54346.c?
> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> actually support vector.  (LoongArch has SIMD instructions but the
> support in GCC won't be added in a very recent future.)
>
If what I understand is correct, I think this might be a better solution.

  /* { dg-do compile } */

+/* { dg-require-effective-target vect_perm } */
  /* { dg-options "-O -fdump-tree-dse1" } */


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-13  8:15       ` Lulu Cheng
@ 2022-10-13 11:10         ` Richard Biener
  2022-10-14  1:09           ` Xu, Liwei
  2022-10-14  1:49           ` Lulu Cheng
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Biener @ 2022-10-13 11:10 UTC (permalink / raw)
  To: Lulu Cheng; +Cc: Xi Ruoyao, Levy, wilson, gcc-patches, Liwei Xu

On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>
>
> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
> > On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> >> Hi RuoYao
> >>
> >> It’s probably because loongarch64 doesn’t support
> >> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> >>
> >> I’m not sure whether if loongarch will support it or should I just
> >> limit the test target for pr54346.c?
> > I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> > actually support vector.  (LoongArch has SIMD instructions but the
> > support in GCC won't be added in a very recent future.)
> >
> If what I understand is correct, I think this might be a better solution.
>
>   /* { dg-do compile } */
>
> +/* { dg-require-effective-target vect_perm } */
>   /* { dg-options "-O -fdump-tree-dse1" } */

Btw, what forwprop does is check whether any of the original permutations are
not supported and then elide the supportability check for the result.
The reasoning
is that the original permute(s) would be lowered during vectlower so we can as
well do that for the result.  We should just never turn a supported permutation
sequence into a not supported one.

Richard.

>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-13 11:10         ` Richard Biener
@ 2022-10-14  1:09           ` Xu, Liwei
  2022-10-14  1:49           ` Lulu Cheng
  1 sibling, 0 replies; 13+ messages in thread
From: Xu, Liwei @ 2022-10-14  1:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Lulu Cheng, Xi Ruoyao, Levy, wilson, gcc-patches

Hi Richard
 
Thank your for your detailed explanation, I’ll patch the test case with suggestions form LuLu.

Best
Levy

> On 13 Oct 2022, at 7:12 pm, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>> 
>> 
>>> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>> Hi RuoYao
>>>> 
>>>> It’s probably because loongarch64 doesn’t support
>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>> 
>>>> I’m not sure whether if loongarch will support it or should I just
>>>> limit the test target for pr54346.c?
>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>> actually support vector.  (LoongArch has SIMD instructions but the
>>> support in GCC won't be added in a very recent future.)
>>> 
>> If what I understand is correct, I think this might be a better solution.
>> 
>>  /* { dg-do compile } */
>> 
>> +/* { dg-require-effective-target vect_perm } */
>>  /* { dg-options "-O -fdump-tree-dse1" } */
> 
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result.  We should just never turn a supported permutation
> sequence into a not supported one.
> 
> Richard.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-13 11:10         ` Richard Biener
  2022-10-14  1:09           ` Xu, Liwei
@ 2022-10-14  1:49           ` Lulu Cheng
  2022-10-14  6:18             ` Richard Biener
  1 sibling, 1 reply; 13+ messages in thread
From: Lulu Cheng @ 2022-10-14  1:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: Xi Ruoyao, Levy, wilson, gcc-patches, Liwei Xu


在 2022/10/13 下午7:10, Richard Biener 写道:
> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>>
>> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>> Hi RuoYao
>>>>
>>>> It’s probably because loongarch64 doesn’t support
>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>>
>>>> I’m not sure whether if loongarch will support it or should I just
>>>> limit the test target for pr54346.c?
>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>> actually support vector.  (LoongArch has SIMD instructions but the
>>> support in GCC won't be added in a very recent future.)
>>>
>> If what I understand is correct, I think this might be a better solution.
>>
>>    /* { dg-do compile } */
>>
>> +/* { dg-require-effective-target vect_perm } */
>>    /* { dg-options "-O -fdump-tree-dse1" } */
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result.  We should just never turn a supported permutation
> sequence into a not supported one.
>
> Richard.
>
Hi Richard:

I'm very sorry. I don't fully understand what you mean.

Could you give me some more details?


Thanks!

Lulu Cheng



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-14  1:49           ` Lulu Cheng
@ 2022-10-14  6:18             ` Richard Biener
  2022-10-14  6:34               ` Xu, Liwei
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2022-10-14  6:18 UTC (permalink / raw)
  To: Lulu Cheng; +Cc: Xi Ruoyao, Levy, wilson, gcc-patches, Liwei Xu

On Fri, Oct 14, 2022 at 3:49 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>
>
> 在 2022/10/13 下午7:10, Richard Biener 写道:
> > On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
> >>
> >> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
> >>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> >>>> Hi RuoYao
> >>>>
> >>>> It’s probably because loongarch64 doesn’t support
> >>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> >>>>
> >>>> I’m not sure whether if loongarch will support it or should I just
> >>>> limit the test target for pr54346.c?
> >>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> >>> actually support vector.  (LoongArch has SIMD instructions but the
> >>> support in GCC won't be added in a very recent future.)
> >>>
> >> If what I understand is correct, I think this might be a better solution.
> >>
> >>    /* { dg-do compile } */
> >>
> >> +/* { dg-require-effective-target vect_perm } */
> >>    /* { dg-options "-O -fdump-tree-dse1" } */
> > Btw, what forwprop does is check whether any of the original permutations are
> > not supported and then elide the supportability check for the result.
> > The reasoning
> > is that the original permute(s) would be lowered during vectlower so we can as
> > well do that for the result.  We should just never turn a supported permutation
> > sequence into a not supported one.
> >
> > Richard.
> >
> Hi Richard:
>
> I'm very sorry. I don't fully understand what you mean.
>
> Could you give me some more details?

The argument is that if Loongarch64 doesn't support the VEC_PERM_EXPR in the IL
before the transform then it doesn't matter if the transform introduces another
VEC_PERM_EXPR that isn't supported.  Both are later (in the veclower pass)
lowered to scalar operations.

So instead of

  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false))
   ..

you'd do

  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
      || !can_vec_perm_const_p ( ... original vec_perm1 ...)
      || !can_vec_perm_const_p ( ... original vec_perm2 ...))

Richard.

>
> Thanks!
>
> Lulu Cheng
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
  2022-10-14  6:18             ` Richard Biener
@ 2022-10-14  6:34               ` Xu, Liwei
  0 siblings, 0 replies; 13+ messages in thread
From: Xu, Liwei @ 2022-10-14  6:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Lulu Cheng, Xi Ruoyao, Levy, wilson, gcc-patches

Hi Richard

I just found that modifying match.pd may be a better way since for forwprop-19.c, VEC_PERM_EXPR exists in all gimple until 235t.optimize with current trunk code, that leave us no pass to ‘scan-tree-dump-not’.

Best Regards
Levy

> On 14 Oct 2022, at 2:19 pm, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Fri, Oct 14, 2022 at 3:49 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>> 
>> 
>>> 在 2022/10/13 下午7:10, Richard Biener 写道:
>>> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>>>> 
>>>> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
>>>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>>>> Hi RuoYao
>>>>>> 
>>>>>> It’s probably because loongarch64 doesn’t support
>>>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>>>> 
>>>>>> I’m not sure whether if loongarch will support it or should I just
>>>>>> limit the test target for pr54346.c?
>>>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>>>> actually support vector.  (LoongArch has SIMD instructions but the
>>>>> support in GCC won't be added in a very recent future.)
>>>>> 
>>>> If what I understand is correct, I think this might be a better solution.
>>>> 
>>>>   /* { dg-do compile } */
>>>> 
>>>> +/* { dg-require-effective-target vect_perm } */
>>>>   /* { dg-options "-O -fdump-tree-dse1" } */
>>> Btw, what forwprop does is check whether any of the original permutations are
>>> not supported and then elide the supportability check for the result.
>>> The reasoning
>>> is that the original permute(s) would be lowered during vectlower so we can as
>>> well do that for the result.  We should just never turn a supported permutation
>>> sequence into a not supported one.
>>> 
>>> Richard.
>>> 
>> Hi Richard:
>> 
>> I'm very sorry. I don't fully understand what you mean.
>> 
>> Could you give me some more details?
> 
> The argument is that if Loongarch64 doesn't support the VEC_PERM_EXPR in the IL
> before the transform then it doesn't matter if the transform introduces another
> VEC_PERM_EXPR that isn't supported.  Both are later (in the veclower pass)
> lowered to scalar operations.
> 
> So instead of
> 
>  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>   ..
> 
> you'd do
> 
>  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
>      || !can_vec_perm_const_p ( ... original vec_perm1 ...)
>      || !can_vec_perm_const_p ( ... original vec_perm2 ...))
> 
> Richard.
> 
>> 
>> Thanks!
>> 
>> Lulu Cheng
>> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-10-14  6:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-26  6:56 [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346] Liwei Xu
2022-09-26  8:21 ` Richard Biener
2022-09-30  9:17   ` Richard Sandiford
2022-09-30  9:36     ` Xu, Liwei
2022-10-12 13:51 ` Xi Ruoyao
2022-10-13  6:15   ` Levy
2022-10-13  6:44     ` Xi Ruoyao
2022-10-13  8:15       ` Lulu Cheng
2022-10-13 11:10         ` Richard Biener
2022-10-14  1:09           ` Xu, Liwei
2022-10-14  1:49           ` Lulu Cheng
2022-10-14  6:18             ` Richard Biener
2022-10-14  6:34               ` Xu, Liwei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).