* [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
@ 2023-05-17 6:09 Kewen.Lin
2023-05-17 6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
2023-05-17 6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
0 siblings, 2 replies; 10+ messages in thread
From: Kewen.Lin @ 2023-05-17 6:09 UTC (permalink / raw)
To: GCC Patches
Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Peter Bergner
Hi,
This patch refactors the handling of the case (index
== count) in a loop of vect_transform_slp_perm_load_1, in
order to prepare for a subsequent adjustment of *n_perms. This
patch makes no functional changes.
Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
BR,
Kewen
-----
gcc/ChangeLog:
* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
handling on the case index == count.
---
gcc/tree-vect-slp.cc | 89 ++++++++++++++++++++++----------------------
1 file changed, 44 insertions(+), 45 deletions(-)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3b7a21724ec..e5c9d7e766e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
noop_p = false;
mask[index++] = mask_element;
- if (index == count && !noop_p)
+ if (index == count)
{
- indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
- if (!can_vec_perm_const_p (mode, mode, indices))
+ if (!noop_p)
{
- if (dump_p)
+ indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
+ if (!can_vec_perm_const_p (mode, mode, indices))
{
- dump_printf_loc (MSG_MISSED_OPTIMIZATION,
- vect_location,
- "unsupported vect permute { ");
- for (i = 0; i < count; ++i)
+ if (dump_p)
{
- dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
- dump_printf (MSG_MISSED_OPTIMIZATION, " ");
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "unsupported vect permute { ");
+ for (i = 0; i < count; ++i)
+ {
+ dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
+ dump_printf (MSG_MISSED_OPTIMIZATION, " ");
+ }
+ dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
}
- dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
+ gcc_assert (analyze_only);
+ return false;
}
- gcc_assert (analyze_only);
- return false;
- }
- ++*n_perms;
- }
+ ++*n_perms;
- if (index == count)
- {
- if (!analyze_only)
- {
- tree mask_vec = NULL_TREE;
-
- if (! noop_p)
- mask_vec = vect_gen_perm_mask_checked (vectype, indices);
+ if (!analyze_only)
+ {
+ tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
- if (second_vec_index == -1)
- second_vec_index = first_vec_index;
+ if (second_vec_index == -1)
+ second_vec_index = first_vec_index;
- for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
- {
- /* Generate the permute statement if necessary. */
- tree first_vec = dr_chain[first_vec_index + ri];
- tree second_vec = dr_chain[second_vec_index + ri];
- gimple *perm_stmt;
- if (! noop_p)
+ for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
{
- gassign *stmt = as_a <gassign *> (stmt_info->stmt);
+ /* Generate the permute statement if necessary. */
+ tree first_vec = dr_chain[first_vec_index + ri];
+ tree second_vec = dr_chain[second_vec_index + ri];
+ gassign *stmt = as_a<gassign *> (stmt_info->stmt);
tree perm_dest
= vect_create_destination_var (gimple_assign_lhs (stmt),
vectype);
perm_dest = make_ssa_name (perm_dest);
- perm_stmt
+ gimple *perm_stmt
= gimple_build_assign (perm_dest, VEC_PERM_EXPR,
- first_vec, second_vec,
- mask_vec);
+ first_vec, second_vec, mask_vec);
vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
gsi);
if (dce_chain)
@@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
bitmap_set_bit (used_defs, first_vec_index + ri);
bitmap_set_bit (used_defs, second_vec_index + ri);
}
+
+ /* Store the vector statement in NODE. */
+ SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
+ = perm_stmt;
}
- else
- {
- /* If mask was NULL_TREE generate the requested
- identity transform. */
- perm_stmt = SSA_NAME_DEF_STMT (first_vec);
- if (dce_chain)
- bitmap_set_bit (used_defs, first_vec_index + ri);
- }
+ }
+ }
+ else if (!analyze_only)
+ {
+ for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+ {
+ tree first_vec = dr_chain[first_vec_index + ri];
+ /* If mask was NULL_TREE generate the requested
+ identity transform. */
+ gimple *perm_stmt = SSA_NAME_DEF_STMT (first_vec);
+ if (dce_chain)
+ bitmap_set_bit (used_defs, first_vec_index + ri);
/* Store the vector statement in NODE. */
SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
--
2.39.1
* [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
2023-05-17 6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin
@ 2023-05-17 6:15 ` Kewen.Lin
2023-05-22 13:44 ` Richard Biener
2023-05-17 6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
1 sibling, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2023-05-17 6:15 UTC (permalink / raw)
To: GCC Patches
Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Peter Bergner
Hi,
Following Richi's suggestion in [1], I'm working on deferring
cost evaluation so that it sits next to the transformation. This
patch enhances function vect_transform_slp_perm_load_1, which
could under-cost vector permutations: the costing doesn't
consider nvectors_per_build, so it's inconsistent with the
transformation part.
Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
Is it ok for trunk?
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
BR,
Kewen
-----
gcc/ChangeLog:
* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
calculation on n_perms by considering nvectors_per_build.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
---
.../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++
gcc/tree-vect-slp.cc | 66 ++++++++++---------
2 files changed, 57 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
new file mode 100644
index 00000000000..e5c4dceddfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* Specify power9 to ensure the vectorization is profitable
+ and test point stands, otherwise it could be not profitable
+ to vectorize. */
+/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
+
+/* Verify we cost the exact count for required vec_perm. */
+
+int x[1024], y[1024];
+
+void
+foo ()
+{
+ for (int i = 0; i < 512; ++i)
+ {
+ x[2 * i] = y[1023 - (2 * i)];
+ x[2 * i + 1] = y[1023 - (2 * i + 1)];
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e5c9d7e766e..af9a6dd4fa9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
mode = TYPE_MODE (vectype);
poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
/* Initialize the vect stmts of NODE to properly insert the generated
stmts later. */
if (! analyze_only)
- for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
- i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
+ for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
SLP_TREE_VEC_STMTS (node).quick_push (NULL);
/* Generate permutation masks for every NODE. Number of masks for each NODE
@@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
(b) the permutes only need a single vector input. */
mask.new_vector (nunits, group_size, 3);
nelts_to_build = mask.encoded_nelts ();
- nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
+ /* It's possible to obtain zero nstmts during analyze_only, so make
+ it at least one to ensure the later computation for n_perms
+ proceed. */
+ nvectors_per_build = nstmts > 0 ? nstmts : 1;
in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
}
else
@@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
return false;
}
- ++*n_perms;
-
+ tree mask_vec = NULL_TREE;
if (!analyze_only)
- {
- tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
+ mask_vec = vect_gen_perm_mask_checked (vectype, indices);
- if (second_vec_index == -1)
- second_vec_index = first_vec_index;
+ if (second_vec_index == -1)
+ second_vec_index = first_vec_index;
- for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+ for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+ {
+ ++*n_perms;
+ if (analyze_only)
+ continue;
+ /* Generate the permute statement if necessary. */
+ tree first_vec = dr_chain[first_vec_index + ri];
+ tree second_vec = dr_chain[second_vec_index + ri];
+ gassign *stmt = as_a<gassign *> (stmt_info->stmt);
+ tree perm_dest
+ = vect_create_destination_var (gimple_assign_lhs (stmt),
+ vectype);
+ perm_dest = make_ssa_name (perm_dest);
+ gimple *perm_stmt
+ = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
+ second_vec, mask_vec);
+ vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+ gsi);
+ if (dce_chain)
{
- /* Generate the permute statement if necessary. */
- tree first_vec = dr_chain[first_vec_index + ri];
- tree second_vec = dr_chain[second_vec_index + ri];
- gassign *stmt = as_a<gassign *> (stmt_info->stmt);
- tree perm_dest
- = vect_create_destination_var (gimple_assign_lhs (stmt),
- vectype);
- perm_dest = make_ssa_name (perm_dest);
- gimple *perm_stmt
- = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
- first_vec, second_vec, mask_vec);
- vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
- gsi);
- if (dce_chain)
- {
- bitmap_set_bit (used_defs, first_vec_index + ri);
- bitmap_set_bit (used_defs, second_vec_index + ri);
- }
-
- /* Store the vector statement in NODE. */
- SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
- = perm_stmt;
+ bitmap_set_bit (used_defs, first_vec_index + ri);
+ bitmap_set_bit (used_defs, second_vec_index + ri);
}
+
+ /* Store the vector statement in NODE. */
+ SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
}
}
else if (!analyze_only)
--
2.39.1
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
2023-05-17 6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin
2023-05-17 6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
@ 2023-05-17 6:34 ` Richard Biener
2023-05-17 7:18 ` Kewen.Lin
1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-17 6:34 UTC (permalink / raw)
To: Kewen.Lin
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> This patch refactors the handling of the case (index
> == count) in a loop of vect_transform_slp_perm_load_1, in
> order to prepare for a subsequent adjustment of *n_perms. This
> patch makes no functional changes.
The diff is impossible to review - can you explain the
refactoring you have done, or attach a patch that shows more
clearly what you changed?
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
2023-05-17 6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
@ 2023-05-17 7:18 ` Kewen.Lin
2023-05-18 6:12 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2023-05-17 7:18 UTC (permalink / raw)
To: Richard Biener
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
Hi Richi,
on 2023/5/17 14:34, Richard Biener wrote:
> On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> This patch refactors the handling of the case (index
>> == count) in a loop of vect_transform_slp_perm_load_1, in
>> order to prepare for a subsequent adjustment of *n_perms. This
>> patch makes no functional changes.
>
> The diff is impossible to review - can you explain the
> refactoring you have done, or attach a patch that shows more
> clearly what you changed?
Sorry, I should have made it clearer.
It is mainly to combine these two hunks:

  if (index == count && !noop_p)
    {
      // A ...
      // ++*n_perms;
    }

  if (index == count)
    {
      if (!analyze_only)
        {
          if (!noop_p)
            // B1 ...

          // B2 ...

          for ...
            {
              if (!noop_p)
                // B3 building VEC_PERM_EXPR
              else
                // B4 building nothing (no uses for B2 and its seq)
            }
        }
      // B5
    }
The former can be part of the latter, so it becomes:

  if (index == count)
    {
      if (!noop_p)
        {
          // A ...
          // ++*n_perms;

          if (!analyze_only)
            {
              // B1 ...
              // B2 ...
              for ...
                // B3 building VEC_PERM_EXPR
            }
        }
      else if (!analyze_only)
        {
          // no B2 since there are no further uses here.
          for ...
            // B4 building nothing
        }
      // B5 ...
    }
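
The claim that this restructuring changes nothing can be sanity-checked with a small toy model. The sketch below is not GCC code: the Effects struct, visit_before, visit_after and shapes_agree are hypothetical names, and the model assumes block A always finds the permute supported. It only tallies the *n_perms increments and the B3/B4 work done per visit, and compares the two shapes for every combination of noop_p and analyze_only.

```cpp
#include <cassert>

// Toy model, not GCC code: what one (index == count) visit does.
struct Effects
{
  unsigned n_perms = 0;      // increments of *n_perms (end of block A)
  unsigned perms_built = 0;  // B3: VEC_PERM_EXPR statements built
  unsigned defs_reused = 0;  // B4: identity case, existing def reused
};

bool
same (const Effects &a, const Effects &b)
{
  return a.n_perms == b.n_perms && a.perms_built == b.perms_built
         && a.defs_reused == b.defs_reused;
}

// Shape before patch 1/2: two separate (index == count) hunks.
Effects
visit_before (bool noop_p, bool analyze_only, unsigned nvectors_per_build)
{
  Effects e;
  if (!noop_p)
    ++e.n_perms;  // block A, assuming the permute is supported
  if (!analyze_only)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      {
        if (!noop_p)
          ++e.perms_built;
        else
          ++e.defs_reused;
      }
  return e;
}

// Shape after patch 1/2: one hunk, split on noop_p first.
Effects
visit_after (bool noop_p, bool analyze_only, unsigned nvectors_per_build)
{
  Effects e;
  if (!noop_p)
    {
      ++e.n_perms;
      if (!analyze_only)
        for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
          ++e.perms_built;
    }
  else if (!analyze_only)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      ++e.defs_reused;
  return e;
}

// True if both shapes agree for all four flag combinations.
bool
shapes_agree (unsigned nvectors_per_build)
{
  for (int noop = 0; noop < 2; ++noop)
    for (int analyze = 0; analyze < 2; ++analyze)
      if (!same (visit_before (noop != 0, analyze != 0, nvectors_per_build),
                 visit_after (noop != 0, analyze != 0, nvectors_per_build)))
        return false;
  return true;
}

// Convenience check over a few vector counts.
void
check_refactor_model ()
{
  assert (shapes_agree (1));
  assert (shapes_agree (4));
}
```

Calling check_refactor_model () from a test driver exercises all four flag combinations. The model deliberately leaves out B2, since it has no observable effect on the no-op path.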
This mainly serves as the basis for the subsequent patch, which makes
the n_perms calculation consistent; patch 2/2 takes it further, to:

  if (index == count)
    {
      if (!noop_p)
        {
          // A ...

          if (!analyze_only)
            // B1 ...

          // B2 ... (trivial computations during analyze_only or not)

          for ...
            {
              // ++*n_perms; (now n_perms is consistent with building VEC_PERM_EXPR)
              if (analyze_only)
                continue;
              // B3 building VEC_PERM_EXPR
            }
        }
      else if (!analyze_only)
        {
          // no B2 since there are no further uses here.
          for ...
            // B4 building nothing
        }
      // B5 ...
    }
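
To make the effect of moving ++*n_perms into the loop concrete, here is a minimal sketch, again a toy model rather than GCC code. Tally, tally_before_patch2, tally_after_patch2 and the mask_builds parameter (the number of non-no-op, supported masks) are hypothetical; clamp_nvectors_per_build mirrors the `nstmts > 0 ? nstmts : 1` expression from the posted patch, which only applies on the path where nvectors_per_build is derived from the statement count.

```cpp
#include <cassert>

// Toy model, not GCC code.
struct Tally
{
  unsigned n_perms = 0;  // what the cost model is told
  unsigned emitted = 0;  // VEC_PERM_EXPR statements actually built
};

// Mirrors the posted patch: nstmts may be zero during analysis, so use
// at least one to keep the n_perms loop running.
unsigned
clamp_nvectors_per_build (unsigned nstmts)
{
  return nstmts > 0 ? nstmts : 1;
}

// Counting as it stands after patch 1/2: one increment per mask build.
Tally
tally_before_patch2 (unsigned mask_builds, unsigned nvectors_per_build,
                     bool analyze_only)
{
  Tally t;
  for (unsigned b = 0; b < mask_builds; ++b)
    {
      ++t.n_perms;
      if (!analyze_only)
        for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
          ++t.emitted;
    }
  return t;
}

// Counting as in patch 2/2: one increment per permute the loop would build.
Tally
tally_after_patch2 (unsigned mask_builds, unsigned nvectors_per_build,
                    bool analyze_only)
{
  Tally t;
  for (unsigned b = 0; b < mask_builds; ++b)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      {
        ++t.n_perms;
        if (analyze_only)
          continue;
        ++t.emitted;
      }
  return t;
}

// Convenience check with two mask builds of four vectors each.
void
check_tally_model ()
{
  Tally old_t = tally_before_patch2 (2, 4, false);
  assert (old_t.n_perms == 2 && old_t.emitted == 8);  // under-costed
  Tally new_t = tally_after_patch2 (2, 4, false);
  assert (new_t.n_perms == new_t.emitted);            // consistent
}
```

In this model the analysis-time count from tally_after_patch2 matches the number of permutes a later transform run would emit, which is the consistency the patch is after. The numbers are illustrative only and are not taken from the new test case.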
BR,
Kewen
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
2023-05-17 7:18 ` Kewen.Lin
@ 2023-05-18 6:12 ` Richard Biener
2023-05-22 5:37 ` Kewen.Lin
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-18 6:12 UTC (permalink / raw)
To: Kewen.Lin
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
On Wed, May 17, 2023 at 9:19 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> on 2023/5/17 14:34, Richard Biener wrote:
> > On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> This patch refactors the handling of the case (index
> >> == count) in a loop of vect_transform_slp_perm_load_1, in
> >> order to prepare for a subsequent adjustment of *n_perms. This
> >> patch makes no functional changes.
> >
> > The diff is impossible to review - can you explain the
> > refactoring you have done, or attach a patch that shows more
> > clearly what you changed?
>
> Sorry, I should have made it clearer.
> It is mainly to combine these two hunks:
>
>   if (index == count && !noop_p)
>     {
>       // A ...
>       // ++*n_perms;
>     }
>
>   if (index == count)
>     {
>       if (!analyze_only)
>         {
>           if (!noop_p)
>             // B1 ...
>
>           // B2 ...
>
>           for ...
>             {
>               if (!noop_p)
>                 // B3 building VEC_PERM_EXPR
>               else
>                 // B4 building nothing (no uses for B2 and its seq)
>             }
>         }
>       // B5
>     }
>
> The former can be part of the latter, so it becomes:
>
>   if (index == count)
>     {
>       if (!noop_p)
>         {
>           // A ...
>           // ++*n_perms;
>
>           if (!analyze_only)
>             {
>               // B1 ...
>               // B2 ...
>               for ...
>                 // B3 building VEC_PERM_EXPR
>             }
>         }
>       else if (!analyze_only)
>         {
>           // no B2 since there are no further uses here.
>           for ...
>             // B4 building nothing
>         }
>       // B5 ...
>     }
Ah, thanks - that made reviewing easy. 1/2 is OK for trunk.
Thanks,
Richard.
> This mainly serves as the basis for the subsequent patch, which makes
> the n_perms calculation consistent; patch 2/2 takes it further, to:
>
>   if (index == count)
>     {
>       if (!noop_p)
>         {
>           // A ...
>
>           if (!analyze_only)
>             // B1 ...
>
>           // B2 ... (trivial computations during analyze_only or not)
>
>           for ...
>             {
>               // ++*n_perms; (now n_perms is consistent with building VEC_PERM_EXPR)
>               if (analyze_only)
>                 continue;
>               // B3 building VEC_PERM_EXPR
>             }
>         }
>       else if (!analyze_only)
>         {
>           // no B2 since there are no further uses here.
>           for ...
>             // B4 building nothing
>         }
>       // B5 ...
>     }
>
> BR,
> Kewen
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
2023-05-18 6:12 ` Richard Biener
@ 2023-05-22 5:37 ` Kewen.Lin
0 siblings, 0 replies; 10+ messages in thread
From: Kewen.Lin @ 2023-05-22 5:37 UTC (permalink / raw)
To: Richard Biener
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
on 2023/5/18 14:12, Richard Biener wrote:
> On Wed, May 17, 2023 at 9:19 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Richi,
>>
>> on 2023/5/17 14:34, Richard Biener wrote:
>>> On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch refactors the handling of the case (index
>>>> == count) in a loop of vect_transform_slp_perm_load_1, in
>>>> order to prepare for a subsequent adjustment of *n_perms. This
>>>> patch makes no functional changes.
>>>
>>> The diff is impossible to review - can you explain the
>>> refactoring you have done, or attach a patch that shows more
>>> clearly what you changed?
>>
>> Sorry, I should have made it clearer.
>> It is mainly to combine these two hunks:
>>
>>   if (index == count && !noop_p)
>>     {
>>       // A ...
>>       // ++*n_perms;
>>     }
>>
>>   if (index == count)
>>     {
>>       if (!analyze_only)
>>         {
>>           if (!noop_p)
>>             // B1 ...
>>
>>           // B2 ...
>>
>>           for ...
>>             {
>>               if (!noop_p)
>>                 // B3 building VEC_PERM_EXPR
>>               else
>>                 // B4 building nothing (no uses for B2 and its seq)
>>             }
>>         }
>>       // B5
>>     }
>>
>> The former can be part of the latter, so it becomes:
>>
>>   if (index == count)
>>     {
>>       if (!noop_p)
>>         {
>>           // A ...
>>           // ++*n_perms;
>>
>>           if (!analyze_only)
>>             {
>>               // B1 ...
>>               // B2 ...
>>               for ...
>>                 // B3 building VEC_PERM_EXPR
>>             }
>>         }
>>       else if (!analyze_only)
>>         {
>>           // no B2 since there are no further uses here.
>>           for ...
>>             // B4 building nothing
>>         }
>>       // B5 ...
>>     }
>
> Ah, thanks - that made reviewing easy. 1/2 is OK for trunk.
Thanks for the review! Pushed as r14-1028.
BR,
Kewen
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
2023-05-17 6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
@ 2023-05-22 13:44 ` Richard Biener
2023-05-23 3:01 ` Kewen.Lin
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-22 13:44 UTC (permalink / raw)
To: Kewen.Lin
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> Following Richi's suggestion in [1], I'm working on deferring
> cost evaluation next to the transformation. This patch
> enhances function vect_transform_slp_perm_load_1, which
> could under-cost vector permutations: the costing
> doesn't consider nvectors_per_build, so it's inconsistent
> with the transformation part.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
> calculation on n_perms by considering nvectors_per_build.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
> ---
> .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++
> gcc/tree-vect-slp.cc | 66 ++++++++++---------
> 2 files changed, 57 insertions(+), 32 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> new file mode 100644
> index 00000000000..e5c4dceddfb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* Specify power9 to ensure the vectorization is profitable
> + and test point stands, otherwise it could be not profitable
> + to vectorize. */
> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
> +
> +/* Verify we cost the exact count for required vec_perm. */
> +
> +int x[1024], y[1024];
> +
> +void
> +foo ()
> +{
> + for (int i = 0; i < 512; ++i)
> + {
> + x[2 * i] = y[1023 - (2 * i)];
> + x[2 * i + 1] = y[1023 - (2 * i + 1)];
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..af9a6dd4fa9 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>
> mode = TYPE_MODE (vectype);
> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
>
> /* Initialize the vect stmts of NODE to properly insert the generated
> stmts later. */
> if (! analyze_only)
> - for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
> - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
> + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
> SLP_TREE_VEC_STMTS (node).quick_push (NULL);
>
> /* Generate permutation masks for every NODE. Number of masks for each NODE
> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
> (b) the permutes only need a single vector input. */
> mask.new_vector (nunits, group_size, 3);
> nelts_to_build = mask.encoded_nelts ();
> - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
> + /* It's possible to obtain zero nstmts during analyze_only, so make
> + it at least one to ensure the later computation for n_perms
> + proceed. */
> + nvectors_per_build = nstmts > 0 ? nstmts : 1;
> in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
> }
> else
> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
> return false;
> }
>
> - ++*n_perms;
> -
> + tree mask_vec = NULL_TREE;
> if (!analyze_only)
> - {
> - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> + mask_vec = vect_gen_perm_mask_checked (vectype, indices);
>
> - if (second_vec_index == -1)
> - second_vec_index = first_vec_index;
> + if (second_vec_index == -1)
> + second_vec_index = first_vec_index;
>
> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> + {
> + ++*n_perms;

So the "real" change is doing

  *n_perms += nvectors_per_build;

and *n_perms was unused when !analyze_only?  And since at
analysis time we (sometimes?) have zero nvectors you have to
fix up above?  Which cases are those?

In principle the patch looks good to me.

Richard.

> + if (analyze_only)
> + continue;
> + /* Generate the permute statement if necessary. */
> + tree first_vec = dr_chain[first_vec_index + ri];
> + tree second_vec = dr_chain[second_vec_index + ri];
> + gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> + tree perm_dest
> + = vect_create_destination_var (gimple_assign_lhs (stmt),
> + vectype);
> + perm_dest = make_ssa_name (perm_dest);
> + gimple *perm_stmt
> + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
> + second_vec, mask_vec);
> + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> + gsi);
> + if (dce_chain)
> {
> - /* Generate the permute statement if necessary. */
> - tree first_vec = dr_chain[first_vec_index + ri];
> - tree second_vec = dr_chain[second_vec_index + ri];
> - gassign *stmt = as_a<gassign *> (stmt_info->stmt);
> - tree perm_dest
> - = vect_create_destination_var (gimple_assign_lhs (stmt),
> - vectype);
> - perm_dest = make_ssa_name (perm_dest);
> - gimple *perm_stmt
> - = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> - first_vec, second_vec, mask_vec);
> - vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> - gsi);
> - if (dce_chain)
> - {
> - bitmap_set_bit (used_defs, first_vec_index + ri);
> - bitmap_set_bit (used_defs, second_vec_index + ri);
> - }
> -
> - /* Store the vector statement in NODE. */
> - SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++]
> - = perm_stmt;
> + bitmap_set_bit (used_defs, first_vec_index + ri);
> + bitmap_set_bit (used_defs, second_vec_index + ri);
> }
> +
> + /* Store the vector statement in NODE. */
> + SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt;
> }
> }
> else if (!analyze_only)
> --
> 2.39.1
>
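
The counting change discussed in this message can be sketched in isolation. The model below is an illustration under stated assumptions: it covers only the branch shown in the hunk, where `nvectors_per_build` is derived from `nstmts`; `n_masks` stands for the number of non-noop masks that reach `index == count`; and the function names `n_perms_before` and `n_perms_after` are hypothetical.

```cpp
#include <cassert>

// Before the patch: *n_perms was bumped once per non-noop mask, no matter
// how many vectors each mask is applied to.
unsigned n_perms_before(unsigned n_masks)
{
  return n_masks;
}

// After the patch: *n_perms is bumped inside the per-vector loop, so every
// mask contributes nvectors_per_build permutes.  nstmts is clamped to at
// least one, mirroring the fixup in the hunk above.
unsigned n_perms_after(unsigned n_masks, unsigned nstmts)
{
  unsigned nvectors_per_build = nstmts > 0 ? nstmts : 1;
  unsigned n_perms = 0;
  for (unsigned m = 0; m < n_masks; ++m)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      ++n_perms;                     // same effect as n_perms += nvectors_per_build
  return n_perms;
}
```

For a single mask applied to two vectors the old count is 1 and the new count is 2. With `nstmts == 0`, the clamp keeps the new count at 1 instead of letting it drop to 0.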
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
2023-05-22 13:44 ` Richard Biener
@ 2023-05-23 3:01 ` Kewen.Lin
2023-05-23 6:19 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2023-05-23 3:01 UTC (permalink / raw)
To: Richard Biener
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
Hi Richi,
Thanks for the review!
on 2023/5/22 21:44, Richard Biener wrote:
>> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
>> + {
>> + ++*n_perms;
>
> So the "real" change is doing
>
> *n_perms += nvectors_per_build;
>
> and *n_perms was unused when !analyze_only? And since at
Yes, although both !analyze_only and analyze_only calls pass n_perms in, now
only the call sites with analyze_only will use the returned n_perms further.
> analysis time we (sometimes?) have zero nvectors you have to
> fixup above? Which cases are that?
Yes, the fixup avoids an unexpected n_perms in function
vect_optimize_slp_pass::internal_node_cost. One typical case is
gcc.dg/vect/bb-slp-50.c: without special-casing zero, slp2 unexpectedly
fails to optimize out one more vec_perm.
In vect_optimize_slp_pass::internal_node_cost, it checks if the returned n_perms
is zero or not (vec_perm not needed or needed).
if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
nullptr, vf, true, false, &n_perms))
{
auto rep = SLP_TREE_REPRESENTATIVE (node);
if (out_layout_i == 0)
{
/* Use the fallback cost if the load is an N-to-N permutation.
Otherwise assume that the node will be rejected later
and rebuilt from scalars. */
if (STMT_VINFO_GROUPED_ACCESS (rep)
&& (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
== SLP_TREE_LANES (node)))
return fallback_cost;
return 0;
}
return -1;
}
/* See the comment above the corresponding VEC_PERM_EXPR handling. */
return n_perms == 0 ? 0 : 1;
In vect_optimize_slp_pass::forward_pass (), it only considers the case that
factor > 0 (there is some vec_perm needed).
/* Accumulate the cost of using LAYOUT_I within NODE,
both for the inputs and the outputs. */
int factor = internal_node_cost (vertex.node, layout_i,
layout_i);
if (factor < 0)
{
is_possible = false;
break;
}
else if (factor)
layout_costs.internal_cost.add_serial_cost
({ vertex.weight * factor, m_optimize_size });
BR,
Kewen
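
The interaction between a zero `n_perms` and the layout costing quoted above can be modeled roughly as follows. This is a simplified sketch, not the GCC implementation: `factor_for` keeps only the final `n_perms == 0 ? 0 : 1` step of `internal_node_cost`, `LayoutCost` and `accumulate` are hypothetical stand-ins for the `forward_pass` accumulation (a plain integer sum replaces `add_serial_cost`), and `n_perms_one_mask` assumes a single non-noop mask.

```cpp
#include <cassert>

// Final step of the quoted internal_node_cost, for a supported permute.
int factor_for(unsigned n_perms)
{
  return n_perms == 0 ? 0 : 1;
}

// Hypothetical stand-in for the per-layout cost record.
struct LayoutCost {
  bool is_possible = true;
  unsigned serial_cost = 0;
};

// Mirrors the quoted forward_pass logic: a negative factor rules the layout
// out, a zero factor adds nothing, a positive factor adds weight * factor.
void accumulate(LayoutCost &cost, int factor, unsigned weight)
{
  if (factor < 0)
    cost.is_possible = false;
  else if (factor)
    cost.serial_cost += weight * static_cast<unsigned>(factor);
}

// n_perms for one non-noop mask, with or without the "at least one" fixup.
unsigned n_perms_one_mask(unsigned nstmts, bool clamp)
{
  unsigned nvectors_per_build = (clamp && nstmts == 0) ? 1 : nstmts;
  unsigned n_perms = 0;
  for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
    ++n_perms;
  return n_perms;
}
```

Without the clamp, `nstmts == 0` yields `n_perms == 0`, so the factor is 0 and a layout that does need a permute looks free. With the clamp the factor is 1 and the weight is charged, which is the behavior the fixup is meant to preserve.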
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
2023-05-23 3:01 ` Kewen.Lin
@ 2023-05-23 6:19 ` Richard Biener
2023-05-24 5:23 ` Kewen.Lin
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-05-23 6:19 UTC (permalink / raw)
To: Kewen.Lin
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
On Tue, May 23, 2023 at 5:01 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the review!
>
> on 2023/5/22 21:44, Richard Biener wrote:
> >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> >> + {
> >> + ++*n_perms;
> >
> > So the "real" change is doing
> >
> > *n_perms += nvectors_per_build;
> >
> > and *n_perms was unused when !analyze_only? And since at
>
> Yes, although both !analyze_only and analyze_only calls pass n_perms in, now
> only the call sites with analyze_only will use the returned n_perms further.
>
> > analysis time we (sometimes?) have zero nvectors you have to
> > fixup above? Which cases are that?
>
> Yes, the fixup is to avoid to result in unexpected n_perms in function
> vect_optimize_slp_pass::internal_node_cost. One typical case is
> gcc.dg/vect/bb-slp-50.c, without special casing zero, slp2 fails to optimize
> out one more vec_perm unexpectedly.
>
> In vect_optimize_slp_pass::internal_node_cost, it checks if the returned n_perms
> is zero or not (vec_perm not needed or needed).
>
> if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
> nullptr, vf, true, false, &n_perms))
> {
> auto rep = SLP_TREE_REPRESENTATIVE (node);
> if (out_layout_i == 0)
> {
> /* Use the fallback cost if the load is an N-to-N permutation.
> Otherwise assume that the node will be rejected later
> and rebuilt from scalars. */
> if (STMT_VINFO_GROUPED_ACCESS (rep)
> && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
> == SLP_TREE_LANES (node)))
> return fallback_cost;
> return 0;
> }
> return -1;
> }
>
> /* See the comment above the corresponding VEC_PERM_EXPR handling. */
> return n_perms == 0 ? 0 : 1;
>
> In vect_optimize_slp_pass::forward_pass (), it only considers the case that
> factor > 0 (there is some vec_perm needed).
>
> /* Accumulate the cost of using LAYOUT_I within NODE,
> both for the inputs and the outputs. */
> int factor = internal_node_cost (vertex.node, layout_i,
> layout_i);
> if (factor < 0)
> {
> is_possible = false;
> break;
> }
> else if (factor)
> layout_costs.internal_cost.add_serial_cost
> ({ vertex.weight * factor, m_optimize_size });
Ah, OK - thanks for clarifying.
The patch is OK.
Richard.
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1
2023-05-23 6:19 ` Richard Biener
@ 2023-05-24 5:23 ` Kewen.Lin
0 siblings, 0 replies; 10+ messages in thread
From: Kewen.Lin @ 2023-05-24 5:23 UTC (permalink / raw)
To: Richard Biener
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner
on 2023/5/23 14:19, Richard Biener wrote:
> Ah, OK - thanks for clarifying.
>
> The patch is OK.
Thanks! Committed in r14-1151.
BR,
Kewen
Thread overview: 10+ messages
2023-05-17 6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin
2023-05-17 6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
2023-05-22 13:44 ` Richard Biener
2023-05-23 3:01 ` Kewen.Lin
2023-05-23 6:19 ` Richard Biener
2023-05-24 5:23 ` Kewen.Lin
2023-05-17 6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
2023-05-17 7:18 ` Kewen.Lin
2023-05-18 6:12 ` Richard Biener
2023-05-22 5:37 ` Kewen.Lin