* [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 @ 2023-05-17 6:09 Kewen.Lin 2023-05-17 6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin 2023-05-17 6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener 0 siblings, 2 replies; 10+ messages in thread From: Kewen.Lin @ 2023-05-17 6:09 UTC (permalink / raw) To: GCC Patches Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Peter Bergner Hi, This patch is to refactor the handlings for the case (index == count) in a loop of vect_transform_slp_perm_load_1, in order to prepare a subsequent adjustment on *nperm. This patch doesn't have any functional changes. Bootstrapped and regtested on x86_64-redhat-linux, aarch64-linux-gnu and powerpc64{,le}-linux-gnu. BR, Kewen ----- gcc/ChangeLog: * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the handling on the case index == count. --- gcc/tree-vect-slp.cc | 89 ++++++++++++++++++++++---------------------- 1 file changed, 44 insertions(+), 45 deletions(-) diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 3b7a21724ec..e5c9d7e766e 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, noop_p = false; mask[index++] = mask_element; - if (index == count && !noop_p) + if (index == count) { - indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits); - if (!can_vec_perm_const_p (mode, mode, indices)) + if (!noop_p) { - if (dump_p) + indices.new_vector (mask, second_vec_index == -1 ? 
1 : 2, nunits); + if (!can_vec_perm_const_p (mode, mode, indices)) { - dump_printf_loc (MSG_MISSED_OPTIMIZATION, - vect_location, - "unsupported vect permute { "); - for (i = 0; i < count; ++i) + if (dump_p) { - dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); - dump_printf (MSG_MISSED_OPTIMIZATION, " "); + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "unsupported vect permute { "); + for (i = 0; i < count; ++i) + { + dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); + dump_printf (MSG_MISSED_OPTIMIZATION, " "); + } + dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); } - dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); + gcc_assert (analyze_only); + return false; } - gcc_assert (analyze_only); - return false; - } - ++*n_perms; - } + ++*n_perms; - if (index == count) - { - if (!analyze_only) - { - tree mask_vec = NULL_TREE; - - if (! noop_p) - mask_vec = vect_gen_perm_mask_checked (vectype, indices); + if (!analyze_only) + { + tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); - if (second_vec_index == -1) - second_vec_index = first_vec_index; + if (second_vec_index == -1) + second_vec_index = first_vec_index; - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) - { - /* Generate the permute statement if necessary. */ - tree first_vec = dr_chain[first_vec_index + ri]; - tree second_vec = dr_chain[second_vec_index + ri]; - gimple *perm_stmt; - if (! noop_p) + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) { - gassign *stmt = as_a <gassign *> (stmt_info->stmt); + /* Generate the permute statement if necessary. 
*/ + tree first_vec = dr_chain[first_vec_index + ri]; + tree second_vec = dr_chain[second_vec_index + ri]; + gassign *stmt = as_a<gassign *> (stmt_info->stmt); tree perm_dest = vect_create_destination_var (gimple_assign_lhs (stmt), vectype); perm_dest = make_ssa_name (perm_dest); - perm_stmt + gimple *perm_stmt = gimple_build_assign (perm_dest, VEC_PERM_EXPR, - first_vec, second_vec, - mask_vec); + first_vec, second_vec, mask_vec); vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, gsi); if (dce_chain) @@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, bitmap_set_bit (used_defs, first_vec_index + ri); bitmap_set_bit (used_defs, second_vec_index + ri); } + + /* Store the vector statement in NODE. */ + SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] + = perm_stmt; } - else - { - /* If mask was NULL_TREE generate the requested - identity transform. */ - perm_stmt = SSA_NAME_DEF_STMT (first_vec); - if (dce_chain) - bitmap_set_bit (used_defs, first_vec_index + ri); - } + } + } + else if (!analyze_only) + { + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) + { + tree first_vec = dr_chain[first_vec_index + ri]; + /* If mask was NULL_TREE generate the requested + identity transform. */ + gimple *perm_stmt = SSA_NAME_DEF_STMT (first_vec); + if (dce_chain) + bitmap_set_bit (used_defs, first_vec_index + ri); /* Store the vector statement in NODE. */ SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; -- 2.39.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
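A minimal sketch of why the refactoring in patch 1/2 can be read as having no functional change. It is a toy model, not GCC code: the names Outcome, handle_old_shape, handle_new_shape and shapes_agree are hypothetical, and it assumes index == count has been reached and the permutation is supported by the target.

```cpp
#include <cassert>

// Toy model (not GCC code): what happens for one completed mask.
struct Outcome {
  bool counted_perm;   // ++*n_perms was executed
  bool emitted_perm;   // a VEC_PERM_EXPR statement was generated
  bool reused_def;     // identity case: the defining statement was reused
};

static bool same(const Outcome &a, const Outcome &b) {
  return a.counted_perm == b.counted_perm
         && a.emitted_perm == b.emitted_perm
         && a.reused_def == b.reused_def;
}

// Shape before the patch: two separate "index == count" blocks, with
// noop_p tested again inside the transform loop.
Outcome handle_old_shape(bool noop_p, bool analyze_only) {
  Outcome o{false, false, false};
  if (!noop_p)
    o.counted_perm = true;
  if (!analyze_only) {
    if (!noop_p)
      o.emitted_perm = true;
    else
      o.reused_def = true;
  }
  return o;
}

// Shape after the patch: one block, split first on noop_p.
Outcome handle_new_shape(bool noop_p, bool analyze_only) {
  Outcome o{false, false, false};
  if (!noop_p) {
    o.counted_perm = true;
    if (!analyze_only)
      o.emitted_perm = true;
  } else if (!analyze_only) {
    o.reused_def = true;
  }
  return o;
}

// Check all four combinations of the two flags.
bool shapes_agree() {
  for (int noop = 0; noop < 2; ++noop)
    for (int analyze = 0; analyze < 2; ++analyze)
      if (!same(handle_old_shape(noop, analyze),
                handle_new_shape(noop, analyze)))
        return false;
  return true;
}
```

Under these assumptions both shapes produce the same outcome for every flag combination; the new shape simply makes the non-noop branch the single place where *n_perms is touched, which is what the follow-up patch builds on.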
* [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 2023-05-17 6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin @ 2023-05-17 6:15 ` Kewen.Lin 2023-05-22 13:44 ` Richard Biener 2023-05-17 6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener 1 sibling, 1 reply; 10+ messages in thread From: Kewen.Lin @ 2023-05-17 6:15 UTC (permalink / raw) To: GCC Patches Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Peter Bergner Hi, Following Richi's suggestion in [1], I'm working on deferring cost evaluation next to the transformation, this patch is to enhance function vect_transform_slp_perm_load_1 which could under-cost for vector permutation, since the costing doesn't try to consider nvectors_per_build, it's inconsistent with the transformation part. Bootstrapped and regtested on x86_64-redhat-linux, aarch64-linux-gnu and powerpc64{,le}-linux-gnu. Is it ok for trunk? [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html BR, Kewen ----- gcc/ChangeLog: * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the calculation on n_perms by considering nvectors_per_build. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. 
--- .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ gcc/tree-vect-slp.cc | 66 ++++++++++--------- 2 files changed, 57 insertions(+), 32 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c new file mode 100644 index 00000000000..e5c4dceddfb --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* Specify power9 to ensure the vectorization is profitable + and test point stands, otherwise it could be not profitable + to vectorize. */ +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */ + +/* Verify we cost the exact count for required vec_perm. */ + +int x[1024], y[1024]; + +void +foo () +{ + for (int i = 0; i < 512; ++i) + { + x[2 * i] = y[1023 - (2 * i)]; + x[2 * i + 1] = y[1023 - (2 * i + 1)]; + } +} + +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */ diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index e5c9d7e766e..af9a6dd4fa9 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, mode = TYPE_MODE (vectype); poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); /* Initialize the vect stmts of NODE to properly insert the generated stmts later. */ if (! analyze_only) - for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++) SLP_TREE_VEC_STMTS (node).quick_push (NULL); /* Generate permutation masks for every NODE. 
Number of masks for each NODE @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, (b) the permutes only need a single vector input. */ mask.new_vector (nunits, group_size, 3); nelts_to_build = mask.encoded_nelts (); - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length (); + /* It's possible to obtain zero nstmts during analyze_only, so make + it at least one to ensure the later computation for n_perms + proceed. */ + nvectors_per_build = nstmts > 0 ? nstmts : 1; in_nlanes = DR_GROUP_SIZE (stmt_info) * 3; } else @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, return false; } - ++*n_perms; - + tree mask_vec = NULL_TREE; if (!analyze_only) - { - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); + mask_vec = vect_gen_perm_mask_checked (vectype, indices); - if (second_vec_index == -1) - second_vec_index = first_vec_index; + if (second_vec_index == -1) + second_vec_index = first_vec_index; - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) + { + ++*n_perms; + if (analyze_only) + continue; + /* Generate the permute statement if necessary. */ + tree first_vec = dr_chain[first_vec_index + ri]; + tree second_vec = dr_chain[second_vec_index + ri]; + gassign *stmt = as_a<gassign *> (stmt_info->stmt); + tree perm_dest + = vect_create_destination_var (gimple_assign_lhs (stmt), + vectype); + perm_dest = make_ssa_name (perm_dest); + gimple *perm_stmt + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec, + second_vec, mask_vec); + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, + gsi); + if (dce_chain) { - /* Generate the permute statement if necessary. 
*/ - tree first_vec = dr_chain[first_vec_index + ri]; - tree second_vec = dr_chain[second_vec_index + ri]; - gassign *stmt = as_a<gassign *> (stmt_info->stmt); - tree perm_dest - = vect_create_destination_var (gimple_assign_lhs (stmt), - vectype); - perm_dest = make_ssa_name (perm_dest); - gimple *perm_stmt - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, - first_vec, second_vec, mask_vec); - vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, - gsi); - if (dce_chain) - { - bitmap_set_bit (used_defs, first_vec_index + ri); - bitmap_set_bit (used_defs, second_vec_index + ri); - } - - /* Store the vector statement in NODE. */ - SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] - = perm_stmt; + bitmap_set_bit (used_defs, first_vec_index + ri); + bitmap_set_bit (used_defs, second_vec_index + ri); } + + /* Store the vector statement in NODE. */ + SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; } } else if (!analyze_only) -- 2.39.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
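The accounting change in patch 2/2 can be illustrated with a small toy model. This is a sketch under stated assumptions, not the GCC implementation: count_perms_before, count_perms_after and clamp_nvectors_per_build are hypothetical names, each vector entry stands for one built mask (true meaning the mask is a no-op), and every non-noop mask is assumed to be supported by the target.

```cpp
#include <cassert>
#include <vector>

// Mirrors the clamp added by the patch: nstmts may be zero at analysis
// time, so use at least one vector per build.
unsigned clamp_nvectors_per_build(unsigned nstmts) {
  return nstmts > 0 ? nstmts : 1;
}

// Old accounting: one permute counted per non-noop mask, regardless of
// how many vectors the mask is applied to.
unsigned count_perms_before(const std::vector<bool> &mask_is_noop) {
  unsigned n_perms = 0;
  for (bool noop : mask_is_noop)
    if (!noop)
      ++n_perms;
  return n_perms;
}

// New accounting: one permute counted per vector that each non-noop mask
// is applied to, matching the loop that emits the statements.
unsigned count_perms_after(const std::vector<bool> &mask_is_noop,
                           unsigned nvectors_per_build) {
  unsigned n_perms = 0;
  for (bool noop : mask_is_noop)
    if (!noop)
      for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
        ++n_perms;
  return n_perms;
}
```

In this model a single non-noop mask applied to two vectors is counted once by the old scheme and twice by the new one, which is the under-costing the patch description refers to. Without the clamp, a zero nvectors_per_build would make the new count zero even though a permute is required.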
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 2023-05-17 6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin @ 2023-05-22 13:44 ` Richard Biener 2023-05-23 3:01 ` Kewen.Lin 0 siblings, 1 reply; 10+ messages in thread From: Richard Biener @ 2023-05-22 13:44 UTC (permalink / raw) To: Kewen.Lin Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi, > > Following Richi's suggestion in [1], I'm working on deferring > cost evaluation next to the transformation, this patch is > to enhance function vect_transform_slp_perm_load_1 which > could under-cost for vector permutation, since the costing > doesn't try to consider nvectors_per_build, it's inconsistent > with the transformation part. > > Bootstrapped and regtested on x86_64-redhat-linux, > aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > > Is it ok for trunk? > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html > > BR, > Kewen > ----- > gcc/ChangeLog: > > * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the > calculation on n_perms by considering nvectors_per_build. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. 
> --- > .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ > gcc/tree-vect-slp.cc | 66 ++++++++++--------- > 2 files changed, 57 insertions(+), 32 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > > diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > new file mode 100644 > index 00000000000..e5c4dceddfb > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* Specify power9 to ensure the vectorization is profitable > + and test point stands, otherwise it could be not profitable > + to vectorize. */ > +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */ > + > +/* Verify we cost the exact count for required vec_perm. */ > + > +int x[1024], y[1024]; > + > +void > +foo () > +{ > + for (int i = 0; i < 512; ++i) > + { > + x[2 * i] = y[1023 - (2 * i)]; > + x[2 * i + 1] = y[1023 - (2 * i + 1)]; > + } > +} > + > +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */ > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index e5c9d7e766e..af9a6dd4fa9 100644 > --- a/gcc/tree-vect-slp.cc > +++ b/gcc/tree-vect-slp.cc > @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > > mode = TYPE_MODE (vectype); > poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); > + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); > > /* Initialize the vect stmts of NODE to properly insert the generated > stmts later. */ > if (! 
analyze_only) > - for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); > - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) > + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++) > SLP_TREE_VEC_STMTS (node).quick_push (NULL); > > /* Generate permutation masks for every NODE. Number of masks for each NODE > @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > (b) the permutes only need a single vector input. */ > mask.new_vector (nunits, group_size, 3); > nelts_to_build = mask.encoded_nelts (); > - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length (); > + /* It's possible to obtain zero nstmts during analyze_only, so make > + it at least one to ensure the later computation for n_perms > + proceed. */ > + nvectors_per_build = nstmts > 0 ? nstmts : 1; > in_nlanes = DR_GROUP_SIZE (stmt_info) * 3; > } > else > @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > return false; > } > > - ++*n_perms; > - > + tree mask_vec = NULL_TREE; > if (!analyze_only) > - { > - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); > + mask_vec = vect_gen_perm_mask_checked (vectype, indices); > > - if (second_vec_index == -1) > - second_vec_index = first_vec_index; > + if (second_vec_index == -1) > + second_vec_index = first_vec_index; > > - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > + { > + ++*n_perms; So the "real" change is doing *n_perms += nvectors_per_build; and *n_perms was unused when !analyze_only? And since at analysis time we (sometimes?) have zero nvectors you have to fixup above? Which cases are that? In principle the patch looks good to me. Richard. > + if (analyze_only) > + continue; > + /* Generate the permute statement if necessary. 
*/ > + tree first_vec = dr_chain[first_vec_index + ri]; > + tree second_vec = dr_chain[second_vec_index + ri]; > + gassign *stmt = as_a<gassign *> (stmt_info->stmt); > + tree perm_dest > + = vect_create_destination_var (gimple_assign_lhs (stmt), > + vectype); > + perm_dest = make_ssa_name (perm_dest); > + gimple *perm_stmt > + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec, > + second_vec, mask_vec); > + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, > + gsi); > + if (dce_chain) > { > - /* Generate the permute statement if necessary. */ > - tree first_vec = dr_chain[first_vec_index + ri]; > - tree second_vec = dr_chain[second_vec_index + ri]; > - gassign *stmt = as_a<gassign *> (stmt_info->stmt); > - tree perm_dest > - = vect_create_destination_var (gimple_assign_lhs (stmt), > - vectype); > - perm_dest = make_ssa_name (perm_dest); > - gimple *perm_stmt > - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, > - first_vec, second_vec, mask_vec); > - vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, > - gsi); > - if (dce_chain) > - { > - bitmap_set_bit (used_defs, first_vec_index + ri); > - bitmap_set_bit (used_defs, second_vec_index + ri); > - } > - > - /* Store the vector statement in NODE. */ > - SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] > - = perm_stmt; > + bitmap_set_bit (used_defs, first_vec_index + ri); > + bitmap_set_bit (used_defs, second_vec_index + ri); > } > + > + /* Store the vector statement in NODE. */ > + SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; > } > } > else if (!analyze_only) > -- > 2.39.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 2023-05-22 13:44 ` Richard Biener @ 2023-05-23 3:01 ` Kewen.Lin 2023-05-23 6:19 ` Richard Biener 0 siblings, 1 reply; 10+ messages in thread From: Kewen.Lin @ 2023-05-23 3:01 UTC (permalink / raw) To: Richard Biener Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner Hi Richi, Thanks for the review! on 2023/5/22 21:44, Richard Biener wrote: > On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi, >> >> Following Richi's suggestion in [1], I'm working on deferring >> cost evaluation next to the transformation, this patch is >> to enhance function vect_transform_slp_perm_load_1 which >> could under-cost for vector permutation, since the costing >> doesn't try to consider nvectors_per_build, it's inconsistent >> with the transformation part. >> >> Bootstrapped and regtested on x86_64-redhat-linux, >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. >> >> Is it ok for trunk? >> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html >> >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the >> calculation on n_perms by considering nvectors_per_build. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. 
>> --- >> .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ >> gcc/tree-vect-slp.cc | 66 ++++++++++--------- >> 2 files changed, 57 insertions(+), 32 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >> >> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >> new file mode 100644 >> index 00000000000..e5c4dceddfb >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >> @@ -0,0 +1,23 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-require-effective-target powerpc_p9vector_ok } */ >> +/* Specify power9 to ensure the vectorization is profitable >> + and test point stands, otherwise it could be not profitable >> + to vectorize. */ >> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */ >> + >> +/* Verify we cost the exact count for required vec_perm. */ >> + >> +int x[1024], y[1024]; >> + >> +void >> +foo () >> +{ >> + for (int i = 0; i < 512; ++i) >> + { >> + x[2 * i] = y[1023 - (2 * i)]; >> + x[2 * i + 1] = y[1023 - (2 * i + 1)]; >> + } >> +} >> + >> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */ >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc >> index e5c9d7e766e..af9a6dd4fa9 100644 >> --- a/gcc/tree-vect-slp.cc >> +++ b/gcc/tree-vect-slp.cc >> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >> >> mode = TYPE_MODE (vectype); >> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); >> + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); >> >> /* Initialize the vect stmts of NODE to properly insert the generated >> stmts later. */ >> if (! 
analyze_only) >> - for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); >> - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) >> + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++) >> SLP_TREE_VEC_STMTS (node).quick_push (NULL); >> >> /* Generate permutation masks for every NODE. Number of masks for each NODE >> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >> (b) the permutes only need a single vector input. */ >> mask.new_vector (nunits, group_size, 3); >> nelts_to_build = mask.encoded_nelts (); >> - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length (); >> + /* It's possible to obtain zero nstmts during analyze_only, so make >> + it at least one to ensure the later computation for n_perms >> + proceed. */ >> + nvectors_per_build = nstmts > 0 ? nstmts : 1; >> in_nlanes = DR_GROUP_SIZE (stmt_info) * 3; >> } >> else >> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >> return false; >> } >> >> - ++*n_perms; >> - >> + tree mask_vec = NULL_TREE; >> if (!analyze_only) >> - { >> - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); >> + mask_vec = vect_gen_perm_mask_checked (vectype, indices); >> >> - if (second_vec_index == -1) >> - second_vec_index = first_vec_index; >> + if (second_vec_index == -1) >> + second_vec_index = first_vec_index; >> >> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >> + { >> + ++*n_perms; > > So the "real" change is doing > > *n_perms += nvectors_per_build; > > and *n_perms was unused when !analyze_only? And since at Yes, although both !analyze_only and analyze_only calls pass n_perms in, now only the call sites with analyze_only will use the returned n_perms further. > analysis time we (sometimes?) have zero nvectors you have to > fixup above? Which cases are that? 
Yes, the fixup is to avoid getting an unexpected n_perms in function
vect_optimize_slp_pass::internal_node_cost. One typical case is
gcc.dg/vect/bb-slp-50.c: without special-casing zero, slp2 unexpectedly
fails to optimize out one more vec_perm.

In vect_optimize_slp_pass::internal_node_cost, it checks whether the
returned n_perms is zero or not (vec_perm not needed or needed).

  if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
                                       nullptr, vf, true, false, &n_perms))
    {
      auto rep = SLP_TREE_REPRESENTATIVE (node);
      if (out_layout_i == 0)
        {
          /* Use the fallback cost if the load is an N-to-N permutation.
             Otherwise assume that the node will be rejected later
             and rebuilt from scalars. */
          if (STMT_VINFO_GROUPED_ACCESS (rep)
              && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
                  == SLP_TREE_LANES (node)))
            return fallback_cost;
          return 0;
        }
      return -1;
    }

  /* See the comment above the corresponding VEC_PERM_EXPR handling. */
  return n_perms == 0 ? 0 : 1;

In vect_optimize_slp_pass::forward_pass (), a cost is only added when
factor > 0 (some vec_perm is needed).

  /* Accumulate the cost of using LAYOUT_I within NODE,
     both for the inputs and the outputs. */
  int factor = internal_node_cost (vertex.node, layout_i,
                                   layout_i);
  if (factor < 0)
    {
      is_possible = false;
      break;
    }
  else if (factor)
    layout_costs.internal_cost.add_serial_cost
      ({ vertex.weight * factor, m_optimize_size });

BR,
Kewen

>
> In principle the patch looks good to me.
>
> Richard.
>
>> + if (analyze_only)
>> + continue;
>> + /* Generate the permute statement if necessary.
*/ >> + tree first_vec = dr_chain[first_vec_index + ri]; >> + tree second_vec = dr_chain[second_vec_index + ri]; >> + gassign *stmt = as_a<gassign *> (stmt_info->stmt); >> + tree perm_dest >> + = vect_create_destination_var (gimple_assign_lhs (stmt), >> + vectype); >> + perm_dest = make_ssa_name (perm_dest); >> + gimple *perm_stmt >> + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec, >> + second_vec, mask_vec); >> + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, >> + gsi); >> + if (dce_chain) >> { >> - /* Generate the permute statement if necessary. */ >> - tree first_vec = dr_chain[first_vec_index + ri]; >> - tree second_vec = dr_chain[second_vec_index + ri]; >> - gassign *stmt = as_a<gassign *> (stmt_info->stmt); >> - tree perm_dest >> - = vect_create_destination_var (gimple_assign_lhs (stmt), >> - vectype); >> - perm_dest = make_ssa_name (perm_dest); >> - gimple *perm_stmt >> - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, >> - first_vec, second_vec, mask_vec); >> - vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, >> - gsi); >> - if (dce_chain) >> - { >> - bitmap_set_bit (used_defs, first_vec_index + ri); >> - bitmap_set_bit (used_defs, second_vec_index + ri); >> - } >> - >> - /* Store the vector statement in NODE. */ >> - SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] >> - = perm_stmt; >> + bitmap_set_bit (used_defs, first_vec_index + ri); >> + bitmap_set_bit (used_defs, second_vec_index + ri); >> } >> + >> + /* Store the vector statement in NODE. */ >> + SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; >> } >> } >> else if (!analyze_only) >> -- >> 2.39.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
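The interaction described in the reply above can be sketched as a toy model. It is not the GCC code: node_cost_factor, analysis_n_perms and add_internal_cost are hypothetical stand-ins, the fallback-cost path of internal_node_cost is left out, and the cost accumulation is reduced to a plain integer sum.

```cpp
#include <cassert>

// Stand-in for the tail of internal_node_cost: -1 means the layout is
// impossible, 0 means no permute is needed, 1 means a permute is needed.
int node_cost_factor(bool perm_supported, unsigned n_perms) {
  if (!perm_supported)
    return -1;  // the real code also has a fallback-cost path, omitted here
  return n_perms == 0 ? 0 : 1;
}

// Permutes counted at analysis time for one non-noop mask, following the
// per-vector loop introduced by the patch.
unsigned analysis_n_perms(unsigned nvectors_per_build) {
  unsigned n_perms = 0;
  for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
    ++n_perms;
  return n_perms;
}

// Stand-in for the accumulation in forward_pass: only a positive factor
// contributes to the internal cost.
long add_internal_cost(long total, int factor, long weight) {
  if (factor > 0)
    total += weight * factor;
  return total;
}
```

With nvectors_per_build left at zero during analysis, the model reports a factor of 0 for a mask that does need a permute, so no cost is added for that layout. Clamping to at least one keeps the factor at 1, which matches the reason given for the fixup.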
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 2023-05-23 3:01 ` Kewen.Lin @ 2023-05-23 6:19 ` Richard Biener 2023-05-24 5:23 ` Kewen.Lin 0 siblings, 1 reply; 10+ messages in thread From: Richard Biener @ 2023-05-23 6:19 UTC (permalink / raw) To: Kewen.Lin Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner On Tue, May 23, 2023 at 5:01 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi Richi, > > Thanks for the review! > > on 2023/5/22 21:44, Richard Biener wrote: > > On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> Following Richi's suggestion in [1], I'm working on deferring > >> cost evaluation next to the transformation, this patch is > >> to enhance function vect_transform_slp_perm_load_1 which > >> could under-cost for vector permutation, since the costing > >> doesn't try to consider nvectors_per_build, it's inconsistent > >> with the transformation part. > >> > >> Bootstrapped and regtested on x86_64-redhat-linux, > >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > >> > >> Is it ok for trunk? > >> > >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html > >> > >> BR, > >> Kewen > >> ----- > >> gcc/ChangeLog: > >> > >> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the > >> calculation on n_perms by considering nvectors_per_build. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. 
> >> --- > >> .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ > >> gcc/tree-vect-slp.cc | 66 ++++++++++--------- > >> 2 files changed, 57 insertions(+), 32 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > >> > >> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > >> new file mode 100644 > >> index 00000000000..e5c4dceddfb > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c > >> @@ -0,0 +1,23 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-require-effective-target vect_int } */ > >> +/* { dg-require-effective-target powerpc_p9vector_ok } */ > >> +/* Specify power9 to ensure the vectorization is profitable > >> + and test point stands, otherwise it could be not profitable > >> + to vectorize. */ > >> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */ > >> + > >> +/* Verify we cost the exact count for required vec_perm. */ > >> + > >> +int x[1024], y[1024]; > >> + > >> +void > >> +foo () > >> +{ > >> + for (int i = 0; i < 512; ++i) > >> + { > >> + x[2 * i] = y[1023 - (2 * i)]; > >> + x[2 * i + 1] = y[1023 - (2 * i + 1)]; > >> + } > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */ > >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > >> index e5c9d7e766e..af9a6dd4fa9 100644 > >> --- a/gcc/tree-vect-slp.cc > >> +++ b/gcc/tree-vect-slp.cc > >> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > >> > >> mode = TYPE_MODE (vectype); > >> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); > >> + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); > >> > >> /* Initialize the vect stmts of NODE to properly insert the generated > >> stmts later. */ > >> if (! 
analyze_only) > >> - for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); > >> - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) > >> + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++) > >> SLP_TREE_VEC_STMTS (node).quick_push (NULL); > >> > >> /* Generate permutation masks for every NODE. Number of masks for each NODE > >> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > >> (b) the permutes only need a single vector input. */ > >> mask.new_vector (nunits, group_size, 3); > >> nelts_to_build = mask.encoded_nelts (); > >> - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length (); > >> + /* It's possible to obtain zero nstmts during analyze_only, so make > >> + it at least one to ensure the later computation for n_perms > >> + proceed. */ > >> + nvectors_per_build = nstmts > 0 ? nstmts : 1; > >> in_nlanes = DR_GROUP_SIZE (stmt_info) * 3; > >> } > >> else > >> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > >> return false; > >> } > >> > >> - ++*n_perms; > >> - > >> + tree mask_vec = NULL_TREE; > >> if (!analyze_only) > >> - { > >> - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); > >> + mask_vec = vect_gen_perm_mask_checked (vectype, indices); > >> > >> - if (second_vec_index == -1) > >> - second_vec_index = first_vec_index; > >> + if (second_vec_index == -1) > >> + second_vec_index = first_vec_index; > >> > >> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > >> + { > >> + ++*n_perms; > > > > So the "real" change is doing > > > > *n_perms += nvectors_per_build; > > > > and *n_perms was unused when !analyze_only? And since at > > Yes, although both !analyze_only and analyze_only calls pass n_perms in, now > only the call sites with analyze_only will use the returned n_perms further. > > > analysis time we (sometimes?) have zero nvectors you have to > > fixup above? 
Which cases are that?
>
> Yes, the fixup is to avoid getting an unexpected n_perms in function
> vect_optimize_slp_pass::internal_node_cost. One typical case is
> gcc.dg/vect/bb-slp-50.c: without special-casing zero, slp2 unexpectedly
> fails to optimize out one more vec_perm.
>
> In vect_optimize_slp_pass::internal_node_cost, it checks whether the
> returned n_perms is zero or not (vec_perm not needed or needed).
>
>   if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
>                                        nullptr, vf, true, false, &n_perms))
>     {
>       auto rep = SLP_TREE_REPRESENTATIVE (node);
>       if (out_layout_i == 0)
>         {
>           /* Use the fallback cost if the load is an N-to-N permutation.
>              Otherwise assume that the node will be rejected later
>              and rebuilt from scalars. */
>           if (STMT_VINFO_GROUPED_ACCESS (rep)
>               && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
>                   == SLP_TREE_LANES (node)))
>             return fallback_cost;
>           return 0;
>         }
>       return -1;
>     }
>
>   /* See the comment above the corresponding VEC_PERM_EXPR handling. */
>   return n_perms == 0 ? 0 : 1;
>
> In vect_optimize_slp_pass::forward_pass (), a cost is only added when
> factor > 0 (some vec_perm is needed).
>
>   /* Accumulate the cost of using LAYOUT_I within NODE,
>      both for the inputs and the outputs. */
>   int factor = internal_node_cost (vertex.node, layout_i,
>                                    layout_i);
>   if (factor < 0)
>     {
>       is_possible = false;
>       break;
>     }
>   else if (factor)
>     layout_costs.internal_cost.add_serial_cost
>       ({ vertex.weight * factor, m_optimize_size });

Ah, OK - thanks for clarifying. The patch is OK.

Richard.

> BR,
> Kewen
>
> >
> > In principle the patch looks good to me.
> >
> > Richard.
> >
> >> + if (analyze_only)
> >> + continue;
> >> + /* Generate the permute statement if necessary.
*/ > >> + tree first_vec = dr_chain[first_vec_index + ri]; > >> + tree second_vec = dr_chain[second_vec_index + ri]; > >> + gassign *stmt = as_a<gassign *> (stmt_info->stmt); > >> + tree perm_dest > >> + = vect_create_destination_var (gimple_assign_lhs (stmt), > >> + vectype); > >> + perm_dest = make_ssa_name (perm_dest); > >> + gimple *perm_stmt > >> + = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec, > >> + second_vec, mask_vec); > >> + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, > >> + gsi); > >> + if (dce_chain) > >> { > >> - /* Generate the permute statement if necessary. */ > >> - tree first_vec = dr_chain[first_vec_index + ri]; > >> - tree second_vec = dr_chain[second_vec_index + ri]; > >> - gassign *stmt = as_a<gassign *> (stmt_info->stmt); > >> - tree perm_dest > >> - = vect_create_destination_var (gimple_assign_lhs (stmt), > >> - vectype); > >> - perm_dest = make_ssa_name (perm_dest); > >> - gimple *perm_stmt > >> - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, > >> - first_vec, second_vec, mask_vec); > >> - vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, > >> - gsi); > >> - if (dce_chain) > >> - { > >> - bitmap_set_bit (used_defs, first_vec_index + ri); > >> - bitmap_set_bit (used_defs, second_vec_index + ri); > >> - } > >> - > >> - /* Store the vector statement in NODE. */ > >> - SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] > >> - = perm_stmt; > >> + bitmap_set_bit (used_defs, first_vec_index + ri); > >> + bitmap_set_bit (used_defs, second_vec_index + ri); > >> } > >> + > >> + /* Store the vector statement in NODE. */ > >> + SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; > >> } > >> } > >> else if (!analyze_only) > >> -- > >> 2.39.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1 2023-05-23 6:19 ` Richard Biener @ 2023-05-24 5:23 ` Kewen.Lin 0 siblings, 0 replies; 10+ messages in thread From: Kewen.Lin @ 2023-05-24 5:23 UTC (permalink / raw) To: Richard Biener Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner on 2023/5/23 14:19, Richard Biener wrote: > On Tue, May 23, 2023 at 5:01 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi Richi, >> >> Thanks for the review! >> >> on 2023/5/22 21:44, Richard Biener wrote: >>> On Wed, May 17, 2023 at 8:15 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>> >>>> Hi, >>>> >>>> Following Richi's suggestion in [1], I'm working on deferring >>>> cost evaluation next to the transformation, this patch is >>>> to enhance function vect_transform_slp_perm_load_1 which >>>> could under-cost for vector permutation, since the costing >>>> doesn't try to consider nvectors_per_build, it's inconsistent >>>> with the transformation part. >>>> >>>> Bootstrapped and regtested on x86_64-redhat-linux, >>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. >>>> >>>> Is it ok for trunk? >>>> >>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html >>>> >>>> BR, >>>> Kewen >>>> ----- >>>> gcc/ChangeLog: >>>> >>>> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the >>>> calculation on n_perms by considering nvectors_per_build. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test. 
>>>> --- >>>> .../vect/costmodel/ppc/costmodel-slp-perm.c | 23 +++++++ >>>> gcc/tree-vect-slp.cc | 66 ++++++++++--------- >>>> 2 files changed, 57 insertions(+), 32 deletions(-) >>>> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >>>> >>>> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >>>> new file mode 100644 >>>> index 00000000000..e5c4dceddfb >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c >>>> @@ -0,0 +1,23 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-require-effective-target vect_int } */ >>>> +/* { dg-require-effective-target powerpc_p9vector_ok } */ >>>> +/* Specify power9 to ensure the vectorization is profitable >>>> + and test point stands, otherwise it could be not profitable >>>> + to vectorize. */ >>>> +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */ >>>> + >>>> +/* Verify we cost the exact count for required vec_perm. */ >>>> + >>>> +int x[1024], y[1024]; >>>> + >>>> +void >>>> +foo () >>>> +{ >>>> + for (int i = 0; i < 512; ++i) >>>> + { >>>> + x[2 * i] = y[1023 - (2 * i)]; >>>> + x[2 * i + 1] = y[1023 - (2 * i + 1)]; >>>> + } >>>> +} >>>> + >>>> +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */ >>>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc >>>> index e5c9d7e766e..af9a6dd4fa9 100644 >>>> --- a/gcc/tree-vect-slp.cc >>>> +++ b/gcc/tree-vect-slp.cc >>>> @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >>>> >>>> mode = TYPE_MODE (vectype); >>>> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); >>>> + unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node); >>>> >>>> /* Initialize the vect stmts of NODE to properly insert the generated >>>> stmts later. */ >>>> if (! 
analyze_only) >>>> - for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); >>>> - i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) >>>> + for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++) >>>> SLP_TREE_VEC_STMTS (node).quick_push (NULL); >>>> >>>> /* Generate permutation masks for every NODE. Number of masks for each NODE >>>> @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >>>> (b) the permutes only need a single vector input. */ >>>> mask.new_vector (nunits, group_size, 3); >>>> nelts_to_build = mask.encoded_nelts (); >>>> - nvectors_per_build = SLP_TREE_VEC_STMTS (node).length (); >>>> + /* It's possible to obtain zero nstmts during analyze_only, so make >>>> + it at least one to ensure the later computation for n_perms >>>> + proceed. */ >>>> + nvectors_per_build = nstmts > 0 ? nstmts : 1; >>>> in_nlanes = DR_GROUP_SIZE (stmt_info) * 3; >>>> } >>>> else >>>> @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >>>> return false; >>>> } >>>> >>>> - ++*n_perms; >>>> - >>>> + tree mask_vec = NULL_TREE; >>>> if (!analyze_only) >>>> - { >>>> - tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); >>>> + mask_vec = vect_gen_perm_mask_checked (vectype, indices); >>>> >>>> - if (second_vec_index == -1) >>>> - second_vec_index = first_vec_index; >>>> + if (second_vec_index == -1) >>>> + second_vec_index = first_vec_index; >>>> >>>> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >>>> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >>>> + { >>>> + ++*n_perms; >>> >>> So the "real" change is doing >>> >>> *n_perms += nvectors_per_build; >>> >>> and *n_perms was unused when !analyze_only? And since at >> >> Yes, although both !analyze_only and analyze_only calls pass n_perms in, now >> only the call sites with analyze_only will use the returned n_perms further. >> >>> analysis time we (sometimes?) have zero nvectors you have to >>> fixup above? 
Which cases are that?
>>
>> Yes, the fixup is to avoid to result in unexpected n_perms in function
>> vect_optimize_slp_pass::internal_node_cost.  One typical case is
>> gcc.dg/vect/bb-slp-50.c, without special casing zero, slp2 fails to optimize
>> out one more vec_perm unexpectedly.
>>
>> In vect_optimize_slp_pass::internal_node_cost, it checks if the returned n_perms
>> is zero or not (vec_perm not needed or needed).
>>
>>       if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
>>                                            nullptr, vf, true, false, &n_perms))
>>         {
>>           auto rep = SLP_TREE_REPRESENTATIVE (node);
>>           if (out_layout_i == 0)
>>             {
>>               /* Use the fallback cost if the load is an N-to-N permutation.
>>                  Otherwise assume that the node will be rejected later
>>                  and rebuilt from scalars.  */
>>               if (STMT_VINFO_GROUPED_ACCESS (rep)
>>                   && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
>>                       == SLP_TREE_LANES (node)))
>>                 return fallback_cost;
>>               return 0;
>>             }
>>           return -1;
>>         }
>>
>>       /* See the comment above the corresponding VEC_PERM_EXPR handling.  */
>>       return n_perms == 0 ? 0 : 1;
>>
>> In vect_optimize_slp_pass::forward_pass (), it only considers the case that
>> factor > 0 (there is some vec_perm needed).
>>
>>           /* Accumulate the cost of using LAYOUT_I within NODE,
>>              both for the inputs and the outputs.  */
>>           int factor = internal_node_cost (vertex.node, layout_i,
>>                                            layout_i);
>>           if (factor < 0)
>>             {
>>               is_possible = false;
>>               break;
>>             }
>>           else if (factor)
>>             layout_costs.internal_cost.add_serial_cost
>>               ({ vertex.weight * factor, m_optimize_size });
>
> Ah, OK - thanks for clarifying.
>
> The patch is OK.

Thanks!  Committed in r14-1151.

BR,
Kewen
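[Editorial sketch, not part of the thread.] The exchange above turns on two facts: patch 2/2 counts one permute per generated vector, and internal_node_cost only looks at whether n_perms is zero. The toy model below illustrates why a zero vector count during analysis would make a needed permute look free. It is a simplified stand-in under stated assumptions, not GCC code: count_perms, cost_factor, non_noop_masks and clamp_to_one are hypothetical names, and the real clamp is applied in only one branch of vect_transform_slp_perm_load_1.

```cpp
// Toy model of the n_perms accounting discussed in the thread.
// Hypothetical names throughout; this is not GCC code.

// Count permutes the way patch 2/2 is described: one per generated
// vector, for every mask that is not a no-op.
unsigned count_perms(unsigned nstmts, unsigned non_noop_masks,
                     bool clamp_to_one) {
  // During analysis nstmts can still be zero; the patch makes the
  // per-build vector count at least one in that case.
  unsigned nvectors_per_build = (clamp_to_one && nstmts == 0) ? 1 : nstmts;
  unsigned n_perms = 0;
  for (unsigned m = 0; m < non_noop_masks; ++m)
    for (unsigned ri = 0; ri < nvectors_per_build; ++ri)
      ++n_perms;
  return n_perms;
}

// Mirrors the quoted "return n_perms == 0 ? 0 : 1;".
int cost_factor(unsigned n_perms) { return n_perms == 0 ? 0 : 1; }
```

In this model, one non-no-op mask with nstmts == 0 and no clamp yields n_perms == 0, so the factor is 0 and the permute is treated as unnecessary. With the clamp the factor is 1, which matches the behaviour the reply describes for gcc.dg/vect/bb-slp-50.c.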
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 2023-05-17 6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin 2023-05-17 6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin @ 2023-05-17 6:34 ` Richard Biener 2023-05-17 7:18 ` Kewen.Lin 1 sibling, 1 reply; 10+ messages in thread From: Richard Biener @ 2023-05-17 6:34 UTC (permalink / raw) To: Kewen.Lin Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi, > > This patch is to refactor the handlings for the case (index > == count) in a loop of vect_transform_slp_perm_load_1, in > order to prepare a subsequent adjustment on *nperm. This > patch doesn't have any functional changes. The diff is impossible to be reviewed - can you explain the refactoring you have done or also attach a patch more clearly showing what you change? > Bootstrapped and regtested on x86_64-redhat-linux, > aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > > BR, > Kewen > ----- > gcc/ChangeLog: > > * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the > handling on the case index == count. > --- > gcc/tree-vect-slp.cc | 89 ++++++++++++++++++++++---------------------- > 1 file changed, 44 insertions(+), 45 deletions(-) > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index 3b7a21724ec..e5c9d7e766e 100644 > --- a/gcc/tree-vect-slp.cc > +++ b/gcc/tree-vect-slp.cc > @@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > noop_p = false; > mask[index++] = mask_element; > > - if (index == count && !noop_p) > + if (index == count) > { > - indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits); > - if (!can_vec_perm_const_p (mode, mode, indices)) > + if (!noop_p) > { > - if (dump_p) > + indices.new_vector (mask, second_vec_index == -1 ? 
1 : 2, nunits); > + if (!can_vec_perm_const_p (mode, mode, indices)) > { > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, > - vect_location, > - "unsupported vect permute { "); > - for (i = 0; i < count; ++i) > + if (dump_p) > { > - dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); > - dump_printf (MSG_MISSED_OPTIMIZATION, " "); > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "unsupported vect permute { "); > + for (i = 0; i < count; ++i) > + { > + dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); > + dump_printf (MSG_MISSED_OPTIMIZATION, " "); > + } > + dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); > } > - dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); > + gcc_assert (analyze_only); > + return false; > } > - gcc_assert (analyze_only); > - return false; > - } > > - ++*n_perms; > - } > + ++*n_perms; > > - if (index == count) > - { > - if (!analyze_only) > - { > - tree mask_vec = NULL_TREE; > - > - if (! noop_p) > - mask_vec = vect_gen_perm_mask_checked (vectype, indices); > + if (!analyze_only) > + { > + tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); > > - if (second_vec_index == -1) > - second_vec_index = first_vec_index; > + if (second_vec_index == -1) > + second_vec_index = first_vec_index; > > - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > - { > - /* Generate the permute statement if necessary. */ > - tree first_vec = dr_chain[first_vec_index + ri]; > - tree second_vec = dr_chain[second_vec_index + ri]; > - gimple *perm_stmt; > - if (! noop_p) > + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > { > - gassign *stmt = as_a <gassign *> (stmt_info->stmt); > + /* Generate the permute statement if necessary. 
*/ > + tree first_vec = dr_chain[first_vec_index + ri]; > + tree second_vec = dr_chain[second_vec_index + ri]; > + gassign *stmt = as_a<gassign *> (stmt_info->stmt); > tree perm_dest > = vect_create_destination_var (gimple_assign_lhs (stmt), > vectype); > perm_dest = make_ssa_name (perm_dest); > - perm_stmt > + gimple *perm_stmt > = gimple_build_assign (perm_dest, VEC_PERM_EXPR, > - first_vec, second_vec, > - mask_vec); > + first_vec, second_vec, mask_vec); > vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, > gsi); > if (dce_chain) > @@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > bitmap_set_bit (used_defs, first_vec_index + ri); > bitmap_set_bit (used_defs, second_vec_index + ri); > } > + > + /* Store the vector statement in NODE. */ > + SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] > + = perm_stmt; > } > - else > - { > - /* If mask was NULL_TREE generate the requested > - identity transform. */ > - perm_stmt = SSA_NAME_DEF_STMT (first_vec); > - if (dce_chain) > - bitmap_set_bit (used_defs, first_vec_index + ri); > - } > + } > + } > + else if (!analyze_only) > + { > + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > + { > + tree first_vec = dr_chain[first_vec_index + ri]; > + /* If mask was NULL_TREE generate the requested > + identity transform. */ > + gimple *perm_stmt = SSA_NAME_DEF_STMT (first_vec); > + if (dce_chain) > + bitmap_set_bit (used_defs, first_vec_index + ri); > > /* Store the vector statement in NODE. */ > SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; > -- > 2.39.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
  2023-05-17  6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
@ 2023-05-17  7:18   ` Kewen.Lin
  2023-05-18  6:12     ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Kewen.Lin @ 2023-05-17  7:18 UTC
To: Richard Biener
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

Hi Richi,

on 2023/5/17 14:34, Richard Biener wrote:
> On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> This patch is to refactor the handlings for the case (index
>> == count) in a loop of vect_transform_slp_perm_load_1, in
>> order to prepare a subsequent adjustment on *nperm.  This
>> patch doesn't have any functional changes.
>
> The diff is impossible to be reviewed - can you explain the
> refactoring you have done or also attach a patch more clearly
> showing what you change?

Sorry, I should have made it clearer.
It mainly combines these two hunks:

  if (index == count && !noop_p)
    {
      // A ...
      // ++*n_perms;
    }

  if (index == count)
    {
      if (!analyze_only)
        {
          if (!noop_p)
            // B1 ...

          // B2 ...

          for ...
            {
              if (!noop_p)
                // B3 building VEC_PERM_EXPR
              else
                // B4 building nothing (no uses for B2 and its seq)
            }
        }
      // B5
    }

The former can be part of the latter, so it becomes:

  if (index == count)
    {
      if (!noop_p)
        {
          // A ...
          // ++*n_perms;

          if (!analyze_only)
            {
              // B1 ...
              // B2 ...
              for ...
                // B3 building VEC_PERM_EXPR
            }
        }
      else if (!analyze_only)
        {
          // no B2, since it has no further uses here.
          for ...
            // B4 building nothing
        }
      // B5 ...
    }

But it is mainly the basis for the subsequent patch, which makes the n_perms
calculation consistent; patch 2/2 further turns it into:

  if (index == count)
    {
      if (!noop_p)
        {
          // A ...

          if (!analyze_only)
            // B1 ...

          // B2 ... (trivial computations during analyze_only or not)

          for ...
            {
              // ++*n_perms;  (now n_perms is consistent with building VEC_PERM_EXPR)
              if (analyze_only)
                continue;
              // B3 building VEC_PERM_EXPR
            }
        }
      else if (!analyze_only)
        {
          // no B2, since it has no further uses here.
          for ...
            // B4 building nothing
        }
      // B5 ...
    }

BR,
Kewen

>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
>>         handling on the case index == count.
>> ---
>>  gcc/tree-vect-slp.cc | 89 ++++++++++++++++++++++----------------------
>>  1 file changed, 44 insertions(+), 45 deletions(-)
>>
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 3b7a21724ec..e5c9d7e766e 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
>>               noop_p = false;
>>           mask[index++] = mask_element;
>>
>> -         if (index == count && !noop_p)
>> +         if (index == count)
>>             {
>> -             indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
>> -             if (!can_vec_perm_const_p (mode, mode, indices))
>> +             if (!noop_p)
>>                 {
>> -                 if (dump_p)
>> +                 indices.new_vector (mask, second_vec_index == -1 ?
1 : 2, nunits); >> + if (!can_vec_perm_const_p (mode, mode, indices)) >> { >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, >> - vect_location, >> - "unsupported vect permute { "); >> - for (i = 0; i < count; ++i) >> + if (dump_p) >> { >> - dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); >> - dump_printf (MSG_MISSED_OPTIMIZATION, " "); >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "unsupported vect permute { "); >> + for (i = 0; i < count; ++i) >> + { >> + dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); >> + dump_printf (MSG_MISSED_OPTIMIZATION, " "); >> + } >> + dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); >> } >> - dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); >> + gcc_assert (analyze_only); >> + return false; >> } >> - gcc_assert (analyze_only); >> - return false; >> - } >> >> - ++*n_perms; >> - } >> + ++*n_perms; >> >> - if (index == count) >> - { >> - if (!analyze_only) >> - { >> - tree mask_vec = NULL_TREE; >> - >> - if (! noop_p) >> - mask_vec = vect_gen_perm_mask_checked (vectype, indices); >> + if (!analyze_only) >> + { >> + tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); >> >> - if (second_vec_index == -1) >> - second_vec_index = first_vec_index; >> + if (second_vec_index == -1) >> + second_vec_index = first_vec_index; >> >> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >> - { >> - /* Generate the permute statement if necessary. */ >> - tree first_vec = dr_chain[first_vec_index + ri]; >> - tree second_vec = dr_chain[second_vec_index + ri]; >> - gimple *perm_stmt; >> - if (! noop_p) >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >> { >> - gassign *stmt = as_a <gassign *> (stmt_info->stmt); >> + /* Generate the permute statement if necessary. 
*/ >> + tree first_vec = dr_chain[first_vec_index + ri]; >> + tree second_vec = dr_chain[second_vec_index + ri]; >> + gassign *stmt = as_a<gassign *> (stmt_info->stmt); >> tree perm_dest >> = vect_create_destination_var (gimple_assign_lhs (stmt), >> vectype); >> perm_dest = make_ssa_name (perm_dest); >> - perm_stmt >> + gimple *perm_stmt >> = gimple_build_assign (perm_dest, VEC_PERM_EXPR, >> - first_vec, second_vec, >> - mask_vec); >> + first_vec, second_vec, mask_vec); >> vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, >> gsi); >> if (dce_chain) >> @@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, >> bitmap_set_bit (used_defs, first_vec_index + ri); >> bitmap_set_bit (used_defs, second_vec_index + ri); >> } >> + >> + /* Store the vector statement in NODE. */ >> + SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] >> + = perm_stmt; >> } >> - else >> - { >> - /* If mask was NULL_TREE generate the requested >> - identity transform. */ >> - perm_stmt = SSA_NAME_DEF_STMT (first_vec); >> - if (dce_chain) >> - bitmap_set_bit (used_defs, first_vec_index + ri); >> - } >> + } >> + } >> + else if (!analyze_only) >> + { >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) >> + { >> + tree first_vec = dr_chain[first_vec_index + ri]; >> + /* If mask was NULL_TREE generate the requested >> + identity transform. */ >> + gimple *perm_stmt = SSA_NAME_DEF_STMT (first_vec); >> + if (dce_chain) >> + bitmap_set_bit (used_defs, first_vec_index + ri); >> >> /* Store the vector statement in NODE. */ >> SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; >> -- >> 2.39.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
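[Editorial sketch, not part of the thread.] The three control-flow shapes in the explanation above (hunks A and B1 to B5) can be checked against each other with a small trace model. This is a toy under stated assumptions, not GCC code: Trace, before_refactor, after_patch1 and after_patch2 are hypothetical names, the outer "index == count" test is taken as already true, and B2 is left out of the trace because the message says its result has no further use in the no-op path.

```cpp
#include <string>
#include <vector>

// Records which labelled hunks ran and the resulting permute count.
struct Trace {
  unsigned n_perms = 0;
  std::vector<std::string> steps;
  bool operator==(const Trace &o) const {
    return n_perms == o.n_perms && steps == o.steps;
  }
};

// Shape before patch 1/2: two separate "index == count" hunks.
Trace before_refactor(bool noop_p, bool analyze_only, unsigned nvectors) {
  Trace t;
  if (!noop_p) {
    t.steps.push_back("A");
    ++t.n_perms;
  }
  if (!analyze_only) {
    if (!noop_p)
      t.steps.push_back("B1");
    for (unsigned ri = 0; ri < nvectors; ++ri)
      t.steps.push_back(noop_p ? "B4" : "B3");
  }
  t.steps.push_back("B5");
  return t;
}

// Shape after patch 1/2: the first hunk folded into the second.
Trace after_patch1(bool noop_p, bool analyze_only, unsigned nvectors) {
  Trace t;
  if (!noop_p) {
    t.steps.push_back("A");
    ++t.n_perms;
    if (!analyze_only) {
      t.steps.push_back("B1");
      for (unsigned ri = 0; ri < nvectors; ++ri)
        t.steps.push_back("B3");
    }
  } else if (!analyze_only) {
    for (unsigned ri = 0; ri < nvectors; ++ri)
      t.steps.push_back("B4");
  }
  t.steps.push_back("B5");
  return t;
}

// Shape after patch 2/2: n_perms is bumped once per generated vector,
// also when only analyzing.
Trace after_patch2(bool noop_p, bool analyze_only, unsigned nvectors) {
  Trace t;
  if (!noop_p) {
    t.steps.push_back("A");
    if (!analyze_only)
      t.steps.push_back("B1");
    for (unsigned ri = 0; ri < nvectors; ++ri) {
      ++t.n_perms;
      if (analyze_only)
        continue;
      t.steps.push_back("B3");
    }
  } else if (!analyze_only) {
    for (unsigned ri = 0; ri < nvectors; ++ri)
      t.steps.push_back("B4");
  }
  t.steps.push_back("B5");
  return t;
}
```

In this model the first two shapes produce identical traces for every input, which is the "no functional changes" claim of patch 1/2. The third shape runs the same hunks but reports nvectors permutes instead of one, which is the costing change of patch 2/2.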
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 2023-05-17 7:18 ` Kewen.Lin @ 2023-05-18 6:12 ` Richard Biener 2023-05-22 5:37 ` Kewen.Lin 0 siblings, 1 reply; 10+ messages in thread From: Richard Biener @ 2023-05-18 6:12 UTC (permalink / raw) To: Kewen.Lin Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner On Wed, May 17, 2023 at 9:19 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi Richi, > > on 2023/5/17 14:34, Richard Biener wrote: > > On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> This patch is to refactor the handlings for the case (index > >> == count) in a loop of vect_transform_slp_perm_load_1, in > >> order to prepare a subsequent adjustment on *nperm. This > >> patch doesn't have any functional changes. > > > > The diff is impossible to be reviewed - can you explain the > > refactoring you have done or also attach a patch more clearly > > showing what you change? > > Sorry, I should have made it more clear. > It mainly to combine these two hunks: > > if (index == count && !noop_p) > { > // A ... > // ++*n_perms; > } > > if (index == count) > { > if (!analyze_only) > { > if (!noop_p) > // B1 ... > > // B2 ... > > for ... > { > if (!noop_p) > // B3 building VEC_PERM_EXPR > else > // B4 building nothing (no uses for B2 and its seq) > } > } > // B5 > } > > The former can be part of the latter, so it becomes to: > > if (index == count) > { > if (!noop_p) > { > // A ... > // ++*n_perms; > > if (!analyze_only) > { > // B1 ... > // B2 ... > for ... > // B3 building VEC_PERM_EXPR > } > } > else if (!analyze_only) > { > // no B2 since no any further uses here. > for ... > // B4 building nothing > } > // B5 ... > } Ah, thanks - that made reviewing easy. 1/2 is OK for trunk. Thanks, Richard. 
> But it's mainly the basic for the subsequent patch for consistent n_perms calculation, > the patch 2/2 is to make it further become to: > > if (index == count) > { > if (!noop_p) > { > // A ... > > if (!analyze_only) > // B1 ... > > // B2 ... (trivial computations during analyze_only or not) > > for ... > { > // ++*n_perms; (now n_perms is consistent with building VEC_PERM_EXPR) > if (analyze_only) > continue; > // B3 building VEC_PERM_EXPR > } > } > else if (!analyze_only) > { > // no B2 since no any further uses here. > for ... > // B4 building nothing > } > // B5 ... > } > > BR, > Kewen > > > > > >> Bootstrapped and regtested on x86_64-redhat-linux, > >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > >> > >> BR, > >> Kewen > >> ----- > >> gcc/ChangeLog: > >> > >> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the > >> handling on the case index == count. > >> --- > >> gcc/tree-vect-slp.cc | 89 ++++++++++++++++++++++---------------------- > >> 1 file changed, 44 insertions(+), 45 deletions(-) > >> > >> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > >> index 3b7a21724ec..e5c9d7e766e 100644 > >> --- a/gcc/tree-vect-slp.cc > >> +++ b/gcc/tree-vect-slp.cc > >> @@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > >> noop_p = false; > >> mask[index++] = mask_element; > >> > >> - if (index == count && !noop_p) > >> + if (index == count) > >> { > >> - indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits); > >> - if (!can_vec_perm_const_p (mode, mode, indices)) > >> + if (!noop_p) > >> { > >> - if (dump_p) > >> + indices.new_vector (mask, second_vec_index == -1 ? 
1 : 2, nunits); > >> + if (!can_vec_perm_const_p (mode, mode, indices)) > >> { > >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, > >> - vect_location, > >> - "unsupported vect permute { "); > >> - for (i = 0; i < count; ++i) > >> + if (dump_p) > >> { > >> - dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); > >> - dump_printf (MSG_MISSED_OPTIMIZATION, " "); > >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> + "unsupported vect permute { "); > >> + for (i = 0; i < count; ++i) > >> + { > >> + dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]); > >> + dump_printf (MSG_MISSED_OPTIMIZATION, " "); > >> + } > >> + dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); > >> } > >> - dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); > >> + gcc_assert (analyze_only); > >> + return false; > >> } > >> - gcc_assert (analyze_only); > >> - return false; > >> - } > >> > >> - ++*n_perms; > >> - } > >> + ++*n_perms; > >> > >> - if (index == count) > >> - { > >> - if (!analyze_only) > >> - { > >> - tree mask_vec = NULL_TREE; > >> - > >> - if (! noop_p) > >> - mask_vec = vect_gen_perm_mask_checked (vectype, indices); > >> + if (!analyze_only) > >> + { > >> + tree mask_vec = vect_gen_perm_mask_checked (vectype, indices); > >> > >> - if (second_vec_index == -1) > >> - second_vec_index = first_vec_index; > >> + if (second_vec_index == -1) > >> + second_vec_index = first_vec_index; > >> > >> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > >> - { > >> - /* Generate the permute statement if necessary. */ > >> - tree first_vec = dr_chain[first_vec_index + ri]; > >> - tree second_vec = dr_chain[second_vec_index + ri]; > >> - gimple *perm_stmt; > >> - if (! noop_p) > >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > >> { > >> - gassign *stmt = as_a <gassign *> (stmt_info->stmt); > >> + /* Generate the permute statement if necessary. 
*/ > >> + tree first_vec = dr_chain[first_vec_index + ri]; > >> + tree second_vec = dr_chain[second_vec_index + ri]; > >> + gassign *stmt = as_a<gassign *> (stmt_info->stmt); > >> tree perm_dest > >> = vect_create_destination_var (gimple_assign_lhs (stmt), > >> vectype); > >> perm_dest = make_ssa_name (perm_dest); > >> - perm_stmt > >> + gimple *perm_stmt > >> = gimple_build_assign (perm_dest, VEC_PERM_EXPR, > >> - first_vec, second_vec, > >> - mask_vec); > >> + first_vec, second_vec, mask_vec); > >> vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, > >> gsi); > >> if (dce_chain) > >> @@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node, > >> bitmap_set_bit (used_defs, first_vec_index + ri); > >> bitmap_set_bit (used_defs, second_vec_index + ri); > >> } > >> + > >> + /* Store the vector statement in NODE. */ > >> + SLP_TREE_VEC_STMTS (node) [vect_stmts_counter++] > >> + = perm_stmt; > >> } > >> - else > >> - { > >> - /* If mask was NULL_TREE generate the requested > >> - identity transform. */ > >> - perm_stmt = SSA_NAME_DEF_STMT (first_vec); > >> - if (dce_chain) > >> - bitmap_set_bit (used_defs, first_vec_index + ri); > >> - } > >> + } > >> + } > >> + else if (!analyze_only) > >> + { > >> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri) > >> + { > >> + tree first_vec = dr_chain[first_vec_index + ri]; > >> + /* If mask was NULL_TREE generate the requested > >> + identity transform. */ > >> + gimple *perm_stmt = SSA_NAME_DEF_STMT (first_vec); > >> + if (dce_chain) > >> + bitmap_set_bit (used_defs, first_vec_index + ri); > >> > >> /* Store the vector statement in NODE. */ > >> SLP_TREE_VEC_STMTS (node)[vect_stmts_counter++] = perm_stmt; > >> -- > >> 2.39.1 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1
  2023-05-18  6:12     ` Richard Biener
@ 2023-05-22  5:37       ` Kewen.Lin
  0 siblings, 0 replies; 10+ messages in thread
From: Kewen.Lin @ 2023-05-22  5:37 UTC
To: Richard Biener
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

on 2023/5/18 14:12, Richard Biener wrote:
> On Wed, May 17, 2023 at 9:19 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Richi,
>>
>> on 2023/5/17 14:34, Richard Biener wrote:
>>> On Wed, May 17, 2023 at 8:09 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch is to refactor the handlings for the case (index
>>>> == count) in a loop of vect_transform_slp_perm_load_1, in
>>>> order to prepare a subsequent adjustment on *nperm.  This
>>>> patch doesn't have any functional changes.
>>>
>>> The diff is impossible to be reviewed - can you explain the
>>> refactoring you have done or also attach a patch more clearly
>>> showing what you change?
>>
>> Sorry, I should have made it more clear.
>> It mainly to combine these two hunks:
>>
>>   if (index == count && !noop_p)
>>     {
>>       // A ...
>>       // ++*n_perms;
>>     }
>>
>>   if (index == count)
>>     {
>>       if (!analyze_only)
>>         {
>>           if (!noop_p)
>>             // B1 ...
>>
>>           // B2 ...
>>
>>           for ...
>>             {
>>               if (!noop_p)
>>                 // B3 building VEC_PERM_EXPR
>>               else
>>                 // B4 building nothing (no uses for B2 and its seq)
>>             }
>>         }
>>       // B5
>>     }
>>
>> The former can be part of the latter, so it becomes to:
>>
>>   if (index == count)
>>     {
>>       if (!noop_p)
>>         {
>>           // A ...
>>           // ++*n_perms;
>>
>>           if (!analyze_only)
>>             {
>>               // B1 ...
>>               // B2 ...
>>               for ...
>>                 // B3 building VEC_PERM_EXPR
>>             }
>>         }
>>       else if (!analyze_only)
>>         {
>>           // no B2 since no any further uses here.
>>           for ...
>>             // B4 building nothing
>>         }
>>       // B5 ...
>>     }
>
> Ah, thanks - that made reviewing easy.  1/2 is OK for trunk.

Thanks for the review!  Pushed as r14-1028.

BR,
Kewen
Thread overview: 10+ messages

2023-05-17  6:09 [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1 Kewen.Lin
2023-05-17  6:15 ` [PATCH 2/2] vect: Enhance cost evaluation " Kewen.Lin
2023-05-22 13:44   ` Richard Biener
2023-05-23  3:01     ` Kewen.Lin
2023-05-23  6:19       ` Richard Biener
2023-05-24  5:23         ` Kewen.Lin
2023-05-17  6:34 ` [PATCH 1/2] vect: Refactor code for index == count " Richard Biener
2023-05-17  7:18   ` Kewen.Lin
2023-05-18  6:12     ` Richard Biener
2023-05-22  5:37       ` Kewen.Lin