* [PATCH] Support reduction def re-use for epilogue with different vector size
@ 2021-07-13 12:09 Richard Biener
  2021-07-13 14:17 ` Richard Sandiford
  2021-07-15 12:25 ` Christophe Lyon
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2021-07-13 12:09 UTC
  To: gcc-patches; +Cc: richard.sandiford

The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue.  That's only
x86 as far as I am aware.

vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
regtest in progress.

There are costing issues on x86 which usually prevent vectorizing
an epilogue with a reduction, at least for loops that only
have a reduction - it could be mitigated by not accounting for
the epilogue there if we can compute that we can re-use the
main loop's cost.

Richard - did I figure the correct place to adjust?  I guess
adjusting accumulator->reduc_input in vect_transform_cycle_phi
for re-use by the skip code in vect_create_epilog_for_reduction
is a bit awkward but at least we're consciously doing
vect_create_epilog_for_reduction last (via vectorizing live
operations).

OK in the unlikely case all testing succeeds (I also want to
run it through SPEC with/without -fno-vect-cost-model which
will take some time)?

Thanks,
Richard.

2021-07-13  Richard Biener  <rguenther@suse.de>

        * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
        vector types where the old vector type has a multiple of
        the new vector type elements.
        (vect_create_partial_epilog): New function, split out from...
        (vect_create_epilog_for_reduction): ... here.
        (vect_transform_cycle_phi): Reduce the re-used accumulator
        to the new vector type.

        * gcc.target/i386/vect-reduc-1.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
 gcc/tree-vect-loop.c                         | 223 ++++++++++++-------
 2 files changed, 155 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
new file mode 100644
index 00000000000..9ee9ba4e736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
+
+#define N 32
+int foo (int *a, int n)
+{
+  int sum = 1;
+  for (int i = 0; i < 8*N + 4; ++i)
+    sum += a[i];
+  return sum;
+}
+
+/* The reduction epilog should be vectorized and the accumulator
+   re-used.  */
+/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-times "psrl" 2 } } */
+/* { dg-final { scan-assembler-times "padd" 5 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 8c27d75f889..98e2a845629 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
      ones as well.  */
   tree vectype = STMT_VINFO_VECTYPE (reduc_info);
   tree old_vectype = TREE_TYPE (accumulator->reduc_input);
-  if (!useless_type_conversion_p (old_vectype, vectype))
+  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
+                            TYPE_VECTOR_SUBPARTS (vectype)))
     return false;
 
   /* Non-SLP reductions might apply an adjustment after the reduction
@@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
+   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
+
+static tree
+vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
+                            gimple_seq *seq)
+{
+  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
+  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+  tree stype = TREE_TYPE (vectype);
+  tree new_temp = vec_def;
+  while (nunits > nunits1)
+    {
+      nunits /= 2;
+      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+                                                           stype, nunits);
+      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
+
+      /* The target has to make sure we support lowpart/highpart
+         extraction, either via direct vector extract or through
+         an integer mode punning.  */
+      tree dst1, dst2;
+      gimple *epilog_stmt;
+      if (convert_optab_handler (vec_extract_optab,
+                                 TYPE_MODE (TREE_TYPE (new_temp)),
+                                 TYPE_MODE (vectype1))
+          != CODE_FOR_nothing)
+        {
+          /* Extract sub-vectors directly once vec_extract becomes
+             a conversion optab.  */
+          dst1 = make_ssa_name (vectype1);
+          epilog_stmt
+            = gimple_build_assign (dst1, BIT_FIELD_REF,
+                                   build3 (BIT_FIELD_REF, vectype1,
+                                           new_temp, TYPE_SIZE (vectype1),
+                                           bitsize_int (0)));
+          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+          dst2 = make_ssa_name (vectype1);
+          epilog_stmt
+            = gimple_build_assign (dst2, BIT_FIELD_REF,
+                                   build3 (BIT_FIELD_REF, vectype1,
+                                           new_temp, TYPE_SIZE (vectype1),
+                                           bitsize_int (bitsize)));
+          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+        }
+      else
+        {
+          /* Extract via punning to appropriately sized integer mode
+             vector.  */
+          tree eltype = build_nonstandard_integer_type (bitsize, 1);
+          tree etype = build_vector_type (eltype, 2);
+          gcc_assert (convert_optab_handler (vec_extract_optab,
+                                             TYPE_MODE (etype),
+                                             TYPE_MODE (eltype))
+                      != CODE_FOR_nothing);
+          tree tem = make_ssa_name (etype);
+          epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
+                                             build1 (VIEW_CONVERT_EXPR,
+                                                     etype, new_temp));
+          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+          new_temp = tem;
+          tem = make_ssa_name (eltype);
+          epilog_stmt
+            = gimple_build_assign (tem, BIT_FIELD_REF,
+                                   build3 (BIT_FIELD_REF, eltype,
+                                           new_temp, TYPE_SIZE (eltype),
+                                           bitsize_int (0)));
+          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+          dst1 = make_ssa_name (vectype1);
+          epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
+                                             build1 (VIEW_CONVERT_EXPR,
+                                                     vectype1, tem));
+          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+          tem = make_ssa_name (eltype);
+          epilog_stmt
+            = gimple_build_assign (tem, BIT_FIELD_REF,
+                                   build3 (BIT_FIELD_REF, eltype,
+                                           new_temp, TYPE_SIZE (eltype),
+                                           bitsize_int (bitsize)));
+          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+          dst2 = make_ssa_name (vectype1);
+          epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
+                                             build1 (VIEW_CONVERT_EXPR,
+                                                     vectype1, tem));
+          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+        }
+
+      new_temp = make_ssa_name (vectype1);
+      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
+      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+    }
+
+  return new_temp;
+}
+
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
@@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 
       /* First reduce the vector to the desired vector size we should
          do shift reduction on by combining upper and lower halves.  */
-      new_temp = reduc_inputs[0];
-      while (nunits > nunits1)
-        {
-          nunits /= 2;
-          vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
-                                                          stype, nunits);
-          unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
-
-          /* The target has to make sure we support lowpart/highpart
-             extraction, either via direct vector extract or through
-             an integer mode punning.  */
-          tree dst1, dst2;
-          if (convert_optab_handler (vec_extract_optab,
-                                     TYPE_MODE (TREE_TYPE (new_temp)),
-                                     TYPE_MODE (vectype1))
-              != CODE_FOR_nothing)
-            {
-              /* Extract sub-vectors directly once vec_extract becomes
-                 a conversion optab.  */
-              dst1 = make_ssa_name (vectype1);
-              epilog_stmt
-                = gimple_build_assign (dst1, BIT_FIELD_REF,
-                                       build3 (BIT_FIELD_REF, vectype1,
-                                               new_temp, TYPE_SIZE (vectype1),
-                                               bitsize_int (0)));
-              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-              dst2 = make_ssa_name (vectype1);
-              epilog_stmt
-                = gimple_build_assign (dst2, BIT_FIELD_REF,
-                                       build3 (BIT_FIELD_REF, vectype1,
-                                               new_temp, TYPE_SIZE (vectype1),
-                                               bitsize_int (bitsize)));
-              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-            }
-          else
-            {
-              /* Extract via punning to appropriately sized integer mode
-                 vector.  */
-              tree eltype = build_nonstandard_integer_type (bitsize, 1);
-              tree etype = build_vector_type (eltype, 2);
-              gcc_assert (convert_optab_handler (vec_extract_optab,
-                                                 TYPE_MODE (etype),
-                                                 TYPE_MODE (eltype))
-                          != CODE_FOR_nothing);
-              tree tem = make_ssa_name (etype);
-              epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
-                                                 build1 (VIEW_CONVERT_EXPR,
-                                                         etype, new_temp));
-              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-              new_temp = tem;
-              tem = make_ssa_name (eltype);
-              epilog_stmt
-                = gimple_build_assign (tem, BIT_FIELD_REF,
-                                       build3 (BIT_FIELD_REF, eltype,
-                                               new_temp, TYPE_SIZE (eltype),
-                                               bitsize_int (0)));
-              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-              dst1 = make_ssa_name (vectype1);
-              epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
-                                                 build1 (VIEW_CONVERT_EXPR,
-                                                         vectype1, tem));
-              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-              tem = make_ssa_name (eltype);
-              epilog_stmt
-                = gimple_build_assign (tem, BIT_FIELD_REF,
-                                       build3 (BIT_FIELD_REF, eltype,
-                                               new_temp, TYPE_SIZE (eltype),
-                                               bitsize_int (bitsize)));
-              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-              dst2 = make_ssa_name (vectype1);
-              epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
-                                                 build1 (VIEW_CONVERT_EXPR,
-                                                         vectype1, tem));
-              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-            }
-
-          new_temp = make_ssa_name (vectype1);
-          epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
-          gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-          reduc_inputs[0] = new_temp;
-        }
+      gimple_seq stmts = NULL;
+      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
+                                             code, &stmts);
+      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+      reduc_inputs[0] = new_temp;
 
       if (reduce_with_shift && !slp_reduc)
         {
@@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
 
       if (auto *accumulator = reduc_info->reused_accumulator)
         {
+          tree def = accumulator->reduc_input;
+          unsigned int nreduc;
+          bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
+                                          TYPE_VECTOR_SUBPARTS (vectype_out),
+                                          &nreduc);
+          gcc_assert (res);
+          if (nreduc != 1)
+            {
+              /* Reduce the single vector to a smaller one.  */
+              gimple_seq stmts = NULL;
+              def = vect_create_partial_epilog (def, vectype_out,
+                                                STMT_VINFO_REDUC_CODE (reduc_info),
+                                                &stmts);
+              /* Adjust the input so we pick up the partially reduced value
+                 for the skip edge in vect_create_epilog_for_reduction.  */
+              accumulator->reduc_input = def;
+              if (loop_vinfo->main_loop_edge)
+                {
+                  /* While we'd like to insert on the edge this will split
+                     blocks and disturb bookkeeping, we also will eventually
+                     need this on the skip edge.  Rely on sinking to
+                     fixup optimal placement and insert in the pred.  */
+                  gimple_stmt_iterator gsi
+                    = gsi_last_bb (loop_vinfo->main_loop_edge->src);
+                  /* Insert before a cond that eventually skips the
+                     epilogue.  */
+                  if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
+                    gsi_prev (&gsi);
+                  gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
+                }
+              else
+                gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
+                                                  stmts);
+            }
           if (loop_vinfo->main_loop_edge)
             vec_initial_defs[0]
-              = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
+              = vect_get_main_loop_result (loop_vinfo, def,
                                            vec_initial_defs[0]);
           else
-            vec_initial_defs.safe_push (accumulator->reduc_input);
-          gcc_assert (vec_initial_defs.length () == 1);
+            vec_initial_defs.safe_push (def);
         }
 
   /* Generate the reduction PHIs upfront.  */
-- 
2.26.2
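
[Editorial illustration, not part of the posted patch.]  The effect of
the patch on a sum reduction can be modelled in plain C.  The sketch
below is only an analogy under stated assumptions: arrays stand in for
vector registers, the lane counts 8 and 4 presumably match what
-mavx2 -mno-avx512f selects for int in the testcase, and the names
partial_reduce_add and sum_reusing_accumulator are hypothetical.  The
first function mirrors the halving loop of vect_create_partial_epilog
for a PLUS reduction; the second shows the main-loop accumulator being
narrowed once and then carried into the narrower epilogue loop.

```c
#include <assert.h>
#include <stddef.h>

/* Fold the upper half of VEC onto the lower half until only NUNITS1
   lanes remain.  This models the halving loop of
   vect_create_partial_epilog for a PLUS reduction; the real function
   emits GIMPLE statements instead of computing values.
   Assumption of this sketch: both lane counts are powers of two.  */
static void
partial_reduce_add (int *vec, unsigned nunits, unsigned nunits1)
{
  assert (nunits1 > 0 && nunits % nunits1 == 0);
  while (nunits > nunits1)
    {
      nunits /= 2;
      for (unsigned i = 0; i < nunits; ++i)
        vec[i] += vec[i + nunits];
    }
}

/* Sum N ints starting from INIT using an 8-lane main loop, a 4-lane
   epilogue that re-uses the main-loop accumulator, and a scalar tail.  */
static int
sum_reusing_accumulator (const int *a, size_t n, int init)
{
  int acc[8] = { 0 };
  size_t i = 0;

  /* Main vector loop, 8 lanes per iteration.  */
  for (; i + 8 <= n; i += 8)
    for (unsigned l = 0; l < 8; ++l)
      acc[l] += a[i + l];

  /* Narrow the accumulator from 8 to 4 lanes so the epilogue can
     continue from it instead of starting from zero.  */
  partial_reduce_add (acc, 8, 4);

  /* Vectorized epilogue, 4 lanes per iteration.  */
  for (; i + 4 <= n; i += 4)
    for (unsigned l = 0; l < 4; ++l)
      acc[l] += a[i + l];

  /* Final reduction to a scalar, then the scalar tail.  */
  partial_reduce_add (acc, 4, 1);
  int sum = init + acc[0];
  for (; i < n; ++i)
    sum += a[i];
  return sum;
}
```

With the testcase's trip count of 8*N + 4 = 260 this model runs 32
main-loop iterations and exactly one epilogue iteration, so no scalar
tail is left.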
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
@ 2021-07-13 14:17 ` Richard Sandiford
  2021-07-15 12:25 ` Christophe Lyon
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2021-07-13 14:17 UTC
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue.  That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There are costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loop's cost.
>
> Richard - did I figure the correct place to adjust?  I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're consciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).

Yeah.  IMO it'd be a bit cleaner to store the new accumulator directly
in the reduc_info, but I don't feel strongly about it.  Apart from that
and a minor nit below, it looks good to me FWIW.

(At some point it'd be good for reduc_info to be its own structure,
separate from stmt_vec_info, so that there's less of a cost associated
with storing more data there.)

Thanks,
Richard

> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
> > 2021-07-13 Richard Biener <rguenther@suse.de> > > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle > vector types where the old vector type has a multiple of > the new vector type elements. > (vect_create_partial_epilog): New function, split out from... > (vect_create_epilog_for_reduction): ... here. > (vect_transform_cycle_phi): Reduce the re-used accumulator > to the new vector type. > > * gcc.target/i386/vect-reduc-1.c: New testcase. > --- > gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++ > gcc/tree-vect-loop.c | 223 ++++++++++++------- > 2 files changed, 155 insertions(+), 85 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c > > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > new file mode 100644 > index 00000000000..9ee9ba4e736 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */ > + > +#define N 32 > +int foo (int *a, int n) > +{ > + int sum = 1; > + for (int i = 0; i < 8*N + 4; ++i) > + sum += a[i]; > + return sum; > +} > + > +/* The reduction epilog should be vectorized and the accumulator > + re-used. */ > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */ > +/* { dg-final { scan-assembler-times "psrl" 2 } } */ > +/* { dg-final { scan-assembler-times "padd" 5 } } */ > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 8c27d75f889..98e2a845629 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo, > ones as well. 
*/ > tree vectype = STMT_VINFO_VECTYPE (reduc_info); > tree old_vectype = TREE_TYPE (accumulator->reduc_input); > - if (!useless_type_conversion_p (old_vectype, vectype)) > + if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype), > + TYPE_VECTOR_SUBPARTS (vectype))) > return false; > > /* Non-SLP reductions might apply an adjustment after the reduction The comment above this needs updating too. > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo, > return true; > } > > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation > + CODE emitting stmts before GSI. Returns a vector def of VECTYPE. */ > + > +static tree > +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code, > + gimple_seq *seq) > +{ > + unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant (); > + unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); > + tree stype = TREE_TYPE (vectype); > + tree new_temp = vec_def; > + while (nunits > nunits1) > + { > + nunits /= 2; > + tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype), > + stype, nunits); > + unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1)); > + > + /* The target has to make sure we support lowpart/highpart > + extraction, either via direct vector extract or through > + an integer mode punning. */ > + tree dst1, dst2; > + gimple *epilog_stmt; > + if (convert_optab_handler (vec_extract_optab, > + TYPE_MODE (TREE_TYPE (new_temp)), > + TYPE_MODE (vectype1)) > + != CODE_FOR_nothing) > + { > + /* Extract sub-vectors directly once vec_extract becomes > + a conversion optab. 
*/ > + dst1 = make_ssa_name (vectype1); > + epilog_stmt > + = gimple_build_assign (dst1, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, vectype1, > + new_temp, TYPE_SIZE (vectype1), > + bitsize_int (0))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + dst2 = make_ssa_name (vectype1); > + epilog_stmt > + = gimple_build_assign (dst2, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, vectype1, > + new_temp, TYPE_SIZE (vectype1), > + bitsize_int (bitsize))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + } > + else > + { > + /* Extract via punning to appropriately sized integer mode > + vector. */ > + tree eltype = build_nonstandard_integer_type (bitsize, 1); > + tree etype = build_vector_type (eltype, 2); > + gcc_assert (convert_optab_handler (vec_extract_optab, > + TYPE_MODE (etype), > + TYPE_MODE (eltype)) > + != CODE_FOR_nothing); > + tree tem = make_ssa_name (etype); > + epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR, > + build1 (VIEW_CONVERT_EXPR, > + etype, new_temp)); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + new_temp = tem; > + tem = make_ssa_name (eltype); > + epilog_stmt > + = gimple_build_assign (tem, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, eltype, > + new_temp, TYPE_SIZE (eltype), > + bitsize_int (0))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + dst1 = make_ssa_name (vectype1); > + epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR, > + build1 (VIEW_CONVERT_EXPR, > + vectype1, tem)); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + tem = make_ssa_name (eltype); > + epilog_stmt > + = gimple_build_assign (tem, BIT_FIELD_REF, > + build3 (BIT_FIELD_REF, eltype, > + new_temp, TYPE_SIZE (eltype), > + bitsize_int (bitsize))); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + dst2 = make_ssa_name (vectype1); > + epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR, > + build1 (VIEW_CONVERT_EXPR, > + vectype1, tem)); > + 
gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + } > + > + new_temp = make_ssa_name (vectype1); > + epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2); > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > + } > + > + return new_temp; > +} > + > /* Function vect_create_epilog_for_reduction > > Create code at the loop-epilog to finalize the result of a reduction > @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, > > /* First reduce the vector to the desired vector size we should > do shift reduction on by combining upper and lower halves. */ > - new_temp = reduc_inputs[0]; > - while (nunits > nunits1) > - { > - nunits /= 2; > - vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype), > - stype, nunits); > - unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1)); > - > - /* The target has to make sure we support lowpart/highpart > - extraction, either via direct vector extract or through > - an integer mode punning. */ > - tree dst1, dst2; > - if (convert_optab_handler (vec_extract_optab, > - TYPE_MODE (TREE_TYPE (new_temp)), > - TYPE_MODE (vectype1)) > - != CODE_FOR_nothing) > - { > - /* Extract sub-vectors directly once vec_extract becomes > - a conversion optab. */ > - dst1 = make_ssa_name (vectype1); > - epilog_stmt > - = gimple_build_assign (dst1, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, vectype1, > - new_temp, TYPE_SIZE (vectype1), > - bitsize_int (0))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - dst2 = make_ssa_name (vectype1); > - epilog_stmt > - = gimple_build_assign (dst2, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, vectype1, > - new_temp, TYPE_SIZE (vectype1), > - bitsize_int (bitsize))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - } > - else > - { > - /* Extract via punning to appropriately sized integer mode > - vector. 
*/ > - tree eltype = build_nonstandard_integer_type (bitsize, 1); > - tree etype = build_vector_type (eltype, 2); > - gcc_assert (convert_optab_handler (vec_extract_optab, > - TYPE_MODE (etype), > - TYPE_MODE (eltype)) > - != CODE_FOR_nothing); > - tree tem = make_ssa_name (etype); > - epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR, > - build1 (VIEW_CONVERT_EXPR, > - etype, new_temp)); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - new_temp = tem; > - tem = make_ssa_name (eltype); > - epilog_stmt > - = gimple_build_assign (tem, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, eltype, > - new_temp, TYPE_SIZE (eltype), > - bitsize_int (0))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - dst1 = make_ssa_name (vectype1); > - epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR, > - build1 (VIEW_CONVERT_EXPR, > - vectype1, tem)); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - tem = make_ssa_name (eltype); > - epilog_stmt > - = gimple_build_assign (tem, BIT_FIELD_REF, > - build3 (BIT_FIELD_REF, eltype, > - new_temp, TYPE_SIZE (eltype), > - bitsize_int (bitsize))); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - dst2 = make_ssa_name (vectype1); > - epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR, > - build1 (VIEW_CONVERT_EXPR, > - vectype1, tem)); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - } > - > - new_temp = make_ssa_name (vectype1); > - epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2); > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT); > - reduc_inputs[0] = new_temp; > - } > + gimple_seq stmts = NULL; > + new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1, > + code, &stmts); > + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > + reduc_inputs[0] = new_temp; > > if (reduce_with_shift && !slp_reduc) > { > @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo, > > if (auto 
*accumulator = reduc_info->reused_accumulator) > { > + tree def = accumulator->reduc_input; > + unsigned int nreduc; > + bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)), > + TYPE_VECTOR_SUBPARTS (vectype_out), > + &nreduc); > + gcc_assert (res); > + if (nreduc != 1) > + { > + /* Reduce the single vector to a smaller one. */ > + gimple_seq stmts = NULL; > + def = vect_create_partial_epilog (def, vectype_out, > + STMT_VINFO_REDUC_CODE (reduc_info), > + &stmts); > + /* Adjust the input so we pick up the partially reduced value > + for the skip edge in vect_create_epilog_for_reduction. */ > + accumulator->reduc_input = def; > + if (loop_vinfo->main_loop_edge) > + { > + /* While we'd like to insert on the edge this will split > + blocks and disturb bookkeeping, we also will eventually > + need this on the skip edge. Rely on sinking to > + fixup optimal placement and insert in the pred. */ > + gimple_stmt_iterator gsi > + = gsi_last_bb (loop_vinfo->main_loop_edge->src); > + /* Insert before a cond that eventually skips the > + epilogue. */ > + if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi))) > + gsi_prev (&gsi); > + gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING); > + } > + else > + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), > + stmts); > + } > if (loop_vinfo->main_loop_edge) > vec_initial_defs[0] > - = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input, > + = vect_get_main_loop_result (loop_vinfo, def, > vec_initial_defs[0]); > else > - vec_initial_defs.safe_push (accumulator->reduc_input); > - gcc_assert (vec_initial_defs.length () == 1); > + vec_initial_defs.safe_push (def); > } > > /* Generate the reduction PHIs upfront. */ ^ permalink raw reply [flat|nested] 5+ messages in thread
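
[Editorial illustration, not part of the thread.]  Both the reuse test
in vect_find_reusable_accumulator and the nreduc computation in
vect_transform_cycle_phi rest on the same lane-count relation: the old
vector must hold a constant multiple of the new vector's lanes.  The
sketch below models that relation for fixed-length vectors only.  The
helper lanes_constant_multiple_p is a hypothetical stand-in for GCC's
poly_int-based constant_multiple_p, and halving_steps is likewise an
illustrative name; neither exists in GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed-length stand-in for constant_multiple_p: return
   true and store the factor in *MULT if A is an exact multiple of a
   nonzero B.  */
static bool
lanes_constant_multiple_p (unsigned a, unsigned b, unsigned *mult)
{
  if (b == 0 || a % b != 0)
    return false;
  *mult = a / b;
  return true;
}

/* Number of halving steps needed to narrow OLD_NUNITS lanes down to
   NEW_NUNITS lanes.  Assumption of this sketch: the factor between
   them is a power of two, as with 256-bit and 128-bit vectors.  */
static unsigned
halving_steps (unsigned old_nunits, unsigned new_nunits)
{
  unsigned nreduc;
  bool res = lanes_constant_multiple_p (old_nunits, new_nunits, &nreduc);
  assert (res);
  (void) res;

  unsigned steps = 0;
  while (nreduc > 1)
    {
      nreduc /= 2;
      ++steps;
    }
  return steps;
}
```

A factor of 1 means the accumulator already has the epilogue's lane
count and no partial reduction is needed, which corresponds to the
nreduc != 1 guard in the patch.  Note that this relation only compares
lane counts and says nothing about the element types of the two
vectors.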
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
  2021-07-13 14:17 ` Richard Sandiford
@ 2021-07-15 12:25 ` Christophe Lyon
  2021-07-15 12:34   ` Richard Biener
  1 sibling, 1 reply; 5+ messages in thread
From: Christophe Lyon @ 2021-07-15 12:25 UTC
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford

Hi,

On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de> wrote:

> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue.  That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There are costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loop's cost.
>
> Richard - did I figure the correct place to adjust?  I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're consciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).
>
> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13  Richard Biener  <rguenther@suse.de>
>
>         * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
>         vector types where the old vector type has a multiple of
>         the new vector type elements.
>         (vect_create_partial_epilog): New function, split out from...
>         (vect_create_epilog_for_reduction): ... here.
>         (vect_transform_cycle_phi): Reduce the re-used accumulator
>         to the new vector type.
>
>         * gcc.target/i386/vect-reduc-1.c: New testcase.
>

This patch is causing regressions on aarch64:

FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in 'PHI' argument 1
vector(2) unsigned int
vector(2) int
_91 = PHI <_90(17), _83(11)>
during GIMPLE pass: vect
dump file: ./pr92324-4.c.167t.vect
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error: verify_gimple failed
0xe6438e verify_gimple_in_cfg(function*, bool)
        /gcc/tree-cfg.c:5535
0xd13902 execute_function_todo
        /gcc/passes.c:2042
0xd142a5 execute_todo
        /gcc/passes.c:2096

FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

Thanks,

Christophe

> ---
>  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
>  gcc/tree-vect-loop.c                         | 223 ++++++++++++-------
>  2 files changed, 155 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 00000000000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> +  int sum = 1;
> +  for (int i = 0; i < 8*N + 4; ++i)
> +    sum += a[i];
> +  return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> +   re-used.
*/
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>       ones as well.  */
>    tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>    tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> -  if (!useless_type_conversion_p (old_vectype, vectype))
> +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> +                            TYPE_VECTOR_SUBPARTS (vectype)))
>      return false;
>
>    /* Non-SLP reductions might apply an adjustment after the reduction
> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>    return true;
>  }
>
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> code,
> +                            gimple_seq *seq)
> +{
> +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (vec_def)).to_constant ();
> +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +  tree stype = TREE_TYPE (vectype);
> +  tree new_temp = vec_def;
> +  while (nunits > nunits1)
> +    {
> +      nunits /= 2;
> +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> (vectype),
> +                                                           stype, nunits);
> +      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> +
> +      /* The target has to make sure we support lowpart/highpart
> +         extraction, either via direct vector extract or through
> +         an integer mode punning.  */
> +      tree dst1, dst2;
> +      gimple *epilog_stmt;
> +      if (convert_optab_handler (vec_extract_optab,
> +                                 TYPE_MODE (TREE_TYPE (new_temp)),
> +                                 TYPE_MODE (vectype1))
> +          != CODE_FOR_nothing)
> +        {
> +          /* Extract sub-vectors directly once vec_extract becomes
> +             a conversion optab.  */
> +          dst1 = make_ssa_name (vectype1);
> +          epilog_stmt
> +            = gimple_build_assign (dst1, BIT_FIELD_REF,
> +                                   build3 (BIT_FIELD_REF, vectype1,
> +                                           new_temp, TYPE_SIZE
> (vectype1),
> +                                           bitsize_int (0)));
> +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +          dst2 = make_ssa_name (vectype1);
> +          epilog_stmt
> +            = gimple_build_assign (dst2, BIT_FIELD_REF,
> +                                   build3 (BIT_FIELD_REF, vectype1,
> +                                           new_temp, TYPE_SIZE
> (vectype1),
> +                                           bitsize_int (bitsize)));
> +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +        }
> +      else
> +        {
> +          /* Extract via punning to appropriately sized integer mode
> +             vector.  */
> +          tree eltype = build_nonstandard_integer_type (bitsize, 1);
> +          tree etype = build_vector_type (eltype, 2);
> +          gcc_assert (convert_optab_handler (vec_extract_optab,
> +                                             TYPE_MODE (etype),
> +                                             TYPE_MODE (eltype))
> +                      != CODE_FOR_nothing);
> +          tree tem = make_ssa_name (etype);
> +          epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> +                                             build1 (VIEW_CONVERT_EXPR,
> +                                                     etype, new_temp));
> +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +          new_temp = tem;
> +          tem = make_ssa_name (eltype);
> +          epilog_stmt
> +            = gimple_build_assign (tem, BIT_FIELD_REF,
> +                                   build3 (BIT_FIELD_REF, eltype,
> +                                           new_temp, TYPE_SIZE (eltype),
> +                                           bitsize_int (0)));
> +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +          dst1 = make_ssa_name (vectype1);
> +          epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> +                                             build1 (VIEW_CONVERT_EXPR,
> +                                                     vectype1, tem));
> +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +          tem = make_ssa_name (eltype);
> +          epilog_stmt
> +            = gimple_build_assign (tem, BIT_FIELD_REF,
> +                                   build3 (BIT_FIELD_REF, eltype,
> +                                           new_temp, TYPE_SIZE (eltype),
> +                                           bitsize_int (bitsize)));
> +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +          dst2 = make_ssa_name (vectype1);
> +          epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> +                                             build1 (VIEW_CONVERT_EXPR,
> +                                                     vectype1, tem));
> +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +        }
> +
> +      new_temp = make_ssa_name (vectype1);
> +      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> +      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +    }
> +
> +  return new_temp;
> +}
> +
>  /* Function vect_create_epilog_for_reduction
>
>     Create code at the loop-epilog to finalize the result of a reduction
> @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info
> loop_vinfo,
>
>        /* First reduce the vector to the desired vector size we should
>           do shift reduction on by combining upper and lower halves.  */
> -      new_temp = reduc_inputs[0];
> -      while (nunits > nunits1)
> -        {
> -          nunits /= 2;
> -          vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> (vectype),
> -                                                          stype, nunits);
> -          unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> -
> -          /* The target has to make sure we support lowpart/highpart
> -             extraction, either via direct vector extract or through
> -             an integer mode punning.  */
> -          tree dst1, dst2;
> -          if (convert_optab_handler (vec_extract_optab,
> -                                     TYPE_MODE (TREE_TYPE (new_temp)),
> -                                     TYPE_MODE (vectype1))
> -              != CODE_FOR_nothing)
> -            {
> -              /* Extract sub-vectors directly once vec_extract becomes
> -                 a conversion optab.  */
> -              dst1 = make_ssa_name (vectype1);
> -              epilog_stmt
> -                  = gimple_build_assign (dst1, BIT_FIELD_REF,
> -                                         build3 (BIT_FIELD_REF, vectype1,
> -                                                 new_temp, TYPE_SIZE
> (vectype1),
> -                                                 bitsize_int (0)));
> -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -              dst2 = make_ssa_name (vectype1);
> -              epilog_stmt
> -                  = gimple_build_assign (dst2, BIT_FIELD_REF,
> -                                         build3 (BIT_FIELD_REF, vectype1,
> -                                                 new_temp, TYPE_SIZE
> (vectype1),
> -                                                 bitsize_int (bitsize)));
> -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -            }
> -          else
> -            {
> -              /* Extract via punning to appropriately sized integer mode
> -                 vector.  */
> -              tree eltype = build_nonstandard_integer_type (bitsize, 1);
> -              tree etype = build_vector_type (eltype, 2);
> -              gcc_assert (convert_optab_handler (vec_extract_optab,
> -                                                 TYPE_MODE (etype),
> -                                                 TYPE_MODE (eltype))
> -                          != CODE_FOR_nothing);
> -              tree tem = make_ssa_name (etype);
> -              epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> -                                                 build1 (VIEW_CONVERT_EXPR,
> -                                                         etype, new_temp));
> -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -              new_temp = tem;
> -              tem = make_ssa_name (eltype);
> -              epilog_stmt
> -                  = gimple_build_assign (tem, BIT_FIELD_REF,
> -                                         build3 (BIT_FIELD_REF, eltype,
> -                                                 new_temp, TYPE_SIZE
> (eltype),
> -                                                 bitsize_int (0)));
> -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -              dst1 = make_ssa_name (vectype1);
> -              epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> -                                                 build1 (VIEW_CONVERT_EXPR,
> -                                                         vectype1, tem));
> -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -              tem = make_ssa_name (eltype);
> -              epilog_stmt
> -                  = gimple_build_assign (tem, BIT_FIELD_REF,
> -                                         build3 (BIT_FIELD_REF, eltype,
> -                                                 new_temp, TYPE_SIZE
> (eltype),
> -                                                 bitsize_int (bitsize)));
> -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -              dst2 = make_ssa_name (vectype1);
> -              epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> -                                                 build1 (VIEW_CONVERT_EXPR,
> -                                                         vectype1, tem));
> -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -            }
> -
> -          new_temp = make_ssa_name (vectype1);
> -          epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> -          gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -          reduc_inputs[0] = new_temp;
> -        }
> +      gimple_seq stmts = NULL;
> +      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> +                                             code, &stmts);
> +      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> +      reduc_inputs[0] = new_temp;
>
>        if (reduce_with_shift && !slp_reduc)
>          {
> @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>
>    if (auto *accumulator = reduc_info->reused_accumulator)
>      {
> +      tree def = accumulator->reduc_input;
> +      unsigned int nreduc;
> +      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (def)),
> +                                      TYPE_VECTOR_SUBPARTS (vectype_out),
> +                                      &nreduc);
> +      gcc_assert (res);
> +      if (nreduc != 1)
> +        {
> +          /* Reduce the single vector to a smaller one.  */
> +          gimple_seq stmts = NULL;
> +          def = vect_create_partial_epilog (def, vectype_out,
> +                                            STMT_VINFO_REDUC_CODE
> (reduc_info),
> +                                            &stmts);
> +          /* Adjust the input so we pick up the partially reduced value
> +             for the skip edge in vect_create_epilog_for_reduction.  */
> +          accumulator->reduc_input = def;
> +          if (loop_vinfo->main_loop_edge)
> +            {
> +              /* While we'd like to insert on the edge this will split
> +                 blocks and disturb bookkeeping, we also will eventually
> +                 need this on the skip edge.  Rely on sinking to
> +                 fixup optimal placement and insert in the pred.  */
> +              gimple_stmt_iterator gsi
> +                = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> +              /* Insert before a cond that eventually skips the
> +                 epilogue.  */
> +              if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> +                gsi_prev (&gsi);
> +              gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> +            }
> +          else
> +            gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> +                                              stmts);
> +        }
>        if (loop_vinfo->main_loop_edge)
>          vec_initial_defs[0]
> -          = vect_get_main_loop_result (loop_vinfo,
> accumulator->reduc_input,
> +          = vect_get_main_loop_result (loop_vinfo, def,
>                                         vec_initial_defs[0]);
>        else
> -        vec_initial_defs.safe_push (accumulator->reduc_input);
> -      gcc_assert (vec_initial_defs.length () == 1);
> +        vec_initial_defs.safe_push (def);
>      }
>
>    /* Generate the reduction PHIs upfront.  */
> --
> 2.26.2
>
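For readers following the patch: vect_create_partial_epilog narrows a wide accumulator by repeatedly splitting it into a lower and an upper half and combining the two with the reduction code. Below is a minimal scalar sketch of that halving loop for a PLUS reduction. The function partial_reduce_add and its array-of-lanes model are illustrative assumptions, not GCC code, and the sketch ignores how the halves are actually extracted (vec_extract or integer punning).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: reduce the first NUNITS entries of
   LANES down to NUNITS1 entries by repeatedly adding the upper half onto
   the lower half.  This mirrors the shape of the patch's
   "while (nunits > nunits1)" loop for a PLUS reduction.  NUNITS is assumed
   to be NUNITS1 times a power of two.  */
void
partial_reduce_add (int *lanes, size_t nunits, size_t nunits1)
{
  assert (nunits1 > 0 && nunits >= nunits1);
  while (nunits > nunits1)
    {
      /* Each step halves the lane count, as "nunits /= 2" does.  */
      assert (nunits % 2 == 0);
      nunits /= 2;
      /* lanes[0..nunits) plays the role of dst1 (low half) and
	 lanes[nunits..2*nunits) the role of dst2 (high half).  */
      for (size_t i = 0; i < nunits; ++i)
	lanes[i] += lanes[i + nunits];
    }
}
```

With eight lanes holding 1 through 8 and a four-lane target, one halving step leaves 6, 8, 10, 12 in the low four lanes, which is the kind of value a narrower epilogue loop could keep accumulating into.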
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-15 12:25 ` Christophe Lyon
@ 2021-07-15 12:34   ` Richard Biener
  2021-07-15 15:19     ` Christophe Lyon
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2021-07-15 12:34 UTC
To: Christophe Lyon; +Cc: GCC Patches, Richard Sandiford

On Thu, 15 Jul 2021, Christophe Lyon wrote:

> Hi,
>
>
>
> On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de> wrote:
>
> > The following adds support for re-using the vector reduction def
> > from the main loop in vectorized epilogue loops on architectures
> > which use different vector sizes for the epilogue.  That's only
> > x86 as far as I am aware.
> >
> > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > regtest in progress.
> >
> > There's costing issues on x86 which usually prevent vectorizing
> > an epilogue with a reduction, at least for loops that only
> > have a reduction - it could be mitigated by not accounting for
> > the epilogue there if we can compute that we can re-use the
> > main loop's cost.
> >
> > Richard - did I figure the correct place to adjust?  I guess
> > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > for re-use by the skip code in vect_create_epilog_for_reduction
> > is a bit awkward but at least we're consciously doing
> > vect_create_epilog_for_reduction last (via vectorizing live
> > operations).
> >
> > OK in the unlikely case all testing succeeds (I also want to
> > run it through SPEC with/without -fno-vect-cost-model which
> > will take some time)?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-13  Richard Biener  <rguenther@suse.de>
> >
> >         * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> >         vector types where the old vector type has a multiple of
> >         the new vector type elements.
> >         (vect_create_partial_epilog): New function, split out from...
> >         (vect_create_epilog_for_reduction): ... here.
> >         (vect_transform_cycle_phi): Reduce the re-used accumulator
> >         to the new vector type.
> >
> >         * gcc.target/i386/vect-reduc-1.c: New testcase.
> >
>
> This patch is causing regressions on aarch64:
> FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> 'PHI' argument 1
> vector(2) unsigned int
> vector(2) int
> _91 = PHI <_90(17), _83(11)>
> during GIMPLE pass: vect
> dump file: ./pr92324-4.c.167t.vect
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> verify_gimple failed
> 0xe6438e verify_gimple_in_cfg(function*, bool)
>         /gcc/tree-cfg.c:5535
> 0xd13902 execute_function_todo
>         /gcc/passes.c:2042
> 0xd142a5 execute_todo
>         /gcc/passes.c:2096
>
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

What exact options do you pass to cc1 to get this?  Can you track this
in a PR please?

Thanks,
Richard.

> Thanks,
>
> Christophe

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

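The verifier output in the aarch64 report shows two vector types with the same lane count but different element signedness meeting in a PHI. One possible reading, offered as an assumption and not as a confirmed diagnosis, is that a test comparing only lane counts (as the patch's constant_multiple_p check on TYPE_VECTOR_SUBPARTS does) accepts such a pair, whereas the replaced useless_type_conversion_p test compared the types themselves. The sketch below models that difference with a hypothetical struct; none of the names are GCC APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vector type: lane count plus element
   signedness.  Real GCC types carry far more information.  */
struct vec_type
{
  unsigned nunits;
  bool elt_unsigned;
};

/* Models a lane-count-only test: OLD_TYPE has a whole multiple of
   NEW_TYPE's lanes.  Element types are not looked at.  */
bool
lanes_multiple_p (struct vec_type old_type, struct vec_type new_type)
{
  assert (new_type.nunits != 0);
  return old_type.nunits % new_type.nunits == 0;
}

/* Models a stricter, purely illustrative test that additionally requires
   matching element signedness.  */
bool
reusable_p (struct vec_type old_type, struct vec_type new_type)
{
  return lanes_multiple_p (old_type, new_type)
	 && old_type.elt_unsigned == new_type.elt_unsigned;
}
```

Under this model a vector(2) unsigned int accumulator and a vector(2) int reduction pass the lane-count test but fail the stricter one, which is the combination the verifier rejected.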
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-15 12:34   ` Richard Biener
@ 2021-07-15 15:19     ` Christophe Lyon
  0 siblings, 0 replies; 5+ messages in thread
From: Christophe Lyon @ 2021-07-15 15:19 UTC
To: Richard Biener; +Cc: GCC Patches, Richard Sandiford

On Thu, Jul 15, 2021 at 2:34 PM Richard Biener <rguenther@suse.de> wrote:

> On Thu, 15 Jul 2021, Christophe Lyon wrote:
>
> > Hi,
> >
> >
> >
> > On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de>
> wrote:
> >
> > > The following adds support for re-using the vector reduction def
> > > from the main loop in vectorized epilogue loops on architectures
> > > which use different vector sizes for the epilogue.  That's only
> > > x86 as far as I am aware.
> > >
> > > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > > regtest in progress.
> > >
> > > There's costing issues on x86 which usually prevent vectorizing
> > > an epilogue with a reduction, at least for loops that only
> > > have a reduction - it could be mitigated by not accounting for
> > > the epilogue there if we can compute that we can re-use the
> > > main loop's cost.
> > >
> > > Richard - did I figure the correct place to adjust?  I guess
> > > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > > for re-use by the skip code in vect_create_epilog_for_reduction
> > > is a bit awkward but at least we're consciously doing
> > > vect_create_epilog_for_reduction last (via vectorizing live
> > > operations).
> > >
> > > OK in the unlikely case all testing succeeds (I also want to
> > > run it through SPEC with/without -fno-vect-cost-model which
> > > will take some time)?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-13  Richard Biener  <rguenther@suse.de>
> > >
> > >         * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > >         vector types where the old vector type has a multiple of
> > >         the new vector type elements.
> > >         (vect_create_partial_epilog): New function, split out from...
> > >         (vect_create_epilog_for_reduction): ... here.
> > >         (vect_transform_cycle_phi): Reduce the re-used accumulator
> > >         to the new vector type.
> > >
> > >         * gcc.target/i386/vect-reduc-1.c: New testcase.
> > >
> >
> > This patch is causing regressions on aarch64:
> > FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> > FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> > FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> > Excess errors:
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> > 'PHI' argument 1
> > vector(2) unsigned int
> > vector(2) int
> > _91 = PHI <_90(17), _83(11)>
> > during GIMPLE pass: vect
> > dump file: ./pr92324-4.c.167t.vect
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> > verify_gimple failed
> > 0xe6438e verify_gimple_in_cfg(function*, bool)
> >         /gcc/tree-cfg.c:5535
> > 0xd13902 execute_function_todo
> >         /gcc/passes.c:2042
> > 0xd142a5 execute_todo
> >         /gcc/passes.c:2096
> >
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fminnmv
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fmaxnmv
>
> What exact options do you pass to cc1 to get this?  Can you track this
> in a PR please?
>
> Thanks,
> Richard.
> > Sure, I filed PR 101462 Christophe > > Thanks, > > > > Christophe > > > > > > > > > --- > > > gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++ > > > gcc/tree-vect-loop.c | 223 ++++++++++++------- > > > 2 files changed, 155 insertions(+), 85 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > > > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > > > new file mode 100644 > > > index 00000000000..9ee9ba4e736 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c > > > @@ -0,0 +1,17 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } > */ > > > + > > > +#define N 32 > > > +int foo (int *a, int n) > > > +{ > > > + int sum = 1; > > > + for (int i = 0; i < 8*N + 4; ++i) > > > + sum += a[i]; > > > + return sum; > > > +} > > > + > > > +/* The reduction epilog should be vectorized and the accumulator > > > + re-used. */ > > > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } > */ > > > +/* { dg-final { scan-assembler-times "psrl" 2 } } */ > > > +/* { dg-final { scan-assembler-times "padd" 5 } } */ > > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > > > index 8c27d75f889..98e2a845629 100644 > > > --- a/gcc/tree-vect-loop.c > > > +++ b/gcc/tree-vect-loop.c > > > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info > > > loop_vinfo, > > > ones as well. 
*/ > > > tree vectype = STMT_VINFO_VECTYPE (reduc_info); > > > tree old_vectype = TREE_TYPE (accumulator->reduc_input); > > > - if (!useless_type_conversion_p (old_vectype, vectype)) > > > + if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype), > > > + TYPE_VECTOR_SUBPARTS (vectype))) > > > return false; > > > > > > /* Non-SLP reductions might apply an adjustment after the reduction > > > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info > > > loop_vinfo, > > > return true; > > > } > > > > > > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation > > > + CODE emitting stmts before GSI. Returns a vector def of VECTYPE. > */ > > > + > > > +static tree > > > +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code > > > code, > > > + gimple_seq *seq) > > > +{ > > > + unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE > > > (vec_def)).to_constant (); > > > + unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); > > > + tree stype = TREE_TYPE (vectype); > > > + tree new_temp = vec_def; > > > + while (nunits > nunits1) > > > + { > > > + nunits /= 2; > > > + tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE > > > (vectype), > > > + stype, > nunits); > > > + unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1)); > > > + > > > + /* The target has to make sure we support lowpart/highpart > > > + extraction, either via direct vector extract or through > > > + an integer mode punning. */ > > > + tree dst1, dst2; > > > + gimple *epilog_stmt; > > > + if (convert_optab_handler (vec_extract_optab, > > > + TYPE_MODE (TREE_TYPE (new_temp)), > > > + TYPE_MODE (vectype1)) > > > + != CODE_FOR_nothing) > > > + { > > > + /* Extract sub-vectors directly once vec_extract becomes > > > + a conversion optab. 
*/ > > > + dst1 = make_ssa_name (vectype1); > > > + epilog_stmt > > > + = gimple_build_assign (dst1, BIT_FIELD_REF, > > > + build3 (BIT_FIELD_REF, vectype1, > > > + new_temp, TYPE_SIZE > > > (vectype1), > > > + bitsize_int (0))); > > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > > > + dst2 = make_ssa_name (vectype1); > > > + epilog_stmt > > > + = gimple_build_assign (dst2, BIT_FIELD_REF, > > > + build3 (BIT_FIELD_REF, vectype1, > > > + new_temp, TYPE_SIZE > > > (vectype1), > > > + bitsize_int (bitsize))); > > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > > > + } > > > + else > > > + { > > > + /* Extract via punning to appropriately sized integer mode > > > + vector. */ > > > + tree eltype = build_nonstandard_integer_type (bitsize, 1); > > > + tree etype = build_vector_type (eltype, 2); > > > + gcc_assert (convert_optab_handler (vec_extract_optab, > > > + TYPE_MODE (etype), > > > + TYPE_MODE (eltype)) > > > + != CODE_FOR_nothing); > > > + tree tem = make_ssa_name (etype); > > > + epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR, > > > + build1 (VIEW_CONVERT_EXPR, > > > + etype, new_temp)); > > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > > > + new_temp = tem; > > > + tem = make_ssa_name (eltype); > > > + epilog_stmt > > > + = gimple_build_assign (tem, BIT_FIELD_REF, > > > + build3 (BIT_FIELD_REF, eltype, > > > + new_temp, TYPE_SIZE > (eltype), > > > + bitsize_int (0))); > > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > > > + dst1 = make_ssa_name (vectype1); > > > + epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR, > > > + build1 (VIEW_CONVERT_EXPR, > > > + vectype1, tem)); > > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt); > > > + tem = make_ssa_name (eltype); > > > + epilog_stmt > > > + = gimple_build_assign (tem, BIT_FIELD_REF, > > > + build3 (BIT_FIELD_REF, eltype, > > > + new_temp, TYPE_SIZE > (eltype), > > > + bitsize_int (bitsize))); > > > + 
gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +          dst2 = make_ssa_name (vectype1);
> > > +          epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > > +                                             build1 (VIEW_CONVERT_EXPR,
> > > +                                                     vectype1, tem));
> > > +          gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +        }
> > > +
> > > +      new_temp = make_ssa_name (vectype1);
> > > +      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > > +      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +    }
> > > +
> > > +  return new_temp;
> > > +}
> > > +
> > >  /* Function vect_create_epilog_for_reduction
> > >
> > >     Create code at the loop-epilog to finalize the result of a reduction
> > > @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >
> > >        /* First reduce the vector to the desired vector size we should
> > >           do shift reduction on by combining upper and lower halves.  */
> > > -      new_temp = reduc_inputs[0];
> > > -      while (nunits > nunits1)
> > > -        {
> > > -          nunits /= 2;
> > > -          vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> > > -                                                          stype, nunits);
> > > -          unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > > -
> > > -          /* The target has to make sure we support lowpart/highpart
> > > -             extraction, either via direct vector extract or through
> > > -             an integer mode punning.  */
> > > -          tree dst1, dst2;
> > > -          if (convert_optab_handler (vec_extract_optab,
> > > -                                     TYPE_MODE (TREE_TYPE (new_temp)),
> > > -                                     TYPE_MODE (vectype1))
> > > -              != CODE_FOR_nothing)
> > > -            {
> > > -              /* Extract sub-vectors directly once vec_extract becomes
> > > -                 a conversion optab.  */
> > > -              dst1 = make_ssa_name (vectype1);
> > > -              epilog_stmt
> > > -                  = gimple_build_assign (dst1, BIT_FIELD_REF,
> > > -                                         build3 (BIT_FIELD_REF, vectype1,
> > > -                                                 new_temp, TYPE_SIZE (vectype1),
> > > -                                                 bitsize_int (0)));
> > > -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -              dst2 = make_ssa_name (vectype1);
> > > -              epilog_stmt
> > > -                  = gimple_build_assign (dst2, BIT_FIELD_REF,
> > > -                                         build3 (BIT_FIELD_REF, vectype1,
> > > -                                                 new_temp, TYPE_SIZE (vectype1),
> > > -                                                 bitsize_int (bitsize)));
> > > -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -            }
> > > -          else
> > > -            {
> > > -              /* Extract via punning to appropriately sized integer mode
> > > -                 vector.  */
> > > -              tree eltype = build_nonstandard_integer_type (bitsize, 1);
> > > -              tree etype = build_vector_type (eltype, 2);
> > > -              gcc_assert (convert_optab_handler (vec_extract_optab,
> > > -                                                 TYPE_MODE (etype),
> > > -                                                 TYPE_MODE (eltype))
> > > -                          != CODE_FOR_nothing);
> > > -              tree tem = make_ssa_name (etype);
> > > -              epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> > > -                                                 build1 (VIEW_CONVERT_EXPR,
> > > -                                                         etype, new_temp));
> > > -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -              new_temp = tem;
> > > -              tem = make_ssa_name (eltype);
> > > -              epilog_stmt
> > > -                  = gimple_build_assign (tem, BIT_FIELD_REF,
> > > -                                         build3 (BIT_FIELD_REF, eltype,
> > > -                                                 new_temp, TYPE_SIZE (eltype),
> > > -                                                 bitsize_int (0)));
> > > -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -              dst1 = make_ssa_name (vectype1);
> > > -              epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> > > -                                                 build1 (VIEW_CONVERT_EXPR,
> > > -                                                         vectype1, tem));
> > > -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -              tem = make_ssa_name (eltype);
> > > -              epilog_stmt
> > > -                  = gimple_build_assign (tem, BIT_FIELD_REF,
> > > -                                         build3 (BIT_FIELD_REF, eltype,
> > > -                                                 new_temp, TYPE_SIZE (eltype),
> > > -                                                 bitsize_int (bitsize)));
> > > -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -              dst2 = make_ssa_name (vectype1);
> > > -              epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > > -                                                 build1 (VIEW_CONVERT_EXPR,
> > > -                                                         vectype1, tem));
> > > -              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -            }
> > > -
> > > -          new_temp = make_ssa_name (vectype1);
> > > -          epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > > -          gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -          reduc_inputs[0] = new_temp;
> > > -        }
> > > +      gimple_seq stmts = NULL;
> > > +      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> > > +                                             code, &stmts);
> > > +      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > +      reduc_inputs[0] = new_temp;
> > >
> > >        if (reduce_with_shift && !slp_reduc)
> > >          {
> > > @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
> > >
> > >    if (auto *accumulator = reduc_info->reused_accumulator)
> > >      {
> > > +      tree def = accumulator->reduc_input;
> > > +      unsigned int nreduc;
> > > +      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
> > > +                                      TYPE_VECTOR_SUBPARTS (vectype_out),
> > > +                                      &nreduc);
> > > +      gcc_assert (res);
> > > +      if (nreduc != 1)
> > > +        {
> > > +          /* Reduce the single vector to a smaller one.  */
> > > +          gimple_seq stmts = NULL;
> > > +          def = vect_create_partial_epilog (def, vectype_out,
> > > +                                            STMT_VINFO_REDUC_CODE (reduc_info),
> > > +                                            &stmts);
> > > +          /* Adjust the input so we pick up the partially reduced value
> > > +             for the skip edge in vect_create_epilog_for_reduction.  */
> > > +          accumulator->reduc_input = def;
> > > +          if (loop_vinfo->main_loop_edge)
> > > +            {
> > > +              /* While we'd like to insert on the edge this will split
> > > +                 blocks and disturb bookkeeping, we also will eventually
> > > +                 need this on the skip edge.  Rely on sinking to
> > > +                 fixup optimal placement and insert in the pred.  */
> > > +              gimple_stmt_iterator gsi
> > > +                = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> > > +              /* Insert before a cond that eventually skips the
> > > +                 epilogue.  */
> > > +              if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> > > +                gsi_prev (&gsi);
> > > +              gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> > > +            }
> > > +          else
> > > +            gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> > > +                                              stmts);
> > > +        }
> > >        if (loop_vinfo->main_loop_edge)
> > >          vec_initial_defs[0]
> > > -          = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
> > > +          = vect_get_main_loop_result (loop_vinfo, def,
> > >                                         vec_initial_defs[0]);
> > >        else
> > > -        vec_initial_defs.safe_push (accumulator->reduc_input);
> > > -      gcc_assert (vec_initial_defs.length () == 1);
> > > +        vec_initial_defs.safe_push (def);
> > >      }
> > >
> > >    /* Generate the reduction PHIs upfront.  */
> > > --
> > > 2.26.2
> > >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
end of thread, other threads:[~2021-07-15 15:19 UTC | newest]

Thread overview: 5+ messages
2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
2021-07-13 14:17 ` Richard Sandiford
2021-07-15 12:25 ` Christophe Lyon
2021-07-15 12:34   ` Richard Biener
2021-07-15 15:19     ` Christophe Lyon