* [PATCH] Support reduction def re-use for epilogue with different vector size
@ 2021-07-13 12:09 Richard Biener
2021-07-13 14:17 ` Richard Sandiford
2021-07-15 12:25 ` Christophe Lyon
0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2021-07-13 12:09 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.sandiford
The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue. That's only
x86 as far as I am aware.
vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
regtest in progress.
There's costing issues on x86 which usually prevent vectorizing
an epilogue with a reduction, at least for loops that only
have a reduction - it could be mitigated by not accounting for
the epilogue there if we can compute that we can re-use the
main loops cost.
Richard - did I figure the correct place to adjust? I guess
adjusting accumulator->reduc_input in vect_transform_cycle_phi
for re-use by the skip code in vect_create_epilog_for_reduction
is a bit awkward but at least we're conciously doing
vect_create_epilog_for_reduction last (via vectorizing live
operations).
OK in the unlikely case all testing succeeds (I also want to
run it through SPEC with/without -fno-vect-cost-model which
will take some time)?
Thanks,
Richard.
2021-07-13 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
vector types where the old vector type has a multiple of
the new vector type elements.
(vect_create_partial_epilog): New function, split out from...
(vect_create_epilog_for_reduction): ... here.
(vect_transform_cycle_phi): Reduce the re-used accumulator
to the new vector type.
* gcc.target/i386/vect-reduc-1.c: New testcase.
---
gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++
gcc/tree-vect-loop.c | 223 ++++++++++++-------
2 files changed, 155 insertions(+), 85 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
new file mode 100644
index 00000000000..9ee9ba4e736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
+
+#define N 32
+int foo (int *a, int n)
+{
+ int sum = 1;
+ for (int i = 0; i < 8*N + 4; ++i)
+ sum += a[i];
+ return sum;
+}
+
+/* The reduction epilog should be vectorized and the accumulator
+ re-used. */
+/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-times "psrl" 2 } } */
+/* { dg-final { scan-assembler-times "padd" 5 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 8c27d75f889..98e2a845629 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
ones as well. */
tree vectype = STMT_VINFO_VECTYPE (reduc_info);
tree old_vectype = TREE_TYPE (accumulator->reduc_input);
- if (!useless_type_conversion_p (old_vectype, vectype))
+ if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
+ TYPE_VECTOR_SUBPARTS (vectype)))
return false;
/* Non-SLP reductions might apply an adjustment after the reduction
@@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
return true;
}
+/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
+ CODE emitting stmts before GSI. Returns a vector def of VECTYPE. */
+
+static tree
+vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
+ gimple_seq *seq)
+{
+ unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
+ unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+ tree stype = TREE_TYPE (vectype);
+ tree new_temp = vec_def;
+ while (nunits > nunits1)
+ {
+ nunits /= 2;
+ tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+ stype, nunits);
+ unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
+
+ /* The target has to make sure we support lowpart/highpart
+ extraction, either via direct vector extract or through
+ an integer mode punning. */
+ tree dst1, dst2;
+ gimple *epilog_stmt;
+ if (convert_optab_handler (vec_extract_optab,
+ TYPE_MODE (TREE_TYPE (new_temp)),
+ TYPE_MODE (vectype1))
+ != CODE_FOR_nothing)
+ {
+ /* Extract sub-vectors directly once vec_extract becomes
+ a conversion optab. */
+ dst1 = make_ssa_name (vectype1);
+ epilog_stmt
+ = gimple_build_assign (dst1, BIT_FIELD_REF,
+ build3 (BIT_FIELD_REF, vectype1,
+ new_temp, TYPE_SIZE (vectype1),
+ bitsize_int (0)));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ dst2 = make_ssa_name (vectype1);
+ epilog_stmt
+ = gimple_build_assign (dst2, BIT_FIELD_REF,
+ build3 (BIT_FIELD_REF, vectype1,
+ new_temp, TYPE_SIZE (vectype1),
+ bitsize_int (bitsize)));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ }
+ else
+ {
+ /* Extract via punning to appropriately sized integer mode
+ vector. */
+ tree eltype = build_nonstandard_integer_type (bitsize, 1);
+ tree etype = build_vector_type (eltype, 2);
+ gcc_assert (convert_optab_handler (vec_extract_optab,
+ TYPE_MODE (etype),
+ TYPE_MODE (eltype))
+ != CODE_FOR_nothing);
+ tree tem = make_ssa_name (etype);
+ epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
+ build1 (VIEW_CONVERT_EXPR,
+ etype, new_temp));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ new_temp = tem;
+ tem = make_ssa_name (eltype);
+ epilog_stmt
+ = gimple_build_assign (tem, BIT_FIELD_REF,
+ build3 (BIT_FIELD_REF, eltype,
+ new_temp, TYPE_SIZE (eltype),
+ bitsize_int (0)));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ dst1 = make_ssa_name (vectype1);
+ epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
+ build1 (VIEW_CONVERT_EXPR,
+ vectype1, tem));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ tem = make_ssa_name (eltype);
+ epilog_stmt
+ = gimple_build_assign (tem, BIT_FIELD_REF,
+ build3 (BIT_FIELD_REF, eltype,
+ new_temp, TYPE_SIZE (eltype),
+ bitsize_int (bitsize)));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ dst2 = make_ssa_name (vectype1);
+ epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
+ build1 (VIEW_CONVERT_EXPR,
+ vectype1, tem));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ }
+
+ new_temp = make_ssa_name (vectype1);
+ epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ }
+
+ return new_temp;
+}
+
/* Function vect_create_epilog_for_reduction
Create code at the loop-epilog to finalize the result of a reduction
@@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
/* First reduce the vector to the desired vector size we should
do shift reduction on by combining upper and lower halves. */
- new_temp = reduc_inputs[0];
- while (nunits > nunits1)
- {
- nunits /= 2;
- vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
- stype, nunits);
- unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
-
- /* The target has to make sure we support lowpart/highpart
- extraction, either via direct vector extract or through
- an integer mode punning. */
- tree dst1, dst2;
- if (convert_optab_handler (vec_extract_optab,
- TYPE_MODE (TREE_TYPE (new_temp)),
- TYPE_MODE (vectype1))
- != CODE_FOR_nothing)
- {
- /* Extract sub-vectors directly once vec_extract becomes
- a conversion optab. */
- dst1 = make_ssa_name (vectype1);
- epilog_stmt
- = gimple_build_assign (dst1, BIT_FIELD_REF,
- build3 (BIT_FIELD_REF, vectype1,
- new_temp, TYPE_SIZE (vectype1),
- bitsize_int (0)));
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- dst2 = make_ssa_name (vectype1);
- epilog_stmt
- = gimple_build_assign (dst2, BIT_FIELD_REF,
- build3 (BIT_FIELD_REF, vectype1,
- new_temp, TYPE_SIZE (vectype1),
- bitsize_int (bitsize)));
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- }
- else
- {
- /* Extract via punning to appropriately sized integer mode
- vector. */
- tree eltype = build_nonstandard_integer_type (bitsize, 1);
- tree etype = build_vector_type (eltype, 2);
- gcc_assert (convert_optab_handler (vec_extract_optab,
- TYPE_MODE (etype),
- TYPE_MODE (eltype))
- != CODE_FOR_nothing);
- tree tem = make_ssa_name (etype);
- epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
- build1 (VIEW_CONVERT_EXPR,
- etype, new_temp));
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- new_temp = tem;
- tem = make_ssa_name (eltype);
- epilog_stmt
- = gimple_build_assign (tem, BIT_FIELD_REF,
- build3 (BIT_FIELD_REF, eltype,
- new_temp, TYPE_SIZE (eltype),
- bitsize_int (0)));
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- dst1 = make_ssa_name (vectype1);
- epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
- build1 (VIEW_CONVERT_EXPR,
- vectype1, tem));
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- tem = make_ssa_name (eltype);
- epilog_stmt
- = gimple_build_assign (tem, BIT_FIELD_REF,
- build3 (BIT_FIELD_REF, eltype,
- new_temp, TYPE_SIZE (eltype),
- bitsize_int (bitsize)));
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- dst2 = make_ssa_name (vectype1);
- epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
- build1 (VIEW_CONVERT_EXPR,
- vectype1, tem));
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- }
-
- new_temp = make_ssa_name (vectype1);
- epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
- gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
- reduc_inputs[0] = new_temp;
- }
+ gimple_seq stmts = NULL;
+ new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
+ code, &stmts);
+ gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+ reduc_inputs[0] = new_temp;
if (reduce_with_shift && !slp_reduc)
{
@@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
if (auto *accumulator = reduc_info->reused_accumulator)
{
+ tree def = accumulator->reduc_input;
+ unsigned int nreduc;
+ bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
+ TYPE_VECTOR_SUBPARTS (vectype_out),
+ &nreduc);
+ gcc_assert (res);
+ if (nreduc != 1)
+ {
+ /* Reduce the single vector to a smaller one. */
+ gimple_seq stmts = NULL;
+ def = vect_create_partial_epilog (def, vectype_out,
+ STMT_VINFO_REDUC_CODE (reduc_info),
+ &stmts);
+ /* Adjust the input so we pick up the partially reduced value
+ for the skip edge in vect_create_epilog_for_reduction. */
+ accumulator->reduc_input = def;
+ if (loop_vinfo->main_loop_edge)
+ {
+ /* While we'd like to insert on the edge this will split
+ blocks and disturb bookkeeping, we also will eventually
+ need this on the skip edge. Rely on sinking to
+ fixup optimal placement and insert in the pred. */
+ gimple_stmt_iterator gsi
+ = gsi_last_bb (loop_vinfo->main_loop_edge->src);
+ /* Insert before a cond that eventually skips the
+ epilogue. */
+ if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
+ gsi_prev (&gsi);
+ gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
+ }
+ else
+ gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
+ stmts);
+ }
if (loop_vinfo->main_loop_edge)
vec_initial_defs[0]
- = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
+ = vect_get_main_loop_result (loop_vinfo, def,
vec_initial_defs[0]);
else
- vec_initial_defs.safe_push (accumulator->reduc_input);
- gcc_assert (vec_initial_defs.length () == 1);
+ vec_initial_defs.safe_push (def);
}
/* Generate the reduction PHIs upfront. */
--
2.26.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
@ 2021-07-13 14:17 ` Richard Sandiford
2021-07-15 12:25 ` Christophe Lyon
1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2021-07-13 14:17 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
Richard Biener <rguenther@suse.de> writes:
> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue. That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There's costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loops cost.
>
> Richard - did I figure the correct place to adjust? I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're conciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).
Yeah. IMO it'd be a bit cleaner to store the new accumulator directly
in the reduc_info, but I don't feel strongly about it.
Apart from that and a minor nit below, it looks good to me FWIW.
(At some point it'd be good for reduc_info to be its own structure,
separate from stmt_vec_info, so that there's less of a cost associated
with storing more data there.)
Thanks,
Richard
> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13 Richard Biener <rguenther@suse.de>
>
> * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> vector types where the old vector type has a multiple of
> the new vector type elements.
> (vect_create_partial_epilog): New function, split out from...
> (vect_create_epilog_for_reduction): ... here.
> (vect_transform_cycle_phi): Reduce the re-used accumulator
> to the new vector type.
>
> * gcc.target/i386/vect-reduc-1.c: New testcase.
> ---
> gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++
> gcc/tree-vect-loop.c | 223 ++++++++++++-------
> 2 files changed, 155 insertions(+), 85 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 00000000000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> + int sum = 1;
> + for (int i = 0; i < 8*N + 4; ++i)
> + sum += a[i];
> + return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> + re-used. */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
> ones as well. */
> tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> - if (!useless_type_conversion_p (old_vectype, vectype))
> + if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> + TYPE_VECTOR_SUBPARTS (vectype)))
> return false;
>
> /* Non-SLP reductions might apply an adjustment after the reduction
The comment above this needs updating too.
> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
> return true;
> }
>
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> + CODE emitting stmts before GSI. Returns a vector def of VECTYPE. */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
> + gimple_seq *seq)
> +{
> + unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
> + unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> + tree stype = TREE_TYPE (vectype);
> + tree new_temp = vec_def;
> + while (nunits > nunits1)
> + {
> + nunits /= 2;
> + tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> + stype, nunits);
> + unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> +
> + /* The target has to make sure we support lowpart/highpart
> + extraction, either via direct vector extract or through
> + an integer mode punning. */
> + tree dst1, dst2;
> + gimple *epilog_stmt;
> + if (convert_optab_handler (vec_extract_optab,
> + TYPE_MODE (TREE_TYPE (new_temp)),
> + TYPE_MODE (vectype1))
> + != CODE_FOR_nothing)
> + {
> + /* Extract sub-vectors directly once vec_extract becomes
> + a conversion optab. */
> + dst1 = make_ssa_name (vectype1);
> + epilog_stmt
> + = gimple_build_assign (dst1, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, vectype1,
> + new_temp, TYPE_SIZE (vectype1),
> + bitsize_int (0)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + dst2 = make_ssa_name (vectype1);
> + epilog_stmt
> + = gimple_build_assign (dst2, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, vectype1,
> + new_temp, TYPE_SIZE (vectype1),
> + bitsize_int (bitsize)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + }
> + else
> + {
> + /* Extract via punning to appropriately sized integer mode
> + vector. */
> + tree eltype = build_nonstandard_integer_type (bitsize, 1);
> + tree etype = build_vector_type (eltype, 2);
> + gcc_assert (convert_optab_handler (vec_extract_optab,
> + TYPE_MODE (etype),
> + TYPE_MODE (eltype))
> + != CODE_FOR_nothing);
> + tree tem = make_ssa_name (etype);
> + epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> + build1 (VIEW_CONVERT_EXPR,
> + etype, new_temp));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + new_temp = tem;
> + tem = make_ssa_name (eltype);
> + epilog_stmt
> + = gimple_build_assign (tem, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, eltype,
> + new_temp, TYPE_SIZE (eltype),
> + bitsize_int (0)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + dst1 = make_ssa_name (vectype1);
> + epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> + build1 (VIEW_CONVERT_EXPR,
> + vectype1, tem));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + tem = make_ssa_name (eltype);
> + epilog_stmt
> + = gimple_build_assign (tem, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, eltype,
> + new_temp, TYPE_SIZE (eltype),
> + bitsize_int (bitsize)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + dst2 = make_ssa_name (vectype1);
> + epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> + build1 (VIEW_CONVERT_EXPR,
> + vectype1, tem));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + }
> +
> + new_temp = make_ssa_name (vectype1);
> + epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + }
> +
> + return new_temp;
> +}
> +
> /* Function vect_create_epilog_for_reduction
>
> Create code at the loop-epilog to finalize the result of a reduction
> @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>
> /* First reduce the vector to the desired vector size we should
> do shift reduction on by combining upper and lower halves. */
> - new_temp = reduc_inputs[0];
> - while (nunits > nunits1)
> - {
> - nunits /= 2;
> - vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> - stype, nunits);
> - unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> -
> - /* The target has to make sure we support lowpart/highpart
> - extraction, either via direct vector extract or through
> - an integer mode punning. */
> - tree dst1, dst2;
> - if (convert_optab_handler (vec_extract_optab,
> - TYPE_MODE (TREE_TYPE (new_temp)),
> - TYPE_MODE (vectype1))
> - != CODE_FOR_nothing)
> - {
> - /* Extract sub-vectors directly once vec_extract becomes
> - a conversion optab. */
> - dst1 = make_ssa_name (vectype1);
> - epilog_stmt
> - = gimple_build_assign (dst1, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, vectype1,
> - new_temp, TYPE_SIZE (vectype1),
> - bitsize_int (0)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - dst2 = make_ssa_name (vectype1);
> - epilog_stmt
> - = gimple_build_assign (dst2, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, vectype1,
> - new_temp, TYPE_SIZE (vectype1),
> - bitsize_int (bitsize)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - }
> - else
> - {
> - /* Extract via punning to appropriately sized integer mode
> - vector. */
> - tree eltype = build_nonstandard_integer_type (bitsize, 1);
> - tree etype = build_vector_type (eltype, 2);
> - gcc_assert (convert_optab_handler (vec_extract_optab,
> - TYPE_MODE (etype),
> - TYPE_MODE (eltype))
> - != CODE_FOR_nothing);
> - tree tem = make_ssa_name (etype);
> - epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> - build1 (VIEW_CONVERT_EXPR,
> - etype, new_temp));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - new_temp = tem;
> - tem = make_ssa_name (eltype);
> - epilog_stmt
> - = gimple_build_assign (tem, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, eltype,
> - new_temp, TYPE_SIZE (eltype),
> - bitsize_int (0)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - dst1 = make_ssa_name (vectype1);
> - epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> - build1 (VIEW_CONVERT_EXPR,
> - vectype1, tem));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - tem = make_ssa_name (eltype);
> - epilog_stmt
> - = gimple_build_assign (tem, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, eltype,
> - new_temp, TYPE_SIZE (eltype),
> - bitsize_int (bitsize)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - dst2 = make_ssa_name (vectype1);
> - epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> - build1 (VIEW_CONVERT_EXPR,
> - vectype1, tem));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - }
> -
> - new_temp = make_ssa_name (vectype1);
> - epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - reduc_inputs[0] = new_temp;
> - }
> + gimple_seq stmts = NULL;
> + new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> + code, &stmts);
> + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> + reduc_inputs[0] = new_temp;
>
> if (reduce_with_shift && !slp_reduc)
> {
> @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>
> if (auto *accumulator = reduc_info->reused_accumulator)
> {
> + tree def = accumulator->reduc_input;
> + unsigned int nreduc;
> + bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
> + TYPE_VECTOR_SUBPARTS (vectype_out),
> + &nreduc);
> + gcc_assert (res);
> + if (nreduc != 1)
> + {
> + /* Reduce the single vector to a smaller one. */
> + gimple_seq stmts = NULL;
> + def = vect_create_partial_epilog (def, vectype_out,
> + STMT_VINFO_REDUC_CODE (reduc_info),
> + &stmts);
> + /* Adjust the input so we pick up the partially reduced value
> + for the skip edge in vect_create_epilog_for_reduction. */
> + accumulator->reduc_input = def;
> + if (loop_vinfo->main_loop_edge)
> + {
> + /* While we'd like to insert on the edge this will split
> + blocks and disturb bookkeeping, we also will eventually
> + need this on the skip edge. Rely on sinking to
> + fixup optimal placement and insert in the pred. */
> + gimple_stmt_iterator gsi
> + = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> + /* Insert before a cond that eventually skips the
> + epilogue. */
> + if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> + gsi_prev (&gsi);
> + gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> + }
> + else
> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> + stmts);
> + }
> if (loop_vinfo->main_loop_edge)
> vec_initial_defs[0]
> - = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
> + = vect_get_main_loop_result (loop_vinfo, def,
> vec_initial_defs[0]);
> else
> - vec_initial_defs.safe_push (accumulator->reduc_input);
> - gcc_assert (vec_initial_defs.length () == 1);
> + vec_initial_defs.safe_push (def);
> }
>
> /* Generate the reduction PHIs upfront. */
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
2021-07-13 14:17 ` Richard Sandiford
@ 2021-07-15 12:25 ` Christophe Lyon
2021-07-15 12:34 ` Richard Biener
1 sibling, 1 reply; 5+ messages in thread
From: Christophe Lyon @ 2021-07-15 12:25 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Richard Sandiford
Hi,
On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de> wrote:
> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue. That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There's costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loops cost.
>
> Richard - did I figure the correct place to adjust? I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're conciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).
>
> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13 Richard Biener <rguenther@suse.de>
>
> * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> vector types where the old vector type has a multiple of
> the new vector type elements.
> (vect_create_partial_epilog): New function, split out from...
> (vect_create_epilog_for_reduction): ... here.
> (vect_transform_cycle_phi): Reduce the re-used accumulator
> to the new vector type.
>
> * gcc.target/i386/vect-reduc-1.c: New testcase.
>
This patch is causing regressions on aarch64:
FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
'PHI' argument 1
vector(2) unsigned int
vector(2) int
_91 = PHI <_90(17), _83(11)>
during GIMPLE pass: vect
dump file: ./pr92324-4.c.167t.vect
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
verify_gimple failed
0xe6438e verify_gimple_in_cfg(function*, bool)
/gcc/tree-cfg.c:5535
0xd13902 execute_function_todo
/gcc/passes.c:2042
0xd142a5 execute_todo
/gcc/passes.c:2096
FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv
Thanks,
Christophe
> ---
> gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++
> gcc/tree-vect-loop.c | 223 ++++++++++++-------
> 2 files changed, 155 insertions(+), 85 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 00000000000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> + int sum = 1;
> + for (int i = 0; i < 8*N + 4; ++i)
> + sum += a[i];
> + return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> + re-used. */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
> ones as well. */
> tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> - if (!useless_type_conversion_p (old_vectype, vectype))
> + if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> + TYPE_VECTOR_SUBPARTS (vectype)))
> return false;
>
> /* Non-SLP reductions might apply an adjustment after the reduction
> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
> return true;
> }
>
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> + CODE emitting stmts before GSI. Returns a vector def of VECTYPE. */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> code,
> + gimple_seq *seq)
> +{
> + unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (vec_def)).to_constant ();
> + unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> + tree stype = TREE_TYPE (vectype);
> + tree new_temp = vec_def;
> + while (nunits > nunits1)
> + {
> + nunits /= 2;
> + tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> (vectype),
> + stype, nunits);
> + unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> +
> + /* The target has to make sure we support lowpart/highpart
> + extraction, either via direct vector extract or through
> + an integer mode punning. */
> + tree dst1, dst2;
> + gimple *epilog_stmt;
> + if (convert_optab_handler (vec_extract_optab,
> + TYPE_MODE (TREE_TYPE (new_temp)),
> + TYPE_MODE (vectype1))
> + != CODE_FOR_nothing)
> + {
> + /* Extract sub-vectors directly once vec_extract becomes
> + a conversion optab. */
> + dst1 = make_ssa_name (vectype1);
> + epilog_stmt
> + = gimple_build_assign (dst1, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, vectype1,
> + new_temp, TYPE_SIZE
> (vectype1),
> + bitsize_int (0)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + dst2 = make_ssa_name (vectype1);
> + epilog_stmt
> + = gimple_build_assign (dst2, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, vectype1,
> + new_temp, TYPE_SIZE
> (vectype1),
> + bitsize_int (bitsize)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + }
> + else
> + {
> + /* Extract via punning to appropriately sized integer mode
> + vector. */
> + tree eltype = build_nonstandard_integer_type (bitsize, 1);
> + tree etype = build_vector_type (eltype, 2);
> + gcc_assert (convert_optab_handler (vec_extract_optab,
> + TYPE_MODE (etype),
> + TYPE_MODE (eltype))
> + != CODE_FOR_nothing);
> + tree tem = make_ssa_name (etype);
> + epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> + build1 (VIEW_CONVERT_EXPR,
> + etype, new_temp));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + new_temp = tem;
> + tem = make_ssa_name (eltype);
> + epilog_stmt
> + = gimple_build_assign (tem, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, eltype,
> + new_temp, TYPE_SIZE (eltype),
> + bitsize_int (0)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + dst1 = make_ssa_name (vectype1);
> + epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> + build1 (VIEW_CONVERT_EXPR,
> + vectype1, tem));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + tem = make_ssa_name (eltype);
> + epilog_stmt
> + = gimple_build_assign (tem, BIT_FIELD_REF,
> + build3 (BIT_FIELD_REF, eltype,
> + new_temp, TYPE_SIZE (eltype),
> + bitsize_int (bitsize)));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + dst2 = make_ssa_name (vectype1);
> + epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> + build1 (VIEW_CONVERT_EXPR,
> + vectype1, tem));
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + }
> +
> + new_temp = make_ssa_name (vectype1);
> + epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> + }
> +
> + return new_temp;
> +}
> +
> /* Function vect_create_epilog_for_reduction
>
> Create code at the loop-epilog to finalize the result of a reduction
> @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info
> loop_vinfo,
>
> /* First reduce the vector to the desired vector size we should
> do shift reduction on by combining upper and lower halves. */
> - new_temp = reduc_inputs[0];
> - while (nunits > nunits1)
> - {
> - nunits /= 2;
> - vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> (vectype),
> - stype, nunits);
> - unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> -
> - /* The target has to make sure we support lowpart/highpart
> - extraction, either via direct vector extract or through
> - an integer mode punning. */
> - tree dst1, dst2;
> - if (convert_optab_handler (vec_extract_optab,
> - TYPE_MODE (TREE_TYPE (new_temp)),
> - TYPE_MODE (vectype1))
> - != CODE_FOR_nothing)
> - {
> - /* Extract sub-vectors directly once vec_extract becomes
> - a conversion optab. */
> - dst1 = make_ssa_name (vectype1);
> - epilog_stmt
> - = gimple_build_assign (dst1, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, vectype1,
> - new_temp, TYPE_SIZE
> (vectype1),
> - bitsize_int (0)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - dst2 = make_ssa_name (vectype1);
> - epilog_stmt
> - = gimple_build_assign (dst2, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, vectype1,
> - new_temp, TYPE_SIZE
> (vectype1),
> - bitsize_int (bitsize)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - }
> - else
> - {
> - /* Extract via punning to appropriately sized integer mode
> - vector. */
> - tree eltype = build_nonstandard_integer_type (bitsize, 1);
> - tree etype = build_vector_type (eltype, 2);
> - gcc_assert (convert_optab_handler (vec_extract_optab,
> - TYPE_MODE (etype),
> - TYPE_MODE (eltype))
> - != CODE_FOR_nothing);
> - tree tem = make_ssa_name (etype);
> - epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> - build1 (VIEW_CONVERT_EXPR,
> - etype, new_temp));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - new_temp = tem;
> - tem = make_ssa_name (eltype);
> - epilog_stmt
> - = gimple_build_assign (tem, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, eltype,
> - new_temp, TYPE_SIZE
> (eltype),
> - bitsize_int (0)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - dst1 = make_ssa_name (vectype1);
> - epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> - build1 (VIEW_CONVERT_EXPR,
> - vectype1, tem));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - tem = make_ssa_name (eltype);
> - epilog_stmt
> - = gimple_build_assign (tem, BIT_FIELD_REF,
> - build3 (BIT_FIELD_REF, eltype,
> - new_temp, TYPE_SIZE
> (eltype),
> - bitsize_int (bitsize)));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - dst2 = make_ssa_name (vectype1);
> - epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> - build1 (VIEW_CONVERT_EXPR,
> - vectype1, tem));
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - }
> -
> - new_temp = make_ssa_name (vectype1);
> - epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> - reduc_inputs[0] = new_temp;
> - }
> + gimple_seq stmts = NULL;
> + new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> + code, &stmts);
> + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> + reduc_inputs[0] = new_temp;
>
> if (reduce_with_shift && !slp_reduc)
> {
> @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>
> if (auto *accumulator = reduc_info->reused_accumulator)
> {
> + tree def = accumulator->reduc_input;
> + unsigned int nreduc;
> + bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (def)),
> + TYPE_VECTOR_SUBPARTS (vectype_out),
> + &nreduc);
> + gcc_assert (res);
> + if (nreduc != 1)
> + {
> + /* Reduce the single vector to a smaller one. */
> + gimple_seq stmts = NULL;
> + def = vect_create_partial_epilog (def, vectype_out,
> + STMT_VINFO_REDUC_CODE
> (reduc_info),
> + &stmts);
> + /* Adjust the input so we pick up the partially reduced value
> + for the skip edge in vect_create_epilog_for_reduction. */
> + accumulator->reduc_input = def;
> + if (loop_vinfo->main_loop_edge)
> + {
> + /* While we'd like to insert on the edge this will split
> + blocks and disturb bookkeeping, we also will eventually
> + need this on the skip edge. Rely on sinking to
> + fixup optimal placement and insert in the pred. */
> + gimple_stmt_iterator gsi
> + = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> + /* Insert before a cond that eventually skips the
> + epilogue. */
> + if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> + gsi_prev (&gsi);
> + gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> + }
> + else
> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> + stmts);
> + }
> if (loop_vinfo->main_loop_edge)
> vec_initial_defs[0]
> - = vect_get_main_loop_result (loop_vinfo,
> accumulator->reduc_input,
> + = vect_get_main_loop_result (loop_vinfo, def,
> vec_initial_defs[0]);
> else
> - vec_initial_defs.safe_push (accumulator->reduc_input);
> - gcc_assert (vec_initial_defs.length () == 1);
> + vec_initial_defs.safe_push (def);
> }
>
> /* Generate the reduction PHIs upfront. */
> --
> 2.26.2
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
2021-07-15 12:25 ` Christophe Lyon
@ 2021-07-15 12:34 ` Richard Biener
2021-07-15 15:19 ` Christophe Lyon
0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2021-07-15 12:34 UTC (permalink / raw)
To: Christophe Lyon; +Cc: GCC Patches, Richard Sandiford
On Thu, 15 Jul 2021, Christophe Lyon wrote:
> Hi,
>
>
>
> On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de> wrote:
>
> > The following adds support for re-using the vector reduction def
> > from the main loop in vectorized epilogue loops on architectures
> > which use different vector sizes for the epilogue. That's only
> > x86 as far as I am aware.
> >
> > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > regtest in progress.
> >
> > There's costing issues on x86 which usually prevent vectorizing
> > an epilogue with a reduction, at least for loops that only
> > have a reduction - it could be mitigated by not accounting for
> > the epilogue there if we can compute that we can re-use the
> > main loops cost.
> >
> > Richard - did I figure the correct place to adjust? I guess
> > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > for re-use by the skip code in vect_create_epilog_for_reduction
> > is a bit awkward but at least we're conciously doing
> > vect_create_epilog_for_reduction last (via vectorizing live
> > operations).
> >
> > OK in the unlikely case all testing succeeds (I also want to
> > run it through SPEC with/without -fno-vect-cost-model which
> > will take some time)?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-13 Richard Biener <rguenther@suse.de>
> >
> > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > vector types where the old vector type has a multiple of
> > the new vector type elements.
> > (vect_create_partial_epilog): New function, split out from...
> > (vect_create_epilog_for_reduction): ... here.
> > (vect_transform_cycle_phi): Reduce the re-used accumulator
> > to the new vector type.
> >
> > * gcc.target/i386/vect-reduc-1.c: New testcase.
> >
>
> This patch is causing regressions on aarch64:
> FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> 'PHI' argument 1
> vector(2) unsigned int
> vector(2) int
> _91 = PHI <_90(17), _83(11)>
> during GIMPLE pass: vect
> dump file: ./pr92324-4.c.167t.vect
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> verify_gimple failed
> 0xe6438e verify_gimple_in_cfg(function*, bool)
> /gcc/tree-cfg.c:5535
> 0xd13902 execute_function_todo
> /gcc/passes.c:2042
> 0xd142a5 execute_todo
> /gcc/passes.c:2096
>
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv
What exact options do you pass to cc1 to get this? Can you track this
in a PR please?
Thanks,
Richard.
> Thanks,
>
> Christophe
>
>
>
> > ---
> > gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++
> > gcc/tree-vect-loop.c | 223 ++++++++++++-------
> > 2 files changed, 155 insertions(+), 85 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > new file mode 100644
> > index 00000000000..9ee9ba4e736
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> > +
> > +#define N 32
> > +int foo (int *a, int n)
> > +{
> > + int sum = 1;
> > + for (int i = 0; i < 8*N + 4; ++i)
> > + sum += a[i];
> > + return sum;
> > +}
> > +
> > +/* The reduction epilog should be vectorized and the accumulator
> > + re-used. */
> > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index 8c27d75f889..98e2a845629 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> > ones as well. */
> > tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> > tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > - if (!useless_type_conversion_p (old_vectype, vectype))
> > + if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > + TYPE_VECTOR_SUBPARTS (vectype)))
> > return false;
> >
> > /* Non-SLP reductions might apply an adjustment after the reduction
> > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> > return true;
> > }
> >
> > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> > + CODE emitting stmts before GSI. Returns a vector def of VECTYPE. */
> > +
> > +static tree
> > +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> > code,
> > + gimple_seq *seq)
> > +{
> > + unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > (vec_def)).to_constant ();
> > + unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> > + tree stype = TREE_TYPE (vectype);
> > + tree new_temp = vec_def;
> > + while (nunits > nunits1)
> > + {
> > + nunits /= 2;
> > + tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > (vectype),
> > + stype, nunits);
> > + unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > +
> > + /* The target has to make sure we support lowpart/highpart
> > + extraction, either via direct vector extract or through
> > + an integer mode punning. */
> > + tree dst1, dst2;
> > + gimple *epilog_stmt;
> > + if (convert_optab_handler (vec_extract_optab,
> > + TYPE_MODE (TREE_TYPE (new_temp)),
> > + TYPE_MODE (vectype1))
> > + != CODE_FOR_nothing)
> > + {
> > + /* Extract sub-vectors directly once vec_extract becomes
> > + a conversion optab. */
> > + dst1 = make_ssa_name (vectype1);
> > + epilog_stmt
> > + = gimple_build_assign (dst1, BIT_FIELD_REF,
> > + build3 (BIT_FIELD_REF, vectype1,
> > + new_temp, TYPE_SIZE
> > (vectype1),
> > + bitsize_int (0)));
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + dst2 = make_ssa_name (vectype1);
> > + epilog_stmt
> > + = gimple_build_assign (dst2, BIT_FIELD_REF,
> > + build3 (BIT_FIELD_REF, vectype1,
> > + new_temp, TYPE_SIZE
> > (vectype1),
> > + bitsize_int (bitsize)));
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + }
> > + else
> > + {
> > + /* Extract via punning to appropriately sized integer mode
> > + vector. */
> > + tree eltype = build_nonstandard_integer_type (bitsize, 1);
> > + tree etype = build_vector_type (eltype, 2);
> > + gcc_assert (convert_optab_handler (vec_extract_optab,
> > + TYPE_MODE (etype),
> > + TYPE_MODE (eltype))
> > + != CODE_FOR_nothing);
> > + tree tem = make_ssa_name (etype);
> > + epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> > + build1 (VIEW_CONVERT_EXPR,
> > + etype, new_temp));
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + new_temp = tem;
> > + tem = make_ssa_name (eltype);
> > + epilog_stmt
> > + = gimple_build_assign (tem, BIT_FIELD_REF,
> > + build3 (BIT_FIELD_REF, eltype,
> > + new_temp, TYPE_SIZE (eltype),
> > + bitsize_int (0)));
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + dst1 = make_ssa_name (vectype1);
> > + epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> > + build1 (VIEW_CONVERT_EXPR,
> > + vectype1, tem));
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + tem = make_ssa_name (eltype);
> > + epilog_stmt
> > + = gimple_build_assign (tem, BIT_FIELD_REF,
> > + build3 (BIT_FIELD_REF, eltype,
> > + new_temp, TYPE_SIZE (eltype),
> > + bitsize_int (bitsize)));
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + dst2 = make_ssa_name (vectype1);
> > + epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > + build1 (VIEW_CONVERT_EXPR,
> > + vectype1, tem));
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + }
> > +
> > + new_temp = make_ssa_name (vectype1);
> > + epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > + }
> > +
> > + return new_temp;
> > +}
> > +
> > /* Function vect_create_epilog_for_reduction
> >
> > Create code at the loop-epilog to finalize the result of a reduction
> > @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info
> > loop_vinfo,
> >
> > /* First reduce the vector to the desired vector size we should
> > do shift reduction on by combining upper and lower halves. */
> > - new_temp = reduc_inputs[0];
> > - while (nunits > nunits1)
> > - {
> > - nunits /= 2;
> > - vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > (vectype),
> > - stype, nunits);
> > - unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > -
> > - /* The target has to make sure we support lowpart/highpart
> > - extraction, either via direct vector extract or through
> > - an integer mode punning. */
> > - tree dst1, dst2;
> > - if (convert_optab_handler (vec_extract_optab,
> > - TYPE_MODE (TREE_TYPE (new_temp)),
> > - TYPE_MODE (vectype1))
> > - != CODE_FOR_nothing)
> > - {
> > - /* Extract sub-vectors directly once vec_extract becomes
> > - a conversion optab. */
> > - dst1 = make_ssa_name (vectype1);
> > - epilog_stmt
> > - = gimple_build_assign (dst1, BIT_FIELD_REF,
> > - build3 (BIT_FIELD_REF, vectype1,
> > - new_temp, TYPE_SIZE
> > (vectype1),
> > - bitsize_int (0)));
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - dst2 = make_ssa_name (vectype1);
> > - epilog_stmt
> > - = gimple_build_assign (dst2, BIT_FIELD_REF,
> > - build3 (BIT_FIELD_REF, vectype1,
> > - new_temp, TYPE_SIZE
> > (vectype1),
> > - bitsize_int (bitsize)));
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - }
> > - else
> > - {
> > - /* Extract via punning to appropriately sized integer mode
> > - vector. */
> > - tree eltype = build_nonstandard_integer_type (bitsize, 1);
> > - tree etype = build_vector_type (eltype, 2);
> > - gcc_assert (convert_optab_handler (vec_extract_optab,
> > - TYPE_MODE (etype),
> > - TYPE_MODE (eltype))
> > - != CODE_FOR_nothing);
> > - tree tem = make_ssa_name (etype);
> > - epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> > - build1 (VIEW_CONVERT_EXPR,
> > - etype, new_temp));
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - new_temp = tem;
> > - tem = make_ssa_name (eltype);
> > - epilog_stmt
> > - = gimple_build_assign (tem, BIT_FIELD_REF,
> > - build3 (BIT_FIELD_REF, eltype,
> > - new_temp, TYPE_SIZE
> > (eltype),
> > - bitsize_int (0)));
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - dst1 = make_ssa_name (vectype1);
> > - epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> > - build1 (VIEW_CONVERT_EXPR,
> > - vectype1, tem));
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - tem = make_ssa_name (eltype);
> > - epilog_stmt
> > - = gimple_build_assign (tem, BIT_FIELD_REF,
> > - build3 (BIT_FIELD_REF, eltype,
> > - new_temp, TYPE_SIZE
> > (eltype),
> > - bitsize_int (bitsize)));
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - dst2 = make_ssa_name (vectype1);
> > - epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > - build1 (VIEW_CONVERT_EXPR,
> > - vectype1, tem));
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - }
> > -
> > - new_temp = make_ssa_name (vectype1);
> > - epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > - reduc_inputs[0] = new_temp;
> > - }
> > + gimple_seq stmts = NULL;
> > + new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> > + code, &stmts);
> > + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > + reduc_inputs[0] = new_temp;
> >
> > if (reduce_with_shift && !slp_reduc)
> > {
> > @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
> >
> > if (auto *accumulator = reduc_info->reused_accumulator)
> > {
> > + tree def = accumulator->reduc_input;
> > + unsigned int nreduc;
> > + bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > (def)),
> > + TYPE_VECTOR_SUBPARTS (vectype_out),
> > + &nreduc);
> > + gcc_assert (res);
> > + if (nreduc != 1)
> > + {
> > + /* Reduce the single vector to a smaller one. */
> > + gimple_seq stmts = NULL;
> > + def = vect_create_partial_epilog (def, vectype_out,
> > + STMT_VINFO_REDUC_CODE
> > (reduc_info),
> > + &stmts);
> > + /* Adjust the input so we pick up the partially reduced value
> > + for the skip edge in vect_create_epilog_for_reduction. */
> > + accumulator->reduc_input = def;
> > + if (loop_vinfo->main_loop_edge)
> > + {
> > + /* While we'd like to insert on the edge this will split
> > + blocks and disturb bookkeeping, we also will eventually
> > + need this on the skip edge. Rely on sinking to
> > + fixup optimal placement and insert in the pred. */
> > + gimple_stmt_iterator gsi
> > + = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> > + /* Insert before a cond that eventually skips the
> > + epilogue. */
> > + if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> > + gsi_prev (&gsi);
> > + gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> > + }
> > + else
> > + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> > + stmts);
> > + }
> > if (loop_vinfo->main_loop_edge)
> > vec_initial_defs[0]
> > - = vect_get_main_loop_result (loop_vinfo,
> > accumulator->reduc_input,
> > + = vect_get_main_loop_result (loop_vinfo, def,
> > vec_initial_defs[0]);
> > else
> > - vec_initial_defs.safe_push (accumulator->reduc_input);
> > - gcc_assert (vec_initial_defs.length () == 1);
> > + vec_initial_defs.safe_push (def);
> > }
> >
> > /* Generate the reduction PHIs upfront. */
> > --
> > 2.26.2
> >
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
2021-07-15 12:34 ` Richard Biener
@ 2021-07-15 15:19 ` Christophe Lyon
0 siblings, 0 replies; 5+ messages in thread
From: Christophe Lyon @ 2021-07-15 15:19 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Richard Sandiford
On Thu, Jul 15, 2021 at 2:34 PM Richard Biener <rguenther@suse.de> wrote:
> On Thu, 15 Jul 2021, Christophe Lyon wrote:
>
> > Hi,
> >
> >
> >
> > On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de>
> wrote:
> >
> > > The following adds support for re-using the vector reduction def
> > > from the main loop in vectorized epilogue loops on architectures
> > > which use different vector sizes for the epilogue. That's only
> > > x86 as far as I am aware.
> > >
> > > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > > regtest in progress.
> > >
> > > There's costing issues on x86 which usually prevent vectorizing
> > > an epilogue with a reduction, at least for loops that only
> > > have a reduction - it could be mitigated by not accounting for
> > > the epilogue there if we can compute that we can re-use the
> > > main loops cost.
> > >
> > > Richard - did I figure the correct place to adjust? I guess
> > > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > > for re-use by the skip code in vect_create_epilog_for_reduction
> > > is a bit awkward but at least we're conciously doing
> > > vect_create_epilog_for_reduction last (via vectorizing live
> > > operations).
> > >
> > > OK in the unlikely case all testing succeeds (I also want to
> > > run it through SPEC with/without -fno-vect-cost-model which
> > > will take some time)?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-13 Richard Biener <rguenther@suse.de>
> > >
> > > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > > vector types where the old vector type has a multiple of
> > > the new vector type elements.
> > > (vect_create_partial_epilog): New function, split out from...
> > > (vect_create_epilog_for_reduction): ... here.
> > > (vect_transform_cycle_phi): Reduce the re-used accumulator
> > > to the new vector type.
> > >
> > > * gcc.target/i386/vect-reduc-1.c: New testcase.
> > >
> >
> > This patch is causing regressions on aarch64:
> > FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> > FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> > FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> > Excess errors:
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> > 'PHI' argument 1
> > vector(2) unsigned int
> > vector(2) int
> > _91 = PHI <_90(17), _83(11)>
> > during GIMPLE pass: vect
> > dump file: ./pr92324-4.c.167t.vect
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> > verify_gimple failed
> > 0xe6438e verify_gimple_in_cfg(function*, bool)
> > /gcc/tree-cfg.c:5535
> > 0xd13902 execute_function_todo
> > /gcc/passes.c:2042
> > 0xd142a5 execute_todo
> > /gcc/passes.c:2096
> >
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fminnmv
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fmaxnmv
>
> What exact options do you pass to cc1 to get this? Can you track this
> in a PR please?
>
> Thanks,
> Richard.
>
>
Sure, I filed PR 101462
Christophe
> > Thanks,
> >
> > Christophe
> >
> >
> >
> > > ---
> > > gcc/testsuite/gcc.target/i386/vect-reduc-1.c | 17 ++
> > > gcc/tree-vect-loop.c | 223 ++++++++++++-------
> > > 2 files changed, 155 insertions(+), 85 deletions(-)
> > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > new file mode 100644
> > > index 00000000000..9ee9ba4e736
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" }
> */
> > > +
> > > +#define N 32
> > > +int foo (int *a, int n)
> > > +{
> > > + int sum = 1;
> > > + for (int i = 0; i < 8*N + 4; ++i)
> > > + sum += a[i];
> > > + return sum;
> > > +}
> > > +
> > > +/* The reduction epilog should be vectorized and the accumulator
> > > + re-used. */
> > > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } }
> */
> > > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > index 8c27d75f889..98e2a845629 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> > > loop_vinfo,
> > > ones as well. */
> > > tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> > > tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > > - if (!useless_type_conversion_p (old_vectype, vectype))
> > > + if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > > + TYPE_VECTOR_SUBPARTS (vectype)))
> > > return false;
> > >
> > > /* Non-SLP reductions might apply an adjustment after the reduction
> > > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> > > loop_vinfo,
> > > return true;
> > > }
> > >
> > > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> > > + CODE emitting stmts before GSI. Returns a vector def of VECTYPE.
> */
> > > +
> > > +static tree
> > > +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> > > code,
> > > + gimple_seq *seq)
> > > +{
> > > + unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > > (vec_def)).to_constant ();
> > > + unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> > > + tree stype = TREE_TYPE (vectype);
> > > + tree new_temp = vec_def;
> > > + while (nunits > nunits1)
> > > + {
> > > + nunits /= 2;
> > > + tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > > (vectype),
> > > + stype,
> nunits);
> > > + unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > > +
> > > + /* The target has to make sure we support lowpart/highpart
> > > + extraction, either via direct vector extract or through
> > > + an integer mode punning. */
> > > + tree dst1, dst2;
> > > + gimple *epilog_stmt;
> > > + if (convert_optab_handler (vec_extract_optab,
> > > + TYPE_MODE (TREE_TYPE (new_temp)),
> > > + TYPE_MODE (vectype1))
> > > + != CODE_FOR_nothing)
> > > + {
> > > + /* Extract sub-vectors directly once vec_extract becomes
> > > + a conversion optab. */
> > > + dst1 = make_ssa_name (vectype1);
> > > + epilog_stmt
> > > + = gimple_build_assign (dst1, BIT_FIELD_REF,
> > > + build3 (BIT_FIELD_REF, vectype1,
> > > + new_temp, TYPE_SIZE
> > > (vectype1),
> > > + bitsize_int (0)));
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + dst2 = make_ssa_name (vectype1);
> > > + epilog_stmt
> > > + = gimple_build_assign (dst2, BIT_FIELD_REF,
> > > + build3 (BIT_FIELD_REF, vectype1,
> > > + new_temp, TYPE_SIZE
> > > (vectype1),
> > > + bitsize_int (bitsize)));
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + }
> > > + else
> > > + {
> > > + /* Extract via punning to appropriately sized integer mode
> > > + vector. */
> > > + tree eltype = build_nonstandard_integer_type (bitsize, 1);
> > > + tree etype = build_vector_type (eltype, 2);
> > > + gcc_assert (convert_optab_handler (vec_extract_optab,
> > > + TYPE_MODE (etype),
> > > + TYPE_MODE (eltype))
> > > + != CODE_FOR_nothing);
> > > + tree tem = make_ssa_name (etype);
> > > + epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> > > + build1 (VIEW_CONVERT_EXPR,
> > > + etype, new_temp));
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + new_temp = tem;
> > > + tem = make_ssa_name (eltype);
> > > + epilog_stmt
> > > + = gimple_build_assign (tem, BIT_FIELD_REF,
> > > + build3 (BIT_FIELD_REF, eltype,
> > > + new_temp, TYPE_SIZE
> (eltype),
> > > + bitsize_int (0)));
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + dst1 = make_ssa_name (vectype1);
> > > + epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> > > + build1 (VIEW_CONVERT_EXPR,
> > > + vectype1, tem));
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + tem = make_ssa_name (eltype);
> > > + epilog_stmt
> > > + = gimple_build_assign (tem, BIT_FIELD_REF,
> > > + build3 (BIT_FIELD_REF, eltype,
> > > + new_temp, TYPE_SIZE
> (eltype),
> > > + bitsize_int (bitsize)));
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + dst2 = make_ssa_name (vectype1);
> > > + epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > > + build1 (VIEW_CONVERT_EXPR,
> > > + vectype1, tem));
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + }
> > > +
> > > + new_temp = make_ssa_name (vectype1);
> > > + epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > > + gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > + }
> > > +
> > > + return new_temp;
> > > +}
> > > +
> > > /* Function vect_create_epilog_for_reduction
> > >
> > > Create code at the loop-epilog to finalize the result of a
> reduction
> > > @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info
> > > loop_vinfo,
> > >
> > > /* First reduce the vector to the desired vector size we should
> > > do shift reduction on by combining upper and lower halves. */
> > > - new_temp = reduc_inputs[0];
> > > - while (nunits > nunits1)
> > > - {
> > > - nunits /= 2;
> > > - vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > > (vectype),
> > > - stype,
> nunits);
> > > - unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > > -
> > > - /* The target has to make sure we support lowpart/highpart
> > > - extraction, either via direct vector extract or through
> > > - an integer mode punning. */
> > > - tree dst1, dst2;
> > > - if (convert_optab_handler (vec_extract_optab,
> > > - TYPE_MODE (TREE_TYPE (new_temp)),
> > > - TYPE_MODE (vectype1))
> > > - != CODE_FOR_nothing)
> > > - {
> > > - /* Extract sub-vectors directly once vec_extract becomes
> > > - a conversion optab. */
> > > - dst1 = make_ssa_name (vectype1);
> > > - epilog_stmt
> > > - = gimple_build_assign (dst1, BIT_FIELD_REF,
> > > - build3 (BIT_FIELD_REF,
> vectype1,
> > > - new_temp, TYPE_SIZE
> > > (vectype1),
> > > - bitsize_int (0)));
> > > - gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > - dst2 = make_ssa_name (vectype1);
> > > - epilog_stmt
> > > - = gimple_build_assign (dst2, BIT_FIELD_REF,
> > > - build3 (BIT_FIELD_REF,
> vectype1,
> > > - new_temp, TYPE_SIZE
> > > (vectype1),
> > > - bitsize_int
> (bitsize)));
> > > - gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > - }
> > > - else
> > > - {
> > > - /* Extract via punning to appropriately sized integer
> mode
> > > - vector. */
> > > - tree eltype = build_nonstandard_integer_type (bitsize,
> 1);
> > > - tree etype = build_vector_type (eltype, 2);
> > > - gcc_assert (convert_optab_handler (vec_extract_optab,
> > > - TYPE_MODE (etype),
> > > - TYPE_MODE (eltype))
> > > - != CODE_FOR_nothing);
> > > - tree tem = make_ssa_name (etype);
> > > - epilog_stmt = gimple_build_assign (tem,
> VIEW_CONVERT_EXPR,
> > > - build1
> (VIEW_CONVERT_EXPR,
> > > - etype,
> new_temp));
> > > - gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > - new_temp = tem;
> > > - tem = make_ssa_name (eltype);
> > > - epilog_stmt
> > > - = gimple_build_assign (tem, BIT_FIELD_REF,
> > > - build3 (BIT_FIELD_REF, eltype,
> > > - new_temp, TYPE_SIZE
> > > (eltype),
> > > - bitsize_int (0)));
> > > - gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > - dst1 = make_ssa_name (vectype1);
> > > - epilog_stmt = gimple_build_assign (dst1,
> VIEW_CONVERT_EXPR,
> > > - build1
> (VIEW_CONVERT_EXPR,
> > > - vectype1,
> tem));
> > > - gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > - tem = make_ssa_name (eltype);
> > > - epilog_stmt
> > > - = gimple_build_assign (tem, BIT_FIELD_REF,
> > > - build3 (BIT_FIELD_REF, eltype,
> > > - new_temp, TYPE_SIZE
> > > (eltype),
> > > - bitsize_int
> (bitsize)));
> > > - gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > - dst2 = make_ssa_name (vectype1);
> > > - epilog_stmt = gimple_build_assign (dst2,
> VIEW_CONVERT_EXPR,
> > > - build1
> (VIEW_CONVERT_EXPR,
> > > - vectype1,
> tem));
> > > - gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > - }
> > > -
> > > - new_temp = make_ssa_name (vectype1);
> > > - epilog_stmt = gimple_build_assign (new_temp, code, dst1,
> dst2);
> > > - gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > - reduc_inputs[0] = new_temp;
> > > - }
> > > + gimple_seq stmts = NULL;
> > > + new_temp = vect_create_partial_epilog (reduc_inputs[0],
> vectype1,
> > > + code, &stmts);
> > > + gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > + reduc_inputs[0] = new_temp;
> > >
> > > if (reduce_with_shift && !slp_reduc)
> > > {
> > > @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info
> loop_vinfo,
> > >
> > > if (auto *accumulator = reduc_info->reused_accumulator)
> > > {
> > > + tree def = accumulator->reduc_input;
> > > + unsigned int nreduc;
> > > + bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > > (def)),
> > > + TYPE_VECTOR_SUBPARTS
> (vectype_out),
> > > + &nreduc);
> > > + gcc_assert (res);
> > > + if (nreduc != 1)
> > > + {
> > > + /* Reduce the single vector to a smaller one. */
> > > + gimple_seq stmts = NULL;
> > > + def = vect_create_partial_epilog (def, vectype_out,
> > > + STMT_VINFO_REDUC_CODE
> > > (reduc_info),
> > > + &stmts);
> > > + /* Adjust the input so we pick up the partially reduced value
> > > + for the skip edge in vect_create_epilog_for_reduction. */
> > > + accumulator->reduc_input = def;
> > > + if (loop_vinfo->main_loop_edge)
> > > + {
> > > + /* While we'd like to insert on the edge this will split
> > > + blocks and disturb bookkeeping, we also will
> eventually
> > > + need this on the skip edge. Rely on sinking to
> > > + fixup optimal placement and insert in the pred. */
> > > + gimple_stmt_iterator gsi
> > > + = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> > > + /* Insert before a cond that eventually skips the
> > > + epilogue. */
> > > + if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> > > + gsi_prev (&gsi);
> > > + gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> > > + }
> > > + else
> > > + gsi_insert_seq_on_edge_immediate (loop_preheader_edge
> (loop),
> > > + stmts);
> > > + }
> > > if (loop_vinfo->main_loop_edge)
> > > vec_initial_defs[0]
> > > - = vect_get_main_loop_result (loop_vinfo,
> > > accumulator->reduc_input,
> > > + = vect_get_main_loop_result (loop_vinfo, def,
> > > vec_initial_defs[0]);
> > > else
> > > - vec_initial_defs.safe_push (accumulator->reduc_input);
> > > - gcc_assert (vec_initial_defs.length () == 1);
> > > + vec_initial_defs.safe_push (def);
> > > }
> > >
> > > /* Generate the reduction PHIs upfront. */
> > > --
> > > 2.26.2
> > >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-07-15 15:19 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
2021-07-13 14:17 ` Richard Sandiford
2021-07-15 12:25 ` Christophe Lyon
2021-07-15 12:34 ` Richard Biener
2021-07-15 15:19 ` Christophe Lyon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).