public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Support reduction def re-use for epilogue with different vector size
@ 2021-07-13 12:09 Richard Biener
  2021-07-13 14:17 ` Richard Sandiford
  2021-07-15 12:25 ` Christophe Lyon
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2021-07-13 12:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue.  That's only
x86 as far as I am aware.

vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
regtest in progress.

There's costing issues on x86 which usually prevent vectorizing
an epilogue with a reduction, at least for loops that only
have a reduction - it could be mitigated by not accounting for
the epilogue there if we can compute that we can re-use the
main loops cost.

Richard - did I figure the correct place to adjust?  I guess
adjusting accumulator->reduc_input in vect_transform_cycle_phi
for re-use by the skip code in vect_create_epilog_for_reduction
is a bit awkward but at least we're conciously doing
vect_create_epilog_for_reduction last (via vectorizing live
operations).

OK in the unlikely case all testing succeeds (I also want to
run it through SPEC with/without -fno-vect-cost-model which
will take some time)?

Thanks,
Richard.

2021-07-13  Richard Biener  <rguenther@suse.de>

	* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
	vector types where the old vector type has a multiple of
	the new vector type elements.
	(vect_create_partial_epilog): New function, split out from...
	(vect_create_epilog_for_reduction): ... here.
	(vect_transform_cycle_phi): Reduce the re-used accumulator
	to the new vector type.

	* gcc.target/i386/vect-reduc-1.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
 gcc/tree-vect-loop.c                         | 223 ++++++++++++-------
 2 files changed, 155 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
new file mode 100644
index 00000000000..9ee9ba4e736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
+
+#define N 32
+int foo (int *a, int n)
+{
+  int sum = 1;
+  for (int i = 0; i < 8*N + 4; ++i)
+    sum += a[i];
+  return sum;
+}
+
+/* The reduction epilog should be vectorized and the accumulator
+   re-used.  */
+/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-times "psrl" 2 } } */
+/* { dg-final { scan-assembler-times "padd" 5 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 8c27d75f889..98e2a845629 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
      ones as well.  */
   tree vectype = STMT_VINFO_VECTYPE (reduc_info);
   tree old_vectype = TREE_TYPE (accumulator->reduc_input);
-  if (!useless_type_conversion_p (old_vectype, vectype))
+  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
+			    TYPE_VECTOR_SUBPARTS (vectype)))
     return false;
 
   /* Non-SLP reductions might apply an adjustment after the reduction
@@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
+   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
+
+static tree
+vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
+			    gimple_seq *seq)
+{
+  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
+  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+  tree stype = TREE_TYPE (vectype);
+  tree new_temp = vec_def;
+  while (nunits > nunits1)
+    {
+      nunits /= 2;
+      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+							   stype, nunits);
+      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
+
+      /* The target has to make sure we support lowpart/highpart
+	 extraction, either via direct vector extract or through
+	 an integer mode punning.  */
+      tree dst1, dst2;
+      gimple *epilog_stmt;
+      if (convert_optab_handler (vec_extract_optab,
+				 TYPE_MODE (TREE_TYPE (new_temp)),
+				 TYPE_MODE (vectype1))
+	  != CODE_FOR_nothing)
+	{
+	  /* Extract sub-vectors directly once vec_extract becomes
+	     a conversion optab.  */
+	  dst1 = make_ssa_name (vectype1);
+	  epilog_stmt
+	      = gimple_build_assign (dst1, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, vectype1,
+					     new_temp, TYPE_SIZE (vectype1),
+					     bitsize_int (0)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  dst2 =  make_ssa_name (vectype1);
+	  epilog_stmt
+	      = gimple_build_assign (dst2, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, vectype1,
+					     new_temp, TYPE_SIZE (vectype1),
+					     bitsize_int (bitsize)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	}
+      else
+	{
+	  /* Extract via punning to appropriately sized integer mode
+	     vector.  */
+	  tree eltype = build_nonstandard_integer_type (bitsize, 1);
+	  tree etype = build_vector_type (eltype, 2);
+	  gcc_assert (convert_optab_handler (vec_extract_optab,
+					     TYPE_MODE (etype),
+					     TYPE_MODE (eltype))
+		      != CODE_FOR_nothing);
+	  tree tem = make_ssa_name (etype);
+	  epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
+					     build1 (VIEW_CONVERT_EXPR,
+						     etype, new_temp));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  new_temp = tem;
+	  tem = make_ssa_name (eltype);
+	  epilog_stmt
+	      = gimple_build_assign (tem, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, eltype,
+					     new_temp, TYPE_SIZE (eltype),
+					     bitsize_int (0)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  dst1 = make_ssa_name (vectype1);
+	  epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
+					     build1 (VIEW_CONVERT_EXPR,
+						     vectype1, tem));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  tem = make_ssa_name (eltype);
+	  epilog_stmt
+	      = gimple_build_assign (tem, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, eltype,
+					     new_temp, TYPE_SIZE (eltype),
+					     bitsize_int (bitsize)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  dst2 =  make_ssa_name (vectype1);
+	  epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
+					     build1 (VIEW_CONVERT_EXPR,
+						     vectype1, tem));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	}
+
+      new_temp = make_ssa_name (vectype1);
+      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
+      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+    }
+
+  return new_temp;
+}
+
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
@@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 
       /* First reduce the vector to the desired vector size we should
 	 do shift reduction on by combining upper and lower halves.  */
-      new_temp = reduc_inputs[0];
-      while (nunits > nunits1)
-	{
-	  nunits /= 2;
-	  vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
-							  stype, nunits);
-	  unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
-
-	  /* The target has to make sure we support lowpart/highpart
-	     extraction, either via direct vector extract or through
-	     an integer mode punning.  */
-	  tree dst1, dst2;
-	  if (convert_optab_handler (vec_extract_optab,
-				     TYPE_MODE (TREE_TYPE (new_temp)),
-				     TYPE_MODE (vectype1))
-	      != CODE_FOR_nothing)
-	    {
-	      /* Extract sub-vectors directly once vec_extract becomes
-		 a conversion optab.  */
-	      dst1 = make_ssa_name (vectype1);
-	      epilog_stmt
-		  = gimple_build_assign (dst1, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, vectype1,
-						 new_temp, TYPE_SIZE (vectype1),
-						 bitsize_int (0)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      dst2 =  make_ssa_name (vectype1);
-	      epilog_stmt
-		  = gimple_build_assign (dst2, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, vectype1,
-						 new_temp, TYPE_SIZE (vectype1),
-						 bitsize_int (bitsize)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	    }
-	  else
-	    {
-	      /* Extract via punning to appropriately sized integer mode
-		 vector.  */
-	      tree eltype = build_nonstandard_integer_type (bitsize, 1);
-	      tree etype = build_vector_type (eltype, 2);
-	      gcc_assert (convert_optab_handler (vec_extract_optab,
-						 TYPE_MODE (etype),
-						 TYPE_MODE (eltype))
-			  != CODE_FOR_nothing);
-	      tree tem = make_ssa_name (etype);
-	      epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
-						 build1 (VIEW_CONVERT_EXPR,
-							 etype, new_temp));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      new_temp = tem;
-	      tem = make_ssa_name (eltype);
-	      epilog_stmt
-		  = gimple_build_assign (tem, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, eltype,
-						 new_temp, TYPE_SIZE (eltype),
-						 bitsize_int (0)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      dst1 = make_ssa_name (vectype1);
-	      epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
-						 build1 (VIEW_CONVERT_EXPR,
-							 vectype1, tem));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      tem = make_ssa_name (eltype);
-	      epilog_stmt
-		  = gimple_build_assign (tem, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, eltype,
-						 new_temp, TYPE_SIZE (eltype),
-						 bitsize_int (bitsize)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      dst2 =  make_ssa_name (vectype1);
-	      epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
-						 build1 (VIEW_CONVERT_EXPR,
-							 vectype1, tem));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	    }
-
-	  new_temp = make_ssa_name (vectype1);
-	  epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
-	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	  reduc_inputs[0] = new_temp;
-	}
+      gimple_seq stmts = NULL;
+      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
+					     code, &stmts);
+      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+      reduc_inputs[0] = new_temp;
 
       if (reduce_with_shift && !slp_reduc)
 	{
@@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
 
   if (auto *accumulator = reduc_info->reused_accumulator)
     {
+      tree def = accumulator->reduc_input;
+      unsigned int nreduc;
+      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
+				      TYPE_VECTOR_SUBPARTS (vectype_out),
+				      &nreduc);
+      gcc_assert (res);
+      if (nreduc != 1)
+	{
+	  /* Reduce the single vector to a smaller one.  */
+	  gimple_seq stmts = NULL;
+	  def = vect_create_partial_epilog (def, vectype_out,
+					    STMT_VINFO_REDUC_CODE (reduc_info),
+					    &stmts);
+	  /* Adjust the input so we pick up the partially reduced value
+	     for the skip edge in vect_create_epilog_for_reduction.  */
+	  accumulator->reduc_input = def;
+	  if (loop_vinfo->main_loop_edge)
+	    {
+	      /* While we'd like to insert on the edge this will split
+		 blocks and disturb bookkeeping, we also will eventually
+		 need this on the skip edge.  Rely on sinking to
+		 fixup optimal placement and insert in the pred.  */
+	      gimple_stmt_iterator gsi
+		= gsi_last_bb (loop_vinfo->main_loop_edge->src);
+	      /* Insert before a cond that eventually skips the
+		 epilogue.  */
+	      if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
+		gsi_prev (&gsi);
+	      gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
+	    }
+	  else
+	    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
+					      stmts);
+	}
       if (loop_vinfo->main_loop_edge)
 	vec_initial_defs[0]
-	  = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
+	  = vect_get_main_loop_result (loop_vinfo, def,
 				       vec_initial_defs[0]);
       else
-	vec_initial_defs.safe_push (accumulator->reduc_input);
-      gcc_assert (vec_initial_defs.length () == 1);
+	vec_initial_defs.safe_push (def);
     }
 
   /* Generate the reduction PHIs upfront.  */
-- 
2.26.2

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
@ 2021-07-13 14:17 ` Richard Sandiford
  2021-07-15 12:25 ` Christophe Lyon
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2021-07-13 14:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue.  That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There's costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loops cost.
>
> Richard - did I figure the correct place to adjust?  I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're conciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).

Yeah.  IMO it'd be a bit cleaner to store the new accumulator directly
in the reduc_info, but I don't feel strongly about it.

Apart from that and a minor nit below, it looks good to me FWIW.

(At some point it'd be good for reduc_info to be its own structure,
separate from stmt_vec_info, so that there's less of a cost associated
with storing more data there.)

Thanks,
Richard

> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13  Richard Biener  <rguenther@suse.de>
>
> 	* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> 	vector types where the old vector type has a multiple of
> 	the new vector type elements.
> 	(vect_create_partial_epilog): New function, split out from...
> 	(vect_create_epilog_for_reduction): ... here.
> 	(vect_transform_cycle_phi): Reduce the re-used accumulator
> 	to the new vector type.
>
> 	* gcc.target/i386/vect-reduc-1.c: New testcase.
> ---
>  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
>  gcc/tree-vect-loop.c                         | 223 ++++++++++++-------
>  2 files changed, 155 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 00000000000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> +  int sum = 1;
> +  for (int i = 0; i < 8*N + 4; ++i)
> +    sum += a[i];
> +  return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> +   re-used.  */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
>       ones as well.  */
>    tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>    tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> -  if (!useless_type_conversion_p (old_vectype, vectype))
> +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> +			    TYPE_VECTOR_SUBPARTS (vectype)))
>      return false;
>  
>    /* Non-SLP reductions might apply an adjustment after the reduction

The comment above this needs updating too.

> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
>    return true;
>  }
>  
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
> +			    gimple_seq *seq)
> +{
> +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
> +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +  tree stype = TREE_TYPE (vectype);
> +  tree new_temp = vec_def;
> +  while (nunits > nunits1)
> +    {
> +      nunits /= 2;
> +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> +							   stype, nunits);
> +      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> +
> +      /* The target has to make sure we support lowpart/highpart
> +	 extraction, either via direct vector extract or through
> +	 an integer mode punning.  */
> +      tree dst1, dst2;
> +      gimple *epilog_stmt;
> +      if (convert_optab_handler (vec_extract_optab,
> +				 TYPE_MODE (TREE_TYPE (new_temp)),
> +				 TYPE_MODE (vectype1))
> +	  != CODE_FOR_nothing)
> +	{
> +	  /* Extract sub-vectors directly once vec_extract becomes
> +	     a conversion optab.  */
> +	  dst1 = make_ssa_name (vectype1);
> +	  epilog_stmt
> +	      = gimple_build_assign (dst1, BIT_FIELD_REF,
> +				     build3 (BIT_FIELD_REF, vectype1,
> +					     new_temp, TYPE_SIZE (vectype1),
> +					     bitsize_int (0)));
> +	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +	  dst2 =  make_ssa_name (vectype1);
> +	  epilog_stmt
> +	      = gimple_build_assign (dst2, BIT_FIELD_REF,
> +				     build3 (BIT_FIELD_REF, vectype1,
> +					     new_temp, TYPE_SIZE (vectype1),
> +					     bitsize_int (bitsize)));
> +	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +	}
> +      else
> +	{
> +	  /* Extract via punning to appropriately sized integer mode
> +	     vector.  */
> +	  tree eltype = build_nonstandard_integer_type (bitsize, 1);
> +	  tree etype = build_vector_type (eltype, 2);
> +	  gcc_assert (convert_optab_handler (vec_extract_optab,
> +					     TYPE_MODE (etype),
> +					     TYPE_MODE (eltype))
> +		      != CODE_FOR_nothing);
> +	  tree tem = make_ssa_name (etype);
> +	  epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> +					     build1 (VIEW_CONVERT_EXPR,
> +						     etype, new_temp));
> +	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +	  new_temp = tem;
> +	  tem = make_ssa_name (eltype);
> +	  epilog_stmt
> +	      = gimple_build_assign (tem, BIT_FIELD_REF,
> +				     build3 (BIT_FIELD_REF, eltype,
> +					     new_temp, TYPE_SIZE (eltype),
> +					     bitsize_int (0)));
> +	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +	  dst1 = make_ssa_name (vectype1);
> +	  epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> +					     build1 (VIEW_CONVERT_EXPR,
> +						     vectype1, tem));
> +	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +	  tem = make_ssa_name (eltype);
> +	  epilog_stmt
> +	      = gimple_build_assign (tem, BIT_FIELD_REF,
> +				     build3 (BIT_FIELD_REF, eltype,
> +					     new_temp, TYPE_SIZE (eltype),
> +					     bitsize_int (bitsize)));
> +	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +	  dst2 =  make_ssa_name (vectype1);
> +	  epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> +					     build1 (VIEW_CONVERT_EXPR,
> +						     vectype1, tem));
> +	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +	}
> +
> +      new_temp = make_ssa_name (vectype1);
> +      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> +      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +    }
> +
> +  return new_temp;
> +}
> +
>  /* Function vect_create_epilog_for_reduction
>  
>     Create code at the loop-epilog to finalize the result of a reduction
> @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  
>        /* First reduce the vector to the desired vector size we should
>  	 do shift reduction on by combining upper and lower halves.  */
> -      new_temp = reduc_inputs[0];
> -      while (nunits > nunits1)
> -	{
> -	  nunits /= 2;
> -	  vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> -							  stype, nunits);
> -	  unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> -
> -	  /* The target has to make sure we support lowpart/highpart
> -	     extraction, either via direct vector extract or through
> -	     an integer mode punning.  */
> -	  tree dst1, dst2;
> -	  if (convert_optab_handler (vec_extract_optab,
> -				     TYPE_MODE (TREE_TYPE (new_temp)),
> -				     TYPE_MODE (vectype1))
> -	      != CODE_FOR_nothing)
> -	    {
> -	      /* Extract sub-vectors directly once vec_extract becomes
> -		 a conversion optab.  */
> -	      dst1 = make_ssa_name (vectype1);
> -	      epilog_stmt
> -		  = gimple_build_assign (dst1, BIT_FIELD_REF,
> -					 build3 (BIT_FIELD_REF, vectype1,
> -						 new_temp, TYPE_SIZE (vectype1),
> -						 bitsize_int (0)));
> -	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	      dst2 =  make_ssa_name (vectype1);
> -	      epilog_stmt
> -		  = gimple_build_assign (dst2, BIT_FIELD_REF,
> -					 build3 (BIT_FIELD_REF, vectype1,
> -						 new_temp, TYPE_SIZE (vectype1),
> -						 bitsize_int (bitsize)));
> -	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	    }
> -	  else
> -	    {
> -	      /* Extract via punning to appropriately sized integer mode
> -		 vector.  */
> -	      tree eltype = build_nonstandard_integer_type (bitsize, 1);
> -	      tree etype = build_vector_type (eltype, 2);
> -	      gcc_assert (convert_optab_handler (vec_extract_optab,
> -						 TYPE_MODE (etype),
> -						 TYPE_MODE (eltype))
> -			  != CODE_FOR_nothing);
> -	      tree tem = make_ssa_name (etype);
> -	      epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> -						 build1 (VIEW_CONVERT_EXPR,
> -							 etype, new_temp));
> -	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	      new_temp = tem;
> -	      tem = make_ssa_name (eltype);
> -	      epilog_stmt
> -		  = gimple_build_assign (tem, BIT_FIELD_REF,
> -					 build3 (BIT_FIELD_REF, eltype,
> -						 new_temp, TYPE_SIZE (eltype),
> -						 bitsize_int (0)));
> -	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	      dst1 = make_ssa_name (vectype1);
> -	      epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> -						 build1 (VIEW_CONVERT_EXPR,
> -							 vectype1, tem));
> -	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	      tem = make_ssa_name (eltype);
> -	      epilog_stmt
> -		  = gimple_build_assign (tem, BIT_FIELD_REF,
> -					 build3 (BIT_FIELD_REF, eltype,
> -						 new_temp, TYPE_SIZE (eltype),
> -						 bitsize_int (bitsize)));
> -	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	      dst2 =  make_ssa_name (vectype1);
> -	      epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> -						 build1 (VIEW_CONVERT_EXPR,
> -							 vectype1, tem));
> -	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	    }
> -
> -	  new_temp = make_ssa_name (vectype1);
> -	  epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> -	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -	  reduc_inputs[0] = new_temp;
> -	}
> +      gimple_seq stmts = NULL;
> +      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> +					     code, &stmts);
> +      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> +      reduc_inputs[0] = new_temp;
>  
>        if (reduce_with_shift && !slp_reduc)
>  	{
> @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>  
>    if (auto *accumulator = reduc_info->reused_accumulator)
>      {
> +      tree def = accumulator->reduc_input;
> +      unsigned int nreduc;
> +      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
> +				      TYPE_VECTOR_SUBPARTS (vectype_out),
> +				      &nreduc);
> +      gcc_assert (res);
> +      if (nreduc != 1)
> +	{
> +	  /* Reduce the single vector to a smaller one.  */
> +	  gimple_seq stmts = NULL;
> +	  def = vect_create_partial_epilog (def, vectype_out,
> +					    STMT_VINFO_REDUC_CODE (reduc_info),
> +					    &stmts);
> +	  /* Adjust the input so we pick up the partially reduced value
> +	     for the skip edge in vect_create_epilog_for_reduction.  */
> +	  accumulator->reduc_input = def;
> +	  if (loop_vinfo->main_loop_edge)
> +	    {
> +	      /* While we'd like to insert on the edge this will split
> +		 blocks and disturb bookkeeping, we also will eventually
> +		 need this on the skip edge.  Rely on sinking to
> +		 fixup optimal placement and insert in the pred.  */
> +	      gimple_stmt_iterator gsi
> +		= gsi_last_bb (loop_vinfo->main_loop_edge->src);
> +	      /* Insert before a cond that eventually skips the
> +		 epilogue.  */
> +	      if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> +		gsi_prev (&gsi);
> +	      gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> +	    }
> +	  else
> +	    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> +					      stmts);
> +	}
>        if (loop_vinfo->main_loop_edge)
>  	vec_initial_defs[0]
> -	  = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
> +	  = vect_get_main_loop_result (loop_vinfo, def,
>  				       vec_initial_defs[0]);
>        else
> -	vec_initial_defs.safe_push (accumulator->reduc_input);
> -      gcc_assert (vec_initial_defs.length () == 1);
> +	vec_initial_defs.safe_push (def);
>      }
>  
>    /* Generate the reduction PHIs upfront.  */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
  2021-07-13 14:17 ` Richard Sandiford
@ 2021-07-15 12:25 ` Christophe Lyon
  2021-07-15 12:34   ` Richard Biener
  1 sibling, 1 reply; 5+ messages in thread
From: Christophe Lyon @ 2021-07-15 12:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford

Hi,



On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de> wrote:

> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue.  That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There's costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loops cost.
>
> Richard - did I figure the correct place to adjust?  I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're conciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).
>
> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13  Richard Biener  <rguenther@suse.de>
>
>         * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
>         vector types where the old vector type has a multiple of
>         the new vector type elements.
>         (vect_create_partial_epilog): New function, split out from...
>         (vect_create_epilog_for_reduction): ... here.
>         (vect_transform_cycle_phi): Reduce the re-used accumulator
>         to the new vector type.
>
>         * gcc.target/i386/vect-reduc-1.c: New testcase.
>

This patch is causing regressions on aarch64:
 FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
'PHI' argument 1
vector(2) unsigned int
vector(2) int
_91 = PHI <_90(17), _83(11)>
during GIMPLE pass: vect
dump file: ./pr92324-4.c.167t.vect
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
verify_gimple failed
0xe6438e verify_gimple_in_cfg(function*, bool)
        /gcc/tree-cfg.c:5535
0xd13902 execute_function_todo
        /gcc/passes.c:2042
0xd142a5 execute_todo
        /gcc/passes.c:2096

FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

Thanks,

Christophe



> ---
>  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
>  gcc/tree-vect-loop.c                         | 223 ++++++++++++-------
>  2 files changed, 155 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 00000000000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> +  int sum = 1;
> +  for (int i = 0; i < 8*N + 4; ++i)
> +    sum += a[i];
> +  return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> +   re-used.  */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>       ones as well.  */
>    tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>    tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> -  if (!useless_type_conversion_p (old_vectype, vectype))
> +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> +                           TYPE_VECTOR_SUBPARTS (vectype)))
>      return false;
>
>    /* Non-SLP reductions might apply an adjustment after the reduction
> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>    return true;
>  }
>
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> code,
> +                           gimple_seq *seq)
> +{
> +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (vec_def)).to_constant ();
> +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +  tree stype = TREE_TYPE (vectype);
> +  tree new_temp = vec_def;
> +  while (nunits > nunits1)
> +    {
> +      nunits /= 2;
> +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> (vectype),
> +                                                          stype, nunits);
> +      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> +
> +      /* The target has to make sure we support lowpart/highpart
> +        extraction, either via direct vector extract or through
> +        an integer mode punning.  */
> +      tree dst1, dst2;
> +      gimple *epilog_stmt;
> +      if (convert_optab_handler (vec_extract_optab,
> +                                TYPE_MODE (TREE_TYPE (new_temp)),
> +                                TYPE_MODE (vectype1))
> +         != CODE_FOR_nothing)
> +       {
> +         /* Extract sub-vectors directly once vec_extract becomes
> +            a conversion optab.  */
> +         dst1 = make_ssa_name (vectype1);
> +         epilog_stmt
> +             = gimple_build_assign (dst1, BIT_FIELD_REF,
> +                                    build3 (BIT_FIELD_REF, vectype1,
> +                                            new_temp, TYPE_SIZE
> (vectype1),
> +                                            bitsize_int (0)));
> +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +         dst2 =  make_ssa_name (vectype1);
> +         epilog_stmt
> +             = gimple_build_assign (dst2, BIT_FIELD_REF,
> +                                    build3 (BIT_FIELD_REF, vectype1,
> +                                            new_temp, TYPE_SIZE
> (vectype1),
> +                                            bitsize_int (bitsize)));
> +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +       }
> +      else
> +       {
> +         /* Extract via punning to appropriately sized integer mode
> +            vector.  */
> +         tree eltype = build_nonstandard_integer_type (bitsize, 1);
> +         tree etype = build_vector_type (eltype, 2);
> +         gcc_assert (convert_optab_handler (vec_extract_optab,
> +                                            TYPE_MODE (etype),
> +                                            TYPE_MODE (eltype))
> +                     != CODE_FOR_nothing);
> +         tree tem = make_ssa_name (etype);
> +         epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> +                                            build1 (VIEW_CONVERT_EXPR,
> +                                                    etype, new_temp));
> +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +         new_temp = tem;
> +         tem = make_ssa_name (eltype);
> +         epilog_stmt
> +             = gimple_build_assign (tem, BIT_FIELD_REF,
> +                                    build3 (BIT_FIELD_REF, eltype,
> +                                            new_temp, TYPE_SIZE (eltype),
> +                                            bitsize_int (0)));
> +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +         dst1 = make_ssa_name (vectype1);
> +         epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> +                                            build1 (VIEW_CONVERT_EXPR,
> +                                                    vectype1, tem));
> +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +         tem = make_ssa_name (eltype);
> +         epilog_stmt
> +             = gimple_build_assign (tem, BIT_FIELD_REF,
> +                                    build3 (BIT_FIELD_REF, eltype,
> +                                            new_temp, TYPE_SIZE (eltype),
> +                                            bitsize_int (bitsize)));
> +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +         dst2 =  make_ssa_name (vectype1);
> +         epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> +                                            build1 (VIEW_CONVERT_EXPR,
> +                                                    vectype1, tem));
> +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +       }
> +
> +      new_temp = make_ssa_name (vectype1);
> +      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> +      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> +    }
> +
> +  return new_temp;
> +}
> +
>  /* Function vect_create_epilog_for_reduction
>
>     Create code at the loop-epilog to finalize the result of a reduction
> @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info
> loop_vinfo,
>
>        /* First reduce the vector to the desired vector size we should
>          do shift reduction on by combining upper and lower halves.  */
> -      new_temp = reduc_inputs[0];
> -      while (nunits > nunits1)
> -       {
> -         nunits /= 2;
> -         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> (vectype),
> -                                                         stype, nunits);
> -         unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> -
> -         /* The target has to make sure we support lowpart/highpart
> -            extraction, either via direct vector extract or through
> -            an integer mode punning.  */
> -         tree dst1, dst2;
> -         if (convert_optab_handler (vec_extract_optab,
> -                                    TYPE_MODE (TREE_TYPE (new_temp)),
> -                                    TYPE_MODE (vectype1))
> -             != CODE_FOR_nothing)
> -           {
> -             /* Extract sub-vectors directly once vec_extract becomes
> -                a conversion optab.  */
> -             dst1 = make_ssa_name (vectype1);
> -             epilog_stmt
> -                 = gimple_build_assign (dst1, BIT_FIELD_REF,
> -                                        build3 (BIT_FIELD_REF, vectype1,
> -                                                new_temp, TYPE_SIZE
> (vectype1),
> -                                                bitsize_int (0)));
> -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -             dst2 =  make_ssa_name (vectype1);
> -             epilog_stmt
> -                 = gimple_build_assign (dst2, BIT_FIELD_REF,
> -                                        build3 (BIT_FIELD_REF, vectype1,
> -                                                new_temp, TYPE_SIZE
> (vectype1),
> -                                                bitsize_int (bitsize)));
> -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -           }
> -         else
> -           {
> -             /* Extract via punning to appropriately sized integer mode
> -                vector.  */
> -             tree eltype = build_nonstandard_integer_type (bitsize, 1);
> -             tree etype = build_vector_type (eltype, 2);
> -             gcc_assert (convert_optab_handler (vec_extract_optab,
> -                                                TYPE_MODE (etype),
> -                                                TYPE_MODE (eltype))
> -                         != CODE_FOR_nothing);
> -             tree tem = make_ssa_name (etype);
> -             epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> -                                                build1 (VIEW_CONVERT_EXPR,
> -                                                        etype, new_temp));
> -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -             new_temp = tem;
> -             tem = make_ssa_name (eltype);
> -             epilog_stmt
> -                 = gimple_build_assign (tem, BIT_FIELD_REF,
> -                                        build3 (BIT_FIELD_REF, eltype,
> -                                                new_temp, TYPE_SIZE
> (eltype),
> -                                                bitsize_int (0)));
> -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -             dst1 = make_ssa_name (vectype1);
> -             epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> -                                                build1 (VIEW_CONVERT_EXPR,
> -                                                        vectype1, tem));
> -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -             tem = make_ssa_name (eltype);
> -             epilog_stmt
> -                 = gimple_build_assign (tem, BIT_FIELD_REF,
> -                                        build3 (BIT_FIELD_REF, eltype,
> -                                                new_temp, TYPE_SIZE
> (eltype),
> -                                                bitsize_int (bitsize)));
> -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -             dst2 =  make_ssa_name (vectype1);
> -             epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> -                                                build1 (VIEW_CONVERT_EXPR,
> -                                                        vectype1, tem));
> -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -           }
> -
> -         new_temp = make_ssa_name (vectype1);
> -         epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> -         gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> -         reduc_inputs[0] = new_temp;
> -       }
> +      gimple_seq stmts = NULL;
> +      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> +                                            code, &stmts);
> +      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> +      reduc_inputs[0] = new_temp;
>
>        if (reduce_with_shift && !slp_reduc)
>         {
> @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
>
>    if (auto *accumulator = reduc_info->reused_accumulator)
>      {
> +      tree def = accumulator->reduc_input;
> +      unsigned int nreduc;
> +      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (def)),
> +                                     TYPE_VECTOR_SUBPARTS (vectype_out),
> +                                     &nreduc);
> +      gcc_assert (res);
> +      if (nreduc != 1)
> +       {
> +         /* Reduce the single vector to a smaller one.  */
> +         gimple_seq stmts = NULL;
> +         def = vect_create_partial_epilog (def, vectype_out,
> +                                           STMT_VINFO_REDUC_CODE
> (reduc_info),
> +                                           &stmts);
> +         /* Adjust the input so we pick up the partially reduced value
> +            for the skip edge in vect_create_epilog_for_reduction.  */
> +         accumulator->reduc_input = def;
> +         if (loop_vinfo->main_loop_edge)
> +           {
> +             /* While we'd like to insert on the edge this will split
> +                blocks and disturb bookkeeping, we also will eventually
> +                need this on the skip edge.  Rely on sinking to
> +                fixup optimal placement and insert in the pred.  */
> +             gimple_stmt_iterator gsi
> +               = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> +             /* Insert before a cond that eventually skips the
> +                epilogue.  */
> +             if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> +               gsi_prev (&gsi);
> +             gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> +           }
> +         else
> +           gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> +                                             stmts);
> +       }
>        if (loop_vinfo->main_loop_edge)
>         vec_initial_defs[0]
> -         = vect_get_main_loop_result (loop_vinfo,
> accumulator->reduc_input,
> +         = vect_get_main_loop_result (loop_vinfo, def,
>                                        vec_initial_defs[0]);
>        else
> -       vec_initial_defs.safe_push (accumulator->reduc_input);
> -      gcc_assert (vec_initial_defs.length () == 1);
> +       vec_initial_defs.safe_push (def);
>      }
>
>    /* Generate the reduction PHIs upfront.  */
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-15 12:25 ` Christophe Lyon
@ 2021-07-15 12:34   ` Richard Biener
  2021-07-15 15:19     ` Christophe Lyon
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2021-07-15 12:34 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: GCC Patches, Richard Sandiford

On Thu, 15 Jul 2021, Christophe Lyon wrote:

> Hi,
> 
> 
> 
> On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de> wrote:
> 
> > The following adds support for re-using the vector reduction def
> > from the main loop in vectorized epilogue loops on architectures
> > which use different vector sizes for the epilogue.  That's only
> > x86 as far as I am aware.
> >
> > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > regtest in progress.
> >
> > There's costing issues on x86 which usually prevent vectorizing
> > an epilogue with a reduction, at least for loops that only
> > have a reduction - it could be mitigated by not accounting for
> > the epilogue there if we can compute that we can re-use the
> > main loops cost.
> >
> > Richard - did I figure the correct place to adjust?  I guess
> > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > for re-use by the skip code in vect_create_epilog_for_reduction
> > is a bit awkward but at least we're conciously doing
> > vect_create_epilog_for_reduction last (via vectorizing live
> > operations).
> >
> > OK in the unlikely case all testing succeeds (I also want to
> > run it through SPEC with/without -fno-vect-cost-model which
> > will take some time)?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-13  Richard Biener  <rguenther@suse.de>
> >
> >         * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> >         vector types where the old vector type has a multiple of
> >         the new vector type elements.
> >         (vect_create_partial_epilog): New function, split out from...
> >         (vect_create_epilog_for_reduction): ... here.
> >         (vect_transform_cycle_phi): Reduce the re-used accumulator
> >         to the new vector type.
> >
> >         * gcc.target/i386/vect-reduc-1.c: New testcase.
> >
> 
> This patch is causing regressions on aarch64:
>  FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> 'PHI' argument 1
> vector(2) unsigned int
> vector(2) int
> _91 = PHI <_90(17), _83(11)>
> during GIMPLE pass: vect
> dump file: ./pr92324-4.c.167t.vect
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> verify_gimple failed
> 0xe6438e verify_gimple_in_cfg(function*, bool)
>         /gcc/tree-cfg.c:5535
> 0xd13902 execute_function_todo
>         /gcc/passes.c:2042
> 0xd142a5 execute_todo
>         /gcc/passes.c:2096
> 
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

What exact options do you pass to cc1 to get this?  Can you track this
in a PR please?

Thanks,
Richard.

> Thanks,
> 
> Christophe
> 
> 
> 
> > ---
> >  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
> >  gcc/tree-vect-loop.c                         | 223 ++++++++++++-------
> >  2 files changed, 155 insertions(+), 85 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > new file mode 100644
> > index 00000000000..9ee9ba4e736
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> > +
> > +#define N 32
> > +int foo (int *a, int n)
> > +{
> > +  int sum = 1;
> > +  for (int i = 0; i < 8*N + 4; ++i)
> > +    sum += a[i];
> > +  return sum;
> > +}
> > +
> > +/* The reduction epilog should be vectorized and the accumulator
> > +   re-used.  */
> > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index 8c27d75f889..98e2a845629 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> >       ones as well.  */
> >    tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> >    tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > -  if (!useless_type_conversion_p (old_vectype, vectype))
> > +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > +                           TYPE_VECTOR_SUBPARTS (vectype)))
> >      return false;
> >
> >    /* Non-SLP reductions might apply an adjustment after the reduction
> > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> >    return true;
> >  }
> >
> > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> > +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> > +
> > +static tree
> > +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> > code,
> > +                           gimple_seq *seq)
> > +{
> > +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > (vec_def)).to_constant ();
> > +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> > +  tree stype = TREE_TYPE (vectype);
> > +  tree new_temp = vec_def;
> > +  while (nunits > nunits1)
> > +    {
> > +      nunits /= 2;
> > +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > (vectype),
> > +                                                          stype, nunits);
> > +      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > +
> > +      /* The target has to make sure we support lowpart/highpart
> > +        extraction, either via direct vector extract or through
> > +        an integer mode punning.  */
> > +      tree dst1, dst2;
> > +      gimple *epilog_stmt;
> > +      if (convert_optab_handler (vec_extract_optab,
> > +                                TYPE_MODE (TREE_TYPE (new_temp)),
> > +                                TYPE_MODE (vectype1))
> > +         != CODE_FOR_nothing)
> > +       {
> > +         /* Extract sub-vectors directly once vec_extract becomes
> > +            a conversion optab.  */
> > +         dst1 = make_ssa_name (vectype1);
> > +         epilog_stmt
> > +             = gimple_build_assign (dst1, BIT_FIELD_REF,
> > +                                    build3 (BIT_FIELD_REF, vectype1,
> > +                                            new_temp, TYPE_SIZE
> > (vectype1),
> > +                                            bitsize_int (0)));
> > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +         dst2 =  make_ssa_name (vectype1);
> > +         epilog_stmt
> > +             = gimple_build_assign (dst2, BIT_FIELD_REF,
> > +                                    build3 (BIT_FIELD_REF, vectype1,
> > +                                            new_temp, TYPE_SIZE
> > (vectype1),
> > +                                            bitsize_int (bitsize)));
> > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +       }
> > +      else
> > +       {
> > +         /* Extract via punning to appropriately sized integer mode
> > +            vector.  */
> > +         tree eltype = build_nonstandard_integer_type (bitsize, 1);
> > +         tree etype = build_vector_type (eltype, 2);
> > +         gcc_assert (convert_optab_handler (vec_extract_optab,
> > +                                            TYPE_MODE (etype),
> > +                                            TYPE_MODE (eltype))
> > +                     != CODE_FOR_nothing);
> > +         tree tem = make_ssa_name (etype);
> > +         epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> > +                                            build1 (VIEW_CONVERT_EXPR,
> > +                                                    etype, new_temp));
> > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +         new_temp = tem;
> > +         tem = make_ssa_name (eltype);
> > +         epilog_stmt
> > +             = gimple_build_assign (tem, BIT_FIELD_REF,
> > +                                    build3 (BIT_FIELD_REF, eltype,
> > +                                            new_temp, TYPE_SIZE (eltype),
> > +                                            bitsize_int (0)));
> > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +         dst1 = make_ssa_name (vectype1);
> > +         epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> > +                                            build1 (VIEW_CONVERT_EXPR,
> > +                                                    vectype1, tem));
> > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +         tem = make_ssa_name (eltype);
> > +         epilog_stmt
> > +             = gimple_build_assign (tem, BIT_FIELD_REF,
> > +                                    build3 (BIT_FIELD_REF, eltype,
> > +                                            new_temp, TYPE_SIZE (eltype),
> > +                                            bitsize_int (bitsize)));
> > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +         dst2 =  make_ssa_name (vectype1);
> > +         epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > +                                            build1 (VIEW_CONVERT_EXPR,
> > +                                                    vectype1, tem));
> > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +       }
> > +
> > +      new_temp = make_ssa_name (vectype1);
> > +      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > +      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > +    }
> > +
> > +  return new_temp;
> > +}
> > +
> >  /* Function vect_create_epilog_for_reduction
> >
> >     Create code at the loop-epilog to finalize the result of a reduction
> > @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info
> > loop_vinfo,
> >
> >        /* First reduce the vector to the desired vector size we should
> >          do shift reduction on by combining upper and lower halves.  */
> > -      new_temp = reduc_inputs[0];
> > -      while (nunits > nunits1)
> > -       {
> > -         nunits /= 2;
> > -         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > (vectype),
> > -                                                         stype, nunits);
> > -         unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > -
> > -         /* The target has to make sure we support lowpart/highpart
> > -            extraction, either via direct vector extract or through
> > -            an integer mode punning.  */
> > -         tree dst1, dst2;
> > -         if (convert_optab_handler (vec_extract_optab,
> > -                                    TYPE_MODE (TREE_TYPE (new_temp)),
> > -                                    TYPE_MODE (vectype1))
> > -             != CODE_FOR_nothing)
> > -           {
> > -             /* Extract sub-vectors directly once vec_extract becomes
> > -                a conversion optab.  */
> > -             dst1 = make_ssa_name (vectype1);
> > -             epilog_stmt
> > -                 = gimple_build_assign (dst1, BIT_FIELD_REF,
> > -                                        build3 (BIT_FIELD_REF, vectype1,
> > -                                                new_temp, TYPE_SIZE
> > (vectype1),
> > -                                                bitsize_int (0)));
> > -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -             dst2 =  make_ssa_name (vectype1);
> > -             epilog_stmt
> > -                 = gimple_build_assign (dst2, BIT_FIELD_REF,
> > -                                        build3 (BIT_FIELD_REF, vectype1,
> > -                                                new_temp, TYPE_SIZE
> > (vectype1),
> > -                                                bitsize_int (bitsize)));
> > -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -           }
> > -         else
> > -           {
> > -             /* Extract via punning to appropriately sized integer mode
> > -                vector.  */
> > -             tree eltype = build_nonstandard_integer_type (bitsize, 1);
> > -             tree etype = build_vector_type (eltype, 2);
> > -             gcc_assert (convert_optab_handler (vec_extract_optab,
> > -                                                TYPE_MODE (etype),
> > -                                                TYPE_MODE (eltype))
> > -                         != CODE_FOR_nothing);
> > -             tree tem = make_ssa_name (etype);
> > -             epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> > -                                                build1 (VIEW_CONVERT_EXPR,
> > -                                                        etype, new_temp));
> > -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -             new_temp = tem;
> > -             tem = make_ssa_name (eltype);
> > -             epilog_stmt
> > -                 = gimple_build_assign (tem, BIT_FIELD_REF,
> > -                                        build3 (BIT_FIELD_REF, eltype,
> > -                                                new_temp, TYPE_SIZE
> > (eltype),
> > -                                                bitsize_int (0)));
> > -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -             dst1 = make_ssa_name (vectype1);
> > -             epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> > -                                                build1 (VIEW_CONVERT_EXPR,
> > -                                                        vectype1, tem));
> > -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -             tem = make_ssa_name (eltype);
> > -             epilog_stmt
> > -                 = gimple_build_assign (tem, BIT_FIELD_REF,
> > -                                        build3 (BIT_FIELD_REF, eltype,
> > -                                                new_temp, TYPE_SIZE
> > (eltype),
> > -                                                bitsize_int (bitsize)));
> > -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -             dst2 =  make_ssa_name (vectype1);
> > -             epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > -                                                build1 (VIEW_CONVERT_EXPR,
> > -                                                        vectype1, tem));
> > -             gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -           }
> > -
> > -         new_temp = make_ssa_name (vectype1);
> > -         epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > -         gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > -         reduc_inputs[0] = new_temp;
> > -       }
> > +      gimple_seq stmts = NULL;
> > +      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
> > +                                            code, &stmts);
> > +      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > +      reduc_inputs[0] = new_temp;
> >
> >        if (reduce_with_shift && !slp_reduc)
> >         {
> > @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
> >
> >    if (auto *accumulator = reduc_info->reused_accumulator)
> >      {
> > +      tree def = accumulator->reduc_input;
> > +      unsigned int nreduc;
> > +      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > (def)),
> > +                                     TYPE_VECTOR_SUBPARTS (vectype_out),
> > +                                     &nreduc);
> > +      gcc_assert (res);
> > +      if (nreduc != 1)
> > +       {
> > +         /* Reduce the single vector to a smaller one.  */
> > +         gimple_seq stmts = NULL;
> > +         def = vect_create_partial_epilog (def, vectype_out,
> > +                                           STMT_VINFO_REDUC_CODE
> > (reduc_info),
> > +                                           &stmts);
> > +         /* Adjust the input so we pick up the partially reduced value
> > +            for the skip edge in vect_create_epilog_for_reduction.  */
> > +         accumulator->reduc_input = def;
> > +         if (loop_vinfo->main_loop_edge)
> > +           {
> > +             /* While we'd like to insert on the edge this will split
> > +                blocks and disturb bookkeeping, we also will eventually
> > +                need this on the skip edge.  Rely on sinking to
> > +                fixup optimal placement and insert in the pred.  */
> > +             gimple_stmt_iterator gsi
> > +               = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> > +             /* Insert before a cond that eventually skips the
> > +                epilogue.  */
> > +             if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> > +               gsi_prev (&gsi);
> > +             gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> > +           }
> > +         else
> > +           gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
> > +                                             stmts);
> > +       }
> >        if (loop_vinfo->main_loop_edge)
> >         vec_initial_defs[0]
> > -         = vect_get_main_loop_result (loop_vinfo,
> > accumulator->reduc_input,
> > +         = vect_get_main_loop_result (loop_vinfo, def,
> >                                        vec_initial_defs[0]);
> >        else
> > -       vec_initial_defs.safe_push (accumulator->reduc_input);
> > -      gcc_assert (vec_initial_defs.length () == 1);
> > +       vec_initial_defs.safe_push (def);
> >      }
> >
> >    /* Generate the reduction PHIs upfront.  */
> > --
> > 2.26.2
> >
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Support reduction def re-use for epilogue with different vector size
  2021-07-15 12:34   ` Richard Biener
@ 2021-07-15 15:19     ` Christophe Lyon
  0 siblings, 0 replies; 5+ messages in thread
From: Christophe Lyon @ 2021-07-15 15:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford

On Thu, Jul 15, 2021 at 2:34 PM Richard Biener <rguenther@suse.de> wrote:

> On Thu, 15 Jul 2021, Christophe Lyon wrote:
>
> > Hi,
> >
> >
> >
> > On Tue, Jul 13, 2021 at 2:09 PM Richard Biener <rguenther@suse.de>
> wrote:
> >
> > > The following adds support for re-using the vector reduction def
> > > from the main loop in vectorized epilogue loops on architectures
> > > which use different vector sizes for the epilogue.  That's only
> > > x86 as far as I am aware.
> > >
> > > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > > regtest in progress.
> > >
> > > There's costing issues on x86 which usually prevent vectorizing
> > > an epilogue with a reduction, at least for loops that only
> > > have a reduction - it could be mitigated by not accounting for
> > > the epilogue there if we can compute that we can re-use the
> > > main loops cost.
> > >
> > > Richard - did I figure the correct place to adjust?  I guess
> > > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > > for re-use by the skip code in vect_create_epilog_for_reduction
> > > is a bit awkward but at least we're conciously doing
> > > vect_create_epilog_for_reduction last (via vectorizing live
> > > operations).
> > >
> > > OK in the unlikely case all testing succeeds (I also want to
> > > run it through SPEC with/without -fno-vect-cost-model which
> > > will take some time)?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-13  Richard Biener  <rguenther@suse.de>
> > >
> > >         * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > >         vector types where the old vector type has a multiple of
> > >         the new vector type elements.
> > >         (vect_create_partial_epilog): New function, split out from...
> > >         (vect_create_epilog_for_reduction): ... here.
> > >         (vect_transform_cycle_phi): Reduce the re-used accumulator
> > >         to the new vector type.
> > >
> > >         * gcc.target/i386/vect-reduc-1.c: New testcase.
> > >
> >
> > This patch is causing regressions on aarch64:
> >  FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> > FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> > FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> > Excess errors:
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> > 'PHI' argument 1
> > vector(2) unsigned int
> > vector(2) int
> > _91 = PHI <_90(17), _83(11)>
> > during GIMPLE pass: vect
> > dump file: ./pr92324-4.c.167t.vect
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> > verify_gimple failed
> > 0xe6438e verify_gimple_in_cfg(function*, bool)
> >         /gcc/tree-cfg.c:5535
> > 0xd13902 execute_function_todo
> >         /gcc/passes.c:2042
> > 0xd142a5 execute_todo
> >         /gcc/passes.c:2096
> >
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fminnmv
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fmaxnmv
>
> What exact options do you pass to cc1 to get this?  Can you track this
> in a PR please?
>
> Thanks,
> Richard.
>
>
Sure, I filed PR 101462

Christophe


> > Thanks,
> >
> > Christophe
> >
> >
> >
> > > ---
> > >  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
> > >  gcc/tree-vect-loop.c                         | 223 ++++++++++++-------
> > >  2 files changed, 155 insertions(+), 85 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > new file mode 100644
> > > index 00000000000..9ee9ba4e736
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" }
> */
> > > +
> > > +#define N 32
> > > +int foo (int *a, int n)
> > > +{
> > > +  int sum = 1;
> > > +  for (int i = 0; i < 8*N + 4; ++i)
> > > +    sum += a[i];
> > > +  return sum;
> > > +}
> > > +
> > > +/* The reduction epilog should be vectorized and the accumulator
> > > +   re-used.  */
> > > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } }
> */
> > > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > index 8c27d75f889..98e2a845629 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> > > loop_vinfo,
> > >       ones as well.  */
> > >    tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> > >    tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > > -  if (!useless_type_conversion_p (old_vectype, vectype))
> > > +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > > +                           TYPE_VECTOR_SUBPARTS (vectype)))
> > >      return false;
> > >
> > >    /* Non-SLP reductions might apply an adjustment after the reduction
> > > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> > > loop_vinfo,
> > >    return true;
> > >  }
> > >
> > > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> > > +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.
> */
> > > +
> > > +static tree
> > > +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> > > code,
> > > +                           gimple_seq *seq)
> > > +{
> > > +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > > (vec_def)).to_constant ();
> > > +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> > > +  tree stype = TREE_TYPE (vectype);
> > > +  tree new_temp = vec_def;
> > > +  while (nunits > nunits1)
> > > +    {
> > > +      nunits /= 2;
> > > +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > > (vectype),
> > > +                                                          stype,
> nunits);
> > > +      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > > +
> > > +      /* The target has to make sure we support lowpart/highpart
> > > +        extraction, either via direct vector extract or through
> > > +        an integer mode punning.  */
> > > +      tree dst1, dst2;
> > > +      gimple *epilog_stmt;
> > > +      if (convert_optab_handler (vec_extract_optab,
> > > +                                TYPE_MODE (TREE_TYPE (new_temp)),
> > > +                                TYPE_MODE (vectype1))
> > > +         != CODE_FOR_nothing)
> > > +       {
> > > +         /* Extract sub-vectors directly once vec_extract becomes
> > > +            a conversion optab.  */
> > > +         dst1 = make_ssa_name (vectype1);
> > > +         epilog_stmt
> > > +             = gimple_build_assign (dst1, BIT_FIELD_REF,
> > > +                                    build3 (BIT_FIELD_REF, vectype1,
> > > +                                            new_temp, TYPE_SIZE
> > > (vectype1),
> > > +                                            bitsize_int (0)));
> > > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +         dst2 =  make_ssa_name (vectype1);
> > > +         epilog_stmt
> > > +             = gimple_build_assign (dst2, BIT_FIELD_REF,
> > > +                                    build3 (BIT_FIELD_REF, vectype1,
> > > +                                            new_temp, TYPE_SIZE
> > > (vectype1),
> > > +                                            bitsize_int (bitsize)));
> > > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +       }
> > > +      else
> > > +       {
> > > +         /* Extract via punning to appropriately sized integer mode
> > > +            vector.  */
> > > +         tree eltype = build_nonstandard_integer_type (bitsize, 1);
> > > +         tree etype = build_vector_type (eltype, 2);
> > > +         gcc_assert (convert_optab_handler (vec_extract_optab,
> > > +                                            TYPE_MODE (etype),
> > > +                                            TYPE_MODE (eltype))
> > > +                     != CODE_FOR_nothing);
> > > +         tree tem = make_ssa_name (etype);
> > > +         epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
> > > +                                            build1 (VIEW_CONVERT_EXPR,
> > > +                                                    etype, new_temp));
> > > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +         new_temp = tem;
> > > +         tem = make_ssa_name (eltype);
> > > +         epilog_stmt
> > > +             = gimple_build_assign (tem, BIT_FIELD_REF,
> > > +                                    build3 (BIT_FIELD_REF, eltype,
> > > +                                            new_temp, TYPE_SIZE
> (eltype),
> > > +                                            bitsize_int (0)));
> > > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +         dst1 = make_ssa_name (vectype1);
> > > +         epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
> > > +                                            build1 (VIEW_CONVERT_EXPR,
> > > +                                                    vectype1, tem));
> > > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +         tem = make_ssa_name (eltype);
> > > +         epilog_stmt
> > > +             = gimple_build_assign (tem, BIT_FIELD_REF,
> > > +                                    build3 (BIT_FIELD_REF, eltype,
> > > +                                            new_temp, TYPE_SIZE
> (eltype),
> > > +                                            bitsize_int (bitsize)));
> > > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +         dst2 =  make_ssa_name (vectype1);
> > > +         epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> > > +                                            build1 (VIEW_CONVERT_EXPR,
> > > +                                                    vectype1, tem));
> > > +         gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +       }
> > > +
> > > +      new_temp = make_ssa_name (vectype1);
> > > +      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
> > > +      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
> > > +    }
> > > +
> > > +  return new_temp;
> > > +}
> > > +
> > >  /* Function vect_create_epilog_for_reduction
> > >
> > >     Create code at the loop-epilog to finalize the result of a
> reduction
> > > @@ -5684,87 +5780,11 @@ vect_create_epilog_for_reduction (loop_vec_info
> > > loop_vinfo,
> > >
> > >        /* First reduce the vector to the desired vector size we should
> > >          do shift reduction on by combining upper and lower halves.  */
> > > -      new_temp = reduc_inputs[0];
> > > -      while (nunits > nunits1)
> > > -       {
> > > -         nunits /= 2;
> > > -         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE
> > > (vectype),
> > > -                                                         stype,
> nunits);
> > > -         unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> > > -
> > > -         /* The target has to make sure we support lowpart/highpart
> > > -            extraction, either via direct vector extract or through
> > > -            an integer mode punning.  */
> > > -         tree dst1, dst2;
> > > -         if (convert_optab_handler (vec_extract_optab,
> > > -                                    TYPE_MODE (TREE_TYPE (new_temp)),
> > > -                                    TYPE_MODE (vectype1))
> > > -             != CODE_FOR_nothing)
> > > -           {
> > > -             /* Extract sub-vectors directly once vec_extract becomes
> > > -                a conversion optab.  */
> > > -             dst1 = make_ssa_name (vectype1);
> > > -             epilog_stmt
> > > -                 = gimple_build_assign (dst1, BIT_FIELD_REF,
> > > -                                        build3 (BIT_FIELD_REF,
> vectype1,
> > > -                                                new_temp, TYPE_SIZE
> > > (vectype1),
> > > -                                                bitsize_int (0)));
> > > -             gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > -             dst2 =  make_ssa_name (vectype1);
> > > -             epilog_stmt
> > > -                 = gimple_build_assign (dst2, BIT_FIELD_REF,
> > > -                                        build3 (BIT_FIELD_REF,
> vectype1,
> > > -                                                new_temp, TYPE_SIZE
> > > (vectype1),
> > > -                                                bitsize_int
> (bitsize)));
> > > -             gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > -           }
> > > -         else
> > > -           {
> > > -             /* Extract via punning to appropriately sized integer
> mode
> > > -                vector.  */
> > > -             tree eltype = build_nonstandard_integer_type (bitsize,
> 1);
> > > -             tree etype = build_vector_type (eltype, 2);
> > > -             gcc_assert (convert_optab_handler (vec_extract_optab,
> > > -                                                TYPE_MODE (etype),
> > > -                                                TYPE_MODE (eltype))
> > > -                         != CODE_FOR_nothing);
> > > -             tree tem = make_ssa_name (etype);
> > > -             epilog_stmt = gimple_build_assign (tem,
> VIEW_CONVERT_EXPR,
> > > -                                                build1
> (VIEW_CONVERT_EXPR,
> > > -                                                        etype,
> new_temp));
> > > -             gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > -             new_temp = tem;
> > > -             tem = make_ssa_name (eltype);
> > > -             epilog_stmt
> > > -                 = gimple_build_assign (tem, BIT_FIELD_REF,
> > > -                                        build3 (BIT_FIELD_REF, eltype,
> > > -                                                new_temp, TYPE_SIZE
> > > (eltype),
> > > -                                                bitsize_int (0)));
> > > -             gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > -             dst1 = make_ssa_name (vectype1);
> > > -             epilog_stmt = gimple_build_assign (dst1,
> VIEW_CONVERT_EXPR,
> > > -                                                build1
> (VIEW_CONVERT_EXPR,
> > > -                                                        vectype1,
> tem));
> > > -             gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > -             tem = make_ssa_name (eltype);
> > > -             epilog_stmt
> > > -                 = gimple_build_assign (tem, BIT_FIELD_REF,
> > > -                                        build3 (BIT_FIELD_REF, eltype,
> > > -                                                new_temp, TYPE_SIZE
> > > (eltype),
> > > -                                                bitsize_int
> (bitsize)));
> > > -             gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > -             dst2 =  make_ssa_name (vectype1);
> > > -             epilog_stmt = gimple_build_assign (dst2,
> VIEW_CONVERT_EXPR,
> > > -                                                build1
> (VIEW_CONVERT_EXPR,
> > > -                                                        vectype1,
> tem));
> > > -             gsi_insert_before (&exit_gsi, epilog_stmt,
> GSI_SAME_STMT);
> > > -           }
> > > -
> > > -         new_temp = make_ssa_name (vectype1);
> > > -         epilog_stmt = gimple_build_assign (new_temp, code, dst1,
> dst2);
> > > -         gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
> > > -         reduc_inputs[0] = new_temp;
> > > -       }
> > > +      gimple_seq stmts = NULL;
> > > +      new_temp = vect_create_partial_epilog (reduc_inputs[0],
> vectype1,
> > > +                                            code, &stmts);
> > > +      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > +      reduc_inputs[0] = new_temp;
> > >
> > >        if (reduce_with_shift && !slp_reduc)
> > >         {
> > > @@ -7681,13 +7701,46 @@ vect_transform_cycle_phi (loop_vec_info
> loop_vinfo,
> > >
> > >    if (auto *accumulator = reduc_info->reused_accumulator)
> > >      {
> > > +      tree def = accumulator->reduc_input;
> > > +      unsigned int nreduc;
> > > +      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE
> > > (def)),
> > > +                                     TYPE_VECTOR_SUBPARTS
> (vectype_out),
> > > +                                     &nreduc);
> > > +      gcc_assert (res);
> > > +      if (nreduc != 1)
> > > +       {
> > > +         /* Reduce the single vector to a smaller one.  */
> > > +         gimple_seq stmts = NULL;
> > > +         def = vect_create_partial_epilog (def, vectype_out,
> > > +                                           STMT_VINFO_REDUC_CODE
> > > (reduc_info),
> > > +                                           &stmts);
> > > +         /* Adjust the input so we pick up the partially reduced value
> > > +            for the skip edge in vect_create_epilog_for_reduction.  */
> > > +         accumulator->reduc_input = def;
> > > +         if (loop_vinfo->main_loop_edge)
> > > +           {
> > > +             /* While we'd like to insert on the edge this will split
> > > +                blocks and disturb bookkeeping, we also will
> eventually
> > > +                need this on the skip edge.  Rely on sinking to
> > > +                fixup optimal placement and insert in the pred.  */
> > > +             gimple_stmt_iterator gsi
> > > +               = gsi_last_bb (loop_vinfo->main_loop_edge->src);
> > > +             /* Insert before a cond that eventually skips the
> > > +                epilogue.  */
> > > +             if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
> > > +               gsi_prev (&gsi);
> > > +             gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
> > > +           }
> > > +         else
> > > +           gsi_insert_seq_on_edge_immediate (loop_preheader_edge
> (loop),
> > > +                                             stmts);
> > > +       }
> > >        if (loop_vinfo->main_loop_edge)
> > >         vec_initial_defs[0]
> > > -         = vect_get_main_loop_result (loop_vinfo,
> > > accumulator->reduc_input,
> > > +         = vect_get_main_loop_result (loop_vinfo, def,
> > >                                        vec_initial_defs[0]);
> > >        else
> > > -       vec_initial_defs.safe_push (accumulator->reduc_input);
> > > -      gcc_assert (vec_initial_defs.length () == 1);
> > > +       vec_initial_defs.safe_push (def);
> > >      }
> > >
> > >    /* Generate the reduction PHIs upfront.  */
> > > --
> > > 2.26.2
> > >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-07-15 15:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-13 12:09 [PATCH] Support reduction def re-use for epilogue with different vector size Richard Biener
2021-07-13 14:17 ` Richard Sandiford
2021-07-15 12:25 ` Christophe Lyon
2021-07-15 12:34   ` Richard Biener
2021-07-15 15:19     ` Christophe Lyon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).