public inbox for gcc-cvs@sourceware.org
* [gcc r12-2690] aarch64: Split out aarch64_adjust_body_cost_sve
@ 2021-08-03 12:01 Richard Sandiford
  0 siblings, 0 replies; only message in thread
From: Richard Sandiford @ 2021-08-03 12:01 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:b585f0112f293ace8fadc0c7ace59230140b7472

commit r12-2690-gb585f0112f293ace8fadc0c7ace59230140b7472
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Tue Aug 3 13:00:45 2021 +0100

    aarch64: Split out aarch64_adjust_body_cost_sve
    
    This patch splits the SVE-specific part of aarch64_adjust_body_cost
    out into its own subroutine, so that a future patch can call it
    more than once.  I wondered about using a lambda to avoid having
    to pass all the arguments, but in the end this way seemed clearer.
    
    gcc/
            * config/aarch64/aarch64.c (aarch64_adjust_body_cost_sve): New
            function, split out from...
            (aarch64_adjust_body_cost): ...here.

Diff:
---
 gcc/config/aarch64/aarch64.c | 220 +++++++++++++++++++++++++------------------
 1 file changed, 127 insertions(+), 93 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 17fcb34b2c8..b14b6f22aec 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15488,6 +15488,126 @@ aarch64_estimate_min_cycles_per_iter
   return cycles;
 }
 
+/* Subroutine of aarch64_adjust_body_cost for handling SVE.
+   Use ISSUE_INFO to work out how fast the SVE code can be issued and compare
+   it to the equivalent value for scalar code (SCALAR_CYCLES_PER_ITER).
+   If COULD_USE_ADVSIMD is true, also compare it to the issue rate of
+   Advanced SIMD code (ADVSIMD_CYCLES_PER_ITER).
+
+   COSTS is as for aarch64_adjust_body_cost.  ORIG_BODY_COST is the cost
+   originally passed to aarch64_adjust_body_cost and *BODY_COST is the current
+   value of the adjusted cost.  *SHOULD_DISPARAGE is true if we think the loop
+   body is too expensive.  */
+
+static fractional_cost
+aarch64_adjust_body_cost_sve (const aarch64_vector_costs *costs,
+			      const aarch64_vec_issue_info *issue_info,
+			      fractional_cost scalar_cycles_per_iter,
+			      fractional_cost advsimd_cycles_per_iter,
+			      bool could_use_advsimd,
+			      unsigned int orig_body_cost,
+			      unsigned int *body_cost,
+			      bool *should_disparage)
+{
+  /* Estimate the minimum number of cycles per iteration needed to issue
+     non-predicate operations.  */
+  fractional_cost sve_nonpred_cycles_per_iter
+    = aarch64_estimate_min_cycles_per_iter (&costs->sve_ops,
+					    issue_info->sve);
+
+  /* Separately estimate the minimum number of cycles per iteration needed
+     to issue the predicate operations.  */
+  fractional_cost sve_pred_issue_cycles_per_iter
+    = { costs->sve_ops.pred_ops, issue_info->sve->pred_ops_per_cycle };
+
+  /* Calculate the overall limit on the number of cycles per iteration.  */
+  fractional_cost sve_cycles_per_iter
+    = std::max (sve_nonpred_cycles_per_iter, sve_pred_issue_cycles_per_iter);
+
+  if (dump_enabled_p ())
+    {
+      costs->sve_ops.dump ();
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "  estimated cycles per iteration = %f\n",
+		       sve_cycles_per_iter.as_double ());
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "  estimated cycles per iteration for non-predicate"
+		       " operations = %f\n",
+		       sve_nonpred_cycles_per_iter.as_double ());
+      if (costs->sve_ops.pred_ops)
+	dump_printf_loc (MSG_NOTE, vect_location, "  estimated cycles per"
+			 " iteration for predicate operations = %f\n",
+			 sve_pred_issue_cycles_per_iter.as_double ());
+    }
+
+  /* If the scalar version of the loop could issue at least as
+     quickly as the predicate parts of the SVE loop, make the SVE loop
+     prohibitively expensive.  In this case vectorization is adding an
+     overhead that the original scalar code didn't have.
+
+     This is mostly intended to detect cases in which WHILELOs dominate
+     for very tight loops, which is something that normal latency-based
+     costs would not model.  Adding this kind of cliffedge would be
+     too drastic for scalar_cycles_per_iter vs. sve_cycles_per_iter;
+     code in the caller handles that case in a more conservative way.  */
+  fractional_cost sve_estimate = sve_pred_issue_cycles_per_iter + 1;
+  if (scalar_cycles_per_iter < sve_estimate)
+    {
+      unsigned int min_cost
+	= orig_body_cost * estimated_poly_value (BYTES_PER_SVE_VECTOR);
+      if (*body_cost < min_cost)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Increasing body cost to %d because the"
+			     " scalar code could issue within the limit"
+			     " imposed by predicate operations\n",
+			     min_cost);
+	  *body_cost = min_cost;
+	  *should_disparage = true;
+	}
+    }
+
+  /* If it appears that the Advanced SIMD version of a loop could issue
+     more quickly than the SVE one, increase the SVE cost in proportion
+     to the difference.  The intention is to make Advanced SIMD preferable
+     in cases where an Advanced SIMD version exists, without increasing
+     the costs so much that SVE won't be used at all.
+
+     The reasoning is similar to the scalar vs. predicate comparison above:
+     if the issue rate of the SVE code is limited by predicate operations
+     (i.e. if sve_pred_issue_cycles_per_iter > sve_nonpred_cycles_per_iter),
+     and if the Advanced SIMD code could issue within the limit imposed
+     by the predicate operations, the predicate operations are adding an
+     overhead that the original code didn't have and so we should prefer
+     the Advanced SIMD version.  However, if the predicate operations
+     do not dominate in this way, we should only increase the cost of
+     the SVE code if sve_cycles_per_iter is strictly greater than
+     advsimd_cycles_per_iter.  Given rounding effects, this should mean
+     that Advanced SIMD is either better or at least no worse.  */
+  if (sve_nonpred_cycles_per_iter >= sve_pred_issue_cycles_per_iter)
+    sve_estimate = sve_cycles_per_iter;
+  if (could_use_advsimd && advsimd_cycles_per_iter < sve_estimate)
+    {
+      /* This ensures that min_cost > orig_body_cost * 2.  */
+      unsigned int factor = fractional_cost::scale (1, sve_estimate,
+						    advsimd_cycles_per_iter);
+      unsigned int min_cost = orig_body_cost * factor + 1;
+      if (*body_cost < min_cost)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Increasing body cost to %d because Advanced"
+			     " SIMD code could issue as quickly\n",
+			     min_cost);
+	  *body_cost = min_cost;
+	  *should_disparage = true;
+	}
+    }
+
+  return sve_cycles_per_iter;
+}
+
 /* BODY_COST is the cost of a vector loop body recorded in COSTS.
    Adjust the cost as necessary and return the new cost.  */
 static unsigned int
@@ -15583,101 +15703,15 @@ aarch64_adjust_body_cost (aarch64_vector_costs *costs, unsigned int body_cost)
 
   if ((costs->vec_flags & VEC_ANY_SVE) && issue_info->sve)
     {
-      /* Estimate the minimum number of cycles per iteration needed to issue
-	 non-predicate operations.  */
-      fractional_cost sve_cycles_per_iter
-	= aarch64_estimate_min_cycles_per_iter (&costs->sve_ops,
-						issue_info->sve);
-
-      /* Separately estimate the minimum number of cycles per iteration needed
-	 to issue the predicate operations.  */
-      fractional_cost pred_cycles_per_iter
-	= { costs->sve_ops.pred_ops, issue_info->sve->pred_ops_per_cycle };
-
       if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location, "SVE issue estimate:\n");
-	  costs->sve_ops.dump ();
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "  estimated cycles per iteration for non-predicate"
-			   " operations = %f\n",
-			   sve_cycles_per_iter.as_double ());
-	  if (costs->sve_ops.pred_ops)
-	    dump_printf_loc (MSG_NOTE, vect_location, "  estimated cycles per"
-			     " iteration for predicate operations = %d\n",
-			     pred_cycles_per_iter.as_double ());
-	}
-
-      vector_cycles_per_iter = std::max (sve_cycles_per_iter,
-					 pred_cycles_per_iter);
+	dump_printf_loc (MSG_NOTE, vect_location, "SVE issue estimate:\n");
       vector_reduction_latency = costs->sve_ops.reduction_latency;
-
-      /* If the scalar version of the loop could issue at least as
-	 quickly as the predicate parts of the SVE loop, make the SVE loop
-	 prohibitively expensive.  In this case vectorization is adding an
-	 overhead that the original scalar code didn't have.
-
-	 This is mostly intended to detect cases in which WHILELOs dominate
-	 for very tight loops, which is something that normal latency-based
-	 costs would not model.  Adding this kind of cliffedge would be
-	 too drastic for scalar_cycles_per_iter vs. sve_cycles_per_iter;
-	 code later in the function handles that case in a more
-	 conservative way.  */
-      fractional_cost sve_estimate = pred_cycles_per_iter + 1;
-      if (scalar_cycles_per_iter < sve_estimate)
-	{
-	  unsigned int min_cost
-	    = orig_body_cost * estimated_poly_value (BYTES_PER_SVE_VECTOR);
-	  if (body_cost < min_cost)
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, vect_location,
-				 "Increasing body cost to %d because the"
-				 " scalar code could issue within the limit"
-				 " imposed by predicate operations\n",
-				 min_cost);
-	      body_cost = min_cost;
-	      should_disparage = true;
-	    }
-	}
-
-      /* If it appears that the Advanced SIMD version of a loop could issue
-	 more quickly than the SVE one, increase the SVE cost in proportion
-	 to the difference.  The intention is to make Advanced SIMD preferable
-	 in cases where an Advanced SIMD version exists, without increasing
-	 the costs so much that SVE won't be used at all.
-
-	 The reasoning is similar to the scalar vs. predicate comparison above:
-	 if the issue rate of the SVE code is limited by predicate operations
-	 (i.e. if pred_cycles_per_iter > sve_cycles_per_iter), and if the
-	 Advanced SIMD code could issue within the limit imposed by the
-	 predicate operations, the predicate operations are adding an
-	 overhead that the original code didn't have and so we should prefer
-	 the Advanced SIMD version.  However, if the predicate operations
-	 do not dominate in this way, we should only increase the cost of
-	 the SVE code if sve_cycles_per_iter is strictly greater than
-	 advsimd_cycles_per_iter.  Given rounding effects, this should mean
-	 that Advanced SIMD is either better or at least no worse.  */
-      if (sve_cycles_per_iter >= pred_cycles_per_iter)
-	sve_estimate = sve_cycles_per_iter;
-      if (could_use_advsimd && advsimd_cycles_per_iter < sve_estimate)
-	{
-	  /* This ensures that min_cost > orig_body_cost * 2.  */
-	  unsigned int factor
-	    = fractional_cost::scale (1, sve_estimate,
-				      advsimd_cycles_per_iter);
-	  unsigned int min_cost = orig_body_cost * factor + 1;
-	  if (body_cost < min_cost)
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, vect_location,
-				 "Increasing body cost to %d because Advanced"
-				 " SIMD code could issue as quickly\n",
-				 min_cost);
-	      body_cost = min_cost;
-	      should_disparage = true;
-	    }
-	}
+      vector_cycles_per_iter
+	= aarch64_adjust_body_cost_sve (costs, issue_info,
+					scalar_cycles_per_iter,
+					advsimd_cycles_per_iter,
+					could_use_advsimd, orig_body_cost,
+					&body_cost, &should_disparage);
     }
 
   /* Decide whether to stick to latency-based costs or whether to try to

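[Editorial illustration -- not part of the commit above.]  The comments in
the new aarch64_adjust_body_cost_sve describe two adjustments: the SVE loop
is made prohibitively expensive when the scalar loop could issue within the
limit imposed by the predicate operations (the WHILELO-dominated case), and
the SVE cost is scaled up in proportion when an Advanced SIMD version of
the loop could issue more quickly.  The self-contained C++ sketch below
models those two rules with plain doubles in place of fractional_cost.
Every name in it is invented for the illustration, and the rounding used
for the scaling factor is an assumption; GCC's fractional_cost::scale and
estimated_poly_value differ in detail.

/* Simplified editorial model of the two cost adjustments made by
   aarch64_adjust_body_cost_sve.  Not GCC code: plain doubles stand in
   for fractional_cost, and the names below are invented.  */

#include <algorithm>
#include <cmath>
#include <cstdio>

struct sve_estimates
{
  double nonpred_cycles_per_iter;	/* issue limit of non-predicate ops  */
  double pred_cycles_per_iter;		/* issue limit of predicate ops      */
};

/* Return the overall SVE cycles-per-iteration estimate and raise
   *BODY_COST when either the scalar loop or the Advanced SIMD loop
   could issue more quickly than the SVE loop.  */
static double
adjust_body_cost_sve_model (const sve_estimates &sve,
			    double scalar_cycles_per_iter,
			    double advsimd_cycles_per_iter,
			    bool could_use_advsimd,
			    unsigned int bytes_per_sve_vector,
			    unsigned int orig_body_cost,
			    unsigned int *body_cost,
			    bool *should_disparage)
{
  double sve_cycles_per_iter
    = std::max (sve.nonpred_cycles_per_iter, sve.pred_cycles_per_iter);

  /* Rule 1: if the scalar loop issues at least as quickly as the
     predicate part of the SVE loop (e.g. WHILELOs dominate a very tight
     loop), make the SVE loop prohibitively expensive.  */
  double sve_estimate = sve.pred_cycles_per_iter + 1;
  if (scalar_cycles_per_iter < sve_estimate)
    {
      unsigned int min_cost = orig_body_cost * bytes_per_sve_vector;
      if (*body_cost < min_cost)
	{
	  *body_cost = min_cost;
	  *should_disparage = true;
	}
    }

  /* Rule 2: if Advanced SIMD could issue more quickly than the SVE loop,
     raise the SVE cost in proportion to the difference.  When predicate
     ops do not dominate, compare against the full SVE estimate.  */
  if (sve.nonpred_cycles_per_iter >= sve.pred_cycles_per_iter)
    sve_estimate = sve_cycles_per_iter;
  if (could_use_advsimd && advsimd_cycles_per_iter < sve_estimate)
    {
      unsigned int factor
	= (unsigned int) std::ceil (sve_estimate / advsimd_cycles_per_iter);
      unsigned int min_cost = orig_body_cost * factor + 1;
      if (*body_cost < min_cost)
	{
	  *body_cost = min_cost;
	  *should_disparage = true;
	}
    }

  return sve_cycles_per_iter;
}

int
main ()
{
  /* A WHILELO-dominated loop: predicate ops need 2 cycles/iter but the
     scalar loop issues in 1.5, so rule 1 fires.  */
  sve_estimates tight = { 1.0, 2.0 };
  unsigned int body_cost = 20;
  bool disparage = false;
  adjust_body_cost_sve_model (tight, /*scalar=*/1.5, /*advsimd=*/0.0,
			      /*could_use_advsimd=*/false,
			      /*bytes_per_sve_vector=*/16,
			      /*orig_body_cost=*/20, &body_cost, &disparage);
  printf ("rule 1: body_cost=%u disparage=%d\n", body_cost, disparage);

  /* Advanced SIMD issues in 2 cycles/iter but SVE needs 3, so rule 2
     scales the cost by ceil(3/2) = 2.  */
  sve_estimates wide = { 3.0, 1.0 };
  body_cost = 20;
  disparage = false;
  adjust_body_cost_sve_model (wide, /*scalar=*/10.0, /*advsimd=*/2.0,
			      /*could_use_advsimd=*/true,
			      /*bytes_per_sve_vector=*/16,
			      /*orig_body_cost=*/20, &body_cost, &disparage);
  printf ("rule 2: body_cost=%u disparage=%d\n", body_cost, disparage);
  return 0;
}

With the sample numbers above, the first call raises body_cost from 20 to
20 * 16 = 320 (the prohibitive case) and the second raises it to
20 * 2 + 1 = 41, mirroring the two "Increasing body cost" dump messages
in the patch.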
