public inbox for gcc-patches@gcc.gnu.org
From: "Kewen.Lin" <linkw@linux.ibm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Bill Schmidt <wschmidt@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>
Subject: [PATCH] rs6000: Adjust rs6000_density_test for strided_load
Date: Fri, 7 May 2021 10:29:00 +0800
Message-ID: <7b9f9bdf-1ed5-139b-de9c-511ee8454b85@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 1153 bytes --]

Hi,

We noticed that the SPEC2017 503.bwaves_r run time degrades by about
8% on P8 and P9 if we enable vectorization at -O2 with -ffast-math.

Unlike at -Ofast, the compiler doesn't perform loop interchange on
the innermost loop here, so it is not profitable to vectorize it.
With loop vectorization the loop becomes very dense (density ratio
83), with many scalar loads that are then used to construct vectors,
and the vector CTORs have to wait until the required loads are
ready.
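To make the access pattern concrete, here is a hypothetical C loop (not taken from 503.bwaves_r) whose innermost loop walks memory with a runtime stride, as a non-interchanged loop nest typically does. The function name and parameters are illustrative only; whether GCC actually vectorizes it at -O2 depends on the cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example loop: `in` is read with a runtime stride, so
   consecutive iterations do not touch consecutive memory.  A vectorized
   version would typically issue one scalar load per vector lane and
   then build the vector from those scalars (a vector CTOR), which
   cannot start until all of its loads have completed.  */
void strided_accumulate (double *restrict out, const double *restrict in,
                         size_t stride, size_t n)
{
  assert (out != NULL && in != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] += in[i * stride];  /* strided (non-contiguous) load */
}
```

With a stride of 1 the loads would be contiguous and could use ordinary vector loads; any larger stride forces the element-by-element construction described above.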

We already have the function rs6000_density_test to catch this kind
of dense loop, but for this case its threshold is too generic and a
bit too high.  This patch tweaks the density heuristics by
introducing some additional thresholds for strided loads, so as to
avoid affecting other benchmarks that may be sensitive to changes in
the generic DENSITY_PCT_THRESHOLD.
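As a rough model of the heuristic described above, the following C sketch restates the generic check and the new strided-load check outside of GCC. The struct `density_stats` and the function `adjusted_body_cost` are hypothetical; the constants are the values used in the patch, and the inputs in the note afterwards are made-up numbers, not measurements from bwaves_r.

```c
#include <assert.h>

/* Threshold values as used in the patch.  */
enum {
  DENSITY_PCT_THRESHOLD = 85,
  DENSITY_SIZE_THRESHOLD = 70,
  DENSITY_PENALTY = 10,
  DENSITY_LOAD_PCT_THRESHOLD = 80,
  DENSITY_LOAD_FOR_CTOR_PCT_THRESHOLD = 65,
  DENSITY_LOAD_SIZE_THRESHOLD = 20
};

/* Hypothetical summary of what the density test gathers while walking
   the loop body; field names mirror the patch's local variables.  */
struct density_stats {
  int vec_cost;
  int not_vec_cost;
  unsigned int nload_total;
  unsigned int nload_for_ctor;
  unsigned int nctor_for_strided;
};

/* Return the (possibly penalized) vector loop body cost.  */
static int adjusted_body_cost (const struct density_stats *s)
{
  assert (s != 0);
  int vec_cost = s->vec_cost;
  int size = s->vec_cost + s->not_vec_cost;
  if (size <= DENSITY_SIZE_THRESHOLD)
    return vec_cost;

  int density_pct = (vec_cost * 100) / size;
  if (density_pct > DENSITY_PCT_THRESHOLD)
    /* Existing generic penalty.  */
    return vec_cost * (100 + DENSITY_PENALTY) / 100;

  if (density_pct > DENSITY_LOAD_PCT_THRESHOLD
      && s->nload_total > DENSITY_LOAD_SIZE_THRESHOLD)
    {
      int load_for_ctor_pct
        = (int) (s->nload_for_ctor * 100 / s->nload_total);
      if (load_for_ctor_pct > DENSITY_LOAD_FOR_CTOR_PCT_THRESHOLD)
        {
          /* Charge one extra unit per strided vector CTOR, then apply
             the same percentage penalty as the generic check.  */
          vec_cost += (int) s->nctor_for_strided;
          return vec_cost * (100 + DENSITY_PENALTY) / 100;
        }
    }
  return vec_cost;
}
```

For example, with vec_cost = 83 and not_vec_cost = 17 (density 83%, below the generic 85% threshold), 30 loads of which 24 feed vector CTORs, and 12 CTORs, only the new check fires and the body cost becomes (83 + 12) * 110 / 100 = 104 in integer arithmetic.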

Bootstrapped/regtested on powerpc64le-linux-gnu P9.

Nothing remarkable was observed in a full SPEC2017 run on Power9,
except that the bwaves_r degradation has been fixed.

Is it ok for trunk?

BR,
Kewen
------
gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_density_test): Add new heuristics
	for strided_load density check.

[-- Attachment #2: 0002-rs6000-Adjust-rs6000_density_test-for-strided_load.patch --]
[-- Type: text/plain, Size: 4604 bytes --]

---
 gcc/config/rs6000/rs6000.c | 88 +++++++++++++++++++++++++++++++++-----
 1 file changed, 77 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ffdf10098a9..5ae40d6f4ce 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5245,12 +5245,16 @@ rs6000_density_test (rs6000_cost_data *data)
   const int DENSITY_PCT_THRESHOLD = 85;
   const int DENSITY_SIZE_THRESHOLD = 70;
   const int DENSITY_PENALTY = 10;
+  const int DENSITY_LOAD_PCT_THRESHOLD = 80;
+  const int DENSITY_LOAD_FOR_CTOR_PCT_THRESHOLD = 65;
+  const int DENSITY_LOAD_SIZE_THRESHOLD = 20;
   struct loop *loop = data->loop_info;
   basic_block *bbs = get_loop_body (loop);
   int nbbs = loop->num_nodes;
   loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
   int vec_cost = data->cost[vect_body], not_vec_cost = 0;
   int i, density_pct;
+  unsigned int nload_total = 0, nctor_for_strided = 0, nload_for_ctor = 0;
 
   /* Only care about cost of vector version, so exclude scalar version here.  */
   if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
@@ -5272,21 +5276,83 @@ rs6000_density_test (rs6000_cost_data *data)
 	  if (!STMT_VINFO_RELEVANT_P (stmt_info)
 	      && !STMT_VINFO_IN_PATTERN_P (stmt_info))
 	    not_vec_cost++;
+	  else
+	    {
+	      stmt_vec_info vstmt_info = vect_stmt_to_vectorize (stmt_info);
+	      if (STMT_VINFO_DATA_REF (vstmt_info)
+		  && DR_IS_READ (STMT_VINFO_DATA_REF (vstmt_info)))
+		{
+		  if (STMT_VINFO_STRIDED_P (vstmt_info))
+		    {
+		      unsigned int ncopies = 1;
+		      unsigned int nunits = 1;
+		      /* TODO: For VMAT_STRIDED_SLP, the total CTOR can be
+			 fewer due to group access.  Simply handle it here
+			 for now.  */
+		      if (!STMT_SLP_TYPE (vstmt_info))
+			{
+			  tree vectype = STMT_VINFO_VECTYPE (vstmt_info);
+			  ncopies = vect_get_num_copies (loop_vinfo, vectype);
+			  nunits = vect_nunits_for_cost (vectype);
+			}
+		      unsigned int nloads = ncopies * nunits;
+		      nload_for_ctor += nloads;
+		      nload_total += nloads;
+		      nctor_for_strided += ncopies;
+		    }
+		  else
+		    nload_total++;
+		}
+	    }
 	}
     }
-
   free (bbs);
-  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
 
-  if (density_pct > DENSITY_PCT_THRESHOLD
-      && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
-    {
-      data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "density %d%%, cost %d exceeds threshold, penalizing "
-			 "loop body cost by %d%%", density_pct,
-			 vec_cost + not_vec_cost, DENSITY_PENALTY);
+  if (vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
+    {
+      density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
+      if (density_pct > DENSITY_PCT_THRESHOLD)
+	{
+	  data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "density %d%%, cost %d exceeds threshold, "
+			     "penalizing loop body cost by %d%%.\n",
+			     density_pct, vec_cost + not_vec_cost,
+			     DENSITY_PENALTY);
+	}
+      /* For a loop in which a large proportion of all loads are scalar
+	 loads fed into vector construction, if the density is high, the
+	 loads will stall more than usual, which further delays the vector
+	 construction.  One typical case is the innermost loop of the
+	 hotspot of SPEC2017 503.bwaves_r without loop interchange.  Here
+	 we add extra cost for the related vector construction and
+	 penalize the body cost.  */
+      else if (density_pct > DENSITY_LOAD_PCT_THRESHOLD
+	       && nload_total > DENSITY_LOAD_SIZE_THRESHOLD)
+	{
+	  int load_for_ctor_pct = (nload_for_ctor * 100) / nload_total;
+	  /* Large proportion of scalar loads fed to vector CTOR.  */
+	  if (load_for_ctor_pct > DENSITY_LOAD_FOR_CTOR_PCT_THRESHOLD)
+	    {
+	      vec_cost += nctor_for_strided;
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Found high density loop with a large "
+				 "proportion %d%% of scalar loads fed to "
+				 "vector ctor, add cost %d.\n",
+				 load_for_ctor_pct, nctor_for_strided);
+
+	      data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "density %d%%, cost %d exceeds threshold, "
+				 "penalizing loop body cost by %d%% for "
+				 "load.\n",
+				 density_pct, vec_cost + not_vec_cost,
+				 DENSITY_PENALTY);
+	    }
+	}
     }
 }
 
-- 
2.17.1
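The per-statement load counting that the patch adds to the loop-body walk can be modeled in isolation as below. This is a hypothetical sketch: `struct load_counts` and `count_load` are illustrative names, and it assumes that for a non-SLP statement the number of vector copies equals the vectorization factor divided by the lanes per vector, as for fixed-length vectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-loop counters mirroring the patch's locals.  */
struct load_counts {
  unsigned int nload_total;
  unsigned int nload_for_ctor;
  unsigned int nctor_for_strided;
};

/* Account for one vectorized load statement.  ASSUMPTION: for non-SLP
   statements, ncopies = vf / nunits (fixed-length vectors).  As in the
   patch's TODO, SLP strided loads are simply counted as one copy with
   one lane.  */
static void count_load (struct load_counts *c, bool strided, bool slp,
                        unsigned int vf, unsigned int nunits)
{
  assert (c != 0 && nunits > 0 && vf % nunits == 0);
  if (!strided)
    {
      /* A non-strided load counts once per statement.  */
      c->nload_total++;
      return;
    }
  unsigned int ncopies = 1, lanes = 1;
  if (!slp)
    {
      ncopies = vf / nunits;
      lanes = nunits;
    }
  /* Each copy needs LANES scalar loads feeding one vector CTOR.  */
  unsigned int nloads = ncopies * lanes;
  c->nload_for_ctor += nloads;
  c->nload_total += nloads;
  c->nctor_for_strided += ncopies;
}
```

For instance, a non-SLP strided load of doubles with a vectorization factor of 4 and two lanes per vector gives 2 copies, hence 4 scalar loads feeding 2 CTORs, while a contiguous load in the same loop adds just 1 to the total.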


Thread overview: 19+ messages
2021-05-07  2:29 Kewen.Lin [this message]
2021-05-26  2:59 ` [PATCH v2] rs6000: Add load density heuristic Kewen.Lin
2021-06-09  2:26   ` PING^1 " Kewen.Lin
2021-06-28  7:01     ` PING^2 " Kewen.Lin
2021-07-15  1:59       ` PING^3 " Kewen.Lin
2021-07-27 22:25   ` will schmidt
2021-07-28  2:59     ` Kewen.Lin
2021-09-06 23:43       ` Segher Boessenkool
2021-09-08  7:01         ` Kewen.Lin
2021-07-28  5:22   ` [PATCH v3] " Kewen.Lin
2021-09-03 15:57     ` Bill Schmidt
2021-09-08  6:57       ` [PATCH v4] " Kewen.Lin
2021-09-08  8:28         ` Kewen.Lin
2021-09-09 16:11         ` Segher Boessenkool
2021-09-09 17:19           ` Bill Schmidt
2021-09-09 17:39             ` Bill Schmidt
2021-09-09 18:24             ` Segher Boessenkool
2021-09-10  3:22             ` Kewen.Lin
2021-09-10  3:46               ` Kewen.Lin
