public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] optimize costly division in rtx_cost
@ 2020-01-05 18:39 Alexander Monakov
  2020-01-10  9:47 ` [PING][PATCH] " Alexander Monakov
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Alexander Monakov @ 2020-01-05 18:39 UTC (permalink / raw)
  To: gcc-patches

Hi,

I noticed there's a costly signed 64-bit division in rtx_cost on x86 as well as
any other target where UNITS_PER_WORD is implemented like TARGET_64BIT ? 8 : 4.
It's also evident that rtx_cost does redundant work for a SET rtx argument.

Obviously the variable named 'factor' rarely exceeds 1, so in the majority of
cases it can be computed with a well-predictable branch rather than a division.

This patch makes rtx_cost do the division only in case mode is wider than
UNITS_PER_WORD, and also moves a test for a SET up front to avoid redundancy.
No functional change.

Bootstrapped on x86_64, ok for trunk?

To illustrate the improvement this buys, for tramp3d -O2 compilation, I got
    
    before:
           73887675319      instructions:u
    
           72438432200      cycles:u
             924298569      idq.ms_uops:u
          102603799255      uops_dispatched.thread:u
    
    after:
           73888371724      instructions:u
    
           72386986612      cycles:u
             802744775      idq.ms_uops:u
          102096987220      uops_dispatched.thread:u

(this is on Sandybridge, idq.ms_uops are uops going via the microcode sequencer,
so the unneeded division is responsible for a good fraction of them)

	* rtlanal.c (rtx_cost): Handle a SET up front. Avoid division if the
	mode is not wider than UNITS_PER_WORD.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 9a7afccefb8..c7ab86e228b 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4207,18 +4207,23 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
   const char *fmt;
   int total;
   int factor;
+  unsigned mode_size;
 
   if (x == 0)
     return 0;
 
-  if (GET_MODE (x) != VOIDmode)
+  if (GET_CODE (x) == SET)
+    /* A SET doesn't have a mode, so let's look at the SET_DEST to get
+       the mode for the factor.  */
+    mode = GET_MODE (SET_DEST (x));
+  else if (GET_MODE (x) != VOIDmode)
     mode = GET_MODE (x);
 
+  mode_size = estimated_poly_value (GET_MODE_SIZE (mode));
+
   /* A size N times larger than UNITS_PER_WORD likely needs N times as
      many insns, taking N times as long.  */
-  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
-  if (factor == 0)
-    factor = 1;
+  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
 
   /* Compute the default costs of certain things.
      Note that targetm.rtx_costs can override the defaults.  */
@@ -4243,14 +4248,6 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
       /* Used in combine.c as a marker.  */
       total = 0;
       break;
-    case SET:
-      /* A SET doesn't have a mode, so let's look at the SET_DEST to get
-	 the mode for the factor.  */
-      mode = GET_MODE (SET_DEST (x));
-      factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
-      if (factor == 0)
-	factor = 1;
-      /* FALLTHRU */
     default:
       total = factor * COSTS_N_INSNS (1);
     }

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-02-13 13:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-05 18:39 [PATCH] optimize costly division in rtx_cost Alexander Monakov
2020-01-10  9:47 ` [PING][PATCH] " Alexander Monakov
2020-01-20  9:37 ` [PATCH] " Alexander Monakov
2020-02-13 13:46 ` [PING^3][PATCH] " Alexander Monakov
2020-02-13 13:55   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).