* [gcc r12-4475] Try placing RTL folded constants in the constant pool.
@ 2021-10-18 11:18 Roger Sayle
From: Roger Sayle @ 2021-10-18 11:18 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:247c407c83f0015f4b92d5f71e45b63192f6757e

commit r12-4475-g247c407c83f0015f4b92d5f71e45b63192f6757e
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Oct 18 12:15:40 2021 +0100

    Try placing RTL folded constants in the constant pool.
    
    My recent attempts to come up with a testcase for my patch to evaluate
    ss_plus in simplify-rtx.c identified a missed optimization opportunity
    (and potentially a long-standing regression): the RTL optimizers no
    longer place constants in the constant pool.
    
    The motivating x86_64 example is the simple program:
    
    typedef char v8qi __attribute__ ((vector_size (8)));
    
    v8qi foo()
    {
      v8qi tx = { 1, 0, 0, 0, 0, 0, 0, 0 };
      v8qi ty = { 2, 0, 0, 0, 0, 0, 0, 0 };
      v8qi t = __builtin_ia32_paddsb(tx, ty);
      return t;
    }
    
    which (with my previous patch) currently results in:
    foo:    movq    .LC0(%rip), %xmm0
            movq    .LC1(%rip), %xmm1
            paddsb  %xmm1, %xmm0
            ret
    
    even though the RTL contains the result in a REG_EQUAL note:
    
    (insn 7 6 12 2 (set (reg:V8QI 83)
            (ss_plus:V8QI (reg:V8QI 84)
                (reg:V8QI 85))) "ssaddqi3.c":7:12 1419 {*mmx_ssaddv8qi3}
         (expr_list:REG_DEAD (reg:V8QI 85)
            (expr_list:REG_DEAD (reg:V8QI 84)
                (expr_list:REG_EQUAL (const_vector:V8QI [
                            (const_int 3 [0x3])
                            (const_int 0 [0]) repeated x7
                        ])
                    (nil)))))
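    
    In other words, the whole function is already known to be equivalent to
    returning the folded constant (an illustrative restatement using the
    v8qi typedef above; "bar" is just a placeholder name, not part of the
    original testcase):
    
    v8qi bar()
    {
      v8qi t = { 3, 0, 0, 0, 0, 0, 0, 0 };
      return t;
    }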
    
    Together with the patch below, GCC will now generate the much
    more sensible:
    foo:    movq    .LC2(%rip), %xmm0
            ret
    
    My first approach was to look in cse.c (where the REG_EQUAL note gets
    added) and notice that the constant pool handling functionality has been
    unreachable for a while.  A quick search for constant_pool_entries_cost
    shows that it's initialized to zero, but never set to a non-zero value,
    meaning that force_const_mem is never called.  This functionality used
    to work way back in 2003, but has been lost over time:
    https://gcc.gnu.org/pipermail/gcc-patches/2003-October/116435.html
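    
    For reference, here is a minimal standalone sketch (not the actual GCC
    sources) of why that branch is dead code: the guard tests a value that
    is zero-initialized and never assigned elsewhere, so force_const_mem
    can never be reached from it.
    
    #include <stdio.h>
    
    static int constant_pool_entries_cost;  /* zeroed in cse_main, never updated */
    
    static void
    cse_one_insn (int trial_is_constant)
    {
      /* Guarded on a value that is always zero.  */
      if (constant_pool_entries_cost && trial_is_constant)
        printf ("would call force_const_mem\n");
      else
        printf ("constant pool never considered\n");
    }
    
    int
    main (void)
    {
      cse_one_insn (1);   /* always prints the second message */
      return 0;
    }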
    
    The changes to cse.c below restore this functionality (placing suitable
    constants in the constant pool) with two significant refinements:
    (i) it only attempts to do this if the function already uses a constant
    pool (thanks to the availability of crtl->uses_const_pool since 2003);
    (ii) it allows different constants (i.e. modes) to have different costs,
    so that floating point "doubles" and 64-bit, 128-bit, 256-bit and 512-bit
    vectors don't all have to share the same cost.  Back in 2003, the
    assumption was that everything in a constant pool had the same cost,
    hence the global variable constant_pool_entries_cost.
    
    Although this is a useful CSE fix, it turns out that it doesn't cure
    my motivating problem above.  CSE only considers a single instruction,
    and so determines that it's cheaper to perform the ss_plus
    (COSTS_N_INSNS(1)) than to read the result from the constant pool
    (COSTS_N_INSNS(2)).  It's only when the other reads from the constant
    pool are also eliminated that this transformation is a win.  Hence a
    better place to perform this transformation is in combine, where, after
    failing to "recog" the load of a suitable constant, it can retry after
    calling force_const_mem.  This achieves the desired transformation and
    allows the backend insn_cost callback to control whether or not using
    the constant pool is preferable.
    
    Alas, it's rare to change code generation without affecting something in
    GCC's testsuite.  On x86_64-pc-linux-gnu there were two families of new
    failures (and I'd predict similar benign fallout on other platforms).
    One failure was gcc.target/i386/387-12.c (aka PR target/26915), where
    the test is missing an explicit -m32 flag.  On i686, it's very reasonable
    to materialize -1.0 using "fld1; fchs", but on x86_64-pc-linux-gnu we
    currently generate the awkward:
    testm1: fld1
            fchs
            fstpl   -8(%rsp)
            movsd   -8(%rsp), %xmm0
            ret
    
    which combine now very reasonably simplifies to just:
    testm1: movsd   .LC3(%rip), %xmm0
            ret
    
    The other class of x86_64-pc-linux-gnu failure was from materialization
    of vector constants using vpbroadcast (e.g. gcc.target/i386/pr90773-17.c),
    where the decision is finely balanced: the load of an integer register
    with an immediate constant, followed by a vpbroadcast, is deemed to be
    COSTS_N_INSNS(2), whereas a load from the constant pool is also reported
    as COSTS_N_INSNS(2).  My solution is to tweak i386.c's rtx_costs so
    that, all other things being equal, an instruction (sequence) that
    accesses memory is fractionally more expensive than one that doesn't.
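    
    To see how this breaks the tie, here is a standalone sketch of the
    arithmetic (not GCC source; COSTS_N_INSNS mirrors the ((N) * 4)
    definition in rtl.h, and the "+ 1" is the new MEM adjustment):
    
    #include <stdio.h>
    
    #define COSTS_N_INSNS(N) ((N) * 4)
    
    int
    main (void)
    {
      int bcast_cost = COSTS_N_INSNS (2); /* mov $imm,%reg + vpbroadcast */
      int pool_cost = COSTS_N_INSNS (2);  /* load from the constant pool */
      pool_cost += 1;                     /* penalty for touching memory */
      printf ("broadcast=%d, pool=%d -> %s\n", bcast_cost, pool_cost,
              bcast_cost < pool_cost ? "keep the vpbroadcast" : "use the pool");
      return 0;
    }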
    
    2021-10-18  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog
            * combine.c (recog_for_combine): For an unrecognized move/set of
            a constant, try force_const_mem to place it in the constant pool.
            * cse.c (constant_pool_entries_cost, constant_pool_entries_regcost):
            Delete global variables (that are no longer assigned a cost value).
            (cse_insn): Simplify logic for deciding whether to place a folded
            constant in the constant pool using force_const_mem.
            (cse_main): Remove zero initialization of constant_pool_entries_cost
            and constant_pool_entries_regcost.
    
            * config/i386/i386.c (ix86_rtx_costs): Make memory accesses
            fractionally more expensive, when optimizing for speed.
    
    gcc/testsuite/ChangeLog
            * gcc.target/i386/387-12.c: Add explicit -m32 option.

Diff:
---
 gcc/combine.c                          | 22 +++++++++++++++-
 gcc/config/i386/i386.c                 |  7 +++++
 gcc/cse.c                              | 48 ++++++++++------------------------
 gcc/testsuite/gcc.target/i386/387-12.c |  2 +-
 4 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 892c834a160..03e9a780919 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11567,7 +11567,27 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
   bool changed = false;
 
   if (GET_CODE (pat) == SET)
-    changed = change_zero_ext (pat);
+    {
+      /* For an unrecognized single set of a constant, try placing it in
+	 the constant pool, if this function already uses one.  */
+      rtx src = SET_SRC (pat);
+      if (CONSTANT_P (src)
+	  && !CONST_INT_P (src)
+	  && crtl->uses_const_pool)
+	{
+	  machine_mode mode = GET_MODE (src);
+	  if (mode == VOIDmode)
+	    mode = GET_MODE (SET_DEST (pat));
+	  src = force_const_mem (mode, src);
+	  if (src)
+	    {
+	      SUBST (SET_SRC (pat), src);
+	      changed = true;
+	    }
+	}
+      else
+	changed = change_zero_ext (pat);
+    }
   else if (GET_CODE (pat) == PARALLEL)
     {
       int i;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9cc903e826b..3c3336d230a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20801,6 +20801,13 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 	*total = cost->sse_op;
       return true;
 
+    case MEM:
+      /* An insn that accesses memory is slightly more expensive
+         than one that does not.  */
+      if (speed)
+        *total += 1;
+      return false;
+
     default:
       return false;
     }
diff --git a/gcc/cse.c b/gcc/cse.c
index 330c1e90ce0..4c3988ee430 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -491,14 +491,6 @@ static struct table_elt *table[HASH_SIZE];
 
 static struct table_elt *free_element_chain;
 
-/* Set to the cost of a constant pool reference if one was found for a
-   symbolic constant.  If this was found, it means we should try to
-   convert constants into constant pool entries if they don't fit in
-   the insn.  */
-
-static int constant_pool_entries_cost;
-static int constant_pool_entries_regcost;
-
 /* Trace a patch through the CFG.  */
 
 struct branch_path
@@ -4609,9 +4601,6 @@ cse_insn (rtx_insn *insn)
       int src_folded_regcost = MAX_COST;
       int src_related_regcost = MAX_COST;
       int src_elt_regcost = MAX_COST;
-      /* Set nonzero if we need to call force_const_mem on with the
-	 contents of src_folded before using it.  */
-      int src_folded_force_flag = 0;
       scalar_int_mode int_mode;
 
       dest = SET_DEST (sets[i].rtl);
@@ -5166,15 +5155,7 @@ cse_insn (rtx_insn *insn)
 			     src_related_cost, src_related_regcost) <= 0
 	      && preferable (src_folded_cost, src_folded_regcost,
 			     src_elt_cost, src_elt_regcost) <= 0)
-	    {
-	      trial = src_folded, src_folded_cost = MAX_COST;
-	      if (src_folded_force_flag)
-		{
-		  rtx forced = force_const_mem (mode, trial);
-		  if (forced)
-		    trial = forced;
-		}
-	    }
+	    trial = src_folded, src_folded_cost = MAX_COST;
 	  else if (src
 		   && preferable (src_cost, src_regcost,
 				  src_eqv_cost, src_eqv_regcost) <= 0
@@ -5361,23 +5342,24 @@ cse_insn (rtx_insn *insn)
 	      break;
 	    }
 
-	  /* If we previously found constant pool entries for
-	     constants and this is a constant, try making a
-	     pool entry.  Put it in src_folded unless we already have done
-	     this since that is where it likely came from.  */
+	  /* If the current function uses a constant pool and this is a
+	     constant, try making a pool entry. Put it in src_folded
+	     unless we already have done this since that is where it
+	     likely came from.  */
 
-	  else if (constant_pool_entries_cost
+	  else if (crtl->uses_const_pool
 		   && CONSTANT_P (trial)
-		   && (src_folded == 0
-		       || (!MEM_P (src_folded)
-			   && ! src_folded_force_flag))
+		   && !CONST_INT_P (trial)
+		   && (src_folded == 0 || !MEM_P (src_folded))
 		   && GET_MODE_CLASS (mode) != MODE_CC
 		   && mode != VOIDmode)
 	    {
-	      src_folded_force_flag = 1;
-	      src_folded = trial;
-	      src_folded_cost = constant_pool_entries_cost;
-	      src_folded_regcost = constant_pool_entries_regcost;
+	      src_folded = force_const_mem (mode, trial);
+	      if (src_folded)
+		{
+		  src_folded_cost = COST (src_folded, mode);
+		  src_folded_regcost = approx_reg_cost (src_folded);
+		}
 	    }
 	}
 
@@ -6630,8 +6612,6 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
   cse_cfg_altered = false;
   cse_jumps_altered = false;
   recorded_label_ref = false;
-  constant_pool_entries_cost = 0;
-  constant_pool_entries_regcost = 0;
   ebb_data.path_size = 0;
   ebb_data.nsets = 0;
   rtl_hooks = cse_rtl_hooks;
diff --git a/gcc/testsuite/gcc.target/i386/387-12.c b/gcc/testsuite/gcc.target/i386/387-12.c
index 62c1d483c27..7fe50a21981 100644
--- a/gcc/testsuite/gcc.target/i386/387-12.c
+++ b/gcc/testsuite/gcc.target/i386/387-12.c
@@ -1,6 +1,6 @@
 /* PR target/26915 */
 /* { dg-do compile } */
-/* { dg-options "-O -mfpmath=387 -mfancy-math-387" } */
+/* { dg-options "-O -m32 -mfpmath=387 -mfancy-math-387" } */
 
 double testm0(void)
 {

