public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] cselib: Reuse VALUEs on sp adjustments [PR92264]
@ 2020-04-02 10:29 Jakub Jelinek
  2020-04-02 10:44 ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Jelinek @ 2020-04-02 10:29 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, Alexandre Oliva; +Cc: gcc-patches

Hi!

As discussed in the PR, if !ACCUMULATE_OUTGOING_ARGS on large functions we
can have hundreds of thousands of stack pointer adjustments and cselib
creates a new VALUE after each sp adjustment, which form extremely deep
VALUE chains, which is very harmful e.g. for find_base_term.
E.g. if we have
sp -= 4
sp -= 4
sp += 4
sp += 4
sp -= 4
sp += 4
that means 7 VALUEs, one for the sp at beginning (val1), than val2 = val1 -
4, then val3 = val2 - 4, then val4 = val3 + 4, then val5 = val4 + 4, then
val6 = val5 - 4, then val7 = val6 + 4.
This patch tweaks cselib, so that it is smarter about sp adjustments.
When cselib_lookup (stack_pointer_rtx, Pmode, 1, VOIDmode) and we know
nothing about sp yet (this happens at the start of the function, for
non-var-tracking also after cselib_reset_table and for var-tracking after
processing fp_setter insn where we forget about former sp values because
that is now hfp related while everything after it is sp related), we
look it up normally, but in addition to what we have been doing before
we mark the VALUE as SP_DERIVED_VALUE_P.  Further lookups of sp + offset
are then special cased, so that it is canonicalized to that
SP_DERIVED_VALUE_P VALUE + CONST_INT (if possible).  So, for the above,
we get val1 with SP_DERIVED_VALUE_P set, then val2 = val1 - 4, val3 = val1 -
8 (note, no longer val2 - 4!), then we get val2 again, val1 again, val2
again, val1 again.
In the find_base_term visited_vals.length () > 100 find_base_term
statistics during combined x86_64-linux and i686-linux bootstrap+regtest
cycle, without the patch I see:
			find_base_term > 100
			returning NULL	returning non-NULL
32-bit compilations	4229178		407
64-bit compilations	217523		0
with largest visited_vals.length () when returning non-NULL being 206.
With the patch the same numbers are:
32-bit compilations	1249588		135
64-bit compilations	3510		0
with largest visited_vals.length () when returning non-NULL being 173.
This shows significant reduction of the deep VALUE chains.
On powerpc64{,le}-linux, these stats didn't change at all, we have
			1008		0
for all of -m32, -m64 and little-endian -m64, just the
gcc.dg/pr85180.c and gcc.dg/pr87985.c testcases which are unrelated to sp.

My earlier version of the patch, which contained just the rtl.h and cselib.c
changes, regressed some tests:
gcc.dg/guality/{pr36728-{1,3},pr68860-{1,2}}.c
gcc.target/i386/{pr88416,sse-{13,23,24,25,26}}.c
The problem with the former tests was worse debug info, where with -m32
where arg7 was passed in a stack slot we though a push later on might have
invalidated it, when it couldn't.  This is something I've solved with the
var-tracking.c (vt_initialize) changes.  In those problematic functions, we
create a cfa_base VALUE (argp) and want to record that at the start of
the function the argp VALUE is sp + off and also record that current sp
VALUE is argp's VALUE - off.  The second permanent equivalence didn't make
it after the patch though, because cselib_add_permanent_equiv will
cselib_lookup the value of the expression it wants to add as the equivalence
and if it is the same VALUE as we are calling it on, it doesn't do anything;
and due to the cselib changes for sp based accesses that is exactly what
happened.  By reversing the order of the cselib_add_permanent_equiv calls we
get both equivalences though and thus are able to canonicalize the sp based
accesses in var-tracking to the cfa_base value + offset.
The i386 FAILs were all ICEs, where we had pushf instruction pushing flags
and then pop pseudo reading that value again.  With the cselib changes,
cselib during RTL DSE is able to see through the sp adjustment and wanted
to replace_read what was done pushf, by moving the flags register into a
pseudo and replace the memory read in the pop with that pseudo.  That is
wrong for two reasons: one is that the backend doesn't have an instruction
to move the flags hard register into some other register, but replace_read
has been validating just the mem -> pseudo replacement and not the insns
emitted by copy_to_mode_reg.  And the second issue is that it is obviously
wrong to replace a stack pop which contains stack post-increment by a copy
of pseudo into destination.  dse.c has some code to handle RTX_AUTOINC, but
only uses it when actually removing stores and only when there is REG_INC
note (stack RTX_AUTOINC does not have those), in check_for_inc_dec* where
it emits the reg adjustment(s) before the insn that is going to be deleted.
replace_read doesn't remove the insn, so if it e.g. contained REG_INC note,
it would be kept there and we might have the RTX_AUTOINC not just in *loc,
but other spots.
So, the dse.c changes try to validate the added insns and punt on all
RTX_AUTOINC in *loc.  Furthermore, it seems that with the cselib.c changes
on the gfortran.dg/pr87360.f90 and gcc.target/i386/pr88416.c testcases
check_for_inc_dec{,_1} happily throws stack pointer autoinc on the floor,
which is also wrong.  While we could perhaps do the for_each_inc_dec
call regardless of whether we have REG_INC note or not, we aren't prepared
to handle e.g. REG_ARGS_SIZE distribution and thus could end up with wrong
unwind info or ICEs during dwarf2cfi.c.  So the patch also punts on those,
after all, if we'd in theory managed to try to optimize such pushes before,
we'd create wrong-code.

On x86_64-linux and i686-linux, the patch has some minor debug info coverage
differences, but it doesn't appear very significant to me.
https://github.com/pmachata/dwlocstat tool gives (where before is vanilla
trunk + the rtl.h patch but not {cselib,var-tracking,dse}.c
--enable-checking=yes,rtl,extra bootstrapped, then {cselib,var-tracking,dse}.c
hunks applied and make cc1plus, while after is trunk with the whole patch
applied).

64-bit cc1plus
before
cov%	samples	cumul
0..10	1232756/48%	1232756/48%
11..20	31089/1%	1263845/49%
21..30	39172/1%	1303017/51%
31..40	38853/1%	1341870/52%
41..50	47473/1%	1389343/54%
51..60	45171/1%	1434514/56%
61..70	69393/2%	1503907/59%
71..80	61988/2%	1565895/61%
81..90	104528/4%	1670423/65%
91..100	875402/34%	2545825/100%
after
cov%	samples	cumul
0..10	1233238/48%	1233238/48%
11..20	31086/1%	1264324/49%
21..30	39157/1%	1303481/51%
31..40	38819/1%	1342300/52%
41..50	47447/1%	1389747/54%
51..60	45151/1%	1434898/56%
61..70	69379/2%	1504277/59%
71..80	61946/2%	1566223/61%
81..90	104508/4%	1670731/65%
91..100	875094/34%	2545825/100%

32-bit cc1plus
before
cov%	samples	cumul
0..10	1231221/48%	1231221/48%
11..20	30992/1%	1262213/49%
21..30	36422/1%	1298635/51%
31..40	35793/1%	1334428/52%
41..50	47102/1%	1381530/54%
51..60	41201/1%	1422731/56%
61..70	65467/2%	1488198/58%
71..80	59560/2%	1547758/61%
81..90	104076/4%	1651834/65%
91..100	881879/34%	2533713/100%
after
cov%	samples	cumul
0..10	1230469/48%	1230469/48%
11..20	30390/1%	1260859/49%
21..30	36362/1%	1297221/51%
31..40	36042/1%	1333263/52%
41..50	47619/1%	1380882/54%
51..60	41674/1%	1422556/56%
61..70	65849/2%	1488405/58%
71..80	59857/2%	1548262/61%
81..90	104178/4%	1652440/65%
91..100	881273/34%	2533713/100%

Bootstrapped/regtested on {x86_64,i686,powerpc64{,-le}}-linux, ok for trunk?

For the PR in question, my proposal would be to also lower
-param=max-find-base-term-values=
default from 2000 to 200 after this, at least in the above 4
bootstraps/regtests there is nothing that would ever result in
find_base_term returning non-NULL with more than 200 VALUEs being processed.

2020-04-02  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/92264
	* rtl.h (struct rtx_def): Mention that call bit is used as
	SP_DERIVED_VALUE_P in cselib.c.
	* cselib.c (SP_DERIVED_VALUE_P): Define.
	(PRESERVED_VALUE_P, SP_BASED_VALUE_P): Move definitions earlier.
	(cselib_hasher::equal): Handle equality between SP_DERIVED_VALUE_P
	val_rtx and sp based expression where offsets cancel each other.
	(preserve_constants_and_equivs): Formatting fix.
	(cselib_reset_table): Add reverse op loc to SP_DERIVED_VALUE_P
	locs list for cfa_base_preserved_val if needed.  Formatting fix.
	(autoinc_split): If the to be returned value is a REG, MEM or
	VALUE which has SP_DERIVED_VALUE_P + CONST_INT as one of its
	locs, return the SP_DERIVED_VALUE_P VALUE and adjust *off.
	(rtx_equal_for_cselib_1): Call autoinc_split even if both
	expressions are PLUS in Pmode with CONST_INT second operands.
	Handle SP_DERIVED_VALUE_P cases.
	(cselib_hash_plus_const_int): New function.
	(cselib_hash_rtx): Use it for PLUS in Pmode with CONST_INT
	second operand, as well as for PRE_DEC etc. that ought to be
	hashed the same way.
	(cselib_subst_to_values): Substitute PLUS with Pmode and
	CONST_INT operand if the first operand is a VALUE which has
	SP_DERIVED_VALUE_P + CONST_INT as one of its locs for the
	SP_DERIVED_VALUE_P + adjusted offset.
	(cselib_lookup_1): When creating a new VALUE for stack_pointer_rtx,
	set SP_DERIVED_VALUE_P on it.  Set PRESERVED_VALUE_P when adding
	SP_DERIVED_VALUE_P PRESERVED_VALUE_P subseted VALUE location.
	* var-tracking.c (vt_initialize): Call cselib_add_permanent_equiv
	on the sp value before calling cselib_add_permanent_equiv on the
	cfa_base value.
	* dse.c (check_for_inc_dec_1, check_for_inc_dec): Punt on RTX_AUTOINC
	in the insn without REG_INC note.
	(replace_read): Punt on RTX_AUTOINC in the *loc being replaced.
	Punt on invalid insns added by copy_to_mode_reg.  Formatting fixes.

--- gcc/rtl.h.jj	2020-04-01 09:43:45.346628694 +0200
+++ gcc/rtl.h	2020-04-01 17:46:15.160396309 +0200
@@ -331,6 +331,7 @@ struct GTY((desc("0"), tag("0"),
      1 in a MEM if it cannot trap.
      1 in a CALL_INSN logically equivalent to
        ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
+     1 in a VALUE is SP_DERIVED_VALUE_P in cselib.c.
      Dumped as "/c" in RTL dumps.  */
   unsigned int call : 1;
   /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
--- gcc/cselib.c.jj	2020-04-01 09:43:45.305629293 +0200
+++ gcc/cselib.c	2020-04-01 17:46:15.161396295 +0200
@@ -58,6 +58,16 @@ static void cselib_invalidate_regno (uns
 static void cselib_invalidate_mem (rtx);
 static void cselib_record_set (rtx, cselib_val *, cselib_val *);
 static void cselib_record_sets (rtx_insn *);
+static rtx autoinc_split (rtx, rtx *, machine_mode);
+
+#define PRESERVED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
+
+#define SP_BASED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
+
+#define SP_DERIVED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1 ("SP_DERIVED_VALUE_P", (RTX), VALUE)->call)
 
 struct expand_value_data
 {
@@ -122,6 +132,13 @@ cselib_hasher::equal (const cselib_val *
   if (GET_CODE (x) == VALUE)
     return x == v->val_rtx;
 
+  if (SP_DERIVED_VALUE_P (v->val_rtx) && GET_MODE (x) == Pmode)
+    {
+      rtx xoff = NULL;
+      if (autoinc_split (x, &xoff, memmode) == v->val_rtx && xoff == NULL_RTX)
+	return true;
+    }
+
   /* We don't guarantee that distinct rtx's have different hash values,
      so we need to do a comparison.  */
   for (l = v->locs; l; l = l->next)
@@ -256,12 +273,6 @@ void (*cselib_discard_hook) (cselib_val
 void (*cselib_record_sets_hook) (rtx_insn *insn, struct cselib_set *sets,
 				 int n_sets);
 
-#define PRESERVED_VALUE_P(RTX) \
-  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
-
-#define SP_BASED_VALUE_P(RTX) \
-  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
-
 \f
 
 /* Allocate a struct elt_list and fill in its two elements with the
@@ -494,7 +505,7 @@ preserve_constants_and_equivs (cselib_va
       };
       cselib_val **slot
 	= cselib_preserved_hash_table->find_slot_with_hash (&lookup,
-							   v->hash, INSERT);
+							    v->hash, INSERT);
       gcc_assert (!*slot);
       *slot = v;
     }
@@ -532,6 +543,28 @@ cselib_reset_table (unsigned int num)
       max_value_regs
 	= hard_regno_nregs (regno,
 			    GET_MODE (cfa_base_preserved_val->locs->loc));
+
+      /* If cfa_base is sp + const_int, need to preserve also the
+	 SP_DERIVED_VALUE_P value.  */
+      for (struct elt_loc_list *l = cfa_base_preserved_val->locs;
+	   l; l = l->next)
+	if (GET_CODE (l->loc) == PLUS
+	    && GET_CODE (XEXP (l->loc, 0)) == VALUE
+	    && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
+	    && CONST_INT_P (XEXP (l->loc, 1)))
+	  {
+	    if (! invariant_or_equiv_p (CSELIB_VAL_PTR (XEXP (l->loc, 0))))
+	      {
+		rtx val = cfa_base_preserved_val->val_rtx;
+		rtx_insn *save_cselib_current_insn = cselib_current_insn;
+		cselib_current_insn = l->setting_insn;
+		new_elt_loc_list (CSELIB_VAL_PTR (XEXP (l->loc, 0)),
+				  plus_constant (Pmode, val,
+						 -UINTVAL (XEXP (l->loc, 1))));
+		cselib_current_insn = save_cselib_current_insn;
+	      }
+	    break;
+	  }
     }
   else
     {
@@ -541,8 +574,7 @@ cselib_reset_table (unsigned int num)
     }
 
   if (cselib_preserve_constants)
-    cselib_hash_table->traverse <void *, preserve_constants_and_equivs>
-      (NULL);
+    cselib_hash_table->traverse <void *, preserve_constants_and_equivs> (NULL);
   else
     {
       cselib_hash_table->empty ();
@@ -799,33 +831,66 @@ autoinc_split (rtx x, rtx *off, machine_
     {
     case PLUS:
       *off = XEXP (x, 1);
-      return XEXP (x, 0);
+      x = XEXP (x, 0);
+      break;
 
     case PRE_DEC:
       if (memmode == VOIDmode)
 	return x;
 
       *off = gen_int_mode (-GET_MODE_SIZE (memmode), GET_MODE (x));
-      return XEXP (x, 0);
+      x = XEXP (x, 0);
+      break;
 
     case PRE_INC:
       if (memmode == VOIDmode)
 	return x;
 
       *off = gen_int_mode (GET_MODE_SIZE (memmode), GET_MODE (x));
-      return XEXP (x, 0);
+      x = XEXP (x, 0);
+      break;
 
     case PRE_MODIFY:
-      return XEXP (x, 1);
+      x = XEXP (x, 1);
+      break;
 
     case POST_DEC:
     case POST_INC:
     case POST_MODIFY:
-      return XEXP (x, 0);
+      x = XEXP (x, 0);
+      break;
 
     default:
-      return x;
+      break;
     }
+
+  if (GET_MODE (x) == Pmode
+      && (REG_P (x) || MEM_P (x) || GET_CODE (x) == VALUE)
+      && (*off == NULL_RTX || CONST_INT_P (*off)))
+    {
+      cselib_val *e;
+      if (GET_CODE (x) == VALUE)
+	e = CSELIB_VAL_PTR (x);
+      else
+	e = cselib_lookup (x, GET_MODE (x), 0, memmode);
+      if (e)
+	for (struct elt_loc_list *l = e->locs; l; l = l->next)
+	  if (GET_CODE (l->loc) == PLUS
+	      && GET_CODE (XEXP (l->loc, 0)) == VALUE
+	      && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
+	      && CONST_INT_P (XEXP (l->loc, 1)))
+	    {
+	      if (*off == NULL_RTX)
+		*off = XEXP (l->loc, 1);
+	      else
+		*off = plus_constant (Pmode, *off,
+				      INTVAL (XEXP (l->loc, 1)));
+	      if (*off == const0_rtx)
+		*off = NULL_RTX;
+	      return XEXP (l->loc, 0);
+	    }
+    }
+  return x;
 }
 
 /* Return nonzero if we can prove that X and Y contain the same value,
@@ -868,6 +933,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
       if (GET_CODE (y) == VALUE)
 	return e == canonical_cselib_val (CSELIB_VAL_PTR (y));
 
+      if ((SP_DERIVED_VALUE_P (x)
+	   || SP_DERIVED_VALUE_P (e->val_rtx))
+	  && GET_MODE (y) == Pmode)
+	{
+	  rtx yoff = NULL;
+	  rtx yr = autoinc_split (y, &yoff, memmode);
+	  if ((yr == x || yr == e->val_rtx) && yoff == NULL_RTX)
+	    return 1;
+	}
+
       if (depth == 128)
 	return 0;
 
@@ -891,6 +966,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
       cselib_val *e = canonical_cselib_val (CSELIB_VAL_PTR (y));
       struct elt_loc_list *l;
 
+      if ((SP_DERIVED_VALUE_P (y)
+	   || SP_DERIVED_VALUE_P (e->val_rtx))
+	  && GET_MODE (x) == Pmode)
+	{
+	  rtx xoff = NULL;
+	  rtx xr = autoinc_split (x, &xoff, memmode);
+	  if ((xr == y || xr == e->val_rtx) && xoff == NULL_RTX)
+	    return 1;
+	}
+
       if (depth == 128)
 	return 0;
 
@@ -910,7 +995,11 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
   if (GET_MODE (x) != GET_MODE (y))
     return 0;
 
-  if (GET_CODE (x) != GET_CODE (y))
+  if (GET_CODE (x) != GET_CODE (y)
+      || (GET_CODE (x) == PLUS
+	  && GET_MODE (x) == Pmode
+	  && CONST_INT_P (XEXP (x, 1))
+	  && CONST_INT_P (XEXP (y, 1))))
     {
       rtx xorig = x, yorig = y;
       rtx xoff = NULL, yoff = NULL;
@@ -918,17 +1007,20 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
       x = autoinc_split (x, &xoff, memmode);
       y = autoinc_split (y, &yoff, memmode);
 
-      if (!xoff != !yoff)
-	return 0;
-
-      if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
-	return 0;
-
       /* Don't recurse if nothing changed.  */
       if (x != xorig || y != yorig)
-	return rtx_equal_for_cselib_1 (x, y, memmode, depth);
+	{
+	  if (!xoff != !yoff)
+	    return 0;
 
-      return 0;
+	  if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
+	    return 0;
+
+	  return rtx_equal_for_cselib_1 (x, y, memmode, depth);
+	}
+
+      if (GET_CODE (xorig) != GET_CODE (yorig))
+	return 0;
     }
 
   /* These won't be handled correctly by the code below.  */
@@ -1042,6 +1134,41 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
   return 1;
 }
 
+/* Helper function for cselib_hash_rtx.  Arguments like for cselib_hash_rtx,
+   except that it hashes (plus:P x c).  */
+
+static unsigned int
+cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int create,
+			    machine_mode memmode)
+{
+  cselib_val *e = cselib_lookup (x, GET_MODE (x), create, memmode);
+  if (! e)
+    return 0;
+
+  if (! SP_DERIVED_VALUE_P (e->val_rtx))
+    for (struct elt_loc_list *l = e->locs; l; l = l->next)
+      if (GET_CODE (l->loc) == PLUS
+	  && GET_CODE (XEXP (l->loc, 0)) == VALUE
+	  && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
+	  && CONST_INT_P (XEXP (l->loc, 1)))
+	{
+	  e = CSELIB_VAL_PTR (XEXP (l->loc, 0));
+	  c = trunc_int_for_mode (c + UINTVAL (XEXP (l->loc, 1)), Pmode);
+	  break;
+	}
+  if (c == 0)
+    return e->hash;
+
+  unsigned hash = (unsigned) PLUS + (unsigned) GET_MODE (x);
+  hash += e->hash;
+  unsigned int tem_hash = (unsigned) CONST_INT + (unsigned) VOIDmode;
+  tem_hash += ((unsigned) CONST_INT << 7) + (unsigned HOST_WIDE_INT) c;
+  if (tem_hash == 0)
+    tem_hash = (unsigned int) CONST_INT;
+  hash += tem_hash;
+  return hash ? hash : 1 + (unsigned int) PLUS;
+}
+
 /* Hash an rtx.  Return 0 if we couldn't hash the rtx.
    For registers and memory locations, we look up their cselib_val structure
    and return its VALUE element.
@@ -1209,10 +1336,21 @@ cselib_hash_rtx (rtx x, int create, mach
 	offset = -offset;
       /* Adjust the hash so that (mem:MEMMODE (pre_* (reg))) hashes
 	 like (mem:MEMMODE (plus (reg) (const_int I))).  */
-      hash += (unsigned) PLUS - (unsigned)code
-	+ cselib_hash_rtx (XEXP (x, 0), create, memmode)
-	+ cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
-			   create, memmode);
+      if (GET_MODE (x) == Pmode
+	  && (REG_P (XEXP (x, 0))
+	      || MEM_P (XEXP (x, 0))
+	      || GET_CODE (XEXP (x, 0)) == VALUE))
+	{
+	  HOST_WIDE_INT c;
+	  if (offset.is_constant (&c))
+	    return cselib_hash_plus_const_int (XEXP (x, 0),
+					       trunc_int_for_mode (c, Pmode),
+					       create, memmode);
+	}
+      hash = ((unsigned) PLUS + (unsigned) GET_MODE (x)
+	      + cselib_hash_rtx (XEXP (x, 0), create, memmode)
+	      + cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
+				 create, memmode));
       return hash ? hash : 1 + (unsigned) PLUS;
 
     case PRE_MODIFY:
@@ -1237,6 +1375,16 @@ cselib_hash_rtx (rtx x, int create, mach
 
       break;
 
+    case PLUS:
+      if (GET_MODE (x) == Pmode
+	  && (REG_P (XEXP (x, 0))
+	      || MEM_P (XEXP (x, 0))
+	      || GET_CODE (XEXP (x, 0)) == VALUE)
+	  && CONST_INT_P (XEXP (x, 1)))
+	return cselib_hash_plus_const_int (XEXP (x, 0), INTVAL (XEXP (x, 1)),
+					   create, memmode);
+      break;
+
     default:
       break;
     }
@@ -1927,6 +2075,26 @@ cselib_subst_to_values (rtx x, machine_m
       gcc_assert (memmode != VOIDmode);
       return cselib_subst_to_values (XEXP (x, 0), memmode);
 
+    case PLUS:
+      if (GET_MODE (x) == Pmode && CONST_INT_P (XEXP (x, 1)))
+	{
+	  rtx t = cselib_subst_to_values (XEXP (x, 0), memmode);
+	  if (GET_CODE (t) == VALUE)
+	    for (struct elt_loc_list *l = CSELIB_VAL_PTR (t)->locs;
+		 l; l = l->next)
+	      if (GET_CODE (l->loc) == PLUS
+		  && GET_CODE (XEXP (l->loc, 0)) == VALUE
+		  && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
+		  && CONST_INT_P (XEXP (l->loc, 1)))
+		return plus_constant (Pmode, l->loc, INTVAL (XEXP (x, 1)));
+	  if (t != XEXP (x, 0))
+	    {
+	      copy = shallow_copy_rtx (x);
+	      XEXP (copy, 0) = t;
+	    }
+	  return copy;
+	}
+
     default:
       break;
     }
@@ -2030,6 +2198,8 @@ cselib_lookup_1 (rtx x, machine_mode mod
 	}
 
       e = new_cselib_val (next_uid, GET_MODE (x), x);
+      if (GET_MODE (x) == Pmode && x == stack_pointer_rtx)
+	SP_DERIVED_VALUE_P (e->val_rtx) = 1;
       new_elt_loc_list (e, x);
 
       scalar_int_mode int_mode;
@@ -2107,7 +2277,14 @@ cselib_lookup_1 (rtx x, machine_mode mod
      the hash table is inconsistent until we do so, and
      cselib_subst_to_values will need to do lookups.  */
   *slot = e;
-  new_elt_loc_list (e, cselib_subst_to_values (x, memmode));
+  rtx v = cselib_subst_to_values (x, memmode);
+
+  /* If cselib_preserve_constants, we might get a SP_DERIVED_VALUE_P
+     VALUE that isn't in the hash tables anymore.  */
+  if (GET_CODE (v) == VALUE && SP_DERIVED_VALUE_P (v) && PRESERVED_VALUE_P (v))
+    PRESERVED_VALUE_P (e->val_rtx) = 1;
+
+  new_elt_loc_list (e, v);
   return e;
 }
 
--- gcc/var-tracking.c.jj	2020-03-26 09:15:29.825500110 +0100
+++ gcc/var-tracking.c	2020-04-01 19:09:30.145421270 +0200
@@ -10045,19 +10045,20 @@ vt_initialize (void)
       preserve_value (val);
       if (reg != hard_frame_pointer_rtx && fixed_regs[REGNO (reg)])
 	cselib_preserve_cfa_base_value (val, REGNO (reg));
-      expr = plus_constant (GET_MODE (stack_pointer_rtx),
-			    stack_pointer_rtx, -ofst);
-      cselib_add_permanent_equiv (val, expr, get_insns ());
-
       if (ofst)
 	{
-	  val = cselib_lookup_from_insn (stack_pointer_rtx,
-					 GET_MODE (stack_pointer_rtx), 1,
-					 VOIDmode, get_insns ());
-	  preserve_value (val);
+	  cselib_val *valsp
+	    = cselib_lookup_from_insn (stack_pointer_rtx,
+				       GET_MODE (stack_pointer_rtx), 1,
+				       VOIDmode, get_insns ());
+	  preserve_value (valsp);
 	  expr = plus_constant (GET_MODE (reg), reg, ofst);
-	  cselib_add_permanent_equiv (val, expr, get_insns ());
+	  cselib_add_permanent_equiv (valsp, expr, get_insns ());
 	}
+
+      expr = plus_constant (GET_MODE (stack_pointer_rtx),
+			    stack_pointer_rtx, -ofst);
+      cselib_add_permanent_equiv (val, expr, get_insns ());
     }
 
   /* In order to factor out the adjustments made to the stack pointer or to
--- gcc/dse.c.jj	2020-03-03 11:04:46.365821937 +0100
+++ gcc/dse.c	2020-04-02 10:48:08.042260842 +0200
@@ -846,6 +846,17 @@ check_for_inc_dec_1 (insn_info_t insn_in
   if (note)
     return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
 			     insn_info) == 0;
+
+  /* Punt on stack pushes, those don't have REG_INC notes and we are
+     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
+    {
+      const_rtx x = *iter;
+      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
+	return false;
+    }
+
   return true;
 }
 
@@ -866,6 +877,17 @@ check_for_inc_dec (rtx_insn *insn)
   if (note)
     return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
 			     &insn_info) == 0;
+
+  /* Punt on stack pushes, those don't have REG_INC notes and we are
+     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
+    {
+      const_rtx x = *iter;
+      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
+	return false;
+    }
+
   return true;
 }
 
@@ -1980,16 +2002,30 @@ replace_read (store_info *store_info, in
 	 point.  This does occasionally happen, see PR 37922.  */
       bitmap regs_set = BITMAP_ALLOC (&reg_obstack);
 
-      for (this_insn = insns; this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
-	note_stores (this_insn, look_for_hardregs, regs_set);
+      for (this_insn = insns;
+	   this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
+	{
+	  if (insn_invalid_p (this_insn, false))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, " -- replacing the loaded MEM with ");
+		  print_simple_rtl (dump_file, read_reg);
+		  fprintf (dump_file, " led to an invalid instruction\n");
+		}
+	      BITMAP_FREE (regs_set);
+	      return false;
+	    }
+	  note_stores (this_insn, look_for_hardregs, regs_set);
+	}
 
       bitmap_and_into (regs_set, regs_live);
       if (!bitmap_empty_p (regs_set))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    {
-	      fprintf (dump_file,
-		       "abandoning replacement because sequence clobbers live hardregs:");
+	      fprintf (dump_file, "abandoning replacement because sequence "
+				  "clobbers live hardregs:");
 	      df_print_regset (dump_file, regs_set);
 	    }
 
@@ -1999,6 +2035,19 @@ replace_read (store_info *store_info, in
       BITMAP_FREE (regs_set);
     }
 
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
+    {
+      const_rtx x = *iter;
+      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, " -- replacing the MEM failed due to address "
+				"side-effects\n");
+	  return false;
+	}
+    }
+
   if (validate_change (read_insn->insn, loc, read_reg, 0))
     {
       deferred_change *change = deferred_change_pool.allocate ();

	Jakub


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] cselib: Reuse VALUEs on sp adjustments [PR92264]
  2020-04-02 10:29 [PATCH] cselib: Reuse VALUEs on sp adjustments [PR92264] Jakub Jelinek
@ 2020-04-02 10:44 ` Richard Biener
  2020-04-02 12:35   ` [committed] params: Decrease -param=max-find-base-term-values= default [PR92264] Jakub Jelinek
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2020-04-02 10:44 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Alexandre Oliva, gcc-patches

On Thu, 2 Apr 2020, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, if !ACCUMULATE_OUTGOING_ARGS on large functions we
> can have hundreds of thousands of stack pointer adjustments and cselib
> creates a new VALUE after each sp adjustment, which form extremely deep
> VALUE chains, which is very harmful e.g. for find_base_term.
> E.g. if we have
> sp -= 4
> sp -= 4
> sp += 4
> sp += 4
> sp -= 4
> sp += 4
> that means 7 VALUEs, one for the sp at beginning (val1), than val2 = val1 -
> 4, then val3 = val2 - 4, then val4 = val3 + 4, then val5 = val4 + 4, then
> val6 = val5 - 4, then val7 = val6 + 4.
> This patch tweaks cselib, so that it is smarter about sp adjustments.
> When cselib_lookup (stack_pointer_rtx, Pmode, 1, VOIDmode) and we know
> nothing about sp yet (this happens at the start of the function, for
> non-var-tracking also after cselib_reset_table and for var-tracking after
> processing fp_setter insn where we forget about former sp values because
> that is now hfp related while everything after it is sp related), we
> look it up normally, but in addition to what we have been doing before
> we mark the VALUE as SP_DERIVED_VALUE_P.  Further lookups of sp + offset
> are then special cased, so that it is canonicalized to that
> SP_DERIVED_VALUE_P VALUE + CONST_INT (if possible).  So, for the above,
> we get val1 with SP_DERIVED_VALUE_P set, then val2 = val1 - 4, val3 = val1 -
> 8 (note, no longer val2 - 4!), then we get val2 again, val1 again, val2
> again, val1 again.
> In the find_base_term visited_vals.length () > 100 find_base_term
> statistics during combined x86_64-linux and i686-linux bootstrap+regtest
> cycle, without the patch I see:
> 			find_base_term > 100
> 			returning NULL	returning non-NULL
> 32-bit compilations	4229178		407
> 64-bit compilations	217523		0
> with largest visited_vals.length () when returning non-NULL being 206.
> With the patch the same numbers are:
> 32-bit compilations	1249588		135
> 64-bit compilations	3510		0
> with largest visited_vals.length () when returning non-NULL being 173.
> This shows significant reduction of the deep VALUE chains.
> On powerpc64{,le}-linux, these stats didn't change at all, we have
> 			1008		0
> for all of -m32, -m64 and little-endian -m64, just the
> gcc.dg/pr85180.c and gcc.dg/pr87985.c testcases which are unrelated to sp.
> 
> My earlier version of the patch, which contained just the rtl.h and cselib.c
> changes, regressed some tests:
> gcc.dg/guality/{pr36728-{1,3},pr68860-{1,2}}.c
> gcc.target/i386/{pr88416,sse-{13,23,24,25,26}}.c
> The problem with the former tests was worse debug info, where with -m32
> where arg7 was passed in a stack slot we though a push later on might have
> invalidated it, when it couldn't.  This is something I've solved with the
> var-tracking.c (vt_initialize) changes.  In those problematic functions, we
> create a cfa_base VALUE (argp) and want to record that at the start of
> the function the argp VALUE is sp + off and also record that current sp
> VALUE is argp's VALUE - off.  The second permanent equivalence didn't make
> it after the patch though, because cselib_add_permanent_equiv will
> cselib_lookup the value of the expression it wants to add as the equivalence
> and if it is the same VALUE as we are calling it on, it doesn't do anything;
> and due to the cselib changes for sp based accesses that is exactly what
> happened.  By reversing the order of the cselib_add_permanent_equiv calls we
> get both equivalences though and thus are able to canonicalize the sp based
> accesses in var-tracking to the cfa_base value + offset.

I think this warrants a comment in the code.

> The i386 FAILs were all ICEs, where we had pushf instruction pushing flags
> and then pop pseudo reading that value again.  With the cselib changes,
> cselib during RTL DSE is able to see through the sp adjustment and wanted
> to replace_read what was done pushf, by moving the flags register into a
> pseudo and replace the memory read in the pop with that pseudo.  That is
> wrong for two reasons: one is that the backend doesn't have an instruction
> to move the flags hard register into some other register, but replace_read
> has been validating just the mem -> pseudo replacement and not the insns
> emitted by copy_to_mode_reg.  And the second issue is that it is obviously
> wrong to replace a stack pop which contains stack post-increment by a copy
> of pseudo into destination.  dse.c has some code to handle RTX_AUTOINC, but
> only uses it when actually removing stores and only when there is REG_INC
> note (stack RTX_AUTOINC does not have those), in check_for_inc_dec* where
> it emits the reg adjustment(s) before the insn that is going to be deleted.
> replace_read doesn't remove the insn, so if it e.g. contained REG_INC note,
> it would be kept there and we might have the RTX_AUTOINC not just in *loc,
> but other spots.
> So, the dse.c changes try to validate the added insns and punt on all
> RTX_AUTOINC in *loc.  Furthermore, it seems that with the cselib.c changes
> on the gfortran.dg/pr87360.f90 and gcc.target/i386/pr88416.c testcases
> check_for_inc_dec{,_1} happily throws stack pointer autoinc on the floor,
> which is also wrong.  While we could perhaps do the for_each_inc_dec
> call regardless of whether we have REG_INC note or not, we aren't prepared
> to handle e.g. REG_ARGS_SIZE distribution and thus could end up with wrong
> unwind info or ICEs during dwarf2cfi.c.  So the patch also punts on those,
> after all, if we'd in theory managed to try to optimize such pushes before,
> we'd create wrong-code.
> 
> On x86_64-linux and i686-linux, the patch has some minor debug info coverage
> differences, but it doesn't appear very significant to me.
> https://github.com/pmachata/dwlocstat tool gives (where before is vanilla
> trunk + the rtl.h patch but not {cselib,var-tracking,dse}.c
> --enable-checking=yes,rtl,extra bootstrapped, then {cselib,var-tracking,dse}.c
> hunks applied and make cc1plus, while after is trunk with the whole patch
> applied).
> 
> 64-bit cc1plus
> before
> cov%	samples	cumul
> 0..10	1232756/48%	1232756/48%
> 11..20	31089/1%	1263845/49%
> 21..30	39172/1%	1303017/51%
> 31..40	38853/1%	1341870/52%
> 41..50	47473/1%	1389343/54%
> 51..60	45171/1%	1434514/56%
> 61..70	69393/2%	1503907/59%
> 71..80	61988/2%	1565895/61%
> 81..90	104528/4%	1670423/65%
> 91..100	875402/34%	2545825/100%
> after
> cov%	samples	cumul
> 0..10	1233238/48%	1233238/48%
> 11..20	31086/1%	1264324/49%
> 21..30	39157/1%	1303481/51%
> 31..40	38819/1%	1342300/52%
> 41..50	47447/1%	1389747/54%
> 51..60	45151/1%	1434898/56%
> 61..70	69379/2%	1504277/59%
> 71..80	61946/2%	1566223/61%
> 81..90	104508/4%	1670731/65%
> 91..100	875094/34%	2545825/100%
> 
> 32-bit cc1plus
> before
> cov%	samples	cumul
> 0..10	1231221/48%	1231221/48%
> 11..20	30992/1%	1262213/49%
> 21..30	36422/1%	1298635/51%
> 31..40	35793/1%	1334428/52%
> 41..50	47102/1%	1381530/54%
> 51..60	41201/1%	1422731/56%
> 61..70	65467/2%	1488198/58%
> 71..80	59560/2%	1547758/61%
> 81..90	104076/4%	1651834/65%
> 91..100	881879/34%	2533713/100%
> after
> cov%	samples	cumul
> 0..10	1230469/48%	1230469/48%
> 11..20	30390/1%	1260859/49%
> 21..30	36362/1%	1297221/51%
> 31..40	36042/1%	1333263/52%
> 41..50	47619/1%	1380882/54%
> 51..60	41674/1%	1422556/56%
> 61..70	65849/2%	1488405/58%
> 71..80	59857/2%	1548262/61%
> 81..90	104178/4%	1652440/65%
> 91..100	881273/34%	2533713/100%
> 
> Bootstrapped/regtested on {x86_64,i686,powerpc64{,-le}}-linux, ok for trunk?

OK.

> For the PR in question, my proposal would be to also lower
> -param=max-find-base-term-values=
> default from 2000 to 200 after this, at least in the above 4
> bootstraps/regtests there is nothing that would ever result in
> find_base_term returning non-NULL with more than 200 VALUEs being processed.

Sounds good to me.

Thanks,
Richard.

> 2020-04-02  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/92264
> 	* rtl.h (struct rtx_def): Mention that call bit is used as
> 	SP_DERIVED_VALUE_P in cselib.c.
> 	* cselib.c (SP_DERIVED_VALUE_P): Define.
> 	(PRESERVED_VALUE_P, SP_BASED_VALUE_P): Move definitions earlier.
> 	(cselib_hasher::equal): Handle equality between SP_DERIVED_VALUE_P
> 	val_rtx and sp based expression where offsets cancel each other.
> 	(preserve_constants_and_equivs): Formatting fix.
> 	(cselib_reset_table): Add reverse op loc to SP_DERIVED_VALUE_P
> 	locs list for cfa_base_preserved_val if needed.  Formatting fix.
> 	(autoinc_split): If the to be returned value is a REG, MEM or
> 	VALUE which has SP_DERIVED_VALUE_P + CONST_INT as one of its
> 	locs, return the SP_DERIVED_VALUE_P VALUE and adjust *off.
> 	(rtx_equal_for_cselib_1): Call autoinc_split even if both
> 	expressions are PLUS in Pmode with CONST_INT second operands.
> 	Handle SP_DERIVED_VALUE_P cases.
> 	(cselib_hash_plus_const_int): New function.
> 	(cselib_hash_rtx): Use it for PLUS in Pmode with CONST_INT
> 	second operand, as well as for PRE_DEC etc. that ought to be
> 	hashed the same way.
> 	(cselib_subst_to_values): Substitute PLUS with Pmode and
> 	CONST_INT operand if the first operand is a VALUE which has
> 	SP_DERIVED_VALUE_P + CONST_INT as one of its locs for the
> 	SP_DERIVED_VALUE_P + adjusted offset.
> 	(cselib_lookup_1): When creating a new VALUE for stack_pointer_rtx,
> 	set SP_DERIVED_VALUE_P on it.  Set PRESERVED_VALUE_P when adding
> 	SP_DERIVED_VALUE_P PRESERVED_VALUE_P subseted VALUE location.
> 	* var-tracking.c (vt_initialize): Call cselib_add_permanent_equiv
> 	on the sp value before calling cselib_add_permanent_equiv on the
> 	cfa_base value.
> 	* dse.c (check_for_inc_dec_1, check_for_inc_dec): Punt on RTX_AUTOINC
> 	in the insn without REG_INC note.
> 	(replace_read): Punt on RTX_AUTOINC in the *loc being replaced.
> 	Punt on invalid insns added by copy_to_mode_reg.  Formatting fixes.
> 
> --- gcc/rtl.h.jj	2020-04-01 09:43:45.346628694 +0200
> +++ gcc/rtl.h	2020-04-01 17:46:15.160396309 +0200
> @@ -331,6 +331,7 @@ struct GTY((desc("0"), tag("0"),
>       1 in a MEM if it cannot trap.
>       1 in a CALL_INSN logically equivalent to
>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
> +     1 in a VALUE is SP_DERIVED_VALUE_P in cselib.c.
>       Dumped as "/c" in RTL dumps.  */
>    unsigned int call : 1;
>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
> --- gcc/cselib.c.jj	2020-04-01 09:43:45.305629293 +0200
> +++ gcc/cselib.c	2020-04-01 17:46:15.161396295 +0200
> @@ -58,6 +58,16 @@ static void cselib_invalidate_regno (uns
>  static void cselib_invalidate_mem (rtx);
>  static void cselib_record_set (rtx, cselib_val *, cselib_val *);
>  static void cselib_record_sets (rtx_insn *);
> +static rtx autoinc_split (rtx, rtx *, machine_mode);
> +
> +#define PRESERVED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
> +
> +#define SP_BASED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
> +
> +#define SP_DERIVED_VALUE_P(RTX) \
> +  (RTL_FLAG_CHECK1 ("SP_DERIVED_VALUE_P", (RTX), VALUE)->call)
>  
>  struct expand_value_data
>  {
> @@ -122,6 +132,13 @@ cselib_hasher::equal (const cselib_val *
>    if (GET_CODE (x) == VALUE)
>      return x == v->val_rtx;
>  
> +  if (SP_DERIVED_VALUE_P (v->val_rtx) && GET_MODE (x) == Pmode)
> +    {
> +      rtx xoff = NULL;
> +      if (autoinc_split (x, &xoff, memmode) == v->val_rtx && xoff == NULL_RTX)
> +	return true;
> +    }
> +
>    /* We don't guarantee that distinct rtx's have different hash values,
>       so we need to do a comparison.  */
>    for (l = v->locs; l; l = l->next)
> @@ -256,12 +273,6 @@ void (*cselib_discard_hook) (cselib_val
>  void (*cselib_record_sets_hook) (rtx_insn *insn, struct cselib_set *sets,
>  				 int n_sets);
>  
> -#define PRESERVED_VALUE_P(RTX) \
> -  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
> -
> -#define SP_BASED_VALUE_P(RTX) \
> -  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
> -
>  \f
>  
>  /* Allocate a struct elt_list and fill in its two elements with the
> @@ -494,7 +505,7 @@ preserve_constants_and_equivs (cselib_va
>        };
>        cselib_val **slot
>  	= cselib_preserved_hash_table->find_slot_with_hash (&lookup,
> -							   v->hash, INSERT);
> +							    v->hash, INSERT);
>        gcc_assert (!*slot);
>        *slot = v;
>      }
> @@ -532,6 +543,28 @@ cselib_reset_table (unsigned int num)
>        max_value_regs
>  	= hard_regno_nregs (regno,
>  			    GET_MODE (cfa_base_preserved_val->locs->loc));
> +
> +      /* If cfa_base is sp + const_int, need to preserve also the
> +	 SP_DERIVED_VALUE_P value.  */
> +      for (struct elt_loc_list *l = cfa_base_preserved_val->locs;
> +	   l; l = l->next)
> +	if (GET_CODE (l->loc) == PLUS
> +	    && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +	    && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +	    && CONST_INT_P (XEXP (l->loc, 1)))
> +	  {
> +	    if (! invariant_or_equiv_p (CSELIB_VAL_PTR (XEXP (l->loc, 0))))
> +	      {
> +		rtx val = cfa_base_preserved_val->val_rtx;
> +		rtx_insn *save_cselib_current_insn = cselib_current_insn;
> +		cselib_current_insn = l->setting_insn;
> +		new_elt_loc_list (CSELIB_VAL_PTR (XEXP (l->loc, 0)),
> +				  plus_constant (Pmode, val,
> +						 -UINTVAL (XEXP (l->loc, 1))));
> +		cselib_current_insn = save_cselib_current_insn;
> +	      }
> +	    break;
> +	  }
>      }
>    else
>      {
> @@ -541,8 +574,7 @@ cselib_reset_table (unsigned int num)
>      }
>  
>    if (cselib_preserve_constants)
> -    cselib_hash_table->traverse <void *, preserve_constants_and_equivs>
> -      (NULL);
> +    cselib_hash_table->traverse <void *, preserve_constants_and_equivs> (NULL);
>    else
>      {
>        cselib_hash_table->empty ();
> @@ -799,33 +831,66 @@ autoinc_split (rtx x, rtx *off, machine_
>      {
>      case PLUS:
>        *off = XEXP (x, 1);
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      case PRE_DEC:
>        if (memmode == VOIDmode)
>  	return x;
>  
>        *off = gen_int_mode (-GET_MODE_SIZE (memmode), GET_MODE (x));
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      case PRE_INC:
>        if (memmode == VOIDmode)
>  	return x;
>  
>        *off = gen_int_mode (GET_MODE_SIZE (memmode), GET_MODE (x));
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      case PRE_MODIFY:
> -      return XEXP (x, 1);
> +      x = XEXP (x, 1);
> +      break;
>  
>      case POST_DEC:
>      case POST_INC:
>      case POST_MODIFY:
> -      return XEXP (x, 0);
> +      x = XEXP (x, 0);
> +      break;
>  
>      default:
> -      return x;
> +      break;
>      }
> +
> +  if (GET_MODE (x) == Pmode
> +      && (REG_P (x) || MEM_P (x) || GET_CODE (x) == VALUE)
> +      && (*off == NULL_RTX || CONST_INT_P (*off)))
> +    {
> +      cselib_val *e;
> +      if (GET_CODE (x) == VALUE)
> +	e = CSELIB_VAL_PTR (x);
> +      else
> +	e = cselib_lookup (x, GET_MODE (x), 0, memmode);
> +      if (e)
> +	for (struct elt_loc_list *l = e->locs; l; l = l->next)
> +	  if (GET_CODE (l->loc) == PLUS
> +	      && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +	      && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +	      && CONST_INT_P (XEXP (l->loc, 1)))
> +	    {
> +	      if (*off == NULL_RTX)
> +		*off = XEXP (l->loc, 1);
> +	      else
> +		*off = plus_constant (Pmode, *off,
> +				      INTVAL (XEXP (l->loc, 1)));
> +	      if (*off == const0_rtx)
> +		*off = NULL_RTX;
> +	      return XEXP (l->loc, 0);
> +	    }
> +    }
> +  return x;
>  }
>  
>  /* Return nonzero if we can prove that X and Y contain the same value,
> @@ -868,6 +933,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        if (GET_CODE (y) == VALUE)
>  	return e == canonical_cselib_val (CSELIB_VAL_PTR (y));
>  
> +      if ((SP_DERIVED_VALUE_P (x)
> +	   || SP_DERIVED_VALUE_P (e->val_rtx))
> +	  && GET_MODE (y) == Pmode)
> +	{
> +	  rtx yoff = NULL;
> +	  rtx yr = autoinc_split (y, &yoff, memmode);
> +	  if ((yr == x || yr == e->val_rtx) && yoff == NULL_RTX)
> +	    return 1;
> +	}
> +
>        if (depth == 128)
>  	return 0;
>  
> @@ -891,6 +966,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        cselib_val *e = canonical_cselib_val (CSELIB_VAL_PTR (y));
>        struct elt_loc_list *l;
>  
> +      if ((SP_DERIVED_VALUE_P (y)
> +	   || SP_DERIVED_VALUE_P (e->val_rtx))
> +	  && GET_MODE (x) == Pmode)
> +	{
> +	  rtx xoff = NULL;
> +	  rtx xr = autoinc_split (x, &xoff, memmode);
> +	  if ((xr == y || xr == e->val_rtx) && xoff == NULL_RTX)
> +	    return 1;
> +	}
> +
>        if (depth == 128)
>  	return 0;
>  
> @@ -910,7 +995,11 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>    if (GET_MODE (x) != GET_MODE (y))
>      return 0;
>  
> -  if (GET_CODE (x) != GET_CODE (y))
> +  if (GET_CODE (x) != GET_CODE (y)
> +      || (GET_CODE (x) == PLUS
> +	  && GET_MODE (x) == Pmode
> +	  && CONST_INT_P (XEXP (x, 1))
> +	  && CONST_INT_P (XEXP (y, 1))))
>      {
>        rtx xorig = x, yorig = y;
>        rtx xoff = NULL, yoff = NULL;
> @@ -918,17 +1007,20 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>        x = autoinc_split (x, &xoff, memmode);
>        y = autoinc_split (y, &yoff, memmode);
>  
> -      if (!xoff != !yoff)
> -	return 0;
> -
> -      if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
> -	return 0;
> -
>        /* Don't recurse if nothing changed.  */
>        if (x != xorig || y != yorig)
> -	return rtx_equal_for_cselib_1 (x, y, memmode, depth);
> +	{
> +	  if (!xoff != !yoff)
> +	    return 0;
>  
> -      return 0;
> +	  if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
> +	    return 0;
> +
> +	  return rtx_equal_for_cselib_1 (x, y, memmode, depth);
> +	}
> +
> +      if (GET_CODE (xorig) != GET_CODE (yorig))
> +	return 0;
>      }
>  
>    /* These won't be handled correctly by the code below.  */
> @@ -1042,6 +1134,41 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
>    return 1;
>  }
>  
> +/* Helper function for cselib_hash_rtx.  Arguments like for cselib_hash_rtx,
> +   except that it hashes (plus:P x c).  */
> +
> +static unsigned int
> +cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int create,
> +			    machine_mode memmode)
> +{
> +  cselib_val *e = cselib_lookup (x, GET_MODE (x), create, memmode);
> +  if (! e)
> +    return 0;
> +
> +  if (! SP_DERIVED_VALUE_P (e->val_rtx))
> +    for (struct elt_loc_list *l = e->locs; l; l = l->next)
> +      if (GET_CODE (l->loc) == PLUS
> +	  && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +	  && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +	  && CONST_INT_P (XEXP (l->loc, 1)))
> +	{
> +	  e = CSELIB_VAL_PTR (XEXP (l->loc, 0));
> +	  c = trunc_int_for_mode (c + UINTVAL (XEXP (l->loc, 1)), Pmode);
> +	  break;
> +	}
> +  if (c == 0)
> +    return e->hash;
> +
> +  unsigned hash = (unsigned) PLUS + (unsigned) GET_MODE (x);
> +  hash += e->hash;
> +  unsigned int tem_hash = (unsigned) CONST_INT + (unsigned) VOIDmode;
> +  tem_hash += ((unsigned) CONST_INT << 7) + (unsigned HOST_WIDE_INT) c;
> +  if (tem_hash == 0)
> +    tem_hash = (unsigned int) CONST_INT;
> +  hash += tem_hash;
> +  return hash ? hash : 1 + (unsigned int) PLUS;
> +}
> +
>  /* Hash an rtx.  Return 0 if we couldn't hash the rtx.
>     For registers and memory locations, we look up their cselib_val structure
>     and return its VALUE element.
> @@ -1209,10 +1336,21 @@ cselib_hash_rtx (rtx x, int create, mach
>  	offset = -offset;
>        /* Adjust the hash so that (mem:MEMMODE (pre_* (reg))) hashes
>  	 like (mem:MEMMODE (plus (reg) (const_int I))).  */
> -      hash += (unsigned) PLUS - (unsigned)code
> -	+ cselib_hash_rtx (XEXP (x, 0), create, memmode)
> -	+ cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
> -			   create, memmode);
> +      if (GET_MODE (x) == Pmode
> +	  && (REG_P (XEXP (x, 0))
> +	      || MEM_P (XEXP (x, 0))
> +	      || GET_CODE (XEXP (x, 0)) == VALUE))
> +	{
> +	  HOST_WIDE_INT c;
> +	  if (offset.is_constant (&c))
> +	    return cselib_hash_plus_const_int (XEXP (x, 0),
> +					       trunc_int_for_mode (c, Pmode),
> +					       create, memmode);
> +	}
> +      hash = ((unsigned) PLUS + (unsigned) GET_MODE (x)
> +	      + cselib_hash_rtx (XEXP (x, 0), create, memmode)
> +	      + cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
> +				 create, memmode));
>        return hash ? hash : 1 + (unsigned) PLUS;
>  
>      case PRE_MODIFY:
> @@ -1237,6 +1375,16 @@ cselib_hash_rtx (rtx x, int create, mach
>  
>        break;
>  
> +    case PLUS:
> +      if (GET_MODE (x) == Pmode
> +	  && (REG_P (XEXP (x, 0))
> +	      || MEM_P (XEXP (x, 0))
> +	      || GET_CODE (XEXP (x, 0)) == VALUE)
> +	  && CONST_INT_P (XEXP (x, 1)))
> +	return cselib_hash_plus_const_int (XEXP (x, 0), INTVAL (XEXP (x, 1)),
> +					   create, memmode);
> +      break;
> +
>      default:
>        break;
>      }
> @@ -1927,6 +2075,26 @@ cselib_subst_to_values (rtx x, machine_m
>        gcc_assert (memmode != VOIDmode);
>        return cselib_subst_to_values (XEXP (x, 0), memmode);
>  
> +    case PLUS:
> +      if (GET_MODE (x) == Pmode && CONST_INT_P (XEXP (x, 1)))
> +	{
> +	  rtx t = cselib_subst_to_values (XEXP (x, 0), memmode);
> +	  if (GET_CODE (t) == VALUE)
> +	    for (struct elt_loc_list *l = CSELIB_VAL_PTR (t)->locs;
> +		 l; l = l->next)
> +	      if (GET_CODE (l->loc) == PLUS
> +		  && GET_CODE (XEXP (l->loc, 0)) == VALUE
> +		  && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
> +		  && CONST_INT_P (XEXP (l->loc, 1)))
> +		return plus_constant (Pmode, l->loc, INTVAL (XEXP (x, 1)));
> +	  if (t != XEXP (x, 0))
> +	    {
> +	      copy = shallow_copy_rtx (x);
> +	      XEXP (copy, 0) = t;
> +	    }
> +	  return copy;
> +	}
> +
>      default:
>        break;
>      }
> @@ -2030,6 +2198,8 @@ cselib_lookup_1 (rtx x, machine_mode mod
>  	}
>  
>        e = new_cselib_val (next_uid, GET_MODE (x), x);
> +      if (GET_MODE (x) == Pmode && x == stack_pointer_rtx)
> +	SP_DERIVED_VALUE_P (e->val_rtx) = 1;
>        new_elt_loc_list (e, x);
>  
>        scalar_int_mode int_mode;
> @@ -2107,7 +2277,14 @@ cselib_lookup_1 (rtx x, machine_mode mod
>       the hash table is inconsistent until we do so, and
>       cselib_subst_to_values will need to do lookups.  */
>    *slot = e;
> -  new_elt_loc_list (e, cselib_subst_to_values (x, memmode));
> +  rtx v = cselib_subst_to_values (x, memmode);
> +
> +  /* If cselib_preserve_constants, we might get a SP_DERIVED_VALUE_P
> +     VALUE that isn't in the hash tables anymore.  */
> +  if (GET_CODE (v) == VALUE && SP_DERIVED_VALUE_P (v) && PRESERVED_VALUE_P (v))
> +    PRESERVED_VALUE_P (e->val_rtx) = 1;
> +
> +  new_elt_loc_list (e, v);
>    return e;
>  }
>  
> --- gcc/var-tracking.c.jj	2020-03-26 09:15:29.825500110 +0100
> +++ gcc/var-tracking.c	2020-04-01 19:09:30.145421270 +0200
> @@ -10045,19 +10045,20 @@ vt_initialize (void)
>        preserve_value (val);
>        if (reg != hard_frame_pointer_rtx && fixed_regs[REGNO (reg)])
>  	cselib_preserve_cfa_base_value (val, REGNO (reg));
> -      expr = plus_constant (GET_MODE (stack_pointer_rtx),
> -			    stack_pointer_rtx, -ofst);
> -      cselib_add_permanent_equiv (val, expr, get_insns ());
> -
>        if (ofst)
>  	{
> -	  val = cselib_lookup_from_insn (stack_pointer_rtx,
> -					 GET_MODE (stack_pointer_rtx), 1,
> -					 VOIDmode, get_insns ());
> -	  preserve_value (val);
> +	  cselib_val *valsp
> +	    = cselib_lookup_from_insn (stack_pointer_rtx,
> +				       GET_MODE (stack_pointer_rtx), 1,
> +				       VOIDmode, get_insns ());
> +	  preserve_value (valsp);
>  	  expr = plus_constant (GET_MODE (reg), reg, ofst);
> -	  cselib_add_permanent_equiv (val, expr, get_insns ());
> +	  cselib_add_permanent_equiv (valsp, expr, get_insns ());
>  	}
> +
> +      expr = plus_constant (GET_MODE (stack_pointer_rtx),
> +			    stack_pointer_rtx, -ofst);
> +      cselib_add_permanent_equiv (val, expr, get_insns ());
>      }
>  
>    /* In order to factor out the adjustments made to the stack pointer or to
> --- gcc/dse.c.jj	2020-03-03 11:04:46.365821937 +0100
> +++ gcc/dse.c	2020-04-02 10:48:08.042260842 +0200
> @@ -846,6 +846,17 @@ check_for_inc_dec_1 (insn_info_t insn_in
>    if (note)
>      return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
>  			     insn_info) == 0;
> +
> +  /* Punt on stack pushes, those don't have REG_INC notes and we are
> +     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +	return false;
> +    }
> +
>    return true;
>  }
>  
> @@ -866,6 +877,17 @@ check_for_inc_dec (rtx_insn *insn)
>    if (note)
>      return for_each_inc_dec (PATTERN (insn), emit_inc_dec_insn_before,
>  			     &insn_info) == 0;
> +
> +  /* Punt on stack pushes, those don't have REG_INC notes and we are
> +     unprepared to deal with distribution of REG_ARGS_SIZE notes etc.  */
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +	return false;
> +    }
> +
>    return true;
>  }
>  
> @@ -1980,16 +2002,30 @@ replace_read (store_info *store_info, in
>  	 point.  This does occasionally happen, see PR 37922.  */
>        bitmap regs_set = BITMAP_ALLOC (&reg_obstack);
>  
> -      for (this_insn = insns; this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
> -	note_stores (this_insn, look_for_hardregs, regs_set);
> +      for (this_insn = insns;
> +	   this_insn != NULL_RTX; this_insn = NEXT_INSN (this_insn))
> +	{
> +	  if (insn_invalid_p (this_insn, false))
> +	    {
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +		{
> +		  fprintf (dump_file, " -- replacing the loaded MEM with ");
> +		  print_simple_rtl (dump_file, read_reg);
> +		  fprintf (dump_file, " led to an invalid instruction\n");
> +		}
> +	      BITMAP_FREE (regs_set);
> +	      return false;
> +	    }
> +	  note_stores (this_insn, look_for_hardregs, regs_set);
> +	}
>  
>        bitmap_and_into (regs_set, regs_live);
>        if (!bitmap_empty_p (regs_set))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>  	    {
> -	      fprintf (dump_file,
> -		       "abandoning replacement because sequence clobbers live hardregs:");
> +	      fprintf (dump_file, "abandoning replacement because sequence "
> +				  "clobbers live hardregs:");
>  	      df_print_regset (dump_file, regs_set);
>  	    }
>  
> @@ -1999,6 +2035,19 @@ replace_read (store_info *store_info, in
>        BITMAP_FREE (regs_set);
>      }
>  
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
> +    {
> +      const_rtx x = *iter;
> +      if (GET_RTX_CLASS (GET_CODE (x)) == RTX_AUTOINC)
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    fprintf (dump_file, " -- replacing the MEM failed due to address "
> +				"side-effects\n");
> +	  return false;
> +	}
> +    }
> +
>    if (validate_change (read_insn->insn, loc, read_reg, 0))
>      {
>        deferred_change *change = deferred_change_pool.allocate ();
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [committed] params: Decrease -param=max-find-base-term-values= default [PR92264]
  2020-04-02 10:44 ` Richard Biener
@ 2020-04-02 12:35   ` Jakub Jelinek
  0 siblings, 0 replies; 3+ messages in thread
From: Jakub Jelinek @ 2020-04-02 12:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Thu, Apr 02, 2020 at 12:44:32PM +0200, Richard Biener wrote:
> > For the PR in question, my proposal would be to also lower
> > -param=max-find-base-term-values=
> > default from 2000 to 200 after this, at least in the above 4
> > bootstraps/regtests there is nothing that would ever result in
> > find_base_term returning non-NULL with more than 200 VALUEs being processed.
> 
> Sounds good to me.

Here is what I've committed after another bootstrap/regtest on x86_64-linux
and i686-linux.

2020-04-02  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/92264
	* params.opt (-param=max-find-base-term-values=): Decrease default
	from 2000 to 200.

--- gcc/params.opt.jj	2020-03-21 18:29:59.102158469 +0100
+++ gcc/params.opt	2020-04-02 13:05:14.433729117 +0200
@@ -663,7 +663,7 @@ Common Joined UInteger Var(param_max_var
 Max. size of var tracking hash tables.
 
 -param=max-find-base-term-values=
-Common Joined UInteger Var(param_max_find_base_term_values) Init(2000) Param Optimization
+Common Joined UInteger Var(param_max_find_base_term_values) Init(200) Param Optimization
 Maximum number of VALUEs handled during a single find_base_term call.
 
 -param=max-vrp-switch-assertions=


	Jakub


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-02 12:35 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-02 10:29 [PATCH] cselib: Reuse VALUEs on sp adjustments [PR92264] Jakub Jelinek
2020-04-02 10:44 ` Richard Biener
2020-04-02 12:35   ` [committed] params: Decrease -param=max-find-base-term-values= default [PR92264] Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).