public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
@ 2019-04-11  7:50 Jakub Jelinek
  2019-04-11  7:58 ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2019-04-11  7:50 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, Eric Botcazou; +Cc: gcc-patches

Hi!

The following testcase is miscompiled, because the result of
a DImode (doubleword) right shift is used in a QImode subreg as well as
later on pushed into a stack slot as an argument to a const function
whose result isn't used.  The RA because of the weirdo target tuning
reuses the REG_EQUIV argument slot (as it wants the value in a non-Q_REGS
register), so it first pushes it to the stack slot and then loads from the
stack slot again (according to Vlad, this can happen with both LRA and old
reload).  Later on during DCE we determine the call is not needed and try to
find all the argument pushes and don't mark those as needed, so we
effectively DCE the right shift, push to the argument slot as well as
other slots and the call, and end up keeping just the load from the
uninitialized argument slot.

The following patch just punts if we find loads from stack slots in between
where they are pushed and the const/pure call.  In addition to that, I've
outlined the same largish sequence that had 3 copies in the function already
and I'd need to add 4th copy, so in the end the dce.c changes are removing
more than adding:
 1 file changed, 118 insertions(+), 135 deletions(-)

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

During those 2 bootstraps/regtests, data.load_found has been set just
on the new testcase on ia32.

2019-04-11  Jakub Jelinek  <jakub@redhat.com>
	
	PR rtl-optimization/89965
	* dce.c: Include rtl-iter.h.
	(sp_based_mem_offset): New function.
	(struct check_argument_load_data): New type.
	(check_argument_load): New function.
	(find_call_stack_args): Use sp_based_mem_offset, check for loads
	from stack slots still tracked in sp_bytes and punt if any is
	found.

	* gcc.target/i386/pr89965.c: New test.

--- gcc/dce.c.jj	2019-02-15 00:11:13.209111382 +0100
+++ gcc/dce.c	2019-04-10 14:18:21.111763533 +0200
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "valtrack.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "rtl-iter.h"
 
 
 /* -------------------------------------------------------------------------
@@ -265,6 +266,100 @@ check_argument_store (HOST_WIDE_INT size
   return true;
 }
 
+/* If MEM has sp address, return 0, if it has sp + const address,
+   return that const, if it has reg address where reg is set to sp + const
+   and FAST is false, return const, otherwise return
+   INTTYPE_MINUMUM (HOST_WIDE_INT).  */
+
+static HOST_WIDE_INT
+sp_based_mem_offset (rtx_call_insn *call_insn, const_rtx mem, bool fast)
+{
+  HOST_WIDE_INT off = 0;
+  rtx addr = XEXP (mem, 0);
+  if (GET_CODE (addr) == PLUS
+      && REG_P (XEXP (addr, 0))
+      && CONST_INT_P (XEXP (addr, 1)))
+    {
+      off = INTVAL (XEXP (addr, 1));
+      addr = XEXP (addr, 0);
+    }
+  if (addr == stack_pointer_rtx)
+    return off;
+
+  if (!REG_P (addr) || fast)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  /* If not fast, use chains to see if addr wasn't set to sp + offset.  */
+  df_ref use;
+  FOR_EACH_INSN_USE (use, call_insn)
+  if (rtx_equal_p (addr, DF_REF_REG (use)))
+    break;
+
+  if (use == NULL)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  struct df_link *defs;
+  for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
+    if (! DF_REF_IS_ARTIFICIAL (defs->ref))
+      break;
+
+  if (defs == NULL)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  rtx set = single_set (DF_REF_INSN (defs->ref));
+  if (!set)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  if (GET_CODE (SET_SRC (set)) != PLUS
+      || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
+      || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  off += INTVAL (XEXP (SET_SRC (set), 1));
+  return off;
+}
+
+/* Data for check_argument_load called via note_uses.  */
+struct check_argument_load_data {
+  bitmap sp_bytes;
+  HOST_WIDE_INT min_sp_off, max_sp_off;
+  rtx_call_insn *call_insn;
+  bool fast;
+  bool load_found;
+};
+
+/* Helper function for find_call_stack_args.  Check if there are
+   any loads from the argument slots in between the const/pure call
+   and store to the argument slot, set LOAD_FOUND if any is found.  */
+
+static void
+check_argument_load (rtx *loc, void *data)
+{
+  struct check_argument_load_data *d
+    = (struct check_argument_load_data *) data;
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
+    {
+      const_rtx mem = *iter;
+      HOST_WIDE_INT size;
+      if (MEM_P (mem)
+	  && MEM_SIZE_KNOWN_P (mem)
+	  && MEM_SIZE (mem).is_constant (&size))
+	{
+	  HOST_WIDE_INT off = sp_based_mem_offset (d->call_insn, mem, d->fast);
+	  if (off != INTTYPE_MINIMUM (HOST_WIDE_INT)
+	      && off < d->max_sp_off
+	      && off + size > d->min_sp_off)
+	    for (HOST_WIDE_INT byte = MAX (off, d->min_sp_off);
+		 byte < MIN (off + size, d->max_sp_off); byte++)
+	      if (bitmap_bit_p (d->sp_bytes, byte - d->min_sp_off))
+		{
+		  d->load_found = true;
+		  return;
+		}
+        }
+    }
+}
 
 /* Try to find all stack stores of CALL_INSN arguments if
    ACCUMULATE_OUTGOING_ARGS.  If all stack stores have been found
@@ -302,58 +397,13 @@ find_call_stack_args (rtx_call_insn *cal
     if (GET_CODE (XEXP (p, 0)) == USE
 	&& MEM_P (XEXP (XEXP (p, 0), 0)))
       {
-	rtx mem = XEXP (XEXP (p, 0), 0), addr;
-	HOST_WIDE_INT off = 0, size;
+	rtx mem = XEXP (XEXP (p, 0), 0);
+	HOST_WIDE_INT size;
 	if (!MEM_SIZE_KNOWN_P (mem) || !MEM_SIZE (mem).is_constant (&size))
 	  return false;
-	addr = XEXP (mem, 0);
-	if (GET_CODE (addr) == PLUS
-	    && REG_P (XEXP (addr, 0))
-	    && CONST_INT_P (XEXP (addr, 1)))
-	  {
-	    off = INTVAL (XEXP (addr, 1));
-	    addr = XEXP (addr, 0);
-	  }
-	if (addr != stack_pointer_rtx)
-	  {
-	    if (!REG_P (addr))
-	      return false;
-	    /* If not fast, use chains to see if addr wasn't set to
-	       sp + offset.  */
-	    if (!fast)
-	      {
-		df_ref use;
-		struct df_link *defs;
-		rtx set;
-
-		FOR_EACH_INSN_USE (use, call_insn)
-		  if (rtx_equal_p (addr, DF_REF_REG (use)))
-		    break;
-
-		if (use == NULL)
-		  return false;
-
-		for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
-		  if (! DF_REF_IS_ARTIFICIAL (defs->ref))
-		    break;
-
-		if (defs == NULL)
-		  return false;
-
-		set = single_set (DF_REF_INSN (defs->ref));
-		if (!set)
-		  return false;
-
-		if (GET_CODE (SET_SRC (set)) != PLUS
-		    || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
-		    || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
-		  return false;
-
-		off += INTVAL (XEXP (SET_SRC (set), 1));
-	      }
-	    else
-	      return false;
-	  }
+        HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
+        if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
+	  return false;
 	min_sp_off = MIN (min_sp_off, off);
 	max_sp_off = MAX (max_sp_off, off + size);
       }
@@ -369,51 +419,24 @@ find_call_stack_args (rtx_call_insn *cal
     if (GET_CODE (XEXP (p, 0)) == USE
 	&& MEM_P (XEXP (XEXP (p, 0), 0)))
       {
-	rtx mem = XEXP (XEXP (p, 0), 0), addr;
-	HOST_WIDE_INT off = 0, byte, size;
+	rtx mem = XEXP (XEXP (p, 0), 0);
 	/* Checked in the previous iteration.  */
-	size = MEM_SIZE (mem).to_constant ();
-	addr = XEXP (mem, 0);
-	if (GET_CODE (addr) == PLUS
-	    && REG_P (XEXP (addr, 0))
-	    && CONST_INT_P (XEXP (addr, 1)))
-	  {
-	    off = INTVAL (XEXP (addr, 1));
-	    addr = XEXP (addr, 0);
-	  }
-	if (addr != stack_pointer_rtx)
-	  {
-	    df_ref use;
-	    struct df_link *defs;
-	    rtx set;
-
-	    FOR_EACH_INSN_USE (use, call_insn)
-	      if (rtx_equal_p (addr, DF_REF_REG (use)))
-		break;
-
-	    for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
-	      if (! DF_REF_IS_ARTIFICIAL (defs->ref))
-		break;
-
-	    set = single_set (DF_REF_INSN (defs->ref));
-	    off += INTVAL (XEXP (SET_SRC (set), 1));
-	  }
-	for (byte = off; byte < off + size; byte++)
-	  {
-	    if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
-	      gcc_unreachable ();
-	  }
+	HOST_WIDE_INT size = MEM_SIZE (mem).to_constant ();
+	HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
+	gcc_checking_assert (off != INTTYPE_MINIMUM (HOST_WIDE_INT));
+	for (HOST_WIDE_INT byte = off; byte < off + size; byte++)
+	  if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
+	    gcc_unreachable ();
       }
 
   /* Walk backwards, looking for argument stores.  The search stops
      when seeing another call, sp adjustment or memory store other than
      argument store.  */
+  struct check_argument_load_data data
+    = { sp_bytes, min_sp_off, max_sp_off, call_insn, fast, false };
   ret = false;
   for (insn = PREV_INSN (call_insn); insn; insn = prev_insn)
     {
-      rtx set, mem, addr;
-      HOST_WIDE_INT off;
-
       if (insn == BB_HEAD (BLOCK_FOR_INSN (call_insn)))
 	prev_insn = NULL;
       else
@@ -425,61 +448,21 @@ find_call_stack_args (rtx_call_insn *cal
       if (!NONDEBUG_INSN_P (insn))
 	continue;
 
-      set = single_set (insn);
+      rtx set = single_set (insn);
       if (!set || SET_DEST (set) == stack_pointer_rtx)
 	break;
 
+      note_uses (&PATTERN (insn), check_argument_load, &data);
+      if (data.load_found)
+	break;
+
       if (!MEM_P (SET_DEST (set)))
 	continue;
 
-      mem = SET_DEST (set);
-      addr = XEXP (mem, 0);
-      off = 0;
-      if (GET_CODE (addr) == PLUS
-	  && REG_P (XEXP (addr, 0))
-	  && CONST_INT_P (XEXP (addr, 1)))
-	{
-	  off = INTVAL (XEXP (addr, 1));
-	  addr = XEXP (addr, 0);
-	}
-      if (addr != stack_pointer_rtx)
-	{
-	  if (!REG_P (addr))
-	    break;
-	  if (!fast)
-	    {
-	      df_ref use;
-	      struct df_link *defs;
-	      rtx set;
-
-	      FOR_EACH_INSN_USE (use, insn)
-		if (rtx_equal_p (addr, DF_REF_REG (use)))
-		  break;
-
-	      if (use == NULL)
-		break;
-
-	      for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
-		if (! DF_REF_IS_ARTIFICIAL (defs->ref))
-		  break;
-
-	      if (defs == NULL)
-		break;
-
-	      set = single_set (DF_REF_INSN (defs->ref));
-	      if (!set)
-		break;
-
-	      if (GET_CODE (SET_SRC (set)) != PLUS
-		  || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
-		  || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
-		break;
-
-	      off += INTVAL (XEXP (SET_SRC (set), 1));
-	    }
-	  else
-	    break;
-	}
+      rtx mem = SET_DEST (set);
+      HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
+      if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
+	break;
 
       HOST_WIDE_INT size;
       if (!MEM_SIZE_KNOWN_P (mem)
--- gcc/testsuite/gcc.target/i386/pr89965.c.jj	2019-04-10 14:04:07.708517723 +0200
+++ gcc/testsuite/gcc.target/i386/pr89965.c	2019-04-10 14:09:55.300911221 +0200
@@ -0,0 +1,39 @@
+/* PR rtl-optimization/89965 */
+/* { dg-do run } */
+/* { dg-options "-O -mtune=nano-x2 -fcaller-saves -fexpensive-optimizations -fno-tree-dce -fno-tree-ter" } */
+/* { dg-additional-options "-march=i386" { target ia32 } } */
+
+int a;
+
+__attribute__ ((noipa)) unsigned long long
+foo (unsigned char c, unsigned d, unsigned e, unsigned long long f,
+     unsigned char g, unsigned h, unsigned long long i)
+{
+  (void) d;
+  unsigned short j = __builtin_mul_overflow_p (~0, h, c);
+  e <<= e;
+  i >>= 7;
+  c *= i;
+  i /= 12;
+  a = __builtin_popcount (c);
+  __builtin_add_overflow (e, a, &f);
+  return c + f + g + j + h;
+}
+
+__attribute__ ((noipa)) void
+bar (void)
+{
+  char buf[64];
+  __builtin_memset (buf, 0x55, sizeof buf);
+  asm volatile ("" : : "r" (&buf[0]) : "memory");
+}
+
+int
+main (void)
+{
+  bar ();
+  unsigned long long x = foo (2, 0, 0, 0, 0, 0, 0);
+  if (x != 0)
+    __builtin_abort ();
+  return 0;
+}

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-11  7:50 [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965) Jakub Jelinek
@ 2019-04-11  7:58 ` Richard Biener
  2019-04-11  8:48   ` Jakub Jelinek
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Richard Biener @ 2019-04-11  7:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Eric Botcazou, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 12884 bytes --]

On Thu, 11 Apr 2019, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled, because the result of
> a DImode (doubleword) right shift is used in a QImode subreg as well as
> later on pushed into a stack slot as an argument to a const function
> whose result isn't used.  The RA because of the weirdo target tuning
> reuses the REG_EQUIV argument slot (as it wants the value in a non-Q_REGS
> register), so it first pushes it to the stack slot and then loads from the
> stack slot again (according to Vlad, this can happen with both LRA and old
> reload).  Later on during DCE we determine the call is not needed and try to
> find all the argument pushes and don't mark those as needed, so we
> effectively DCE the right shift, push to the argument slot as well as
> other slots and the call, and end up keeping just the load from the
> uninitialized argument slot.
> 
> The following patch just punts if we find loads from stack slots in between
> where they are pushed and the const/pure call.  In addition to that, I've
> outlined the same largish sequence that had 3 copies in the function already
> and I'd need to add 4th copy, so in the end the dce.c changes are removing
> more than adding:
>  1 file changed, 118 insertions(+), 135 deletions(-)

The refactoring itself is probably OK (and if done separately
makes reviewing easier).

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> During those 2 bootstraps/regtests, data.load_found has been set just
> on the new testcase on ia32.

Hmm, I wonder whether we really need to DCE calls after reload?
That said, I'm not familiar enough with the code to check if the
patch makes sense (can there ever be uses of the argument slots
_after_ the call?).

Richard.

> 2019-04-11  Jakub Jelinek  <jakub@redhat.com>
> 	
> 	PR rtl-optimization/89965
> 	* dce.c: Include rtl-iter.h.
> 	(sp_based_mem_offset): New function.
> 	(struct check_argument_load_data): New type.
> 	(check_argument_load): New function.
> 	(find_call_stack_args): Use sp_based_mem_offset, check for loads
> 	from stack slots still tracked in sp_bytes and punt if any is
> 	found.
> 
> 	* gcc.target/i386/pr89965.c: New test.
> 
> --- gcc/dce.c.jj	2019-02-15 00:11:13.209111382 +0100
> +++ gcc/dce.c	2019-04-10 14:18:21.111763533 +0200
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
>  #include "valtrack.h"
>  #include "tree-pass.h"
>  #include "dbgcnt.h"
> +#include "rtl-iter.h"
>  
>  
>  /* -------------------------------------------------------------------------
> @@ -265,6 +266,100 @@ check_argument_store (HOST_WIDE_INT size
>    return true;
>  }
>  
> +/* If MEM has sp address, return 0, if it has sp + const address,
> +   return that const, if it has reg address where reg is set to sp + const
> +   and FAST is false, return const, otherwise return
> +   INTTYPE_MINUMUM (HOST_WIDE_INT).  */
> +
> +static HOST_WIDE_INT
> +sp_based_mem_offset (rtx_call_insn *call_insn, const_rtx mem, bool fast)
> +{
> +  HOST_WIDE_INT off = 0;
> +  rtx addr = XEXP (mem, 0);
> +  if (GET_CODE (addr) == PLUS
> +      && REG_P (XEXP (addr, 0))
> +      && CONST_INT_P (XEXP (addr, 1)))
> +    {
> +      off = INTVAL (XEXP (addr, 1));
> +      addr = XEXP (addr, 0);
> +    }
> +  if (addr == stack_pointer_rtx)
> +    return off;
> +
> +  if (!REG_P (addr) || fast)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  /* If not fast, use chains to see if addr wasn't set to sp + offset.  */
> +  df_ref use;
> +  FOR_EACH_INSN_USE (use, call_insn)
> +  if (rtx_equal_p (addr, DF_REF_REG (use)))
> +    break;
> +
> +  if (use == NULL)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  struct df_link *defs;
> +  for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> +    if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> +      break;
> +
> +  if (defs == NULL)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  rtx set = single_set (DF_REF_INSN (defs->ref));
> +  if (!set)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  if (GET_CODE (SET_SRC (set)) != PLUS
> +      || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> +      || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  off += INTVAL (XEXP (SET_SRC (set), 1));
> +  return off;
> +}
> +
> +/* Data for check_argument_load called via note_uses.  */
> +struct check_argument_load_data {
> +  bitmap sp_bytes;
> +  HOST_WIDE_INT min_sp_off, max_sp_off;
> +  rtx_call_insn *call_insn;
> +  bool fast;
> +  bool load_found;
> +};
> +
> +/* Helper function for find_call_stack_args.  Check if there are
> +   any loads from the argument slots in between the const/pure call
> +   and store to the argument slot, set LOAD_FOUND if any is found.  */
> +
> +static void
> +check_argument_load (rtx *loc, void *data)
> +{
> +  struct check_argument_load_data *d
> +    = (struct check_argument_load_data *) data;
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
> +    {
> +      const_rtx mem = *iter;
> +      HOST_WIDE_INT size;
> +      if (MEM_P (mem)
> +	  && MEM_SIZE_KNOWN_P (mem)
> +	  && MEM_SIZE (mem).is_constant (&size))
> +	{
> +	  HOST_WIDE_INT off = sp_based_mem_offset (d->call_insn, mem, d->fast);
> +	  if (off != INTTYPE_MINIMUM (HOST_WIDE_INT)
> +	      && off < d->max_sp_off
> +	      && off + size > d->min_sp_off)
> +	    for (HOST_WIDE_INT byte = MAX (off, d->min_sp_off);
> +		 byte < MIN (off + size, d->max_sp_off); byte++)
> +	      if (bitmap_bit_p (d->sp_bytes, byte - d->min_sp_off))
> +		{
> +		  d->load_found = true;
> +		  return;
> +		}
> +        }
> +    }
> +}
>  
>  /* Try to find all stack stores of CALL_INSN arguments if
>     ACCUMULATE_OUTGOING_ARGS.  If all stack stores have been found
> @@ -302,58 +397,13 @@ find_call_stack_args (rtx_call_insn *cal
>      if (GET_CODE (XEXP (p, 0)) == USE
>  	&& MEM_P (XEXP (XEXP (p, 0), 0)))
>        {
> -	rtx mem = XEXP (XEXP (p, 0), 0), addr;
> -	HOST_WIDE_INT off = 0, size;
> +	rtx mem = XEXP (XEXP (p, 0), 0);
> +	HOST_WIDE_INT size;
>  	if (!MEM_SIZE_KNOWN_P (mem) || !MEM_SIZE (mem).is_constant (&size))
>  	  return false;
> -	addr = XEXP (mem, 0);
> -	if (GET_CODE (addr) == PLUS
> -	    && REG_P (XEXP (addr, 0))
> -	    && CONST_INT_P (XEXP (addr, 1)))
> -	  {
> -	    off = INTVAL (XEXP (addr, 1));
> -	    addr = XEXP (addr, 0);
> -	  }
> -	if (addr != stack_pointer_rtx)
> -	  {
> -	    if (!REG_P (addr))
> -	      return false;
> -	    /* If not fast, use chains to see if addr wasn't set to
> -	       sp + offset.  */
> -	    if (!fast)
> -	      {
> -		df_ref use;
> -		struct df_link *defs;
> -		rtx set;
> -
> -		FOR_EACH_INSN_USE (use, call_insn)
> -		  if (rtx_equal_p (addr, DF_REF_REG (use)))
> -		    break;
> -
> -		if (use == NULL)
> -		  return false;
> -
> -		for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> -		  if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> -		    break;
> -
> -		if (defs == NULL)
> -		  return false;
> -
> -		set = single_set (DF_REF_INSN (defs->ref));
> -		if (!set)
> -		  return false;
> -
> -		if (GET_CODE (SET_SRC (set)) != PLUS
> -		    || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> -		    || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
> -		  return false;
> -
> -		off += INTVAL (XEXP (SET_SRC (set), 1));
> -	      }
> -	    else
> -	      return false;
> -	  }
> +        HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
> +        if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
> +	  return false;
>  	min_sp_off = MIN (min_sp_off, off);
>  	max_sp_off = MAX (max_sp_off, off + size);
>        }
> @@ -369,51 +419,24 @@ find_call_stack_args (rtx_call_insn *cal
>      if (GET_CODE (XEXP (p, 0)) == USE
>  	&& MEM_P (XEXP (XEXP (p, 0), 0)))
>        {
> -	rtx mem = XEXP (XEXP (p, 0), 0), addr;
> -	HOST_WIDE_INT off = 0, byte, size;
> +	rtx mem = XEXP (XEXP (p, 0), 0);
>  	/* Checked in the previous iteration.  */
> -	size = MEM_SIZE (mem).to_constant ();
> -	addr = XEXP (mem, 0);
> -	if (GET_CODE (addr) == PLUS
> -	    && REG_P (XEXP (addr, 0))
> -	    && CONST_INT_P (XEXP (addr, 1)))
> -	  {
> -	    off = INTVAL (XEXP (addr, 1));
> -	    addr = XEXP (addr, 0);
> -	  }
> -	if (addr != stack_pointer_rtx)
> -	  {
> -	    df_ref use;
> -	    struct df_link *defs;
> -	    rtx set;
> -
> -	    FOR_EACH_INSN_USE (use, call_insn)
> -	      if (rtx_equal_p (addr, DF_REF_REG (use)))
> -		break;
> -
> -	    for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> -	      if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> -		break;
> -
> -	    set = single_set (DF_REF_INSN (defs->ref));
> -	    off += INTVAL (XEXP (SET_SRC (set), 1));
> -	  }
> -	for (byte = off; byte < off + size; byte++)
> -	  {
> -	    if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
> -	      gcc_unreachable ();
> -	  }
> +	HOST_WIDE_INT size = MEM_SIZE (mem).to_constant ();
> +	HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
> +	gcc_checking_assert (off != INTTYPE_MINIMUM (HOST_WIDE_INT));
> +	for (HOST_WIDE_INT byte = off; byte < off + size; byte++)
> +	  if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
> +	    gcc_unreachable ();
>        }
>  
>    /* Walk backwards, looking for argument stores.  The search stops
>       when seeing another call, sp adjustment or memory store other than
>       argument store.  */
> +  struct check_argument_load_data data
> +    = { sp_bytes, min_sp_off, max_sp_off, call_insn, fast, false };
>    ret = false;
>    for (insn = PREV_INSN (call_insn); insn; insn = prev_insn)
>      {
> -      rtx set, mem, addr;
> -      HOST_WIDE_INT off;
> -
>        if (insn == BB_HEAD (BLOCK_FOR_INSN (call_insn)))
>  	prev_insn = NULL;
>        else
> @@ -425,61 +448,21 @@ find_call_stack_args (rtx_call_insn *cal
>        if (!NONDEBUG_INSN_P (insn))
>  	continue;
>  
> -      set = single_set (insn);
> +      rtx set = single_set (insn);
>        if (!set || SET_DEST (set) == stack_pointer_rtx)
>  	break;
>  
> +      note_uses (&PATTERN (insn), check_argument_load, &data);
> +      if (data.load_found)
> +	break;
> +
>        if (!MEM_P (SET_DEST (set)))
>  	continue;
>  
> -      mem = SET_DEST (set);
> -      addr = XEXP (mem, 0);
> -      off = 0;
> -      if (GET_CODE (addr) == PLUS
> -	  && REG_P (XEXP (addr, 0))
> -	  && CONST_INT_P (XEXP (addr, 1)))
> -	{
> -	  off = INTVAL (XEXP (addr, 1));
> -	  addr = XEXP (addr, 0);
> -	}
> -      if (addr != stack_pointer_rtx)
> -	{
> -	  if (!REG_P (addr))
> -	    break;
> -	  if (!fast)
> -	    {
> -	      df_ref use;
> -	      struct df_link *defs;
> -	      rtx set;
> -
> -	      FOR_EACH_INSN_USE (use, insn)
> -		if (rtx_equal_p (addr, DF_REF_REG (use)))
> -		  break;
> -
> -	      if (use == NULL)
> -		break;
> -
> -	      for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> -		if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> -		  break;
> -
> -	      if (defs == NULL)
> -		break;
> -
> -	      set = single_set (DF_REF_INSN (defs->ref));
> -	      if (!set)
> -		break;
> -
> -	      if (GET_CODE (SET_SRC (set)) != PLUS
> -		  || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> -		  || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
> -		break;
> -
> -	      off += INTVAL (XEXP (SET_SRC (set), 1));
> -	    }
> -	  else
> -	    break;
> -	}
> +      rtx mem = SET_DEST (set);
> +      HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
> +      if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
> +	break;
>  
>        HOST_WIDE_INT size;
>        if (!MEM_SIZE_KNOWN_P (mem)
> --- gcc/testsuite/gcc.target/i386/pr89965.c.jj	2019-04-10 14:04:07.708517723 +0200
> +++ gcc/testsuite/gcc.target/i386/pr89965.c	2019-04-10 14:09:55.300911221 +0200
> @@ -0,0 +1,39 @@
> +/* PR rtl-optimization/89965 */
> +/* { dg-do run } */
> +/* { dg-options "-O -mtune=nano-x2 -fcaller-saves -fexpensive-optimizations -fno-tree-dce -fno-tree-ter" } */
> +/* { dg-additional-options "-march=i386" { target ia32 } } */
> +
> +int a;
> +
> +__attribute__ ((noipa)) unsigned long long
> +foo (unsigned char c, unsigned d, unsigned e, unsigned long long f,
> +     unsigned char g, unsigned h, unsigned long long i)
> +{
> +  (void) d;
> +  unsigned short j = __builtin_mul_overflow_p (~0, h, c);
> +  e <<= e;
> +  i >>= 7;
> +  c *= i;
> +  i /= 12;
> +  a = __builtin_popcount (c);
> +  __builtin_add_overflow (e, a, &f);
> +  return c + f + g + j + h;
> +}
> +
> +__attribute__ ((noipa)) void
> +bar (void)
> +{
> +  char buf[64];
> +  __builtin_memset (buf, 0x55, sizeof buf);
> +  asm volatile ("" : : "r" (&buf[0]) : "memory");
> +}
> +
> +int
> +main (void)
> +{
> +  bar ();
> +  unsigned long long x = foo (2, 0, 0, 0, 0, 0, 0);
> +  if (x != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
> 	Jakub
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG NÌrnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-11  7:58 ` Richard Biener
@ 2019-04-11  8:48   ` Jakub Jelinek
  2019-04-11  8:55     ` Richard Biener
  2019-04-11  9:05   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2) Jakub Jelinek
  2019-04-12 15:29   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965) Jeff Law
  2 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2019-04-11  8:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Eric Botcazou, gcc-patches

On Thu, Apr 11, 2019 at 09:54:24AM +0200, Richard Biener wrote:
> > The following patch just punts if we find loads from stack slots in between
> > where they are pushed and the const/pure call.  In addition to that, I've
> > outlined the same largish sequence that had 3 copies in the function already
> > and I'd need to add 4th copy, so in the end the dce.c changes are removing
> > more than adding:
> >  1 file changed, 118 insertions(+), 135 deletions(-)
> 
> The refactoring itself is probably OK (and if done separately
> makes reviewing easier).

Here is just the refactoring, ok for trunk?

2019-04-11  Jakub Jelinek  <jakub@redhat.com>
	
	PR rtl-optimization/89965
	* dce.c (sp_based_mem_offset): New function.
	(find_call_stack_args): Use sp_based_mem_offset.

--- gcc/dce.c.jj	2019-04-11 10:26:18.509365653 +0200
+++ gcc/dce.c	2019-04-11 10:30:00.436705065 +0200
@@ -272,6 +272,58 @@ check_argument_store (HOST_WIDE_INT size
   return true;
 }
 
+/* If MEM has sp address, return 0, if it has sp + const address,
+   return that const, if it has reg address where reg is set to sp + const
+   and FAST is false, return const, otherwise return
+   INTTYPE_MINUMUM (HOST_WIDE_INT).  */
+
+static HOST_WIDE_INT
+sp_based_mem_offset (rtx_call_insn *call_insn, const_rtx mem, bool fast)
+{
+  HOST_WIDE_INT off = 0;
+  rtx addr = XEXP (mem, 0);
+  if (GET_CODE (addr) == PLUS
+      && REG_P (XEXP (addr, 0))
+      && CONST_INT_P (XEXP (addr, 1)))
+    {
+      off = INTVAL (XEXP (addr, 1));
+      addr = XEXP (addr, 0);
+    }
+  if (addr == stack_pointer_rtx)
+    return off;
+
+  if (!REG_P (addr) || fast)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  /* If not fast, use chains to see if addr wasn't set to sp + offset.  */
+  df_ref use;
+  FOR_EACH_INSN_USE (use, call_insn)
+  if (rtx_equal_p (addr, DF_REF_REG (use)))
+    break;
+
+  if (use == NULL)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  struct df_link *defs;
+  for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
+    if (! DF_REF_IS_ARTIFICIAL (defs->ref))
+      break;
+
+  if (defs == NULL)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  rtx set = single_set (DF_REF_INSN (defs->ref));
+  if (!set)
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  if (GET_CODE (SET_SRC (set)) != PLUS
+      || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
+      || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
+    return INTTYPE_MINIMUM (HOST_WIDE_INT);
+
+  off += INTVAL (XEXP (SET_SRC (set), 1));
+  return off;
+}
 
 /* Try to find all stack stores of CALL_INSN arguments if
    ACCUMULATE_OUTGOING_ARGS.  If all stack stores have been found
@@ -309,58 +361,13 @@ find_call_stack_args (rtx_call_insn *cal
     if (GET_CODE (XEXP (p, 0)) == USE
 	&& MEM_P (XEXP (XEXP (p, 0), 0)))
       {
-	rtx mem = XEXP (XEXP (p, 0), 0), addr;
-	HOST_WIDE_INT off = 0, size;
+	rtx mem = XEXP (XEXP (p, 0), 0);
+	HOST_WIDE_INT size;
 	if (!MEM_SIZE_KNOWN_P (mem) || !MEM_SIZE (mem).is_constant (&size))
 	  return false;
-	addr = XEXP (mem, 0);
-	if (GET_CODE (addr) == PLUS
-	    && REG_P (XEXP (addr, 0))
-	    && CONST_INT_P (XEXP (addr, 1)))
-	  {
-	    off = INTVAL (XEXP (addr, 1));
-	    addr = XEXP (addr, 0);
-	  }
-	if (addr != stack_pointer_rtx)
-	  {
-	    if (!REG_P (addr))
-	      return false;
-	    /* If not fast, use chains to see if addr wasn't set to
-	       sp + offset.  */
-	    if (!fast)
-	      {
-		df_ref use;
-		struct df_link *defs;
-		rtx set;
-
-		FOR_EACH_INSN_USE (use, call_insn)
-		  if (rtx_equal_p (addr, DF_REF_REG (use)))
-		    break;
-
-		if (use == NULL)
-		  return false;
-
-		for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
-		  if (! DF_REF_IS_ARTIFICIAL (defs->ref))
-		    break;
-
-		if (defs == NULL)
-		  return false;
-
-		set = single_set (DF_REF_INSN (defs->ref));
-		if (!set)
-		  return false;
-
-		if (GET_CODE (SET_SRC (set)) != PLUS
-		    || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
-		    || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
-		  return false;
-
-		off += INTVAL (XEXP (SET_SRC (set), 1));
-	      }
-	    else
-	      return false;
-	  }
+	HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
+	if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
+	  return false;
 	min_sp_off = MIN (min_sp_off, off);
 	max_sp_off = MAX (max_sp_off, off + size);
       }
@@ -376,51 +383,19 @@ find_call_stack_args (rtx_call_insn *cal
     if (GET_CODE (XEXP (p, 0)) == USE
 	&& MEM_P (XEXP (XEXP (p, 0), 0)))
       {
-	rtx mem = XEXP (XEXP (p, 0), 0), addr;
-	HOST_WIDE_INT off = 0, byte, size;
+	rtx mem = XEXP (XEXP (p, 0), 0);
 	/* Checked in the previous iteration.  */
-	size = MEM_SIZE (mem).to_constant ();
-	addr = XEXP (mem, 0);
-	if (GET_CODE (addr) == PLUS
-	    && REG_P (XEXP (addr, 0))
-	    && CONST_INT_P (XEXP (addr, 1)))
-	  {
-	    off = INTVAL (XEXP (addr, 1));
-	    addr = XEXP (addr, 0);
-	  }
-	if (addr != stack_pointer_rtx)
-	  {
-	    df_ref use;
-	    struct df_link *defs;
-	    rtx set;
-
-	    FOR_EACH_INSN_USE (use, call_insn)
-	      if (rtx_equal_p (addr, DF_REF_REG (use)))
-		break;
-
-	    for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
-	      if (! DF_REF_IS_ARTIFICIAL (defs->ref))
-		break;
-
-	    set = single_set (DF_REF_INSN (defs->ref));
-	    off += INTVAL (XEXP (SET_SRC (set), 1));
-	  }
-	for (byte = off; byte < off + size; byte++)
-	  {
-	    if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
-	      gcc_unreachable ();
-	  }
+	HOST_WIDE_INT size = MEM_SIZE (mem).to_constant ();
+	HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
+	gcc_checking_assert (off != INTTYPE_MINIMUM (HOST_WIDE_INT));
+	for (HOST_WIDE_INT byte = off; byte < off + size; byte++)
+	  if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
+	    gcc_unreachable ();
       }
 
-  /* Walk backwards, looking for argument stores.  The search stops
-     when seeing another call, sp adjustment or memory store other than
-     argument store.  */
   ret = false;
   for (insn = PREV_INSN (call_insn); insn; insn = prev_insn)
     {
-      rtx set, mem, addr;
-      HOST_WIDE_INT off;
-
       if (insn == BB_HEAD (BLOCK_FOR_INSN (call_insn)))
 	prev_insn = NULL;
       else
@@ -432,61 +407,17 @@ find_call_stack_args (rtx_call_insn *cal
       if (!NONDEBUG_INSN_P (insn))
 	continue;
 
-      set = single_set (insn);
+      rtx set = single_set (insn);
       if (!set || SET_DEST (set) == stack_pointer_rtx)
 	break;
 
       if (!MEM_P (SET_DEST (set)))
 	continue;
 
-      mem = SET_DEST (set);
-      addr = XEXP (mem, 0);
-      off = 0;
-      if (GET_CODE (addr) == PLUS
-	  && REG_P (XEXP (addr, 0))
-	  && CONST_INT_P (XEXP (addr, 1)))
-	{
-	  off = INTVAL (XEXP (addr, 1));
-	  addr = XEXP (addr, 0);
-	}
-      if (addr != stack_pointer_rtx)
-	{
-	  if (!REG_P (addr))
-	    break;
-	  if (!fast)
-	    {
-	      df_ref use;
-	      struct df_link *defs;
-	      rtx set;
-
-	      FOR_EACH_INSN_USE (use, insn)
-		if (rtx_equal_p (addr, DF_REF_REG (use)))
-		  break;
-
-	      if (use == NULL)
-		break;
-
-	      for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
-		if (! DF_REF_IS_ARTIFICIAL (defs->ref))
-		  break;
-
-	      if (defs == NULL)
-		break;
-
-	      set = single_set (DF_REF_INSN (defs->ref));
-	      if (!set)
-		break;
-
-	      if (GET_CODE (SET_SRC (set)) != PLUS
-		  || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
-		  || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
-		break;
-
-	      off += INTVAL (XEXP (SET_SRC (set), 1));
-	    }
-	  else
-	    break;
-	}
+      rtx mem = SET_DEST (set);
+      HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
+      if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
+	break;
 
       HOST_WIDE_INT size;
       if (!MEM_SIZE_KNOWN_P (mem)


	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-11  8:48   ` Jakub Jelinek
@ 2019-04-11  8:55     ` Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2019-04-11  8:55 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Eric Botcazou, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8530 bytes --]

On Thu, 11 Apr 2019, Jakub Jelinek wrote:

> On Thu, Apr 11, 2019 at 09:54:24AM +0200, Richard Biener wrote:
> > > The following patch just punts if we find loads from stack slots in between
> > > where they are pushed and the const/pure call.  In addition to that, I've
> > > outlined the same largish sequence that had 3 copies in the function already
> > > and I'd need to add 4th copy, so in the end the dce.c changes are removing
> > > more than adding:
> > >  1 file changed, 118 insertions(+), 135 deletions(-)
> > 
> > The refactoring itself is probably OK (and if done separately
> > makes reviewing easier).
> 
> Here is just the refactoring, ok for trunk?

OK.

Thanks,
Richard.

> 2019-04-11  Jakub Jelinek  <jakub@redhat.com>
> 	
> 	PR rtl-optimization/89965
> 	* dce.c (sp_based_mem_offset): New function.
> 	(find_call_stack_args): Use sp_based_mem_offset.
> 
> --- gcc/dce.c.jj	2019-04-11 10:26:18.509365653 +0200
> +++ gcc/dce.c	2019-04-11 10:30:00.436705065 +0200
> @@ -272,6 +272,58 @@ check_argument_store (HOST_WIDE_INT size
>    return true;
>  }
>  
> +/* If MEM has sp address, return 0, if it has sp + const address,
> +   return that const, if it has reg address where reg is set to sp + const
> +   and FAST is false, return const, otherwise return
> +   INTTYPE_MINUMUM (HOST_WIDE_INT).  */
> +
> +static HOST_WIDE_INT
> +sp_based_mem_offset (rtx_call_insn *call_insn, const_rtx mem, bool fast)
> +{
> +  HOST_WIDE_INT off = 0;
> +  rtx addr = XEXP (mem, 0);
> +  if (GET_CODE (addr) == PLUS
> +      && REG_P (XEXP (addr, 0))
> +      && CONST_INT_P (XEXP (addr, 1)))
> +    {
> +      off = INTVAL (XEXP (addr, 1));
> +      addr = XEXP (addr, 0);
> +    }
> +  if (addr == stack_pointer_rtx)
> +    return off;
> +
> +  if (!REG_P (addr) || fast)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  /* If not fast, use chains to see if addr wasn't set to sp + offset.  */
> +  df_ref use;
> +  FOR_EACH_INSN_USE (use, call_insn)
> +  if (rtx_equal_p (addr, DF_REF_REG (use)))
> +    break;
> +
> +  if (use == NULL)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  struct df_link *defs;
> +  for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> +    if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> +      break;
> +
> +  if (defs == NULL)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  rtx set = single_set (DF_REF_INSN (defs->ref));
> +  if (!set)
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  if (GET_CODE (SET_SRC (set)) != PLUS
> +      || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> +      || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
> +    return INTTYPE_MINIMUM (HOST_WIDE_INT);
> +
> +  off += INTVAL (XEXP (SET_SRC (set), 1));
> +  return off;
> +}
>  
>  /* Try to find all stack stores of CALL_INSN arguments if
>     ACCUMULATE_OUTGOING_ARGS.  If all stack stores have been found
> @@ -309,58 +361,13 @@ find_call_stack_args (rtx_call_insn *cal
>      if (GET_CODE (XEXP (p, 0)) == USE
>  	&& MEM_P (XEXP (XEXP (p, 0), 0)))
>        {
> -	rtx mem = XEXP (XEXP (p, 0), 0), addr;
> -	HOST_WIDE_INT off = 0, size;
> +	rtx mem = XEXP (XEXP (p, 0), 0);
> +	HOST_WIDE_INT size;
>  	if (!MEM_SIZE_KNOWN_P (mem) || !MEM_SIZE (mem).is_constant (&size))
>  	  return false;
> -	addr = XEXP (mem, 0);
> -	if (GET_CODE (addr) == PLUS
> -	    && REG_P (XEXP (addr, 0))
> -	    && CONST_INT_P (XEXP (addr, 1)))
> -	  {
> -	    off = INTVAL (XEXP (addr, 1));
> -	    addr = XEXP (addr, 0);
> -	  }
> -	if (addr != stack_pointer_rtx)
> -	  {
> -	    if (!REG_P (addr))
> -	      return false;
> -	    /* If not fast, use chains to see if addr wasn't set to
> -	       sp + offset.  */
> -	    if (!fast)
> -	      {
> -		df_ref use;
> -		struct df_link *defs;
> -		rtx set;
> -
> -		FOR_EACH_INSN_USE (use, call_insn)
> -		  if (rtx_equal_p (addr, DF_REF_REG (use)))
> -		    break;
> -
> -		if (use == NULL)
> -		  return false;
> -
> -		for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> -		  if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> -		    break;
> -
> -		if (defs == NULL)
> -		  return false;
> -
> -		set = single_set (DF_REF_INSN (defs->ref));
> -		if (!set)
> -		  return false;
> -
> -		if (GET_CODE (SET_SRC (set)) != PLUS
> -		    || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> -		    || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
> -		  return false;
> -
> -		off += INTVAL (XEXP (SET_SRC (set), 1));
> -	      }
> -	    else
> -	      return false;
> -	  }
> +	HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
> +	if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
> +	  return false;
>  	min_sp_off = MIN (min_sp_off, off);
>  	max_sp_off = MAX (max_sp_off, off + size);
>        }
> @@ -376,51 +383,19 @@ find_call_stack_args (rtx_call_insn *cal
>      if (GET_CODE (XEXP (p, 0)) == USE
>  	&& MEM_P (XEXP (XEXP (p, 0), 0)))
>        {
> -	rtx mem = XEXP (XEXP (p, 0), 0), addr;
> -	HOST_WIDE_INT off = 0, byte, size;
> +	rtx mem = XEXP (XEXP (p, 0), 0);
>  	/* Checked in the previous iteration.  */
> -	size = MEM_SIZE (mem).to_constant ();
> -	addr = XEXP (mem, 0);
> -	if (GET_CODE (addr) == PLUS
> -	    && REG_P (XEXP (addr, 0))
> -	    && CONST_INT_P (XEXP (addr, 1)))
> -	  {
> -	    off = INTVAL (XEXP (addr, 1));
> -	    addr = XEXP (addr, 0);
> -	  }
> -	if (addr != stack_pointer_rtx)
> -	  {
> -	    df_ref use;
> -	    struct df_link *defs;
> -	    rtx set;
> -
> -	    FOR_EACH_INSN_USE (use, call_insn)
> -	      if (rtx_equal_p (addr, DF_REF_REG (use)))
> -		break;
> -
> -	    for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> -	      if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> -		break;
> -
> -	    set = single_set (DF_REF_INSN (defs->ref));
> -	    off += INTVAL (XEXP (SET_SRC (set), 1));
> -	  }
> -	for (byte = off; byte < off + size; byte++)
> -	  {
> -	    if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
> -	      gcc_unreachable ();
> -	  }
> +	HOST_WIDE_INT size = MEM_SIZE (mem).to_constant ();
> +	HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
> +	gcc_checking_assert (off != INTTYPE_MINIMUM (HOST_WIDE_INT));
> +	for (HOST_WIDE_INT byte = off; byte < off + size; byte++)
> +	  if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
> +	    gcc_unreachable ();
>        }
>  
> -  /* Walk backwards, looking for argument stores.  The search stops
> -     when seeing another call, sp adjustment or memory store other than
> -     argument store.  */
>    ret = false;
>    for (insn = PREV_INSN (call_insn); insn; insn = prev_insn)
>      {
> -      rtx set, mem, addr;
> -      HOST_WIDE_INT off;
> -
>        if (insn == BB_HEAD (BLOCK_FOR_INSN (call_insn)))
>  	prev_insn = NULL;
>        else
> @@ -432,61 +407,17 @@ find_call_stack_args (rtx_call_insn *cal
>        if (!NONDEBUG_INSN_P (insn))
>  	continue;
>  
> -      set = single_set (insn);
> +      rtx set = single_set (insn);
>        if (!set || SET_DEST (set) == stack_pointer_rtx)
>  	break;
>  
>        if (!MEM_P (SET_DEST (set)))
>  	continue;
>  
> -      mem = SET_DEST (set);
> -      addr = XEXP (mem, 0);
> -      off = 0;
> -      if (GET_CODE (addr) == PLUS
> -	  && REG_P (XEXP (addr, 0))
> -	  && CONST_INT_P (XEXP (addr, 1)))
> -	{
> -	  off = INTVAL (XEXP (addr, 1));
> -	  addr = XEXP (addr, 0);
> -	}
> -      if (addr != stack_pointer_rtx)
> -	{
> -	  if (!REG_P (addr))
> -	    break;
> -	  if (!fast)
> -	    {
> -	      df_ref use;
> -	      struct df_link *defs;
> -	      rtx set;
> -
> -	      FOR_EACH_INSN_USE (use, insn)
> -		if (rtx_equal_p (addr, DF_REF_REG (use)))
> -		  break;
> -
> -	      if (use == NULL)
> -		break;
> -
> -	      for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
> -		if (! DF_REF_IS_ARTIFICIAL (defs->ref))
> -		  break;
> -
> -	      if (defs == NULL)
> -		break;
> -
> -	      set = single_set (DF_REF_INSN (defs->ref));
> -	      if (!set)
> -		break;
> -
> -	      if (GET_CODE (SET_SRC (set)) != PLUS
> -		  || XEXP (SET_SRC (set), 0) != stack_pointer_rtx
> -		  || !CONST_INT_P (XEXP (SET_SRC (set), 1)))
> -		break;
> -
> -	      off += INTVAL (XEXP (SET_SRC (set), 1));
> -	    }
> -	  else
> -	    break;
> -	}
> +      rtx mem = SET_DEST (set);
> +      HOST_WIDE_INT off = sp_based_mem_offset (call_insn, mem, fast);
> +      if (off == INTTYPE_MINIMUM (HOST_WIDE_INT))
> +	break;
>  
>        HOST_WIDE_INT size;
>        if (!MEM_SIZE_KNOWN_P (mem)
> 
> 
> 	Jakub
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG NÌrnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2)
  2019-04-11  7:58 ` Richard Biener
  2019-04-11  8:48   ` Jakub Jelinek
@ 2019-04-11  9:05   ` Jakub Jelinek
  2019-04-11  9:54     ` Richard Biener
  2019-04-12 16:10     ` Jeff Law
  2019-04-12 15:29   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965) Jeff Law
  2 siblings, 2 replies; 16+ messages in thread
From: Jakub Jelinek @ 2019-04-11  9:05 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, Eric Botcazou; +Cc: gcc-patches

On Thu, Apr 11, 2019 at 09:54:24AM +0200, Richard Biener wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > During those 2 bootstraps/regtests, data.load_found has been set just
> > on the new testcase on ia32.
> 
> Hmm, I wonder whether we really need to DCE calls after reload?
> That said, I'm not familiar enough with the code to check if the
> patch makes sense (can there ever be uses of the argument slots
> _after_ the call?).

And here is the patch on top of the refactoring patch.
As for the argument slots after the call, hope Jeff and Vlad won't mind if I
copy'n'paste what they said on IRC yesterday:
law: vmakarov: in PR89965, is it valid RTL to store something in stack argument slot and read it again before the call that uses that stack slot?
law: vmakarov: if yes, I have a rough idea what to do, but if stack argument slots may be only stored and can be read only by a callee and not by something else in the caller, then it is a RA issue
<vmakarov> jakub: i think it is not defined (or described). But the old reload used equiv memory for long time to store value in memory.  LRA just uses the same approach.
<law> i think you're allowed to read it up to the call point, who "owns" the contents of the slot after the call point is subject to debate :-)
<law> not sure if we defined things once a REG_EQUIV is on the MEM, but that would tend to imply the memory is unchanging across the call
<law> (though one could ask how in the world the caller would know the callee didn't clobber the slot....)

2019-04-11  Jakub Jelinek  <jakub@redhat.com>
	
	PR rtl-optimization/89965
	* dce.c: Include rtl-iter.h.
	(struct check_argument_load_data): New type.
	(check_argument_load): New function.
	(find_call_stack_args): Check for loads from stack slots still tracked
	in sp_bytes and punt if any is found.

	* gcc.target/i386/pr89965.c: New test.

--- gcc/dce.c.jj	2019-04-11 10:30:00.436705065 +0200
+++ gcc/dce.c	2019-04-11 10:43:00.926828390 +0200
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "valtrack.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "rtl-iter.h"
 
 
 /* -------------------------------------------------------------------------
@@ -325,6 +326,48 @@ sp_based_mem_offset (rtx_call_insn *call
   return off;
 }
 
+/* Data for check_argument_load called via note_uses.  */
+struct check_argument_load_data {
+  bitmap sp_bytes;
+  HOST_WIDE_INT min_sp_off, max_sp_off;
+  rtx_call_insn *call_insn;
+  bool fast;
+  bool load_found;
+};
+
+/* Helper function for find_call_stack_args.  Check if there are
+   any loads from the argument slots in between the const/pure call
+   and store to the argument slot, set LOAD_FOUND if any is found.  */
+
+static void
+check_argument_load (rtx *loc, void *data)
+{
+  struct check_argument_load_data *d
+    = (struct check_argument_load_data *) data;
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
+    {
+      const_rtx mem = *iter;
+      HOST_WIDE_INT size;
+      if (MEM_P (mem)
+	  && MEM_SIZE_KNOWN_P (mem)
+	  && MEM_SIZE (mem).is_constant (&size))
+	{
+	  HOST_WIDE_INT off = sp_based_mem_offset (d->call_insn, mem, d->fast);
+	  if (off != INTTYPE_MINIMUM (HOST_WIDE_INT)
+	      && off < d->max_sp_off
+	      && off + size > d->min_sp_off)
+	    for (HOST_WIDE_INT byte = MAX (off, d->min_sp_off);
+		 byte < MIN (off + size, d->max_sp_off); byte++)
+	      if (bitmap_bit_p (d->sp_bytes, byte - d->min_sp_off))
+		{
+		  d->load_found = true;
+		  return;
+		}
+	}
+    }
+}
+
 /* Try to find all stack stores of CALL_INSN arguments if
    ACCUMULATE_OUTGOING_ARGS.  If all stack stores have been found
    and it is therefore safe to eliminate the call, return true,
@@ -396,6 +439,8 @@ find_call_stack_args (rtx_call_insn *cal
   /* Walk backwards, looking for argument stores.  The search stops
      when seeing another call, sp adjustment or memory store other than
      argument store.  */
+  struct check_argument_load_data data
+    = { sp_bytes, min_sp_off, max_sp_off, call_insn, fast, false };
   ret = false;
   for (insn = PREV_INSN (call_insn); insn; insn = prev_insn)
     {
@@ -414,6 +459,10 @@ find_call_stack_args (rtx_call_insn *cal
       if (!set || SET_DEST (set) == stack_pointer_rtx)
 	break;
 
+      note_uses (&PATTERN (insn), check_argument_load, &data);
+      if (data.load_found)
+	break;
+
       if (!MEM_P (SET_DEST (set)))
 	continue;
 
--- gcc/testsuite/gcc.target/i386/pr89965.c.jj	2019-04-11 10:28:56.211764660 +0200
+++ gcc/testsuite/gcc.target/i386/pr89965.c	2019-04-11 10:28:56.211764660 +0200
@@ -0,0 +1,39 @@
+/* PR rtl-optimization/89965 */
+/* { dg-do run } */
+/* { dg-options "-O -mtune=nano-x2 -fcaller-saves -fexpensive-optimizations -fno-tree-dce -fno-tree-ter" } */
+/* { dg-additional-options "-march=i386" { target ia32 } } */
+
+int a;
+
+__attribute__ ((noipa)) unsigned long long
+foo (unsigned char c, unsigned d, unsigned e, unsigned long long f,
+     unsigned char g, unsigned h, unsigned long long i)
+{
+  (void) d;
+  unsigned short j = __builtin_mul_overflow_p (~0, h, c);
+  e <<= e;
+  i >>= 7;
+  c *= i;
+  i /= 12;
+  a = __builtin_popcount (c);
+  __builtin_add_overflow (e, a, &f);
+  return c + f + g + j + h;
+}
+
+__attribute__ ((noipa)) void
+bar (void)
+{
+  char buf[64];
+  __builtin_memset (buf, 0x55, sizeof buf);
+  asm volatile ("" : : "r" (&buf[0]) : "memory");
+}
+
+int
+main (void)
+{
+  bar ();
+  unsigned long long x = foo (2, 0, 0, 0, 0, 0, 0);
+  if (x != 0)
+    __builtin_abort ();
+  return 0;
+}

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2)
  2019-04-11  9:05   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2) Jakub Jelinek
@ 2019-04-11  9:54     ` Richard Biener
  2019-04-12 16:08       ` Jeff Law
  2019-04-12 16:10     ` Jeff Law
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Biener @ 2019-04-11  9:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Eric Botcazou, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6833 bytes --]

On Thu, 11 Apr 2019, Jakub Jelinek wrote:

> On Thu, Apr 11, 2019 at 09:54:24AM +0200, Richard Biener wrote:
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > 
> > > During those 2 bootstraps/regtests, data.load_found has been set just
> > > on the new testcase on ia32.
> > 
> > Hmm, I wonder whether we really need to DCE calls after reload?
> > That said, I'm not familiar enough with the code to check if the
> > patch makes sense (can there ever be uses of the argument slots
> > _after_ the call?).
> 
> And here is the patch on top of the refactoring patch.
> As for the argument slots after the call, hope Jeff and Vlad won't mind if I
> copy'n'paste what they said on IRC yesterday:
> law: vmakarov: in PR89965, is it valid RTL to store something in stack argument slot and read it again before the call that uses that stack slot?
> law: vmakarov: if yes, I have a rough idea what to do, but if stack argument slots may be only stored and can be read only by a callee and not by something else in the caller, then it is a RA issue
> <vmakarov> jakub: i think it is not defined (or described). But the old reload used equiv memory for long time to store value in memory.  LRA just uses the same approach.
> <law> i think you're allowed to read it up to the call point, who "owns" the contents of the slot after the call point is subject to debate :-)
> <law> not sure if we defined things once a REG_EQUIV is on the MEM, but that would tend to imply the memory is unchanging across the call
> <law> (though one could ask how in the world the caller would know the callee didn't clobber the slot....)

Ok, so that says "well, nothing prevents it from being used" for example
for a const/pure call (depending on what alias analysis does here...).
Who owns the argument slots should be defined by the ABI I guess.

So iff alias-analysis doesn't leave us with the chance to pick up
the stack slot contents after the call in any case we're of course fine.
And if we do have the chance but should not then the call instruction
needs some explicit clobbering of the argument area (or the RTL IL
needs to be documented that such clobbering is implicit and the
alias machinery then needs to honor that).

Still leaving review of the patch to somebody else (it's clearly
closing at least part of the hole).

Richard.

> 2019-04-11  Jakub Jelinek  <jakub@redhat.com>
> 	
> 	PR rtl-optimization/89965
> 	* dce.c: Include rtl-iter.h.
> 	(struct check_argument_load_data): New type.
> 	(check_argument_load): New function.
> 	(find_call_stack_args): Check for loads from stack slots still tracked
> 	in sp_bytes and punt if any is found.
> 
> 	* gcc.target/i386/pr89965.c: New test.
> 
> --- gcc/dce.c.jj	2019-04-11 10:30:00.436705065 +0200
> +++ gcc/dce.c	2019-04-11 10:43:00.926828390 +0200
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
>  #include "valtrack.h"
>  #include "tree-pass.h"
>  #include "dbgcnt.h"
> +#include "rtl-iter.h"
>  
>  
>  /* -------------------------------------------------------------------------
> @@ -325,6 +326,48 @@ sp_based_mem_offset (rtx_call_insn *call
>    return off;
>  }
>  
> +/* Data for check_argument_load called via note_uses.  */
> +struct check_argument_load_data {
> +  bitmap sp_bytes;
> +  HOST_WIDE_INT min_sp_off, max_sp_off;
> +  rtx_call_insn *call_insn;
> +  bool fast;
> +  bool load_found;
> +};
> +
> +/* Helper function for find_call_stack_args.  Check if there are
> +   any loads from the argument slots in between the const/pure call
> +   and store to the argument slot, set LOAD_FOUND if any is found.  */
> +
> +static void
> +check_argument_load (rtx *loc, void *data)
> +{
> +  struct check_argument_load_data *d
> +    = (struct check_argument_load_data *) data;
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
> +    {
> +      const_rtx mem = *iter;
> +      HOST_WIDE_INT size;
> +      if (MEM_P (mem)
> +	  && MEM_SIZE_KNOWN_P (mem)
> +	  && MEM_SIZE (mem).is_constant (&size))
> +	{
> +	  HOST_WIDE_INT off = sp_based_mem_offset (d->call_insn, mem, d->fast);
> +	  if (off != INTTYPE_MINIMUM (HOST_WIDE_INT)
> +	      && off < d->max_sp_off
> +	      && off + size > d->min_sp_off)
> +	    for (HOST_WIDE_INT byte = MAX (off, d->min_sp_off);
> +		 byte < MIN (off + size, d->max_sp_off); byte++)
> +	      if (bitmap_bit_p (d->sp_bytes, byte - d->min_sp_off))
> +		{
> +		  d->load_found = true;
> +		  return;
> +		}
> +	}
> +    }
> +}
> +
>  /* Try to find all stack stores of CALL_INSN arguments if
>     ACCUMULATE_OUTGOING_ARGS.  If all stack stores have been found
>     and it is therefore safe to eliminate the call, return true,
> @@ -396,6 +439,8 @@ find_call_stack_args (rtx_call_insn *cal
>    /* Walk backwards, looking for argument stores.  The search stops
>       when seeing another call, sp adjustment or memory store other than
>       argument store.  */
> +  struct check_argument_load_data data
> +    = { sp_bytes, min_sp_off, max_sp_off, call_insn, fast, false };
>    ret = false;
>    for (insn = PREV_INSN (call_insn); insn; insn = prev_insn)
>      {
> @@ -414,6 +459,10 @@ find_call_stack_args (rtx_call_insn *cal
>        if (!set || SET_DEST (set) == stack_pointer_rtx)
>  	break;
>  
> +      note_uses (&PATTERN (insn), check_argument_load, &data);
> +      if (data.load_found)
> +	break;
> +
>        if (!MEM_P (SET_DEST (set)))
>  	continue;
>  
> --- gcc/testsuite/gcc.target/i386/pr89965.c.jj	2019-04-11 10:28:56.211764660 +0200
> +++ gcc/testsuite/gcc.target/i386/pr89965.c	2019-04-11 10:28:56.211764660 +0200
> @@ -0,0 +1,39 @@
> +/* PR rtl-optimization/89965 */
> +/* { dg-do run } */
> +/* { dg-options "-O -mtune=nano-x2 -fcaller-saves -fexpensive-optimizations -fno-tree-dce -fno-tree-ter" } */
> +/* { dg-additional-options "-march=i386" { target ia32 } } */
> +
> +int a;
> +
> +__attribute__ ((noipa)) unsigned long long
> +foo (unsigned char c, unsigned d, unsigned e, unsigned long long f,
> +     unsigned char g, unsigned h, unsigned long long i)
> +{
> +  (void) d;
> +  unsigned short j = __builtin_mul_overflow_p (~0, h, c);
> +  e <<= e;
> +  i >>= 7;
> +  c *= i;
> +  i /= 12;
> +  a = __builtin_popcount (c);
> +  __builtin_add_overflow (e, a, &f);
> +  return c + f + g + j + h;
> +}
> +
> +__attribute__ ((noipa)) void
> +bar (void)
> +{
> +  char buf[64];
> +  __builtin_memset (buf, 0x55, sizeof buf);
> +  asm volatile ("" : : "r" (&buf[0]) : "memory");
> +}
> +
> +int
> +main (void)
> +{
> +  bar ();
> +  unsigned long long x = foo (2, 0, 0, 0, 0, 0, 0);
> +  if (x != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
> 	Jakub
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG NÌrnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-11  7:58 ` Richard Biener
  2019-04-11  8:48   ` Jakub Jelinek
  2019-04-11  9:05   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2) Jakub Jelinek
@ 2019-04-12 15:29   ` Jeff Law
  2019-04-12 15:40     ` Alexander Monakov
  2 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2019-04-12 15:29 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: Eric Botcazou, gcc-patches

On 4/11/19 1:54 AM, Richard Biener wrote:
> On Thu, 11 Apr 2019, Jakub Jelinek wrote:
> 
>> Hi!
>>
>> The following testcase is miscompiled, because the result of
>> a DImode (doubleword) right shift is used in a QImode subreg as well as
>> later on pushed into a stack slot as an argument to a const function
>> whose result isn't used.  The RA because of the weirdo target tuning
>> reuses the REG_EQUIV argument slot (as it wants the value in a non-Q_REGS
>> register), so it first pushes it to the stack slot and then loads from the
>> stack slot again (according to Vlad, this can happen with both LRA and old
>> reload).  Later on during DCE we determine the call is not needed and try to
>> find all the argument pushes and don't mark those as needed, so we
>> effectively DCE the right shift, push to the argument slot as well as
>> other slots and the call, and end up keeping just the load from the
>> uninitialized argument slot.
>>
>> The following patch just punts if we find loads from stack slots in between
>> where they are pushed and the const/pure call.  In addition to that, I've
>> outlined the same largish sequence that had 3 copies in the function already
>> and I'd need to add 4th copy, so in the end the dce.c changes are removing
>> more than adding:
>>  1 file changed, 118 insertions(+), 135 deletions(-)
> 
> The refactoring itself is probably OK (and if done separately
> makes reviewing easier).
> 
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> During those 2 bootstraps/regtests, data.load_found has been set just
>> on the new testcase on ia32.
> 
> Hmm, I wonder whether we really need to DCE calls after reload?
> That said, I'm not familiar enough with the code to check if the
> patch makes sense (can there ever be uses of the argument slots
> _after_ the call?).
We've been through the "who owns the argument slots, caller or callee"
discussion a few times and I don't think we've ever reached any kind of
conclusion in the general case.

I think this case side-steps the general case.

We've got an argument slot for a const/pure call.  Because of the
const/pure designation the caller can assume the callee doesn't modify
the argument slot.  That may in turn allow the caller to place a
REG_EQUIV note on the store to the slot.

Once that happens we can replace one form with the other.  So we could
well end up with a use of the slot after the call.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-12 15:29   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965) Jeff Law
@ 2019-04-12 15:40     ` Alexander Monakov
  2019-04-12 15:45       ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2019-04-12 15:40 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Jakub Jelinek, Eric Botcazou, gcc-patches

On Fri, 12 Apr 2019, Jeff Law wrote:
> We've been through the "who owns the argument slots, caller or callee"
> discussion a few times and I don't think we've ever reached any kind of
> conclusion in the general case.

Callee must own the slots for tail calls to be possible.

> I think this case side-steps the general case.
> 
> We've got an argument slot for a const/pure call.  Because of the
> const/pure designation the caller can assume the callee doesn't modify
> the argument slot.  That may in turn allow the caller to place a
> REG_EQUIV note on the store to the slot.

I don't think this follows. Imagine a pure foo tailcalling a pure bar.
To make the tailcall, foo may need to change some of its argument slots
to pass new arguments to bar.

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-12 15:40     ` Alexander Monakov
@ 2019-04-12 15:45       ` Jeff Law
  2019-04-15 13:12         ` Michael Matz
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2019-04-12 15:45 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Richard Biener, Jakub Jelinek, Eric Botcazou, gcc-patches

On 4/12/19 9:32 AM, Alexander Monakov wrote:
> On Fri, 12 Apr 2019, Jeff Law wrote:
>> We've been through the "who owns the argument slots, caller or callee"
>> discussion a few times and I don't think we've ever reached any kind of
>> conclusion in the general case.
> 
> Callee must own the slots for tail calls to be possible.
> 
>> I think this case side-steps the general case.
>>
>> We've got an argument slot for a const/pure call.  Because of the
>> const/pure designation the caller can assume the callee doesn't modify
>> the argument slot.  That may in turn allow the caller to place a
>> REG_EQUIV note on the store to the slot.
> 
> I don't think this follows. Imagine a pure foo tailcalling a pure bar.
> To make the tailcall, foo may need to change some of its argument slots
> to pass new arguments to bar.
I'd claim that a pure/const call can't tail call another function as
that would potentially modify the argument slots.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2)
  2019-04-11  9:54     ` Richard Biener
@ 2019-04-12 16:08       ` Jeff Law
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2019-04-12 16:08 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: Eric Botcazou, gcc-patches

On 4/11/19 3:26 AM, Richard Biener wrote:
> On Thu, 11 Apr 2019, Jakub Jelinek wrote:
> 
>> On Thu, Apr 11, 2019 at 09:54:24AM +0200, Richard Biener wrote:
>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>>>
>>>> During those 2 bootstraps/regtests, data.load_found has been set just
>>>> on the new testcase on ia32.
>>>
>>> Hmm, I wonder whether we really need to DCE calls after reload?
>>> That said, I'm not familiar enough with the code to check if the
>>> patch makes sense (can there ever be uses of the argument slots
>>> _after_ the call?).
>>
>> And here is the patch on top of the refactoring patch.
>> As for the argument slots after the call, hope Jeff and Vlad won't mind if I
>> copy'n'paste what they said on IRC yesterday:
>> law: vmakarov: in PR89965, is it valid RTL to store something in stack argument slot and read it again before the call that uses that stack slot?
>> law: vmakarov: if yes, I have a rough idea what to do, but if stack argument slots may be only stored and can be read only by a callee and not by something else in the caller, then it is a RA issue
>> <vmakarov> jakub: i think it is not defined (or described). But the old reload used equiv memory for long time to store value in memory.  LRA just uses the same approach.
>> <law> i think you're allowed to read it up to the call point, who "owns" the contents of the slot after the call point is subject to debate :-)
>> <law> not sure if we defined things once a REG_EQUIV is on the MEM, but that would tend to imply the memory is unchanging across the call
>> <law> (though one could ask how in the world the caller would know the callee didn't clobber the slot....)
> 
> Ok, so that says "well, nothing prevents it from being used" for example
> for a const/pure call (depending on what alias analysis does here...).
Agreed.


> Who owns the argument slots should be defined by the ABI I guess.
Agreed, but I'm not sure it's consistently defined by ABIs.


> 
> So iff alias-analysis doesn't leave us with the chance to pick up
> the stack slot contents after the call in any case we're of course fine.
> And if we do have the chance but should not then the call instruction
> needs some explicit clobbering of the argument area (or the RTL IL
> needs to be documented that such clobbering is implicit and the
> alias machinery then needs to honor that).
I suspect the only way we currently get into this state is around
const/pure calls.  One could imagine a day where we do strong alias
analysis across calls and could make more precise annotations about how
the callee uses/modifies specific memory locations.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2)
  2019-04-11  9:05   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2) Jakub Jelinek
  2019-04-11  9:54     ` Richard Biener
@ 2019-04-12 16:10     ` Jeff Law
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Law @ 2019-04-12 16:10 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Eric Botcazou; +Cc: gcc-patches

On 4/11/19 3:04 AM, Jakub Jelinek wrote:
> On Thu, Apr 11, 2019 at 09:54:24AM +0200, Richard Biener wrote:
>>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>>
>>> During those 2 bootstraps/regtests, data.load_found has been set just
>>> on the new testcase on ia32.
>>
>> Hmm, I wonder whether we really need to DCE calls after reload?
>> That said, I'm not familiar enough with the code to check if the
>> patch makes sense (can there ever be uses of the argument slots
>> _after_ the call?).
> 
> And here is the patch on top of the refactoring patch.
> As for the argument slots after the call, hope Jeff and Vlad won't mind if I
> copy'n'paste what they said on IRC yesterday:
> law: vmakarov: in PR89965, is it valid RTL to store something in stack argument slot and read it again before the call that uses that stack slot?
> law: vmakarov: if yes, I have a rough idea what to do, but if stack argument slots may be only stored and can be read only by a callee and not by something else in the caller, then it is a RA issue
> <vmakarov> jakub: i think it is not defined (or described). But the old reload used equiv memory for long time to store value in memory.  LRA just uses the same approach.
> <law> i think you're allowed to read it up to the call point, who "owns" the contents of the slot after the call point is subject to debate :-)
> <law> not sure if we defined things once a REG_EQUIV is on the MEM, but that would tend to imply the memory is unchanging across the call
> <law> (though one could ask how in the world the caller would know the callee didn't clobber the slot....)
> 
> 2019-04-11  Jakub Jelinek  <jakub@redhat.com>
> 	
> 	PR rtl-optimization/89965
> 	* dce.c: Include rtl-iter.h.
> 	(struct check_argument_load_data): New type.
> 	(check_argument_load): New function.
> 	(find_call_stack_args): Check for loads from stack slots still tracked
> 	in sp_bytes and punt if any is found.
> 
> 	* gcc.target/i386/pr89965.c: New test.
OK.  I'd probably update this comment:

>   /* Walk backwards, looking for argument stores.  The search stops
>      when seeing another call, sp adjustment or memory store other than
>      argument store.  */

After this patch we'll also stop if we hit a load from an argument slot.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-12 15:45       ` Jeff Law
@ 2019-04-15 13:12         ` Michael Matz
  2019-04-16 15:35           ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Matz @ 2019-04-15 13:12 UTC (permalink / raw)
  To: Jeff Law
  Cc: Alexander Monakov, Richard Biener, Jakub Jelinek, Eric Botcazou,
	gcc-patches

Hi,

On Fri, 12 Apr 2019, Jeff Law wrote:

> > I don't think this follows. Imagine a pure foo tailcalling a pure bar.
> > To make the tailcall, foo may need to change some of its argument slots
> > to pass new arguments to bar.
> I'd claim that a pure/const call can't tail call another function as
> that would potentially modify the argument slots.

I still don't think that what you want follows.  Imagine this:

  int foo (int i) { ++i; return i; }

To claim that this function is anything else than const+pure seems weird 
(in fact this function doesn't access anything that must lie in memory at 
all).  Now take your off-the-mill ABI that passes arguments on stack, and 
an only slightly bad code generator, i.e. -O0 on i386.  You will get an 
modification of the argument slot:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        addl    $1, 8(%ebp)
        movl    8(%ebp), %eax
        popl    %ebp
        ret

So, if anything then the ownership of argument slots is a property of the 
psABI.  And while we may have been through this discussion a couple times 
over the years, I'm pretty sure that at least I consistently argued to 
declare all psABIs that leave argument slot ownerships with the callers 
(after the call actually happens) to be seriously broken^Wmisguided (and 
yes, also because it can prevent tail calls that otherwise would be 
perfectly valid).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-16 15:35           ` Jeff Law
@ 2019-04-16 15:35             ` Michael Matz
  2019-04-16 16:10               ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Matz @ 2019-04-16 15:35 UTC (permalink / raw)
  To: Jeff Law
  Cc: Alexander Monakov, Richard Biener, Jakub Jelinek, Eric Botcazou,
	gcc-patches

Hi,

On Tue, 16 Apr 2019, Jeff Law wrote:

> > I still don't think that what you want follows.  Imagine this:
> > 
> >   int foo (int i) { ++i; return i; }
> > 
> > To claim that this function is anything else than const+pure seems weird 
> > (in fact this function doesn't access anything that must lie in memory at 
> > all).  Now take your off-the-mill ABI that passes arguments on stack, and 
> > an only slightly bad code generator, i.e. -O0 on i386.  You will get an 
> > modification of the argument slot:
> Ugh.  Good point.  But then we're back to the problem with handling of
> REG_EQUIV notes for stores into the argument space.  If the callee owns,
> has been declared pure/const, but can clobber things like you've shown,
> then a REG_EQUIV note on those slots must be avoided.

Yes, if we want to be extra precise a REG_EQUAL note would be okay before 
the call insn (but not very useful post regalloc), but a REG_EQUIV note is 
plain wrong for argument slots if there are uses of the destination 
register after the call.

> Certainly wouldn't disagree that the ABIs ought to own defining this 
> stuff.  It's just an area where they've been weak in the past.

Oh yes, indeed.  Even relatively new psABIs don't specify this explicitely 
:-/


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-15 13:12         ` Michael Matz
@ 2019-04-16 15:35           ` Jeff Law
  2019-04-16 15:35             ` Michael Matz
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2019-04-16 15:35 UTC (permalink / raw)
  To: Michael Matz
  Cc: Alexander Monakov, Richard Biener, Jakub Jelinek, Eric Botcazou,
	gcc-patches

On 4/15/19 6:51 AM, Michael Matz wrote:
> Hi,
> 
> On Fri, 12 Apr 2019, Jeff Law wrote:
> 
>>> I don't think this follows. Imagine a pure foo tailcalling a pure bar.
>>> To make the tailcall, foo may need to change some of its argument slots
>>> to pass new arguments to bar.
>> I'd claim that a pure/const call can't tail call another function as
>> that would potentially modify the argument slots.
> 
> I still don't think that what you want follows.  Imagine this:
> 
>   int foo (int i) { ++i; return i; }
> 
> To claim that this function is anything else than const+pure seems weird 
> (in fact this function doesn't access anything that must lie in memory at 
> all).  Now take your off-the-mill ABI that passes arguments on stack, and 
> an only slightly bad code generator, i.e. -O0 on i386.  You will get an 
> modification of the argument slot:
Ugh.  Good point.  But then we're back to the problem with handling of
REG_EQUIV notes for stores into the argument space.  If the callee owns,
has been declared pure/const, but can clobber things like you've shown,
then a REG_EQUIV note on those slots must be avoided.

> 
> foo:
>         pushl   %ebp
>         movl    %esp, %ebp
>         addl    $1, 8(%ebp)
>         movl    8(%ebp), %eax
>         popl    %ebp
>         ret
> 
> So, if anything then the ownership of argument slots is a property of the 
> psABI.  And while we may have been through this discussion a couple times 
> over the years, I'm pretty sure that at least I consistently argued to 
> declare all psABIs that leave argument slot ownerships with the callers 
> (after the call actually happens) to be seriously broken^Wmisguided (and 
> yes, also because it can prevent tail calls that otherwise would be 
> perfectly valid).
Certainly wouldn't disagree that the ABIs ought to own defining this
stuff.  It's just an area where they've been weak in the past.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-16 15:35             ` Michael Matz
@ 2019-04-16 16:10               ` Jeff Law
  2019-04-17 11:53                 ` Michael Matz
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2019-04-16 16:10 UTC (permalink / raw)
  To: Michael Matz
  Cc: Alexander Monakov, Richard Biener, Jakub Jelinek, Eric Botcazou,
	gcc-patches

On 4/16/19 9:35 AM, Michael Matz wrote:
> Hi,
> 
> On Tue, 16 Apr 2019, Jeff Law wrote:
> 
>>> I still don't think that what you want follows.  Imagine this:
>>>
>>>   int foo (int i) { ++i; return i; }
>>>
>>> To claim that this function is anything else than const+pure seems weird 
>>> (in fact this function doesn't access anything that must lie in memory at 
>>> all).  Now take your off-the-mill ABI that passes arguments on stack, and 
>>> an only slightly bad code generator, i.e. -O0 on i386.  You will get an 
>>> modification of the argument slot:
>> Ugh.  Good point.  But then we're back to the problem with handling of
>> REG_EQUIV notes for stores into the argument space.  If the callee owns,
>> has been declared pure/const, but can clobber things like you've shown,
>> then a REG_EQUIV note on those slots must be avoided.
> 
> Yes, if we want to be extra precise a REG_EQUAL note would be okay before 
> the call insn (but not very useful post regalloc), but a REG_EQUIV note is 
> plain wrong for argument slots if there are uses of the destination 
> register after the call.
Right.  We could have a REG_EQUAL note, but I doubt it's helpful at all.


So going back to Jakub's patch...  I think the discussion points to
avoiding the REG_EQUIV notes for outgoing argument slots.

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965)
  2019-04-16 16:10               ` Jeff Law
@ 2019-04-17 11:53                 ` Michael Matz
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Matz @ 2019-04-17 11:53 UTC (permalink / raw)
  To: Jeff Law
  Cc: Alexander Monakov, Richard Biener, Jakub Jelinek, Eric Botcazou,
	gcc-patches

Hi,

On Tue, 16 Apr 2019, Jeff Law wrote:

> So going back to Jakub's patch...  I think the discussion points to 
> avoiding the REG_EQUIV notes for outgoing argument slots.

In the long run definitely, but maybe his current solution is more 
amenable to stage 4, no idea.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-04-17 11:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-11  7:50 [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965) Jakub Jelinek
2019-04-11  7:58 ` Richard Biener
2019-04-11  8:48   ` Jakub Jelinek
2019-04-11  8:55     ` Richard Biener
2019-04-11  9:05   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965, take 2) Jakub Jelinek
2019-04-11  9:54     ` Richard Biener
2019-04-12 16:08       ` Jeff Law
2019-04-12 16:10     ` Jeff Law
2019-04-12 15:29   ` [PATCH] Fix up RTL DCE find_call_stack_args (PR rtl-optimization/89965) Jeff Law
2019-04-12 15:40     ` Alexander Monakov
2019-04-12 15:45       ` Jeff Law
2019-04-15 13:12         ` Michael Matz
2019-04-16 15:35           ` Jeff Law
2019-04-16 15:35             ` Michael Matz
2019-04-16 16:10               ` Jeff Law
2019-04-17 11:53                 ` Michael Matz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).