public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [tree-tailcall] Check if function returns it's argument
@ 2016-11-24  7:18 Prathamesh Kulkarni
  2016-11-24  8:37 ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-11-24  7:18 UTC (permalink / raw)
  To: gcc Patches, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 608 bytes --]

Hi,
Consider following test-case:

void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  __builtin_memcpy (a1, a2, a3);
  return a1;
}

return a1 can be considered equivalent to return value of memcpy,
and the call could be emitted as a tail-call.
gcc doesn't emit the above call to memcpy as a tail-call,
but if it is changed to:

void *t1 = __builtin_memcpy (a1, a2, a3);
return t1;

Then memcpy is emitted as a tail-call.
The attached patch tries to handle the former case.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross tested on arm*-*-*, aarch64*-*-*
Does this patch look OK ?

Thanks,
Prathamesh

[-- Attachment #2: tcwg-963-1.txt --]
[-- Type: text/plain, Size: 2457 bytes --]

2016-11-24  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	* gimple.c (gimple_call_return_arg): New function.
	* gimple.h (gimple_call_return_arg): Declare.
	* tree-tailcall.c (find_tail_calls): Call gimple_call_return_arg.

testsuite/
	* gcc.dg/tree-ssa/tailcall-8.c: New test.

diff --git a/gcc/gimple.c b/gcc/gimple.c
index 0a3dc72..ec460fc 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2561,6 +2561,23 @@ gimple_call_combined_fn (const gimple *stmt)
   return CFN_LAST;
 }
 
+/* Return arg, if function returns it's argument or NULL if it doesn't.  */
+tree
+gimple_call_return_arg (gcall *call_stmt)
+{
+  unsigned rf = gimple_call_return_flags (call_stmt);
+  if (rf & ERF_RETURNS_ARG)
+    {
+      unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+      if (argnum < gimple_call_num_args (call_stmt))
+	{
+	  tree arg = gimple_call_arg (call_stmt, argnum);
+	  return arg;
+	}
+    }
+  return NULL_TREE;
+}
+
 /* Return true if STMT clobbers memory.  STMT is required to be a
    GIMPLE_ASM.  */
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0d0296e..ebccbe1 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1520,6 +1520,7 @@ extern combined_fn gimple_call_combined_fn (const gimple *);
 extern bool gimple_call_builtin_p (const gimple *);
 extern bool gimple_call_builtin_p (const gimple *, enum built_in_class);
 extern bool gimple_call_builtin_p (const gimple *, enum built_in_function);
+extern tree gimple_call_return_arg (gcall *);
 extern bool gimple_asm_clobbers_memory_p (const gasm *);
 extern void dump_decl_set (FILE *, bitmap);
 extern bool nonfreeing_call_p (gimple *);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
new file mode 100644
index 0000000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index f97541d..3396473 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -422,6 +422,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
 	{
 	  call = as_a <gcall *> (stmt);
 	  ass_var = gimple_call_lhs (call);
+	  if (!ass_var)
+	    ass_var = gimple_call_return_arg (call);
 	  break;
 	}
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24  7:18 [tree-tailcall] Check if function returns it's argument Prathamesh Kulkarni
@ 2016-11-24  8:37 ` Richard Biener
  2016-11-24 12:15   ` Prathamesh Kulkarni
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-11-24  8:37 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc Patches

On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> Hi,
> Consider following test-case:
> 
> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> {
>   __builtin_memcpy (a1, a2, a3);
>   return a1;
> }
> 
> return a1 can be considered equivalent to return value of memcpy,
> and the call could be emitted as a tail-call.
> gcc doesn't emit the above call to memcpy as a tail-call,
> but if it is changed to:
> 
> void *t1 = __builtin_memcpy (a1, a2, a3);
> return t1;
> 
> Then memcpy is emitted as a tail-call.
> The attached patch tries to handle the former case.
> 
> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> Cross tested on arm*-*-*, aarch64*-*-*
> Does this patch look OK ?

+/* Return arg, if function returns it's argument or NULL if it doesn't.  
*/
+tree
+gimple_call_return_arg (gcall *call_stmt)
+{


Please just inline it at the single use - the name is not terribly
informative.

I'm not sure you can rely on code-generation working if you not
effectively change the IL to

  a1 = __builtin_memcpy (a1, a2, a3);
  return a1;

someone more familiar with RTL expansion plus tail call emission on
RTL needs to chime in.

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24  8:37 ` Richard Biener
@ 2016-11-24 12:15   ` Prathamesh Kulkarni
  2016-11-24 12:18     ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-11-24 12:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc Patches

[-- Attachment #1: Type: text/plain, Size: 1872 bytes --]

On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>
>> Hi,
>> Consider following test-case:
>>
>> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> {
>>   __builtin_memcpy (a1, a2, a3);
>>   return a1;
>> }
>>
>> return a1 can be considered equivalent to return value of memcpy,
>> and the call could be emitted as a tail-call.
>> gcc doesn't emit the above call to memcpy as a tail-call,
>> but if it is changed to:
>>
>> void *t1 = __builtin_memcpy (a1, a2, a3);
>> return t1;
>>
>> Then memcpy is emitted as a tail-call.
>> The attached patch tries to handle the former case.
>>
>> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> Cross tested on arm*-*-*, aarch64*-*-*
>> Does this patch look OK ?
>
> +/* Return arg, if function returns it's argument or NULL if it doesn't.
> */
> +tree
> +gimple_call_return_arg (gcall *call_stmt)
> +{
>
>
> Please just inline it at the single use - the name is not terribly
> informative.
>
> I'm not sure you can rely on code-generation working if you not
> effectively change the IL to
>
>   a1 = __builtin_memcpy (a1, a2, a3);
>   return a1;
>
> someone more familiar with RTL expansion plus tail call emission on
> RTL needs to chime in.
Well I was trying to copy-propagate function's argument into uses of
it's return value if
function returned that argument, so the assignment to lhs of call
could be made redundant.

eg:
void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  void *t1 = __builtin_memcpy (a1, a2, a3);
  return t1;
}

After patch, copyprop transformed it into:
t1 = __builtin_memcpy (a1, a2, a3);
return a1;

But this now interferes with tail-call optimization, because it is not
able to emit memcpy
as tail-call anymore due to which the patch regressed 20050503-1.c.
I am not sure how to workaround this.

Thanks,
Prathamesh
>
> Richard.

[-- Attachment #2: tcwg-957-1.diff --]
[-- Type: text/plain, Size: 3549 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c
new file mode 100644
index 0000000..8319e9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-copyprop-slim" } */
+
+void f(char *dest, const char *src, __SIZE_TYPE__ n)
+{
+  char *t1 = __builtin_strcpy (dest, src);
+  if (t1 != dest)
+    __builtin_abort ();
+
+  char *t2 = __builtin_strncpy (dest, src, n);
+  if (t2 != dest)
+    __builtin_abort ();
+
+  char *t3 = __builtin_strcat (dest, src);
+  if (t3 != dest)
+    __builtin_abort ();
+
+  char *t4 = __builtin_strncat (dest, src, n);
+  if (t4 != dest)
+    __builtin_abort ();
+
+  void *t5 = __builtin_memmove (dest, src, n);
+  if (t5 != dest)
+    __builtin_abort ();
+
+  void *t6 = __builtin_memcpy (dest, src, n);
+  if (t6 != dest)
+    __builtin_abort ();
+
+  void *t7 = __builtin_memset (dest, 0, n);
+  if (t7 != dest)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_abort" "copyprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c
new file mode 100644
index 0000000..199074d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-copyprop" } */
+
+char *f(char *dest, const char *src)
+{
+  char *t1 = __builtin_strcpy (dest, src);
+  return t1; 
+}
+
+/* { dg-final { scan-tree-dump "return dest_\[0-9\]*\\(D\\);" "copyprop1" } } */
diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c
index abc6205..77f4ad7 100644
--- a/gcc/tree-ssa-copy.c
+++ b/gcc/tree-ssa-copy.c
@@ -80,6 +80,9 @@ stmt_may_generate_copy (gimple *stmt)
   if (gimple_code (stmt) == GIMPLE_PHI)
     return !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (stmt));
 
+  if (gimple_code (stmt) == GIMPLE_CALL)
+    return true;
+
   if (gimple_code (stmt) != GIMPLE_ASSIGN)
     return false;
 
@@ -252,6 +255,40 @@ copy_prop_visit_cond_stmt (gimple *stmt, edge *taken_edge_p)
   return retval;
 }
 
+/* Check if call returns one of it's arguments, and in that
+   case set copy-of (lhs) to the returned argument.  */
+
+static enum ssa_prop_result
+copy_prop_visit_call (gcall *call_stmt, tree *result_p)
+{
+  tree lhs = gimple_call_lhs (call_stmt);
+  if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+    return SSA_PROP_VARYING;
+
+  unsigned rf = gimple_call_return_flags (call_stmt);
+  if (rf & ERF_RETURNS_ARG)
+    {
+      unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+      if (argnum < gimple_call_num_args (call_stmt))
+	{
+	  tree arg = gimple_call_arg (call_stmt, argnum);
+	  if (TREE_CODE (arg) == SSA_NAME)
+	    {
+	      arg = valueize_val (arg);
+	      if (!may_propagate_copy (lhs, arg))
+		return SSA_PROP_VARYING;
+
+	      *result_p = lhs;
+	      if (set_copy_of_val (*result_p, arg))
+		return SSA_PROP_INTERESTING;
+	      else
+		return SSA_PROP_NOT_INTERESTING;
+	    }
+	}
+    }
+
+  return SSA_PROP_VARYING;
+}
 
 /* Evaluate statement STMT.  If the statement produces a new output
    value, return SSA_PROP_INTERESTING and store the SSA_NAME holding
@@ -290,6 +327,8 @@ copy_prop_visit_stmt (gimple *stmt, edge *taken_edge_p, tree *result_p)
 	 jump.  */
       retval = copy_prop_visit_cond_stmt (stmt, taken_edge_p);
     }
+  else if (gcall *call_stmt = dyn_cast<gcall *> (stmt))
+    retval = copy_prop_visit_call (call_stmt, result_p);
   else
     retval = SSA_PROP_VARYING;
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24 12:15   ` Prathamesh Kulkarni
@ 2016-11-24 12:18     ` Richard Biener
  2016-11-24 12:21       ` Jakub Jelinek
  2016-11-24 12:32       ` Prathamesh Kulkarni
  0 siblings, 2 replies; 32+ messages in thread
From: Richard Biener @ 2016-11-24 12:18 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc Patches

On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> Consider following test-case:
> >>
> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> {
> >>   __builtin_memcpy (a1, a2, a3);
> >>   return a1;
> >> }
> >>
> >> return a1 can be considered equivalent to return value of memcpy,
> >> and the call could be emitted as a tail-call.
> >> gcc doesn't emit the above call to memcpy as a tail-call,
> >> but if it is changed to:
> >>
> >> void *t1 = __builtin_memcpy (a1, a2, a3);
> >> return t1;
> >>
> >> Then memcpy is emitted as a tail-call.
> >> The attached patch tries to handle the former case.
> >>
> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> Cross tested on arm*-*-*, aarch64*-*-*
> >> Does this patch look OK ?
> >
> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
> > */
> > +tree
> > +gimple_call_return_arg (gcall *call_stmt)
> > +{
> >
> >
> > Please just inline it at the single use - the name is not terribly
> > informative.
> >
> > I'm not sure you can rely on code-generation working if you not
> > effectively change the IL to
> >
> >   a1 = __builtin_memcpy (a1, a2, a3);
> >   return a1;
> >
> > someone more familiar with RTL expansion plus tail call emission on
> > RTL needs to chime in.
> Well I was trying to copy-propagate function's argument into uses of
> it's return value if
> function returned that argument, so the assignment to lhs of call
> could be made redundant.
> 
> eg:
> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> {
>   void *t1 = __builtin_memcpy (a1, a2, a3);
>   return t1;
> }
>
> After patch, copyprop transformed it into:
> t1 = __builtin_memcpy (a1, a2, a3);
> return a1;

But that's a bad transform -- if we know that t1 == a1 then it's
better to use t1 as that's readily available in the return register
while the register for a1 might have been clobbered and thus we
need to spill it for the later return.
 
> But this now interferes with tail-call optimization, because it is not
> able to emit memcpy
> as tail-call anymore due to which the patch regressed 20050503-1.c.
> I am not sure how to workaround this.
> 
> Thanks,
> Prathamesh
> >
> > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24 12:18     ` Richard Biener
@ 2016-11-24 12:21       ` Jakub Jelinek
  2016-11-24 12:23         ` Richard Biener
  2016-11-24 12:32       ` Prathamesh Kulkarni
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2016-11-24 12:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc Patches

On Thu, Nov 24, 2016 at 01:18:37PM +0100, Richard Biener wrote:
> > eg:
> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> > {
> >   void *t1 = __builtin_memcpy (a1, a2, a3);
> >   return t1;
> > }
> >
> > After patch, copyprop transformed it into:
> > t1 = __builtin_memcpy (a1, a2, a3);
> > return a1;
> 
> But that's a bad transform -- if we know that t1 == a1 then it's
> better to use t1 as that's readily available in the return register
> while the register for a1 might have been clobbered and thus we
> need to spill it for the later return.

I thought that the RA is aware of this and is able to undo it.

	Jakub

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24 12:21       ` Jakub Jelinek
@ 2016-11-24 12:23         ` Richard Biener
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Biener @ 2016-11-24 12:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Prathamesh Kulkarni, gcc Patches

On Thu, 24 Nov 2016, Jakub Jelinek wrote:

> On Thu, Nov 24, 2016 at 01:18:37PM +0100, Richard Biener wrote:
> > > eg:
> > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> > > {
> > >   void *t1 = __builtin_memcpy (a1, a2, a3);
> > >   return t1;
> > > }
> > >
> > > After patch, copyprop transformed it into:
> > > t1 = __builtin_memcpy (a1, a2, a3);
> > > return a1;
> > 
> > But that's a bad transform -- if we know that t1 == a1 then it's
> > better to use t1 as that's readily available in the return register
> > while the register for a1 might have been clobbered and thus we
> > need to spill it for the later return.
> 
> I thought that the RA is aware of this and is able to undo it.

The RA has some feature here, yes.  Not sure what it can do
exactly and if it can do something when the return value is
optimized away.

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24 12:18     ` Richard Biener
  2016-11-24 12:21       ` Jakub Jelinek
@ 2016-11-24 12:32       ` Prathamesh Kulkarni
  2016-11-24 12:38         ` Richard Biener
  1 sibling, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-11-24 12:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc Patches

On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> Hi,
>> >> Consider following test-case:
>> >>
>> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> {
>> >>   __builtin_memcpy (a1, a2, a3);
>> >>   return a1;
>> >> }
>> >>
>> >> return a1 can be considered equivalent to return value of memcpy,
>> >> and the call could be emitted as a tail-call.
>> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> but if it is changed to:
>> >>
>> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> return t1;
>> >>
>> >> Then memcpy is emitted as a tail-call.
>> >> The attached patch tries to handle the former case.
>> >>
>> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> Does this patch look OK ?
>> >
>> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
>> > */
>> > +tree
>> > +gimple_call_return_arg (gcall *call_stmt)
>> > +{
>> >
>> >
>> > Please just inline it at the single use - the name is not terribly
>> > informative.
>> >
>> > I'm not sure you can rely on code-generation working if you not
>> > effectively change the IL to
>> >
>> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >   return a1;
>> >
>> > someone more familiar with RTL expansion plus tail call emission on
>> > RTL needs to chime in.
>> Well I was trying to copy-propagate function's argument into uses of
>> it's return value if
>> function returned that argument, so the assignment to lhs of call
>> could be made redundant.
>>
>> eg:
>> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> {
>>   void *t1 = __builtin_memcpy (a1, a2, a3);
>>   return t1;
>> }
>>
>> After patch, copyprop transformed it into:
>> t1 = __builtin_memcpy (a1, a2, a3);
>> return a1;
>
> But that's a bad transform -- if we know that t1 == a1 then it's
> better to use t1 as that's readily available in the return register
> while the register for a1 might have been clobbered and thus we
> need to spill it for the later return.
Oh I didn't realize this could possibly pessimize RA.
For test-case:

void *t1 = memcpy (dest, src, n);
if (t1 != dest)
  __builtin_abort ();

we could copy-propagate t1 into cond_expr and make the condition redundant.
However I suppose this particular case could be handled with VRP instead
(t1 and dest should be marked equivalent) ?

Thanks,
Prathamesh
>
>> But this now interferes with tail-call optimization, because it is not
>> able to emit memcpy
>> as tail-call anymore due to which the patch regressed 20050503-1.c.
>> I am not sure how to workaround this.
>>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24 12:32       ` Prathamesh Kulkarni
@ 2016-11-24 12:38         ` Richard Biener
  2016-11-25  5:16           ` Prathamesh Kulkarni
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-11-24 12:38 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc Patches

On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> Hi,
> >> >> Consider following test-case:
> >> >>
> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> {
> >> >>   __builtin_memcpy (a1, a2, a3);
> >> >>   return a1;
> >> >> }
> >> >>
> >> >> return a1 can be considered equivalent to return value of memcpy,
> >> >> and the call could be emitted as a tail-call.
> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
> >> >> but if it is changed to:
> >> >>
> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >> return t1;
> >> >>
> >> >> Then memcpy is emitted as a tail-call.
> >> >> The attached patch tries to handle the former case.
> >> >>
> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> >> Cross tested on arm*-*-*, aarch64*-*-*
> >> >> Does this patch look OK ?
> >> >
> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
> >> > */
> >> > +tree
> >> > +gimple_call_return_arg (gcall *call_stmt)
> >> > +{
> >> >
> >> >
> >> > Please just inline it at the single use - the name is not terribly
> >> > informative.
> >> >
> >> > I'm not sure you can rely on code-generation working if you not
> >> > effectively change the IL to
> >> >
> >> >   a1 = __builtin_memcpy (a1, a2, a3);
> >> >   return a1;
> >> >
> >> > someone more familiar with RTL expansion plus tail call emission on
> >> > RTL needs to chime in.
> >> Well I was trying to copy-propagate function's argument into uses of
> >> it's return value if
> >> function returned that argument, so the assignment to lhs of call
> >> could be made redundant.
> >>
> >> eg:
> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> {
> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
> >>   return t1;
> >> }
> >>
> >> After patch, copyprop transformed it into:
> >> t1 = __builtin_memcpy (a1, a2, a3);
> >> return a1;
> >
> > But that's a bad transform -- if we know that t1 == a1 then it's
> > better to use t1 as that's readily available in the return register
> > while the register for a1 might have been clobbered and thus we
> > need to spill it for the later return.
> Oh I didn't realize this could possibly pessimize RA.
> For test-case:
> 
> void *t1 = memcpy (dest, src, n);
> if (t1 != dest)
>   __builtin_abort ();
> 
> we could copy-propagate t1 into cond_expr and make the condition redundant.
> However I suppose this particular case could be handled with VRP instead
> (t1 and dest should be marked equivalent) ?

Yeah, exposing this to value-numbering in general can enable some
optimizations (but I wouldn't put it in copyprop).  Note it's then
difficult to avoid copy-propgating things...

The user can also write

void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  __builtin_memcpy (a1, a2, a3);
  return a1;
}

so it's good to improve code-gen for that (for the tailcall issue).

Richard.

> Thanks,
> Prathamesh
> >
> >> But this now interferes with tail-call optimization, because it is not
> >> able to emit memcpy
> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
> >> I am not sure how to workaround this.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Richard.
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-24 12:38         ` Richard Biener
@ 2016-11-25  5:16           ` Prathamesh Kulkarni
  2016-11-25  8:07             ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-11-25  5:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc Patches

On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> Hi,
>> >> >> Consider following test-case:
>> >> >>
>> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> {
>> >> >>   __builtin_memcpy (a1, a2, a3);
>> >> >>   return a1;
>> >> >> }
>> >> >>
>> >> >> return a1 can be considered equivalent to return value of memcpy,
>> >> >> and the call could be emitted as a tail-call.
>> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> >> but if it is changed to:
>> >> >>
>> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> return t1;
>> >> >>
>> >> >> Then memcpy is emitted as a tail-call.
>> >> >> The attached patch tries to handle the former case.
>> >> >>
>> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> >> Does this patch look OK ?
>> >> >
>> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
>> >> > */
>> >> > +tree
>> >> > +gimple_call_return_arg (gcall *call_stmt)
>> >> > +{
>> >> >
>> >> >
>> >> > Please just inline it at the single use - the name is not terribly
>> >> > informative.
>> >> >
>> >> > I'm not sure you can rely on code-generation working if you not
>> >> > effectively change the IL to
>> >> >
>> >> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >> >   return a1;
>> >> >
>> >> > someone more familiar with RTL expansion plus tail call emission on
>> >> > RTL needs to chime in.
>> >> Well I was trying to copy-propagate function's argument into uses of
>> >> it's return value if
>> >> function returned that argument, so the assignment to lhs of call
>> >> could be made redundant.
>> >>
>> >> eg:
>> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> {
>> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
>> >>   return t1;
>> >> }
>> >>
>> >> After patch, copyprop transformed it into:
>> >> t1 = __builtin_memcpy (a1, a2, a3);
>> >> return a1;
>> >
>> > But that's a bad transform -- if we know that t1 == a1 then it's
>> > better to use t1 as that's readily available in the return register
>> > while the register for a1 might have been clobbered and thus we
>> > need to spill it for the later return.
>> Oh I didn't realize this could possibly pessimize RA.
>> For test-case:
>>
>> void *t1 = memcpy (dest, src, n);
>> if (t1 != dest)
>>   __builtin_abort ();
>>
>> we could copy-propagate t1 into cond_expr and make the condition redundant.
>> However I suppose this particular case could be handled with VRP instead
>> (t1 and dest should be marked equivalent) ?
>
> Yeah, exposing this to value-numbering in general can enable some
> optimizations (but I wouldn't put it in copyprop).  Note it's then
> difficult to avoid copy-propgating things...
>
> The user can also write
>
> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> {
>   __builtin_memcpy (a1, a2, a3);
>   return a1;
> }
>
> so it's good to improve code-gen for that (for the tailcall issue).
For the tail-call, issue should we artificially create a lhs and use that
as return value (perhaps by a separate pass before tailcall) ?

__builtin_memcpy (a1, a2, a3);
return a1;

gets transformed to:
_1 = __builtin_memcpy (a1, a2, a3)
return _1;

So tail-call optimization pass would see the IL in it's expected form.

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> >> But this now interferes with tail-call optimization, because it is not
>> >> able to emit memcpy
>> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
>> >> I am not sure how to workaround this.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Richard.
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-25  5:16           ` Prathamesh Kulkarni
@ 2016-11-25  8:07             ` Richard Biener
  2016-11-25  8:18               ` Prathamesh Kulkarni
  2016-11-25 15:47               ` Jeff Law
  0 siblings, 2 replies; 32+ messages in thread
From: Richard Biener @ 2016-11-25  8:07 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc Patches

On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >> Consider following test-case:
> >> >> >>
> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> >> {
> >> >> >>   __builtin_memcpy (a1, a2, a3);
> >> >> >>   return a1;
> >> >> >> }
> >> >> >>
> >> >> >> return a1 can be considered equivalent to return value of memcpy,
> >> >> >> and the call could be emitted as a tail-call.
> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
> >> >> >> but if it is changed to:
> >> >> >>
> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> return t1;
> >> >> >>
> >> >> >> Then memcpy is emitted as a tail-call.
> >> >> >> The attached patch tries to handle the former case.
> >> >> >>
> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
> >> >> >> Does this patch look OK ?
> >> >> >
> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
> >> >> > */
> >> >> > +tree
> >> >> > +gimple_call_return_arg (gcall *call_stmt)
> >> >> > +{
> >> >> >
> >> >> >
> >> >> > Please just inline it at the single use - the name is not terribly
> >> >> > informative.
> >> >> >
> >> >> > I'm not sure you can rely on code-generation working if you not
> >> >> > effectively change the IL to
> >> >> >
> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
> >> >> >   return a1;
> >> >> >
> >> >> > someone more familiar with RTL expansion plus tail call emission on
> >> >> > RTL needs to chime in.
> >> >> Well I was trying to copy-propagate function's argument into uses of
> >> >> it's return value if
> >> >> function returned that argument, so the assignment to lhs of call
> >> >> could be made redundant.
> >> >>
> >> >> eg:
> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> {
> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >>   return t1;
> >> >> }
> >> >>
> >> >> After patch, copyprop transformed it into:
> >> >> t1 = __builtin_memcpy (a1, a2, a3);
> >> >> return a1;
> >> >
> >> > But that's a bad transform -- if we know that t1 == a1 then it's
> >> > better to use t1 as that's readily available in the return register
> >> > while the register for a1 might have been clobbered and thus we
> >> > need to spill it for the later return.
> >> Oh I didn't realize this could possibly pessimize RA.
> >> For test-case:
> >>
> >> void *t1 = memcpy (dest, src, n);
> >> if (t1 != dest)
> >>   __builtin_abort ();
> >>
> >> we could copy-propagate t1 into cond_expr and make the condition redundant.
> >> However I suppose this particular case could be handled with VRP instead
> >> (t1 and dest should be marked equivalent) ?
> >
> > Yeah, exposing this to value-numbering in general can enable some
> > optimizations (but I wouldn't put it in copyprop).  Note it's then
> > difficult to avoid copy-propgating things...
> >
> > The user can also write
> >
> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> > {
> >   __builtin_memcpy (a1, a2, a3);
> >   return a1;
> > }
> >
> > so it's good to improve code-gen for that (for the tailcall issue).
> For the tail-call, issue should we artificially create a lhs and use that
> as return value (perhaps by a separate pass before tailcall) ?
> 
> __builtin_memcpy (a1, a2, a3);
> return a1;
> 
> gets transformed to:
> _1 = __builtin_memcpy (a1, a2, a3)
> return _1;
> 
> So tail-call optimization pass would see the IL in it's expected form.

As said, a RTL expert needs to chime in here.  Iff then tail-call
itself should do this rewrite.  But if this form is required to make
things work (I suppose you checked it _does_ actually work?) then
we'd need to make sure later passes do not undo it.  So it looks
fragile to me.  OTOH I seem to remember that the flags we set on
GIMPLE are merely a hint to RTL expansion and the tailcalling is
verified again there?

Thanks,
Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >> Thanks,
> >> Prathamesh
> >> >
> >> >> But this now interferes with tail-call optimization, because it is not
> >> >> able to emit memcpy
> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
> >> >> I am not sure how to workaround this.
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> > Richard.
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguenther@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-25  8:07             ` Richard Biener
@ 2016-11-25  8:18               ` Prathamesh Kulkarni
  2016-11-25  8:25                 ` Richard Biener
  2016-11-25 15:47               ` Jeff Law
  1 sibling, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-11-25  8:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc Patches

On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
>> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >> Consider following test-case:
>> >> >> >>
>> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> >> {
>> >> >> >>   __builtin_memcpy (a1, a2, a3);
>> >> >> >>   return a1;
>> >> >> >> }
>> >> >> >>
>> >> >> >> return a1 can be considered equivalent to return value of memcpy,
>> >> >> >> and the call could be emitted as a tail-call.
>> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> >> >> but if it is changed to:
>> >> >> >>
>> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> return t1;
>> >> >> >>
>> >> >> >> Then memcpy is emitted as a tail-call.
>> >> >> >> The attached patch tries to handle the former case.
>> >> >> >>
>> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> >> >> Does this patch look OK ?
>> >> >> >
>> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
>> >> >> > */
>> >> >> > +tree
>> >> >> > +gimple_call_return_arg (gcall *call_stmt)
>> >> >> > +{
>> >> >> >
>> >> >> >
>> >> >> > Please just inline it at the single use - the name is not terribly
>> >> >> > informative.
>> >> >> >
>> >> >> > I'm not sure you can rely on code-generation working if you not
>> >> >> > effectively change the IL to
>> >> >> >
>> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >   return a1;
>> >> >> >
>> >> >> > someone more familiar with RTL expansion plus tail call emission on
>> >> >> > RTL needs to chime in.
>> >> >> Well I was trying to copy-propagate function's argument into uses of
>> >> >> it's return value if
>> >> >> function returned that argument, so the assignment to lhs of call
>> >> >> could be made redundant.
>> >> >>
>> >> >> eg:
>> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> {
>> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >>   return t1;
>> >> >> }
>> >> >>
>> >> >> After patch, copyprop transformed it into:
>> >> >> t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> return a1;
>> >> >
>> >> > But that's a bad transform -- if we know that t1 == a1 then it's
>> >> > better to use t1 as that's readily available in the return register
>> >> > while the register for a1 might have been clobbered and thus we
>> >> > need to spill it for the later return.
>> >> Oh I didn't realize this could possibly pessimize RA.
>> >> For test-case:
>> >>
>> >> void *t1 = memcpy (dest, src, n);
>> >> if (t1 != dest)
>> >>   __builtin_abort ();
>> >>
>> >> we could copy-propagate t1 into cond_expr and make the condition redundant.
>> >> However I suppose this particular case could be handled with VRP instead
>> >> (t1 and dest should be marked equivalent) ?
>> >
>> > Yeah, exposing this to value-numbering in general can enable some
>> > optimizations (but I wouldn't put it in copyprop).  Note it's then
>> > difficult to avoid copy-propgating things...
>> >
>> > The user can also write
>> >
>> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> > {
>> >   __builtin_memcpy (a1, a2, a3);
>> >   return a1;
>> > }
>> >
>> > so it's good to improve code-gen for that (for the tailcall issue).
>> For the tail-call, issue should we artificially create a lhs and use that
>> as return value (perhaps by a separate pass before tailcall) ?
>>
>> __builtin_memcpy (a1, a2, a3);
>> return a1;
>>
>> gets transformed to:
>> _1 = __builtin_memcpy (a1, a2, a3)
>> return _1;
>>
>> So tail-call optimization pass would see the IL in it's expected form.
>
> As said, a RTL expert needs to chime in here.  Iff then tail-call
> itself should do this rewrite.  But if this form is required to make
> things work (I suppose you checked it _does_ actually work?) then
> we'd need to make sure later passes do not undo it.  So it looks
> fragile to me.  OTOH I seem to remember that the flags we set on
> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> verified again there?

Yeah, I verified the form works:
void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  void *t1 = __builtin_memcpy (a1, a2, a3);
  return t1;
}

assembly:
f:
.LFB0:
        .cfi_startproc
        jmp     memcpy
        .cfi_endproc

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >> But this now interferes with tail-call optimization, because it is not
>> >> >> able to emit memcpy
>> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
>> >> >> I am not sure how to workaround this.
>> >> >>
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> > Richard.
>> >> >>
>> >> >
>> >> > --
>> >> > Richard Biener <rguenther@suse.de>
>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >>
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-25  8:18               ` Prathamesh Kulkarni
@ 2016-11-25  8:25                 ` Richard Biener
  2016-11-25  8:31                   ` Prathamesh Kulkarni
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-11-25  8:25 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc Patches

On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote:
> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:
> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >> >
> >> >> >> >> Hi,
> >> >> >> >> Consider following test-case:
> >> >> >> >>
> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> >> >> {
> >> >> >> >>   __builtin_memcpy (a1, a2, a3);
> >> >> >> >>   return a1;
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> return a1 can be considered equivalent to return value of memcpy,
> >> >> >> >> and the call could be emitted as a tail-call.
> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
> >> >> >> >> but if it is changed to:
> >> >> >> >>
> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> >> return t1;
> >> >> >> >>
> >> >> >> >> Then memcpy is emitted as a tail-call.
> >> >> >> >> The attached patch tries to handle the former case.
> >> >> >> >>
> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
> >> >> >> >> Does this patch look OK ?
> >> >> >> >
> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
> >> >> >> > */
> >> >> >> > +tree
> >> >> >> > +gimple_call_return_arg (gcall *call_stmt)
> >> >> >> > +{
> >> >> >> >
> >> >> >> >
> >> >> >> > Please just inline it at the single use - the name is not terribly
> >> >> >> > informative.
> >> >> >> >
> >> >> >> > I'm not sure you can rely on code-generation working if you not
> >> >> >> > effectively change the IL to
> >> >> >> >
> >> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> >   return a1;
> >> >> >> >
> >> >> >> > someone more familiar with RTL expansion plus tail call emission on
> >> >> >> > RTL needs to chime in.
> >> >> >> Well I was trying to copy-propagate function's argument into uses of
> >> >> >> it's return value if
> >> >> >> function returned that argument, so the assignment to lhs of call
> >> >> >> could be made redundant.
> >> >> >>
> >> >> >> eg:
> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> >> {
> >> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >>   return t1;
> >> >> >> }
> >> >> >>
> >> >> >> After patch, copyprop transformed it into:
> >> >> >> t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> return a1;
> >> >> >
> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's
> >> >> > better to use t1 as that's readily available in the return register
> >> >> > while the register for a1 might have been clobbered and thus we
> >> >> > need to spill it for the later return.
> >> >> Oh I didn't realize this could possibly pessimize RA.
> >> >> For test-case:
> >> >>
> >> >> void *t1 = memcpy (dest, src, n);
> >> >> if (t1 != dest)
> >> >>   __builtin_abort ();
> >> >>
> >> >> we could copy-propagate t1 into cond_expr and make the condition redundant.
> >> >> However I suppose this particular case could be handled with VRP instead
> >> >> (t1 and dest should be marked equivalent) ?
> >> >
> >> > Yeah, exposing this to value-numbering in general can enable some
> >> > optimizations (but I wouldn't put it in copyprop).  Note it's then
> >> > difficult to avoid copy-propgating things...
> >> >
> >> > The user can also write
> >> >
> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> > {
> >> >   __builtin_memcpy (a1, a2, a3);
> >> >   return a1;
> >> > }
> >> >
> >> > so it's good to improve code-gen for that (for the tailcall issue).
> >> For the tail-call, issue should we artificially create a lhs and use that
> >> as return value (perhaps by a separate pass before tailcall) ?
> >>
> >> __builtin_memcpy (a1, a2, a3);
> >> return a1;
> >>
> >> gets transformed to:
> >> _1 = __builtin_memcpy (a1, a2, a3)
> >> return _1;
> >>
> >> So tail-call optimization pass would see the IL in it's expected form.
> >
> > As said, a RTL expert needs to chime in here.  Iff then tail-call
> > itself should do this rewrite.  But if this form is required to make
> > things work (I suppose you checked it _does_ actually work?) then
> > we'd need to make sure later passes do not undo it.  So it looks
> > fragile to me.  OTOH I seem to remember that the flags we set on
> > GIMPLE are merely a hint to RTL expansion and the tailcalling is
> > verified again there?
> 
> Yeah, I verified the form works:
> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> {
>   void *t1 = __builtin_memcpy (a1, a2, a3);
>   return t1;
> }
> 
> assembly:
> f:
> .LFB0:
>         .cfi_startproc
>         jmp     memcpy
>         .cfi_endproc

I meant the

void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  __builtin_memcpy (a1, a2, a3);
  return a1;
}

form after your patch to the tailcall pass.

Richard.

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Richard.
> >> >
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> >> But this now interferes with tail-call optimization, because it is not
> >> >> >> able to emit memcpy
> >> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
> >> >> >> I am not sure how to workaround this.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Prathamesh
> >> >> >> >
> >> >> >> > Richard.
> >> >> >>
> >> >> >
> >> >> > --
> >> >> > Richard Biener <rguenther@suse.de>
> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >> >>
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguenther@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-25  8:25                 ` Richard Biener
@ 2016-11-25  8:31                   ` Prathamesh Kulkarni
  0 siblings, 0 replies; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-11-25  8:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc Patches

On 25 November 2016 at 13:55, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote:
>> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
>> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
>> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >> >
>> >> >> >> >> Hi,
>> >> >> >> >> Consider following test-case:
>> >> >> >> >>
>> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> >> >> {
>> >> >> >> >>   __builtin_memcpy (a1, a2, a3);
>> >> >> >> >>   return a1;
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >> return a1 can be considered equivalent to return value of memcpy,
>> >> >> >> >> and the call could be emitted as a tail-call.
>> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> >> >> >> but if it is changed to:
>> >> >> >> >>
>> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> >> return t1;
>> >> >> >> >>
>> >> >> >> >> Then memcpy is emitted as a tail-call.
>> >> >> >> >> The attached patch tries to handle the former case.
>> >> >> >> >>
>> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> >> >> >> Does this patch look OK ?
>> >> >> >> >
>> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
>> >> >> >> > */
>> >> >> >> > +tree
>> >> >> >> > +gimple_call_return_arg (gcall *call_stmt)
>> >> >> >> > +{
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > Please just inline it at the single use - the name is not terribly
>> >> >> >> > informative.
>> >> >> >> >
>> >> >> >> > I'm not sure you can rely on code-generation working if you not
>> >> >> >> > effectively change the IL to
>> >> >> >> >
>> >> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> >   return a1;
>> >> >> >> >
>> >> >> >> > someone more familiar with RTL expansion plus tail call emission on
>> >> >> >> > RTL needs to chime in.
>> >> >> >> Well I was trying to copy-propagate function's argument into uses of
>> >> >> >> it's return value if
>> >> >> >> function returned that argument, so the assignment to lhs of call
>> >> >> >> could be made redundant.
>> >> >> >>
>> >> >> >> eg:
>> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> >> {
>> >> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >>   return t1;
>> >> >> >> }
>> >> >> >>
>> >> >> >> After patch, copyprop transformed it into:
>> >> >> >> t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> return a1;
>> >> >> >
>> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's
>> >> >> > better to use t1 as that's readily available in the return register
>> >> >> > while the register for a1 might have been clobbered and thus we
>> >> >> > need to spill it for the later return.
>> >> >> Oh I didn't realize this could possibly pessimize RA.
>> >> >> For test-case:
>> >> >>
>> >> >> void *t1 = memcpy (dest, src, n);
>> >> >> if (t1 != dest)
>> >> >>   __builtin_abort ();
>> >> >>
>> >> >> we could copy-propagate t1 into cond_expr and make the condition redundant.
>> >> >> However I suppose this particular case could be handled with VRP instead
>> >> >> (t1 and dest should be marked equivalent) ?
>> >> >
>> >> > Yeah, exposing this to value-numbering in general can enable some
>> >> > optimizations (but I wouldn't put it in copyprop).  Note it's then
>> >> > difficult to avoid copy-propgating things...
>> >> >
>> >> > The user can also write
>> >> >
>> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> > {
>> >> >   __builtin_memcpy (a1, a2, a3);
>> >> >   return a1;
>> >> > }
>> >> >
>> >> > so it's good to improve code-gen for that (for the tailcall issue).
>> >> For the tail-call, issue should we artificially create a lhs and use that
>> >> as return value (perhaps by a separate pass before tailcall) ?
>> >>
>> >> __builtin_memcpy (a1, a2, a3);
>> >> return a1;
>> >>
>> >> gets transformed to:
>> >> _1 = __builtin_memcpy (a1, a2, a3)
>> >> return _1;
>> >>
>> >> So tail-call optimization pass would see the IL in it's expected form.
>> >
>> > As said, a RTL expert needs to chime in here.  Iff then tail-call
>> > itself should do this rewrite.  But if this form is required to make
>> > things work (I suppose you checked it _does_ actually work?) then
>> > we'd need to make sure later passes do not undo it.  So it looks
>> > fragile to me.  OTOH I seem to remember that the flags we set on
>> > GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> > verified again there?
>>
>> Yeah, I verified the form works:
>> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> {
>>   void *t1 = __builtin_memcpy (a1, a2, a3);
>>   return t1;
>> }
>>
>> assembly:
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         jmp     memcpy
>>         .cfi_endproc
>
> I meant the
>
> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> {
>   __builtin_memcpy (a1, a2, a3);
>   return a1;
> }
>
> form after your patch to the tailcall pass.
Oops, sorry -;)
Yes, before the patch:
f:
.LFB0:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        call    memcpy
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

after patch:
f:
.LFB0:
        .cfi_startproc
        jmp     memcpy
        .cfi_endproc

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks,
>> > Richard.
>> >
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Richard.
>> >> >
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> >> But this now interferes with tail-call optimization, because it is not
>> >> >> >> able to emit memcpy
>> >> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
>> >> >> >> I am not sure how to workaround this.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Prathamesh
>> >> >> >> >
>> >> >> >> > Richard.
>> >> >> >>
>> >> >> >
>> >> >> > --
>> >> >> > Richard Biener <rguenther@suse.de>
>> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >> >>
>> >> >>
>> >> >
>> >> > --
>> >> > Richard Biener <rguenther@suse.de>
>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >>
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-25  8:07             ` Richard Biener
  2016-11-25  8:18               ` Prathamesh Kulkarni
@ 2016-11-25 15:47               ` Jeff Law
  2016-12-01 11:42                 ` Prathamesh Kulkarni
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Law @ 2016-11-25 15:47 UTC (permalink / raw)
  To: Richard Biener, Prathamesh Kulkarni; +Cc: gcc Patches

On 11/25/2016 01:07 AM, Richard Biener wrote:

>> For the tail-call, issue should we artificially create a lhs and use that
>> as return value (perhaps by a separate pass before tailcall) ?
>>
>> __builtin_memcpy (a1, a2, a3);
>> return a1;
>>
>> gets transformed to:
>> _1 = __builtin_memcpy (a1, a2, a3)
>> return _1;
>>
>> So tail-call optimization pass would see the IL in it's expected form.
>
> As said, a RTL expert needs to chime in here.  Iff then tail-call
> itself should do this rewrite.  But if this form is required to make
> things work (I suppose you checked it _does_ actually work?) then
> we'd need to make sure later passes do not undo it.  So it looks
> fragile to me.  OTOH I seem to remember that the flags we set on
> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> verified again there?
So tail calling actually sits on the border between trees and RTL. 
Essentially it's an expand-time decision as we use information from 
trees as well as low level target information.

I would not expect the former sequence to tail call.  The tail calling 
code does not know that the return value from memcpy will be a1.  Thus 
the tail calling code has to assume that it'll have to copy a1 into the 
return register after returning from memcpy, which obviously can't be 
done if we tail called memcpy.

The second form is much more likely to turn into a tail call sequence 
because the return value from memcpy will be sitting in the proper 
register.  This form out to work for most calling conventions that allow 
tail calls.

We could (in theory) try and exploit the fact that memcpy returns its 
first argument as a return value, but that would only be helpful on a 
target where the first argument and return value use the same register. 
So I'd have a slight preference to rewriting per Prathamesh's suggestion 
above since it's more general.


Jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-11-25 15:47               ` Jeff Law
@ 2016-12-01 11:42                 ` Prathamesh Kulkarni
  2016-12-01 12:10                   ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-12-01 11:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 3579 bytes --]

On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
> On 11/25/2016 01:07 AM, Richard Biener wrote:
>
>>> For the tail-call, issue should we artificially create a lhs and use that
>>> as return value (perhaps by a separate pass before tailcall) ?
>>>
>>> __builtin_memcpy (a1, a2, a3);
>>> return a1;
>>>
>>> gets transformed to:
>>> _1 = __builtin_memcpy (a1, a2, a3)
>>> return _1;
>>>
>>> So tail-call optimization pass would see the IL in it's expected form.
>>
>>
>> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> itself should do this rewrite.  But if this form is required to make
>> things work (I suppose you checked it _does_ actually work?) then
>> we'd need to make sure later passes do not undo it.  So it looks
>> fragile to me.  OTOH I seem to remember that the flags we set on
>> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> verified again there?
>
> So tail calling actually sits on the border between trees and RTL.
> Essentially it's an expand-time decision as we use information from trees as
> well as low level target information.
>
> I would not expect the former sequence to tail call.  The tail calling code
> does not know that the return value from memcpy will be a1.  Thus the tail
> calling code has to assume that it'll have to copy a1 into the return
> register after returning from memcpy, which obviously can't be done if we
> tail called memcpy.
>
> The second form is much more likely to turn into a tail call sequence
> because the return value from memcpy will be sitting in the proper register.
> This form out to work for most calling conventions that allow tail calls.
>
> We could (in theory) try and exploit the fact that memcpy returns its first
> argument as a return value, but that would only be helpful on a target where
> the first argument and return value use the same register. So I'd have a
> slight preference to rewriting per Prathamesh's suggestion above since it's
> more general.
Thanks for the suggestion. The attached patch creates artificial lhs,
and returns it if the function returns it's argument and that argument
is used as return-value.

eg:
f (void * a1, void * a2, long unsigned int a3)
{
  <bb 2> [0.0%]:
  # .MEM_5 = VDEF <.MEM_1(D)>
  __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
  # VUSE <.MEM_5>
  return a1_2(D);

}

is transformed to:
f (void * a1, void * a2, long unsigned int a3)
{
  void * _6;

  <bb 2> [0.0%]:
  # .MEM_5 = VDEF <.MEM_1(D)>
  _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
  # VUSE <.MEM_5>
  return _6;

}

While testing, I came across an issue with function f() defined
intail-padding1.C:
struct X
{
  ~X() {}
  int n;
  char d;
};

X f()
{
  X nrvo;
  __builtin_memset (&nrvo, 0, sizeof(X));
  return nrvo;
}

input to the pass:
X f() ()
{
  <bb 2> [0.0%]:
  # .MEM_3 = VDEF <.MEM_1(D)>
  __builtin_memset (nrvo_2(D), 0, 8);
  # VUSE <.MEM_3>
  return nrvo_2(D);

}

verify_gimple_return failed with:
tail-padding1.C:13:1: error: invalid conversion in return statement
 }
 ^
struct X

struct X &

# VUSE <.MEM_3>
return _4;

It seems the return type of function (struct X) differs with the type
of return value (struct X&).
Not sure how this is possible ?
To work around that, I guarded the transform on:
useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
                                             TREE_TYPE (retval)))

in the patch. Does that look OK ?

Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
Cross-tested on arm*-*-*, aarch64*-*-*.

Thanks,
Prathamesh
>
>
> Jeff

[-- Attachment #2: tailcall-5.diff --]
[-- Type: text/plain, Size: 2342 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 0000000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..d46ca50 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+	ret_stmt = dyn_cast<greturn *> (stmt);
 
       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,37 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
 	{
 	  call = as_a <gcall *> (stmt);
 	  ass_var = gimple_call_lhs (call);
+	  if (!ass_var)
+	    {
+	      /* Check if function returns one if it's arguments
+		 and that argument is used as return value.
+		 In that case create an artificial lhs to call_stmt,
+		 and set it as the return value.  */
+
+	      unsigned rf = gimple_call_return_flags (call);
+	      if (rf & ERF_RETURNS_ARG)
+		{
+		  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+		  if (argnum < gimple_call_num_args (call)
+		      && ret_stmt)
+		    {
+		      tree arg = gimple_call_arg (call, argnum);
+		      tree retval = gimple_return_retval (ret_stmt);
+		      if (retval
+			  && TREE_CODE (retval) == SSA_NAME
+			  && operand_equal_p (retval, arg, 0)
+			  && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
+							TREE_TYPE (retval)))
+			{
+			  ass_var = make_ssa_name (TREE_TYPE (arg));
+			  gimple_call_set_lhs (call, ass_var);
+			  update_stmt (call);
+			  gimple_return_set_retval (ret_stmt, ass_var);
+			  update_stmt (ret_stmt);
+			}
+		    }
+		}
+	    }
 	  break;
 	}
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 11:42                 ` Prathamesh Kulkarni
@ 2016-12-01 12:10                   ` Richard Biener
  2016-12-01 12:50                     ` Prathamesh Kulkarni
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-12-01 12:10 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Jeff Law, gcc Patches

On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >
> >>> For the tail-call, issue should we artificially create a lhs and use that
> >>> as return value (perhaps by a separate pass before tailcall) ?
> >>>
> >>> __builtin_memcpy (a1, a2, a3);
> >>> return a1;
> >>>
> >>> gets transformed to:
> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >>> return _1;
> >>>
> >>> So tail-call optimization pass would see the IL in it's expected form.
> >>
> >>
> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> itself should do this rewrite.  But if this form is required to make
> >> things work (I suppose you checked it _does_ actually work?) then
> >> we'd need to make sure later passes do not undo it.  So it looks
> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> verified again there?
> >
> > So tail calling actually sits on the border between trees and RTL.
> > Essentially it's an expand-time decision as we use information from trees as
> > well as low level target information.
> >
> > I would not expect the former sequence to tail call.  The tail calling code
> > does not know that the return value from memcpy will be a1.  Thus the tail
> > calling code has to assume that it'll have to copy a1 into the return
> > register after returning from memcpy, which obviously can't be done if we
> > tail called memcpy.
> >
> > The second form is much more likely to turn into a tail call sequence
> > because the return value from memcpy will be sitting in the proper register.
> > This form out to work for most calling conventions that allow tail calls.
> >
> > We could (in theory) try and exploit the fact that memcpy returns its first
> > argument as a return value, but that would only be helpful on a target where
> > the first argument and return value use the same register. So I'd have a
> > slight preference to rewriting per Prathamesh's suggestion above since it's
> > more general.
> Thanks for the suggestion. The attached patch creates artificial lhs,
> and returns it if the function returns it's argument and that argument
> is used as return-value.
> 
> eg:
> f (void * a1, void * a2, long unsigned int a3)
> {
>   <bb 2> [0.0%]:
>   # .MEM_5 = VDEF <.MEM_1(D)>
>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>   # VUSE <.MEM_5>
>   return a1_2(D);
> 
> }
> 
> is transformed to:
> f (void * a1, void * a2, long unsigned int a3)
> {
>   void * _6;
> 
>   <bb 2> [0.0%]:
>   # .MEM_5 = VDEF <.MEM_1(D)>
>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>   # VUSE <.MEM_5>
>   return _6;
> 
> }
> 
> While testing, I came across an issue with function f() defined
> intail-padding1.C:
> struct X
> {
>   ~X() {}
>   int n;
>   char d;
> };
> 
> X f()
> {
>   X nrvo;
>   __builtin_memset (&nrvo, 0, sizeof(X));
>   return nrvo;
> }
> 
> input to the pass:
> X f() ()
> {
>   <bb 2> [0.0%]:
>   # .MEM_3 = VDEF <.MEM_1(D)>
>   __builtin_memset (nrvo_2(D), 0, 8);
>   # VUSE <.MEM_3>
>   return nrvo_2(D);
> 
> }
> 
> verify_gimple_return failed with:
> tail-padding1.C:13:1: error: invalid conversion in return statement
>  }
>  ^
> struct X
> 
> struct X &
> 
> # VUSE <.MEM_3>
> return _4;
> 
> It seems the return type of function (struct X) differs with the type
> of return value (struct X&).
> Not sure how this is possible ?

You need to honor DECL_BY_REFERENCE of DECL_RESULT.

> To work around that, I guarded the transform on:
> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>                                              TREE_TYPE (retval)))
> 
> in the patch. Does that look OK ?
> 
> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
> Cross-tested on arm*-*-*, aarch64*-*-*.
> 
> Thanks,
> Prathamesh
> >
> >
> > Jeff
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 12:10                   ` Richard Biener
@ 2016-12-01 12:50                     ` Prathamesh Kulkarni
  2016-12-01 12:56                       ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-12-01 12:50 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 4430 bytes --]

On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
>> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >
>> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >>>
>> >>> __builtin_memcpy (a1, a2, a3);
>> >>> return a1;
>> >>>
>> >>> gets transformed to:
>> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >>> return _1;
>> >>>
>> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >>
>> >>
>> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> itself should do this rewrite.  But if this form is required to make
>> >> things work (I suppose you checked it _does_ actually work?) then
>> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> verified again there?
>> >
>> > So tail calling actually sits on the border between trees and RTL.
>> > Essentially it's an expand-time decision as we use information from trees as
>> > well as low level target information.
>> >
>> > I would not expect the former sequence to tail call.  The tail calling code
>> > does not know that the return value from memcpy will be a1.  Thus the tail
>> > calling code has to assume that it'll have to copy a1 into the return
>> > register after returning from memcpy, which obviously can't be done if we
>> > tail called memcpy.
>> >
>> > The second form is much more likely to turn into a tail call sequence
>> > because the return value from memcpy will be sitting in the proper register.
>> > This form out to work for most calling conventions that allow tail calls.
>> >
>> > We could (in theory) try and exploit the fact that memcpy returns its first
>> > argument as a return value, but that would only be helpful on a target where
>> > the first argument and return value use the same register. So I'd have a
>> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> > more general.
>> Thanks for the suggestion. The attached patch creates artificial lhs,
>> and returns it if the function returns it's argument and that argument
>> is used as return-value.
>>
>> eg:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>   <bb 2> [0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return a1_2(D);
>>
>> }
>>
>> is transformed to:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>   void * _6;
>>
>>   <bb 2> [0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return _6;
>>
>> }
>>
>> While testing, I came across an issue with function f() defined
>> intail-padding1.C:
>> struct X
>> {
>>   ~X() {}
>>   int n;
>>   char d;
>> };
>>
>> X f()
>> {
>>   X nrvo;
>>   __builtin_memset (&nrvo, 0, sizeof(X));
>>   return nrvo;
>> }
>>
>> input to the pass:
>> X f() ()
>> {
>>   <bb 2> [0.0%]:
>>   # .MEM_3 = VDEF <.MEM_1(D)>
>>   __builtin_memset (nrvo_2(D), 0, 8);
>>   # VUSE <.MEM_3>
>>   return nrvo_2(D);
>>
>> }
>>
>> verify_gimple_return failed with:
>> tail-padding1.C:13:1: error: invalid conversion in return statement
>>  }
>>  ^
>> struct X
>>
>> struct X &
>>
>> # VUSE <.MEM_3>
>> return _4;
>>
>> It seems the return type of function (struct X) differs with the type
>> of return value (struct X&).
>> Not sure how this is possible ?
>
> You need to honor DECL_BY_REFERENCE of DECL_RESULT.
Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
resolved the error.
Does the attached version look OK ?
Validation in progress.

Thanks,
Prathamesh
>
>> To work around that, I guarded the transform on:
>> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>>                                              TREE_TYPE (retval)))
>>
>> in the patch. Does that look OK ?
>>
>> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> Cross-tested on arm*-*-*, aarch64*-*-*.
>>
>> Thanks,
>> Prathamesh
>> >
>> >
>> > Jeff
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

[-- Attachment #2: tailcall-6.diff --]
[-- Type: text/plain, Size: 2297 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 0000000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..a1c8bd7 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+	ret_stmt = dyn_cast<greturn *> (stmt);
 
       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,36 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
 	{
 	  call = as_a <gcall *> (stmt);
 	  ass_var = gimple_call_lhs (call);
+	  if (!ass_var)
+	    {
+	      /* Check if function returns one if it's arguments
+		 and that argument is used as return value.
+		 In that case create an artificial lhs to call_stmt,
+		 and set it as the return value.  */
+
+	      unsigned rf = gimple_call_return_flags (call);
+	      if (rf & ERF_RETURNS_ARG)
+		{
+		  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+		  if (argnum < gimple_call_num_args (call)
+		      && ret_stmt)
+		    {
+		      tree arg = gimple_call_arg (call, argnum);
+		      tree retval = gimple_return_retval (ret_stmt);
+		      if (retval
+			  && TREE_CODE (retval) == SSA_NAME
+			  && operand_equal_p (retval, arg, 0)
+			  && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)))
+			{
+			  ass_var = make_ssa_name (TREE_TYPE (arg));
+			  gimple_call_set_lhs (call, ass_var);
+			  update_stmt (call);
+			  gimple_return_set_retval (ret_stmt, ass_var);
+			  update_stmt (ret_stmt);
+			}
+		    }
+		}
+	    }
 	  break;
 	}
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 12:50                     ` Prathamesh Kulkarni
@ 2016-12-01 12:56                       ` Richard Biener
  2016-12-01 13:05                         ` Prathamesh Kulkarni
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-12-01 12:56 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Jeff Law, gcc Patches

On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >
> >> >>> For the tail-call, issue should we artificially create a lhs and use that
> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >>>
> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >>> return a1;
> >> >>>
> >> >>> gets transformed to:
> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >>> return _1;
> >> >>>
> >> >>> So tail-call optimization pass would see the IL in it's expected form.
> >> >>
> >> >>
> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> verified again there?
> >> >
> >> > So tail calling actually sits on the border between trees and RTL.
> >> > Essentially it's an expand-time decision as we use information from trees as
> >> > well as low level target information.
> >> >
> >> > I would not expect the former sequence to tail call.  The tail calling code
> >> > does not know that the return value from memcpy will be a1.  Thus the tail
> >> > calling code has to assume that it'll have to copy a1 into the return
> >> > register after returning from memcpy, which obviously can't be done if we
> >> > tail called memcpy.
> >> >
> >> > The second form is much more likely to turn into a tail call sequence
> >> > because the return value from memcpy will be sitting in the proper register.
> >> > This form out to work for most calling conventions that allow tail calls.
> >> >
> >> > We could (in theory) try and exploit the fact that memcpy returns its first
> >> > argument as a return value, but that would only be helpful on a target where
> >> > the first argument and return value use the same register. So I'd have a
> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
> >> > more general.
> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> and returns it if the function returns it's argument and that argument
> >> is used as return-value.
> >>
> >> eg:
> >> f (void * a1, void * a2, long unsigned int a3)
> >> {
> >>   <bb 2> [0.0%]:
> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >>   # VUSE <.MEM_5>
> >>   return a1_2(D);
> >>
> >> }
> >>
> >> is transformed to:
> >> f (void * a1, void * a2, long unsigned int a3)
> >> {
> >>   void * _6;
> >>
> >>   <bb 2> [0.0%]:
> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >>   # VUSE <.MEM_5>
> >>   return _6;
> >>
> >> }
> >>
> >> While testing, I came across an issue with function f() defined
> >> intail-padding1.C:
> >> struct X
> >> {
> >>   ~X() {}
> >>   int n;
> >>   char d;
> >> };
> >>
> >> X f()
> >> {
> >>   X nrvo;
> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >>   return nrvo;
> >> }
> >>
> >> input to the pass:
> >> X f() ()
> >> {
> >>   <bb 2> [0.0%]:
> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >>   # VUSE <.MEM_3>
> >>   return nrvo_2(D);
> >>
> >> }
> >>
> >> verify_gimple_return failed with:
> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >>  }
> >>  ^
> >> struct X
> >>
> >> struct X &
> >>
> >> # VUSE <.MEM_3>
> >> return _4;
> >>
> >> It seems the return type of function (struct X) differs with the type
> >> of return value (struct X&).
> >> Not sure how this is possible ?
> >
> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> resolved the error.
> Does the attached version look OK ?

+                         ass_var = make_ssa_name (TREE_TYPE (arg));

can you try

      ass_var = copy_ssa_name (arg);

instead?  That way the underlying decl should make sure the
DECL_BY_REFERENCE check in the IL verification works.

Thanks,
Richard.


> Validation in progress.
> 
> Thanks,
> Prathamesh
> >
> >> To work around that, I guarded the transform on:
> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >>                                              TREE_TYPE (retval)))
> >>
> >> in the patch. Does that look OK ?
> >>
> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> >
> >> > Jeff
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 12:56                       ` Richard Biener
@ 2016-12-01 13:05                         ` Prathamesh Kulkarni
  2016-12-01 13:08                           ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-12-01 13:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 5581 bytes --]

On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
>> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >
>> >> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >>>
>> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >>> return a1;
>> >> >>>
>> >> >>> gets transformed to:
>> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >>> return _1;
>> >> >>>
>> >> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >> >>
>> >> >>
>> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> verified again there?
>> >> >
>> >> > So tail calling actually sits on the border between trees and RTL.
>> >> > Essentially it's an expand-time decision as we use information from trees as
>> >> > well as low level target information.
>> >> >
>> >> > I would not expect the former sequence to tail call.  The tail calling code
>> >> > does not know that the return value from memcpy will be a1.  Thus the tail
>> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> > register after returning from memcpy, which obviously can't be done if we
>> >> > tail called memcpy.
>> >> >
>> >> > The second form is much more likely to turn into a tail call sequence
>> >> > because the return value from memcpy will be sitting in the proper register.
>> >> > This form out to work for most calling conventions that allow tail calls.
>> >> >
>> >> > We could (in theory) try and exploit the fact that memcpy returns its first
>> >> > argument as a return value, but that would only be helpful on a target where
>> >> > the first argument and return value use the same register. So I'd have a
>> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> >> > more general.
>> >> Thanks for the suggestion. The attached patch creates artificial lhs,
>> >> and returns it if the function returns it's argument and that argument
>> >> is used as return-value.
>> >>
>> >> eg:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return a1_2(D);
>> >>
>> >> }
>> >>
>> >> is transformed to:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   void * _6;
>> >>
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return _6;
>> >>
>> >> }
>> >>
>> >> While testing, I came across an issue with function f() defined
>> >> intail-padding1.C:
>> >> struct X
>> >> {
>> >>   ~X() {}
>> >>   int n;
>> >>   char d;
>> >> };
>> >>
>> >> X f()
>> >> {
>> >>   X nrvo;
>> >>   __builtin_memset (&nrvo, 0, sizeof(X));
>> >>   return nrvo;
>> >> }
>> >>
>> >> input to the pass:
>> >> X f() ()
>> >> {
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >>   # VUSE <.MEM_3>
>> >>   return nrvo_2(D);
>> >>
>> >> }
>> >>
>> >> verify_gimple_return failed with:
>> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >>  }
>> >>  ^
>> >> struct X
>> >>
>> >> struct X &
>> >>
>> >> # VUSE <.MEM_3>
>> >> return _4;
>> >>
>> >> It seems the return type of function (struct X) differs with the type
>> >> of return value (struct X&).
>> >> Not sure how this is possible ?
>> >
>> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> resolved the error.
>> Does the attached version look OK ?
>
> +                         ass_var = make_ssa_name (TREE_TYPE (arg));
>
> can you try
>
>       ass_var = copy_ssa_name (arg);
>
> instead?  That way the underlying decl should make sure the
> DECL_BY_REFERENCE check in the IL verification works.
Done in the attached version and verified tail-padding1.C passes with
the change.
Does it look OK ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>
>> Validation in progress.
>>
>> Thanks,
>> Prathamesh
>> >
>> >> To work around that, I guarded the transform on:
>> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>> >>                                              TREE_TYPE (retval)))
>> >>
>> >> in the patch. Does that look OK ?
>> >>
>> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >
>> >> > Jeff
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

[-- Attachment #2: tailcall-7.diff --]
[-- Type: text/plain, Size: 2285 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 0000000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..6135dc2 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+	ret_stmt = dyn_cast<greturn *> (stmt);
 
       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,36 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
 	{
 	  call = as_a <gcall *> (stmt);
 	  ass_var = gimple_call_lhs (call);
+	  if (!ass_var)
+	    {
+	      /* Check if function returns one if it's arguments
+		 and that argument is used as return value.
+		 In that case create an artificial lhs to call_stmt,
+		 and set it as the return value.  */
+
+	      unsigned rf = gimple_call_return_flags (call);
+	      if (rf & ERF_RETURNS_ARG)
+		{
+		  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+		  if (argnum < gimple_call_num_args (call)
+		      && ret_stmt)
+		    {
+		      tree arg = gimple_call_arg (call, argnum);
+		      tree retval = gimple_return_retval (ret_stmt);
+		      if (retval
+			  && TREE_CODE (retval) == SSA_NAME
+			  && operand_equal_p (retval, arg, 0)
+			  && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)))
+			{
+			  ass_var = copy_ssa_name (arg);
+			  gimple_call_set_lhs (call, ass_var);
+			  update_stmt (call);
+			  gimple_return_set_retval (ret_stmt, ass_var);
+			  update_stmt (ret_stmt);
+			}
+		    }
+		}
+	    }
 	  break;
 	}
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 13:05                         ` Prathamesh Kulkarni
@ 2016-12-01 13:08                           ` Richard Biener
  2016-12-01 13:18                             ` Prathamesh Kulkarni
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-12-01 13:08 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Jeff Law, gcc Patches

On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >
> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that
> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >>>
> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >>> return a1;
> >> >> >>>
> >> >> >>> gets transformed to:
> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >>> return _1;
> >> >> >>>
> >> >> >>> So tail-call optimization pass would see the IL in it's expected form.
> >> >> >>
> >> >> >>
> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> verified again there?
> >> >> >
> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> > Essentially it's an expand-time decision as we use information from trees as
> >> >> > well as low level target information.
> >> >> >
> >> >> > I would not expect the former sequence to tail call.  The tail calling code
> >> >> > does not know that the return value from memcpy will be a1.  Thus the tail
> >> >> > calling code has to assume that it'll have to copy a1 into the return
> >> >> > register after returning from memcpy, which obviously can't be done if we
> >> >> > tail called memcpy.
> >> >> >
> >> >> > The second form is much more likely to turn into a tail call sequence
> >> >> > because the return value from memcpy will be sitting in the proper register.
> >> >> > This form out to work for most calling conventions that allow tail calls.
> >> >> >
> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first
> >> >> > argument as a return value, but that would only be helpful on a target where
> >> >> > the first argument and return value use the same register. So I'd have a
> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
> >> >> > more general.
> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> >> and returns it if the function returns it's argument and that argument
> >> >> is used as return-value.
> >> >>
> >> >> eg:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return a1_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> is transformed to:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   void * _6;
> >> >>
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return _6;
> >> >>
> >> >> }
> >> >>
> >> >> While testing, I came across an issue with function f() defined
> >> >> intail-padding1.C:
> >> >> struct X
> >> >> {
> >> >>   ~X() {}
> >> >>   int n;
> >> >>   char d;
> >> >> };
> >> >>
> >> >> X f()
> >> >> {
> >> >>   X nrvo;
> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >> >>   return nrvo;
> >> >> }
> >> >>
> >> >> input to the pass:
> >> >> X f() ()
> >> >> {
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >>   # VUSE <.MEM_3>
> >> >>   return nrvo_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> verify_gimple_return failed with:
> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >>  }
> >> >>  ^
> >> >> struct X
> >> >>
> >> >> struct X &
> >> >>
> >> >> # VUSE <.MEM_3>
> >> >> return _4;
> >> >>
> >> >> It seems the return type of function (struct X) differs with the type
> >> >> of return value (struct X&).
> >> >> Not sure how this is possible ?
> >> >
> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> >> resolved the error.
> >> Does the attached version look OK ?
> >
> > +                         ass_var = make_ssa_name (TREE_TYPE (arg));
> >
> > can you try
> >
> >       ass_var = copy_ssa_name (arg);
> >
> > instead?  That way the underlying decl should make sure the
> > DECL_BY_REFERENCE check in the IL verification works.
> Done in the attached version and verified tail-padding1.C passes with
> the change.
> Does it look OK ?

+         if (!ass_var)
+           {
+             /* Check if function returns one if it's arguments
+                and that argument is used as return value.
+                In that case create an artificial lhs to call_stmt,
+                and set it as the return value.  */
+
+             unsigned rf = gimple_call_return_flags (call);
+             if (rf & ERF_RETURNS_ARG)
+               {
+                 unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+                 if (argnum < gimple_call_num_args (call)
+                     && ret_stmt)

check && ret_stmt above together with !ass_var.

+                   {
+                     tree arg = gimple_call_arg (call, argnum);
+                     tree retval = gimple_return_retval (ret_stmt);
+                     if (retval
+                         && TREE_CODE (retval) == SSA_NAME
+                         && operand_equal_p (retval, arg, 0)
+                         && !DECL_BY_REFERENCE (DECL_RESULT 
(cfun->decl)))

the DECL_BY_REFERENCE check can be removed now (I think).

Richard.

> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> >
> >> Validation in progress.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> >> To work around that, I guarded the transform on:
> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >> >>                                              TREE_TYPE (retval)))
> >> >>
> >> >> in the patch. Does that look OK ?
> >> >>
> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> >
> >> >> > Jeff
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguenther@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 13:08                           ` Richard Biener
@ 2016-12-01 13:18                             ` Prathamesh Kulkarni
  2016-12-01 13:22                               ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-12-01 13:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc Patches

On 1 December 2016 at 18:38, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
>> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >> >
>> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >> >>>
>> >> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >> >>> return a1;
>> >> >> >>>
>> >> >> >>> gets transformed to:
>> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >> >>> return _1;
>> >> >> >>>
>> >> >> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >> >> >>
>> >> >> >>
>> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> >> verified again there?
>> >> >> >
>> >> >> > So tail calling actually sits on the border between trees and RTL.
>> >> >> > Essentially it's an expand-time decision as we use information from trees as
>> >> >> > well as low level target information.
>> >> >> >
>> >> >> > I would not expect the former sequence to tail call.  The tail calling code
>> >> >> > does not know that the return value from memcpy will be a1.  Thus the tail
>> >> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> >> > register after returning from memcpy, which obviously can't be done if we
>> >> >> > tail called memcpy.
>> >> >> >
>> >> >> > The second form is much more likely to turn into a tail call sequence
>> >> >> > because the return value from memcpy will be sitting in the proper register.
>> >> >> > This form out to work for most calling conventions that allow tail calls.
>> >> >> >
>> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first
>> >> >> > argument as a return value, but that would only be helpful on a target where
>> >> >> > the first argument and return value use the same register. So I'd have a
>> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> >> >> > more general.
>> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
>> >> >> and returns it if the function returns it's argument and that argument
>> >> >> is used as return-value.
>> >> >>
>> >> >> eg:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >>   <bb 2> [0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return a1_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> is transformed to:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >>   void * _6;
>> >> >>
>> >> >>   <bb 2> [0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return _6;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> While testing, I came across an issue with function f() defined
>> >> >> intail-padding1.C:
>> >> >> struct X
>> >> >> {
>> >> >>   ~X() {}
>> >> >>   int n;
>> >> >>   char d;
>> >> >> };
>> >> >>
>> >> >> X f()
>> >> >> {
>> >> >>   X nrvo;
>> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
>> >> >>   return nrvo;
>> >> >> }
>> >> >>
>> >> >> input to the pass:
>> >> >> X f() ()
>> >> >> {
>> >> >>   <bb 2> [0.0%]:
>> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >> >>   # VUSE <.MEM_3>
>> >> >>   return nrvo_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> verify_gimple_return failed with:
>> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >> >>  }
>> >> >>  ^
>> >> >> struct X
>> >> >>
>> >> >> struct X &
>> >> >>
>> >> >> # VUSE <.MEM_3>
>> >> >> return _4;
>> >> >>
>> >> >> It seems the return type of function (struct X) differs with the type
>> >> >> of return value (struct X&).
>> >> >> Not sure how this is possible ?
>> >> >
>> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> >> resolved the error.
>> >> Does the attached version look OK ?
>> >
>> > +                         ass_var = make_ssa_name (TREE_TYPE (arg));
>> >
>> > can you try
>> >
>> >       ass_var = copy_ssa_name (arg);
>> >
>> > instead?  That way the underlying decl should make sure the
>> > DECL_BY_REFERENCE check in the IL verification works.
>> Done in the attached version and verified tail-padding1.C passes with
>> the change.
>> Does it look OK ?
>
> +         if (!ass_var)
> +           {
> +             /* Check if function returns one if it's arguments
> +                and that argument is used as return value.
> +                In that case create an artificial lhs to call_stmt,
> +                and set it as the return value.  */
> +
> +             unsigned rf = gimple_call_return_flags (call);
> +             if (rf & ERF_RETURNS_ARG)
> +               {
> +                 unsigned argnum = rf & ERF_RETURN_ARG_MASK;
> +                 if (argnum < gimple_call_num_args (call)
> +                     && ret_stmt)
>
> check && ret_stmt above together with !ass_var.
Oops, sorry about that.
>
> +                   {
> +                     tree arg = gimple_call_arg (call, argnum);
> +                     tree retval = gimple_return_retval (ret_stmt);
> +                     if (retval
> +                         && TREE_CODE (retval) == SSA_NAME
> +                         && operand_equal_p (retval, arg, 0)
> +                         && !DECL_BY_REFERENCE (DECL_RESULT
> (cfun->decl)))
>
> the DECL_BY_REFERENCE check can be removed now (I think).
Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
differently:

tail-padding1.C:13:1: error: RESULT_DECL should be read only when
DECL_BY_REFERENCE is set
 }
 ^
while verifying SSA_NAME nrvo_4 in statement
# .MEM_3 = VDEF <.MEM_1(D)>
    nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
tail-padding1.C:13:1: internal compiler error: verify_ssa failed

Thanks,
Prathamesh
>
> Richard.
>
>> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks,
>> > Richard.
>> >
>> >
>> >> Validation in progress.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >> To work around that, I guarded the transform on:
>> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>> >> >>                                              TREE_TYPE (retval)))
>> >> >>
>> >> >> in the patch. Does that look OK ?
>> >> >>
>> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >> >>
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> >
>> >> >> > Jeff
>> >> >>
>> >> >
>> >> > --
>> >> > Richard Biener <rguenther@suse.de>
>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 13:18                             ` Prathamesh Kulkarni
@ 2016-12-01 13:22                               ` Richard Biener
  2016-12-01 22:27                                 ` Jeff Law
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-12-01 13:22 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Jeff Law, gcc Patches

On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:38, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
> >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
> >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >> >
> >> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that
> >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >> >>>
> >> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >> >>> return a1;
> >> >> >> >>>
> >> >> >> >>> gets transformed to:
> >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >> >>> return _1;
> >> >> >> >>>
> >> >> >> >>> So tail-call optimization pass would see the IL in it's expected form.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> >> verified again there?
> >> >> >> >
> >> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> >> > Essentially it's an expand-time decision as we use information from trees as
> >> >> >> > well as low level target information.
> >> >> >> >
> >> >> >> > I would not expect the former sequence to tail call.  The tail calling code
> >> >> >> > does not know that the return value from memcpy will be a1.  Thus the tail
> >> >> >> > calling code has to assume that it'll have to copy a1 into the return
> >> >> >> > register after returning from memcpy, which obviously can't be done if we
> >> >> >> > tail called memcpy.
> >> >> >> >
> >> >> >> > The second form is much more likely to turn into a tail call sequence
> >> >> >> > because the return value from memcpy will be sitting in the proper register.
> >> >> >> > This form out to work for most calling conventions that allow tail calls.
> >> >> >> >
> >> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first
> >> >> >> > argument as a return value, but that would only be helpful on a target where
> >> >> >> > the first argument and return value use the same register. So I'd have a
> >> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
> >> >> >> > more general.
> >> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> >> >> and returns it if the function returns it's argument and that argument
> >> >> >> is used as return-value.
> >> >> >>
> >> >> >> eg:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >>   <bb 2> [0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return a1_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> is transformed to:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >>   void * _6;
> >> >> >>
> >> >> >>   <bb 2> [0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return _6;
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> While testing, I came across an issue with function f() defined
> >> >> >> intail-padding1.C:
> >> >> >> struct X
> >> >> >> {
> >> >> >>   ~X() {}
> >> >> >>   int n;
> >> >> >>   char d;
> >> >> >> };
> >> >> >>
> >> >> >> X f()
> >> >> >> {
> >> >> >>   X nrvo;
> >> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >> >> >>   return nrvo;
> >> >> >> }
> >> >> >>
> >> >> >> input to the pass:
> >> >> >> X f() ()
> >> >> >> {
> >> >> >>   <bb 2> [0.0%]:
> >> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >> >>   # VUSE <.MEM_3>
> >> >> >>   return nrvo_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> verify_gimple_return failed with:
> >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >> >>  }
> >> >> >>  ^
> >> >> >> struct X
> >> >> >>
> >> >> >> struct X &
> >> >> >>
> >> >> >> # VUSE <.MEM_3>
> >> >> >> return _4;
> >> >> >>
> >> >> >> It seems the return type of function (struct X) differs with the type
> >> >> >> of return value (struct X&).
> >> >> >> Not sure how this is possible ?
> >> >> >
> >> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> >> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> >> >> resolved the error.
> >> >> Does the attached version look OK ?
> >> >
> >> > +                         ass_var = make_ssa_name (TREE_TYPE (arg));
> >> >
> >> > can you try
> >> >
> >> >       ass_var = copy_ssa_name (arg);
> >> >
> >> > instead?  That way the underlying decl should make sure the
> >> > DECL_BY_REFERENCE check in the IL verification works.
> >> Done in the attached version and verified tail-padding1.C passes with
> >> the change.
> >> Does it look OK ?
> >
> > +         if (!ass_var)
> > +           {
> > +             /* Check if function returns one if it's arguments
> > +                and that argument is used as return value.
> > +                In that case create an artificial lhs to call_stmt,
> > +                and set it as the return value.  */
> > +
> > +             unsigned rf = gimple_call_return_flags (call);
> > +             if (rf & ERF_RETURNS_ARG)
> > +               {
> > +                 unsigned argnum = rf & ERF_RETURN_ARG_MASK;
> > +                 if (argnum < gimple_call_num_args (call)
> > +                     && ret_stmt)
> >
> > check && ret_stmt above together with !ass_var.
> Oops, sorry about that.
> >
> > +                   {
> > +                     tree arg = gimple_call_arg (call, argnum);
> > +                     tree retval = gimple_return_retval (ret_stmt);
> > +                     if (retval
> > +                         && TREE_CODE (retval) == SSA_NAME
> > +                         && operand_equal_p (retval, arg, 0)
> > +                         && !DECL_BY_REFERENCE (DECL_RESULT
> > (cfun->decl)))
> >
> > the DECL_BY_REFERENCE check can be removed now (I think).
> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
> differently:
> 
> tail-padding1.C:13:1: error: RESULT_DECL should be read only when
> DECL_BY_REFERENCE is set
>  }
>  ^
> while verifying SSA_NAME nrvo_4 in statement
> # .MEM_3 = VDEF <.MEM_1(D)>
>     nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
> tail-padding1.C:13:1: internal compiler error: verify_ssa failed

Hmm, ok.  Not sure why we enforce this.

Note that in the end this patch looks fishy -- iff we really need
the LHS on the assignment for correctness if we have the tailcall
flag set then what guarantees that later passes do not remove it
again?  So anybody removing a LHS would need to unset the tailcall flag?

Saying again that I don't know enough about the RTL part of tailcall
expansion.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Thanks,
> >> > Richard.
> >> >
> >> >
> >> >> Validation in progress.
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> >> To work around that, I guarded the transform on:
> >> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >> >> >>                                              TREE_TYPE (retval)))
> >> >> >>
> >> >> >> in the patch. Does that look OK ?
> >> >> >>
> >> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
> >> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Prathamesh
> >> >> >> >
> >> >> >> >
> >> >> >> > Jeff
> >> >> >>
> >> >> >
> >> >> > --
> >> >> > Richard Biener <rguenther@suse.de>
> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguenther@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 13:22                               ` Richard Biener
@ 2016-12-01 22:27                                 ` Jeff Law
  2016-12-02  6:04                                   ` Prathamesh Kulkarni
  2016-12-02  8:33                                   ` Richard Biener
  0 siblings, 2 replies; 32+ messages in thread
From: Jeff Law @ 2016-12-01 22:27 UTC (permalink / raw)
  To: Richard Biener, Prathamesh Kulkarni; +Cc: gcc Patches

On 12/01/2016 06:22 AM, Richard Biener wrote:
>> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
>> differently:
>>
>> tail-padding1.C:13:1: error: RESULT_DECL should be read only when
>> DECL_BY_REFERENCE is set
>>  }
>>  ^
>> while verifying SSA_NAME nrvo_4 in statement
>> # .MEM_3 = VDEF <.MEM_1(D)>
>>     nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
>> tail-padding1.C:13:1: internal compiler error: verify_ssa failed
>
> Hmm, ok.  Not sure why we enforce this.
I don't know either.  But I would start by looking at tree-nrv.c since 
it looks (based on the variable names) that the named-value-return 
optimization kicked in.

>
> Note that in the end this patch looks fishy -- iff we really need
> the LHS on the assignment for correctness if we have the tailcall
> flag set then what guarantees that later passes do not remove it
> again?  So anybody removing a LHS would need to unset the tailcall flag?
>
> Saying again that I don't know enough about the RTL part of tailcall
> expansion.
The LHS on the assignment makes it easier to identify when a tail call 
is possible.  It's not needed for correctness.  Not having the LHS on 
the assignment just means we won't get an optimized tail call.

Under what circumstances would the LHS possibly be removed?  We know the 
return statement references the LHS, so it's not going to be something 
that DCE will do.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 22:27                                 ` Jeff Law
@ 2016-12-02  6:04                                   ` Prathamesh Kulkarni
  2016-12-02  8:33                                   ` Richard Biener
  1 sibling, 0 replies; 32+ messages in thread
From: Prathamesh Kulkarni @ 2016-12-02  6:04 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 2339 bytes --]

On 2 December 2016 at 03:57, Jeff Law <law@redhat.com> wrote:
> On 12/01/2016 06:22 AM, Richard Biener wrote:
>>>
>>> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
>>> differently:
>>>
>>> tail-padding1.C:13:1: error: RESULT_DECL should be read only when
>>> DECL_BY_REFERENCE is set
>>>  }
>>>  ^
>>> while verifying SSA_NAME nrvo_4 in statement
>>> # .MEM_3 = VDEF <.MEM_1(D)>
>>>     nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
>>> tail-padding1.C:13:1: internal compiler error: verify_ssa failed
>>
>>
>> Hmm, ok.  Not sure why we enforce this.
>
> I don't know either.  But I would start by looking at tree-nrv.c since it
> looks (based on the variable names) that the named-value-return optimization
> kicked in.
Um, the name nrv0 was in the test-case itself. The transform takes
place in tailr1 pass,
which appears to be before nrv, so possibly this is not related to nrv ?

The verify check seems to be added in r161898 by Honza to fix PR 44813
based on Richard's following suggestion from
https://gcc.gnu.org/ml/gcc-patches/2010-07/msg00358.html:

"We should never see a defintion of a RESULT_DECL SSA name for
DECL_BY_REFERENCE RESULT_DECLs (that would
be a bug - we should add verification to the SSA verifier, can you
do add that?)."

The attached patch moves && ret_stmt together with !ass_var,
and keeps the !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
check, and adjusts tailcall-9.c testcase to scan _\[0-9\]* = __builtin_memcpy
in tailr1 dump since that's where the transform takes place.
Is this version OK ?

Thanks,
Prathamesh
>
>>
>> Note that in the end this patch looks fishy -- iff we really need
>> the LHS on the assignment for correctness if we have the tailcall
>> flag set then what guarantees that later passes do not remove it
>> again?  So anybody removing a LHS would need to unset the tailcall flag?
>>
>> Saying again that I don't know enough about the RTL part of tailcall
>> expansion.
>
> The LHS on the assignment makes it easier to identify when a tail call is
> possible.  It's not needed for correctness.  Not having the LHS on the
> assignment just means we won't get an optimized tail call.
>
> Under what circumstances would the LHS possibly be removed?  We know the
> return statement references the LHS, so it's not going to be something that
> DCE will do.
>
> jeff

[-- Attachment #2: tailcall-8.diff --]
[-- Type: text/plain, Size: 2282 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 0000000..9c482f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailr-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump "_\[0-9\]* = __builtin_memcpy" "tailr1" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..64f624f 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+	ret_stmt = dyn_cast<greturn *> (stmt);
 
       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,35 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
 	{
 	  call = as_a <gcall *> (stmt);
 	  ass_var = gimple_call_lhs (call);
+	  if (!ass_var && ret_stmt)
+	    {
+	      /* Check if function returns one if it's arguments
+		 and that argument is used as return value.
+		 In that case create an artificial lhs to call_stmt,
+		 and set it as the return value.  */
+
+	      unsigned rf = gimple_call_return_flags (call);
+	      if (rf & ERF_RETURNS_ARG)
+		{
+		  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+		  if (argnum < gimple_call_num_args (call))
+		    {
+		      tree arg = gimple_call_arg (call, argnum);
+		      tree retval = gimple_return_retval (ret_stmt);
+		      if (retval
+			  && TREE_CODE (retval) == SSA_NAME
+			  && operand_equal_p (retval, arg, 0)
+			  && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)))
+			{
+			  ass_var = copy_ssa_name (arg);
+			  gimple_call_set_lhs (call, ass_var);
+			  update_stmt (call);
+			  gimple_return_set_retval (ret_stmt, ass_var);
+			  update_stmt (ret_stmt);
+			}
+		    }
+		}
+	    }
 	  break;
 	}
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-01 22:27                                 ` Jeff Law
  2016-12-02  6:04                                   ` Prathamesh Kulkarni
@ 2016-12-02  8:33                                   ` Richard Biener
  2016-12-05 22:53                                     ` Jeff Law
  1 sibling, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-12-02  8:33 UTC (permalink / raw)
  To: Jeff Law; +Cc: Prathamesh Kulkarni, gcc Patches

On Thu, 1 Dec 2016, Jeff Law wrote:

> On 12/01/2016 06:22 AM, Richard Biener wrote:
> > > Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
> > > differently:
> > > 
> > > tail-padding1.C:13:1: error: RESULT_DECL should be read only when
> > > DECL_BY_REFERENCE is set
> > >  }
> > >  ^
> > > while verifying SSA_NAME nrvo_4 in statement
> > > # .MEM_3 = VDEF <.MEM_1(D)>
> > >     nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
> > > tail-padding1.C:13:1: internal compiler error: verify_ssa failed
> > 
> > Hmm, ok.  Not sure why we enforce this.
> I don't know either.  But I would start by looking at tree-nrv.c since it
> looks (based on the variable names) that the named-value-return optimization
> kicked in.
> 
> > 
> > Note that in the end this patch looks fishy -- iff we really need
> > the LHS on the assignment for correctness if we have the tailcall
> > flag set then what guarantees that later passes do not remove it
> > again?  So anybody removing a LHS would need to unset the tailcall flag?
> > 
> > Saying again that I don't know enough about the RTL part of tailcall
> > expansion.
> The LHS on the assignment makes it easier to identify when a tail call is
> possible.  It's not needed for correctness.  Not having the LHS on the
> assignment just means we won't get an optimized tail call.
> 
> Under what circumstances would the LHS possibly be removed?  We know the
> return statement references the LHS, so it's not going to be something that
> DCE will do.

Well, I thought Prathamesh added the optimization to copy-propagate
the lhs from the returned argument.  So we'd have both transforms here.

Of course as always the user could have written the code in this way.

If the LHS is not required for correctness then I don't think we need
to put it there - Pratamesh verified the call is tail-called already
if marked by the tailcall pass, even if the LHS is not present.

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-02  8:33                                   ` Richard Biener
@ 2016-12-05 22:53                                     ` Jeff Law
  2016-12-06  8:25                                       ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Law @ 2016-12-05 22:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc Patches

On 12/02/2016 01:33 AM, Richard Biener wrote:
>> The LHS on the assignment makes it easier to identify when a tail call is
>> possible.  It's not needed for correctness.  Not having the LHS on the
>> assignment just means we won't get an optimized tail call.
>>
>> Under what circumstances would the LHS possibly be removed?  We know the
>> return statement references the LHS, so it's not going to be something that
>> DCE will do.
>
> Well, I thought Prathamesh added the optimization to copy-propagate
> the lhs from the returned argument.  So we'd have both transforms here.
That seems like a mistake -- the fact that we can copy propagate the LHS 
from the returned argument is interesting, but in practice I've found it 
to not be useful to do so.

The problem is it makes the value look live across a the call and we're 
then dependent upon the register allocator to know the trick about the 
returned argument value and apply it consistently -- which it does not 
last I checked.

I think we're better off leaving the call in the form of LHS = call () 
if the return value is used.  That's going to be more palatable to tail 
calling.

>
> Of course as always the user could have written the code in this way.
>
> If the LHS is not required for correctness then I don't think we need
> to put it there - Pratamesh verified the call is tail-called already
> if marked by the tailcall pass, even if the LHS is not present.
But if the function returns the value from the tail call, then going 
through an LHS is the right thing to do.  Using the magic "argX will be 
the return value" seems clever, but actually hurts in practice.
jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-05 22:53                                     ` Jeff Law
@ 2016-12-06  8:25                                       ` Richard Biener
  2016-12-06 10:16                                         ` Richard Biener
  2016-12-07 16:54                                         ` Jeff Law
  0 siblings, 2 replies; 32+ messages in thread
From: Richard Biener @ 2016-12-06  8:25 UTC (permalink / raw)
  To: Jeff Law; +Cc: Prathamesh Kulkarni, gcc Patches

On Mon, 5 Dec 2016, Jeff Law wrote:

> On 12/02/2016 01:33 AM, Richard Biener wrote:
> > > The LHS on the assignment makes it easier to identify when a tail call is
> > > possible.  It's not needed for correctness.  Not having the LHS on the
> > > assignment just means we won't get an optimized tail call.
> > > 
> > > Under what circumstances would the LHS possibly be removed?  We know the
> > > return statement references the LHS, so it's not going to be something
> > > that
> > > DCE will do.
> > 
> > Well, I thought Prathamesh added the optimization to copy-propagate
> > the lhs from the returned argument.  So we'd have both transforms here.
> That seems like a mistake -- the fact that we can copy propagate the LHS from
> the returned argument is interesting, but in practice I've found it to not be
> useful to do so.
> 
> The problem is it makes the value look live across a the call and we're then
> dependent upon the register allocator to know the trick about the returned
> argument value and apply it consistently -- which it does not last I checked.
> 
> I think we're better off leaving the call in the form of LHS = call () if the
> return value is used.  That's going to be more palatable to tail calling.

Yes, that's something I also raised earlier in the thread.  Note that
any kind of value-numbering probably wants to know the equivalence
for simplifications but in the end wants to disable propagating the
copy (in fact it should propagate the return value from the point of
the call).  I suppose I know how to implement that in FRE/PRE given it has
separate value-numbering and elimination phases.  Something for GCC 8.

> > Of course as always the user could have written the code in this way.
> > 
> > If the LHS is not required for correctness then I don't think we need
> > to put it there - Pratamesh verified the call is tail-called already
> > if marked by the tailcall pass, even if the LHS is not present.
> But if the function returns the value from the tail call, then going through
> an LHS is the right thing to do.  Using the magic "argX will be the return
> value" seems clever, but actually hurts in practice.

So we do want the reverse transform (for the case of returning by
reference that's going to be tricky if not impossible due to the
IL hygiene we enforce).

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-06  8:25                                       ` Richard Biener
@ 2016-12-06 10:16                                         ` Richard Biener
  2016-12-07 21:20                                           ` Jeff Law
  2016-12-07 16:54                                         ` Jeff Law
  1 sibling, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-12-06 10:16 UTC (permalink / raw)
  To: Jeff Law; +Cc: Prathamesh Kulkarni, gcc Patches

On Tue, 6 Dec 2016, Richard Biener wrote:

> On Mon, 5 Dec 2016, Jeff Law wrote:
> 
> > On 12/02/2016 01:33 AM, Richard Biener wrote:
> > > > The LHS on the assignment makes it easier to identify when a tail call is
> > > > possible.  It's not needed for correctness.  Not having the LHS on the
> > > > assignment just means we won't get an optimized tail call.
> > > > 
> > > > Under what circumstances would the LHS possibly be removed?  We know the
> > > > return statement references the LHS, so it's not going to be something
> > > > that
> > > > DCE will do.
> > > 
> > > Well, I thought Prathamesh added the optimization to copy-propagate
> > > the lhs from the returned argument.  So we'd have both transforms here.
> > That seems like a mistake -- the fact that we can copy propagate the LHS from
> > the returned argument is interesting, but in practice I've found it to not be
> > useful to do so.
> > 
> > The problem is it makes the value look live across a the call and we're then
> > dependent upon the register allocator to know the trick about the returned
> > argument value and apply it consistently -- which it does not last I checked.
> > 
> > I think we're better off leaving the call in the form of LHS = call () if the
> > return value is used.  That's going to be more palatable to tail calling.
> 
> Yes, that's something I also raised earlier in the thread.  Note that
> any kind of value-numbering probably wants to know the equivalence
> for simplifications but in the end wants to disable propagating the
> copy (in fact it should propagate the return value from the point of
> the call).  I suppose I know how to implement that in FRE/PRE given it has
> separate value-numbering and elimination phases.  Something for GCC 8.

The following does that (it shows we don't handle separating LHS
and overall stmt effect very well).  It optimizes a testcase like

void *foo (void *p, int c, __SIZE_TYPE__ n)
{
  void *q = __builtin_memset (p, c, n);
  if (q == p)
    return p;
  return q;
}

to

foo (void * p, int c, long unsigned int n)
{
  void * q;

  <bb 2> [0.0%]:
  q_7 = __builtin_memset (p_3(D), c_4(D), n_5(D));
  return q_7;

in early FRE.

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c  (revision 243282)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4065,10 +4068,11 @@ eliminate_avail (tree op)
   tree valnum = VN_INFO (op)->valnum;
   if (TREE_CODE (valnum) == SSA_NAME)
     {
+      if (el_avail.length () > SSA_NAME_VERSION (valnum)
+         && el_avail[SSA_NAME_VERSION (valnum)])
+       return el_avail[SSA_NAME_VERSION (valnum)];
       if (SSA_NAME_IS_DEFAULT_DEF (valnum))
        return valnum;
-      if (el_avail.length () > SSA_NAME_VERSION (valnum))
-       return el_avail[SSA_NAME_VERSION (valnum)];
     }
   else if (is_gimple_min_invariant (valnum))
     return valnum;
@@ -4263,6 +4275,14 @@ eliminate_dom_walker::before_dom_childre
                eliminate_push_avail (sprime);
            }
 
+         /* While we might have value numbered a call return value to
+            one of its arguments the call itself might not be solely
+            represented by its return value.  Thus do not ignore
+            side-effects indicated by a VARYING vdef.  */
+         if (gimple_vdef (stmt)
+             && VN_INFO (gimple_vdef (stmt))->valnum == gimple_vdef 
(stmt))
+           sprime = NULL_TREE;
+
          /* If this now constitutes a copy duplicate points-to
             and range info appropriately.  This is especially
             important for inserted code.  See tree-ssa-copy.c
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c        (revision 243282)
+++ gcc/tree-ssa-sccvn.c        (working copy)
@@ -3951,6 +4061,25 @@ visit_use (tree use)
              changed = defs_to_varying (call_stmt);
              goto done;
            }
+
+         int rflags = gimple_call_return_flags (call_stmt);
+         if (rflags & ERF_RETURNS_ARG)
+           {
+             unsigned argnum = rflags & ERF_RETURN_ARG_MASK;
+             if (argnum < gimple_call_num_args (call_stmt))
+               {
+                 tree arg = gimple_call_arg (call_stmt, argnum);
+                 if (TREE_CODE (arg) == SSA_NAME
+                     || is_gimple_min_invariant (arg))
+                   {
+                     changed = visit_copy (lhs, arg);
+                     if (gimple_vdef (call_stmt))
+                       changed |= set_ssa_val_to (gimple_vdef 
(call_stmt),
+                                                  gimple_vdef 
(call_stmt));
+                     goto done;
+                   }
+               }
+           }
        }
 
       /* Pick up flags from a devirtualization target.  */


> > > Of course as always the user could have written the code in this way.
> > > 
> > > If the LHS is not required for correctness then I don't think we need
> > > to put it there - Pratamesh verified the call is tail-called already
> > > if marked by the tailcall pass, even if the LHS is not present.
> > But if the function returns the value from the tail call, then going through
> > an LHS is the right thing to do.  Using the magic "argX will be the return
> > value" seems clever, but actually hurts in practice.
> 
> So we do want the reverse transform (for the case of returning by
> reference that's going to be tricky if not impossible due to the
> IL hygiene we enforce).
> 
> Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-06  8:25                                       ` Richard Biener
  2016-12-06 10:16                                         ` Richard Biener
@ 2016-12-07 16:54                                         ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff Law @ 2016-12-07 16:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc Patches

On 12/06/2016 01:25 AM, Richard Biener wrote:
>> But if the function returns the value from the tail call, then going through
>> an LHS is the right thing to do.  Using the magic "argX will be the return
>> value" seems clever, but actually hurts in practice.
>
> So we do want the reverse transform (for the case of returning by
> reference that's going to be tricky if not impossible due to the
> IL hygiene we enforce).
It might be useful, but I'd like to see instrumentation before doing any 
significant work on this problem.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-06 10:16                                         ` Richard Biener
@ 2016-12-07 21:20                                           ` Jeff Law
  2016-12-09  8:10                                             ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Law @ 2016-12-07 21:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc Patches

On 12/06/2016 03:16 AM, Richard Biener wrote:
> On Tue, 6 Dec 2016, Richard Biener wrote:
>
>> On Mon, 5 Dec 2016, Jeff Law wrote:
>>
>>> On 12/02/2016 01:33 AM, Richard Biener wrote:
>>>>> The LHS on the assignment makes it easier to identify when a tail call is
>>>>> possible.  It's not needed for correctness.  Not having the LHS on the
>>>>> assignment just means we won't get an optimized tail call.
>>>>>
>>>>> Under what circumstances would the LHS possibly be removed?  We know the
>>>>> return statement references the LHS, so it's not going to be something
>>>>> that
>>>>> DCE will do.
>>>>
>>>> Well, I thought Prathamesh added the optimization to copy-propagate
>>>> the lhs from the returned argument.  So we'd have both transforms here.
>>> That seems like a mistake -- the fact that we can copy propagate the LHS from
>>> the returned argument is interesting, but in practice I've found it to not be
>>> useful to do so.
>>>
>>> The problem is it makes the value look live across a the call and we're then
>>> dependent upon the register allocator to know the trick about the returned
>>> argument value and apply it consistently -- which it does not last I checked.
>>>
>>> I think we're better off leaving the call in the form of LHS = call () if the
>>> return value is used.  That's going to be more palatable to tail calling.
>>
>> Yes, that's something I also raised earlier in the thread.  Note that
>> any kind of value-numbering probably wants to know the equivalence
>> for simplifications but in the end wants to disable propagating the
>> copy (in fact it should propagate the return value from the point of
>> the call).  I suppose I know how to implement that in FRE/PRE given it has
>> separate value-numbering and elimination phases.  Something for GCC 8.
>
> The following does that (it shows we don't handle separating LHS
> and overall stmt effect very well).  It optimizes a testcase like
>
> void *foo (void *p, int c, __SIZE_TYPE__ n)
> {
>   void *q = __builtin_memset (p, c, n);
>   if (q == p)
>     return p;
>   return q;
> }
>
> to
>
> foo (void * p, int c, long unsigned int n)
> {
>   void * q;
>
>   <bb 2> [0.0%]:
>   q_7 = __builtin_memset (p_3(D), c_4(D), n_5(D));
>   return q_7;
>
> in early FRE.
Yea.  Not sure how often something like that would happen in practice, 
but using the equivalence to simplify rather than for propagation seems 
like the way to go.

I keep thinking about doing some similar in DOM, but haven't gotten 
around to seeing what the fallout would be.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-07 21:20                                           ` Jeff Law
@ 2016-12-09  8:10                                             ` Richard Biener
  2016-12-12 16:36                                               ` Jeff Law
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2016-12-09  8:10 UTC (permalink / raw)
  To: Jeff Law; +Cc: Prathamesh Kulkarni, gcc Patches

On Wed, 7 Dec 2016, Jeff Law wrote:

> On 12/06/2016 03:16 AM, Richard Biener wrote:
> > On Tue, 6 Dec 2016, Richard Biener wrote:
> > 
> > > On Mon, 5 Dec 2016, Jeff Law wrote:
> > > 
> > > > On 12/02/2016 01:33 AM, Richard Biener wrote:
> > > > > > The LHS on the assignment makes it easier to identify when a tail
> > > > > > call is
> > > > > > possible.  It's not needed for correctness.  Not having the LHS on
> > > > > > the
> > > > > > assignment just means we won't get an optimized tail call.
> > > > > > 
> > > > > > Under what circumstances would the LHS possibly be removed?  We know
> > > > > > the
> > > > > > return statement references the LHS, so it's not going to be
> > > > > > something
> > > > > > that
> > > > > > DCE will do.
> > > > > 
> > > > > Well, I thought Prathamesh added the optimization to copy-propagate
> > > > > the lhs from the returned argument.  So we'd have both transforms
> > > > > here.
> > > > That seems like a mistake -- the fact that we can copy propagate the LHS
> > > > from
> > > > the returned argument is interesting, but in practice I've found it to
> > > > not be
> > > > useful to do so.
> > > > 
> > > > The problem is it makes the value look live across a the call and we're
> > > > then
> > > > dependent upon the register allocator to know the trick about the
> > > > returned
> > > > argument value and apply it consistently -- which it does not last I
> > > > checked.
> > > > 
> > > > I think we're better off leaving the call in the form of LHS = call ()
> > > > if the
> > > > return value is used.  That's going to be more palatable to tail
> > > > calling.
> > > 
> > > Yes, that's something I also raised earlier in the thread.  Note that
> > > any kind of value-numbering probably wants to know the equivalence
> > > for simplifications but in the end wants to disable propagating the
> > > copy (in fact it should propagate the return value from the point of
> > > the call).  I suppose I know how to implement that in FRE/PRE given it has
> > > separate value-numbering and elimination phases.  Something for GCC 8.
> > 
> > The following does that (it shows we don't handle separating LHS
> > and overall stmt effect very well).  It optimizes a testcase like
> > 
> > void *foo (void *p, int c, __SIZE_TYPE__ n)
> > {
> >   void *q = __builtin_memset (p, c, n);
> >   if (q == p)
> >     return p;
> >   return q;
> > }
> > 
> > to
> > 
> > foo (void * p, int c, long unsigned int n)
> > {
> >   void * q;
> > 
> >   <bb 2> [0.0%]:
> >   q_7 = __builtin_memset (p_3(D), c_4(D), n_5(D));
> >   return q_7;
> > 
> > in early FRE.
> Yea.  Not sure how often something like that would happen in practice, but
> using the equivalence to simplify rather than for propagation seems like the
> way to go.
> 
> I keep thinking about doing some similar in DOM, but haven't gotten around to
> seeing what the fallout would be.

Shouldn't be too bad (it would require to keep an additional 
what-to-substitute-for-value-X lattice during the DOM walk).  But it
will still require some "magic" to decide about those conditional
equivalences... (I think).

Separating "values" from what we substitute during elimination is a good
thing in general, so we can be more aggressive with the value parts.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [tree-tailcall] Check if function returns it's argument
  2016-12-09  8:10                                             ` Richard Biener
@ 2016-12-12 16:36                                               ` Jeff Law
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Law @ 2016-12-12 16:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc Patches

On 12/09/2016 01:10 AM, Richard Biener wrote:

>> Yea.  Not sure how often something like that would happen in practice, but
>> using the equivalence to simplify rather than for propagation seems like the
>> way to go.
>>
>> I keep thinking about doing some similar in DOM, but haven't gotten around to
>> seeing what the fallout would be.
>
> Shouldn't be too bad (it would require to keep an additional
> what-to-substitute-for-value-X lattice during the DOM walk).  But it
> will still require some "magic" to decide about those conditional
> equivalences... (I think).
>
> Separating "values" from what we substitute during elimination is a good
> thing in general, so we can be more aggressive with the value parts.
I think we'd just need to change SSA_NAME_VALUE from a vector of trees 
to a pair representation or to have an on-the-side bitmap to indicate 
values that can be used for simplification but not for propagation.

It shouldn't be terrible.
jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-12-12 16:36 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-24  7:18 [tree-tailcall] Check if function returns it's argument Prathamesh Kulkarni
2016-11-24  8:37 ` Richard Biener
2016-11-24 12:15   ` Prathamesh Kulkarni
2016-11-24 12:18     ` Richard Biener
2016-11-24 12:21       ` Jakub Jelinek
2016-11-24 12:23         ` Richard Biener
2016-11-24 12:32       ` Prathamesh Kulkarni
2016-11-24 12:38         ` Richard Biener
2016-11-25  5:16           ` Prathamesh Kulkarni
2016-11-25  8:07             ` Richard Biener
2016-11-25  8:18               ` Prathamesh Kulkarni
2016-11-25  8:25                 ` Richard Biener
2016-11-25  8:31                   ` Prathamesh Kulkarni
2016-11-25 15:47               ` Jeff Law
2016-12-01 11:42                 ` Prathamesh Kulkarni
2016-12-01 12:10                   ` Richard Biener
2016-12-01 12:50                     ` Prathamesh Kulkarni
2016-12-01 12:56                       ` Richard Biener
2016-12-01 13:05                         ` Prathamesh Kulkarni
2016-12-01 13:08                           ` Richard Biener
2016-12-01 13:18                             ` Prathamesh Kulkarni
2016-12-01 13:22                               ` Richard Biener
2016-12-01 22:27                                 ` Jeff Law
2016-12-02  6:04                                   ` Prathamesh Kulkarni
2016-12-02  8:33                                   ` Richard Biener
2016-12-05 22:53                                     ` Jeff Law
2016-12-06  8:25                                       ` Richard Biener
2016-12-06 10:16                                         ` Richard Biener
2016-12-07 21:20                                           ` Jeff Law
2016-12-09  8:10                                             ` Richard Biener
2016-12-12 16:36                                               ` Jeff Law
2016-12-07 16:54                                         ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).