public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch i386]: Expand sibling-tail-calls via accumulator register
@ 2014-05-22 21:33 Kai Tietz
  2014-05-22 22:07 ` H.J. Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Kai Tietz @ 2014-05-22 21:33 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Henderson

Hello,

This patch avoids for sibling-tail-calls the use of pseudo-register.  Instead it uses for load of memory-address the accumulator-register.  By this we avoid that IRA/LRA need to choose a register.  So we reduce register-pressure.

The accumulator-register is always a valid register on tail-call case.  All other registers might be callee-saved, or used for argument-passing.  The only case where we would use the accumulator on call is the variadic-case for x86_64 ABI.  Just that this function never is a candidate for sibling-tail-calls.

ChangeLog

2014-05-22  Kai Tietz  <ktietz@redhat.com>

	* config/i386/i386.c (ix86_expand_call): Enforce for sibcalls
	on memory the use of accumulator-register.

Regression tested for x86_64-unknown-linux-gnu, x86_64-w64-mingw32, and i686-pc-cygwin.
Ok for apply?

Regards,
Kai

Index: i386.c
===================================================================
--- i386.c	(Revision 210412)
+++ i386.c	(Arbeitskopie)
@@ -24898,8 +24898,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
 	   ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
 	   : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
     {
+      rtx r;
       fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1);
-      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
+      if (!sibcall)
+	r = copy_to_mode_reg (word_mode, fnaddr);
+      else
+	{
+	  r = gen_rtx_REG (word_mode, AX_REG);
+	  if (! general_operand (fnaddr, VOIDmode))
+		    fnaddr = force_operand (fnaddr, r);
+	  if (fnaddr != r)
+	    emit_move_insn (r, fnaddr);
+	}
+      fnaddr = gen_rtx_MEM (QImode, r);
     }
 
   call = gen_rtx_CALL (VOIDmode, fnaddr, callarg1);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-22 21:33 [patch i386]: Expand sibling-tail-calls via accumulator register Kai Tietz
@ 2014-05-22 22:07 ` H.J. Lu
  2014-05-23 18:04   ` Jeff Law
  2014-05-23 18:16 ` Jeff Law
  2014-05-27 15:51 ` Richard Henderson
  2 siblings, 1 reply; 39+ messages in thread
From: H.J. Lu @ 2014-05-22 22:07 UTC (permalink / raw)
  To: Kai Tietz; +Cc: GCC Patches, Richard Henderson

On Thu, May 22, 2014 at 2:33 PM, Kai Tietz <ktietz@redhat.com> wrote:
> Hello,
>
> This patch avoids for sibling-tail-calls the use of pseudo-register.  Instead it uses for load of memory-address the accumulator-register.  By this we avoid that IRA/LRA need to choose a register.  So we reduce register-pressure.
>
> The accumulator-register is always a valid register on tail-call case.  All other registers might be callee-saved, or used for argument-passing.  The only case where we would use the accumulator on call is the variadic-case for x86_64 ABI.  Just that this function never is a candidate for sibling-tail-calls.
>
> ChangeLog
>
> 2014-05-22  Kai Tietz  <ktietz@redhat.com>
>
>         * config/i386/i386.c (ix86_expand_call): Enforce for sibcalls
>         on memory the use of accumulator-register.
>
> Regression tested for x86_64-unknown-linux-gnu, x86_64-w64-mingw32, and i686-pc-cygwin.
> Ok for apply?
>
> Regards,
> Kai
>
> Index: i386.c
> ===================================================================
> --- i386.c      (Revision 210412)
> +++ i386.c      (Arbeitskopie)
> @@ -24898,8 +24898,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>            ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
>            : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
>      {
> +      rtx r;
>        fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1);
> -      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
> +      if (!sibcall)
> +       r = copy_to_mode_reg (word_mode, fnaddr);
> +      else
> +       {
> +         r = gen_rtx_REG (word_mode, AX_REG)

If fnaddr points to a function with variable argument list in
64-bit, AX_REG may be used to store number of FP arguments
passed in registers.

> +         if (! general_operand (fnaddr, VOIDmode))
> +                   fnaddr = force_operand (fnaddr, r);
> +         if (fnaddr != r)
> +           emit_move_insn (r, fnaddr);
> +       }
> +      fnaddr = gen_rtx_MEM (QImode, r);
>      }
>
>    call = gen_rtx_CALL (VOIDmode, fnaddr, callarg1);



-- 
H.J.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-22 22:07 ` H.J. Lu
@ 2014-05-23 18:04   ` Jeff Law
  2014-05-26 18:20     ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Law @ 2014-05-23 18:04 UTC (permalink / raw)
  To: H.J. Lu, Kai Tietz; +Cc: GCC Patches, Richard Henderson

On 05/22/14 16:07, H.J. Lu wrote:
> On Thu, May 22, 2014 at 2:33 PM, Kai Tietz <ktietz@redhat.com> wrote:
>> Hello,
>>
>> This patch avoids for sibling-tail-calls the use of pseudo-register.  Instead it uses for load of memory-address the accumulator-register.  By this we avoid that IRA/LRA need to choose a register.  So we reduce register-pressure.
>>
>> The accumulator-register is always a valid register on tail-call case.  All other registers might be callee-saved, or used for argument-passing.  The only case where we would use the accumulator on call is the variadic-case for x86_64 ABI.  Just that this function never is a candidate for sibling-tail-calls.
>>
>> ChangeLog
>>
>> 2014-05-22  Kai Tietz  <ktietz@redhat.com>
>>
>>          * config/i386/i386.c (ix86_expand_call): Enforce for sibcalls
>>          on memory the use of accumulator-register.
>>
>> Regression tested for x86_64-unknown-linux-gnu, x86_64-w64-mingw32, and i686-pc-cygwin.
>> Ok for apply?
>>
>> Regards,
>> Kai
>>
>> Index: i386.c
>> ===================================================================
>> --- i386.c      (Revision 210412)
>> +++ i386.c      (Arbeitskopie)
>> @@ -24898,8 +24898,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>>             ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
>>             : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
>>       {
>> +      rtx r;
>>         fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1);
>> -      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
>> +      if (!sibcall)
>> +       r = copy_to_mode_reg (word_mode, fnaddr);
>> +      else
>> +       {
>> +         r = gen_rtx_REG (word_mode, AX_REG)
>
> If fnaddr points to a function with variable argument list in
> 64-bit, AX_REG may be used to store number of FP arguments
> passed in registers.
Right, but as Kai mentioned earlier, a varardic function should have 
been rejected by now as a sibcall target.

Regardless, I think adding a check here shouldn't hurt and makes the 
backend more bullet-proof if the target independent bits get smarter in 
the future.

jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-22 21:33 [patch i386]: Expand sibling-tail-calls via accumulator register Kai Tietz
  2014-05-22 22:07 ` H.J. Lu
@ 2014-05-23 18:16 ` Jeff Law
  2014-05-27 15:51 ` Richard Henderson
  2 siblings, 0 replies; 39+ messages in thread
From: Jeff Law @ 2014-05-23 18:16 UTC (permalink / raw)
  To: Kai Tietz, GCC Patches; +Cc: Richard Henderson

On 05/22/14 15:33, Kai Tietz wrote:
> Hello,
>
> This patch avoids for sibling-tail-calls the use of pseudo-register.
> Instead it uses for load of memory-address the accumulator-register.
> By this we avoid that IRA/LRA need to choose a register.  So we
> reduce register-pressure.
>
> The accumulator-register is always a valid register on tail-call
> case.  All other registers might be callee-saved, or used for
> argument-passing.  The only case where we would use the accumulator
> on call is the variadic-case for x86_64 ABI.  Just that this function
> never is a candidate for sibling-tail-calls.
>
> ChangeLog
>
> 2014-05-22  Kai Tietz  <ktietz@redhat.com>
>
> * config/i386/i386.c (ix86_expand_call): Enforce for sibcalls on
> memory the use of accumulator-register.
I'm generally not a fan of explicitly mentioning hard registers in RTL. 
  Though most of the major problems related to doing that have been 
resolved through the years.


>
> Regression tested for x86_64-unknown-linux-gnu, x86_64-w64-mingw32,
> and i686-pc-cygwin. Ok for apply?
In the interest of defensive programming, can you verify that fnaddr 
doesn't refer to a varardic function?  Hmm, I guess we can't get to a 
signature here.  So, never mind.

So I think the way to go is to ensure that the x86 port always rejects 
sibcalls to a varardic target, which I think can be done in 
ix86-function_ok_for_sibcall.

With that change this patch should be OK.  But please post it one more 
time for a final review.

jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-23 18:04   ` Jeff Law
@ 2014-05-26 18:20     ` Kai Tietz
  2014-05-26 18:32       ` Jakub Jelinek
  0 siblings, 1 reply; 39+ messages in thread
From: Kai Tietz @ 2014-05-26 18:20 UTC (permalink / raw)
  To: Jeff Law; +Cc: H.J. Lu, GCC Patches, Richard Henderson

Hi,

adjusted patch to make it more bullet-proved and tested it.

2014-05-26  Kai Tietz  <ktietz@redhat.com>

	* config/i386/i386.c (ix86_expand_call): Enforce for sibcalls
	on memory the use of accumulator-register.
	(ix86_function_ok_for_sibcall): Reject for x86_64 ABI sibling
	calls for varardic-functions.

Regression tested for x86_64-unknown-linux-gnu, x86_64-w64-mingw32, and i686-pc-cygwin.
Ok for apply?

Regards,
Kai


Index: i386.c
===================================================================
--- i386.c	(revision 210936)
+++ i386.c	(working copy)
@@ -5298,6 +5298,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
       decl_or_type = type;
     }
 
+  /* We need to reject stdarg-function for x86_64 ABI as accumulator
+     is used as argument.  */
+  if (TARGET_64BIT && stdarg_p (type)
+      && ix86_function_type_abi (type) == SYSV_ABI)
+    return false;
+
   /* Check that the return value locations are the same.  Like
      if we are returning floats on the 80387 register stack, we cannot
      make a sibcall from a function that doesn't return a float to a
@@ -24916,8 +24922,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
 	   ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
 	   : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
     {
+      rtx r;
       fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1);
-      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
+      if (!sibcall)
+	r = copy_to_mode_reg (word_mode, fnaddr);
+      else
+	{
+	  r = gen_rtx_REG (word_mode, AX_REG);
+	  if (! general_operand (fnaddr, VOIDmode))
+		    fnaddr = force_operand (fnaddr, r);
+	  if (fnaddr != r)
+	    emit_move_insn (r, fnaddr);
+	}
+      fnaddr = gen_rtx_MEM (QImode, r);
     }
 
   call = gen_rtx_CALL (VOIDmode, fnaddr, callarg1);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-26 18:20     ` Kai Tietz
@ 2014-05-26 18:32       ` Jakub Jelinek
  2014-05-26 19:22         ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Jakub Jelinek @ 2014-05-26 18:32 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jeff Law, H.J. Lu, GCC Patches, Richard Henderson

On Mon, May 26, 2014 at 02:20:36PM -0400, Kai Tietz wrote:
> --- i386.c	(revision 210936)
> +++ i386.c	(working copy)
> @@ -5298,6 +5298,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>        decl_or_type = type;
>      }
>  
> +  /* We need to reject stdarg-function for x86_64 ABI as accumulator
> +     is used as argument.  */
> +  if (TARGET_64BIT && stdarg_p (type)
> +      && ix86_function_type_abi (type) == SYSV_ABI)
> +    return false;
> +
>    /* Check that the return value locations are the same.  Like
>       if we are returning floats on the 80387 register stack, we cannot
>       make a sibcall from a function that doesn't return a float to a
> @@ -24916,8 +24922,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>  	   ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
>  	   : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
>      {
> +      rtx r;
>        fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1);
> -      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
> +      if (!sibcall)
> +	r = copy_to_mode_reg (word_mode, fnaddr);
> +      else
> +	{
> +	  r = gen_rtx_REG (word_mode, AX_REG);
> +	  if (! general_operand (fnaddr, VOIDmode))
> +		    fnaddr = force_operand (fnaddr, r);

Wrong formatting.

> +	  if (fnaddr != r)
> +	    emit_move_insn (r, fnaddr);
> +	}
> +      fnaddr = gen_rtx_MEM (QImode, r);
>      }
>  
>    call = gen_rtx_CALL (VOIDmode, fnaddr, callarg1);

In any case, I still can't understand how limiting the choices of the
register allocator can improve code rather than making it worse.
If the accumulator is available there, why doesn't the RA choose it
if it is beneficial?  And why aren't other registers similarly suitable for
that?  Say r10, r11...

	Jakub

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-26 18:32       ` Jakub Jelinek
@ 2014-05-26 19:22         ` Kai Tietz
  2014-05-26 19:35           ` Jakub Jelinek
  0 siblings, 1 reply; 39+ messages in thread
From: Kai Tietz @ 2014-05-26 19:22 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, H.J. Lu, GCC Patches, Richard Henderson

----- Original Message -----
> On Mon, May 26, 2014 at 02:20:36PM -0400, Kai Tietz wrote:
> > --- i386.c	(revision 210936)
> > +++ i386.c	(working copy)
> > @@ -5298,6 +5298,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
> >        decl_or_type = type;
> >      }
> >  
> > +  /* We need to reject stdarg-function for x86_64 ABI as accumulator
> > +     is used as argument.  */
> > +  if (TARGET_64BIT && stdarg_p (type)
> > +      && ix86_function_type_abi (type) == SYSV_ABI)
> > +    return false;
> > +
> >    /* Check that the return value locations are the same.  Like
> >       if we are returning floats on the 80387 register stack, we cannot
> >       make a sibcall from a function that doesn't return a float to a
> > @@ -24916,8 +24922,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
> >  	   ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
> >  	   : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
> >      {
> > +      rtx r;
> >        fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1);
> > -      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
> > +      if (!sibcall)
> > +	r = copy_to_mode_reg (word_mode, fnaddr);
> > +      else
> > +	{
> > +	  r = gen_rtx_REG (word_mode, AX_REG);
> > +	  if (! general_operand (fnaddr, VOIDmode))
> > +		    fnaddr = force_operand (fnaddr, r);
> 
> Wrong formatting.

Thanks for the heads up.  Fix the superflous tab.

> > +	  if (fnaddr != r)
> > +	    emit_move_insn (r, fnaddr);
> > +	}
> > +      fnaddr = gen_rtx_MEM (QImode, r);
> >      }
> >  
> >    call = gen_rtx_CALL (VOIDmode, fnaddr, callarg1);
> 
> In any case, I still can't understand how limiting the choices of the
> register allocator can improve code rather than making it worse.
> If the accumulator is available there, why doesn't the RA choose it
> if it is beneficial?  And why aren't other registers similarly suitable for
> that?  Say r10, r11...

I don't see it as limiting.  The intend of this is more to have fixed patterns on epilogue.  And in fact is accumulator that register which can be used as scratch-register for all i386-targets.  Beside for varardic-functions, which anyway aren't any good candidates for sibling-call-optimization (on x86_64 due ABI).
Well, for x86_64 ABI we might could consider to use R11_REG instead of AX_REG.  Is there any advantage in special-case for x86_64 ABI?
The R10-register isn't a good choice due it might be used as drap-register and therefore can't be loaded before epilogue gets destroyed.
 
> 	Jakub
> 

Kai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-26 19:22         ` Kai Tietz
@ 2014-05-26 19:35           ` Jakub Jelinek
  2014-05-26 20:32             ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Jakub Jelinek @ 2014-05-26 19:35 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jeff Law, H.J. Lu, GCC Patches, Richard Henderson

On Mon, May 26, 2014 at 03:22:50PM -0400, Kai Tietz wrote:
> > In any case, I still can't understand how limiting the choices of the
> > register allocator can improve code rather than making it worse.
> > If the accumulator is available there, why doesn't the RA choose it
> > if it is beneficial?  And why aren't other registers similarly suitable for
> > that?  Say r10, r11...
> 
> I don't see it as limiting.  The intend of this is more to have fixed
> patterns on epilogue.  And in fact is accumulator that register which can
> be used as scratch-register for all i386-targets.  Beside for
> varardic-functions, which anyway aren't any good candidates for
> sibling-call-optimization (on x86_64 due ABI).  Well, for x86_64 ABI we
> might could consider to use R11_REG instead of AX_REG.  Is there any
> advantage in special-case for x86_64 ABI?  The R10-register isn't a good
> choice due it might be used as drap-register and therefore can't be loaded
> before epilogue gets destroyed.

It is limiting.  If r11/rax and often also r10 can be chosen, telling the RA
it can only choose rax is a limitation.

Can you show some testcase where your patch is actually beneficial?  We
should analyze why the RA made that choice.

	Jakub

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-26 19:35           ` Jakub Jelinek
@ 2014-05-26 20:32             ` Kai Tietz
  2014-05-27 15:39               ` Jeff Law
  2014-05-28  8:43               ` Kai Tietz
  0 siblings, 2 replies; 39+ messages in thread
From: Kai Tietz @ 2014-05-26 20:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, H.J. Lu, GCC Patches, Richard Henderson

----- Original Message -----
> On Mon, May 26, 2014 at 03:22:50PM -0400, Kai Tietz wrote:
> > > In any case, I still can't understand how limiting the choices of the
> > > register allocator can improve code rather than making it worse.
> > > If the accumulator is available there, why doesn't the RA choose it
> > > if it is beneficial?  And why aren't other registers similarly suitable
> > > for
> > > that?  Say r10, r11...
> > 
> > I don't see it as limiting.  The intend of this is more to have fixed
> > patterns on epilogue.  And in fact is accumulator that register which can
> > be used as scratch-register for all i386-targets.  Beside for
> > varardic-functions, which anyway aren't any good candidates for
> > sibling-call-optimization (on x86_64 due ABI).  Well, for x86_64 ABI we
> > might could consider to use R11_REG instead of AX_REG.  Is there any
> > advantage in special-case for x86_64 ABI?  The R10-register isn't a good
> > choice due it might be used as drap-register and therefore can't be loaded
> > before epilogue gets destroyed.
> 
> It is limiting.  If r11/rax and often also r10 can be chosen, telling the RA
> it can only choose rax is a limitation.

No, it isn't.  For sure.  The code-branch choosing the accu to call after epilogue isn't used as memories for sibling-calls aren't allowed.  This code-branch will get active with my other sibling-tail-call patch.
 
> Can you show some testcase where your patch is actually beneficial?  We
> should analyze why the RA made that choice.

So it is obvious I can't provide you samples you asked for.  Nevertheless I am pretty interested to see a sample by you (with activated sibling-tail-call memories) which chooses for tail-call-register for memory something else then accu.
 
> 	Jakub
> 

Kai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-26 20:32             ` Kai Tietz
@ 2014-05-27 15:39               ` Jeff Law
  2014-05-27 15:51                 ` Kai Tietz
  2014-05-27 16:22                 ` Richard Henderson
  2014-05-28  8:43               ` Kai Tietz
  1 sibling, 2 replies; 39+ messages in thread
From: Jeff Law @ 2014-05-27 15:39 UTC (permalink / raw)
  To: Kai Tietz, Jakub Jelinek; +Cc: H.J. Lu, GCC Patches, Richard Henderson

On 05/26/14 14:32, Kai Tietz wrote:
> ----- Original Message -----
>> On Mon, May 26, 2014 at 03:22:50PM -0400, Kai Tietz wrote:
>>>> In any case, I still can't understand how limiting the choices
>>>> of the register allocator can improve code rather than making
>>>> it worse. If the accumulator is available there, why doesn't
>>>> the RA choose it if it is beneficial?  And why aren't other
>>>> registers similarly suitable for that?  Say r10, r11...
>>>
>>> I don't see it as limiting.  The intend of this is more to have
>>> fixed patterns on epilogue.  And in fact is accumulator that
>>> register which can be used as scratch-register for all
>>> i386-targets.  Beside for varardic-functions, which anyway aren't
>>> any good candidates for sibling-call-optimization (on x86_64 due
>>> ABI).  Well, for x86_64 ABI we might could consider to use
>>> R11_REG instead of AX_REG.  Is there any advantage in
>>> special-case for x86_64 ABI?  The R10-register isn't a good
>>> choice due it might be used as drap-register and therefore can't
>>> be loaded before epilogue gets destroyed.
>>
>> It is limiting.  If r11/rax and often also r10 can be chosen,
>> telling the RA it can only choose rax is a limitation.
>
> No, it isn't.  For sure.  The code-branch choosing the accu to call
> after epilogue isn't used as memories for sibling-calls aren't
> allowed.  This code-branch will get active with my other
> sibling-tail-call patch.
But the value we want may be sitting around in a convenient register 
(such as r11).  So if we force the sibcall to use %rax, then we have to 
generate a copy.  Yet if we have a constraint for the set of registers 
allowed here, then we give the register allocator the opportunity to 
coalesce away the copies.

Jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-27 15:39               ` Jeff Law
@ 2014-05-27 15:51                 ` Kai Tietz
  2014-05-27 16:22                 ` Richard Henderson
  1 sibling, 0 replies; 39+ messages in thread
From: Kai Tietz @ 2014-05-27 15:51 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jakub Jelinek, H.J. Lu, GCC Patches, Richard Henderson

Ok, so just the part ok the patch remains to prevent varardic-functions being a sibling-tail-call remains.

Kai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-22 21:33 [patch i386]: Expand sibling-tail-calls via accumulator register Kai Tietz
  2014-05-22 22:07 ` H.J. Lu
  2014-05-23 18:16 ` Jeff Law
@ 2014-05-27 15:51 ` Richard Henderson
  2 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-05-27 15:51 UTC (permalink / raw)
  To: Kai Tietz, GCC Patches

On 05/22/2014 02:33 PM, Kai Tietz wrote:
> 	* config/i386/i386.c (ix86_expand_call): Enforce for sibcalls
> 	on memory the use of accumulator-register.

I don't like this at all.

I'm fine with allowing memories that are fully symbolic, e.g.

extern void (*foo)(void);
void f(void) { foo(); }

but any time you've got to use one insn to form the address in %eax, you might
as well have also issued the memory load into %eax.  And if the memory load is
moved earlier, you no longer need to constrain to %eax, but let the register
allocator choose the call-clobbered register to use.


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-27 15:39               ` Jeff Law
  2014-05-27 15:51                 ` Kai Tietz
@ 2014-05-27 16:22                 ` Richard Henderson
  2014-05-27 16:48                   ` Jeff Law
  1 sibling, 1 reply; 39+ messages in thread
From: Richard Henderson @ 2014-05-27 16:22 UTC (permalink / raw)
  To: Jeff Law, Kai Tietz, Jakub Jelinek; +Cc: H.J. Lu, GCC Patches

On 05/27/2014 08:39 AM, Jeff Law wrote:
> But the value we want may be sitting around in a convenient register (such as
> r11).  So if we force the sibcall to use %rax, then we have to generate a
> copy.  Yet if we have a constraint for the set of registers allowed here, then
> we give the register allocator the opportunity to coalesce away the copies.

We do have an existing call-clobbered register class that we currently do use
for indirect sibcalls.

The problem that Kai is trying to work around is that he doesn't want to load
the value into a register, but its address.  But the register allocator only
has BASE_REG_CLASS and INDEX_REG_CLASS, and has no possibility to force the
address components into registers that are call clobbered.

My thinking is that, with few exceptions, this doesn't help anything.  If we
have to load the full address into a register:

	lea	ofs(base, index, scale), %eax
	...
	call	*0(%eax)

we might as well include the memory load

	mov	ofs(base, index, scale), %eax
	...
	call	*%eax

As far as I can see, the only time we can actually save an instruction is when
the address is fully symbolic:

	mov	foo, %eax
	...
	call	*%eax

becomes

	call	*foo


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-27 16:22                 ` Richard Henderson
@ 2014-05-27 16:48                   ` Jeff Law
  2014-05-27 17:26                     ` Richard Henderson
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Law @ 2014-05-27 16:48 UTC (permalink / raw)
  To: Richard Henderson, Kai Tietz, Jakub Jelinek; +Cc: H.J. Lu, GCC Patches

On 05/27/14 10:22, Richard Henderson wrote:
> On 05/27/2014 08:39 AM, Jeff Law wrote:
>> But the value we want may be sitting around in a convenient register (such as
>> r11).  So if we force the sibcall to use %rax, then we have to generate a
>> copy.  Yet if we have a constraint for the set of registers allowed here, then
>> we give the register allocator the opportunity to coalesce away the copies.
>
> We do have an existing call-clobbered register class that we currently do use
> for indirect sibcalls.
>
> The problem that Kai is trying to work around is that he doesn't want to load
> the value into a register, but its address.  But the register allocator only
> has BASE_REG_CLASS and INDEX_REG_CLASS, and has no possibility to force the
> address components into registers that are call clobbered.
>
> My thinking is that, with few exceptions, this doesn't help anything.  If we
> have to load the full address into a register:
>
> 	lea	ofs(base, index, scale), %eax
> 	...
> 	call	*0(%eax)
>
> we might as well include the memory load
>
> 	mov	ofs(base, index, scale), %eax
> 	...
> 	call	*%eax
Ok.  My misunderstanding.

Granted, this probably doesn't happen enough to matter, but isn't it 
likely profitable from a pipeline standpoint to go ahead and do the 
memory load separate from the indirect call/jump as well?


>
> As far as I can see, the only time we can actually save an instruction is when
> the address is fully symbolic:
>
> 	mov	foo, %eax
> 	...
> 	call	*%eax
>
> becomes
>
> 	call	*foo
Noted.
jeff


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-27 16:48                   ` Jeff Law
@ 2014-05-27 17:26                     ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-05-27 17:26 UTC (permalink / raw)
  To: Jeff Law, Kai Tietz, Jakub Jelinek; +Cc: H.J. Lu, GCC Patches

On 05/27/2014 09:48 AM, Jeff Law wrote:
>>     lea    ofs(base, index, scale), %eax
>>     ...
>>     call    *0(%eax)
>>
>> we might as well include the memory load
>>
>>     mov    ofs(base, index, scale), %eax
>>     ...
>>     call    *%eax
> Ok.  My misunderstanding.
> 
> Granted, this probably doesn't happen enough to matter, but isn't it likely
> profitable from a pipeline standpoint to go ahead and do the memory load
> separate from the indirect call/jump as well?

I'm sure.  Especially if the data is currently living in L2, and the scheduler
has enough room to move the load sufficiently early to hide the latency.

As I've said before, I think this is interesting solely as a space optimization.


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-26 20:32             ` Kai Tietz
  2014-05-27 15:39               ` Jeff Law
@ 2014-05-28  8:43               ` Kai Tietz
  2014-05-28 15:42                 ` H.J. Lu
                                   ` (3 more replies)
  1 sibling, 4 replies; 39+ messages in thread
From: Kai Tietz @ 2014-05-28  8:43 UTC (permalink / raw)
  To: Jeff Law; +Cc: H.J. Lu, GCC Patches, Jakub Jelinek, Richard Henderson

Hi,

I modified prior patch so that it uses the new predicate sibcall_memory_operand to extend sibcall_insn_operand.
Just one change in i386.c remains about x86_output_mi_thunk.  Later one isn't pretty much essential.  Nevertheless it makes
code equivalent to none-memory-case for potential tail-sibcalls.

ChangeLog gcc

2014-05-28  Kai Tietz  <ktietz@redhat.com>

	* config/i386/i386.c (x86_output_mi_thunk): Add memory case
	for sibling-tail-calls.
	* config/i386/i386.md (sibcall_insn_operand): Add memory-constrain
	to its use.
	* config/i386/predicates.md (sibcall_memory_operand): New predicate.
	(sibcall_insn_operand): Add check for sibcall_memory_operand.

ChangeLog testsuite

2014-05-28  Kai Tietz  <ktietz@redhat.com>

	PR target/60104
	* gcc.target/i386/sibcall-1.c: New test.
	* gcc.target/i386/sibcall-2.c: New test.
	* gcc.target/i386/sibcall-3.c: New test.

Regression tested x86_64-unknown-linux-gnu multilib, x86_64-w64-mingw32, and i686-pc-cygwin.  Ok for apply?

Regards,
Kai


Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 210985)
+++ gcc/config/i386/i386.c	(working copy)
@@ -38893,7 +38893,16 @@ x86_output_mi_thunk (FILE *file,
      For our purposes here, we can get away with (ab)using a jump pattern,
      because we're going to do no optimization.  */
   if (MEM_P (fnaddr))
-    emit_jump_insn (gen_indirect_jump (fnaddr));
+    {
+      if (sibcall_insn_operand (fnaddr, word_mode))
+	{
+	  tmp = gen_rtx_CALL (VOIDmode, fnaddr, const0_rtx);
+          tmp = emit_call_insn (tmp);
+          SIBLING_CALL_P (tmp) = 1;
+	}
+      else
+	emit_jump_insn (gen_indirect_jump (fnaddr));
+    }
   else
     {
       if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 210985)
+++ gcc/config/i386/i386.md	(working copy)
@@ -11376,7 +11376,7 @@
   [(set_attr "type" "call")])
 
 (define_insn "*sibcall"
-  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uz"))
+  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uzm"))
 	 (match_operand 1))]
   "SIBLING_CALL_P (insn)"
   "* return ix86_output_call_insn (insn, operands[0]);"
@@ -11406,7 +11406,7 @@
   [(set_attr "type" "call")])
 
 (define_insn "*sibcall_pop"
-  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uz"))
+  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uzm"))
 	 (match_operand 1))
    (set (reg:SI SP_REG)
 	(plus:SI (reg:SI SP_REG)
@@ -11451,7 +11451,7 @@
 
 (define_insn "*sibcall_value"
   [(set (match_operand 0)
-	(call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uz"))
+	(call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uzm"))
 	      (match_operand 2)))]
   "SIBLING_CALL_P (insn)"
   "* return ix86_output_call_insn (insn, operands[1]);"
@@ -11494,7 +11494,7 @@
 
 (define_insn "*sibcall_value_pop"
   [(set (match_operand 0)
-	(call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uz"))
+	(call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uzm"))
 	      (match_operand 2)))
    (set (reg:SI SP_REG)
 	(plus:SI (reg:SI SP_REG)
Index: gcc/config/i386/predicates.md
===================================================================
--- gcc/config/i386/predicates.md	(revision 210985)
+++ gcc/config/i386/predicates.md	(working copy)
@@ -71,6 +71,16 @@
   return ANY_QI_REG_P (op);
 })
 
+(define_predicate "sibcall_memory_operand"
+  (match_operand 0 "memory_operand")
+{
+  op = XEXP (op, 0);
+
+  if (GET_CODE (op) == CONST)
+    op = XEXP (op, 0);
+  return (GET_CODE (op) == SYMBOL_REF || CONSTANT_P (op));
+})
+
 ;; Match an SI or HImode register for a zero_extract.
 (define_special_predicate "ext_register_operand"
   (match_operand 0 "register_operand")
@@ -600,7 +610,9 @@
 (define_special_predicate "sibcall_insn_operand"
   (ior (match_test "constant_call_address_operand
 		     (op, mode == VOIDmode ? mode : Pmode)")
-       (match_operand 0 "register_no_elim_operand")))
+       (match_operand 0 "register_no_elim_operand")
+       (and (not (match_test "TARGET_X32"))
+	    (match_operand 0 "sibcall_memory_operand"))))
 
 ;; Return true if OP is a call from MS ABI to SYSV ABI function.
 (define_predicate "call_rex64_ms_sysv_operation"
Index: gcc/testsuite/gcc.target/i386/sibcall-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sibcall-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/sibcall-1.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2" } */
+
+extern int (*foo)(int);
+
+int boo (int a)
+{
+  return (*foo) (a);
+}
+
+/* { dg-final { scan-assembler-not "mov" } } */
Index: gcc/testsuite/gcc.target/i386/sibcall-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sibcall-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/sibcall-2.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile { xfail { *-*-* } } } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2" } */
+
+extern int doo1 (int);
+extern int doo2 (int);
+extern void bar (char *);
+
+int foo (int a)
+{
+  char s[256];
+  bar (s);
+  return (a < 0 ? doo1 : doo2) (a);
+}
+
+/* { dg-final { scan-assembler-not "call[ \t]*.%eax" } } */
Index: gcc/testsuite/gcc.target/i386/sibcall-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sibcall-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/sibcall-3.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2" } */
+
+extern 
+#ifdef _WIN32
+ __declspec (dllimport)
+#endif
+ void foo (int a);
+
+void bar (int a)
+{
+  return foo (a);
+}
+
+/* { dg-final { scan-assembler-not "jmp[ \t]*.%eax" } } */
Index: gcc/testsuite/gcc.target/i386/sibcall-4.c
===================================================================
--- gcc/testsuite/gcc.target/i386/sibcall-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/sibcall-4.c	(working copy)
@@ -0,0 +1,15 @@
+/* Testcase for PR target/46219.  */
+/* { dg-do compile { xfail { *-*-* } } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+void male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+}
+
+/* { dg-final { scan-assembler-not "jmp[ \t]*.%eax" } } */

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28  8:43               ` Kai Tietz
@ 2014-05-28 15:42                 ` H.J. Lu
  2014-05-28 17:22                 ` Richard Henderson
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: H.J. Lu @ 2014-05-28 15:42 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jeff Law, GCC Patches, Jakub Jelinek, Richard Henderson

On Wed, May 28, 2014 at 1:43 AM, Kai Tietz <ktietz@redhat.com> wrote:
> Hi,
>
> I modified prior patch so that it uses the new predicate sibcall_memory_operand to extend sibcall_insn_operand.
> Just one change in i386.c remains about x86_output_mi_thunk.  Later one isn't pretty much essential.  Nevertheless it makes
> code equivalent to none-memory-case for potential tail-sibcalls.
>
> ChangeLog gcc
>
> 2014-05-28  Kai Tietz  <ktietz@redhat.com>
>
>         * config/i386/i386.c (x86_output_mi_thunk): Add memory case
>         for sibling-tail-calls.
>         * config/i386/i386.md (sibcall_insn_operand): Add memory-constrain
>         to its use.
>         * config/i386/predicates.md (sibcall_memory_operand): New predicate.
>         (sibcall_insn_operand): Add check for sibcall_memory_operand.
>
> ChangeLog testsuite
>
> 2014-05-28  Kai Tietz  <ktietz@redhat.com>
>
>         PR target/60104
>         * gcc.target/i386/sibcall-1.c: New test.
>         * gcc.target/i386/sibcall-2.c: New test.
>         * gcc.target/i386/sibcall-3.c: New test.
>
> Regression tested x86_64-unknown-linux-gnu multilib, x86_64-w64-mingw32, and i686-pc-cygwin.  Ok for apply?
>
> Regards,
> Kai
>
>
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      (revision 210985)
> +++ gcc/config/i386/i386.c      (working copy)
> @@ -38893,7 +38893,16 @@ x86_output_mi_thunk (FILE *file,
>       For our purposes here, we can get away with (ab)using a jump pattern,
>       because we're going to do no optimization.  */
>    if (MEM_P (fnaddr))
> -    emit_jump_insn (gen_indirect_jump (fnaddr));
> +    {
> +      if (sibcall_insn_operand (fnaddr, word_mode))
> +       {
> +         tmp = gen_rtx_CALL (VOIDmode, fnaddr, const0_rtx);
> +          tmp = emit_call_insn (tmp);
> +          SIBLING_CALL_P (tmp) = 1;
> +       }
> +      else
> +       emit_jump_insn (gen_indirect_jump (fnaddr));
> +    }
>    else
>      {
>        if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
> Index: gcc/config/i386/i386.md
> ===================================================================
> --- gcc/config/i386/i386.md     (revision 210985)
> +++ gcc/config/i386/i386.md     (working copy)
> @@ -11376,7 +11376,7 @@
>    [(set_attr "type" "call")])
>
>  (define_insn "*sibcall"
> -  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uz"))
> +  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uzm"))
>          (match_operand 1))]
>    "SIBLING_CALL_P (insn)"
>    "* return ix86_output_call_insn (insn, operands[0]);"
> @@ -11406,7 +11406,7 @@
>    [(set_attr "type" "call")])
>
>  (define_insn "*sibcall_pop"
> -  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uz"))
> +  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uzm"))
>          (match_operand 1))
>     (set (reg:SI SP_REG)
>         (plus:SI (reg:SI SP_REG)
> @@ -11451,7 +11451,7 @@
>
>  (define_insn "*sibcall_value"
>    [(set (match_operand 0)
> -       (call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uz"))
> +       (call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uzm"))
>               (match_operand 2)))]
>    "SIBLING_CALL_P (insn)"
>    "* return ix86_output_call_insn (insn, operands[1]);"
> @@ -11494,7 +11494,7 @@
>
>  (define_insn "*sibcall_value_pop"
>    [(set (match_operand 0)
> -       (call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uz"))
> +       (call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uzm"))
>               (match_operand 2)))
>     (set (reg:SI SP_REG)
>         (plus:SI (reg:SI SP_REG)
> Index: gcc/config/i386/predicates.md
> ===================================================================
> --- gcc/config/i386/predicates.md       (revision 210985)
> +++ gcc/config/i386/predicates.md       (working copy)
> @@ -71,6 +71,16 @@
>    return ANY_QI_REG_P (op);
>  })
>
> +(define_predicate "sibcall_memory_operand"
> +  (match_operand 0 "memory_operand")
> +{
> +  op = XEXP (op, 0);
> +
> +  if (GET_CODE (op) == CONST)
> +    op = XEXP (op, 0);
> +  return (GET_CODE (op) == SYMBOL_REF || CONSTANT_P (op));
> +})
> +
>  ;; Match an SI or HImode register for a zero_extract.
>  (define_special_predicate "ext_register_operand"
>    (match_operand 0 "register_operand")
> @@ -600,7 +610,9 @@
>  (define_special_predicate "sibcall_insn_operand"
>    (ior (match_test "constant_call_address_operand
>                      (op, mode == VOIDmode ? mode : Pmode)")
> -       (match_operand 0 "register_no_elim_operand")))
> +       (match_operand 0 "register_no_elim_operand")
> +       (and (not (match_test "TARGET_X32"))
> +           (match_operand 0 "sibcall_memory_operand"))))
>
>  ;; Return true if OP is a call from MS ABI to SYSV ABI function.
>  (define_predicate "call_rex64_ms_sysv_operation"
> Index: gcc/testsuite/gcc.target/i386/sibcall-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/sibcall-1.c   (revision 0)
> +++ gcc/testsuite/gcc.target/i386/sibcall-1.c   (working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ia32 } */
> +/* { dg-options "-O2" } */
> +
> +extern int (*foo)(int);
> +
> +int boo (int a)
> +{
> +  return (*foo) (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "mov" } } */
> Index: gcc/testsuite/gcc.target/i386/sibcall-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/sibcall-2.c   (revision 0)
> +++ gcc/testsuite/gcc.target/i386/sibcall-2.c   (working copy)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { xfail { *-*-* } } } */
> +/* { dg-require-effective-target ia32 } */
> +/* { dg-options "-O2" } */
> +
> +extern int doo1 (int);
> +extern int doo2 (int);
> +extern void bar (char *);
> +
> +int foo (int a)
> +{
> +  char s[256];
> +  bar (s);
> +  return (a < 0 ? doo1 : doo2) (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "call[ \t]*.%eax" } } */
> Index: gcc/testsuite/gcc.target/i386/sibcall-3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/sibcall-3.c   (revision 0)
> +++ gcc/testsuite/gcc.target/i386/sibcall-3.c   (working copy)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ia32 } */
> +/* { dg-options "-O2" } */
> +
> +extern
> +#ifdef _WIN32
> + __declspec (dllimport)
> +#endif
> + void foo (int a);
> +
> +void bar (int a)
> +{
> +  return foo (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "jmp[ \t]*.%eax" } } */
> Index: gcc/testsuite/gcc.target/i386/sibcall-4.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/sibcall-4.c   (revision 0)
> +++ gcc/testsuite/gcc.target/i386/sibcall-4.c   (working copy)
> @@ -0,0 +1,15 @@
> +/* Testcase for PR target/46219.  */
> +/* { dg-do compile { xfail { *-*-* } } */
> +/* { dg-require-effective-target ia32 } */
> +/* { dg-options "-O2" } */
> +
> +typedef void (*dispatch_t)(long offset);
> +
> +dispatch_t dispatch[256];
> +
> +void male_indirect_jump (long offset)
> +{
> +  dispatch[offset](offset);
> +}
> +
> +/* { dg-final { scan-assembler-not "jmp[ \t]*.%eax" } } */

Shouldn't the testcases also be applied to x86-64?

-- 
H.J.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28  8:43               ` Kai Tietz
  2014-05-28 15:42                 ` H.J. Lu
@ 2014-05-28 17:22                 ` Richard Henderson
  2014-05-28 19:13                   ` Jeff Law
  2014-05-30 23:09                 ` Tom de Vries
  2014-07-06 12:49                 ` Iain Sandoe
  3 siblings, 1 reply; 39+ messages in thread
From: Richard Henderson @ 2014-05-28 17:22 UTC (permalink / raw)
  To: Kai Tietz, Jeff Law; +Cc: H.J. Lu, GCC Patches, Jakub Jelinek

On 05/28/2014 01:43 AM, Kai Tietz wrote:
> +  if (GET_CODE (op) == CONST)
> +    op = XEXP (op, 0);
> +  return (GET_CODE (op) == SYMBOL_REF || CONSTANT_P (op));

Surely all this boils down to just CONSTANT_P (op),
without having to look through the CONST at all.

Otherwise this looks ok.


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28 17:22                 ` Richard Henderson
@ 2014-05-28 19:13                   ` Jeff Law
  2014-05-28 21:28                     ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Law @ 2014-05-28 19:13 UTC (permalink / raw)
  To: Richard Henderson, Kai Tietz; +Cc: H.J. Lu, GCC Patches, Jakub Jelinek

On 05/28/14 11:22, Richard Henderson wrote:
> On 05/28/2014 01:43 AM, Kai Tietz wrote:
>> +  if (GET_CODE (op) == CONST)
>> +    op = XEXP (op, 0);
>> +  return (GET_CODE (op) == SYMBOL_REF || CONSTANT_P (op));
>
> Surely all this boils down to just CONSTANT_P (op),
> without having to look through the CONST at all.
Something certainly seems odd there.

Kai, is your intention to detect symbol_ref + const_int?  Isn't the 
canonical form for that:

(const (plus (symbol_ref) (const_int))?

It appears to me the code above looks for

(const (symbol_ref))

or

(const (const_int))

?!?

Jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28 19:13                   ` Jeff Law
@ 2014-05-28 21:28                     ` Kai Tietz
  2014-05-28 21:52                       ` Jakub Jelinek
  0 siblings, 1 reply; 39+ messages in thread
From: Kai Tietz @ 2014-05-28 21:28 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Henderson, H.J. Lu, GCC Patches, Jakub Jelinek



----- Original Message -----
> On 05/28/14 11:22, Richard Henderson wrote:
> > On 05/28/2014 01:43 AM, Kai Tietz wrote:
> >> +  if (GET_CODE (op) == CONST)
> >> +    op = XEXP (op, 0);
> >> +  return (GET_CODE (op) == SYMBOL_REF || CONSTANT_P (op));
> >
> > Surely all this boils down to just CONSTANT_P (op),
> > without having to look through the CONST at all.
> Something certainly seems odd there.
> 
> Kai, is your intention to detect symbol_ref + const_int?  Isn't the
> canonical form for that:
> 
> (const (plus (symbol_ref) (const_int))?
> 
> It appears to me the code above looks for
> 
> (const (symbol_ref))
> 
> or
> 
> (const (const_int))
> 
> ?!?
> 
> Jeff
> 

Yes, I missed the plus-part.

I am just running bootstrap with regression testing for altering predicate to:

(define_predicate "sibcall_memory_operand"
  (match_operand 0 "memory_operand")
{
  op = XEXP (op, 0);

  if (GET_CODE (op) == CONST)
    op = XEXP (op, 0);
  if (GET_CODE (op) == PLUS && CONSTANT_P (XEXP (op, 0)))
    op = XEXP (op, 1);
  return CONSTANT_P (op);
})

Kai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28 21:28                     ` Kai Tietz
@ 2014-05-28 21:52                       ` Jakub Jelinek
  2014-05-28 21:55                         ` Jeff Law
  0 siblings, 1 reply; 39+ messages in thread
From: Jakub Jelinek @ 2014-05-28 21:52 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jeff Law, Richard Henderson, H.J. Lu, GCC Patches

On Wed, May 28, 2014 at 05:28:31PM -0400, Kai Tietz wrote:
> Yes, I missed the plus-part.
> 
> I am just running bootstrap with regression testing for altering predicate to:
> 
> (define_predicate "sibcall_memory_operand"
>   (match_operand 0 "memory_operand")
> {
>   op = XEXP (op, 0);
> 
>   if (GET_CODE (op) == CONST)
>     op = XEXP (op, 0);
>   if (GET_CODE (op) == PLUS && CONSTANT_P (XEXP (op, 0)))
>     op = XEXP (op, 1);

Why not get rid of all the above 4 lines and just keep:

>   return CONSTANT_P (op);

?  CONST matches CONSTANT_P, and what is inside of CONST should be
fine, and (plus (symbol_ref) (const_int)) not surrounded by CONST
ir invalid.

	Jakub

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28 21:52                       ` Jakub Jelinek
@ 2014-05-28 21:55                         ` Jeff Law
  2014-05-28 22:03                           ` Jakub Jelinek
  2014-05-29  3:57                           ` Richard Henderson
  0 siblings, 2 replies; 39+ messages in thread
From: Jeff Law @ 2014-05-28 21:55 UTC (permalink / raw)
  To: Jakub Jelinek, Kai Tietz; +Cc: Richard Henderson, H.J. Lu, GCC Patches

On 05/28/14 15:52, Jakub Jelinek wrote:
> On Wed, May 28, 2014 at 05:28:31PM -0400, Kai Tietz wrote:
>> Yes, I missed the plus-part.
>>
>> I am just running bootstrap with regression testing for altering predicate to:
>>
>> (define_predicate "sibcall_memory_operand"
>>    (match_operand 0 "memory_operand")
>> {
>>    op = XEXP (op, 0);
>>
>>    if (GET_CODE (op) == CONST)
>>      op = XEXP (op, 0);
>>    if (GET_CODE (op) == PLUS && CONSTANT_P (XEXP (op, 0)))
>>      op = XEXP (op, 1);
>
> Why not get rid of all the above 4 lines and just keep:
>
>>    return CONSTANT_P (op);
>
> ?  CONST matches CONSTANT_P, and what is inside of CONST should be
> fine, and (plus (symbol_ref) (const_int)) not surrounded by CONST
> ir invalid.
Haven't we recently had problems with being overly accepting of stuff 
inside CONST when using the CONST for address expressions.  ISTM we 
should only accept what the processor supports here.

jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28 21:55                         ` Jeff Law
@ 2014-05-28 22:03                           ` Jakub Jelinek
  2014-05-28 22:14                             ` Kai Tietz
  2014-05-29  3:57                           ` Richard Henderson
  1 sibling, 1 reply; 39+ messages in thread
From: Jakub Jelinek @ 2014-05-28 22:03 UTC (permalink / raw)
  To: Jeff Law; +Cc: Kai Tietz, Richard Henderson, H.J. Lu, GCC Patches

On Wed, May 28, 2014 at 03:54:58PM -0600, Jeff Law wrote:
> >Why not get rid of all the above 4 lines and just keep:
> >
> >>   return CONSTANT_P (op);
> >
> >?  CONST matches CONSTANT_P, and what is inside of CONST should be
> >fine, and (plus (symbol_ref) (const_int)) not surrounded by CONST
> >ir invalid.
> Haven't we recently had problems with being overly accepting of
> stuff inside CONST when using the CONST for address expressions.
> ISTM we should only accept what the processor supports here.

The only recent problem I remember was that we forgot to put CONST
around (plus (symbol_ref) (const_int)), but I see no problem not
accepting such invalid RTL.
The processor shouldn't care, for the instructions a CONST is just
any kind of immediate, what exactly it is matters solely to the
assembler/linker and dynamic linker
(if there are relocations for it, if the expression can be expressed in
the assembly, etc.).  But that is common to all CONST operands, there is
nothing special particularly about sibcalls.

	Jakub

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28 22:03                           ` Jakub Jelinek
@ 2014-05-28 22:14                             ` Kai Tietz
  0 siblings, 0 replies; 39+ messages in thread
From: Kai Tietz @ 2014-05-28 22:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Richard Henderson, H.J. Lu, GCC Patches

----- Original Message -----
> On Wed, May 28, 2014 at 03:54:58PM -0600, Jeff Law wrote:
> > >Why not get rid of all the above 4 lines and just keep:
> > >
> > >>   return CONSTANT_P (op);
> > >
> > >?  CONST matches CONSTANT_P, and what is inside of CONST should be
> > >fine, and (plus (symbol_ref) (const_int)) not surrounded by CONST
> > >ir invalid.
> > Haven't we recently had problems with being overly accepting of
> > stuff inside CONST when using the CONST for address expressions.
> > ISTM we should only accept what the processor supports here.
> 
> The only recent problem I remember was that we forgot to put CONST
> around (plus (symbol_ref) (const_int)), but I see no problem not
> accepting such invalid RTL.
> The processor shouldn't care, for the instructions a CONST is just
> any kind of immediate, what exactly it is matters solely to the
> assembler/linker and dynamic linker
> (if there are relocations for it, if the expression can be expressed in
> the assembly, etc.).  But that is common to all CONST operands, there is
> nothing special particularly about sibcalls.
> 
> 	Jakub
> 

Well, actually we want to prevent to accept anything plus/mult within memory-addresses, which hasn't a symbol-ref, or a constant-value as arguments.  Is it for sure that there are within a CONST-rtx no registers?  If so, we could check intitally for CONSTANT_P.

Kai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28 21:55                         ` Jeff Law
  2014-05-28 22:03                           ` Jakub Jelinek
@ 2014-05-29  3:57                           ` Richard Henderson
  2014-05-30  8:08                             ` Kai Tietz
  1 sibling, 1 reply; 39+ messages in thread
From: Richard Henderson @ 2014-05-29  3:57 UTC (permalink / raw)
  To: Jeff Law, Jakub Jelinek, Kai Tietz; +Cc: H.J. Lu, GCC Patches

On 05/28/2014 02:54 PM, Jeff Law wrote:
> On 05/28/14 15:52, Jakub Jelinek wrote:
>> On Wed, May 28, 2014 at 05:28:31PM -0400, Kai Tietz wrote:
>>> Yes, I missed the plus-part.
>>>
>>> I am just running bootstrap with regression testing for altering predicate to:
>>>
>>> (define_predicate "sibcall_memory_operand"
>>>    (match_operand 0 "memory_operand")
>>> {
>>>    op = XEXP (op, 0);
>>>
>>>    if (GET_CODE (op) == CONST)
>>>      op = XEXP (op, 0);
>>>    if (GET_CODE (op) == PLUS && CONSTANT_P (XEXP (op, 0)))
>>>      op = XEXP (op, 1);
>>
>> Why not get rid of all the above 4 lines and just keep:
>>
>>>    return CONSTANT_P (op);
>>
>> ?  CONST matches CONSTANT_P, and what is inside of CONST should be
>> fine, and (plus (symbol_ref) (const_int)) not surrounded by CONST
>> ir invalid.
> Haven't we recently had problems with being overly accepting of stuff inside
> CONST when using the CONST for address expressions.  ISTM we should only accept
> what the processor supports here.

Recall that it has just satisfied memory_operand, where all the real checks
should have been done.  I think just the CONSTANT_P check is sufficient.


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-29  3:57                           ` Richard Henderson
@ 2014-05-30  8:08                             ` Kai Tietz
  2014-05-30 15:56                               ` Richard Henderson
  0 siblings, 1 reply; 39+ messages in thread
From: Kai Tietz @ 2014-05-30  8:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, Jakub Jelinek, H.J. Lu, GCC Patches

So, completed bootstrap and regression-test for x86_64-unknown-linux-gnu (multilib) by using following predicate for sibcall-patch.

(define_predicate "sibcall_memory_operand"
  (match_operand 0 "memory_operand")
{
  return CONSTANT_P (op);
})


Worked fine, no regressions.  Is sibcall-patch ok with that predicate to be applied?

Regards,
Kai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-30  8:08                             ` Kai Tietz
@ 2014-05-30 15:56                               ` Richard Henderson
  2014-05-30 16:51                                 ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Richard Henderson @ 2014-05-30 15:56 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jeff Law, Jakub Jelinek, H.J. Lu, GCC Patches

On 05/30/2014 01:08 AM, Kai Tietz wrote:
> (define_predicate "sibcall_memory_operand"
>   (match_operand 0 "memory_operand")
> {
>   return CONSTANT_P (op);
> })

Surely that always returns false?  Surely XEXP (op, 0) so that you look at the
address, not the memory.


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-30 15:56                               ` Richard Henderson
@ 2014-05-30 16:51                                 ` Kai Tietz
  2014-05-30 17:15                                   ` Jeff Law
  2014-05-30 22:05                                   ` H.J. Lu
  0 siblings, 2 replies; 39+ messages in thread
From: Kai Tietz @ 2014-05-30 16:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, Jakub Jelinek, H.J. Lu, GCC Patches

----- Original Message -----
> On 05/30/2014 01:08 AM, Kai Tietz wrote:
> > (define_predicate "sibcall_memory_operand"
> >   (match_operand 0 "memory_operand")
> > {
> >   return CONSTANT_P (op);
> > })
> 
> Surely that always returns false?  Surely XEXP (op, 0) so that you look at
> the
> address, not the memory.
> 
> 
> r~
> 

Doh ... of course


(define_predicate "sibcall_memory_operand"
  (match_operand 0 "memory_operand")
{
  return CONSTANT_P (XEXP (op, 0));
})


Actually I tested the proper version :}

Kai

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-30 16:51                                 ` Kai Tietz
@ 2014-05-30 17:15                                   ` Jeff Law
  2014-05-30 22:05                                   ` H.J. Lu
  1 sibling, 0 replies; 39+ messages in thread
From: Jeff Law @ 2014-05-30 17:15 UTC (permalink / raw)
  To: Kai Tietz, Richard Henderson; +Cc: Jakub Jelinek, H.J. Lu, GCC Patches

On 05/30/14 10:51, Kai Tietz wrote:
> ----- Original Message -----
>> On 05/30/2014 01:08 AM, Kai Tietz wrote:
>>> (define_predicate "sibcall_memory_operand"
>>>    (match_operand 0 "memory_operand")
>>> {
>>>    return CONSTANT_P (op);
>>> })
>>
>> Surely that always returns false?  Surely XEXP (op, 0) so that you look at
>> the
>> address, not the memory.
>>
>>
>> r~
>>
>
> Doh ... of course
>
>
> (define_predicate "sibcall_memory_operand"
>    (match_operand 0 "memory_operand")
> {
>    return CONSTANT_P (XEXP (op, 0));
> })
>
>
> Actually I tested the proper version :}
In that case, it's good to go.  OK for the trunk.

jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-30 16:51                                 ` Kai Tietz
  2014-05-30 17:15                                   ` Jeff Law
@ 2014-05-30 22:05                                   ` H.J. Lu
  2014-05-30 22:46                                     ` H.J. Lu
  1 sibling, 1 reply; 39+ messages in thread
From: H.J. Lu @ 2014-05-30 22:05 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Richard Henderson, Jeff Law, Jakub Jelinek, GCC Patches

On Fri, May 30, 2014 at 9:51 AM, Kai Tietz <ktietz@redhat.com> wrote:
> ----- Original Message -----
>> On 05/30/2014 01:08 AM, Kai Tietz wrote:
>> > (define_predicate "sibcall_memory_operand"
>> >   (match_operand 0 "memory_operand")
>> > {
>> >   return CONSTANT_P (op);
>> > })
>>
>> Surely that always returns false?  Surely XEXP (op, 0) so that you look at
>> the
>> address, not the memory.
>>
>>
>> r~
>>
>
> Doh ... of course
>
>
> (define_predicate "sibcall_memory_operand"
>   (match_operand 0 "memory_operand")
> {
>   return CONSTANT_P (XEXP (op, 0));
> })
>
>
> Actually I tested the proper version :}
>

Have you tested bootstrap on i686?  I think it may have broken bootstrap
on i686:

https://gcc.gnu.org/ml/gcc-regression/2014-05/msg00408.html

-- 
H.J.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-30 22:05                                   ` H.J. Lu
@ 2014-05-30 22:46                                     ` H.J. Lu
  0 siblings, 0 replies; 39+ messages in thread
From: H.J. Lu @ 2014-05-30 22:46 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Richard Henderson, Jeff Law, Jakub Jelinek, GCC Patches

On Fri, May 30, 2014 at 3:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 30, 2014 at 9:51 AM, Kai Tietz <ktietz@redhat.com> wrote:
>> ----- Original Message -----
>>> On 05/30/2014 01:08 AM, Kai Tietz wrote:
>>> > (define_predicate "sibcall_memory_operand"
>>> >   (match_operand 0 "memory_operand")
>>> > {
>>> >   return CONSTANT_P (op);
>>> > })
>>>
>>> Surely that always returns false?  Surely XEXP (op, 0) so that you look at
>>> the
>>> address, not the memory.
>>>
>>>
>>> r~
>>>
>>
>> Doh ... of course
>>
>>
>> (define_predicate "sibcall_memory_operand"
>>   (match_operand 0 "memory_operand")
>> {
>>   return CONSTANT_P (XEXP (op, 0));
>> })
>>
>>
>> Actually I tested the proper version :}
>>
>
> Have you tested bootstrap on i686?  I think it may have broken bootstrap
> on i686:
>
> https://gcc.gnu.org/ml/gcc-regression/2014-05/msg00408.html
>

The function is

void
ira_traverse_loop_tree (bool bb_p, ira_loop_tree_node_t loop_node,
                        void (*preorder_func) (ira_loop_tree_node_t),
                        void (*postorder_func) (ira_loop_tree_node_t))
{
    ...
    if (postorder_func != NULL)
      (*postorder_func) (loop_node);
}

postorder_func is passed on stack for i686 and we generate

   jmp    *0x28(%esp)

sibcall invalidates the original arguments on stack and puts
the new argument on stack.  There is no place on stack to
be used for indirect sibcall via memory.  In this case,
sibcall_memory_operand should return false.

-- 
H.J.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28  8:43               ` Kai Tietz
  2014-05-28 15:42                 ` H.J. Lu
  2014-05-28 17:22                 ` Richard Henderson
@ 2014-05-30 23:09                 ` Tom de Vries
  2014-07-06 12:49                 ` Iain Sandoe
  3 siblings, 0 replies; 39+ messages in thread
From: Tom de Vries @ 2014-05-30 23:09 UTC (permalink / raw)
  To: Kai Tietz, Jeff Law
  Cc: H.J. Lu, GCC Patches, Jakub Jelinek, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 474 bytes --]

On 28-05-14 10:43, Kai Tietz wrote:
> Index: gcc/testsuite/gcc.target/i386/sibcall-4.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/sibcall-4.c	(revision 0)
> +++ gcc/testsuite/gcc.target/i386/sibcall-4.c	(working copy)
> @@ -0,0 +1,15 @@
> +/* Testcase for PR target/46219.  */
> +/* { dg-do compile { xfail { *-*-* } } */

Hi,

I've committed this follow-up patch to add a missing closing brace.

Thanks,
- Tom

[-- Attachment #2: 0001-Add-missing-closing-brace.patch --]
[-- Type: text/x-patch, Size: 562 bytes --]

2014-05-31  Tom de Vries  <tom@codesourcery.com>

	* gcc.target/i386/sibcall-4.c: Add missing closing brace.

diff --git a/gcc/testsuite/gcc.target/i386/sibcall-4.c b/gcc/testsuite/gcc.target/i386/sibcall-4.c
index e157338..e9ae939 100644
--- a/gcc/testsuite/gcc.target/i386/sibcall-4.c
+++ b/gcc/testsuite/gcc.target/i386/sibcall-4.c
@@ -1,5 +1,5 @@
 /* Testcase for PR target/46219.  */
-/* { dg-do compile { xfail { *-*-* } } */
+/* { dg-do compile { xfail { *-*-* } } } */
 /* { dg-require-effective-target ia32 } */
 /* { dg-options "-O2" } */
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-28  8:43               ` Kai Tietz
                                   ` (2 preceding siblings ...)
  2014-05-30 23:09                 ` Tom de Vries
@ 2014-07-06 12:49                 ` Iain Sandoe
  3 siblings, 0 replies; 39+ messages in thread
From: Iain Sandoe @ 2014-07-06 12:49 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jeff Law, GCC Patches, Richard Henderson

Hello Kai,

On 28 May 2014, at 09:43, Kai Tietz wrote:

> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c	(revision 210985)
> +++ gcc/config/i386/i386.c	(working copy)
> @@ -38893,7 +38893,16 @@ x86_output_mi_thunk (FILE *file,
>      For our purposes here, we can get away with (ab)using a jump pattern,
>      because we're going to do no optimization.  */
>   if (MEM_P (fnaddr))
> -    emit_jump_insn (gen_indirect_jump (fnaddr));
> +    {
> +      if (sibcall_insn_operand (fnaddr, word_mode))
> +	{
> +	  tmp = gen_rtx_CALL (VOIDmode, fnaddr, const0_rtx);
> +          tmp = emit_call_insn (tmp);
> +          SIBLING_CALL_P (tmp) = 1;
> +	}
> +      else
> +	emit_jump_insn (gen_indirect_jump (fnaddr));
> +    }
>   else

As discussed in PR61387 and on IRC, this patch (http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=211089), in particular the section above, causes massive regressions (~900 test fails) for x86_64-darwin*.

I have some questions and observations and would request some progress in resolving this.

FWIW, x86_64-darwin *passes* the test-cases you added with the patch series.

====

The section of code above will only fire  (1) we are PIC (2) we are not PECOFF (3) the target returns binds_local_p () false for the function (see lines 38863-38871).

Observations:

A. AFAICT, the section of code above is _never_ exercised by x86_64-linux for a full bootstrap and regression test.
  -- so please could you identify how you tested it?
  -- whatever is done to resolve the current issue, it seems that an appropriate test-case is required.

B. You have considerably altered the behaviour of that code block without any amendment to the preceding comment.
  -- please update the comment to refect the changed behaviour.

the case is generated by:

	  tmp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, fnaddr), UNSPEC_GOTPCREL);
	  tmp = gen_rtx_CONST (Pmode, tmp);
	  fnaddr = gen_const_mem (Pmode, tmp);

C. The code above seems pretty generic, grep doesn't reveal any different handling of UNSPEC_GOTPCREL for Darwin - what part of the Darwin implementation are you suggesting needs amendment?

====

Certainly, it is easy enough to make a patch to disable this operation on Darwin (and thus fix the regressions) -- however, I'd like to be sure that there is simply not some lurking issue, that has simply only manifested on Darwin so far.

thanks,
Iain

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
@ 2014-07-09 17:49 Dominique Dhumieres
  0 siblings, 0 replies; 39+ messages in thread
From: Dominique Dhumieres @ 2014-07-09 17:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: rth, law, ktietz, iain

> FWIW, x86_64-darwin *passes* the test-cases you added with the patch series.

gcc.target/i386/sibcall-1.c fails in 32 bit mode:

FAIL: gcc.target/i386/sibcall-1.c scan-assembler-not mov

Cheers,

Dominique

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-31 11:58     ` Kai Tietz
@ 2014-05-31 12:22       ` Kai Tietz
  0 siblings, 0 replies; 39+ messages in thread
From: Kai Tietz @ 2014-05-31 12:22 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Kai Tietz, H.J. Lu, Richard Henderson, Jeff Law, Jakub Jelinek,
	gcc-patches

In testcase a NULL escaped ...

ChangeLog testsuite

2014-05-31  Kai Tietz  <ktietz@redhat.com>

    * gcc.target/i386/sibcall-6.c: New test.

Index: gcc.target/i386/sibcall-6.c
===================================================================
--- gcc.target/i386/sibcall-6.c    (Revision 0)
+++ gcc.target/i386/sibcall-6.c    (Arbeitskopie)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2" } */
+
+typedef void *ira_loop_tree_node_t;
+
+extern int end (int);
+extern int doo (int);
+
+void
+ira_traverse_loop_tree (int bb_p, ira_loop_tree_node_t loop_node,
+                        void (*preorder_func) (ira_loop_tree_node_t),
+                        void (*postorder_func) (ira_loop_tree_node_t))
+{
+  int l, r = 0x1, h = 0, j = 0;
+
+  if (preorder_func)
+    (*preorder_func) (loop_node);
+
+  if (bb_p)
+    {
+      for (l = 0; l < end (l); l++)
+        {
+          r += doo (l);
+          h += (l + 1) * 3;
+          h %= (l + 1);
+          r -= doo (h);
+          j += (l + 1) * 7;
+          j %= (l + 1);
+          r += doo (j);
+        }
+    }
+
+  if (postorder_func)
+    (*postorder_func) (loop_node);
+}
+/* { dg-final { scan-assembler "jmp[ \t]*.%eax" } } */

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-31 10:15   ` Kai Tietz
@ 2014-05-31 11:58     ` Kai Tietz
  2014-05-31 12:22       ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Kai Tietz @ 2014-05-31 11:58 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Kai Tietz, H.J. Lu, Richard Henderson, Jeff Law, Jakub Jelinek,
	gcc-patches

Hello,

recent fallout about sibcall was caused by using 'm' constraint for
sibcalls.  By this wrongly combines happened on reload-pass.

ChangeLog

2014-05-31  Kai Tietz  <ktietz@redhat.com>

    * constrains.md (define_constrain): New 'B' constraint.
    * i386.md (sibcall_insn_operand): Use B instead of m constraint.

ChangeLog testsuite

2014-05-31  Kai Tietz  <ktietz@redhat.com>

    * gcc.target/i386/sibcall-6.c: New test.

Bootstrapped i386-unknown-linux-gnu, x86_64-unknown-linux-gnu.  Ok for apply?

Regards,
Kai

Index: constraints.md
===================================================================
--- constraints.md    (Revision 211089)
+++ constraints.md    (Arbeitskopie)
@@ -18,7 +18,7 @@
 ;; <http://www.gnu.org/licenses/>.

 ;;; Unused letters:
-;;;     B     H
+;;;           H
 ;;;           h j

 ;; Integer register constraints.
@@ -151,6 +151,11 @@
   "@internal Constant call address operand."
   (match_operand 0 "constant_call_address_operand"))

+(define_constraint "B"
+  "@internal Sibcall memory operand."
+  (and (not (match_test "TARGET_X32"))
+       (match_operand 0 "sibcall_memory_operand")))
+
 (define_constraint "w"
   "@internal Call memory operand."
   (and (not (match_test "TARGET_X32"))
Index: i386.md
===================================================================
--- i386.md    (Revision 211089)
+++ i386.md    (Arbeitskopie)
@@ -11376,12 +11376,41 @@
   [(set_attr "type" "call")])

 (define_insn "*sibcall"
-  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "Uzm"))
+  [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "UzB"))
      (match_operand 1))]
   "SIBLING_CALL_P (insn)"
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])

+; TODO
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand")
+        (match_operand:DI 1 "memory_operand"))
+   (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  "TARGET_64BIT"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
+   (set (match_dup 0)
+        (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+        (match_operand:SI 1 "memory_operand"))
+   (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  "!TARGET_64BIT"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
+   (set (match_dup 0)
+        (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand")
+        (match_operand:DI 1 "memory_operand"))
+   (call (mem:QI (match_operand:DI 2 "register_operand"))
+         (match_operand 3))]
+  "TARGET_64BIT  && REG_P (operands[0]) && 0
+    && REG_P (operands[2])
+    && REGNO (operands[0]) == REGNO (operands[2])"
+  [(call (mem:QI (match_dup 1)) (match_dup 1))])
+
 (define_expand "call_pop"
   [(parallel [(call (match_operand:QI 0)
             (match_operand:SI 1))
@@ -11406,7 +11435,7 @@
   [(set_attr "type" "call")])

 (define_insn "*sibcall_pop"
-  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "Uzm"))
+  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "UzB"))
      (match_operand 1))
    (set (reg:SI SP_REG)
     (plus:SI (reg:SI SP_REG)
@@ -11451,7 +11480,7 @@

 (define_insn "*sibcall_value"
   [(set (match_operand 0)
-    (call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "Uzm"))
+    (call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "UzB"))
           (match_operand 2)))]
   "SIBLING_CALL_P (insn)"
   "* return ix86_output_call_insn (insn, operands[1]);"
@@ -11494,7 +11523,7 @@

 (define_insn "*sibcall_value_pop"
   [(set (match_operand 0)
-    (call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "Uzm"))
+    (call (mem:QI (match_operand:SI 1 "sibcall_insn_operand" "UzB"))
           (match_operand 2)))
    (set (reg:SI SP_REG)
     (plus:SI (reg:SI SP_REG)
Index: gcc.target/i386/sibcall-6.c
===================================================================
--- gcc.target/i386/sibcall-6.c    (Revision 0)
+++ gcc.target/i386/sibcall-6.c    (Arbeitskopie)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2" } */
+
+typedef void *ira_loop_tree_node_t;
+
+extern int end (int);
+extern int doo (int);
+
+void
+ira_traverse_loop_tree (int bb_p, ira_loop_tree_node_t loop_node,
+                        void (*preorder_func) (ira_loop_tree_node_t),
+                        void (*postorder_func) (ira_loop_tree_node_t))
+{
+  int l, r = 0x1, h = 0, j = 0;
+
+  if (preorder_func)
+    (*preorder_func) (loop_node);
+
+  if (bb_p)
+    {
+      for (l = 0; l < end (l); l++)
+        {
+          r += doo (l);
+          h += (l + 1) * 3;
+          h %= (l + 1);
+          r -= doo (h);
+          j += (l + 1) * 7;
+          j %= (l + 1);
+          r += doo (j);
+        }
+    }
+
+  if (postorder_func != NULL)
+    (*postorder_func) (loop_node);
+}
+/* { dg-final { scan-assembler "jmp[ \t]*.%eax" } } */

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-31  9:09 ` Kai Tietz
@ 2014-05-31 10:15   ` Kai Tietz
  2014-05-31 11:58     ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Kai Tietz @ 2014-05-31 10:15 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Kai Tietz, H.J. Lu, Richard Henderson, Jeff Law, Jakub Jelinek,
	gcc-patches

That patch won't help here.  My tests showing here a different
problem.  For -O2 or above this issue happens.

For -O2 the following code will be generated.
        movl    28(%esp), %eax
        testl   %eax, %eax
        je      L16
        movl    %ebp, 64(%esp)
        addl    $44, %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        jmp     *28(%esp)
Question here is why it tries to use memory-address here instead of
simply using eax-register, which is already loaded.

A small testcase for that is:

typedef void *ira_loop_tree_node_t;
#define NULL 0

extern int end (int);
extern int doo (int);

void
ira_traverse_loop_tree (bool bb_p, ira_loop_tree_node_t loop_node,
                        void (*preorder_func) (ira_loop_tree_node_t),
                        void (*postorder_func) (ira_loop_tree_node_t))
{
  int l, r = 0x1, h = 0, j = 0;

  if (preorder_func != NULL)
    (*preorder_func) (loop_node);

  if (bb_p)
    {
      for (l = 0; l < end (l); l++)
        {
          r += doo (l);
          h += (l + 1) * 3;
          h %= (l + 1);
          r -= doo (h);
          j += (l + 1) * 7;
          j %= (l + 1);
          r += doo (j);
        }
    }

  if (postorder_func != NULL)
    (*postorder_func) (loop_node);
}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
  2014-05-30 23:07 Bernd Edlinger
@ 2014-05-31  9:09 ` Kai Tietz
  2014-05-31 10:15   ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Kai Tietz @ 2014-05-31  9:09 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Kai Tietz, H.J. Lu, Richard Henderson, Jeff Law, Jakub Jelinek,
	gcc-patches

Hmm,  this might cause by allowing CONST-memory-addresses.  This was
doubtful from the beginning.
It would be helpful to see here for the case a final rtl-dump.

Could you try if following patch (it is incomplete as it disregards
plus-expression) solves your bootstrap issue?

Regards,
Kai

Index: predicates.md
===================================================================
--- predicates.md       (Revision 211102)
+++ predicates.md       (Arbeitskopie)
@@ -74,6 +74,9 @@
 (define_predicate "sibcall_memory_operand"
   (match_operand 0 "memory_operand")
 {
+  op = XEXP (op, 0);
+  if (GET_CODE (op) == CONST)
+    op = XEXP (op, 0);
   return CONSTANT_P (XEXP (op, 0));
 })

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch i386]: Expand sibling-tail-calls via accumulator register
@ 2014-05-30 23:07 Bernd Edlinger
  2014-05-31  9:09 ` Kai Tietz
  0 siblings, 1 reply; 39+ messages in thread
From: Bernd Edlinger @ 2014-05-30 23:07 UTC (permalink / raw)
  To: Kai Tietz
  Cc: H.J. Lu, Richard Henderson, Jeff Law, Jakub Jelinek, gcc-patches

Hi Kai,

this patch also mis-compiles binuitls-2.24 on x86_64.

In the function walk_wild_consider_section (ld/ldlang.c)
a tail-call gets miscompiled:

The stack frame is cleaned up, but now the jump target is invalid.

   0x000000000040c801 <+193>:    add    $0x28,%rsp
   0x000000000040c805 <+197>:    mov    %r13,%rsi
   0x000000000040c808 <+200>:    pop    %rbx
   0x000000000040c809 <+201>:    mov    %r14,%rdi
   0x000000000040c80c <+204>:    pop    %rbp
   0x000000000040c80d <+205>:    pop    %r12
   0x000000000040c80f <+207>:    pop    %r13
   0x000000000040c811 <+209>:    pop    %r14
   0x000000000040c813 <+211>:    pop    %r15
   0x000000000040c815 <+213>:    jmpq   *0x10(%rsp)

before the patch the sequence did save the jump target in rax:

   0x000000000040c801 <+193>:    mov    0x10(%rsp),%rax
   0x000000000040c806 <+198>:    add    $0x28,%rsp
   0x000000000040c80a <+202>:    pop    %rbx
   0x000000000040c80b <+203>:    mov    %r13,%rsi
   0x000000000040c80e <+206>:    mov    %r14,%rdi
   0x000000000040c811 <+209>:    pop    %rbp
   0x000000000040c812 <+210>:    pop    %r12
   0x000000000040c814 <+212>:    pop    %r13
   0x000000000040c816 <+214>:    pop    %r14
   0x000000000040c818 <+216>:    pop    %r15
   0x000000000040c81a <+218>:    jmpq   *%rax


Regards
Bernd.
 		 	   		  

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2014-07-09 17:49 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-22 21:33 [patch i386]: Expand sibling-tail-calls via accumulator register Kai Tietz
2014-05-22 22:07 ` H.J. Lu
2014-05-23 18:04   ` Jeff Law
2014-05-26 18:20     ` Kai Tietz
2014-05-26 18:32       ` Jakub Jelinek
2014-05-26 19:22         ` Kai Tietz
2014-05-26 19:35           ` Jakub Jelinek
2014-05-26 20:32             ` Kai Tietz
2014-05-27 15:39               ` Jeff Law
2014-05-27 15:51                 ` Kai Tietz
2014-05-27 16:22                 ` Richard Henderson
2014-05-27 16:48                   ` Jeff Law
2014-05-27 17:26                     ` Richard Henderson
2014-05-28  8:43               ` Kai Tietz
2014-05-28 15:42                 ` H.J. Lu
2014-05-28 17:22                 ` Richard Henderson
2014-05-28 19:13                   ` Jeff Law
2014-05-28 21:28                     ` Kai Tietz
2014-05-28 21:52                       ` Jakub Jelinek
2014-05-28 21:55                         ` Jeff Law
2014-05-28 22:03                           ` Jakub Jelinek
2014-05-28 22:14                             ` Kai Tietz
2014-05-29  3:57                           ` Richard Henderson
2014-05-30  8:08                             ` Kai Tietz
2014-05-30 15:56                               ` Richard Henderson
2014-05-30 16:51                                 ` Kai Tietz
2014-05-30 17:15                                   ` Jeff Law
2014-05-30 22:05                                   ` H.J. Lu
2014-05-30 22:46                                     ` H.J. Lu
2014-05-30 23:09                 ` Tom de Vries
2014-07-06 12:49                 ` Iain Sandoe
2014-05-23 18:16 ` Jeff Law
2014-05-27 15:51 ` Richard Henderson
2014-05-30 23:07 Bernd Edlinger
2014-05-31  9:09 ` Kai Tietz
2014-05-31 10:15   ` Kai Tietz
2014-05-31 11:58     ` Kai Tietz
2014-05-31 12:22       ` Kai Tietz
2014-07-09 17:49 Dominique Dhumieres

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).