public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
@ 2015-03-19 14:39 Kyrill Tkachov
  2015-03-27 10:06 ` Kyrill Tkachov
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Kyrill Tkachov @ 2015-03-19 14:39 UTC (permalink / raw)
  To: GCC Patches; +Cc: Honggyu Kim

[-- Attachment #1: Type: text/plain, Size: 1993 bytes --]

Hi all,

This patch fixes PR 65358. For details look at the excellent write-up
by Honggyu in bugzilla. The problem is that we're trying to pass a struct
partially on the stack and partially in regs during a tail-call optimisation
but the struct we're passing is also a partial incoming arg though the split
between stack and regs is different from its outgoing usage.

The emit_push_insn code ends up doing a block move for the on-stack part but
ends up overwriting the part that needs to be loaded into regs.
My first thought was to just load the regs part first and then do the stack
part but that doesn't work as multiple comments in that function indicate
(the block move being expanded to movmem or other functions being one of the
reasons).

My proposed solution is to detect when the overlap happens, find the
overlapping region and load it before the stack pushing into pseudos and
after the stack pushing is done move the overlapping values from the pseudos
into the hard argument regs that they're supposed to go.

That way this new functionality should only ever be triggered when there's
the overlap in this PR (causing wrong-code) and shouldn't affect codegen
anywhere else.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
and x86_64-linux-gnu.

According to the PR this appears at least as far back 4.6 so this isn't a
regression on the release branches, but it is a wrong-code bug.

I'll let Honggyu upstream the testcase separately
(https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html)

I'll be testing this on the 4.8 and 4.9 branches.
Thoughts on this approach?

Thanks,
Kyrill

2015-03-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/65358
     * expr.c (memory_load_overlap): New function.
     (emit_push_insn): When pushing partial args to the stack would
     clobber the register part load the overlapping part into a pseudo
     and put it into the hard reg after pushing.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: expr.patch --]
[-- Type: text/x-patch; name=expr.patch, Size: 3995 bytes --]

commit 490c5f2074d76a2927afaea99e4dd0bacccb413c
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/expr.c b/gcc/expr.c
index dc13a14..d3b9156 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,6 +4121,25 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* Add SIZE to X and check whether it's greater than Y.
+   If it is, return the constant amount by which it's greater or smaller.
+   If the two are not statically comparable (for example, X and Y contain
+   different registers) return -1.  This is used in expand_push_insn to
+   figure out if reading SIZE bytes from location X will end up reading from
+   location Y.  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  return INTVAL (sub);
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
@@ -4179,6 +4198,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4309,6 +4332,35 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else
+		overlapping = 0;
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4411,9 +4463,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4421,9 +4472,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-03-19 14:39 [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall Kyrill Tkachov
@ 2015-03-27 10:06 ` Kyrill Tkachov
  2015-03-29 11:29 ` Honggyu Kim
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Kyrill Tkachov @ 2015-03-27 10:06 UTC (permalink / raw)
  To: GCC Patches; +Cc: Honggyu Kim

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01014.html
Thanks,
Kyrill

On 19/03/15 14:39, Kyrill Tkachov wrote:
> Hi all,
>
> This patch fixes PR 65358. For details look at the excellent write-up
> by Honggyu in bugzilla. The problem is that we're trying to pass a struct
> partially on the stack and partially in regs during a tail-call optimisation
> but the struct we're passing is also a partial incoming arg though the split
> between stack and regs is different from its outgoing usage.
>
> The emit_push_insn code ends up doing a block move for the on-stack part but
> ends up overwriting the part that needs to be loaded into regs.
> My first thought was to just load the regs part first and then do the stack
> part but that doesn't work as multiple comments in that function indicate
> (the block move being expanded to movmem or other functions being one of the
> reasons).
>
> My proposed solution is to detect when the overlap happens, find the
> overlapping region and load it before the stack pushing into pseudos and
> after the stack pushing is done move the overlapping values from the pseudos
> into the hard argument regs that they're supposed to go.
>
> That way this new functionality should only ever be triggered when there's
> the overlap in this PR (causing wrong-code) and shouldn't affect codegen
> anywhere else.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
> and x86_64-linux-gnu.
>
> According to the PR this appears at least as far back 4.6 so this isn't a
> regression on the release branches, but it is a wrong-code bug.
>
> I'll let Honggyu upstream the testcase separately
> (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html)
After guidance from Jeff, I'll take this testcase in as well if this is 
approved.

>
> I'll be testing this on the 4.8 and 4.9 branches.
> Thoughts on this approach?
>
> Thanks,
> Kyrill
>
> 2015-03-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       PR middle-end/65358
>       * expr.c (memory_load_overlap): New function.
>       (emit_push_insn): When pushing partial args to the stack would
>       clobber the register part load the overlapping part into a pseudo
>       and put it into the hard reg after pushing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-03-19 14:39 [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall Kyrill Tkachov
  2015-03-27 10:06 ` Kyrill Tkachov
@ 2015-03-29 11:29 ` Honggyu Kim
  2015-04-13 14:01 ` Kyrill Tkachov
  2015-04-17 17:26 ` Jeff Law
  3 siblings, 0 replies; 22+ messages in thread
From: Honggyu Kim @ 2015-03-29 11:29 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On Thu, Mar 19, 2015 at 02:39:11PM +0000, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch fixes PR 65358. For details look at the excellent write-up
> by Honggyu in bugzilla. The problem is that we're trying to pass a struct
> partially on the stack and partially in regs during a tail-call optimisation
> but the struct we're passing is also a partial incoming arg though the split
> between stack and regs is different from its outgoing usage.
> 
> The emit_push_insn code ends up doing a block move for the on-stack part but
> ends up overwriting the part that needs to be loaded into regs.
> My first thought was to just load the regs part first and then do the stack
> part but that doesn't work as multiple comments in that function indicate
> (the block move being expanded to movmem or other functions being one of the
> reasons).
> 
> My proposed solution is to detect when the overlap happens, find the
> overlapping region and load it before the stack pushing into pseudos and
> after the stack pushing is done move the overlapping values from the pseudos
> into the hard argument regs that they're supposed to go.
> 
> That way this new functionality should only ever be triggered when there's
> the overlap in this PR (causing wrong-code) and shouldn't affect codegen
> anywhere else.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
> and x86_64-linux-gnu.
> 
> According to the PR this appears at least as far back 4.6 so this isn't a
> regression on the release branches, but it is a wrong-code bug.
> 
> I'll let Honggyu upstream the testcase separately
> (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html)
> 
> I'll be testing this on the 4.8 and 4.9 branches.
> Thoughts on this approach?
> 
> Thanks,
> Kyrill
> 
Hi, Kyrill

I have verified the generated assembly code and tested on the target board.
PR 65358 testcase works fine now with your patch.
https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html

Here is the generated assembly code of PR 65358:

foo:    (without this patch)    |    (with this patch)
        sub     sp, sp, #8      |    sub     sp, sp, #8
        mov     r0, r1          |    mov     r0, r1
        str     lr, [sp, #-4]!  |    str     lr, [sp, #-4]!
        add     ip, sp, #8      |    mov     r1, r2
(1)     ldr     lr, [sp, #16]   |    ldr     lr, [sp, #16]
        mov     r1, r2          |    mov     r2, r3
        str     r3, [sp, #8]    |    ldr     ip, [sp, #12]
(2)     str     lr, [sp, #12]   |    str     r3, [sp, #8]
        ldr     lr, [sp], #4    |    str     lr, [sp, #12]
        ldmia   ip, {r2, r3}    |    ldr     lr, [sp], #4
        add     sp, sp, #8      |    mov     r3, ip
        b       bar             |    add     sp, sp, #8
                                |    b       bar
 
One the left side(previous code), (1) loads "p.killer", then (2) overwrites
"p.victim" value.
Until this point, "p.victim" is never copied anyway, which makes the
value disappear.

But this bug is clearly fixed with this patch as shown on the right side.

I have tested on x86_64 and got the working code regardless of having
this patch.

I appreciate your patch, Kyrill.

Honggyu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-03-19 14:39 [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall Kyrill Tkachov
  2015-03-27 10:06 ` Kyrill Tkachov
  2015-03-29 11:29 ` Honggyu Kim
@ 2015-04-13 14:01 ` Kyrill Tkachov
  2015-04-13 16:33   ` Jeff Law
  2015-04-17 17:26 ` Jeff Law
  3 siblings, 1 reply; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-13 14:01 UTC (permalink / raw)
  To: GCC Patches; +Cc: Honggyu Kim, Jeff Law

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01014.html

Jeff, could you help review this patch?
Or could you point me to someone who can review this?
I can't figure out from MAINTAINERS who should be in charge of this part of the compiler.

Thanks,
Kyrill

On 19/03/15 14:39, Kyrill Tkachov wrote:
> Hi all,
>
> This patch fixes PR 65358. For details look at the excellent write-up
> by Honggyu in bugzilla. The problem is that we're trying to pass a struct
> partially on the stack and partially in regs during a tail-call optimisation
> but the struct we're passing is also a partial incoming arg though the split
> between stack and regs is different from its outgoing usage.
>
> The emit_push_insn code ends up doing a block move for the on-stack part but
> ends up overwriting the part that needs to be loaded into regs.
> My first thought was to just load the regs part first and then do the stack
> part but that doesn't work as multiple comments in that function indicate
> (the block move being expanded to movmem or other functions being one of the
> reasons).
>
> My proposed solution is to detect when the overlap happens, find the
> overlapping region and load it before the stack pushing into pseudos and
> after the stack pushing is done move the overlapping values from the pseudos
> into the hard argument regs that they're supposed to go.
>
> That way this new functionality should only ever be triggered when there's
> the overlap in this PR (causing wrong-code) and shouldn't affect codegen
> anywhere else.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
> and x86_64-linux-gnu.
>
> According to the PR this appears at least as far back 4.6 so this isn't a
> regression on the release branches, but it is a wrong-code bug.
>
> I'll let Honggyu upstream the testcase separately
> (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html)
>
> I'll be testing this on the 4.8 and 4.9 branches.
> Thoughts on this approach?
>
> Thanks,
> Kyrill
>
> 2015-03-19  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       PR middle-end/65358
>       * expr.c (memory_load_overlap): New function.
>       (emit_push_insn): When pushing partial args to the stack would
>       clobber the register part load the overlapping part into a pseudo
>       and put it into the hard reg after pushing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-13 14:01 ` Kyrill Tkachov
@ 2015-04-13 16:33   ` Jeff Law
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Law @ 2015-04-13 16:33 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Honggyu Kim

On 04/13/2015 08:01 AM, Kyrill Tkachov wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01014.html
>
> Jeff, could you help review this patch?
> Or could you point me to someone who can review this?
> I can't figure out from MAINTAINERS who should be in charge of this part
> of the compiler.
It's in the queue of things to look at.  It wasn't marked as a 
regression and thus I deferred it.  With stage1 opening up we'll be 
working through the queue of deferred stuff shortly.

jeff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-03-19 14:39 [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall Kyrill Tkachov
                   ` (2 preceding siblings ...)
  2015-04-13 14:01 ` Kyrill Tkachov
@ 2015-04-17 17:26 ` Jeff Law
  2015-04-20  8:25   ` Kyrill Tkachov
  3 siblings, 1 reply; 22+ messages in thread
From: Jeff Law @ 2015-04-17 17:26 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Honggyu Kim

On 03/19/2015 08:39 AM, Kyrill Tkachov wrote:
> Hi all,
>
> This patch fixes PR 65358. For details look at the excellent write-up
> by Honggyu in bugzilla. The problem is that we're trying to pass a struct
> partially on the stack and partially in regs during a tail-call
> optimisation
> but the struct we're passing is also a partial incoming arg though the
> split
> between stack and regs is different from its outgoing usage.
>
> The emit_push_insn code ends up doing a block move for the on-stack part
> but
> ends up overwriting the part that needs to be loaded into regs.
> My first thought was to just load the regs part first and then do the stack
> part but that doesn't work as multiple comments in that function indicate
> (the block move being expanded to movmem or other functions being one of
> the
> reasons).
>
> My proposed solution is to detect when the overlap happens, find the
> overlapping region and load it before the stack pushing into pseudos and
> after the stack pushing is done move the overlapping values from the
> pseudos
> into the hard argument regs that they're supposed to go.
>
> That way this new functionality should only ever be triggered when there's
> the overlap in this PR (causing wrong-code) and shouldn't affect codegen
> anywhere else.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
> and x86_64-linux-gnu.
>
> According to the PR this appears at least as far back 4.6 so this isn't a
> regression on the release branches, but it is a wrong-code bug.
>
> I'll let Honggyu upstream the testcase separately
> (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html)
>
> I'll be testing this on the 4.8 and 4.9 branches.
> Thoughts on this approach?
>
> Thanks,
> Kyrill
>
> 2015-03-19  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      PR middle-end/65358
>      * expr.c (memory_load_overlap): New function.
>      (emit_push_insn): When pushing partial args to the stack would
>      clobber the register part load the overlapping part into a pseudo
>      and put it into the hard reg after pushing.
>
> expr.patch
>
>
> commit 490c5f2074d76a2927afaea99e4dd0bacccb413c
> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
> Date:   Wed Mar 18 13:42:37 2015 +0000
>
>      [expr.c] PR 65358 Avoid clobbering partial argument during sibcall
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index dc13a14..d3b9156 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -4121,6 +4121,25 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
>   }
>   #endif
>
> +/* Add SIZE to X and check whether it's greater than Y.
> +   If it is, return the constant amount by which it's greater or smaller.
> +   If the two are not statically comparable (for example, X and Y contain
> +   different registers) return -1.  This is used in expand_push_insn to
> +   figure out if reading SIZE bytes from location X will end up reading from
> +   location Y.  */
> +
> +static int
> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
> +{
> +  rtx tmp = plus_constant (Pmode, x, size);
> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
> +
> +  if (!CONST_INT_P (sub))
> +    return -1;
> +
> +  return INTVAL (sub);
> +}
Hmmm, so what happens if the difference is < 0?   I'd be a bit worried 
about that case for the PA (for example).

So how about asserting that the INTVAL is >= 0 prior to returning so 
that we catch that case if it ever occurs?

OK for the trunk with the added assert.  Please commit the testcase from 
Honggyu at the same time you commit the patch.

Let's let it simmer for a while on the trunk before considering it to be 
backported.

jeff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-17 17:26 ` Jeff Law
@ 2015-04-20  8:25   ` Kyrill Tkachov
  2015-04-20 18:02     ` Jeff Law
  0 siblings, 1 reply; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-20  8:25 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim

Hi Jeff,

On 17/04/15 18:26, Jeff Law wrote:
> On 03/19/2015 08:39 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch fixes PR 65358. For details look at the excellent write-up
>> by Honggyu in bugzilla. The problem is that we're trying to pass a struct
>> partially on the stack and partially in regs during a tail-call
>> optimisation
>> but the struct we're passing is also a partial incoming arg though the
>> split
>> between stack and regs is different from its outgoing usage.
>>
>> The emit_push_insn code ends up doing a block move for the on-stack part
>> but
>> ends up overwriting the part that needs to be loaded into regs.
>> My first thought was to just load the regs part first and then do the stack
>> part but that doesn't work as multiple comments in that function indicate
>> (the block move being expanded to movmem or other functions being one of
>> the
>> reasons).
>>
>> My proposed solution is to detect when the overlap happens, find the
>> overlapping region and load it before the stack pushing into pseudos and
>> after the stack pushing is done move the overlapping values from the
>> pseudos
>> into the hard argument regs that they're supposed to go.
>>
>> That way this new functionality should only ever be triggered when there's
>> the overlap in this PR (causing wrong-code) and shouldn't affect codegen
>> anywhere else.
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu
>> and x86_64-linux-gnu.
>>
>> According to the PR this appears at least as far back 4.6 so this isn't a
>> regression on the release branches, but it is a wrong-code bug.
>>
>> I'll let Honggyu upstream the testcase separately
>> (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html)
>>
>> I'll be testing this on the 4.8 and 4.9 branches.
>> Thoughts on this approach?
>>
>> Thanks,
>> Kyrill
>>
>> 2015-03-19  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>       PR middle-end/65358
>>       * expr.c (memory_load_overlap): New function.
>>       (emit_push_insn): When pushing partial args to the stack would
>>       clobber the register part load the overlapping part into a pseudo
>>       and put it into the hard reg after pushing.
>>
>> expr.patch
>>
>>
>> commit 490c5f2074d76a2927afaea99e4dd0bacccb413c
>> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
>> Date:   Wed Mar 18 13:42:37 2015 +0000
>>
>>       [expr.c] PR 65358 Avoid clobbering partial argument during sibcall
>>
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index dc13a14..d3b9156 100644
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -4121,6 +4121,25 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
>>    }
>>    #endif
>>
>> +/* Add SIZE to X and check whether it's greater than Y.
>> +   If it is, return the constant amount by which it's greater or smaller.
>> +   If the two are not statically comparable (for example, X and Y contain
>> +   different registers) return -1.  This is used in expand_push_insn to
>> +   figure out if reading SIZE bytes from location X will end up reading from
>> +   location Y.  */
>> +
>> +static int
>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>> +{
>> +  rtx tmp = plus_constant (Pmode, x, size);
>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>> +
>> +  if (!CONST_INT_P (sub))
>> +    return -1;
>> +
>> +  return INTVAL (sub);
>> +}
> Hmmm, so what happens if the difference is < 0?   I'd be a bit worried
> about that case for the PA (for example).
>
> So how about asserting that the INTVAL is >= 0 prior to returning so
> that we catch that case if it ever occurs?

INTVAL being >= 0 is the case that I want to catch with this function.
INTVAL <0 is the usual case on leaf call optimisation. On arm, at least,
it means that x and y use the same base register (i.e. same stack frame)
but the offsets are such that reading SIZE bytes from X will not overlap
with Y, thus not requiring the workaround in this patch.
Thus, asserting that the result is positive is not right here.

What characteristic on pa makes this problematic? Is it the STACK_GROWS_UPWARD?
Should I then extend this function to do something like:

HOST_WIDE_INT res = INTVAL (sub);
#ifndef STACK_GROWS_DOWNWARD
res = -res;
#endif

return res?



>
> OK for the trunk with the added assert.  Please commit the testcase from
> Honggyu at the same time you commit the patch.

Thanks, will do after the above is resolved.

Kyrill

>
> Let's let it simmer for a while on the trunk before considering it to be
> backported.
>
> jeff
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-20  8:25   ` Kyrill Tkachov
@ 2015-04-20 18:02     ` Jeff Law
  2015-04-21  8:30       ` Kyrill Tkachov
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Law @ 2015-04-20 18:02 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Honggyu Kim

On 04/20/2015 02:25 AM, Kyrill Tkachov wrote:
> Hi Jeff,
>> Hmmm, so what happens if the difference is < 0?   I'd be a bit worried
>> about that case for the PA (for example).
>>
>> So how about asserting that the INTVAL is >= 0 prior to returning so
>> that we catch that case if it ever occurs?
>
> INTVAL being >= 0 is the case that I want to catch with this function.
> INTVAL <0 is the usual case on leaf call optimisation. On arm, at least,
> it means that x and y use the same base register (i.e. same stack frame)
> but the offsets are such that reading SIZE bytes from X will not overlap
> with Y, thus not requiring the workaround in this patch.
> Thus, asserting that the result is positive is not right here.
>
> What characteristic on pa makes this problematic? Is it the
> STACK_GROWS_UPWARD?
Yea or more correctly that {STACK,FRAME}_GROWS_UPWARD and 
ARGS_GROW_DOWNWARD.  I think the stormy16 may have downward growing args 
too.


> Should I then extend this function to do something like:
>
> HOST_WIDE_INT res = INTVAL (sub);
> #ifndef STACK_GROWS_DOWNWARD
> res = -res;
> #endif
>
> return res?
It certainly feels like something is needed for targets where growth is 
in the opposite direction -- but my guess is that without a concrete 
case that triggers on those targets (just the PA in 64 bit mode and 
stormy?) we'll probably get it wrong in one way or another.  Hence my 
suggestion that we assert rather than try to handle it and silently 
generate incorrect code in the process.


Jeff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-20 18:02     ` Jeff Law
@ 2015-04-21  8:30       ` Kyrill Tkachov
  2015-04-21 14:09         ` Jeff Law
  0 siblings, 1 reply; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-21  8:30 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim


On 20/04/15 19:02, Jeff Law wrote:
> On 04/20/2015 02:25 AM, Kyrill Tkachov wrote:
>> Hi Jeff,
>>> Hmmm, so what happens if the difference is < 0?   I'd be a bit worried
>>> about that case for the PA (for example).
>>>
>>> So how about asserting that the INTVAL is >= 0 prior to returning so
>>> that we catch that case if it ever occurs?
>> INTVAL being >= 0 is the case that I want to catch with this function.
>> INTVAL <0 is the usual case on leaf call optimisation. On arm, at least,
>> it means that x and y use the same base register (i.e. same stack frame)
>> but the offsets are such that reading SIZE bytes from X will not overlap
>> with Y, thus not requiring the workaround in this patch.
>> Thus, asserting that the result is positive is not right here.
>>
>> What characteristic on pa makes this problematic? Is it the
>> STACK_GROWS_UPWARD?
> Yea or more correctly that {STACK,FRAME}_GROWS_UPWARD and
> ARGS_GROW_DOWNWARD.  I think the stormy16 may have downward growing args
> too.
>
>
>> Should I then extend this function to do something like:
>>
>> HOST_WIDE_INT res = INTVAL (sub);
>> #ifndef STACK_GROWS_DOWNWARD
>> res = -res;
>> #endif
>>
>> return res?
> It certainly feels like something is needed for targets where growth is
> in the opposite direction -- but my guess is that without a concrete
> case that triggers on those targets (just the PA in 64 bit mode and
> stormy?) we'll probably get it wrong in one way or another.  Hence my
> suggestion that we assert rather than try to handle it and silently
> generate incorrect code in the process.

However, this function is expected to return negative numbers
when there is no overlap i.e. in the vast majority of cases when this
bug doesn't manifest. So asserting that it's positive is just
going to ICE at -O2 in almost any code.

 From reading config/stormy16/stormy-abi it seems to me that we don't
pass arguments partially in stormy16, so this code would never be called
there. That leaves pa as the potential problematic target.
I don't suppose there's an easy way to test on pa? My checkout of binutils
doesn't seem to include a sim target for it.

Kyrill

>
>
> Jeff
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-21  8:30       ` Kyrill Tkachov
@ 2015-04-21 14:09         ` Jeff Law
  2015-04-21 17:33           ` Kyrill Tkachov
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Law @ 2015-04-21 14:09 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Honggyu Kim

On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>
>  From reading config/stormy16/stormy-abi it seems to me that we don't
> pass arguments partially in stormy16, so this code would never be called
> there. That leaves pa as the potential problematic target.
> I don't suppose there's an easy way to test on pa? My checkout of binutils
> doesn't seem to include a sim target for it.
No simulator, no machines in the testfarm, the box I had access to via 
parisc-linux.org seems dead and my ancient PA overheats well before a 
bootstrap could complete.  I often regret knowing about the backwards 
way many things were done on the PA because it makes me think about 
cases that only matter on dead architectures.


Jeff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-21 14:09         ` Jeff Law
@ 2015-04-21 17:33           ` Kyrill Tkachov
  2015-04-22 11:51             ` Kyrill Tkachov
  2015-04-27 20:13             ` Jeff Law
  0 siblings, 2 replies; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-21 17:33 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim


On 21/04/15 15:09, Jeff Law wrote:
> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>   From reading config/stormy16/stormy-abi it seems to me that we don't
>> pass arguments partially in stormy16, so this code would never be called
>> there. That leaves pa as the potential problematic target.
>> I don't suppose there's an easy way to test on pa? My checkout of binutils
>> doesn't seem to include a sim target for it.
> No simulator, no machines in the testfarm, the box I had access to via
> parisc-linux.org seems dead and my ancient PA overheats well before a
> bootstrap could complete.  I often regret knowing about the backwards
> way many things were done on the PA because it makes me think about
> cases that only matter on dead architectures.

So what should be the action plan here? I can't add an assert on
positive result as a negative result is valid.

We want to catch the case where this would cause trouble on
pa, or change the patch until we're confident that it's fine
for pa.

That being said, reading the documentation of STACK_GROWS_UPWARD
and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
where this would cause trouble on pa.

Is the problem that in the function:

+/* Add SIZE to X and check whether it's greater than Y.
+   If it is, return the constant amount by which it's greater or smaller.
+   If the two are not statically comparable (for example, X and Y contain
+   different registers) return -1.  This is used in expand_push_insn to
+   figure out if reading SIZE bytes from location X will end up reading from
+   location Y.  */
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  return INTVAL (sub);
+}

for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
so the function should something like the following?

static int
memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
{
#ifdef ARGS_GROW_DOWNWARD
   rtx tmp = plus_constant (Pmode, x, -size);
#else
   rtx tmp = plus_constant (Pmode, x, size);
#endif
   rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);

   if (!CONST_INT_P (sub))
     return -1;

#ifdef ARGS_GROW_DOWNWARD
   return INTVAL (-sub);
#else
   return INTVAL (sub);
#endif
}

now, say for x == sp + 4,  y == sp + 8, size == 16:
This would be a problematic case for arm, so this code on arm
(where ARGS_GROW_DOWNWARD is *not* defined) would return
12, which is the number of bytes that overlap.

On a target where ARGS_GROW_DOWNWARD is defined this would return
-20, meaning that no overlap occurs (because we read in the descending
direction from x, IIUC).


Thanks,
Kyrill

>
>
> Jeff
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-21 17:33           ` Kyrill Tkachov
@ 2015-04-22 11:51             ` Kyrill Tkachov
  2015-04-27 10:12               ` Kyrill Tkachov
  2015-04-27 20:13             ` Jeff Law
  1 sibling, 1 reply; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-22 11:51 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim

[-- Attachment #1: Type: text/plain, Size: 4297 bytes --]


On 21/04/15 18:33, Kyrill Tkachov wrote:
> On 21/04/15 15:09, Jeff Law wrote:
>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>    From reading config/stormy16/stormy-abi it seems to me that we don't
>>> pass arguments partially in stormy16, so this code would never be called
>>> there. That leaves pa as the potential problematic target.
>>> I don't suppose there's an easy way to test on pa? My checkout of binutils
>>> doesn't seem to include a sim target for it.
>> No simulator, no machines in the testfarm, the box I had access to via
>> parisc-linux.org seems dead and my ancient PA overheats well before a
>> bootstrap could complete.  I often regret knowing about the backwards
>> way many things were done on the PA because it makes me think about
>> cases that only matter on dead architectures.
> So what should be the action plan here? I can't add an assert on
> positive result as a negative result is valid.
>
> We want to catch the case where this would cause trouble on
> pa, or change the patch until we're confident that it's fine
> for pa.
>
> That being said, reading the documentation of STACK_GROWS_UPWARD
> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
> where this would cause trouble on pa.
>
> Is the problem that in the function:
>
> +/* Add SIZE to X and check whether it's greater than Y.
> +   If it is, return the constant amount by which it's greater or smaller.
> +   If the two are not statically comparable (for example, X and Y contain
> +   different registers) return -1.  This is used in expand_push_insn to
> +   figure out if reading SIZE bytes from location X will end up reading from
> +   location Y.  */
> +static int
> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
> +{
> +  rtx tmp = plus_constant (Pmode, x, size);
> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
> +
> +  if (!CONST_INT_P (sub))
> +    return -1;
> +
> +  return INTVAL (sub);
> +}
>
> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
> so the function should something like the following?
>
> static int
> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
> {
> #ifdef ARGS_GROW_DOWNWARD
>     rtx tmp = plus_constant (Pmode, x, -size);
> #else
>     rtx tmp = plus_constant (Pmode, x, size);
> #endif
>     rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>
>     if (!CONST_INT_P (sub))
>       return -1;
>
> #ifdef ARGS_GROW_DOWNWARD
>     return INTVAL (-sub);
> #else
>     return INTVAL (sub);
> #endif
> }
>
> now, say for x == sp + 4,  y == sp + 8, size == 16:
> This would be a problematic case for arm, so this code on arm
> (where ARGS_GROW_DOWNWARD is *not* defined) would return
> 12, which is the number of bytes that overlap.
>
> On a target where ARGS_GROW_DOWNWARD is defined this would return
> -20, meaning that no overlap occurs (because we read in the descending
> direction from x, IIUC).
Hi Jeff,

Here's an attempt to make this more concrete.
Only the memory_load_overlap function has changed.
This time I tried to take into account the case when
ARGS_GROW_DOWNWARD.

Take the case where x == sp, y == sp + 8, size == 16.
For arm, this would return 8 as that is the number of bytes
that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it
would return -1 as we're reading down from x rather than up
towards y.

In the case when x == sp + 8, y == sp, size == 16
This would return -1 on arm since we're reading upwards from x
and thefore no overlap would happen.

On pa, this would return 8, which I think is the right thing.
But again, I don't have access to any pa means of testing.

What do you think of this approach?

Thanks,
Kyrill

P.S. I've included the testcase from Honggyu in the patch.

2015-04-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65358
     * expr.c (memory_load_overlap): New function.
     (emit_push_insn): When pushing partial args to the stack would
     clobber the register part load the overlapping part into a pseudo
     and put it into the hard reg after pushing.

2015-04-22  Honggyu Kim  <hong.gyu.kim@lge.com>

     PR target/65358
     * gcc.dg/pr65358.c: New test.

>
>
> Thanks,
> Kyrill
>
>>
>> Jeff
>>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: expr.patch --]
[-- Type: text/x-patch; name=expr.patch, Size: 4795 bytes --]

commit 39c9cb0bff1088cffaf31208b9a735d2bc0c505a
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/expr.c b/gcc/expr.c
index 530a944..ccf94e0 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,6 +4121,36 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   in all other cases.  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x,
+#ifdef ARGS_GROW_DOWNWARD
+			   -size);
+#else
+			   size);
+#endif
+
+
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  HOST_WIDE_INT val =
+#ifdef ARGS_GROW_DOWNWARD
+    -INTVAL (sub);
+#else
+     INTVAL (sub);
+#endif
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
@@ -4179,6 +4209,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4309,6 +4343,35 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else
+		overlapping = 0;
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4411,9 +4474,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4421,9 +4483,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-22 11:51             ` Kyrill Tkachov
@ 2015-04-27 10:12               ` Kyrill Tkachov
  2015-04-27 13:16                 ` John David Anglin
  0 siblings, 1 reply; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-27 10:12 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim, dave.anglin


On 22/04/15 12:51, Kyrill Tkachov wrote:
> On 21/04/15 18:33, Kyrill Tkachov wrote:
>> On 21/04/15 15:09, Jeff Law wrote:
>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>     From reading config/stormy16/stormy-abi it seems to me that we don't
>>>> pass arguments partially in stormy16, so this code would never be called
>>>> there. That leaves pa as the potential problematic target.
>>>> I don't suppose there's an easy way to test on pa? My checkout of binutils
>>>> doesn't seem to include a sim target for it.
>>> No simulator, no machines in the testfarm, the box I had access to via
>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>> bootstrap could complete.  I often regret knowing about the backwards
>>> way many things were done on the PA because it makes me think about
>>> cases that only matter on dead architectures.
>> So what should be the action plan here? I can't add an assert on
>> positive result as a negative result is valid.
>>
>> We want to catch the case where this would cause trouble on
>> pa, or change the patch until we're confident that it's fine
>> for pa.
>>
>> That being said, reading the documentation of STACK_GROWS_UPWARD
>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>> where this would cause trouble on pa.
>>
>> Is the problem that in the function:
>>
>> +/* Add SIZE to X and check whether it's greater than Y.
>> +   If it is, return the constant amount by which it's greater or smaller.
>> +   If the two are not statically comparable (for example, X and Y contain
>> +   different registers) return -1.  This is used in expand_push_insn to
>> +   figure out if reading SIZE bytes from location X will end up reading from
>> +   location Y.  */
>> +static int
>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>> +{
>> +  rtx tmp = plus_constant (Pmode, x, size);
>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>> +
>> +  if (!CONST_INT_P (sub))
>> +    return -1;
>> +
>> +  return INTVAL (sub);
>> +}
>>
>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>> so the function should something like the following?
>>
>> static int
>> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>> {
>> #ifdef ARGS_GROW_DOWNWARD
>>      rtx tmp = plus_constant (Pmode, x, -size);
>> #else
>>      rtx tmp = plus_constant (Pmode, x, size);
>> #endif
>>      rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>
>>      if (!CONST_INT_P (sub))
>>        return -1;
>>
>> #ifdef ARGS_GROW_DOWNWARD
>>      return INTVAL (-sub);
>> #else
>>      return INTVAL (sub);
>> #endif
>> }
>>
>> now, say for x == sp + 4,  y == sp + 8, size == 16:
>> This would be a problematic case for arm, so this code on arm
>> (where ARGS_GROW_DOWNWARD is *not* defined) would return
>> 12, which is the number of bytes that overlap.
>>
>> On a target where ARGS_GROW_DOWNWARD is defined this would return
>> -20, meaning that no overlap occurs (because we read in the descending
>> direction from x, IIUC).
> Hi Jeff,
>
> Here's an attempt to make this more concrete.
> Only the memory_load_overlap function has changed.
> This time I tried to take into account the case when
> ARGS_GROW_DOWNWARD.
>
> Take the case where x == sp, y == sp + 8, size == 16.
> For arm, this would return 8 as that is the number of bytes
> that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it
> would return -1 as we're reading down from x rather than up
> towards y.
>
> In the case when x == sp + 8, y == sp, size == 16
> This would return -1 on arm since we're reading upwards from x
> and thefore no overlap would happen.
>
> On pa, this would return 8, which I think is the right thing.
> But again, I don't have access to any pa means of testing.
>
> What do you think of this approach?

Hi Dave,

Would it be possible for you to test this patch on a 64-bit hppa
or at least bootstrap it?
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html

There is a concern that it may potentially affect the passing of
complex arguments partially on the stack and partially in regs
on pa because of the way the args and stack grow on that target.

Unfortunately I don't have access to any hardware or simulators.
It would help a lot with getting this patch in.

Thanks,
Kyrill


>
> Thanks,
> Kyrill
>
> P.S. I've included the testcase from Honggyu in the patch.
>
> 2015-04-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       PR target/65358
>       * expr.c (memory_load_overlap): New function.
>       (emit_push_insn): When pushing partial args to the stack would
>       clobber the register part load the overlapping part into a pseudo
>       and put it into the hard reg after pushing.
>
> 2015-04-22  Honggyu Kim  <hong.gyu.kim@lge.com>
>
>       PR target/65358
>       * gcc.dg/pr65358.c: New test.
>
>>
>> Thanks,
>> Kyrill
>>
>>> Jeff
>>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-27 10:12               ` Kyrill Tkachov
@ 2015-04-27 13:16                 ` John David Anglin
  2015-05-06 18:57                   ` John David Anglin
  0 siblings, 1 reply; 22+ messages in thread
From: John David Anglin @ 2015-04-27 13:16 UTC (permalink / raw)
  To: Kyrill Tkachov, Jeff Law, GCC Patches; +Cc: Honggyu Kim

On 2015-04-27 6:12 AM, Kyrill Tkachov wrote:
>
> On 22/04/15 12:51, Kyrill Tkachov wrote:
>> On 21/04/15 18:33, Kyrill Tkachov wrote:
>>> On 21/04/15 15:09, Jeff Law wrote:
>>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>>     From reading config/stormy16/stormy-abi it seems to me that we 
>>>>> don't
>>>>> pass arguments partially in stormy16, so this code would never be 
>>>>> called
>>>>> there. That leaves pa as the potential problematic target.
>>>>> I don't suppose there's an easy way to test on pa? My checkout of 
>>>>> binutils
>>>>> doesn't seem to include a sim target for it.
>>>> No simulator, no machines in the testfarm, the box I had access to via
>>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>>> bootstrap could complete.  I often regret knowing about the backwards
>>>> way many things were done on the PA because it makes me think about
>>>> cases that only matter on dead architectures.
>>> So what should be the action plan here? I can't add an assert on
>>> positive result as a negative result is valid.
>>>
>>> We want to catch the case where this would cause trouble on
>>> pa, or change the patch until we're confident that it's fine
>>> for pa.
>>>
>>> That being said, reading the documentation of STACK_GROWS_UPWARD
>>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>>> where this would cause trouble on pa.
>>>
>>> Is the problem that in the function:
>>>
>>> +/* Add SIZE to X and check whether it's greater than Y.
>>> +   If it is, return the constant amount by which it's greater or 
>>> smaller.
>>> +   If the two are not statically comparable (for example, X and Y 
>>> contain
>>> +   different registers) return -1.  This is used in 
>>> expand_push_insn to
>>> +   figure out if reading SIZE bytes from location X will end up 
>>> reading from
>>> +   location Y.  */
>>> +static int
>>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> +{
>>> +  rtx tmp = plus_constant (Pmode, x, size);
>>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>> +
>>> +  if (!CONST_INT_P (sub))
>>> +    return -1;
>>> +
>>> +  return INTVAL (sub);
>>> +}
>>>
>>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>>> so the function should something like the following?
>>>
>>> static int
>>> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> {
>>> #ifdef ARGS_GROW_DOWNWARD
>>>      rtx tmp = plus_constant (Pmode, x, -size);
>>> #else
>>>      rtx tmp = plus_constant (Pmode, x, size);
>>> #endif
>>>      rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>>
>>>      if (!CONST_INT_P (sub))
>>>        return -1;
>>>
>>> #ifdef ARGS_GROW_DOWNWARD
>>>      return INTVAL (-sub);
>>> #else
>>>      return INTVAL (sub);
>>> #endif
>>> }
>>>
>>> now, say for x == sp + 4,  y == sp + 8, size == 16:
>>> This would be a problematic case for arm, so this code on arm
>>> (where ARGS_GROW_DOWNWARD is *not* defined) would return
>>> 12, which is the number of bytes that overlap.
>>>
>>> On a target where ARGS_GROW_DOWNWARD is defined this would return
>>> -20, meaning that no overlap occurs (because we read in the descending
>>> direction from x, IIUC).
>> Hi Jeff,
>>
>> Here's an attempt to make this more concrete.
>> Only the memory_load_overlap function has changed.
>> This time I tried to take into account the case when
>> ARGS_GROW_DOWNWARD.
>>
>> Take the case where x == sp, y == sp + 8, size == 16.
>> For arm, this would return 8 as that is the number of bytes
>> that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it
>> would return -1 as we're reading down from x rather than up
>> towards y.
>>
>> In the case when x == sp + 8, y == sp, size == 16
>> This would return -1 on arm since we're reading upwards from x
>> and thefore no overlap would happen.
>>
>> On pa, this would return 8, which I think is the right thing.
>> But again, I don't have access to any pa means of testing.
>>
>> What do you think of this approach?
>
> Hi Dave,
>
> Would it be possible for you to test this patch on a 64-bit hppa
> or at least bootstrap it?
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html
I started a build and test with your patch on hppa64-hp-hpux11.11 this 
morning.

>
> There is a concern that it may potentially affect the passing of
> complex arguments partially on the stack and partially in regs
> on pa because of the way the args and stack grow on that target.
>
> Unfortunately I don't have access to any hardware or simulators.
> It would help a lot with getting this patch in.
If you write to linux-parisc@vger.kernel.org, arrangements can be made for
an account on a Debian parisc linux machine for development testing.

Helge Deller has arranged for some new machines since we took over the
Debian buildd infrastructure  for parisc.  More info is here:
https://parisc.wiki.kernel.org/index.php/Main_Page

>
> Thanks,
> Kyrill
>
>
>>
>> Thanks,
>> Kyrill
>>
>> P.S. I've included the testcase from Honggyu in the patch.
>>
>> 2015-04-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       PR target/65358
>>       * expr.c (memory_load_overlap): New function.
>>       (emit_push_insn): When pushing partial args to the stack would
>>       clobber the register part load the overlapping part into a pseudo
>>       and put it into the hard reg after pushing.
>>
>> 2015-04-22  Honggyu Kim  <hong.gyu.kim@lge.com>
>>
>>       PR target/65358
>>       * gcc.dg/pr65358.c: New test.
>>
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>> Jeff
>>>>
>
>
>

Regards,
Dave

-- 
John David Anglin  dave.anglin@bell.net

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-21 17:33           ` Kyrill Tkachov
  2015-04-22 11:51             ` Kyrill Tkachov
@ 2015-04-27 20:13             ` Jeff Law
  2015-04-28 10:19               ` Kyrill Tkachov
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff Law @ 2015-04-27 20:13 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Honggyu Kim

On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>
> On 21/04/15 15:09, Jeff Law wrote:
>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>   From reading config/stormy16/stormy-abi it seems to me that we don't
>>> pass arguments partially in stormy16, so this code would never be called
>>> there. That leaves pa as the potential problematic target.
>>> I don't suppose there's an easy way to test on pa? My checkout of
>>> binutils
>>> doesn't seem to include a sim target for it.
>> No simulator, no machines in the testfarm, the box I had access to via
>> parisc-linux.org seems dead and my ancient PA overheats well before a
>> bootstrap could complete.  I often regret knowing about the backwards
>> way many things were done on the PA because it makes me think about
>> cases that only matter on dead architectures.
>
> So what should be the action plan here? I can't add an assert on
> positive result as a negative result is valid.
>
> We want to catch the case where this would cause trouble on
> pa, or change the patch until we're confident that it's fine
> for pa.
>
> That being said, reading the documentation of STACK_GROWS_UPWARD
> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
> where this would cause trouble on pa.
>
> Is the problem that in the function:
>
> +/* Add SIZE to X and check whether it's greater than Y.
> +   If it is, return the constant amount by which it's greater or smaller.
> +   If the two are not statically comparable (for example, X and Y contain
> +   different registers) return -1.  This is used in expand_push_insn to
> +   figure out if reading SIZE bytes from location X will end up reading
> from
> +   location Y.  */
> +static int
> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
> +{
> +  rtx tmp = plus_constant (Pmode, x, size);
> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
> +
> +  if (!CONST_INT_P (sub))
> +    return -1;
> +
> +  return INTVAL (sub);
> +}
>
> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
> so the function should something like the following?
So I had to go back and compile some simple examples.

References to outgoing arguments will be SP relative.  References to the 
incoming arguments will be ARGP relative.  And that brings me to the 
another issue.  Isn't X in this context the incoming argument slot and 
the destination an outgoing argument slot?

If so, the approach of memory_load_overlap simply won't work on a target 
with calling conventions like the PA.  And you might really want to 
consider punting for these kind of calling conventions

If you hadn't already done the work, I'd suggest punting for any case 
where we have args partially in regs and partially in memory :-)

More thoughts when I can get an hour or two to remind myself how all 
this stuff works on the PA.

I will note that testing on the PA is unlikely to show anything simply 
because it uses 8 parameter passing registers.  So it's rare to pass 
anything in memory at all.  Even rarer to have something partially in 
memory and partially in registers.



Jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-27 20:13             ` Jeff Law
@ 2015-04-28 10:19               ` Kyrill Tkachov
  2015-04-30 12:09                 ` Kyrill Tkachov
  2015-05-01 18:51                 ` Jeff Law
  0 siblings, 2 replies; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-28 10:19 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim

[-- Attachment #1: Type: text/plain, Size: 4328 bytes --]


On 27/04/15 21:13, Jeff Law wrote:
> On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>> On 21/04/15 15:09, Jeff Law wrote:
>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>    From reading config/stormy16/stormy-abi it seems to me that we don't
>>>> pass arguments partially in stormy16, so this code would never be called
>>>> there. That leaves pa as the potential problematic target.
>>>> I don't suppose there's an easy way to test on pa? My checkout of
>>>> binutils
>>>> doesn't seem to include a sim target for it.
>>> No simulator, no machines in the testfarm, the box I had access to via
>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>> bootstrap could complete.  I often regret knowing about the backwards
>>> way many things were done on the PA because it makes me think about
>>> cases that only matter on dead architectures.
>> So what should be the action plan here? I can't add an assert on
>> positive result as a negative result is valid.
>>
>> We want to catch the case where this would cause trouble on
>> pa, or change the patch until we're confident that it's fine
>> for pa.
>>
>> That being said, reading the documentation of STACK_GROWS_UPWARD
>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>> where this would cause trouble on pa.
>>
>> Is the problem that in the function:
>>
>> +/* Add SIZE to X and check whether it's greater than Y.
>> +   If it is, return the constant amount by which it's greater or smaller.
>> +   If the two are not statically comparable (for example, X and Y contain
>> +   different registers) return -1.  This is used in expand_push_insn to
>> +   figure out if reading SIZE bytes from location X will end up reading
>> from
>> +   location Y.  */
>> +static int
>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>> +{
>> +  rtx tmp = plus_constant (Pmode, x, size);
>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>> +
>> +  if (!CONST_INT_P (sub))
>> +    return -1;
>> +
>> +  return INTVAL (sub);
>> +}
>>
>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>> so the function should something like the following?
> So I had to go back and compile some simple examples.
>
> References to outgoing arguments will be SP relative.  References to the
> incoming arguments will be ARGP relative.  And that brings me to the
> another issue.  Isn't X in this context the incoming argument slot and
> the destination an outgoing argument slot?
>
> If so, the approach of memory_load_overlap simply won't work on a target
> with calling conventions like the PA.  And you might really want to
> consider punting for these kind of calling conventions

Ok, thanks for the guidance.
How about this? This patch disables sibcall optimisation when
encountering a partial argument when ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD.
Hopefully this shouldn't harm codegen on parisc if, as you say, it's rare to have
partial arguments anyway on PA due to the large number of argument regs.

I tested this on arm and bootstrapped on x86_64.
I am now going through the process of getting access to a Debian PA machine to
give it a test there (thanks Dave!)

Ok if testing comes clean?

Thanks,
Kyrill

2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65358
     * calls.c (expand_call): Cancel sibcall optimisation when encountering
     partial argument on targets with ARGS_GROW_DOWNWARD and
     !STACK_GROWS_DOWNWARD.
     * expr.c (memory_load_overlap): New function.
     (emit_push_insn): When pushing partial args to the stack would
     clobber the register part load the overlapping part into a pseudo
     and put it into the hard reg after pushing.

2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>

     PR target/65358
     * gcc.dg/pr65358.c: New test.

>
> If you hadn't already done the work, I'd suggest punting for any case
> where we have args partially in regs and partially in memory :-)
>
> More thoughts when I can get an hour or two to remind myself how all
> this stuff works on the PA.
>
> I will note that testing on the PA is unlikely to show anything simply
> because it uses 8 parameter passing registers.  So it's rare to pass
> anything in memory at all.  Even rarer to have something partially in
> memory and partially in registers.
>
>
>
> Jeff
>
>


[-- Attachment #2: expr.patch --]
[-- Type: text/x-patch, Size: 5157 bytes --]

commit 61beb521f63db2f7178e2c41e3747982ddcc8869
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/calls.c b/gcc/calls.c
index 3be7ca5..8159232 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3229,6 +3229,13 @@ expand_call (tree exp, rtx target, int ignore)
 	    {
 	      rtx_insn *before_arg = get_last_insn ();
 
+	     /* On targets with weird calling conventions (e.g. PA) it's
+		hard to ensure that all cases of argument overlap between
+		stack and registers work.  Play it safe and bail out.  */
+#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD)
+	      sibcall_failure = 1;
+	      break;
+#endif
 	      if (store_one_arg (&args[i], argblock, flags,
 				 adjusted_args_size.var != 0,
 				 reg_parm_stack_space)
diff --git a/gcc/expr.c b/gcc/expr.c
index 530a944..071a335 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,6 +4121,24 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   in all other cases.  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  HOST_WIDE_INT val = INTVAL (sub);
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
@@ -4179,6 +4197,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4309,6 +4331,35 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else
+		overlapping = 0;
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4411,9 +4462,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4421,9 +4471,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-28 10:19               ` Kyrill Tkachov
@ 2015-04-30 12:09                 ` Kyrill Tkachov
  2015-05-01 18:51                 ` Jeff Law
  1 sibling, 0 replies; 22+ messages in thread
From: Kyrill Tkachov @ 2015-04-30 12:09 UTC (permalink / raw)
  To: Kyrill Tkachov, Jeff Law, GCC Patches; +Cc: Honggyu Kim


On 28/04/15 10:54, Kyrill Tkachov wrote:
> On 27/04/15 21:13, Jeff Law wrote:
>> On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>>> On 21/04/15 15:09, Jeff Law wrote:
>>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>>     From reading config/stormy16/stormy-abi it seems to me that we don't
>>>>> pass arguments partially in stormy16, so this code would never be called
>>>>> there. That leaves pa as the potential problematic target.
>>>>> I don't suppose there's an easy way to test on pa? My checkout of
>>>>> binutils
>>>>> doesn't seem to include a sim target for it.
>>>> No simulator, no machines in the testfarm, the box I had access to via
>>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>>> bootstrap could complete.  I often regret knowing about the backwards
>>>> way many things were done on the PA because it makes me think about
>>>> cases that only matter on dead architectures.
>>> So what should be the action plan here? I can't add an assert on
>>> positive result as a negative result is valid.
>>>
>>> We want to catch the case where this would cause trouble on
>>> pa, or change the patch until we're confident that it's fine
>>> for pa.
>>>
>>> That being said, reading the documentation of STACK_GROWS_UPWARD
>>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>>> where this would cause trouble on pa.
>>>
>>> Is the problem that in the function:
>>>
>>> +/* Add SIZE to X and check whether it's greater than Y.
>>> +   If it is, return the constant amount by which it's greater or smaller.
>>> +   If the two are not statically comparable (for example, X and Y contain
>>> +   different registers) return -1.  This is used in expand_push_insn to
>>> +   figure out if reading SIZE bytes from location X will end up reading
>>> from
>>> +   location Y.  */
>>> +static int
>>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> +{
>>> +  rtx tmp = plus_constant (Pmode, x, size);
>>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>> +
>>> +  if (!CONST_INT_P (sub))
>>> +    return -1;
>>> +
>>> +  return INTVAL (sub);
>>> +}
>>>
>>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>>> so the function should something like the following?
>> So I had to go back and compile some simple examples.
>>
>> References to outgoing arguments will be SP relative.  References to the
>> incoming arguments will be ARGP relative.  And that brings me to the
>> another issue.  Isn't X in this context the incoming argument slot and
>> the destination an outgoing argument slot?
>>
>> If so, the approach of memory_load_overlap simply won't work on a target
>> with calling conventions like the PA.  And you might really want to
>> consider punting for these kind of calling conventions
> Ok, thanks for the guidance.
> How about this? This patch disables sibcall optimisation when
> encountering a partial argument when ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD.
> Hopefully this shouldn't harm codegen on parisc if, as you say, it's rare to have
> partial arguments anyway on PA due to the large number of argument regs.
>
> I tested this on arm and bootstrapped on x86_64.
> I am now going through the process of getting access to a Debian PA machine to
> give it a test there (thanks Dave!)
>
> Ok if testing comes clean?
Hi Jeff,

So I got access to an hppa machine.
But as mentioned here https://gcc.gnu.org/ml/gcc/2015-04/msg00364.html
I don't think I can bootstrap and run a 64-bit pa testsuite.
I've verified that the compiler builds, but is there anything more I can
do in testing this patch?
The latest version is at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01713.html
and disables sibcall optimisation on a target like pa when a partial argument is encountered.

Thanks,
Kyrill


>
> Thanks,
> Kyrill
>
> 2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       PR target/65358
>       * calls.c (expand_call): Cancel sibcall optimisation when encountering
>       partial argument on targets with ARGS_GROW_DOWNWARD and
>       !STACK_GROWS_DOWNWARD.
>       * expr.c (memory_load_overlap): New function.
>       (emit_push_insn): When pushing partial args to the stack would
>       clobber the register part load the overlapping part into a pseudo
>       and put it into the hard reg after pushing.
>
> 2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>
>
>       PR target/65358
>       * gcc.dg/pr65358.c: New test.
>
>> If you hadn't already done the work, I'd suggest punting for any case
>> where we have args partially in regs and partially in memory :-)
>>
>> More thoughts when I can get an hour or two to remind myself how all
>> this stuff works on the PA.
>>
>> I will note that testing on the PA is unlikely to show anything simply
>> because it uses 8 parameter passing registers.  So it's rare to pass
>> anything in memory at all.  Even rarer to have something partially in
>> memory and partially in registers.
>>
>>
>>
>> Jeff
>>
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-28 10:19               ` Kyrill Tkachov
  2015-04-30 12:09                 ` Kyrill Tkachov
@ 2015-05-01 18:51                 ` Jeff Law
  2015-05-11  9:28                   ` Kyrill Tkachov
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff Law @ 2015-05-01 18:51 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Honggyu Kim

On 04/28/2015 03:54 AM, Kyrill Tkachov wrote:
>
> On 27/04/15 21:13, Jeff Law wrote:
>> On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>>> On 21/04/15 15:09, Jeff Law wrote:
>>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>>    From reading config/stormy16/stormy-abi it seems to me that we
>>>>> don't
>>>>> pass arguments partially in stormy16, so this code would never be
>>>>> called
>>>>> there. That leaves pa as the potential problematic target.
>>>>> I don't suppose there's an easy way to test on pa? My checkout of
>>>>> binutils
>>>>> doesn't seem to include a sim target for it.
>>>> No simulator, no machines in the testfarm, the box I had access to via
>>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>>> bootstrap could complete.  I often regret knowing about the backwards
>>>> way many things were done on the PA because it makes me think about
>>>> cases that only matter on dead architectures.
>>> So what should be the action plan here? I can't add an assert on
>>> positive result as a negative result is valid.
>>>
>>> We want to catch the case where this would cause trouble on
>>> pa, or change the patch until we're confident that it's fine
>>> for pa.
>>>
>>> That being said, reading the documentation of STACK_GROWS_UPWARD
>>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>>> where this would cause trouble on pa.
>>>
>>> Is the problem that in the function:
>>>
>>> +/* Add SIZE to X and check whether it's greater than Y.
>>> +   If it is, return the constant amount by which it's greater or
>>> smaller.
>>> +   If the two are not statically comparable (for example, X and Y
>>> contain
>>> +   different registers) return -1.  This is used in expand_push_insn to
>>> +   figure out if reading SIZE bytes from location X will end up reading
>>> from
>>> +   location Y.  */
>>> +static int
>>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> +{
>>> +  rtx tmp = plus_constant (Pmode, x, size);
>>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>> +
>>> +  if (!CONST_INT_P (sub))
>>> +    return -1;
>>> +
>>> +  return INTVAL (sub);
>>> +}
>>>
>>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>>> so the function should something like the following?
>> So I had to go back and compile some simple examples.
>>
>> References to outgoing arguments will be SP relative.  References to the
>> incoming arguments will be ARGP relative.  And that brings me to the
>> another issue.  Isn't X in this context the incoming argument slot and
>> the destination an outgoing argument slot?
>>
>> If so, the approach of memory_load_overlap simply won't work on a target
>> with calling conventions like the PA.  And you might really want to
>> consider punting for these kind of calling conventions
>
> Ok, thanks for the guidance.
> How about this? This patch disables sibcall optimisation when
> encountering a partial argument when ARGS_GROW_DOWNWARD &&
> !STACK_GROWS_DOWNWARD.
> Hopefully this shouldn't harm codegen on parisc if, as you say, it's
> rare to have
> partial arguments anyway on PA due to the large number of argument regs.
>
> I tested this on arm and bootstrapped on x86_64.
> I am now going through the process of getting access to a Debian PA
> machine to
> give it a test there (thanks Dave!)
>
> Ok if testing comes clean?
>
> Thanks,
> Kyrill
>
> 2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      PR target/65358
>      * calls.c (expand_call): Cancel sibcall optimisation when encountering
>      partial argument on targets with ARGS_GROW_DOWNWARD and
>      !STACK_GROWS_DOWNWARD.
>      * expr.c (memory_load_overlap): New function.
>      (emit_push_insn): When pushing partial args to the stack would
>      clobber the register part load the overlapping part into a pseudo
>      and put it into the hard reg after pushing.
>
> 2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>
>
>      PR target/65358
>      * gcc.dg/pr65358.c: New test.
The more I think about this, the more I think it's an ugly can of worms 
and maybe we should just disable sibcalls for partial arguments.  I 
doubt it's a big performance issue in general.

In addition to the argument/stack direction stuff, I've been pondering 
the stack/frame/arg pointer issues.  Your approach assumes that the 
incoming and outgoing areas are always referenced off the same base 
register.  If they aren't, then the routine returns no overlap.

But we'd need to consider the case where we have a reference to the arg 
or frame pointer which later gets rewritten into a stack pointer 
relative address.

Is it too late at the point were you do the checks to reject the sibling 
call?  If not, then maybe the overlap routine should return a tri-state. 
  No overlap, overlap, don't know.  The last would be used when the two 
addresses use a different register.

Jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-04-27 13:16                 ` John David Anglin
@ 2015-05-06 18:57                   ` John David Anglin
  0 siblings, 0 replies; 22+ messages in thread
From: John David Anglin @ 2015-05-06 18:57 UTC (permalink / raw)
  To: Kyrill Tkachov, Jeff Law, GCC Patches; +Cc: Honggyu Kim

On 2015-04-27 9:16 AM, John David Anglin wrote:
>> Hi Dave,
>>
>> Would it be possible for you to test this patch on a 64-bit hppa
>> or at least bootstrap it?
>> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html
> I started a build and test with your patch on hppa64-hp-hpux11.11 this 
> morning.

The patch didn't cause any regressions on hppa2.0w-hp-hpux11.11 and 
hppa64-hp-hpux11.11.

Dave

-- 
John David Anglin  dave.anglin@bell.net

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-05-01 18:51                 ` Jeff Law
@ 2015-05-11  9:28                   ` Kyrill Tkachov
  2015-05-12 22:12                     ` Jeff Law
  0 siblings, 1 reply; 22+ messages in thread
From: Kyrill Tkachov @ 2015-05-11  9:28 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim

[-- Attachment #1: Type: text/plain, Size: 6964 bytes --]


On 01/05/15 19:51, Jeff Law wrote:
> On 04/28/2015 03:54 AM, Kyrill Tkachov wrote:
>>
>> On 27/04/15 21:13, Jeff Law wrote:
>>> On 04/21/2015 11:33 AM, Kyrill Tkachov wrote:
>>>> On 21/04/15 15:09, Jeff Law wrote:
>>>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>>>    From reading config/stormy16/stormy-abi it seems to me that we
>>>>>> don't
>>>>>> pass arguments partially in stormy16, so this code would never be
>>>>>> called
>>>>>> there. That leaves pa as the potential problematic target.
>>>>>> I don't suppose there's an easy way to test on pa? My checkout of
>>>>>> binutils
>>>>>> doesn't seem to include a sim target for it.
>>>>> No simulator, no machines in the testfarm, the box I had access to via
>>>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>>>> bootstrap could complete.  I often regret knowing about the backwards
>>>>> way many things were done on the PA because it makes me think about
>>>>> cases that only matter on dead architectures.
>>>> So what should be the action plan here? I can't add an assert on
>>>> positive result as a negative result is valid.
>>>>
>>>> We want to catch the case where this would cause trouble on
>>>> pa, or change the patch until we're confident that it's fine
>>>> for pa.
>>>>
>>>> That being said, reading the documentation of STACK_GROWS_UPWARD
>>>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>>>> where this would cause trouble on pa.
>>>>
>>>> Is the problem that in the function:
>>>>
>>>> +/* Add SIZE to X and check whether it's greater than Y.
>>>> +   If it is, return the constant amount by which it's greater or
>>>> smaller.
>>>> +   If the two are not statically comparable (for example, X and Y
>>>> contain
>>>> +   different registers) return -1.  This is used in expand_push_insn to
>>>> +   figure out if reading SIZE bytes from location X will end up reading
>>>> from
>>>> +   location Y.  */
>>>> +static int
>>>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>>> +{
>>>> +  rtx tmp = plus_constant (Pmode, x, size);
>>>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>>> +
>>>> +  if (!CONST_INT_P (sub))
>>>> +    return -1;
>>>> +
>>>> +  return INTVAL (sub);
>>>> +}
>>>>
>>>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>>>> so the function should something like the following?
>>> So I had to go back and compile some simple examples.
>>>
>>> References to outgoing arguments will be SP relative. References to the
>>> incoming arguments will be ARGP relative.  And that brings me to the
>>> another issue.  Isn't X in this context the incoming argument slot and
>>> the destination an outgoing argument slot?
>>>
>>> If so, the approach of memory_load_overlap simply won't work on a target
>>> with calling conventions like the PA.  And you might really want to
>>> consider punting for these kind of calling conventions
>>
>> Ok, thanks for the guidance.
>> How about this? This patch disables sibcall optimisation when
>> encountering a partial argument when ARGS_GROW_DOWNWARD &&
>> !STACK_GROWS_DOWNWARD.
>> Hopefully this shouldn't harm codegen on parisc if, as you say, it's
>> rare to have
>> partial arguments anyway on PA due to the large number of argument regs.
>>
>> I tested this on arm and bootstrapped on x86_64.
>> I am now going through the process of getting access to a Debian PA
>> machine to
>> give it a test there (thanks Dave!)
>>
>> Ok if testing comes clean?
>>
>> Thanks,
>> Kyrill
>>
>> 2015-04-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR target/65358
>>      * calls.c (expand_call): Cancel sibcall optimisation when encountering
>>      partial argument on targets with ARGS_GROW_DOWNWARD and
>>      !STACK_GROWS_DOWNWARD.
>>      * expr.c (memory_load_overlap): New function.
>>      (emit_push_insn): When pushing partial args to the stack would
>>      clobber the register part load the overlapping part into a pseudo
>>      and put it into the hard reg after pushing.
>>
>> 2015-04-28  Honggyu Kim  <hong.gyu.kim@lge.com>
>>
>>      PR target/65358
>>      * gcc.dg/pr65358.c: New test.
> The more I think about this, the more I think it's an ugly can of worms and maybe we should just disable sibcalls for partial arguments.  I doubt it's a big performance issue in general.

We already have quite a bit of code in calls.c to detect cases with partial argument overlap for the
explicit purpose of allowing sibcalls when partial arguments occur in the general case. However, that
code only detects when a partial argument overlaps with other arguments in a call. In this PR the
partial argument overlaps with itself. It would be a shame to disable sibcalls for all partial arguments
when there is already infrastructure in place to handle them.

>
>
> In addition to the argument/stack direction stuff, I've been pondering the stack/frame/arg pointer issues.  Your approach assumes that the incoming and outgoing areas are always referenced off the same base register.  If they aren't, then 
> the routine returns no overlap.
>
> But we'd need to consider the case where we have a reference to the arg or frame pointer which later gets rewritten into a stack pointer relative address.
>
> Is it too late at the point were you do the checks to reject the sibling call?  If not, then maybe the overlap routine should return a tri-state.  No overlap, overlap, don't know.  The last would be used when the two addresses use a 
> different register.

Ok, here is my attempt at that. The overlap functions returns -2 when it cannot staticall compare the
two pointers (i.e. when the base registers are different) and the caller then disables sibcalls.
The code in calls.c that calls this code will undo any emitted instructions in the meantime if sibcall
optimisation fails. This required me to change the type of emit_push_insn to bool and add an extra
parameter, so this patch touches a bit more code than the original version.

Bootstrapped on x86_64 and tested on arm. The testcase in this PR still performs a sibcall correctly on arm.

What do you think of this?

Thanks,
Kyrill


2015-05-11  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65358
     * expr.c (memory_load_overlap): New function.
     (emit_push_insn): When pushing partial args to the stack would
     clobber the register part load the overlapping part into a pseudo
     and put it into the hard reg after pushing.  Change return type
     to bool.  Add bool argument.
     * expr.h (emit_push_insn): Change return type to bool.
     Add bool argument.
     * calls.c (expand_call): Cancel sibcall optimisation when encountering
     partial argument on targets with ARGS_GROW_DOWNWARD and
     !STACK_GROWS_DOWNWARD.
     (emit_library_call_value_1): Update callsite of emit_push_insn.
     (store_one_arg): Likewise.


2015-05-11  Honggyu Kim  <hong.gyu.kim@lge.com>

     PR target/65358
     * gcc.dg/pr65358.c: New test.

>
> Jeff
>
>


[-- Attachment #2: expr.patch --]
[-- Type: text/x-patch, Size: 9529 bytes --]

commit 5b596f10846b6d3b143442a306801c8262d8b10a
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/calls.c b/gcc/calls.c
index caa7d60..81ef2c9 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3225,6 +3225,13 @@ expand_call (tree exp, rtx target, int ignore)
 	    {
 	      rtx_insn *before_arg = get_last_insn ();
 
+	     /* On targets with weird calling conventions (e.g. PA) it's
+		hard to ensure that all cases of argument overlap between
+		stack and registers work.  Play it safe and bail out.  */
+#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD)
+	      sibcall_failure = 1;
+	      break;
+#endif
 	      if (store_one_arg (&args[i], argblock, flags,
 				 adjusted_args_size.var != 0,
 				 reg_parm_stack_space)
@@ -4270,7 +4277,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 			  partial, reg, 0, argblock,
 			  GEN_INT (argvec[argnum].locate.offset.constant),
 			  reg_parm_stack_space,
-			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad));
+			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad), false);
 
 	  /* Now mark the segment we just used.  */
 	  if (ACCUMULATE_OUTGOING_ARGS)
@@ -4880,10 +4887,11 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
 
       /* This isn't already where we want it on the stack, so put it there.
 	 This can either be done with push or copy insns.  */
-      emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
+      if (!emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
 		      parm_align, partial, reg, used - size, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), true))
+	sibcall_failure = 1;
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.  */
@@ -4988,7 +4996,7 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
       emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), size_rtx,
 		      parm_align, partial, reg, excess, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), false);
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.
diff --git a/gcc/expr.c b/gcc/expr.c
index 25aa11f..712fa0b 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,12 +4121,35 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   if there is no overlap or -2 if we can't determing
+   (for example when X and Y have different base registers).  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -2;
+
+  HOST_WIDE_INT val = INTVAL (sub);
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
    carry mode info).
    SIZE is an rtx for the size of data to be copied (in bytes),
    needed only if X is BLKmode.
+   Return true if successful.  May return false if asked to push a
+   partial argument during a sibcall optimisation (as specified by
+   SIBCALL_P) and the incoming and outgoing pointers cannot be shown
+   to not overlap.
 
    ALIGN (in bits) is maximum alignment we can assume.
 
@@ -4152,11 +4175,11 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
    for arguments passed in registers.  If nonzero, it will be the number
    of bytes required.  */
 
-void
+bool
 emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 		unsigned int align, int partial, rtx reg, int extra,
 		rtx args_addr, rtx args_so_far, int reg_parm_stack_space,
-		rtx alignment_pad)
+		rtx alignment_pad, bool sibcall_p)
 {
   rtx xinner;
   enum direction stack_direction
@@ -4179,6 +4202,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4309,6 +4336,43 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else if (overlapping == -1)
+		overlapping = 0;
+	      /* Could not determine whether there is overlap.
+	         Fail the sibcall.  */
+	      else
+		{
+		  overlapping = 0;
+		  if (sibcall_p)
+		    return false;
+		}
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4363,12 +4427,13 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	 has a size a multiple of a word.  */
       for (i = size - 1; i >= not_stack; i--)
 	if (i >= not_stack + offset)
-	  emit_push_insn (operand_subword_force (x, i, mode),
+	  if (!emit_push_insn (operand_subword_force (x, i, mode),
 			  word_mode, NULL_TREE, NULL_RTX, align, 0, NULL_RTX,
 			  0, args_addr,
 			  GEN_INT (args_offset + ((i - not_stack + skip)
 						  * UNITS_PER_WORD)),
-			  reg_parm_stack_space, alignment_pad);
+			  reg_parm_stack_space, alignment_pad, sibcall_p))
+	    return false;
     }
   else
     {
@@ -4411,9 +4476,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4421,9 +4485,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
@@ -4432,6 +4502,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   if (alignment_pad && args_addr == 0)
     anti_adjust_stack (alignment_pad);
+
+  return true;
 }
 \f
 /* Return X if X can be used as a subtarget in a sequence of arithmetic
diff --git a/gcc/expr.h b/gcc/expr.h
index 867852e..5fcc13f 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -218,8 +218,8 @@ extern rtx emit_move_resolve_push (machine_mode, rtx);
 extern rtx push_block (rtx, int, int);
 
 /* Generate code to push something onto the stack, given its mode and type.  */
-extern void emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
-			    int, rtx, int, rtx, rtx, int, rtx);
+extern bool emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
+			    int, rtx, int, rtx, rtx, int, rtx, bool);
 
 /* Expand an assignment that stores the value of FROM into TO.  */
 extern void expand_assignment (tree, tree, bool);
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-05-11  9:28                   ` Kyrill Tkachov
@ 2015-05-12 22:12                     ` Jeff Law
  2015-05-27 14:00                       ` Kyrill Tkachov
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Law @ 2015-05-12 22:12 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Honggyu Kim

On 05/11/2015 03:28 AM, Kyrill Tkachov wrote:
>> The more I think about this, the more I think it's an ugly can of
>> worms and maybe we should just disable sibcalls for partial
>> arguments.  I doubt it's a big performance issue in general.
>
> We already have quite a bit of code in calls.c to detect cases with
> partial argument overlap for the
> explicit purpose of allowing sibcalls when partial arguments occur in
> the general case. However, that
> code only detects when a partial argument overlaps with other arguments
> in a call. In this PR the
> partial argument overlaps with itself. It would be a shame to disable
> sibcalls for all partial arguments
> when there is already infrastructure in place to handle them.
I didn't even realize we had support for partial arguments in sibcalls. 
   Ah, Kazu added that in 2005, I totally missed it.  I probably would 
have suggested failing the sibcall for those cases back then too...

Is there any way to re-use that infrastructure to deal with the case at 
hand?



>
>>
>>
>> In addition to the argument/stack direction stuff, I've been pondering
>> the stack/frame/arg pointer issues.  Your approach assumes that the
>> incoming and outgoing areas are always referenced off the same base
>> register.  If they aren't, then the routine returns no overlap.
>>
>> But we'd need to consider the case where we have a reference to the
>> arg or frame pointer which later gets rewritten into a stack pointer
>> relative address.
>>
>> Is it too late at the point were you do the checks to reject the
>> sibling call?  If not, then maybe the overlap routine should return a
>> tri-state.  No overlap, overlap, don't know.  The last would be used
>> when the two addresses use a different register.
>
> Ok, here is my attempt at that. The overlap functions returns -2 when it
> cannot staticall compare the
> two pointers (i.e. when the base registers are different) and the caller
> then disables sibcalls.
> The code in calls.c that calls this code will undo any emitted
> instructions in the meantime if sibcall
> optimisation fails. This required me to change the type of
> emit_push_insn to bool and add an extra
> parameter, so this patch touches a bit more code than the original version.
>
> Bootstrapped on x86_64 and tested on arm. The testcase in this PR still
> performs a sibcall correctly on arm.
>
> What do you think of this?
>
> Thanks,
> Kyrill
>
>
> 2015-05-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>      PR target/65358
>      * expr.c (memory_load_overlap): New function.
>      (emit_push_insn): When pushing partial args to the stack would
>      clobber the register part load the overlapping part into a pseudo
>      and put it into the hard reg after pushing.  Change return type
>      to bool.  Add bool argument.
>      * expr.h (emit_push_insn): Change return type to bool.
>      Add bool argument.
>      * calls.c (expand_call): Cancel sibcall optimisation when encountering
>      partial argument on targets with ARGS_GROW_DOWNWARD and
>      !STACK_GROWS_DOWNWARD.
>      (emit_library_call_value_1): Update callsite of emit_push_insn.
>      (store_one_arg): Likewise.
>
>
> 2015-05-11  Honggyu Kim <hong.gyu.kim@lge.com>
>
>      PR target/65358
>      * gcc.dg/pr65358.c: New test.
>
>>
>> Jeff
>>
>>
>
>
> expr.patch
>
>
> commit 5b596f10846b6d3b143442a306801c8262d8b10a
> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
> Date:   Wed Mar 18 13:42:37 2015 +0000
>
>      [expr.c] PR 65358 Avoid clobbering partial argument during sibcall
>
> diff --git a/gcc/calls.c b/gcc/calls.c
> index caa7d60..81ef2c9 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -3225,6 +3225,13 @@ expand_call (tree exp, rtx target, int ignore)
>   	    {
>   	      rtx_insn *before_arg = get_last_insn ();
>
> +	     /* On targets with weird calling conventions (e.g. PA) it's
> +		hard to ensure that all cases of argument overlap between
> +		stack and registers work.  Play it safe and bail out.  */
> +#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD)
> +	      sibcall_failure = 1;
> +	      break;
> +#endif
So we're trying to get away from this kind of conditional compilation.

Instead we want to write

if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)

ARGS_GROW_DOWNWARD is already a testable value.  But 
STACK_GROWS_DOWNWARD is not.  The way folks have been dealing with this 
is something like this after the #includes:

/* Redefine STACK_GROWS_DOWNWARD in terms of 0 or 1.  */
#ifdef STACK_GROWS_DOWNWARD
# undef STACK_GROWS_DOWNWARD
# define STACK_GROWS_DOWNWARD 1
#else
# define STACK_GROWS_DOWNWARD 0
#endif


With that in place you can change the test into the more desirable
if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)



> diff --git a/gcc/expr.c b/gcc/expr.c
> index 25aa11f..712fa0b 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -4121,12 +4121,35 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
>   }
>   #endif
>
> +/* If reading SIZE bytes from X will end up reading from
> +   Y return the number of bytes that overlap.  Return -1
> +   if there is no overlap or -2 if we can't determing
s/determing/determine/

> +   partial argument during a sibcall optimisation (as specified by
s/optimisation/optimization/


I guess I'm still not real comfortable that we've got the issues here 
resolved and if we're going to try and support this case, then re-using 
the existing infrastructure would be better.

I'll approve with the changes noted above, but would ask that you look 
into trying to re-use the existing infrastructure as a follow-up.

Jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
  2015-05-12 22:12                     ` Jeff Law
@ 2015-05-27 14:00                       ` Kyrill Tkachov
  0 siblings, 0 replies; 22+ messages in thread
From: Kyrill Tkachov @ 2015-05-27 14:00 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Honggyu Kim

[-- Attachment #1: Type: text/plain, Size: 5996 bytes --]

Hi Jeff,

On 12/05/15 23:04, Jeff Law wrote:
> On 05/11/2015 03:28 AM, Kyrill Tkachov wrote:
>>> The more I think about this, the more I think it's an ugly can of
>>> worms and maybe we should just disable sibcalls for partial
>>> arguments.  I doubt it's a big performance issue in general.
>>
>> We already have quite a bit of code in calls.c to detect cases with
>> partial argument overlap for the
>> explicit purpose of allowing sibcalls when partial arguments occur in
>> the general case. However, that
>> code only detects when a partial argument overlaps with other arguments
>> in a call. In this PR the
>> partial argument overlaps with itself. It would be a shame to disable
>> sibcalls for all partial arguments
>> when there is already infrastructure in place to handle them.
> I didn't even realize we had support for partial arguments in sibcalls.   Ah, Kazu added that in 2005, I totally missed it.  I probably would have suggested failing the sibcall for those cases back then too...
>
> Is there any way to re-use that infrastructure to deal with the case at hand?
>
>
>
>>
>>>
>>>
>>> In addition to the argument/stack direction stuff, I've been pondering
>>> the stack/frame/arg pointer issues.  Your approach assumes that the
>>> incoming and outgoing areas are always referenced off the same base
>>> register.  If they aren't, then the routine returns no overlap.
>>>
>>> But we'd need to consider the case where we have a reference to the
>>> arg or frame pointer which later gets rewritten into a stack pointer
>>> relative address.
>>>
>>> Is it too late at the point were you do the checks to reject the
>>> sibling call?  If not, then maybe the overlap routine should return a
>>> tri-state.  No overlap, overlap, don't know.  The last would be used
>>> when the two addresses use a different register.
>>
>> Ok, here is my attempt at that. The overlap functions returns -2 when it
>> cannot staticall compare the
>> two pointers (i.e. when the base registers are different) and the caller
>> then disables sibcalls.
>> The code in calls.c that calls this code will undo any emitted
>> instructions in the meantime if sibcall
>> optimisation fails. This required me to change the type of
>> emit_push_insn to bool and add an extra
>> parameter, so this patch touches a bit more code than the original version.
>>
>> Bootstrapped on x86_64 and tested on arm. The testcase in this PR still
>> performs a sibcall correctly on arm.
>>
>> What do you think of this?
>>
>> Thanks,
>> Kyrill
>>
>>
>> 2015-05-11  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>      PR target/65358
>>      * expr.c (memory_load_overlap): New function.
>>      (emit_push_insn): When pushing partial args to the stack would
>>      clobber the register part load the overlapping part into a pseudo
>>      and put it into the hard reg after pushing.  Change return type
>>      to bool.  Add bool argument.
>>      * expr.h (emit_push_insn): Change return type to bool.
>>      Add bool argument.
>>      * calls.c (expand_call): Cancel sibcall optimisation when encountering
>>      partial argument on targets with ARGS_GROW_DOWNWARD and
>>      !STACK_GROWS_DOWNWARD.
>>      (emit_library_call_value_1): Update callsite of emit_push_insn.
>>      (store_one_arg): Likewise.
>>
>>
>> 2015-05-11  Honggyu Kim <hong.gyu.kim@lge.com>
>>
>>      PR target/65358
>>      * gcc.dg/pr65358.c: New test.
>>
>>>
>>> Jeff
>>>
>>>
>>
>>
>> expr.patch
>>
>>
>> commit 5b596f10846b6d3b143442a306801c8262d8b10a
>> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com>
>> Date:   Wed Mar 18 13:42:37 2015 +0000
>>
>>      [expr.c] PR 65358 Avoid clobbering partial argument during sibcall
>>
>> diff --git a/gcc/calls.c b/gcc/calls.c
>> index caa7d60..81ef2c9 100644
>> --- a/gcc/calls.c
>> +++ b/gcc/calls.c
>> @@ -3225,6 +3225,13 @@ expand_call (tree exp, rtx target, int ignore)
>>           {
>>             rtx_insn *before_arg = get_last_insn ();
>>
>> +         /* On targets with weird calling conventions (e.g. PA) it's
>> +        hard to ensure that all cases of argument overlap between
>> +        stack and registers work.  Play it safe and bail out. */
>> +#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD)
>> +          sibcall_failure = 1;
>> +          break;
>> +#endif
> So we're trying to get away from this kind of conditional compilation.
>
> Instead we want to write
>
> if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)
>
> ARGS_GROW_DOWNWARD is already a testable value.  But STACK_GROWS_DOWNWARD is not.  The way folks have been dealing with this is something like this after the #includes:
>
> /* Redefine STACK_GROWS_DOWNWARD in terms of 0 or 1.  */
> #ifdef STACK_GROWS_DOWNWARD
> # undef STACK_GROWS_DOWNWARD
> # define STACK_GROWS_DOWNWARD 1
> #else
> # define STACK_GROWS_DOWNWARD 0
> #endif
>
>
> With that in place you can change the test into the more desirable
> if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)
>
>
>
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index 25aa11f..712fa0b 100644
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -4121,12 +4121,35 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
>>   }
>>   #endif
>>
>> +/* If reading SIZE bytes from X will end up reading from
>> +   Y return the number of bytes that overlap.  Return -1
>> +   if there is no overlap or -2 if we can't determing
> s/determing/determine/
>
>> +   partial argument during a sibcall optimisation (as specified by
> s/optimisation/optimization/
>
>
> I guess I'm still not real comfortable that we've got the issues here resolved and if we're going to try and support this case, then re-using the existing infrastructure would be better.
>
> I'll approve with the changes noted above, but would ask that you look into trying to re-use the existing infrastructure as a follow-up.

Thanks, I'm attaching what I committed with r223753.
I've raised PR 66307 for the infrastructure work.

Thanks again for helping beat this into shape.

Kyrill

>
> Jeff
>
>


[-- Attachment #2: expr-2.patch --]
[-- Type: text/x-patch, Size: 9559 bytes --]

commit 462eda51471ef6cb0ffcf4d45c8aa464c3d750e0
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/calls.c b/gcc/calls.c
index afe61f4..2158eba 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3234,6 +3234,15 @@ expand_call (tree exp, rtx target, int ignore)
 	    {
 	      rtx_insn *before_arg = get_last_insn ();
 
+	     /* On targets with weird calling conventions (e.g. PA) it's
+		hard to ensure that all cases of argument overlap between
+		stack and registers work.  Play it safe and bail out.  */
+	      if (ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD)
+		{
+		  sibcall_failure = 1;
+		  break;
+		}
+
 	      if (store_one_arg (&args[i], argblock, flags,
 				 adjusted_args_size.var != 0,
 				 reg_parm_stack_space)
@@ -4276,7 +4285,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 			  partial, reg, 0, argblock,
 			  GEN_INT (argvec[argnum].locate.offset.constant),
 			  reg_parm_stack_space,
-			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad));
+			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad), false);
 
 	  /* Now mark the segment we just used.  */
 	  if (ACCUMULATE_OUTGOING_ARGS)
@@ -4886,10 +4895,11 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
 
       /* This isn't already where we want it on the stack, so put it there.
 	 This can either be done with push or copy insns.  */
-      emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
+      if (!emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), NULL_RTX,
 		      parm_align, partial, reg, used - size, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), true))
+	sibcall_failure = 1;
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.  */
@@ -4994,7 +5004,7 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
       emit_push_insn (arg->value, arg->mode, TREE_TYPE (pval), size_rtx,
 		      parm_align, partial, reg, excess, argblock,
 		      ARGS_SIZE_RTX (arg->locate.offset), reg_parm_stack_space,
-		      ARGS_SIZE_RTX (arg->locate.alignment_pad));
+		      ARGS_SIZE_RTX (arg->locate.alignment_pad), false);
 
       /* Unless this is a partially-in-register argument, the argument is now
 	 in the stack.
diff --git a/gcc/expr.c b/gcc/expr.c
index a613beb..1dd1cf3 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4104,12 +4104,35 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   if there is no overlap or -2 if we can't determine
+   (for example when X and Y have different base registers).  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -2;
+
+  HOST_WIDE_INT val = INTVAL (sub);
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
    carry mode info).
    SIZE is an rtx for the size of data to be copied (in bytes),
    needed only if X is BLKmode.
+   Return true if successful.  May return false if asked to push a
+   partial argument during a sibcall optimization (as specified by
+   SIBCALL_P) and the incoming and outgoing pointers cannot be shown
+   to not overlap.
 
    ALIGN (in bits) is maximum alignment we can assume.
 
@@ -4135,11 +4158,11 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type)
    for arguments passed in registers.  If nonzero, it will be the number
    of bytes required.  */
 
-void
+bool
 emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 		unsigned int align, int partial, rtx reg, int extra,
 		rtx args_addr, rtx args_so_far, int reg_parm_stack_space,
-		rtx alignment_pad)
+		rtx alignment_pad, bool sibcall_p)
 {
   rtx xinner;
   enum direction stack_direction = STACK_GROWS_DOWNWARD ? downward : upward;
@@ -4157,6 +4180,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4287,6 +4314,43 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else if (overlapping == -1)
+		overlapping = 0;
+	      /* Could not determine whether there is overlap.
+	         Fail the sibcall.  */
+	      else
+		{
+		  overlapping = 0;
+		  if (sibcall_p)
+		    return false;
+		}
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4341,12 +4405,13 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	 has a size a multiple of a word.  */
       for (i = size - 1; i >= not_stack; i--)
 	if (i >= not_stack + offset)
-	  emit_push_insn (operand_subword_force (x, i, mode),
+	  if (!emit_push_insn (operand_subword_force (x, i, mode),
 			  word_mode, NULL_TREE, NULL_RTX, align, 0, NULL_RTX,
 			  0, args_addr,
 			  GEN_INT (args_offset + ((i - not_stack + skip)
 						  * UNITS_PER_WORD)),
-			  reg_parm_stack_space, alignment_pad);
+			  reg_parm_stack_space, alignment_pad, sibcall_p))
+	    return false;
     }
   else
     {
@@ -4389,9 +4454,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4399,9 +4463,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
@@ -4410,6 +4480,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   if (alignment_pad && args_addr == 0)
     anti_adjust_stack (alignment_pad);
+
+  return true;
 }
 \f
 /* Return X if X can be used as a subtarget in a sequence of arithmetic
diff --git a/gcc/expr.h b/gcc/expr.h
index e3afa8d..7b28ffd 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -219,8 +219,8 @@ extern rtx emit_move_resolve_push (machine_mode, rtx);
 extern rtx push_block (rtx, int, int);
 
 /* Generate code to push something onto the stack, given its mode and type.  */
-extern void emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
-			    int, rtx, int, rtx, rtx, int, rtx);
+extern bool emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
+			    int, rtx, int, rtx, rtx, int, rtx, bool);
 
 /* Expand an assignment that stores the value of FROM into TO.  */
 extern void expand_assignment (tree, tree, bool);
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-05-27 13:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-19 14:39 [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall Kyrill Tkachov
2015-03-27 10:06 ` Kyrill Tkachov
2015-03-29 11:29 ` Honggyu Kim
2015-04-13 14:01 ` Kyrill Tkachov
2015-04-13 16:33   ` Jeff Law
2015-04-17 17:26 ` Jeff Law
2015-04-20  8:25   ` Kyrill Tkachov
2015-04-20 18:02     ` Jeff Law
2015-04-21  8:30       ` Kyrill Tkachov
2015-04-21 14:09         ` Jeff Law
2015-04-21 17:33           ` Kyrill Tkachov
2015-04-22 11:51             ` Kyrill Tkachov
2015-04-27 10:12               ` Kyrill Tkachov
2015-04-27 13:16                 ` John David Anglin
2015-05-06 18:57                   ` John David Anglin
2015-04-27 20:13             ` Jeff Law
2015-04-28 10:19               ` Kyrill Tkachov
2015-04-30 12:09                 ` Kyrill Tkachov
2015-05-01 18:51                 ` Jeff Law
2015-05-11  9:28                   ` Kyrill Tkachov
2015-05-12 22:12                     ` Jeff Law
2015-05-27 14:00                       ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).