public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* rs6000 stack_tie mishap again
@ 2016-03-23 16:24 Olivier Hainque
  2016-03-24  7:51 ` Alan Modra
  2016-03-28  4:48 ` Segher Boessenkool
  0 siblings, 2 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-03-23 16:24 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 6405 bytes --]

Hello,

This is a proposal to address what I think has been a long-standing, very
nasty latent code generation problem, which just manifested in-house on a
proprietary testcase.

This is intricate, so long email ...

Visible misbehavior
====================

The visible effect is a powerpc-eabispe compiler (no red-zone) producing an
epilogue sequence like

   addi %r11,%r1,184    # temp pointer within frame

   addi %r1,%r11,104    # release frame

   evldd %r21,16(%r11)  # restore registers
   ...                  # ...
   evldd %r31,96(%r11)  # ...

   blr                  # return

This is causing troubles in situations where interrupt code somehow quicks in
between the frame release and the blr, clobbering the stack slots before the
register restore insn have run.

Triggering conditions
=====================

We are witnessing a situation where the rs6000 stack_tie mechanism is
ineffective, out of a missed dependency between the stack_tie insn and some
register restores.

We have observed this with a gcc 4.7 back-end and weren't able to reproduce
with a more recent version. I believe, however, that the problem is still
latent, just requiring extremely precise conditions to trigger the problematic
scheduling decision.

At the RTL level, with our 4.7 based compiler, the chain of events is as
follows:

Compiling our testcase with -O2 -mcpu=8540 -mfloat-gprs=double, we get up to:

------------ p.adb.216r.split4:

(insn 710 147 712 2 (set (reg/f:SI 11 %r11 [650])
        (high:SI (symbol_ref/u:SI ("*.LC0") [flags 0x82]))) 495 {elf_high}
...

(note 891 627 892 32 NOTE_INSN_EPILOGUE_BEG)
...
... sequence of register restores ...

(insn 894 893 895 32 (set (reg:SI 11 %r11)
        (plus:SI (reg/f:SI 1 %r1)
            (const_int 184 [0xb8]))) 
... 
(insn 907 906 908 32 (set (reg:V2SI 31 31)
        (mem:V2SI (plus:SI (reg:SI 11 11)
                (const_int 96 [0x60])) [0 S8 A64])) 
 
... STACK TIE insn, intended to prevent the scheduler from moving ...
... restores past this point, so they remain issued prior to the  ...
... following insn:
 
(insn 908 907 909 32 (parallel [
            (set (mem/c:BLK (reg/f:SI 1 1) [17 A8])
                (const_int 0 [0]))
            (set (mem/c:BLK (reg:SI 11 11) [17 A8])
                (const_int 0 [0]))
        ]) ../p.adb:84:8 681 {stack_tie}
     (nil))
 
... bumping r1 (stack pointer), releasing the stack frame:
 
(insn/f 909 908 910 32 (set (reg/f:SI 1 1)
        (plus:SI (reg:SI 11 11)
            (const_int 104 [0x68]))) ../p.adb:84:8 78 {*addsi3_internal1}

The tie references a mem with alias set 17 (stack accesses). The restores
use alias set 0, which should conflict anyway, so this looks OK.

When sched2 kicks in, we however fail to establish a dependency between
907 (and other restores relative to r11) and the stack_tie.

At the bottom of the chain, we see write_dependence_p claiming absence
of dependence for 907 from

  if (! writep)
    {
      base = find_base_term (mem_addr);
      if (base && (GET_CODE (base) == LABEL_REF
		   || (GET_CODE (base) == SYMBOL_REF
		       && CONSTANT_POOL_ADDRESS_P (base))))
	return 0;
    }


with

(gdb) pr mem_addr
(plus:SI (reg:SI 11 11)
    (const_int 96 [0x60]))
 
and
 
(gdb) pr base
(symbol_ref/u:SI ("*.LC0") [flags 0x82])
 
coming from insn 710, despite 894 in between. Ug.

Further Analysis & patch proposal
=================================

The reason why 894 is not accounted in the base ref computation is because it
is part of the epilogue sequence, and init_alias_analysis has:

      /* Walk the insns adding values to the new_reg_base_value array.  */
      for (i = 0; i < rpo_cnt; i++)
	{ ...
		  if (could_be_prologue_epilogue
		      && prologue_epilogue_contains (insn))
		    continue;

The motivation for this is unclear to me.

My rough understanding is that we probably really care about frame_related
insns only here, at least on targets where the flag is supposed to be set
accurately.

This is the basis of the proposed patch, which essentially disconnects the
shortcut above for !frame_related insns on targets which need precise alias
analysis within epilogues, as indicated by a new target hook.

On the key insn at hand, the frame_related bit was cleared on purpose,
per:

https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html

  "1) Marking an instruction setting up r11 for use by _save64gpr_* as
   frame-related results in r11 being set as the cfa for the rest of the
   function.  That's bad when r11 gets used for other things later.
  "

I have verified that it cures the problem with our gcc-4.7 based toolchain,
and we have been exercising it in gcc-4.9 based compilers on lots of ppc and
e500 configurations without problems for a few weeks now

I'll be happy to perform further testing if required. This is pretty heavy for
me, however, so I'd rather have a discussion and a preliminary agreement on
the approach first.

This kind of issue isn't new. See for example the threads rooted at
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html
or https://gcc.gnu.org/ml/gcc-patches/2012-04/msg00070.html

The misbehavior it produces is a real pain, with arbitrary hazards very hard
to track down. We have been hit more than once already for a gain which is not
so obvious.

So, aside from the dependency issue which needs to be fixed somehow, I
think it would make sense to consider using a strong blockage mecanism in
expand_epilogue.

Thoughts ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

-----


2016-03-23  Olivier Hainque <hainque@adacore.com>

        * target.def (epilogue_aliases): New target hook.
        * function.c (prologue_epilogue_insns): Return bool and rewrite as
        combination of ...
        (prologue_insns, epilogue_insns): New functions.
        * rtl.h: Add prototypes for the new functions and adjust prototype
        of prologue_epilogue_insns for the return type change.
        * alias.c (init_alias_analysis): Honor targetm.epilogue_aliases.
    
        * config/rs6000/rs6000.c (TARGET_EPILOGUE_ALIASES): Redefine, return
        true.
    
        doc/
        * tm.texi.in (TARGET_EPILOGUE_ALIASES): Add new @hook entry.
        * tm.texi: Regenerate.


[-- Attachment #2: ppc-frame.diff --]
[-- Type: application/octet-stream, Size: 6275 bytes --]

commit 9b3a8400aeb43ad9ce0c753dc8fecac9db304da3
Author: Olivier Hainque <hainque@adacore.com>
Date:   Tue Mar 22 18:37:27 2016 +0100

    initial port from 4.9

diff --git a/gcc/alias.c b/gcc/alias.c
index 753e1af..09a709f 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -3241,9 +3241,12 @@ init_alias_analysis (void)
      insn chain until after reload has completed.  Thus,
      there is no sense wasting time checking if INSN is in
      the prologue/epilogue until after reload has completed.  */
-  bool could_be_prologue_epilogue = ((targetm.have_prologue ()
-				      || targetm.have_epilogue ())
-				     && reload_completed);
+
+  bool could_be_prologue = (targetm.have_prologue ()
+			    && reload_completed);
+
+  bool could_be_epilogue = (targetm.have_epilogue ()
+			    && reload_completed);
 
   pass = 0;
   do
@@ -3283,8 +3286,19 @@ init_alias_analysis (void)
 		{
 		  rtx note, set;
 
-		  if (could_be_prologue_epilogue
-		      && prologue_epilogue_contains (insn))
+		  /* Skip prologue instructions threaded onto the insn chain
+		     after reload, which bring nothing of interest to alias
+		     analysis within the function's body.  */
+		  if (could_be_prologue
+		      && prologue_contains (insn))
+		    continue;
+
+		  /* Likewise for epilogue instructions, except for possible
+		     needs within the epilogue sequence itself.  */
+		  if (could_be_epilogue
+		      && epilogue_contains (insn)
+		      && (!targetm.epilogue_aliases ()
+			  || RTX_FRAME_RELATED_P (insn)))
 		    continue;
 
 		  /* If this insn has a noalias note, process it,  Otherwise,
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fd4b7cc..cf5ca42 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1712,6 +1712,11 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
 #endif
 
+/* Our stack_tie mechanism requires visibility on intermediate
+   base register computations in the epilogue.  */
+#undef TARGET_EPILOGUE_ALIASES
+#define  TARGET_EPILOGUE_ALIASES  hook_bool_void_true
+
 /* Use a 32-bit anchor range.  This leads to sequences like:
 
 	addis	tmp,anchor,high
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 745910f..20c17f6 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11289,6 +11289,16 @@ or when the back end is in a partially-initialized state.
 outside of any function scope.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_EPILOGUE_ALIASES (void)
+This target hook should return @code{true} if the target needs
+alias analysis to look into operations performed by epilogue instructions
+*not* marked @code{RTX_FRAME_RELATED_P}.
+This needs only be redefined by targets which use precise memory references
+with specially assigned alias sets instead of a strong blockage to prevent
+the move of register restores past the stack frame release.
+Default return value is @code{false}.
+@end deftypefn
+
 @defmac TARGET_OBJECT_SUFFIX
 Define this macro to be a C string representing the suffix for object
 files on your target machine.  If you do not define this macro, GCC will
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f31c763..b4b6b1a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8039,6 +8039,8 @@ to by @var{ce_info}.
 
 @hook TARGET_SET_CURRENT_FUNCTION
 
+@hook TARGET_EPILOGUE_ALIASES
+
 @defmac TARGET_OBJECT_SUFFIX
 Define this macro to be a C string representing the suffix for object
 files on your target machine.  If you do not define this macro, GCC will
diff --git a/gcc/function.c b/gcc/function.c
index 1ac8e26..f025f21 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5743,14 +5743,28 @@ contains (const_rtx insn, hash_table<insn_cache_hasher> *hash)
   return hash->find (const_cast<rtx> (insn)) != NULL;
 }
 
-int
+/* Whether INSN is part of the function's prologue sequence.  */
+
+bool
+prologue_contains (const_rtx insn)
+{
+  return contains (insn, prologue_insn_hash);
+}
+
+/* Whether INSN is part of the function's epilogue sequence.  */
+
+bool
+epilogue_contains (const_rtx insn)
+{
+  return contains (insn, epilogue_insn_hash);
+}
+
+/* Whether INSN is part of the function's prologue or epilogue sequence.  */
+
+bool
 prologue_epilogue_contains (const_rtx insn)
 {
-  if (contains (insn, prologue_insn_hash))
-    return 1;
-  if (contains (insn, epilogue_insn_hash))
-    return 1;
-  return 0;
+  return prologue_contains (insn) || epilogue_contains (insn);
 }
 
 /* Insert use of return register before the end of BB.  */
diff --git a/gcc/function.h b/gcc/function.h
index 501ef68..81d9e50 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -628,7 +628,9 @@ extern void clobber_return_register (void);
 extern void expand_function_end (void);
 extern rtx get_arg_pointer_save_area (void);
 extern void maybe_copy_prologue_epilogue_insn (rtx, rtx);
-extern int prologue_epilogue_contains (const_rtx);
+extern bool prologue_contains (const_rtx);
+extern bool epilogue_contains (const_rtx);
+extern bool prologue_epilogue_contains (const_rtx);
 extern void emit_return_into_block (bool simple_p, basic_block bb);
 extern void set_return_jump_label (rtx_insn *);
 extern bool active_insn_between (rtx_insn *head, rtx_insn *tail);
diff --git a/gcc/target.def b/gcc/target.def
index 20f2b32..9276858 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3417,6 +3417,19 @@ move would be greater than that of a library call.",
         enum by_pieces_operation op, bool speed_p),
  default_use_by_pieces_infrastructure_p)
 
+/* Return true if a function must have and use a frame pointer.  */
+DEFHOOK
+(epilogue_aliases,
+ "This target hook should return @code{true} if the target needs\n"
+ "alias analysis to look into operations performed by epilogue instructions\n"
+ "*not* marked @code{RTX_FRAME_RELATED_P}.\n"
+ "This needs only be redefined by targets which use precise memory references\n"
+ "with specially assigned alias sets instead of a strong blockage to prevent\n"
+ "the move of register restores past the stack frame release.\n"
+ "Default return value is @code{false}.",
+ bool, (void),
+ hook_bool_void_false)
+
 DEFHOOK
 (optab_supported_p,
  "Return true if the optimizers should use optab @var{op} with\n\

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-23 16:24 rs6000 stack_tie mishap again Olivier Hainque
@ 2016-03-24  7:51 ` Alan Modra
  2016-03-24 10:32   ` Olivier Hainque
  2016-03-28 19:58   ` Richard Henderson
  2016-03-28  4:48 ` Segher Boessenkool
  1 sibling, 2 replies; 27+ messages in thread
From: Alan Modra @ 2016-03-24  7:51 UTC (permalink / raw)
  To: Olivier Hainque; +Cc: GCC Patches, Richard Henderson

On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote:
> The reason why 894 is not accounted in the base ref computation is because it
> is part of the epilogue sequence, and init_alias_analysis has:
> 
>       /* Walk the insns adding values to the new_reg_base_value array.  */
>       for (i = 0; i < rpo_cnt; i++)
> 	{ ...
> 		  if (could_be_prologue_epilogue
> 		      && prologue_epilogue_contains (insn))
> 		    continue;
> 
> The motivation for this is unclear to me.

https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html

> My rough understanding is that we probably really care about frame_related
> insns only here, at least on targets where the flag is supposed to be set
> accurately.

Also, possibly just prologue insns.  So you might be able to modify
init_alias_analysis just to ignore the prologue and skip any need for
yet another hook.

Let's see what rth thinks.  He did say the patch might need to be
redone.  :)
https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html

[snip]
> So, aside from the dependency issue which needs to be fixed somehow, I
> think it would make sense to consider using a strong blockage mecanism in
> expand_epilogue.

That's what we both said here
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html

and David agreed too
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html

but if you can have the alias analysis changes accepted that would be
even better.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-24  7:51 ` Alan Modra
@ 2016-03-24 10:32   ` Olivier Hainque
  2016-03-24 17:00     ` Jeff Law
  2016-03-28 19:58   ` Richard Henderson
  1 sibling, 1 reply; 27+ messages in thread
From: Olivier Hainque @ 2016-03-24 10:32 UTC (permalink / raw)
  To: Alan Modra; +Cc: GCC Patches, Richard Henderson

Hello Alan,

> On 24 Mar 2016, at 05:10, Alan Modra <amodra@gmail.com> wrote:
> 
>> 		  if (could_be_prologue_epilogue
>> 		      && prologue_epilogue_contains (insn))
>> 		    continue;

> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html

Ah, interesting, thanks!

>> My rough understanding is that we probably really care about frame_related
>> insns only here, at least on targets where the flag is supposed to be set
>> accurately.
> 
> Also, possibly just prologue insns.  So you might be able to modify
> init_alias_analysis just to ignore the prologue and skip any need for
> yet another hook.

Which would be good.

> Let's see what rth thinks.

Definitely.

> He did say the patch might need to be redone.  :)
> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html

And here we have a case :)

> [snip]
>> So, aside from the dependency issue which needs to be fixed somehow, I
>> think it would make sense to consider using a strong blockage mecanism in
>> expand_epilogue.
> 
> That's what we both said here
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html
> 
> and David agreed too
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html
> 
> but if you can have the alias analysis changes accepted that would be
> even better.

I'd really like to come to a resolution we're confident is robust,
because these are really very nasty bugs.

To tell the truth, my current feeling is that relying on the frame_related
bit still seems fragile (*) so I'd be happier with something stronger.

(*) First, it's not easy to be positive that all the insns we'd need to
catch are not frame_related, even if only looking at the current rs6000
epilogue expander. Second, it's very easy to consider flipping one such
bit for whatever reason and not think about this kind of implications.

Thanks for your feedback on this!

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-24 10:32   ` Olivier Hainque
@ 2016-03-24 17:00     ` Jeff Law
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Law @ 2016-03-24 17:00 UTC (permalink / raw)
  To: Olivier Hainque, Alan Modra; +Cc: GCC Patches, Richard Henderson

On 03/24/2016 02:17 AM, Olivier Hainque wrote:
>> [snip]
>>> So, aside from the dependency issue which needs to be fixed somehow, I
>>> think it would make sense to consider using a strong blockage mecanism in
>>> expand_epilogue.
>>
>> That's what we both said here
>> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html
>>
>> and David agreed too
>> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html
>>
>> but if you can have the alias analysis changes accepted that would be
>> even better.
>
> I'd really like to come to a resolution we're confident is robust,
> because these are really very nasty bugs.
The robust solution is to have a scheduling barrier just before the 
point where the stack is deallocated.

The alternative some folks have suggested would be for the generic parts 
of the compiler to add the scheduling barrier before the stack pointer 
adjustment.  I wouldn't object to that.


Jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-23 16:24 rs6000 stack_tie mishap again Olivier Hainque
  2016-03-24  7:51 ` Alan Modra
@ 2016-03-28  4:48 ` Segher Boessenkool
  2016-03-28 11:23   ` Olivier Hainque
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2016-03-28  4:48 UTC (permalink / raw)
  To: Olivier Hainque; +Cc: GCC Patches

On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote:
> The visible effect is a powerpc-eabispe compiler (no red-zone) producing an
> epilogue sequence like
> 
>    addi %r11,%r1,184    # temp pointer within frame

The normal -m32 compiler here generates code like

	lwz 11,0(1)

and try as I might I cannot get it to fail.  Maybe because the GPR11
setup here involves a load?

>    addi %r1,%r11,104    # release frame
> 
>    evldd %r21,16(%r11)  # restore registers
>    ...                  # ...
>    evldd %r31,96(%r11)  # ...
> 
>    blr                  # return

> We have observed this with a gcc 4.7 back-end and weren't able to reproduce
> with a more recent version.

This makes it not a regression and thus out of scope for GCC 6.  We're
supposed to stabilise things now ;-)

>   if (! writep)
>     {
>       base = find_base_term (mem_addr);
>       if (base && (GET_CODE (base) == LABEL_REF
> 		   || (GET_CODE (base) == SYMBOL_REF
> 		       && CONSTANT_POOL_ADDRESS_P (base))))
> 	return 0;
>     }
> 
> 
> with
> 
> (gdb) pr mem_addr
> (plus:SI (reg:SI 11 11)
>     (const_int 96 [0x60]))
>  
> and
>  
> (gdb) pr base
> (symbol_ref/u:SI ("*.LC0") [flags 0x82])
>  
> coming from insn 710, despite 894 in between. Ug.

Yeah that is just Wrong.

> The reason why 894 is not accounted in the base ref computation is because it
> is part of the epilogue sequence, and init_alias_analysis has:
> 
>       /* Walk the insns adding values to the new_reg_base_value array.  */
>       for (i = 0; i < rpo_cnt; i++)
> 	{ ...
> 		  if (could_be_prologue_epilogue
> 		      && prologue_epilogue_contains (insn))
> 		    continue;
> 
> The motivation for this is unclear to me.

Alan linked to the history.  It seems clear that just considering the
prologue is enough to fix the original problem (frame pointer setup),
and problems like yours cannot happen in the prologue.

Better would be not to have this hack at all.

> My rough understanding is that we probably really care about frame_related
> insns only here, at least on targets where the flag is supposed to be set
> accurately.

On targets with DWARF2 unwind info the flag should be set on those insns
that need unwind info.  This does not include all insns in the epilogue
that access the frame, so I don't think this will help you?

> This is the basis of the proposed patch, which essentially disconnects the
> shortcut above for !frame_related insns on targets which need precise alias
> analysis within epilogues, as indicated by a new target hook.

Eww.  Isn't that really all targets that schedule the epilogue at all?

> On the key insn at hand, the frame_related bit was cleared on purpose,
> per:
> 
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html
> 
>   "1) Marking an instruction setting up r11 for use by _save64gpr_* as
>    frame-related results in r11 being set as the cfa for the rest of the
>    function.  That's bad when r11 gets used for other things later.
>   "

And that is correct.

> So, aside from the dependency issue which needs to be fixed somehow, I
> think it would make sense to consider using a strong blockage mecanism in
> expand_epilogue.

It would be very nice if we could directly express "the set of GPR1 should
stay behind any frame accesses", yeah.


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-28  4:48 ` Segher Boessenkool
@ 2016-03-28 11:23   ` Olivier Hainque
  2016-03-28 12:44     ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: Olivier Hainque @ 2016-03-28 11:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Olivier Hainque, GCC Patches


> On Mar 28, 2016, at 05:01 , Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> The normal -m32 compiler here generates code like
> 
> 	lwz 11,0(1)
> 
> and try as I might I cannot get it to fail.  Maybe because the GPR11
> setup here involves a load?

You need to have had r11 last used to designate a global
symbol as part of the function body (in order to have base_term
designate a symbol_ref etc), and then have the scheduler
decide that moving across is a good idea. It's certainly not
an easy combination to trigger.

>> We have observed this with a gcc 4.7 back-end and weren't able to reproduce
>> with a more recent version.
> 
> This makes it not a regression and thus out of scope for GCC 6.  We're
> supposed to stabilise things now ;-)

Sure, no problem if this is only for gcc 7. I posted the message
while the details were still fresh in my mind.

>> 		  if (could_be_prologue_epilogue
>> 		      && prologue_epilogue_contains (insn))
>> 		    continue;
>> 
>> The motivation for this is unclear to me.
> 
> Alan linked to the history.

Right


>  It seems clear that just considering the
> prologue is enough to fix the original problem (frame pointer setup),
> and problems like yours cannot happen in the prologue.

Right. What is unclear is if it's correct to consider only
prologues here. ISTM we arrange to produce CFI for epilogues
as well today, at least in some circumstances, and maybe the
issue Richard had with prologues at the time would need to
be addressed for epilogues as well today.

> Better would be not to have this hack at all.

Sure.

>> My rough understanding is that we probably really care about frame_related
>> insns only here, at least on targets where the flag is supposed to be set
>> accurately.
> 
> On targets with DWARF2 unwind info the flag should be set on those insns
> that need unwind info.  This does not include all insns in the epilogue
> that access the frame, so I don't think this will help you?

My idea was that, maybe, the insns we need to see for alias analysis
are precisely those for which the bit should not be set, which happened
to be exactly the case in the problematic situation we hit. But as I said,
I'm not 100% convinced the reasoning is globally correct.

>> This is the basis of the proposed patch, which essentially disconnects the
>> shortcut above for !frame_related insns on targets which need precise alias
>> analysis within epilogues, as indicated by a new target hook.
> 
> Eww.  Isn't that really all targets that schedule the epilogue at all?

Yes. Most of them use stronger barriers which the dependency analyser
knows about, so aren't affected by this. 

That's a possible alternative approach for rs6000.

Thanks for your feedback,

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-28 11:23   ` Olivier Hainque
@ 2016-03-28 12:44     ` Segher Boessenkool
  2016-03-30  9:40       ` Olivier Hainque
  0 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2016-03-28 12:44 UTC (permalink / raw)
  To: Olivier Hainque; +Cc: GCC Patches

On Mon, Mar 28, 2016 at 12:45:03PM +0200, Olivier Hainque wrote:
> > The normal -m32 compiler here generates code like
> > 
> > 	lwz 11,0(1)
> > 
> > and try as I might I cannot get it to fail.  Maybe because the GPR11
> > setup here involves a load?
> 
> You need to have had r11 last used to designate a global
> symbol as part of the function body (in order to have base_term
> designate a symbol_ref etc), and then have the scheduler
> decide that moving across is a good idea. It's certainly not
> an easy combination to trigger.

Yes, I did that (with some asm's).  Like this:

===
void g(int, char *);

int dum;

void f(int x)
{
        char big[200000];
        g(x, big);
        g(x, big);
        register void *p asm("r11") = &dum;
        asm("" : : "r"(p));
}
===

and the end of that becomes

===
        bl g     # 11   *call_nonlocal_sysvsi/1 [length = 4]
        lis 11,dum@ha    # 12   elf_high        [length = 4]
        la 11,dum@l(11)  # 13   elf_low [length = 4]
        lwz 11,0(1)      # 33   *movsi_internal1/3      [length = 4]
        lwz 0,4(11)      # 34   *movsi_internal1/3      [length = 4]
        lwz 31,-4(11)    # 36   *movsi_internal1/3      [length = 4]
        mr 1,11  # 38   *movsi_internal1/1      [length = 4]
        mtlr 0   # 35   *movsi_internal1/9      [length = 4]
        blr      # 39   *return_internal_si     [length = 4]
===

> > It seems clear that just considering the
> > prologue is enough to fix the original problem (frame pointer setup),
> > and problems like yours cannot happen in the prologue.
> 
> Right. What is unclear is if it's correct to consider only
> prologues here. ISTM we arrange to produce CFI for epilogues
> as well today, at least in some circumstances, and maybe the
> issue Richard had with prologues at the time would need to
> be addressed for epilogues as well today.

Correct is to do alias analysis in both the prologue and epilogue.
If we don't, ports have to do all kinds of stack ties etc. to fake it.

Currently we do neither, so doing just one is a step in the right
direction.

> > Better would be not to have this hack at all.
> 
> Sure.
> 
> >> My rough understanding is that we probably really care about frame_related
> >> insns only here, at least on targets where the flag is supposed to be set
> >> accurately.
> > 
> > On targets with DWARF2 unwind info the flag should be set on those insns
> > that need unwind info.  This does not include all insns in the epilogue
> > that access the frame, so I don't think this will help you?
> 
> My idea was that, maybe, the insns we need to see for alias analysis
> are precisely those for which the bit should not be set, which happened
> to be exactly the case in the problematic situation we hit. But as I said,
> I'm not 100% convinced the reasoning is globally correct.

All the register restores do not have the flag set, in most cases.
But they can in others (say, a target that does not optimise the CFI
stuff very well).

> >> This is the basis of the proposed patch, which essentially disconnects the
> >> shortcut above for !frame_related insns on targets which need precise alias
> >> analysis within epilogues, as indicated by a new target hook.
> > 
> > Eww.  Isn't that really all targets that schedule the epilogue at all?
> 
> Yes. Most of them use stronger barriers which the dependency analyser
> knows about, so aren't affected by this. 
> 
> That's a possible alternative approach for rs6000.

A clobber of scratch should work yes.  It's really heavy handed though,
we can move the GPR1 restore quite freely otherwise (in shrink-wrap),
but perhaps allowing scheduling to move it doesn't buy us much at all.

This doesn't solve the problem however that other dependencies in the
epilogue can be messed up in similar ways.


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-24  7:51 ` Alan Modra
  2016-03-24 10:32   ` Olivier Hainque
@ 2016-03-28 19:58   ` Richard Henderson
  2016-04-08  8:25     ` Olivier Hainque
  2016-04-11 10:15     ` Olivier Hainque
  1 sibling, 2 replies; 27+ messages in thread
From: Richard Henderson @ 2016-03-28 19:58 UTC (permalink / raw)
  To: Alan Modra, Olivier Hainque; +Cc: GCC Patches

On 03/23/2016 09:10 PM, Alan Modra wrote:
> On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote:
>> The reason why 894 is not accounted in the base ref computation is because it
>> is part of the epilogue sequence, and init_alias_analysis has:
>>
>>        /* Walk the insns adding values to the new_reg_base_value array.  */
>>        for (i = 0; i < rpo_cnt; i++)
>> 	{ ...
>> 		  if (could_be_prologue_epilogue
>> 		      && prologue_epilogue_contains (insn))
>> 		    continue;
>>
>> The motivation for this is unclear to me.
>
> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html
>
>> My rough understanding is that we probably really care about frame_related
>> insns only here, at least on targets where the flag is supposed to be set
>> accurately.
>
> Also, possibly just prologue insns.  So you might be able to modify
> init_alias_analysis just to ignore the prologue and skip any need for
> yet another hook.
>
> Let's see what rth thinks.  He did say the patch might need to be
> redone.  :)
> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html

I be surprised if this is works as expected without side-effects.  You've now 
exposed the restore of the frame pointer to alias analysis, and it's probably 
not seen as constant anymore.  As you reference, I expect that any patch that 
opens the epilogue to such scrutiny is going to have to special-case the frame 
pointer as well.

That said, as Segher points out later in the thread, one can arrange for hard 
regs within the body to bleed into temporaries used within the epilogue, which 
is bad.  So perhaps this is exactly what's needed longer-term.  More 
investigation is required.

But I expect for stage4, the best solution is to strengthen the stack_tie 
pattern to block all memory.  Early scheduling of the stack frame deallocation 
(a simple logic insn) can't really be that important to performance.


r~

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-28 12:44     ` Segher Boessenkool
@ 2016-03-30  9:40       ` Olivier Hainque
  2016-03-30 15:15         ` Alan Modra
  0 siblings, 1 reply; 27+ messages in thread
From: Olivier Hainque @ 2016-03-30  9:40 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Olivier Hainque, GCC Patches, Alan Modra

Hello Segher,

> On Mar 28, 2016, at 13:18 , Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
>> You need to have had r11 last used to designate a global
>> symbol as part of the function body (in order to have base_term
>> designate a symbol_ref etc), and then have the scheduler
>> decide that moving across is a good idea. It's certainly not
>> an easy combination to trigger.
> 
> Yes, I did that (with some asm's).  Like this:
> 
> ===
> void g(int, char *);
> 
> int dum;
> 
> void f(int x)
> {
>        char big[200000];
>        g(x, big);
>        g(x, big);
>        register void *p asm("r11") = &dum;
>        asm("" : : "r"(p));
> }

Ah, I see, thanks. In this instance, the problem doesn't
trigger because CONSTANT_POOL_ADDRESS_P (base) is false in

  base = find_base_term (true_mem_addr);
  if (! writep
      && base
      && (GET_CODE (base) == LABEL_REF
	  || (GET_CODE (base) == SYMBOL_REF
	      && CONSTANT_POOL_ADDRESS_P (base))))
    return 0;

  (part of write_dependence_p)

With a minor variation:

void g(int, char *);

void f(int x)
{
       char big[200000];
 start:
       g(x, big);
       g(x, big);
       register void *p asm("r11") = &&start;
       asm("" : : "r"(p));
       asm("" : : :"r28");
       asm("" : : :"r29");
       asm("" : : :"r30");
}

I'm getting:

        lis 11,.L2@ha
        la 11,.L2@l(11)

        lwz 11,0(1)
        lwz 0,4(11)
        lwz 28,-16(11) 

	mr 1,11

	mtlr 0
	lwz 29,-12(11)
	lwz 30,-8(11)
	lwz 31,-4(11)

	blr

out of a powerpc-elf close-to-tunk compiler, despite the
presence of a stack_tie insn at the rtl level.

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-30  9:40       ` Olivier Hainque
@ 2016-03-30 15:15         ` Alan Modra
  0 siblings, 0 replies; 27+ messages in thread
From: Alan Modra @ 2016-03-30 15:15 UTC (permalink / raw)
  To: Olivier Hainque; +Cc: Segher Boessenkool, GCC Patches

On Wed, Mar 30, 2016 at 11:02:41AM +0200, Olivier Hainque wrote:
> void g(int, char *);
> 
> void f(int x)
> {
>        char big[200000];
>  start:
>        g(x, big);
>        g(x, big);
>        register void *p asm("r11") = &&start;
>        asm("" : : "r"(p));
>        asm("" : : :"r28");
>        asm("" : : :"r29");
>        asm("" : : :"r30");
> }
> 
> I'm getting:
> 
>         lis 11,.L2@ha
>         la 11,.L2@l(11)
> 
>         lwz 11,0(1)
>         lwz 0,4(11)
>         lwz 28,-16(11) 
> 
> 	mr 1,11
> 
> 	mtlr 0
> 	lwz 29,-12(11)
> 	lwz 30,-8(11)
> 	lwz 31,-4(11)
> 
> 	blr

BTW, the exact sequence you get depends on -mcpu (not surprising), but
yes, I see register restores after the "mr 1,11" too.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-28 19:58   ` Richard Henderson
@ 2016-04-08  8:25     ` Olivier Hainque
  2016-04-08 15:37       ` Segher Boessenkool
  2016-04-11 10:15     ` Olivier Hainque
  1 sibling, 1 reply; 27+ messages in thread
From: Olivier Hainque @ 2016-04-08  8:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Olivier Hainque, Alan Modra, GCC Patches

Hello Richard & Alan,

Thanks for your feedback. Back on it after a few extra
experiments.

> On Mar 28, 2016, at 19:41 , Richard Henderson <rth@redhat.com> wrote:
> 
>> Let's see what rth thinks.  He did say the patch might need to be
>> redone.  :)
>> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html
> 
> I be surprised if this is works as expected without side-effects.  You've now exposed the restore of the frame pointer to alias analysis, and it's probably not seen as constant anymore.  As you reference, I expect that any patch that opens the epilogue to such scrutiny is going to have to special-case the frame pointer as well.

> That said, as Segher points out later in the thread, one can arrange for hard regs within the body to bleed into temporaries used within the epilogue, which is bad.  So perhaps this is exactly what's needed longer-term.  More investigation is required.

I'd be happy to help, as much as I can.

Can you please expand a bit on the kind of side-effects you'd expect
and hint at useful directions you believe the investigation should
take ? I have some ideas and would like to make sure they're in
line with what you think would be most relevant :)

> But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory.  Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance.

My feeling as well. At least, it can't be important enough to warrant
a sustained exposure to the kind of bug we're discussing here.

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-08  8:25     ` Olivier Hainque
@ 2016-04-08 15:37       ` Segher Boessenkool
  2016-04-08 16:01         ` Olivier Hainque
  0 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2016-04-08 15:37 UTC (permalink / raw)
  To: Olivier Hainque; +Cc: Richard Henderson, Alan Modra, GCC Patches

On Fri, Apr 08, 2016 at 10:24:50AM +0200, Olivier Hainque wrote:
> > But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory.  Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance.
> 
> My feeling as well. At least, it can't be important enough to warrant
> a sustained exposure to the kind of bug we're discussing here.

Is it a regression?  Changing this in stage 4, and this late in stage 4,
is super invasive.  Wrt performance, well, I'd like to see numbers :-/


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-08 15:37       ` Segher Boessenkool
@ 2016-04-08 16:01         ` Olivier Hainque
  0 siblings, 0 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-04-08 16:01 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Olivier Hainque, Richard Henderson, Alan Modra, GCC Patches


> On Apr 8, 2016, at 17:37 , Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Apr 08, 2016 at 10:24:50AM +0200, Olivier Hainque wrote:
>>> But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory.  Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance.
>> 
>> My feeling as well. At least, it can't be important enough to warrant
>> a sustained exposure to the kind of bug we're discussing here.
> 
> Is it a regression?  Changing this in stage 4, and this late in stage 4,
> is super invasive.  Wrt performance, well, I'd like to see numbers :-/

Sorry, my comment was ambiguous: "My feeling as well" etc was in reaction
to Richard's second sentence. I didn't mean to comment on what we want to
do or not for stage4.

Regarding perfs, I agree that having numbers would be good.

Nevertheless, I'd rather see some perf drop than just hoping for this not
to show up, in any case, and I remain convinced that whatever we gain seems
unlikely to be worth the risk of hitting this bug.

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-28 19:58   ` Richard Henderson
  2016-04-08  8:25     ` Olivier Hainque
@ 2016-04-11 10:15     ` Olivier Hainque
  2016-04-14 15:47       ` Andreas Krebbel
  2016-04-14 22:42       ` Segher Boessenkool
  1 sibling, 2 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-04-11 10:15 UTC (permalink / raw)
  To: Alan Modra
  Cc: Olivier Hainque, GCC Patches, Richard Henderson, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]


> On Mar 28, 2016, at 19:41 , Richard Henderson <rth@redhat.com> wrote:
> 
> But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory.  Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance.

Something like the attached patch, at least for next stage1 until the
more general issue is sorted out ?

Only basic testing at this point. I can do more if considered appropriate.

Thanks for your feedback,

With Kind Regards,

Olivier

	* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a
        (clobber mem:BLK (scratch)) to the set of effects, blocking
	all memory accesses.


[-- Attachment #2: stronger_rs6000_tie.diff --]
[-- Type: application/octet-stream, Size: 909 bytes --]

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 132a202..c5f86d0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -24221,7 +24221,16 @@ rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
 	    && REGNO (fp) == HARD_FRAME_POINTER_REGNUM)))
     regs[i++] = fp;
 
-  p = rtvec_alloc (i);
+  /* Unfortunately, the effect of prologue and epilogue insn is mostly ignored
+     from init_alias_analysis, and this is problematic with respect to our
+     temporary use of regular registers to address the frame (as conveyed by
+     our FP argument).  We have to resort to a non-subtle memory blockage
+     mechanism until this is resolved.  */
+
+  p = rtvec_alloc (i+1);
+  RTVEC_ELT (p, i) = 
+    gen_rtx_CLOBBER (VOIDmode, gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)));
+
   while (--i >= 0)
     {
       rtx mem = gen_frame_mem (BLKmode, regs[i]);

[-- Attachment #3: Type: text/plain, Size: 2 bytes --]




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-11 10:15     ` Olivier Hainque
@ 2016-04-14 15:47       ` Andreas Krebbel
  2016-04-14 16:50         ` Jeff Law
  2016-04-14 22:42       ` Segher Boessenkool
  1 sibling, 1 reply; 27+ messages in thread
From: Andreas Krebbel @ 2016-04-14 15:47 UTC (permalink / raw)
  To: Olivier Hainque, Alan Modra
  Cc: GCC Patches, Richard Henderson, Segher Boessenkool, uweigand

On 04/11/2016 12:15 PM, Olivier Hainque wrote:
> 
>> On Mar 28, 2016, at 19:41 , Richard Henderson <rth@redhat.com> wrote:
>>
>> But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory.  Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance.
> 
> Something like the attached patch, at least for next stage1 until the
> more general issue is sorted out ?
> 
> Only basic testing at this point. I can do more if considered appropriate.
> 
> Thanks for your feedback,
> 
> With Kind Regards,
> 
> Olivier
> 
> 	* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a
>         (clobber mem:BLK (scratch)) to the set of effects, blocking
> 	all memory accesses.

We have the same problem on S/390 and I'm also looking for a way to solve that. I'm not sure about
the (clobber (mem)) approach. Wouldn't it be theoretically possible to move a memory write over a
(clobber (mem))?

The patch I'm currently testing follows a proposal from Joseph Myers in 2011:
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00977.html

+ [(set (mem:BLK (scratch))
+ (unspec:BLK [(match_operand:BLK 0 "memory_operand" "+m")] UNSPEC_TIE))]

What I'm wondering is whether the memory read is actually needed here?! Wouldn't the following have
the same effect?

[(set (mem:BLK (scratch)) (unspec:BLK [(const_int 0)] UNSPEC_TIE))]

To my understanding this should block memory reads and writes from being moved across.
The MEM probably should be set up to be in the frame alias set?

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-14 15:47       ` Andreas Krebbel
@ 2016-04-14 16:50         ` Jeff Law
  2016-04-14 17:10           ` Olivier Hainque
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Law @ 2016-04-14 16:50 UTC (permalink / raw)
  To: Andreas Krebbel, Olivier Hainque, Alan Modra
  Cc: GCC Patches, Richard Henderson, Segher Boessenkool, uweigand

On 04/14/2016 09:47 AM, Andreas Krebbel wrote:
> On 04/11/2016 12:15 PM, Olivier Hainque wrote:
>>
>>> On Mar 28, 2016, at 19:41 , Richard Henderson <rth@redhat.com> wrote:
>>>
>>> But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory.  Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance.
>>
>> Something like the attached patch, at least for next stage1 until the
>> more general issue is sorted out ?
>>
>> Only basic testing at this point. I can do more if considered appropriate.
>>
>> Thanks for your feedback,
>>
>> With Kind Regards,
>>
>> Olivier
>>
>> 	* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Add a
>>          (clobber mem:BLK (scratch)) to the set of effects, blocking
>> 	all memory accesses.
>
> We have the same problem on S/390 and I'm also looking for a way to solve that. I'm not sure about
> the (clobber (mem)) approach. Wouldn't it be theoretically possible to move a memory write over a
> (clobber (mem))?
I thought we had code to handle this case specially, but I can't 
immediately find it in sched-deps.c.

Some ports also use an unspec_volatile to achieve the same effect:

;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory.  This blocks insns from being moved across this point.

(define_insn "blockage"
   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
   ""
   ""
   [(set_attr "length" "0")])

;; Do not schedule instructions accessing memory across this point.

(define_expand "memory_blockage"
   [(set (match_dup 0)
         (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
   ""
{
   operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
   MEM_VOLATILE_P (operands[0]) = 1;
})

(define_insn "*memory_blockage"
   [(set (match_operand:BLK 0)
         (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
   ""
   ""
   [(set_attr "length" "0")])


Jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-14 16:50         ` Jeff Law
@ 2016-04-14 17:10           ` Olivier Hainque
  2016-04-15  4:37             ` Andreas Krebbel
  2016-04-15 16:42             ` Jeff Law
  0 siblings, 2 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-04-14 17:10 UTC (permalink / raw)
  To: Jeff Law
  Cc: Andreas Krebbel, Alan Modra, GCC Patches, Richard Henderson,
	Segher Boessenkool, uweigand


> On 14 Apr 2016, at 18:50, Jeff Law <law@redhat.com> wrote:
> 
> I thought we had code to handle this case specially, but I can't immediately find it in sched-deps.c.

Unless I misunderstood what you were exactly looking for,
the special code is in alias.c. E.g. write_dependence_p:

  /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
     This is used in epilogue deallocation functions.  */
  ...

> Some ports also use an unspec_volatile to achieve the same effect:
> 
> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
> ;; all of memory.  This blocks insns from being moved across this point.
> 
> (define_insn "blockage"
>  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
>  ""
>  ""
>  [(set_attr "length" "0")])

Yes, I pondered this one and thought that a memory barrier
would be less aggressive, while good enough.

> ;; Do not schedule instructions accessing memory across this point.
> 
> (define_expand "memory_blockage"
>  [(set (match_dup 0)
>        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
>  ""
> {
>  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
>  MEM_VOLATILE_P (operands[0]) = 1;
> })
> 
> (define_insn "*memory_blockage"
>  [(set (match_operand:BLK 0)
>        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BLOCKAGE))]
>  ""
>  ""
>  [(set_attr "length" "0")])

Emitting just that alone isn't enough though. The first attempt
I made did that and failed because the whole 

  reg-restore
  reg-restore
  tie (mem blockage only)
  
sequence was allowed to move past the stack pointer reset when
it's performed as a mere register to register move.

So I ended up adding the clobber mem:blk scratch to the existing
tie parallel, which references the stack pointer register as
well so is forbidden to move past it's update.

Olivier


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-11 10:15     ` Olivier Hainque
  2016-04-14 15:47       ` Andreas Krebbel
@ 2016-04-14 22:42       ` Segher Boessenkool
  2016-04-15 15:17         ` Olivier Hainque
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2016-04-14 22:42 UTC (permalink / raw)
  To: Olivier Hainque; +Cc: Alan Modra, GCC Patches, Richard Henderson

On Mon, Apr 11, 2016 at 12:15:10PM +0200, Olivier Hainque wrote:
> 
> > On Mar 28, 2016, at 19:41 , Richard Henderson <rth@redhat.com> wrote:
> > 
> > But I expect for stage4, the best solution is to strengthen the stack_tie pattern to block all memory.  Early scheduling of the stack frame deallocation (a simple logic insn) can't really be that important to performance.
> 
> Something like the attached patch, at least for next stage1 until the
> more general issue is sorted out ?

It's a bit heavy; as a workaround, we may want this clobber in the stack
deallocation in the epilogue, but not in all other places stack_tie is
used.

And for stage 1 we do not really want a workaround, we want to fix the
actual problem ;-)


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-14 17:10           ` Olivier Hainque
@ 2016-04-15  4:37             ` Andreas Krebbel
  2016-04-15  7:43               ` Olivier Hainque
  2016-04-15 16:42             ` Jeff Law
  1 sibling, 1 reply; 27+ messages in thread
From: Andreas Krebbel @ 2016-04-15  4:37 UTC (permalink / raw)
  To: Olivier Hainque, Jeff Law
  Cc: Alan Modra, GCC Patches, Richard Henderson, Segher Boessenkool, uweigand

On 04/14/2016 07:10 PM, Olivier Hainque wrote:
> 
>> On 14 Apr 2016, at 18:50, Jeff Law <law@redhat.com> wrote:
>>
>> I thought we had code to handle this case specially, but I can't immediately find it in sched-deps.c.
> 
> Unless I misunderstood what you were exactly looking for,
> the special code is in alias.c. E.g. write_dependence_p:
> 
>   /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
>      This is used in epilogue deallocation functions.  */
>   ...

Ok thanks. I've verified that the dependencies are also generated when
using this expression in a clobber.

> Emitting just that alone isn't enough though. The first attempt
> I made did that and failed because the whole 
> 
>   reg-restore
>   reg-restore
>   tie (mem blockage only)
>   
> sequence was allowed to move past the stack pointer reset when
> it's performed as a mere register to register move.
> 
> So I ended up adding the clobber mem:blk scratch to the existing
> tie parallel, which references the stack pointer register as
> well so is forbidden to move past it's update.

I've seen the same on S/390 when restoring the stack pointer
from a floating point reg. But I think it is not limited to
reg-reg stack pointer restores.  Potentially this could also
happen with the stack pointer decrement in the prologue which
could also get moved across a mem only barrier.

Bye,

-Andreas-


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-15  4:37             ` Andreas Krebbel
@ 2016-04-15  7:43               ` Olivier Hainque
  0 siblings, 0 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-04-15  7:43 UTC (permalink / raw)
  To: Andreas Krebbel
  Cc: Olivier Hainque, Jeff Law, Alan Modra, GCC Patches,
	Richard Henderson, Segher Boessenkool, uweigand


> On Apr 15, 2016, at 06:37 , Andreas Krebbel <krebbel@linux.vnet.ibm.com> wrote:
>>  /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
>>     This is used in epilogue deallocation functions.  */
>>  ...
> 
> Ok thanks. I've verified that the dependencies are also generated when
> using this expression in a clobber.

Ok, thanks. This is consistent with what I've seen.

>> sequence was allowed to move past the stack pointer reset when
>> it's performed as a mere register to register move.
> 
> I've seen the same on S/390 when restoring the stack pointer
> from a floating point reg. But I think it is not limited to
> reg-reg stack pointer restores.  Potentially this could also
> happen with the stack pointer decrement in the prologue which
> could also get moved across a mem only barrier.

Indeed.

With Kind Regards,

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-14 22:42       ` Segher Boessenkool
@ 2016-04-15 15:17         ` Olivier Hainque
  0 siblings, 0 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-04-15 15:17 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Olivier Hainque, Alan Modra, GCC Patches, Richard Henderson


> On Apr 15, 2016, at 00:42 , Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
>> 
>> Something like the attached patch, at least for next stage1 until the
>> more general issue is sorted out ?
> 
> It's a bit heavy; as a workaround, we may want this clobber in the stack
> deallocation in the epilogue, but not in all other places stack_tie is
> used.

OK, I can add an epilogue_p argument or something like that.

> And for stage 1 we do not really want a workaround, we want to fix the
> actual problem ;-)

Sure. It might take some time though, and we might want
a workaround in the interim. Maintainers' call I suppose :)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-14 17:10           ` Olivier Hainque
  2016-04-15  4:37             ` Andreas Krebbel
@ 2016-04-15 16:42             ` Jeff Law
  2016-04-15 17:05               ` Olivier Hainque
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Law @ 2016-04-15 16:42 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: Andreas Krebbel, Alan Modra, GCC Patches, Richard Henderson,
	Segher Boessenkool, uweigand

On 04/14/2016 11:10 AM, Olivier Hainque wrote:
>
>> On 14 Apr 2016, at 18:50, Jeff Law <law@redhat.com> wrote:
>>
>> I thought we had code to handle this case specially, but I can't immediately find it in sched-deps.c.
>
> Unless I misunderstood what you were exactly looking for,
> the special code is in alias.c. E.g. write_dependence_p:
>
>    /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
>       This is used in epilogue deallocation functions.  */
*That's* the one I was looking for  :-)



>    ...
>
>> Some ports also use an unspec_volatile to achieve the same effect:
>>
>> ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
>> ;; all of memory.  This blocks insns from being moved across this point.
>>
>> (define_insn "blockage"
>>   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
>>   ""
>>   ""
>>   [(set_attr "length" "0")])
>
> Yes, I pondered this one and thought that a memory barrier
> would be less aggressive, while good enough.
But don't you run the risk that the stack could be deallocated before 
the restores are done?  This came up on the PA port a long time ago. 
IIRC the situations was something like this:

We had a frame pointer and all the restores were being done via the 
frame pointer.  So the scheduler moved the stack pointer adjustment up 
past the register restores.  At which point the restores were reading 
from unallocated stack space.

99.99% of the time it didn't cause a problem.  But if we got an 
interrupt in that brief window, boom, the interrupt handler could 
allocate a frame on the current stack, store some data in there which 
clobbered the saved register state.

Jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-15 16:42             ` Jeff Law
@ 2016-04-15 17:05               ` Olivier Hainque
  2016-04-15 17:26                 ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: Olivier Hainque @ 2016-04-15 17:05 UTC (permalink / raw)
  To: Jeff Law
  Cc: Olivier Hainque, Andreas Krebbel, Alan Modra, GCC Patches,
	Richard Henderson, Segher Boessenkool, uweigand


> On Apr 15, 2016, at 18:42 , Jeff Law <law@redhat.com> wrote:

>>   /* (mem:BLK (scratch)) is a special mechanism to conflict with everything.
>>      This is used in epilogue deallocation functions.  */
> *That's* the one I was looking for  :-)

:-)

>> Yes, I pondered this one and thought that a memory barrier
>> would be less aggressive, while good enough.
> But don't you run the risk that the stack could be deallocated before the restores are done?  This came up on the PA port a long time ago. IIRC the situations was something like this:
> 
> We had a frame pointer and all the restores were being done via the frame pointer.  So the scheduler moved the stack pointer adjustment up past the register restores.  At which point the restores were reading from unallocated stack space.
> 
> 99.99% of the time it didn't cause a problem.  But if we got an interrupt in that brief window, boom, the interrupt handler could allocate a frame on the current stack, store some data in there which clobbered the saved register state.

Yes, that's exactly the kind of problem we're fighting and, indeed,
the memory barrier alone isn't enough.

Here, the idea is to add a memory barrier to the set of things that the rs6000
back-end already does, which tries to be better than a complete scheduling
barrier (see rs6000_emit_stack_tie), but turns out to still fail in some corner
cases due to alias analysis intricacies (original problem description at
the very beginning of this now long thread: 
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01337.html)

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-04-15 17:05               ` Olivier Hainque
@ 2016-04-15 17:26                 ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2016-04-15 17:26 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: Jeff Law, Andreas Krebbel, Alan Modra, GCC Patches,
	Richard Henderson, uweigand

On Fri, Apr 15, 2016 at 07:05:25PM +0200, Olivier Hainque wrote:
> > But don't you run the risk that the stack could be deallocated before the restores are done?  This came up on the PA port a long time ago. IIRC the situations was something like this:
> > 
> > We had a frame pointer and all the restores were being done via the frame pointer.  So the scheduler moved the stack pointer adjustment up past the register restores.  At which point the restores were reading from unallocated stack space.
> > 
> > 99.99% of the time it didn't cause a problem.  But if we got an interrupt in that brief window, boom, the interrupt handler could allocate a frame on the current stack, store some data in there which clobbered the saved register state.
> 
> Yes, that's exactly the kind of problem we're fighting and, indeed,
> the memory barrier alone isn't enough.
> 
> Here, the idea is to add a memory barrier to the set of things that the rs6000
> back-end already does, which tries to be better than a complete scheduling
> barrier (see rs6000_emit_stack_tie), but turns out to still fail in some corner
> cases due to alias analysis intricacies (original problem description at
> the very beginning of this now long thread: 
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01337.html)

I think the best thing to do is add the clobber-of-mem-scratch to the
final stack deallocate (as a parallel).  I don't see anything else that
will work reliably.


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-24  8:17 ` Alan Modra
@ 2016-03-24 10:17   ` Olivier Hainque
  0 siblings, 0 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-03-24 10:17 UTC (permalink / raw)
  To: Alan Modra; +Cc: David Edelsohn, GCC Patches


> On 24 Mar 2016, at 05:58, Alan Modra <amodra@gmail.com> wrote:
> 
> On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote:
>> The description and
>> references to prior SPE prologue and epilogue changes do not confirm a
>> wider problem.
> 
> There's a good chance this affects ABI_V4 large stack frames too.
> If restoring regs inline we'll be using r11 as a base, just like SPE
> does with moderate sized stack frames when restoring 64-bit regs.

Exactly. If I'm not mistaken, the set of problematic cases encompasses
everything which uses in the epilogue, as a base, a register which
might have been used last to designate a global object in the function
body. There are such uses of at least r11 not limited to SPE.

Olivier

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
  2016-03-23 17:42 David Edelsohn
@ 2016-03-24  8:17 ` Alan Modra
  2016-03-24 10:17   ` Olivier Hainque
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Modra @ 2016-03-24  8:17 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Olivier Hainque, GCC Patches

On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote:
> The description and
> references to prior SPE prologue and epilogue changes do not confirm a
> wider problem.

There's a good chance this affects ABI_V4 large stack frames too.
If restoring regs inline we'll be using r11 as a base, just like SPE
does with moderate sized stack frames when restoring 64-bit regs.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: rs6000 stack_tie mishap again
@ 2016-03-23 17:42 David Edelsohn
  2016-03-24  8:17 ` Alan Modra
  0 siblings, 1 reply; 27+ messages in thread
From: David Edelsohn @ 2016-03-23 17:42 UTC (permalink / raw)
  To: Olivier Hainque, Alan Modra; +Cc: GCC Patches

First, SPE has not been maintained and little participation from
Freescale.  I would rather deprecate all SPE support.  SPE ABI is
broken by design.

I find the approach very heavy-handed.  If you want to enable the
target hook for SPE *only*, that's fine with me.  The description and
references to prior SPE prologue and epilogue changes do not confirm a
wider problem.

- David

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-04-15 17:26 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-23 16:24 rs6000 stack_tie mishap again Olivier Hainque
2016-03-24  7:51 ` Alan Modra
2016-03-24 10:32   ` Olivier Hainque
2016-03-24 17:00     ` Jeff Law
2016-03-28 19:58   ` Richard Henderson
2016-04-08  8:25     ` Olivier Hainque
2016-04-08 15:37       ` Segher Boessenkool
2016-04-08 16:01         ` Olivier Hainque
2016-04-11 10:15     ` Olivier Hainque
2016-04-14 15:47       ` Andreas Krebbel
2016-04-14 16:50         ` Jeff Law
2016-04-14 17:10           ` Olivier Hainque
2016-04-15  4:37             ` Andreas Krebbel
2016-04-15  7:43               ` Olivier Hainque
2016-04-15 16:42             ` Jeff Law
2016-04-15 17:05               ` Olivier Hainque
2016-04-15 17:26                 ` Segher Boessenkool
2016-04-14 22:42       ` Segher Boessenkool
2016-04-15 15:17         ` Olivier Hainque
2016-03-28  4:48 ` Segher Boessenkool
2016-03-28 11:23   ` Olivier Hainque
2016-03-28 12:44     ` Segher Boessenkool
2016-03-30  9:40       ` Olivier Hainque
2016-03-30 15:15         ` Alan Modra
2016-03-23 17:42 David Edelsohn
2016-03-24  8:17 ` Alan Modra
2016-03-24 10:17   ` Olivier Hainque

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).