public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: rs6000 stack_tie mishap again
@ 2016-03-23 17:42 David Edelsohn
  2016-03-24  8:17 ` Alan Modra
  0 siblings, 1 reply; 27+ messages in thread
From: David Edelsohn @ 2016-03-23 17:42 UTC (permalink / raw)
  To: Olivier Hainque, Alan Modra; +Cc: GCC Patches

First, SPE has not been maintained and little participation from
Freescale.  I would rather deprecate all SPE support.  SPE ABI is
broken by design.

I find the approach very heavy-handed.  If you want to enable the
target hook for SPE *only*, that's fine with me.  The description and
references to prior SPE prologue and epilogue changes do not confirm a
wider problem.

- David

^ permalink raw reply	[flat|nested] 27+ messages in thread
* rs6000 stack_tie mishap again
@ 2016-03-23 16:24 Olivier Hainque
  2016-03-24  7:51 ` Alan Modra
  2016-03-28  4:48 ` Segher Boessenkool
  0 siblings, 2 replies; 27+ messages in thread
From: Olivier Hainque @ 2016-03-23 16:24 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 6405 bytes --]

Hello,

This is a proposal to address what I think has been a long-standing, very
nasty latent code generation problem, which just manifested in-house on a
proprietary testcase.

This is intricate, so long email ...

Visible misbehavior
====================

The visible effect is a powerpc-eabispe compiler (no red-zone) producing an
epilogue sequence like

   addi %r11,%r1,184    # temp pointer within frame

   addi %r1,%r11,104    # release frame

   evldd %r21,16(%r11)  # restore registers
   ...                  # ...
   evldd %r31,96(%r11)  # ...

   blr                  # return

This is causing troubles in situations where interrupt code somehow quicks in
between the frame release and the blr, clobbering the stack slots before the
register restore insn have run.

Triggering conditions
=====================

We are witnessing a situation where the rs6000 stack_tie mechanism is
ineffective, out of a missed dependency between the stack_tie insn and some
register restores.

We have observed this with a gcc 4.7 back-end and weren't able to reproduce
with a more recent version. I believe, however, that the problem is still
latent, just requiring extremely precise conditions to trigger the problematic
scheduling decision.

At the RTL level, with our 4.7 based compiler, the chain of events is as
follows:

Compiling our testcase with -O2 -mcpu=8540 -mfloat-gprs=double, we get up to:

------------ p.adb.216r.split4:

(insn 710 147 712 2 (set (reg/f:SI 11 %r11 [650])
        (high:SI (symbol_ref/u:SI ("*.LC0") [flags 0x82]))) 495 {elf_high}
...

(note 891 627 892 32 NOTE_INSN_EPILOGUE_BEG)
...
... sequence of register restores ...

(insn 894 893 895 32 (set (reg:SI 11 %r11)
        (plus:SI (reg/f:SI 1 %r1)
            (const_int 184 [0xb8]))) 
... 
(insn 907 906 908 32 (set (reg:V2SI 31 31)
        (mem:V2SI (plus:SI (reg:SI 11 11)
                (const_int 96 [0x60])) [0 S8 A64])) 
 
... STACK TIE insn, intended to prevent the scheduler from moving ...
... restores past this point, so they remain issued prior to the  ...
... following insn:
 
(insn 908 907 909 32 (parallel [
            (set (mem/c:BLK (reg/f:SI 1 1) [17 A8])
                (const_int 0 [0]))
            (set (mem/c:BLK (reg:SI 11 11) [17 A8])
                (const_int 0 [0]))
        ]) ../p.adb:84:8 681 {stack_tie}
     (nil))
 
... bumping r1 (stack pointer), releasing the stack frame:
 
(insn/f 909 908 910 32 (set (reg/f:SI 1 1)
        (plus:SI (reg:SI 11 11)
            (const_int 104 [0x68]))) ../p.adb:84:8 78 {*addsi3_internal1}

The tie references a mem with alias set 17 (stack accesses). The restores
use alias set 0, which should conflict anyway, so this looks OK.

When sched2 kicks in, we however fail to establish a dependency between
907 (and other restores relative to r11) and the stack_tie.

At the bottom of the chain, we see write_dependence_p claiming absence
of dependence for 907 from

  if (! writep)
    {
      base = find_base_term (mem_addr);
      if (base && (GET_CODE (base) == LABEL_REF
		   || (GET_CODE (base) == SYMBOL_REF
		       && CONSTANT_POOL_ADDRESS_P (base))))
	return 0;
    }


with

(gdb) pr mem_addr
(plus:SI (reg:SI 11 11)
    (const_int 96 [0x60]))
 
and
 
(gdb) pr base
(symbol_ref/u:SI ("*.LC0") [flags 0x82])
 
coming from insn 710, despite 894 in between. Ug.

Further Analysis & patch proposal
=================================

The reason why 894 is not accounted in the base ref computation is because it
is part of the epilogue sequence, and init_alias_analysis has:

      /* Walk the insns adding values to the new_reg_base_value array.  */
      for (i = 0; i < rpo_cnt; i++)
	{ ...
		  if (could_be_prologue_epilogue
		      && prologue_epilogue_contains (insn))
		    continue;

The motivation for this is unclear to me.

My rough understanding is that we probably really care about frame_related
insns only here, at least on targets where the flag is supposed to be set
accurately.

This is the basis of the proposed patch, which essentially disconnects the
shortcut above for !frame_related insns on targets which need precise alias
analysis within epilogues, as indicated by a new target hook.

On the key insn at hand, the frame_related bit was cleared on purpose,
per:

https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html

  "1) Marking an instruction setting up r11 for use by _save64gpr_* as
   frame-related results in r11 being set as the cfa for the rest of the
   function.  That's bad when r11 gets used for other things later.
  "

I have verified that it cures the problem with our gcc-4.7 based toolchain,
and we have been exercising it in gcc-4.9 based compilers on lots of ppc and
e500 configurations without problems for a few weeks now

I'll be happy to perform further testing if required. This is pretty heavy for
me, however, so I'd rather have a discussion and a preliminary agreement on
the approach first.

This kind of issue isn't new. See for example the threads rooted at
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html
or https://gcc.gnu.org/ml/gcc-patches/2012-04/msg00070.html

The misbehavior it produces is a real pain, with arbitrary hazards very hard
to track down. We have been hit more than once already for a gain which is not
so obvious.

So, aside from the dependency issue which needs to be fixed somehow, I
think it would make sense to consider using a strong blockage mecanism in
expand_epilogue.

Thoughts ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

-----


2016-03-23  Olivier Hainque <hainque@adacore.com>

        * target.def (epilogue_aliases): New target hook.
        * function.c (prologue_epilogue_insns): Return bool and rewrite as
        combination of ...
        (prologue_insns, epilogue_insns): New functions.
        * rtl.h: Add prototypes for the new functions and adjust prototype
        of prologue_epilogue_insns for the return type change.
        * alias.c (init_alias_analysis): Honor targetm.epilogue_aliases.
    
        * config/rs6000/rs6000.c (TARGET_EPILOGUE_ALIASES): Redefine, return
        true.
    
        doc/
        * tm.texi.in (TARGET_EPILOGUE_ALIASES): Add new @hook entry.
        * tm.texi: Regenerate.


[-- Attachment #2: ppc-frame.diff --]
[-- Type: application/octet-stream, Size: 6275 bytes --]

commit 9b3a8400aeb43ad9ce0c753dc8fecac9db304da3
Author: Olivier Hainque <hainque@adacore.com>
Date:   Tue Mar 22 18:37:27 2016 +0100

    initial port from 4.9

diff --git a/gcc/alias.c b/gcc/alias.c
index 753e1af..09a709f 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -3241,9 +3241,12 @@ init_alias_analysis (void)
      insn chain until after reload has completed.  Thus,
      there is no sense wasting time checking if INSN is in
      the prologue/epilogue until after reload has completed.  */
-  bool could_be_prologue_epilogue = ((targetm.have_prologue ()
-				      || targetm.have_epilogue ())
-				     && reload_completed);
+
+  bool could_be_prologue = (targetm.have_prologue ()
+			    && reload_completed);
+
+  bool could_be_epilogue = (targetm.have_epilogue ()
+			    && reload_completed);
 
   pass = 0;
   do
@@ -3283,8 +3286,19 @@ init_alias_analysis (void)
 		{
 		  rtx note, set;
 
-		  if (could_be_prologue_epilogue
-		      && prologue_epilogue_contains (insn))
+		  /* Skip prologue instructions threaded onto the insn chain
+		     after reload, which bring nothing of interest to alias
+		     analysis within the function's body.  */
+		  if (could_be_prologue
+		      && prologue_contains (insn))
+		    continue;
+
+		  /* Likewise for epilogue instructions, except for possible
+		     needs within the epilogue sequence itself.  */
+		  if (could_be_epilogue
+		      && epilogue_contains (insn)
+		      && (!targetm.epilogue_aliases ()
+			  || RTX_FRAME_RELATED_P (insn)))
 		    continue;
 
 		  /* If this insn has a noalias note, process it,  Otherwise,
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fd4b7cc..cf5ca42 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1712,6 +1712,11 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
 #endif
 
+/* Our stack_tie mechanism requires visibility on intermediate
+   base register computations in the epilogue.  */
+#undef TARGET_EPILOGUE_ALIASES
+#define  TARGET_EPILOGUE_ALIASES  hook_bool_void_true
+
 /* Use a 32-bit anchor range.  This leads to sequences like:
 
 	addis	tmp,anchor,high
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 745910f..20c17f6 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11289,6 +11289,16 @@ or when the back end is in a partially-initialized state.
 outside of any function scope.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_EPILOGUE_ALIASES (void)
+This target hook should return @code{true} if the target needs
+alias analysis to look into operations performed by epilogue instructions
+*not* marked @code{RTX_FRAME_RELATED_P}.
+This needs only be redefined by targets which use precise memory references
+with specially assigned alias sets instead of a strong blockage to prevent
+the move of register restores past the stack frame release.
+Default return value is @code{false}.
+@end deftypefn
+
 @defmac TARGET_OBJECT_SUFFIX
 Define this macro to be a C string representing the suffix for object
 files on your target machine.  If you do not define this macro, GCC will
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f31c763..b4b6b1a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8039,6 +8039,8 @@ to by @var{ce_info}.
 
 @hook TARGET_SET_CURRENT_FUNCTION
 
+@hook TARGET_EPILOGUE_ALIASES
+
 @defmac TARGET_OBJECT_SUFFIX
 Define this macro to be a C string representing the suffix for object
 files on your target machine.  If you do not define this macro, GCC will
diff --git a/gcc/function.c b/gcc/function.c
index 1ac8e26..f025f21 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5743,14 +5743,28 @@ contains (const_rtx insn, hash_table<insn_cache_hasher> *hash)
   return hash->find (const_cast<rtx> (insn)) != NULL;
 }
 
-int
+/* Whether INSN is part of the function's prologue sequence.  */
+
+bool
+prologue_contains (const_rtx insn)
+{
+  return contains (insn, prologue_insn_hash);
+}
+
+/* Whether INSN is part of the function's epilogue sequence.  */
+
+bool
+epilogue_contains (const_rtx insn)
+{
+  return contains (insn, epilogue_insn_hash);
+}
+
+/* Whether INSN is part of the function's prologue or epilogue sequence.  */
+
+bool
 prologue_epilogue_contains (const_rtx insn)
 {
-  if (contains (insn, prologue_insn_hash))
-    return 1;
-  if (contains (insn, epilogue_insn_hash))
-    return 1;
-  return 0;
+  return prologue_contains (insn) || epilogue_contains (insn);
 }
 
 /* Insert use of return register before the end of BB.  */
diff --git a/gcc/function.h b/gcc/function.h
index 501ef68..81d9e50 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -628,7 +628,9 @@ extern void clobber_return_register (void);
 extern void expand_function_end (void);
 extern rtx get_arg_pointer_save_area (void);
 extern void maybe_copy_prologue_epilogue_insn (rtx, rtx);
-extern int prologue_epilogue_contains (const_rtx);
+extern bool prologue_contains (const_rtx);
+extern bool epilogue_contains (const_rtx);
+extern bool prologue_epilogue_contains (const_rtx);
 extern void emit_return_into_block (bool simple_p, basic_block bb);
 extern void set_return_jump_label (rtx_insn *);
 extern bool active_insn_between (rtx_insn *head, rtx_insn *tail);
diff --git a/gcc/target.def b/gcc/target.def
index 20f2b32..9276858 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3417,6 +3417,19 @@ move would be greater than that of a library call.",
         enum by_pieces_operation op, bool speed_p),
  default_use_by_pieces_infrastructure_p)
 
+/* Return true if a function must have and use a frame pointer.  */
+DEFHOOK
+(epilogue_aliases,
+ "This target hook should return @code{true} if the target needs\n"
+ "alias analysis to look into operations performed by epilogue instructions\n"
+ "*not* marked @code{RTX_FRAME_RELATED_P}.\n"
+ "This needs only be redefined by targets which use precise memory references\n"
+ "with specially assigned alias sets instead of a strong blockage to prevent\n"
+ "the move of register restores past the stack frame release.\n"
+ "Default return value is @code{false}.",
+ bool, (void),
+ hook_bool_void_false)
+
 DEFHOOK
 (optab_supported_p,
  "Return true if the optimizers should use optab @var{op} with\n\

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-04-15 17:26 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-23 17:42 rs6000 stack_tie mishap again David Edelsohn
2016-03-24  8:17 ` Alan Modra
2016-03-24 10:17   ` Olivier Hainque
  -- strict thread matches above, loose matches on Subject: below --
2016-03-23 16:24 Olivier Hainque
2016-03-24  7:51 ` Alan Modra
2016-03-24 10:32   ` Olivier Hainque
2016-03-24 17:00     ` Jeff Law
2016-03-28 19:58   ` Richard Henderson
2016-04-08  8:25     ` Olivier Hainque
2016-04-08 15:37       ` Segher Boessenkool
2016-04-08 16:01         ` Olivier Hainque
2016-04-11 10:15     ` Olivier Hainque
2016-04-14 15:47       ` Andreas Krebbel
2016-04-14 16:50         ` Jeff Law
2016-04-14 17:10           ` Olivier Hainque
2016-04-15  4:37             ` Andreas Krebbel
2016-04-15  7:43               ` Olivier Hainque
2016-04-15 16:42             ` Jeff Law
2016-04-15 17:05               ` Olivier Hainque
2016-04-15 17:26                 ` Segher Boessenkool
2016-04-14 22:42       ` Segher Boessenkool
2016-04-15 15:17         ` Olivier Hainque
2016-03-28  4:48 ` Segher Boessenkool
2016-03-28 11:23   ` Olivier Hainque
2016-03-28 12:44     ` Segher Boessenkool
2016-03-30  9:40       ` Olivier Hainque
2016-03-30 15:15         ` Alan Modra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).