public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] [i386] Recompute the frame layout less often
@ 2017-05-14  9:11 Bernd Edlinger
  2017-05-14  9:58 ` Daniel Santos
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-14  9:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 452 bytes --]

Hi,


this patch uses the new TARGET_COMPUTE_FRAME_LAYOUT hook in the i386
backend to avoid re-computing the frame layout when not really
necessary.

It simplifies the logic in ix86_compute_frame_layout by removing
the use_fast_prologue_epilogue_nregs, which is no longer necessary,
because the frame layout can no longer change spontaneously.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-frame-layout-i386.txt --]
[-- Type: text/plain, Size: 804 bytes --]

2017-05-14  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/i386/i386.c (ix86_can_use_return_insn_p): Use the ix86_frame
	data structure directly.
	(ix86_initial_elimination_offset): Likewise.
	(ix86_expand_prologue): Likewise.
	(ix86_expand_epilogue): Likewise.
	(ix86_expand_split_stack_prologue): Likewise.
	(ix86_compute_frame_layout): Remove frame parameter ...
	(TARGET_COMPUTE_FRAME_LAYOUT): ... and export it as a target hook.
	(ix86_finalize_stack_realign_flags): Call ix86_compute_frame_layout
	only if necessary.
	(ix86_init_machine_status): Don't set use_fast_prologue_epilogue_nregs.
	(ix86_frame): Move from here ...
	* config/i386/i386.h (ix86_frame): ... to here.
	(machine_function): Remove use_fast_prologue_epilogue_nregs, cache the
	complete ix86_frame data structure instead.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-frame-layout-i386.diff --]
[-- Type: text/x-patch; name="patch-frame-layout-i386.diff", Size: 9223 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 248005)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2441,53 +2441,6 @@ struct GTY(()) stack_local_entry {
   struct stack_local_entry *next;
 };
 
-/* Structure describing stack frame layout.
-   Stack grows downward:
-
-   [arguments]
-					<- ARG_POINTER
-   saved pc
-
-   saved static chain			if ix86_static_chain_on_stack
-
-   saved frame pointer			if frame_pointer_needed
-					<- HARD_FRAME_POINTER
-   [saved regs]
-					<- regs_save_offset
-   [padding0]
-
-   [saved SSE regs]
-					<- sse_regs_save_offset
-   [padding1]          |
-		       |		<- FRAME_POINTER
-   [va_arg registers]  |
-		       |
-   [frame]	       |
-		       |
-   [padding2]	       | = to_allocate
-					<- STACK_POINTER
-  */
-struct ix86_frame
-{
-  int nsseregs;
-  int nregs;
-  int va_arg_size;
-  int red_zone_size;
-  int outgoing_arguments_size;
-
-  /* The offsets relative to ARG_POINTER.  */
-  HOST_WIDE_INT frame_pointer_offset;
-  HOST_WIDE_INT hard_frame_pointer_offset;
-  HOST_WIDE_INT stack_pointer_offset;
-  HOST_WIDE_INT hfp_save_offset;
-  HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT sse_reg_save_offset;
-
-  /* When save_regs_using_mov is set, emit prologue using
-     move instead of push instructions.  */
-  bool save_regs_using_mov;
-};
-
 /* Which cpu are we scheduling for.  */
 enum attr_cpu ix86_schedule;
 
@@ -2579,7 +2532,7 @@ static unsigned int ix86_function_arg_boundary (ma
 						const_tree);
 static rtx ix86_static_chain (const_tree, bool);
 static int ix86_function_regparm (const_tree, const_tree);
-static void ix86_compute_frame_layout (struct ix86_frame *);
+static void ix86_compute_frame_layout (void);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
 static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
@@ -12007,7 +11960,7 @@ ix86_can_use_return_insn_p (void)
   if (crtl->args.pops_args && crtl->args.size >= 32768)
     return 0;
 
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   return (frame.stack_pointer_offset == UNITS_PER_WORD
 	  && (frame.nregs + frame.nsseregs) == 0);
 }
@@ -12493,8 +12446,7 @@ ix86_can_eliminate (const int from, const int to)
 HOST_WIDE_INT
 ix86_initial_elimination_offset (int from, int to)
 {
-  struct ix86_frame frame;
-  ix86_compute_frame_layout (&frame);
+  struct ix86_frame frame = cfun->machine->frame;
 
   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
     return frame.hard_frame_pointer_offset;
@@ -12533,8 +12485,9 @@ ix86_builtin_setjmp_frame_value (void)
 /* Fill structure ix86_frame about frame of currently computed function.  */
 
 static void
-ix86_compute_frame_layout (struct ix86_frame *frame)
+ix86_compute_frame_layout (void)
 {
+  struct ix86_frame *frame = &cfun->machine->frame;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
   unsigned HOST_WIDE_INT preferred_alignment;
@@ -12570,19 +12523,11 @@ static void
      in doing anything except PUSHs.  */
   if (TARGET_SEH)
     cfun->machine->use_fast_prologue_epilogue = false;
-
-  /* During reload iteration the amount of registers saved can change.
-     Recompute the value as needed.  Do not recompute when amount of registers
-     didn't change as reload does multiple calls to the function and does not
-     expect the decision to change within single iteration.  */
-  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-           && cfun->machine->use_fast_prologue_epilogue_nregs != frame->nregs)
+  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
     {
       int count = frame->nregs;
       struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-      cfun->machine->use_fast_prologue_epilogue_nregs = count;
-
       /* The fast prologue uses move instead of push to save registers.  This
          is significantly longer, but also executes faster as modern hardware
          can execute the moves in parallel, but can't do that for push/pop.
@@ -13701,6 +13646,7 @@ ix86_finalize_stack_realign_flags (void)
        < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
 	  ? crtl->max_used_stack_slot_alignment
 	  : crtl->stack_alignment_needed));
+  bool recompute_frame_layout_p = false;
 
   if (crtl->stack_realign_finalized)
     {
@@ -13750,8 +13696,12 @@ ix86_finalize_stack_realign_flags (void)
 		&& requires_stack_frame_p (insn, prologue_used,
 					   set_up_by_prologue))
 	      {
+		if (crtl->stack_realign_needed != stack_realign)
+		  recompute_frame_layout_p = true;
 		crtl->stack_realign_needed = stack_realign;
 		crtl->stack_realign_finalized = true;
+		if (recompute_frame_layout_p)
+		  ix86_compute_frame_layout ();
 		return;
 	      }
 	}
@@ -13782,10 +13732,15 @@ ix86_finalize_stack_realign_flags (void)
       df_scan_blocks ();
       df_compute_regs_ever_live (true);
       df_analyze ();
+      recompute_frame_layout_p = true;
     }
 
+  if (crtl->stack_realign_needed != stack_realign)
+    recompute_frame_layout_p = true;
   crtl->stack_realign_needed = stack_realign;
   crtl->stack_realign_finalized = true;
+  if (recompute_frame_layout_p)
+    ix86_compute_frame_layout ();
 }
 
 /* Delete SET_GOT right after entry block if it is allocated to reg.  */
@@ -13841,7 +13796,7 @@ ix86_expand_prologue (void)
   m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET;
   m->fs.sp_valid = true;
 
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
     {
@@ -14509,7 +14464,7 @@ ix86_expand_epilogue (int style)
   bool using_drap;
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   m->fs.sp_valid = (!frame_pointer_needed
 		    || (crtl->sp_is_unchanging
@@ -15019,7 +14974,7 @@ ix86_expand_split_stack_prologue (void)
   gcc_assert (flag_split_stack && reload_completed);
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET;
 
   /* This is the label we will branch to if we have enough stack
@@ -28715,7 +28670,6 @@ ix86_init_machine_status (void)
   struct machine_function *f;
 
   f = ggc_cleared_alloc<machine_function> ();
-  f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
 
   return f;
@@ -52040,6 +51994,9 @@ ix86_run_selftests (void)
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ix86_legitimate_constant_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT ix86_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED ix86_frame_pointer_required
 
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 248005)
+++ gcc/config/i386/i386.h	(working copy)
@@ -2449,6 +2449,53 @@ enum avx_u128_state
 /* Machine specific frame tracking during prologue/epilogue generation.  */
 
 #ifndef USED_FOR_TARGET
+/* Structure describing stack frame layout.
+   Stack grows downward:
+
+   [arguments]
+					<- ARG_POINTER
+   saved pc
+
+   saved static chain			if ix86_static_chain_on_stack
+
+   saved frame pointer			if frame_pointer_needed
+					<- HARD_FRAME_POINTER
+   [saved regs]
+					<- regs_save_offset
+   [padding0]
+
+   [saved SSE regs]
+					<- sse_regs_save_offset
+   [padding1]          |
+		       |		<- FRAME_POINTER
+   [va_arg registers]  |
+		       |
+   [frame]	       |
+		       |
+   [padding2]	       | = to_allocate
+					<- STACK_POINTER
+  */
+struct GTY(()) ix86_frame
+{
+  int nsseregs;
+  int nregs;
+  int va_arg_size;
+  int red_zone_size;
+  int outgoing_arguments_size;
+
+  /* The offsets relative to ARG_POINTER.  */
+  HOST_WIDE_INT frame_pointer_offset;
+  HOST_WIDE_INT hard_frame_pointer_offset;
+  HOST_WIDE_INT stack_pointer_offset;
+  HOST_WIDE_INT hfp_save_offset;
+  HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT sse_reg_save_offset;
+
+  /* When save_regs_using_mov is set, emit prologue using
+     move instead of push instructions.  */
+  bool save_regs_using_mov;
+};
+
 struct GTY(()) machine_frame_state
 {
   /* This pair tracks the currently active CFA as reg+offset.  When reg
@@ -2507,9 +2554,8 @@ struct GTY(()) machine_function {
   int varargs_fpr_size;
   int optimize_mode_switching[MAX_386_ENTITIES];
 
-  /* Number of saved registers USE_FAST_PROLOGUE_EPILOGUE
-     has been computed for.  */
-  int use_fast_prologue_epilogue_nregs;
+  /* Cached initial frame layout for the current function.  */
+  struct ix86_frame frame;
 
   /* For -fsplit-stack support: A stack local which holds a pointer to
      the stack arguments for a function with a variable number of

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-14  9:11 [PATCH] [i386] Recompute the frame layout less often Bernd Edlinger
@ 2017-05-14  9:58 ` Daniel Santos
  2017-05-14 10:27   ` Uros Bizjak
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Santos @ 2017-05-14  9:58 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches; +Cc: Uros Bizjak

On 05/14/2017 02:42 AM, Bernd Edlinger wrote:
> Hi,
>
>
> this patch uses the new TARGET_COMPUTE_FRAME_LAYOUT hook in the i386
> backend to avoid re-computing the frame layout when not really
> necessary.
>
> It simplifies the logic in ix86_compute_frame_layout by removing
> the use_fast_prologue_epilogue_nregs, which is no longer necessary,
> because the frame layout can no longer change spontaneously.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.

I think Uros is about to commit my improvements to ms to sysv abi calls, 
which is a large change and will conflict with your patch. I've added 
several new fields to struct ix86_frame that will need to be merged (and 
moved to i386.h).  I believe that my only explicit check of 
crtl->stack_realign_finalized is during pro/epilogue expand, and not in 
ix86_compute_frame_layout.  A former incarnation of my patches needed 
ix86_compute_frame_layout to be called *after* it was set, but I believe 
that is no longer the case, and so shouldn't conflict, but retesting 
should certainly be done.

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01338.html

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-14  9:58 ` Daniel Santos
@ 2017-05-14 10:27   ` Uros Bizjak
  2017-05-14 18:11     ` Bernd Edlinger
  0 siblings, 1 reply; 29+ messages in thread
From: Uros Bizjak @ 2017-05-14 10:27 UTC (permalink / raw)
  To: Daniel Santos; +Cc: Bernd Edlinger, gcc-patches

On Sun, May 14, 2017 at 11:16 AM, Daniel Santos <daniel.santos@pobox.com> wrote:
> On 05/14/2017 02:42 AM, Bernd Edlinger wrote:
>>
>> Hi,
>>
>>
>> this patch uses the new TARGET_COMPUTE_FRAME_LAYOUT hook in the i386
>> backend to avoid re-computing the frame layout when not really
>> necessary.
>>
>> It simplifies the logic in ix86_compute_frame_layout by removing
>> the use_fast_prologue_epilogue_nregs, which is no longer necessary,
>> because the frame layout can no longer change spontaneously.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>
>
> I think Uros is about to commit my improvements to ms to sysv abi calls,
> which is a large change and will conflict with your patch. I've added
> several new fields to struct ix86_frame that will need to be merged (and
> moved to i386.h).  I believe that my only explicit check of
> crtl->stack_realign_finalized is during pro/epilogue expand, and not in
> ix86_compute_frame_layout.  A former incarnation of my patches needed
> ix86_compute_frame_layout to be called *after* it was set, but I believe
> that is no longer the case, and so shouldn't conflict, but retesting should
> certainly be done.

Yes, the mcall-ms2sysv-xlogues patch was committed to mainline, please
re-test and re-send the patch.

Uros.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-14 10:27   ` Uros Bizjak
@ 2017-05-14 18:11     ` Bernd Edlinger
  2017-05-14 20:37       ` Daniel Santos
  2017-05-15  2:23       ` Daniel Santos
  0 siblings, 2 replies; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-14 18:11 UTC (permalink / raw)
  To: Uros Bizjak, Daniel Santos; +Cc: gcc-patches

Hi Daniel,

there is one thing I don't understand in your patch:
That is, it introduces a static value:

/* Registers who's save & restore will be managed by stubs called from
    pro/epilogue.  */
static HARD_REG_SET GTY(()) stub_managed_regs;

This seems to be set as a side effect of ix86_compute_frame_layout,
and depends on the register usage of the current function.
But values that depend on the current function need usually be
attached to cfun->machine, because the passes can run in parallel
unless I am completely mistaken, and the stub_managed_regs may
therefore be computed from a different function.


Bernd.

On 05/14/17 12:25, Uros Bizjak wrote:
> On Sun, May 14, 2017 at 11:16 AM, Daniel Santos <daniel.santos@pobox.com> wrote:
>> On 05/14/2017 02:42 AM, Bernd Edlinger wrote:
>>>
>>> Hi,
>>>
>>>
>>> this patch uses the new TARGET_COMPUTE_FRAME_LAYOUT hook in the i386
>>> backend to avoid re-computing the frame layout when not really
>>> necessary.
>>>
>>> It simplifies the logic in ix86_compute_frame_layout by removing
>>> the use_fast_prologue_epilogue_nregs, which is no longer necessary,
>>> because the frame layout can no longer change spontaneously.
>>>
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.
>>
>>
>> I think Uros is about to commit my improvements to ms to sysv abi calls,
>> which is a large change and will conflict with your patch. I've added
>> several new fields to struct ix86_frame that will need to be merged (and
>> moved to i386.h).  I believe that my only explicit check of
>> crtl->stack_realign_finalized is during pro/epilogue expand, and not in
>> ix86_compute_frame_layout.  A former incarnation of my patches needed
>> ix86_compute_frame_layout to be called *after* it was set, but I believe
>> that is no longer the case, and so shouldn't conflict, but retesting should
>> certainly be done.
>
> Yes, the mcall-ms2sysv-xlogues patch was committed to mainline, please
> re-test and re-send the patch.
>
> Uros.
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-14 18:11     ` Bernd Edlinger
@ 2017-05-14 20:37       ` Daniel Santos
  2017-05-15  2:23       ` Daniel Santos
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-14 20:37 UTC (permalink / raw)
  To: Bernd Edlinger, Uros Bizjak; +Cc: gcc-patches

On 05/14/2017 11:31 AM, Bernd Edlinger wrote:
> Hi Daniel,
>
> there is one thing I don't understand in your patch:
> That is, it introduces a static value:
>
> /* Registers who's save & restore will be managed by stubs called from
>      pro/epilogue.  */
> static HARD_REG_SET GTY(()) stub_managed_regs;
>
> This seems to be set as a side effect of ix86_compute_frame_layout,
> and depends on the register usage of the current function.
> But values that depend on the current function need usually be
> attached to cfun->machine, because the passes can run in parallel
> unless I am completely mistaken, and the stub_managed_regs may
> therefore be computed from a different function.
>
>
> Bernd.

I'm relatively new to GCC and still learning.  However, there are quite 
a lot of static TU variables in i386.c like this.  I am not aware of gcc 
having parallelism support, but if it were to be added then all of these 
TU variables should probably be moved to some class or struct (like 
cfun->machine) to reduce the number of TLS lookups required (which I 
presume is a little more expensive than a this/offset calculation).  
Having this (as well as other variables) in such a struct is better 
design IMO, but as I said, I'm still learning GCC's architecture, idioms 
and patterns.  (I should add that I don't really understand the GTY 
memory management either. :)

To be clear on class xlogue_layout, the only instances of this class are 
const and could be shared across multiple threads.  It is dependent upon 
the cfun->machine as well as the global struct rtl_data crtl, but is not 
so entangled that were these proper C++ classes (with private data) that 
it would need to be a friend -- it only needs read-access to their data 
members.

To be honest, it's a strange feeling programming in a mixture of C and 
C++ idioms, but I know it was only recently converted to C++ so I think 
it's better to try to use only one or the other in a given function.  
But if I were going to do this all OO, then ix86_compute_frame_layout 
would be a member function of ix86_frame (which would be a 
specialization of some generic "frame" class), machine_function would be 
class ix86_machine_function with it's own compute_frame_layout that 
called ix86_frame::compute_frame_layout, etc.  If I really wanted to go 
nuts, I would consider making class function, et.al. template classes 
with machine_function and machine_function_state part of the object 
instead of pointers to separate objects to reduce accesses down to a 
single this/offset, but now I I'm *really* digressing...

Please free to move it.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-14 18:11     ` Bernd Edlinger
  2017-05-14 20:37       ` Daniel Santos
@ 2017-05-15  2:23       ` Daniel Santos
  2017-05-15 20:41         ` Bernd Edlinger
  2017-05-17 18:41         ` Bernd Edlinger
  1 sibling, 2 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-15  2:23 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches

On 05/14/2017 11:31 AM, Bernd Edlinger wrote:
> Hi Daniel,
>
> there is one thing I don't understand in your patch:
> That is, it introduces a static value:
>
> /* Registers who's save & restore will be managed by stubs called from
>      pro/epilogue.  */
> static HARD_REG_SET GTY(()) stub_managed_regs;
>
> This seems to be set as a side effect of ix86_compute_frame_layout,
> and depends on the register usage of the current function.
> But values that depend on the current function need usually be
> attached to cfun->machine, because the passes can run in parallel
> unless I am completely mistaken, and the stub_managed_regs may
> therefore be computed from a different function.
>
>
> Bernd.

I should add that if you want to run faster tests just on the ms to sysv 
abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if 
that succeeds run the full testsuite.

Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-15  2:23       ` Daniel Santos
@ 2017-05-15 20:41         ` Bernd Edlinger
  2017-05-15 23:52           ` Daniel Santos
  2017-05-17 18:41         ` Bernd Edlinger
  1 sibling, 1 reply; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-15 20:41 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches

On 05/15/17 03:39, Daniel Santos wrote:
> On 05/14/2017 11:31 AM, Bernd Edlinger wrote:
>> Hi Daniel,
>>
>> there is one thing I don't understand in your patch:
>> That is, it introduces a static value:
>>
>> /* Registers who's save & restore will be managed by stubs called from
>>      pro/epilogue.  */
>> static HARD_REG_SET GTY(()) stub_managed_regs;
>>
>> This seems to be set as a side effect of ix86_compute_frame_layout,
>> and depends on the register usage of the current function.
>> But values that depend on the current function need usually be
>> attached to cfun->machine, because the passes can run in parallel
>> unless I am completely mistaken, and the stub_managed_regs may
>> therefore be computed from a different function.
>>
>>
>> Bernd.
>
> I should add that if you want to run faster tests just on the ms to sysv
> abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if
> that succeeds run the full testsuite.
>
> Daniel

Unfortunately I encounter a serious problem when my patch is used
ontop of your patch, Yes, the test suite ran without error, but then
I tried to trigger the warning and that tripped an ICE.
The reason is that cfun->machine->call_ms2sysv can be set to true
*after* reload_completed, which can be seen using the following
patch:

Index: i386.c
===================================================================
--- i386.c	(revision 248031)
+++ i386.c	(working copy)
@@ -29320,7 +29320,10 @@

        /* Set here, but it may get cleared later.  */
        if (TARGET_CALL_MS2SYSV_XLOGUES)
+      {
+	gcc_assert(!reload_completed);
  	cfun->machine->call_ms2sysv = true;
+      }
      }

    if (vec_len > 1)


That assertion is triggered in this test case:

cat test.c
int test()
{
   __builtin_printf("test\n");
   return 0;
}

gcc -mabi=ms -mcall-ms2sysv-xlogues -fsplit-stack -c test.c
test.c: In function 'test':
test.c:5:1: internal compiler error: in ix86_expand_call, at 
config/i386/i386.c:29324
  }
  ^
0x13390a4 ix86_expand_call(rtx_def*, rtx_def*, rtx_def*, rtx_def*, 
rtx_def*, bool)
	../../gcc-trunk/gcc/config/i386/i386.c:29324
0x1317494 ix86_expand_split_stack_prologue()
	../../gcc-trunk/gcc/config/i386/i386.c:15920
0x162ba21 gen_split_stack_prologue()
	../../gcc-trunk/gcc/config/i386/i386.md:12556
0x12f3f30 target_gen_split_stack_prologue
	../../gcc-trunk/gcc/config/i386/i386.md:12325
0xb237b3 make_split_prologue_seq
	../../gcc-trunk/gcc/function.c:5822
0xb23a08 thread_prologue_and_epilogue_insns()
	../../gcc-trunk/gcc/function.c:5958
0xb24840 rest_of_handle_thread_prologue_and_epilogue
	../../gcc-trunk/gcc/function.c:6428
0xb248c0 execute
	../../gcc-trunk/gcc/function.c:6470
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


so, in ix86_expand_split_stack_prologue
we first call:
   ix86_finalize_stack_realign_flags ();
   ix86_compute_frame_layout (&frame);

and later:
   call_insn = ix86_expand_call (NULL_RTX, gen_rtx_MEM (QImode, fn),
                                 GEN_INT (UNITS_PER_WORD), constm1_rtx,
                                 pop, false);

which changes a flag with a huge impact on the frame layout, but there
is no absolutely no way how the frame layout can change once it is
finalized.


Any Thoughts?


Bernd.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-15 20:41         ` Bernd Edlinger
@ 2017-05-15 23:52           ` Daniel Santos
  2017-05-16  5:42             ` Daniel Santos
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Santos @ 2017-05-15 23:52 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches

On 05/15/2017 03:39 PM, Bernd Edlinger wrote:
> On 05/15/17 03:39, Daniel Santos wrote:
>> On 05/14/2017 11:31 AM, Bernd Edlinger wrote:
>>> Hi Daniel,
>>>
>>> there is one thing I don't understand in your patch:
>>> That is, it introduces a static value:
>>>
>>> /* Registers who's save & restore will be managed by stubs called from
>>>       pro/epilogue.  */
>>> static HARD_REG_SET GTY(()) stub_managed_regs;
>>>
>>> This seems to be set as a side effect of ix86_compute_frame_layout,
>>> and depends on the register usage of the current function.
>>> But values that depend on the current function need usually be
>>> attached to cfun->machine, because the passes can run in parallel
>>> unless I am completely mistaken, and the stub_managed_regs may
>>> therefore be computed from a different function.
>>>
>>>
>>> Bernd.
>> I should add that if you want to run faster tests just on the ms to sysv
>> abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if
>> that succeeds run the full testsuite.
>>
>> Daniel
> Unfortunately I encounter a serious problem when my patch is used
> ontop of your patch, Yes, the test suite ran without error, but then
> I tried to trigger the warning and that tripped an ICE.
> The reason is that cfun->machine->call_ms2sysv can be set to true
> *after* reload_completed, which can be seen using the following
> patch:
>
> Index: i386.c
> ===================================================================
> --- i386.c	(revision 248031)
> +++ i386.c	(working copy)
> @@ -29320,7 +29320,10 @@
>
>          /* Set here, but it may get cleared later.  */
>          if (TARGET_CALL_MS2SYSV_XLOGUES)
> +      {
> +	gcc_assert(!reload_completed);
>    	cfun->machine->call_ms2sysv = true;
> +      }
>        }
>
>      if (vec_len > 1)
>
>
> That assertion is triggered in this test case:
>
> cat test.c
> int test()
> {
>     __builtin_printf("test\n");
>     return 0;
> }
>
> gcc -mabi=ms -mcall-ms2sysv-xlogues -fsplit-stack -c test.c
> test.c: In function 'test':
> test.c:5:1: internal compiler error: in ix86_expand_call, at
> config/i386/i386.c:29324
>    }
>    ^
> 0x13390a4 ix86_expand_call(rtx_def*, rtx_def*, rtx_def*, rtx_def*,
> rtx_def*, bool)
> 	../../gcc-trunk/gcc/config/i386/i386.c:29324
> 0x1317494 ix86_expand_split_stack_prologue()
> 	../../gcc-trunk/gcc/config/i386/i386.c:15920
> 0x162ba21 gen_split_stack_prologue()
> 	../../gcc-trunk/gcc/config/i386/i386.md:12556
> 0x12f3f30 target_gen_split_stack_prologue
> 	../../gcc-trunk/gcc/config/i386/i386.md:12325
> 0xb237b3 make_split_prologue_seq
> 	../../gcc-trunk/gcc/function.c:5822
> 0xb23a08 thread_prologue_and_epilogue_insns()
> 	../../gcc-trunk/gcc/function.c:5958
> 0xb24840 rest_of_handle_thread_prologue_and_epilogue
> 	../../gcc-trunk/gcc/function.c:6428
> 0xb248c0 execute
> 	../../gcc-trunk/gcc/function.c:6470
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
>
>
> so, in ix86_expand_split_stack_prologue
> we first call:
>     ix86_finalize_stack_realign_flags ();
>     ix86_compute_frame_layout (&frame);
>
> and later:
>     call_insn = ix86_expand_call (NULL_RTX, gen_rtx_MEM (QImode, fn),
>                                   GEN_INT (UNITS_PER_WORD), constm1_rtx,
>                                   pop, false);
>
> which changes a flag with a huge impact on the frame layout, but there
> is no absolutely no way how the frame layout can change once it is
> finalized.
>
>
> Any Thoughts?
>
>
> Bernd.

Well, my intention was actually to punt on those cases, but I hadn't 
actually tested with -fsplit-stack.  It looks like 
ix86_expand_split_stack_prologue calls ix86_expand_call, and I hadn't 
anticipated it getting called after the last call to 
ix86_compute_frame_layout(), which your patch has probably eliminated.  
In the case of -fsplit-stack, I'm testing the macro flag_split_stack 
which (currently) just expands to check the global flag, so this could 
instead be done in ix86_option_override_internal () instead, but I think 
it highlights a somewhat deeper problem.

Rather or not m->call_ms2sysv is set determines which stack layout is 
used when ix86_compute_frame_layout() runs.  But if we can run 
expand_call after the final time ix86_compute_frame_layout() then we 
have a problem.  It looks like ix86_expand_split_stack_prologue is the 
only function that manually calls ix86_expand_call, but maybe it would 
be better to modify the test to something like this:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a78819d6b3f..c36383f6962 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
         }
  
        /* Set here, but it may get cleared later.  */
-      if (TARGET_CALL_MS2SYSV_XLOGUES)
+      if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed)
         cfun->machine->call_ms2sysv = true;
      }
  

Or even use the same incompatibility tests from 
ix86_compute_frame_layout (and possibly move them to a static).

But I didn't get an ICE when using a build from a few days ago. I've 
updated to the current HEAD but I'm having problems with the test 
failing even though there's no errors, as if expect has some finite 
buffer for all of the warning spam and it kills the job.  I'm running 
another test to try and figure that out, but I have to run, so I'll get 
back with you on the results.

Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-15 23:52           ` Daniel Santos
@ 2017-05-16  5:42             ` Daniel Santos
  2017-05-16  8:52               ` Bernd Edlinger
  2017-05-16 17:38               ` Ian Lance Taylor via gcc-patches
  0 siblings, 2 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-16  5:42 UTC (permalink / raw)
  To: Bernd Edlinger, Ian Lance Taylor; +Cc: gcc-patches

Ian, would you mind looking at this please?  A combination of my 
-mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when 
ix86_expand_split_stack_prologue() calls ix86_expand_call().

On 05/15/2017 06:46 PM, Daniel Santos wrote:
> Rather or not m->call_ms2sysv is set determines which stack layout is 
> used when ix86_compute_frame_layout() runs. But if we can run 
> expand_call after the final time ix86_compute_frame_layout() then we 
> have a problem.  It looks like ix86_expand_split_stack_prologue is the 
> only function that manually calls ix86_expand_call, but maybe it would 
> be better to modify the test to something like this:
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index a78819d6b3f..c36383f6962 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -29325,7 +29325,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
> callarg1,
>         }
>
>        /* Set here, but it may get cleared later.  */
> -      if (TARGET_CALL_MS2SYSV_XLOGUES)
> +      if (TARGET_CALL_MS2SYSV_XLOGUES && !reload_completed)
>         cfun->machine->call_ms2sysv = true;
>      }
>
>

Actually, I think this is wrong.  I happened to recall looking at the 
morestack code last year and remembered that it was all assembly.  I 
looked at it again and I don't see that it calls anything outside of 
it's implementation file (libgcc/config/i386/morestack.S) except for 
_Unwind_Resume and the calling function its self (I think it calls its 
caller).  It saves and restores rsi and rdi and doesn't use any sse 
registers, so it doesn't need to clobber all of the regs in the 
x86_64_ms_sysv_extra_clobbered_registers array.  I'm guessing that this 
should have it's own pattern instead of calling ix86_expand_call in the 
first place.

Of course, I'm the new guy here, so please enlighten me if I'm wrong.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-16  5:42             ` Daniel Santos
@ 2017-05-16  8:52               ` Bernd Edlinger
  2017-05-16 15:11                 ` Daniel Santos
  2017-05-16 17:38               ` Ian Lance Taylor via gcc-patches
  1 sibling, 1 reply; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-16  8:52 UTC (permalink / raw)
  To: Daniel Santos, Ian Lance Taylor; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

On 05/16/2017 07:00, Daniel Santos wrote:
>
> Ian, would you mind looking at this please?  A combination of my
> -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when
> ix86_expand_split_stack_prologue() calls ix86_expand_call().

Here is what I am testing currently.
I fixed the test case that crashed before, and it bootstraps and reg-tests cleanly.

It would be good to have test cases for each of the not-supported warnings that
can happen, so far I only managed to get a test case for -fsplit-stack.

But I am not sure if a similar problem exists with ix86_static_chain_on_stack.
It is not obvious if that can change after reload_completed or not.



Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-frame-layout-i386.diff --]
[-- Type: text/x-patch; name="patch-frame-layout-i386.diff", Size: 22263 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 248031)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
+#define NUM_X86_64_MS_CLOBBERED_REGS 12
+static int const x86_64_ms_sysv_extra_clobbered_registers
+		 [NUM_X86_64_MS_CLOBBERED_REGS] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -2484,13 +2486,13 @@ class xlogue_layout {
      needs to store registers based upon data in the machine_function.  */
   HOST_WIDE_INT get_stack_space_used () const
   {
-    const struct machine_function &m = *cfun->machine;
-    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+    const struct machine_function *m = cfun->machine;
+    unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
-    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+    gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
     return m_regs[last_reg].offset
-	    + (m.call_ms2sysv_pad_out ? 8 : 0)
-	    + STUB_INDEX_OFFSET;
+	   + (m->call_ms2sysv_pad_out ? 8 : 0)
+	   + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -2532,7 +2534,7 @@ class xlogue_layout {
   /* Lazy-inited cache of symbol names for stubs.  */
   char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
 
-  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+  static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT];
 };
 
 const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
@@ -2573,7 +2575,7 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
 };
 
 /* Instantiates all xlogue_layout instances.  */
-const struct xlogue_layout GTY(())
+const struct GTY(()) xlogue_layout
 xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
   xlogue_layout (0, false),
   xlogue_layout (8, false),
@@ -2583,7 +2585,8 @@ xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
 
 /* Return an appropriate const instance of xlogue_layout based upon values
    in cfun->machine and crtl.  */
-const struct xlogue_layout &xlogue_layout::get_instance ()
+const struct xlogue_layout &
+xlogue_layout::get_instance ()
 {
   enum xlogue_stub_sets stub_set;
   bool aligned_plus_8 = cfun->machine->call_ms2sysv_pad_in;
@@ -2607,10 +2610,11 @@ unsigned
 xlogue_layout::compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs)
 {
   bool hfp = frame_pointer_needed || stack_realign_fp;
-
   unsigned i, count;
   unsigned regno;
 
+  CLEAR_HARD_REG_SET (stub_managed_regs);
+
   for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; ++i)
     {
       regno = x86_64_ms_sysv_extra_clobbered_registers[i];
@@ -2630,8 +2634,8 @@ xlogue_layout::compute_stub_managed_regs (HARD_REG
       add_to_hard_reg_set (&stub_managed_regs, DImode, regno);
       ++count;
     }
-    gcc_assert (count >= MIN_REGS && count <= MAX_REGS);
-    return count;
+  gcc_assert (count >= MIN_REGS && count <= MAX_REGS);
+  return count;
 }
 
 /* Constructor for xlogue_layout.  */
@@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
   : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
     m_stack_align_off_in (stack_align_off_in)
 {
+  HOST_WIDE_INT offset = stack_align_off_in;
+  unsigned i, j;
+
   memset (m_regs, 0, sizeof (m_regs));
   memset (m_stub_names, 0, sizeof (m_stub_names));
-
-  HOST_WIDE_INT offset = stack_align_off_in;
-  unsigned i, j;
   for (i = j = 0; i < MAX_REGS; ++i)
     {
       unsigned regno = REG_ORDER[i];
@@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
       m_regs[j].regno    = regno;
       m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
     }
-    gcc_assert (j == m_nregs);
+  gcc_assert (j == m_nregs);
 }
 
-const char *xlogue_layout::get_stub_name (enum xlogue_stub stub,
-					  unsigned n_extra_regs) const
+const char *
+xlogue_layout::get_stub_name (enum xlogue_stub stub,
+			      unsigned n_extra_regs) const
 {
   xlogue_layout *writey_this = const_cast<xlogue_layout*>(this);
   char *name = writey_this->m_stub_names[stub][n_extra_regs];
@@ -2676,7 +2681,7 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
     {
       int res = snprintf (name, STUB_NAME_MAX_LEN, "__%s_%u",
 			  STUB_BASE_NAMES[stub], MIN_REGS + n_extra_regs);
-      gcc_checking_assert (res <= (int)STUB_NAME_MAX_LEN);
+      gcc_checking_assert (res < (int)STUB_NAME_MAX_LEN);
     }
 
   return name;
@@ -2684,7 +2689,8 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
 
 /* Return rtx of a symbol ref for the entry point (based upon
    cfun->machine->call_ms2sysv_extra_regs) of the specified stub.  */
-rtx xlogue_layout::get_stub_rtx (enum xlogue_stub stub) const
+rtx
+xlogue_layout::get_stub_rtx (enum xlogue_stub stub) const
 {
   const unsigned n_extra_regs = cfun->machine->call_ms2sysv_extra_regs;
   gcc_checking_assert (n_extra_regs <= MAX_EXTRA_REGS);
@@ -2703,73 +2709,6 @@ struct GTY(()) stack_local_entry {
   struct stack_local_entry *next;
 };
 
-/* Structure describing stack frame layout.
-   Stack grows downward:
-
-   [arguments]
-					<- ARG_POINTER
-   saved pc
-
-   saved static chain			if ix86_static_chain_on_stack
-
-   saved frame pointer			if frame_pointer_needed
-					<- HARD_FRAME_POINTER
-   [saved regs]
-					<- reg_save_offset
-   [padding0]
-					<- stack_realign_offset
-   [saved SSE regs]
-	OR
-   [stub-saved registers for ms x64 --> sysv clobbers
-			<- Start of out-of-line, stub-saved/restored regs
-			   (see libgcc/config/i386/(sav|res)ms64*.S)
-     [XMM6-15]
-     [RSI]
-     [RDI]
-     [?RBX]		only if RBX is clobbered
-     [?RBP]		only if RBP and RBX are clobbered
-     [?R12]		only if R12 and all previous regs are clobbered
-     [?R13]		only if R13 and all previous regs are clobbered
-     [?R14]		only if R14 and all previous regs are clobbered
-     [?R15]		only if R15 and all previous regs are clobbered
-			<- end of stub-saved/restored regs
-     [padding1]
-   ]
-					<- outlined_save_offset
-					<- sse_regs_save_offset
-   [padding2]
-		       |		<- FRAME_POINTER
-   [va_arg registers]  |
-		       |
-   [frame]	       |
-		       |
-   [padding2]	       | = to_allocate
-					<- STACK_POINTER
-  */
-struct ix86_frame
-{
-  int nsseregs;
-  int nregs;
-  int va_arg_size;
-  int red_zone_size;
-  int outgoing_arguments_size;
-
-  /* The offsets relative to ARG_POINTER.  */
-  HOST_WIDE_INT frame_pointer_offset;
-  HOST_WIDE_INT hard_frame_pointer_offset;
-  HOST_WIDE_INT stack_pointer_offset;
-  HOST_WIDE_INT hfp_save_offset;
-  HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT stack_realign_allocate_offset;
-  HOST_WIDE_INT stack_realign_offset;
-  HOST_WIDE_INT outlined_save_offset;
-  HOST_WIDE_INT sse_reg_save_offset;
-
-  /* When save_regs_using_mov is set, emit prologue using
-     move instead of push instructions.  */
-  bool save_regs_using_mov;
-};
-
 /* Which cpu are we scheduling for.  */
 enum attr_cpu ix86_schedule;
 
@@ -2861,7 +2800,7 @@ static unsigned int ix86_function_arg_boundary (ma
 						const_tree);
 static rtx ix86_static_chain (const_tree, bool);
 static int ix86_function_regparm (const_tree, const_tree);
-static void ix86_compute_frame_layout (struct ix86_frame *);
+static void ix86_compute_frame_layout (void);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
 static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
@@ -12293,7 +12232,7 @@ ix86_can_use_return_insn_p (void)
   if (crtl->args.pops_args && crtl->args.size >= 32768)
     return 0;
 
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   return (frame.stack_pointer_offset == UNITS_PER_WORD
 	  && (frame.nregs + frame.nsseregs) == 0);
 }
@@ -12634,10 +12573,6 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 	      && df_regs_ever_live_p (regno)));
 }
 
-/* Registers who's save & restore will be managed by stubs called from
-   pro/epilogue.  */
-static HARD_REG_SET GTY(()) stub_managed_regs;
-
 /* Return true if register class CL should be an additional allocno
    class.  */
 
@@ -12718,10 +12653,17 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_r
 	}
     }
 
-  if (ignore_outlined && cfun->machine->call_ms2sysv
-      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
-    return false;
+  if (ignore_outlined && cfun->machine->call_ms2sysv)
+    {
+      /* Registers who's save & restore will be managed by stubs called from
+	 pro/epilogue.  */
+      HARD_REG_SET stub_managed_regs;
+      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
 
+      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+	return false;
+    }
+
   if (crtl->drap_reg
       && regno == REGNO (crtl->drap_reg)
       && !cfun->machine->no_drap_save_restore)
@@ -12787,8 +12729,7 @@ ix86_can_eliminate (const int from, const int to)
 HOST_WIDE_INT
 ix86_initial_elimination_offset (int from, int to)
 {
-  struct ix86_frame frame;
-  ix86_compute_frame_layout (&frame);
+  struct ix86_frame frame = cfun->machine->frame;
 
   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
     return frame.hard_frame_pointer_offset;
@@ -12818,13 +12759,16 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
-/* Disables out-of-lined msabi to sysv pro/epilogues and emits a warning if
-   warn_once is null, or *warn_once is zero.  */
-static void disable_call_ms2sysv_xlogues (const char *feature)
+/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
+static void warn_once_call_ms2sysv_xlogues (const char *feature)
 {
-  cfun->machine->call_ms2sysv = false;
-  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with %s.",
-	   feature);
+  static bool warned_once = false;
+  if (!warned_once)
+    {
+      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
+	       feature);
+      warned_once = true;
+    }
 }
 
 /* When using -fsplit-stack, the allocation routines set a field in
@@ -12836,8 +12780,9 @@ ix86_builtin_setjmp_frame_value (void)
 /* Fill structure ix86_frame about frame of currently computed function.  */
 
 static void
-ix86_compute_frame_layout (struct ix86_frame *frame)
+ix86_compute_frame_layout (void)
 {
+  struct ix86_frame *frame = &cfun->machine->frame;
   struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
@@ -12845,46 +12790,40 @@ static void
   HOST_WIDE_INT size = get_frame_size ();
   HOST_WIDE_INT to_allocate;
 
-  CLEAR_HARD_REG_SET (stub_managed_regs);
-
   /* m->call_ms2sysv is initially enabled in ix86_expand_call for all 64-bit
    * ms_abi functions that call a sysv function.  We now need to prune away
    * cases where it should be disabled.  */
   if (TARGET_64BIT && m->call_ms2sysv)
-  {
-    gcc_assert (TARGET_64BIT_MS_ABI);
-    gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
-    gcc_assert (!TARGET_SEH);
+    {
+      gcc_assert (TARGET_64BIT_MS_ABI);
+      gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
+      gcc_assert (!TARGET_SEH);
+      gcc_assert (TARGET_SSE);
 
-    if (!TARGET_SSE)
-      m->call_ms2sysv = false;
+      if (crtl->calls_eh_return)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
+	}
 
-    /* Don't break hot-patched functions.  */
-    else if (ix86_function_ms_hook_prologue (current_function_decl))
-      m->call_ms2sysv = false;
+      else if (ix86_static_chain_on_stack)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("static call chains");
+	}
 
-    /* TODO: Cases not yet examined.  */
-    else if (crtl->calls_eh_return)
-      disable_call_ms2sysv_xlogues ("__builtin_eh_return");
+      /* Finally, compute which registers the stub will manage.  */
+      else
+        {
+	  HARD_REG_SET stub_managed_regs;
+	  unsigned count = xlogue_layout
+			   ::compute_stub_managed_regs (stub_managed_regs);
+	  m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+	}
+    }
 
-    else if (ix86_static_chain_on_stack)
-      disable_call_ms2sysv_xlogues ("static call chains");
-
-    else if (ix86_using_red_zone ())
-      disable_call_ms2sysv_xlogues ("red zones");
-
-    else if (flag_split_stack)
-      disable_call_ms2sysv_xlogues ("split stack");
-
-    /* Finally, compute which registers the stub will manage.  */
-    else
-      {
-	unsigned count = xlogue_layout
-			 ::compute_stub_managed_regs (stub_managed_regs);
-	m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
-      }
-  }
-
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
   m->call_ms2sysv_pad_in = 0;
@@ -12916,19 +12855,11 @@ static void
      in doing anything except PUSHs.  */
   if (TARGET_SEH)
     m->use_fast_prologue_epilogue = false;
-
-  /* During reload iteration the amount of registers saved can change.
-     Recompute the value as needed.  Do not recompute when amount of registers
-     didn't change as reload does multiple calls to the function and does not
-     expect the decision to change within single iteration.  */
-  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-	   && m->use_fast_prologue_epilogue_nregs != frame->nregs)
+  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
     {
       int count = frame->nregs;
       struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-      m->use_fast_prologue_epilogue_nregs = count;
-
       /* The fast prologue uses move instead of push to save registers.  This
          is significantly longer, but also executes faster as modern hardware
          can execute the moves in parallel, but can't do that for push/pop.
@@ -13145,7 +13076,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the stack pointer is valid for accessing the cfa_offset.  */
 
-static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.sp_valid && !(fs.sp_realigned
@@ -13154,7 +13086,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the frame pointer is valid for accessing the cfa_offset.  */
 
-static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
@@ -13164,9 +13097,10 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 /* Choose a base register based upon alignment requested, speed and/or
    size.  */
 
-static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
-			    HOST_WIDE_INT &base_offset,
-			    unsigned int align_reqested, unsigned int *align)
+static void
+choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+		HOST_WIDE_INT &base_offset,
+		unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
   unsigned int hfp_align;
@@ -14159,6 +14093,7 @@ ix86_finalize_stack_realign_flags (void)
        < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
 	  ? crtl->max_used_stack_slot_alignment
 	  : crtl->stack_alignment_needed));
+  bool recompute_frame_layout_p = false;
 
   if (crtl->stack_realign_finalized)
     {
@@ -14208,8 +14143,12 @@ ix86_finalize_stack_realign_flags (void)
 		&& requires_stack_frame_p (insn, prologue_used,
 					   set_up_by_prologue))
 	      {
+		if (crtl->stack_realign_needed != stack_realign)
+		  recompute_frame_layout_p = true;
 		crtl->stack_realign_needed = stack_realign;
 		crtl->stack_realign_finalized = true;
+		if (recompute_frame_layout_p)
+		  ix86_compute_frame_layout ();
 		return;
 	      }
 	}
@@ -14240,10 +14179,15 @@ ix86_finalize_stack_realign_flags (void)
       df_scan_blocks ();
       df_compute_regs_ever_live (true);
       df_analyze ();
+      recompute_frame_layout_p = true;
     }
 
+  if (crtl->stack_realign_needed != stack_realign)
+    recompute_frame_layout_p = true;
   crtl->stack_realign_needed = stack_realign;
   crtl->stack_realign_finalized = true;
+  if (recompute_frame_layout_p)
+    ix86_compute_frame_layout ();
 }
 
 /* Delete SET_GOT right after entry block if it is allocated to reg.  */
@@ -14372,7 +14316,7 @@ ix86_expand_prologue (void)
   m->fs.sp_valid = true;
   m->fs.sp_realigned = false;
 
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
     {
@@ -15212,7 +15156,7 @@ ix86_expand_epilogue (int style)
   bool restore_stub_is_tail = false;
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   m->fs.sp_realigned = stack_realign_fp;
   m->fs.sp_valid = stack_realign_fp
@@ -15757,7 +15701,7 @@ ix86_expand_split_stack_prologue (void)
   gcc_assert (flag_split_stack && reload_completed);
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET;
 
   /* This is the label we will branch to if we have enough stack
@@ -29320,7 +29264,27 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
 
       /* Set here, but it may get cleared later.  */
       if (TARGET_CALL_MS2SYSV_XLOGUES)
-	cfun->machine->call_ms2sysv = true;
+	{
+	  if (!TARGET_SSE)
+	    ;
+
+	  /* Don't break hot-patched functions.  */
+	  else if (ix86_function_ms_hook_prologue (current_function_decl))
+	    ;
+
+	  /* TODO: Cases not yet examined.  */
+	  else if (ix86_using_red_zone ())
+	    warn_once_call_ms2sysv_xlogues ("red zones");
+
+	  else if (flag_split_stack)
+	    warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
+
+	  else
+	    {
+	      gcc_assert (!reload_completed);
+	      cfun->machine->call_ms2sysv = true;
+	    }
+	}
     }
 
   if (vec_len > 1)
@@ -29455,7 +29419,6 @@ ix86_init_machine_status (void)
   struct machine_function *f;
 
   f = ggc_cleared_alloc<machine_function> ();
-  f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
 
   return f;
@@ -52828,6 +52791,9 @@ ix86_run_selftests (void)
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ix86_legitimate_constant_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT ix86_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED ix86_frame_pointer_required
 
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 248031)
+++ gcc/config/i386/i386.h	(working copy)
@@ -2163,10 +2163,6 @@ extern int const dbx_register_map[FIRST_PSEUDO_REG
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
-#define NUM_X86_64_MS_CLOBBERED_REGS \
-  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
-
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
   gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
@@ -2448,9 +2444,76 @@ enum avx_u128_state
 \f
 #define FASTCALL_PREFIX '@'
 \f
+#ifndef USED_FOR_TARGET
+/* Structure describing stack frame layout.
+   Stack grows downward:
+
+   [arguments]
+					<- ARG_POINTER
+   saved pc
+
+   saved static chain			if ix86_static_chain_on_stack
+
+   saved frame pointer			if frame_pointer_needed
+					<- HARD_FRAME_POINTER
+   [saved regs]
+					<- reg_save_offset
+   [padding0]
+					<- stack_realign_offset
+   [saved SSE regs]
+	OR
+   [stub-saved registers for ms x64 --> sysv clobbers
+			<- Start of out-of-line, stub-saved/restored regs
+			   (see libgcc/config/i386/(sav|res)ms64*.S)
+     [XMM6-15]
+     [RSI]
+     [RDI]
+     [?RBX]		only if RBX is clobbered
+     [?RBP]		only if RBP and RBX are clobbered
+     [?R12]		only if R12 and all previous regs are clobbered
+     [?R13]		only if R13 and all previous regs are clobbered
+     [?R14]		only if R14 and all previous regs are clobbered
+     [?R15]		only if R15 and all previous regs are clobbered
+			<- end of stub-saved/restored regs
+     [padding1]
+   ]
+					<- outlined_save_offset
+					<- sse_regs_save_offset
+   [padding2]
+		       |		<- FRAME_POINTER
+   [va_arg registers]  |
+		       |
+   [frame]	       |
+		       |
+   [padding2]	       | = to_allocate
+					<- STACK_POINTER
+  */
+struct GTY(()) ix86_frame
+{
+  int nsseregs;
+  int nregs;
+  int va_arg_size;
+  int red_zone_size;
+  int outgoing_arguments_size;
+
+  /* The offsets relative to ARG_POINTER.  */
+  HOST_WIDE_INT frame_pointer_offset;
+  HOST_WIDE_INT hard_frame_pointer_offset;
+  HOST_WIDE_INT stack_pointer_offset;
+  HOST_WIDE_INT hfp_save_offset;
+  HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
+  HOST_WIDE_INT outlined_save_offset;
+  HOST_WIDE_INT sse_reg_save_offset;
+
+  /* When save_regs_using_mov is set, emit prologue using
+     move instead of push instructions.  */
+  bool save_regs_using_mov;
+};
+
 /* Machine specific frame tracking during prologue/epilogue generation.  */
 
-#ifndef USED_FOR_TARGET
 struct GTY(()) machine_frame_state
 {
   /* This pair tracks the currently active CFA as reg+offset.  When reg
@@ -2520,9 +2583,8 @@ struct GTY(()) machine_function {
   int varargs_fpr_size;
   int optimize_mode_switching[MAX_386_ENTITIES];
 
-  /* Number of saved registers USE_FAST_PROLOGUE_EPILOGUE
-     has been computed for.  */
-  int use_fast_prologue_epilogue_nregs;
+  /* Cached initial frame layout for the current function.  */
+  struct ix86_frame frame;
 
   /* For -fsplit-stack support: A stack local which holds a pointer to
      the stack arguments for a function with a variable number of

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-16  8:52               ` Bernd Edlinger
@ 2017-05-16 15:11                 ` Daniel Santos
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-16 15:11 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Ian Lance Taylor, gcc-patches

On 05/16/2017 03:34 AM, Bernd Edlinger wrote:
> It would be good to have test cases for each of the not-supported warnings that
> can happen, so far I only managed to get a test case for -fsplit-stack.

Yes, I'm inclined to agree.  I'll try to get this done today or 
tomorrow.  I've also put in a limiter of one warning per TU.  One 
problem is that there isn't a way to disable the warning, so I may want 
to add that.

Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-16  5:42             ` Daniel Santos
  2017-05-16  8:52               ` Bernd Edlinger
@ 2017-05-16 17:38               ` Ian Lance Taylor via gcc-patches
  2017-05-16 20:10                 ` Bernd Edlinger
  2017-05-16 21:35                 ` Daniel Santos
  1 sibling, 2 replies; 29+ messages in thread
From: Ian Lance Taylor via gcc-patches @ 2017-05-16 17:38 UTC (permalink / raw)
  To: Daniel Santos; +Cc: Bernd Edlinger, gcc-patches

On Mon, May 15, 2017 at 10:00 PM, Daniel Santos <daniel.santos@pobox.com> wrote:
>
> Ian, would you mind looking at this please?  A combination of my
> -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when
> ix86_expand_split_stack_prologue() calls ix86_expand_call().

I don't have a lot of context here.  I assume that ms2sysv is going to
be used on Windows systems, where -fsplit-stack isn't really going to
work anyhow, so I think it would probably be OK that reject that
combination if it causes trouble.

Also, it's overkill for ix86_expand_split_stack_prologue to call
ix86_expand_call.  The call is always to __morestack, and __morestack
is written in assembler, so we could use a simpler version of
ix86_expand_call if that helps.  In particular we can decide that
__morestack doesn't clobber any unusual registers, if that is what is
causing the problem.

Ian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-16 17:38               ` Ian Lance Taylor via gcc-patches
@ 2017-05-16 20:10                 ` Bernd Edlinger
  2017-05-16 21:05                   ` Bernd Edlinger
  2017-05-17  2:12                   ` Daniel Santos
  2017-05-16 21:35                 ` Daniel Santos
  1 sibling, 2 replies; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-16 20:10 UTC (permalink / raw)
  To: Ian Lance Taylor, Daniel Santos; +Cc: gcc-patches, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 2255 bytes --]

On 05/16/17 19:19, Ian Lance Taylor wrote:
> On Mon, May 15, 2017 at 10:00 PM, Daniel Santos <daniel.santos@pobox.com> wrote:
>>
>> Ian, would you mind looking at this please?  A combination of my
>> -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when
>> ix86_expand_split_stack_prologue() calls ix86_expand_call().
>
> I don't have a lot of context here.  I assume that ms2sysv is going to
> be used on Windows systems, where -fsplit-stack isn't really going to
> work anyhow, so I think it would probably be OK that reject that
> combination if it causes trouble.
>
> Also, it's overkill for ix86_expand_split_stack_prologue to call
> ix86_expand_call.  The call is always to __morestack, and __morestack
> is written in assembler, so we could use a simpler version of
> ix86_expand_call if that helps.  In particular we can decide that
> __morestack doesn't clobber any unusual registers, if that is what is
> causing the problem.
>

I think I solved the problem with -fsplit-stack, I am not sure
if ix86_static_chain_on_stack might change after reload due to
final.c possibly calling targetm.calls.static_chain, but if that
is the case, that is an already pre-existing problem.

The goal of this patch is to make all decisions regarding the
frame layout before the reload pass, and to make sure that
the frame layout does not change unexpectedly it asserts
that the data that goes into the decision does not change
after reload_completed.

With the attached patch -fsplit-stack and the attribute ms_hook_prologue
is handed directly at the ix86_expand_call, because that data is
already known before expansion.

The calls_eh_return and ix86_static_chain_on_stack may become
known at a later time, but after reload it should not change any more.
To be sure, I added an assertion at ix86_static_chain, which the
regression test did not trigger, neither with -m64 nor with -m32.

I have bootstrapped the patch several times, and a few times I
encounterd a segfault in the garbage collection, but it did not
happen every time.  Currently I think that is unrelated to this patch.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu with -m64/-m32.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-frame-layout-i386.txt --]
[-- Type: text/plain, Size: 1520 bytes --]

2017-05-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/i386/i386.c (x86_64_ms_sysv_extra_clobbered_registers): Make
	static.
	(xlogue_layout::get_stack_space_used, xlogue_layout::s_instances,
	xlogue_layout::get_instance, logue_layout::xlogue_layout,
	xlogue_layout::get_stub_name, xlogue_layout::get_stub_rtx,
	sp_valid_at, fp_valid_at, choose_basereg): Formatting.
	(xlogue_layout::compute_stub_managed_regs): Clear out param first.
	(stub_managed_regs): Remove.
	(ix86_save_reg): Use xlogue_layout::compute_stub_managed_regs.
	(disable_call_ms2sysv_xlogues): Rename to...
	(warn_once_call_ms2sysv_xlogues): ...this, and warn only once.
	(ix86_initial_elimination_offset, ix86_expand_call): Fix call_ms2sysv
	warning logic.
	(ix86_static_chain): Make sure that ix86_static_chain_on_stack can't
	change after reload_completed.
	(ix86_can_use_return_insn_p): Use the ix86_frame data structure
	directly.
	(ix86_expand_prologue): Likewise.
	(ix86_expand_epilogue): Likewise.
	(ix86_expand_split_stack_prologue): Likewise.
	(ix86_compute_frame_layout): Remove frame parameter ...
	(TARGET_COMPUTE_FRAME_LAYOUT): ... and export it as a target hook.
	(ix86_finalize_stack_realign_flags): Call ix86_compute_frame_layout
	only if necessary.
	(ix86_init_machine_status): Don't set use_fast_prologue_epilogue_nregs.
	(ix86_frame): Move from here ...
	* config/i386/i386.h (ix86_frame): ... to here.
	(machine_function): Remove use_fast_prologue_epilogue_nregs, cache the
	complete ix86_frame data structure instead.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-frame-layout-i386.diff --]
[-- Type: text/x-patch; name="patch-frame-layout-i386.diff", Size: 23251 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 248031)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
+#define NUM_X86_64_MS_CLOBBERED_REGS 12
+static int const x86_64_ms_sysv_extra_clobbered_registers
+		 [NUM_X86_64_MS_CLOBBERED_REGS] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -2484,13 +2486,13 @@ class xlogue_layout {
      needs to store registers based upon data in the machine_function.  */
   HOST_WIDE_INT get_stack_space_used () const
   {
-    const struct machine_function &m = *cfun->machine;
-    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+    const struct machine_function *m = cfun->machine;
+    unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
-    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+    gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
     return m_regs[last_reg].offset
-	    + (m.call_ms2sysv_pad_out ? 8 : 0)
-	    + STUB_INDEX_OFFSET;
+	   + (m->call_ms2sysv_pad_out ? 8 : 0)
+	   + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -2532,7 +2534,7 @@ class xlogue_layout {
   /* Lazy-inited cache of symbol names for stubs.  */
   char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
 
-  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+  static const struct GTY(()) xlogue_layout s_instances[XLOGUE_SET_COUNT];
 };
 
 const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
@@ -2573,7 +2575,7 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
 };
 
 /* Instantiates all xlogue_layout instances.  */
-const struct xlogue_layout GTY(())
+const struct GTY(()) xlogue_layout
 xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
   xlogue_layout (0, false),
   xlogue_layout (8, false),
@@ -2583,7 +2585,8 @@ xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
 
 /* Return an appropriate const instance of xlogue_layout based upon values
    in cfun->machine and crtl.  */
-const struct xlogue_layout &xlogue_layout::get_instance ()
+const struct xlogue_layout &
+xlogue_layout::get_instance ()
 {
   enum xlogue_stub_sets stub_set;
   bool aligned_plus_8 = cfun->machine->call_ms2sysv_pad_in;
@@ -2607,10 +2610,11 @@ unsigned
 xlogue_layout::compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs)
 {
   bool hfp = frame_pointer_needed || stack_realign_fp;
-
   unsigned i, count;
   unsigned regno;
 
+  CLEAR_HARD_REG_SET (stub_managed_regs);
+
   for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; ++i)
     {
       regno = x86_64_ms_sysv_extra_clobbered_registers[i];
@@ -2630,8 +2634,8 @@ xlogue_layout::compute_stub_managed_regs (HARD_REG
       add_to_hard_reg_set (&stub_managed_regs, DImode, regno);
       ++count;
     }
-    gcc_assert (count >= MIN_REGS && count <= MAX_REGS);
-    return count;
+  gcc_assert (count >= MIN_REGS && count <= MAX_REGS);
+  return count;
 }
 
 /* Constructor for xlogue_layout.  */
@@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
   : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
     m_stack_align_off_in (stack_align_off_in)
 {
+  HOST_WIDE_INT offset = stack_align_off_in;
+  unsigned i, j;
+
   memset (m_regs, 0, sizeof (m_regs));
   memset (m_stub_names, 0, sizeof (m_stub_names));
-
-  HOST_WIDE_INT offset = stack_align_off_in;
-  unsigned i, j;
   for (i = j = 0; i < MAX_REGS; ++i)
     {
       unsigned regno = REG_ORDER[i];
@@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
       m_regs[j].regno    = regno;
       m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
     }
-    gcc_assert (j == m_nregs);
+  gcc_assert (j == m_nregs);
 }
 
-const char *xlogue_layout::get_stub_name (enum xlogue_stub stub,
-					  unsigned n_extra_regs) const
+const char *
+xlogue_layout::get_stub_name (enum xlogue_stub stub,
+			      unsigned n_extra_regs) const
 {
   xlogue_layout *writey_this = const_cast<xlogue_layout*>(this);
   char *name = writey_this->m_stub_names[stub][n_extra_regs];
@@ -2676,7 +2681,7 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
     {
       int res = snprintf (name, STUB_NAME_MAX_LEN, "__%s_%u",
 			  STUB_BASE_NAMES[stub], MIN_REGS + n_extra_regs);
-      gcc_checking_assert (res <= (int)STUB_NAME_MAX_LEN);
+      gcc_checking_assert (res < (int)STUB_NAME_MAX_LEN);
     }
 
   return name;
@@ -2684,7 +2689,8 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
 
 /* Return rtx of a symbol ref for the entry point (based upon
    cfun->machine->call_ms2sysv_extra_regs) of the specified stub.  */
-rtx xlogue_layout::get_stub_rtx (enum xlogue_stub stub) const
+rtx
+xlogue_layout::get_stub_rtx (enum xlogue_stub stub) const
 {
   const unsigned n_extra_regs = cfun->machine->call_ms2sysv_extra_regs;
   gcc_checking_assert (n_extra_regs <= MAX_EXTRA_REGS);
@@ -2703,73 +2709,6 @@ struct GTY(()) stack_local_entry {
   struct stack_local_entry *next;
 };
 
-/* Structure describing stack frame layout.
-   Stack grows downward:
-
-   [arguments]
-					<- ARG_POINTER
-   saved pc
-
-   saved static chain			if ix86_static_chain_on_stack
-
-   saved frame pointer			if frame_pointer_needed
-					<- HARD_FRAME_POINTER
-   [saved regs]
-					<- reg_save_offset
-   [padding0]
-					<- stack_realign_offset
-   [saved SSE regs]
-	OR
-   [stub-saved registers for ms x64 --> sysv clobbers
-			<- Start of out-of-line, stub-saved/restored regs
-			   (see libgcc/config/i386/(sav|res)ms64*.S)
-     [XMM6-15]
-     [RSI]
-     [RDI]
-     [?RBX]		only if RBX is clobbered
-     [?RBP]		only if RBP and RBX are clobbered
-     [?R12]		only if R12 and all previous regs are clobbered
-     [?R13]		only if R13 and all previous regs are clobbered
-     [?R14]		only if R14 and all previous regs are clobbered
-     [?R15]		only if R15 and all previous regs are clobbered
-			<- end of stub-saved/restored regs
-     [padding1]
-   ]
-					<- outlined_save_offset
-					<- sse_regs_save_offset
-   [padding2]
-		       |		<- FRAME_POINTER
-   [va_arg registers]  |
-		       |
-   [frame]	       |
-		       |
-   [padding2]	       | = to_allocate
-					<- STACK_POINTER
-  */
-struct ix86_frame
-{
-  int nsseregs;
-  int nregs;
-  int va_arg_size;
-  int red_zone_size;
-  int outgoing_arguments_size;
-
-  /* The offsets relative to ARG_POINTER.  */
-  HOST_WIDE_INT frame_pointer_offset;
-  HOST_WIDE_INT hard_frame_pointer_offset;
-  HOST_WIDE_INT stack_pointer_offset;
-  HOST_WIDE_INT hfp_save_offset;
-  HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT stack_realign_allocate_offset;
-  HOST_WIDE_INT stack_realign_offset;
-  HOST_WIDE_INT outlined_save_offset;
-  HOST_WIDE_INT sse_reg_save_offset;
-
-  /* When save_regs_using_mov is set, emit prologue using
-     move instead of push instructions.  */
-  bool save_regs_using_mov;
-};
-
 /* Which cpu are we scheduling for.  */
 enum attr_cpu ix86_schedule;
 
@@ -2861,7 +2800,7 @@ static unsigned int ix86_function_arg_boundary (ma
 						const_tree);
 static rtx ix86_static_chain (const_tree, bool);
 static int ix86_function_regparm (const_tree, const_tree);
-static void ix86_compute_frame_layout (struct ix86_frame *);
+static void ix86_compute_frame_layout (void);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
 static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
@@ -12293,7 +12232,7 @@ ix86_can_use_return_insn_p (void)
   if (crtl->args.pops_args && crtl->args.size >= 32768)
     return 0;
 
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   return (frame.stack_pointer_offset == UNITS_PER_WORD
 	  && (frame.nregs + frame.nsseregs) == 0);
 }
@@ -12634,10 +12573,6 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 	      && df_regs_ever_live_p (regno)));
 }
 
-/* Registers who's save & restore will be managed by stubs called from
-   pro/epilogue.  */
-static HARD_REG_SET GTY(()) stub_managed_regs;
-
 /* Return true if register class CL should be an additional allocno
    class.  */
 
@@ -12718,10 +12653,17 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_r
 	}
     }
 
-  if (ignore_outlined && cfun->machine->call_ms2sysv
-      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
-    return false;
+  if (ignore_outlined && cfun->machine->call_ms2sysv)
+    {
+      /* Registers who's save & restore will be managed by stubs called from
+	 pro/epilogue.  */
+      HARD_REG_SET stub_managed_regs;
+      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
 
+      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+	return false;
+    }
+
   if (crtl->drap_reg
       && regno == REGNO (crtl->drap_reg)
       && !cfun->machine->no_drap_save_restore)
@@ -12787,8 +12729,7 @@ ix86_can_eliminate (const int from, const int to)
 HOST_WIDE_INT
 ix86_initial_elimination_offset (int from, int to)
 {
-  struct ix86_frame frame;
-  ix86_compute_frame_layout (&frame);
+  struct ix86_frame frame = cfun->machine->frame;
 
   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
     return frame.hard_frame_pointer_offset;
@@ -12818,13 +12759,16 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
-/* Disables out-of-lined msabi to sysv pro/epilogues and emits a warning if
-   warn_once is null, or *warn_once is zero.  */
-static void disable_call_ms2sysv_xlogues (const char *feature)
+/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
+static void warn_once_call_ms2sysv_xlogues (const char *feature)
 {
-  cfun->machine->call_ms2sysv = false;
-  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with %s.",
-	   feature);
+  static bool warned_once = false;
+  if (!warned_once)
+    {
+      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
+	       feature);
+      warned_once = true;
+    }
 }
 
 /* When using -fsplit-stack, the allocation routines set a field in
@@ -12836,8 +12780,9 @@ ix86_builtin_setjmp_frame_value (void)
 /* Fill structure ix86_frame about frame of currently computed function.  */
 
 static void
-ix86_compute_frame_layout (struct ix86_frame *frame)
+ix86_compute_frame_layout (void)
 {
+  struct ix86_frame *frame = &cfun->machine->frame;
   struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
@@ -12845,46 +12790,41 @@ static void
   HOST_WIDE_INT size = get_frame_size ();
   HOST_WIDE_INT to_allocate;
 
-  CLEAR_HARD_REG_SET (stub_managed_regs);
-
   /* m->call_ms2sysv is initially enabled in ix86_expand_call for all 64-bit
    * ms_abi functions that call a sysv function.  We now need to prune away
    * cases where it should be disabled.  */
   if (TARGET_64BIT && m->call_ms2sysv)
-  {
-    gcc_assert (TARGET_64BIT_MS_ABI);
-    gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
-    gcc_assert (!TARGET_SEH);
+    {
+      gcc_assert (TARGET_64BIT_MS_ABI);
+      gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
+      gcc_assert (!TARGET_SEH);
+      gcc_assert (TARGET_SSE);
+      gcc_assert (!ix86_using_red_zone ());
 
-    if (!TARGET_SSE)
-      m->call_ms2sysv = false;
+      if (crtl->calls_eh_return)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
+	}
 
-    /* Don't break hot-patched functions.  */
-    else if (ix86_function_ms_hook_prologue (current_function_decl))
-      m->call_ms2sysv = false;
+      else if (ix86_static_chain_on_stack)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("static call chains");
+	}
 
-    /* TODO: Cases not yet examined.  */
-    else if (crtl->calls_eh_return)
-      disable_call_ms2sysv_xlogues ("__builtin_eh_return");
+      /* Finally, compute which registers the stub will manage.  */
+      else
+        {
+	  HARD_REG_SET stub_managed_regs;
+	  unsigned count = xlogue_layout
+			   ::compute_stub_managed_regs (stub_managed_regs);
+	  m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+	}
+    }
 
-    else if (ix86_static_chain_on_stack)
-      disable_call_ms2sysv_xlogues ("static call chains");
-
-    else if (ix86_using_red_zone ())
-      disable_call_ms2sysv_xlogues ("red zones");
-
-    else if (flag_split_stack)
-      disable_call_ms2sysv_xlogues ("split stack");
-
-    /* Finally, compute which registers the stub will manage.  */
-    else
-      {
-	unsigned count = xlogue_layout
-			 ::compute_stub_managed_regs (stub_managed_regs);
-	m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
-      }
-  }
-
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
   m->call_ms2sysv_pad_in = 0;
@@ -12916,19 +12856,11 @@ static void
      in doing anything except PUSHs.  */
   if (TARGET_SEH)
     m->use_fast_prologue_epilogue = false;
-
-  /* During reload iteration the amount of registers saved can change.
-     Recompute the value as needed.  Do not recompute when amount of registers
-     didn't change as reload does multiple calls to the function and does not
-     expect the decision to change within single iteration.  */
-  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-	   && m->use_fast_prologue_epilogue_nregs != frame->nregs)
+  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
     {
       int count = frame->nregs;
       struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-      m->use_fast_prologue_epilogue_nregs = count;
-
       /* The fast prologue uses move instead of push to save registers.  This
          is significantly longer, but also executes faster as modern hardware
          can execute the moves in parallel, but can't do that for push/pop.
@@ -13145,7 +13077,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the stack pointer is valid for accessing the cfa_offset.  */
 
-static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.sp_valid && !(fs.sp_realigned
@@ -13154,7 +13087,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the frame pointer is valid for accessing the cfa_offset.  */
 
-static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
@@ -13164,9 +13098,10 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 /* Choose a base register based upon alignment requested, speed and/or
    size.  */
 
-static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
-			    HOST_WIDE_INT &base_offset,
-			    unsigned int align_reqested, unsigned int *align)
+static void
+choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+		HOST_WIDE_INT &base_offset,
+		unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
   unsigned int hfp_align;
@@ -14159,6 +14094,7 @@ ix86_finalize_stack_realign_flags (void)
        < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
 	  ? crtl->max_used_stack_slot_alignment
 	  : crtl->stack_alignment_needed));
+  bool recompute_frame_layout_p = false;
 
   if (crtl->stack_realign_finalized)
     {
@@ -14208,8 +14144,12 @@ ix86_finalize_stack_realign_flags (void)
 		&& requires_stack_frame_p (insn, prologue_used,
 					   set_up_by_prologue))
 	      {
+		if (crtl->stack_realign_needed != stack_realign)
+		  recompute_frame_layout_p = true;
 		crtl->stack_realign_needed = stack_realign;
 		crtl->stack_realign_finalized = true;
+		if (recompute_frame_layout_p)
+		  ix86_compute_frame_layout ();
 		return;
 	      }
 	}
@@ -14240,10 +14180,15 @@ ix86_finalize_stack_realign_flags (void)
       df_scan_blocks ();
       df_compute_regs_ever_live (true);
       df_analyze ();
+      recompute_frame_layout_p = true;
     }
 
+  if (crtl->stack_realign_needed != stack_realign)
+    recompute_frame_layout_p = true;
   crtl->stack_realign_needed = stack_realign;
   crtl->stack_realign_finalized = true;
+  if (recompute_frame_layout_p)
+    ix86_compute_frame_layout ();
 }
 
 /* Delete SET_GOT right after entry block if it is allocated to reg.  */
@@ -14372,7 +14317,7 @@ ix86_expand_prologue (void)
   m->fs.sp_valid = true;
   m->fs.sp_realigned = false;
 
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
     {
@@ -15212,7 +15157,7 @@ ix86_expand_epilogue (int style)
   bool restore_stub_is_tail = false;
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   m->fs.sp_realigned = stack_realign_fp;
   m->fs.sp_valid = stack_realign_fp
@@ -15757,7 +15702,7 @@ ix86_expand_split_stack_prologue (void)
   gcc_assert (flag_split_stack && reload_completed);
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET;
 
   /* This is the label we will branch to if we have enough stack
@@ -29320,7 +29265,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
 
       /* Set here, but it may get cleared later.  */
       if (TARGET_CALL_MS2SYSV_XLOGUES)
-	cfun->machine->call_ms2sysv = true;
+	{
+	  if (!TARGET_SSE)
+	    ;
+
+	  /* Don't break hot-patched functions.  */
+	  else if (ix86_function_ms_hook_prologue (current_function_decl))
+	    ;
+
+	  /* TODO: Cases not yet examined.  */
+	  else if (flag_split_stack)
+	    warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
+
+	  else
+	    {
+	      gcc_assert (!reload_completed);
+	      cfun->machine->call_ms2sysv = true;
+	    }
+	}
     }
 
   if (vec_len > 1)
@@ -29455,7 +29417,6 @@ ix86_init_machine_status (void)
   struct machine_function *f;
 
   f = ggc_cleared_alloc<machine_function> ();
-  f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
 
   return f;
@@ -31516,7 +31477,10 @@ ix86_static_chain (const_tree fndecl_or_type, bool
 	  if (incoming_p)
 	    {
 	      if (fndecl == current_function_decl)
-		ix86_static_chain_on_stack = true;
+		{
+		  gcc_assert (!reload_completed);
+		  ix86_static_chain_on_stack = true;
+		}
 	      return gen_frame_mem (SImode,
 				    plus_constant (Pmode,
 						   arg_pointer_rtx, -8));
@@ -52828,6 +52792,9 @@ ix86_run_selftests (void)
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ix86_legitimate_constant_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT ix86_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED ix86_frame_pointer_required
 
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 248031)
+++ gcc/config/i386/i386.h	(working copy)
@@ -2163,10 +2163,6 @@ extern int const dbx_register_map[FIRST_PSEUDO_REG
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
-#define NUM_X86_64_MS_CLOBBERED_REGS \
-  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
-
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
   gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
@@ -2448,9 +2444,76 @@ enum avx_u128_state
 \f
 #define FASTCALL_PREFIX '@'
 \f
+#ifndef USED_FOR_TARGET
+/* Structure describing stack frame layout.
+   Stack grows downward:
+
+   [arguments]
+					<- ARG_POINTER
+   saved pc
+
+   saved static chain			if ix86_static_chain_on_stack
+
+   saved frame pointer			if frame_pointer_needed
+					<- HARD_FRAME_POINTER
+   [saved regs]
+					<- reg_save_offset
+   [padding0]
+					<- stack_realign_offset
+   [saved SSE regs]
+	OR
+   [stub-saved registers for ms x64 --> sysv clobbers
+			<- Start of out-of-line, stub-saved/restored regs
+			   (see libgcc/config/i386/(sav|res)ms64*.S)
+     [XMM6-15]
+     [RSI]
+     [RDI]
+     [?RBX]		only if RBX is clobbered
+     [?RBP]		only if RBP and RBX are clobbered
+     [?R12]		only if R12 and all previous regs are clobbered
+     [?R13]		only if R13 and all previous regs are clobbered
+     [?R14]		only if R14 and all previous regs are clobbered
+     [?R15]		only if R15 and all previous regs are clobbered
+			<- end of stub-saved/restored regs
+     [padding1]
+   ]
+					<- outlined_save_offset
+					<- sse_regs_save_offset
+   [padding2]
+		       |		<- FRAME_POINTER
+   [va_arg registers]  |
+		       |
+   [frame]	       |
+		       |
+   [padding2]	       | = to_allocate
+					<- STACK_POINTER
+  */
+struct GTY(()) ix86_frame
+{
+  int nsseregs;
+  int nregs;
+  int va_arg_size;
+  int red_zone_size;
+  int outgoing_arguments_size;
+
+  /* The offsets relative to ARG_POINTER.  */
+  HOST_WIDE_INT frame_pointer_offset;
+  HOST_WIDE_INT hard_frame_pointer_offset;
+  HOST_WIDE_INT stack_pointer_offset;
+  HOST_WIDE_INT hfp_save_offset;
+  HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
+  HOST_WIDE_INT outlined_save_offset;
+  HOST_WIDE_INT sse_reg_save_offset;
+
+  /* When save_regs_using_mov is set, emit prologue using
+     move instead of push instructions.  */
+  bool save_regs_using_mov;
+};
+
 /* Machine specific frame tracking during prologue/epilogue generation.  */
 
-#ifndef USED_FOR_TARGET
 struct GTY(()) machine_frame_state
 {
   /* This pair tracks the currently active CFA as reg+offset.  When reg
@@ -2520,9 +2583,8 @@ struct GTY(()) machine_function {
   int varargs_fpr_size;
   int optimize_mode_switching[MAX_386_ENTITIES];
 
-  /* Number of saved registers USE_FAST_PROLOGUE_EPILOGUE
-     has been computed for.  */
-  int use_fast_prologue_epilogue_nregs;
+  /* Cached initial frame layout for the current function.  */
+  struct ix86_frame frame;
 
   /* For -fsplit-stack support: A stack local which holds a pointer to
      the stack arguments for a function with a variable number of

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-16 20:10                 ` Bernd Edlinger
@ 2017-05-16 21:05                   ` Bernd Edlinger
  2017-05-17  2:12                   ` Daniel Santos
  1 sibling, 0 replies; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-16 21:05 UTC (permalink / raw)
  To: Ian Lance Taylor, Daniel Santos; +Cc: gcc-patches, Uros Bizjak

On 05/16/17 21:52, Bernd Edlinger wrote:
> The calls_eh_return and ix86_static_chain_on_stack may become
> known at a later time, but after reload it should not change any more.
> To be sure, I added an assertion at ix86_static_chain, which the
> regression test did not trigger, neither with -m64 nor with -m32.
>

Oops, excuse me, actually -m32 does trigger the assert, for instance:

FAIL: gcc.target/i386/pr67770.c (internal compiler error)
FAIL: gcc.target/i386/pr67770.c (test for excess errors)
Excess errors:
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/i386/pr67770.c:25:3: 
internal compiler error: in ix86_static_chain, at config/i386/i386.c:31481
0xecf71a ix86_static_chain
         ../../gcc-trunk/gcc/config/i386/i386.c:31481
0x7b33b2 df_get_entry_block_def_set
         ../../gcc-trunk/gcc/df-scan.c:3539
0x7bb0b6 df_scan_blocks()
         ../../gcc-trunk/gcc/df-scan.c:576
0x9bc46d do_reload
         ../../gcc-trunk/gcc/ira.c:5504
0x9bc46d execute
         ../../gcc-trunk/gcc/ira.c:5624


As it looks like ix86_static_chain_on_stack will definitely change
the frame layout, and that is probably something that should not
have happened.  Without the assert it could be wrong code, right?

However, the following change would avoid the assertion:

               if (fndecl == current_function_decl
                   && !ix86_static_chain_on_stack)
                 {
                   gcc_assert (!reload_completed);
                   ix86_static_chain_on_stack = true;
                 }

At least in the test case above...


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-16 17:38               ` Ian Lance Taylor via gcc-patches
  2017-05-16 20:10                 ` Bernd Edlinger
@ 2017-05-16 21:35                 ` Daniel Santos
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-16 21:35 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Bernd Edlinger, gcc-patches

On 05/16/2017 12:19 PM, Ian Lance Taylor wrote:
> On Mon, May 15, 2017 at 10:00 PM, Daniel Santos <daniel.santos@pobox.com> wrote:
>> Ian, would you mind looking at this please?  A combination of my
>> -mcall-ms2sysv-xlogues patch with Bernd's patch is causing problems when
>> ix86_expand_split_stack_prologue() calls ix86_expand_call().
> I don't have a lot of context here.  I assume that ms2sysv is going to
> be used on Windows systems, where -fsplit-stack isn't really going to
> work anyhow, so I think it would probably be OK that reject that
> combination if it causes trouble.

Sorry I wasn't more specific.  This -mcall-ms2sysv-xlogues actually 
targets Wine, although they don't use -fsplit-stack.  My patch set as-is 
is disabled when fsplit-stack is used, but during 
ix86_compute_frame_layout, which is too late in the case of 
-fsplit-stack.  I think I should just change this to a sorry() in 
ix86_option_override_internal.

> Also, it's overkill for ix86_expand_split_stack_prologue to call
> ix86_expand_call.  The call is always to __morestack, and __morestack
> is written in assembler, so we could use a simpler version of
> ix86_expand_call if that helps.  In particular we can decide that
> __morestack doesn't clobber any unusual registers, if that is what is
> causing the problem.
>
> Ian

Well aside from the conflict of the two patches, it just looks like it 
has the potential to generate clobbers where none are needed, but I'm 
having trouble actually *proving* that, so maybe I'm just wrong.

Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-16 20:10                 ` Bernd Edlinger
  2017-05-16 21:05                   ` Bernd Edlinger
@ 2017-05-17  2:12                   ` Daniel Santos
  2017-05-17 17:43                     ` Bernd Edlinger
  2017-05-18 13:48                     ` Bernd Edlinger
  1 sibling, 2 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-17  2:12 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches, Uros Bizjak

On 05/16/2017 02:52 PM, Bernd Edlinger wrote:
> I think I solved the problem with -fsplit-stack, I am not sure
> if ix86_static_chain_on_stack might change after reload due to
> final.c possibly calling targetm.calls.static_chain, but if that
> is the case, that is an already pre-existing problem.
>
> The goal of this patch is to make all decisions regarding the
> frame layout before the reload pass, and to make sure that
> the frame layout does not change unexpectedly it asserts
> that the data that goes into the decision does not change
> after reload_completed.
>
> With the attached patch -fsplit-stack and the attribute ms_hook_prologue
> is handed directly at the ix86_expand_call, because that data is
> already known before expansion.
>
> The calls_eh_return and ix86_static_chain_on_stack may become
> known at a later time, but after reload it should not change any more.
> To be sure, I added an assertion at ix86_static_chain, which the
> regression test did not trigger, neither with -m64 nor with -m32.
>
> I have bootstrapped the patch several times, and a few times I
> encounterd a segfault in the garbage collection, but it did not
> happen every time.  Currently I think that is unrelated to this patch.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu with -m64/-m32.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.

With as many formatting errors as I seem to have had, I would like to 
fix those then you patch on top of that if you wouldn't mind terribly.  
While gcc uses subversion, git-blame is still very helpful (then again, 
since Uros committed it for me, I guess that's already off).


> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c    (revision 248031)
> +++ gcc/config/i386/i386.c    (working copy)
> @@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
>
>  /* Additional registers that are clobbered by SYSV calls.  */
>
> -unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
> +#define NUM_X86_64_MS_CLOBBERED_REGS 12
> +static int const x86_64_ms_sysv_extra_clobbered_registers
> +         [NUM_X86_64_MS_CLOBBERED_REGS] =

Is there a reason you're changing this unsigned to signed int? While 
AX_REG and such are just preprocessor macros, everywhere else it seems 
that register numbers are dealt with as unsigned ints.

> @@ -2484,13 +2486,13 @@ class xlogue_layout {
>       needs to store registers based upon data in the 
> machine_function.  */
>    HOST_WIDE_INT get_stack_space_used () const
>    {
> -    const struct machine_function &m = *cfun->machine;
> -    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
> +    const struct machine_function *m = cfun->machine;
> +    unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;

What is the reason for this change?

>
> -    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
> +    gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
>      return m_regs[last_reg].offset
> -        + (m.call_ms2sysv_pad_out ? 8 : 0)
> -        + STUB_INDEX_OFFSET;
> +       + (m->call_ms2sysv_pad_out ? 8 : 0)
> +       + STUB_INDEX_OFFSET;
>    }
>
>    /* Returns the offset for the base pointer used by the stub. */
> @@ -2532,7 +2534,7 @@ class xlogue_layout {
>    /* Lazy-inited cache of symbol names for stubs.  */
>    char 
> m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
>
> -  static const struct xlogue_layout GTY(()) 
> s_instances[XLOGUE_SET_COUNT];
> +  static const struct GTY(()) xlogue_layout 
> s_instances[XLOGUE_SET_COUNT];

Hmm, during development I originally had C-style xlogue_layout as a 
struct and later decided to make it a class and apparently forgot to 
remove the "struct" here.  None the less, it's bazaar that the GTY() 
would go in between the "struct" and the "xlogue_layout."  As I said 
before, I don't fully understand how this GTY works.  Can we just remove 
the "struct" keyword?

Also, if the way I had it was wrong, (and resulted in garbage collection 
not working right) then perhaps it was the cause of a problem I had with 
caching symbol rtx objects.  I could not get this to work because my 
cached objects would somehow become stale and I've since removed that 
code (from xlogue_layout::get_stub_rtx).  (i.e., does GTY effect 
lifespan of globals, TU statics and static C++ data members?)

>  /* Constructor for xlogue_layout.  */
> @@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT 
> stack_
>    : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
>      m_stack_align_off_in (stack_align_off_in)
>  {
> +  HOST_WIDE_INT offset = stack_align_off_in;
> +  unsigned i, j;
> +
>    memset (m_regs, 0, sizeof (m_regs));
>    memset (m_stub_names, 0, sizeof (m_stub_names));
> -
> -  HOST_WIDE_INT offset = stack_align_off_in;
> -  unsigned i, j;
>    for (i = j = 0; i < MAX_REGS; ++i)
>      {
>        unsigned regno = REG_ORDER[i];
> @@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT 
> stack_
>        m_regs[j].regno    = regno;
>        m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
>      }
> -    gcc_assert (j == m_nregs);
> +  gcc_assert (j == m_nregs);
>  }

Aside from my formatting errors,  this would actually be incorrect per 
the GNU coding conventions 
(https://gcc.gnu.org/codingconventions.html#Variable), which is probably 
based off of Effective C++ Item 26.  Obviously, we're still dealing with 
part of this as classical C++ and the rest as if it were C, but I'm 
trying to follow C++ norms in C++ classes and member functions.

> @@ -2676,7 +2681,7 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
>      {
>        int res = snprintf (name, STUB_NAME_MAX_LEN, "__%s_%u",
>                STUB_BASE_NAMES[stub], MIN_REGS + n_extra_regs);
> -      gcc_checking_assert (res <= (int)STUB_NAME_MAX_LEN);
> +      gcc_checking_assert (res < (int)STUB_NAME_MAX_LEN);

Good catch! Thank you.

> @@ -12634,10 +12573,6 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
>            && df_regs_ever_live_p (regno)));
>  }
>
> -/* Registers who's save & restore will be managed by stubs called from
> -   pro/epilogue.  */
> -static HARD_REG_SET GTY(()) stub_managed_regs;
> -
>  /* Return true if register class CL should be an additional allocno
>     class.  */
>
> @@ -12718,10 +12653,17 @@ ix86_save_reg (unsigned int regno, bool 
> maybe_eh_r
>      }
>      }
>
> -  if (ignore_outlined && cfun->machine->call_ms2sysv
> -      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
> -    return false;
> +  if (ignore_outlined && cfun->machine->call_ms2sysv)
> +    {
> +      /* Registers who's save & restore will be managed by stubs 
> called from
> +     pro/epilogue.  */
> +      HARD_REG_SET stub_managed_regs;
> +      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
>
> +      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
> +    return false;
> +    }
> +
>    if (crtl->drap_reg
>        && regno == REGNO (crtl->drap_reg)
>        && !cfun->machine->no_drap_save_restore)

This makes no sense.  The entire purpose of stub_managed_regs is to 
cache the result of xlogue_layout::compute_stub_managed_regs() and this 
would unnecessarily repeat that calculation for each time 
ix86_save_reg() is called.  Since 
xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many 
times, this makes it even worse.Which registers are being saved 
out-of-line and inline MUST be known at the time the stack layout is 
determined.  So stub_managed_regsshould either be left a TU static or 
just moved to struct machine_function.

As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs 
is calling ix86_save_reg (which isn't trivial) more often than it really 
has to, so I've refactored it.

>
> -/* Disables out-of-lined msabi to sysv pro/epilogues and emits a 
> warning if
> -   warn_once is null, or *warn_once is zero.  */
> -static void disable_call_ms2sysv_xlogues (const char *feature)
> +/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
> +static void warn_once_call_ms2sysv_xlogues (const char *feature)
>  {
> -  cfun->machine->call_ms2sysv = false;
> -  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with 
> %s.",
> -       feature);
> +  static bool warned_once = false;
> +  if (!warned_once)
> +    {
> +      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
> +           feature);
> +      warned_once = true;
> +    }
>  }
>

We probably don't want to suppress all warnings across cases.  I've got 
a new version of this that takes a mask for the "warned":

static void disable_call_ms2sysv_xlogues (const char *feature, int warned_mask)
{
   static int warned = 0;
   cfun->machine->call_ms2sysv = false;

   if (!(warned & warned_mask))
     {
       warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with %s.",
	       feature);
       warned |= warned_mask;
     }
}


>  /* When using -fsplit-stack, the allocation routines set a field in
> @@ -12836,8 +12780,9 @@ ix86_builtin_setjmp_frame_value (void)
>  /* Fill structure ix86_frame about frame of currently computed 
> function.  */
>
>  static void
> -ix86_compute_frame_layout (struct ix86_frame *frame)
> +ix86_compute_frame_layout (void)
>  {
> +  struct ix86_frame *frame = &cfun->machine->frame;
>    struct machine_function *m = cfun->machine;
>    unsigned HOST_WIDE_INT stack_alignment_needed;
>    HOST_WIDE_INT offset;
> @@ -12845,46 +12790,41 @@ static void
>    HOST_WIDE_INT size = get_frame_size ();
>    HOST_WIDE_INT to_allocate;
>
> -  CLEAR_HARD_REG_SET (stub_managed_regs);
> -
>    /* m->call_ms2sysv is initially enabled in ix86_expand_call for all 
> 64-bit
>     * ms_abi functions that call a sysv function.  We now need to 
> prune away
>     * cases where it should be disabled.  */
>    if (TARGET_64BIT && m->call_ms2sysv)
> -  {
> -    gcc_assert (TARGET_64BIT_MS_ABI);
> -    gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
> -    gcc_assert (!TARGET_SEH);
> +    {
> +      gcc_assert (TARGET_64BIT_MS_ABI);
> +      gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
> +      gcc_assert (!TARGET_SEH);
> +      gcc_assert (TARGET_SSE);
> +      gcc_assert (!ix86_using_red_zone ());
>
> -    if (!TARGET_SSE)
> -      m->call_ms2sysv = false;
> +      if (crtl->calls_eh_return)
> +    {
> +      gcc_assert (!reload_completed);
> +      m->call_ms2sysv = false;
> +      warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
> +    }
>
> -    /* Don't break hot-patched functions.  */
> -    else if (ix86_function_ms_hook_prologue (current_function_decl))
> -      m->call_ms2sysv = false;
> +      else if (ix86_static_chain_on_stack)
> +    {
> +      gcc_assert (!reload_completed);
> +      m->call_ms2sysv = false;
> +      warn_once_call_ms2sysv_xlogues ("static call chains");
> +    }
>
> -    /* TODO: Cases not yet examined.  */
> -    else if (crtl->calls_eh_return)
> -      disable_call_ms2sysv_xlogues ("__builtin_eh_return");
> +      /* Finally, compute which registers the stub will manage.  */
> +      else
> +        {
> +      HARD_REG_SET stub_managed_regs;
> +      unsigned count = xlogue_layout
> +               ::compute_stub_managed_regs (stub_managed_regs);
> +      m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
> +    }
> +    }
>
> -    else if (ix86_static_chain_on_stack)
> -      disable_call_ms2sysv_xlogues ("static call chains");
> -
> -    else if (ix86_using_red_zone ())
> -      disable_call_ms2sysv_xlogues ("red zones");
> -
> -    else if (flag_split_stack)
> -      disable_call_ms2sysv_xlogues ("split stack");
> -
> -    /* Finally, compute which registers the stub will manage. */
> -    else
> -      {
> -    unsigned count = xlogue_layout
> -             ::compute_stub_managed_regs (stub_managed_regs);
> -    m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
> -      }
> -  }
> -

> @@ -29320,7 +29265,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
> call
>
>        /* Set here, but it may get cleared later.  */
>        if (TARGET_CALL_MS2SYSV_XLOGUES)
> -    cfun->machine->call_ms2sysv = true;
> +    {
> +      if (!TARGET_SSE)
> +        ;
> +
> +      /* Don't break hot-patched functions.  */
> +      else if (ix86_function_ms_hook_prologue (current_function_decl))
> +        ;
> +
> +      /* TODO: Cases not yet examined.  */
> +      else if (flag_split_stack)
> +        warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
> +
> +      else
> +        {
> +          gcc_assert (!reload_completed);
> +          cfun->machine->call_ms2sysv = true;
> +        }
> +    }
>      }
>

Other than the local compute_stub_managed_regs (which is a problem), 
this looks like a very good improvement.  It clarifies conditions that 
will not change over the course of compiling a function (split stack, 
ms_hook_prologue, etc.) and those that can.

After some thought, I've decided that it's not better to use a sorry() 
to filter out -fsplit-stack combined with -march-ms2sysv-xlogues because 
it would break support for a TU that uses both via function attributes.  
Example:

void __attribute__((ms_abi))
a (void)
{
   call_a_sysv_fn ();
   /* stuff */
}

void __attribute__((optimize("-fsplit-stack")))
b(void)
{
   /* stuff */
}


So you're fixes for this are better.

Thanks,
Daniel



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-17  2:12                   ` Daniel Santos
@ 2017-05-17 17:43                     ` Bernd Edlinger
  2017-05-18  7:05                       ` Daniel Santos
  2017-05-18 13:48                     ` Bernd Edlinger
  1 sibling, 1 reply; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-17 17:43 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches, Uros Bizjak

On 05/17/17 04:01, Daniel Santos wrote:
> On 05/16/2017 02:52 PM, Bernd Edlinger wrote:
>> I think I solved the problem with -fsplit-stack, I am not sure
>> if ix86_static_chain_on_stack might change after reload due to
>> final.c possibly calling targetm.calls.static_chain, but if that
>> is the case, that is an already pre-existing problem.
>>
>> The goal of this patch is to make all decisions regarding the
>> frame layout before the reload pass, and to make sure that
>> the frame layout does not change unexpectedly it asserts
>> that the data that goes into the decision does not change
>> after reload_completed.
>>
>> With the attached patch -fsplit-stack and the attribute ms_hook_prologue
>> is handed directly at the ix86_expand_call, because that data is
>> already known before expansion.
>>
>> The calls_eh_return and ix86_static_chain_on_stack may become
>> known at a later time, but after reload it should not change any more.
>> To be sure, I added an assertion at ix86_static_chain, which the
>> regression test did not trigger, neither with -m64 nor with -m32.
>>
>> I have bootstrapped the patch several times, and a few times I
>> encounterd a segfault in the garbage collection, but it did not
>> happen every time.  Currently I think that is unrelated to this patch.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu with -m64/-m32.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>
> With as many formatting errors as I seem to have had, I would like to
> fix those then you patch on top of that if you wouldn't mind terribly.
> While gcc uses subversion, git-blame is still very helpful (then again,
> since Uros committed it for me, I guess that's already off).
>

Apologies if I ruined your patch...

>
>> Index: gcc/config/i386/i386.c
>> ===================================================================
>> --- gcc/config/i386/i386.c    (revision 248031)
>> +++ gcc/config/i386/i386.c    (working copy)
>> @@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
>>
>>  /* Additional registers that are clobbered by SYSV calls.  */
>>
>> -unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
>> +#define NUM_X86_64_MS_CLOBBERED_REGS 12
>> +static int const x86_64_ms_sysv_extra_clobbered_registers
>> +         [NUM_X86_64_MS_CLOBBERED_REGS] =
>
> Is there a reason you're changing this unsigned to signed int? While
> AX_REG and such are just preprocessor macros, everywhere else it seems
> that register numbers are dealt with as unsigned ints.
>

I actually there seems to be confusion about "int" vs. "unsigned int"
for regno, the advantage of int, is that it can contain -1 as a
exceptional value.  Furthermore there are 3 similar arrays just
above that also use int:

static int const x86_64_int_parameter_registers[6] =
{
   DI_REG, SI_REG, DX_REG, CX_REG, R8_REG, R9_REG
};

static int const x86_64_ms_abi_int_parameter_registers[4] =
{
   CX_REG, DX_REG, R8_REG, R9_REG
};

static int const x86_64_int_return_registers[4] =
{
   AX_REG, DX_REG, DI_REG, SI_REG
};

/* Additional registers that are clobbered by SYSV calls.  */

#define NUM_X86_64_MS_CLOBBERED_REGS 12
static int const x86_64_ms_sysv_extra_clobbered_registers
                  [NUM_X86_64_MS_CLOBBERED_REGS] =
{
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
   XMM8_REG, XMM9_REG, XMM10_REG, XMM11_REG,
   XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
};

So IMHO it looked odd to have one array use a different type in the
first place.


>> @@ -2484,13 +2486,13 @@ class xlogue_layout {
>>       needs to store registers based upon data in the
>> machine_function.  */
>>    HOST_WIDE_INT get_stack_space_used () const
>>    {
>> -    const struct machine_function &m = *cfun->machine;
>> -    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
>> +    const struct machine_function *m = cfun->machine;
>> +    unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
>
> What is the reason for this change?
>

Because a mixture of C and C++ (C wants "struct" machine_function)
looks ugly, and everywhere else in this module, "m" is a pointer and no
reference.

>>
>> -    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
>> +    gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
>>      return m_regs[last_reg].offset
>> -        + (m.call_ms2sysv_pad_out ? 8 : 0)
>> -        + STUB_INDEX_OFFSET;
>> +       + (m->call_ms2sysv_pad_out ? 8 : 0)
>> +       + STUB_INDEX_OFFSET;
>>    }
>>
>>    /* Returns the offset for the base pointer used by the stub. */
>> @@ -2532,7 +2534,7 @@ class xlogue_layout {
>>    /* Lazy-inited cache of symbol names for stubs.  */
>>    char
>> m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
>>
>> -  static const struct xlogue_layout GTY(())
>> s_instances[XLOGUE_SET_COUNT];
>> +  static const struct GTY(()) xlogue_layout
>> s_instances[XLOGUE_SET_COUNT];
>
> Hmm, during development I originally had C-style xlogue_layout as a
> struct and later decided to make it a class and apparently forgot to
> remove the "struct" here.  None the less, it's bazaar that the GTY()
> would go in between the "struct" and the "xlogue_layout."  As I said
> before, I don't fully understand how this GTY works.  Can we just remove
> the "struct" keyword?
>
> Also, if the way I had it was wrong, (and resulted in garbage collection
> not working right) then perhaps it was the cause of a problem I had with
> caching symbol rtx objects.  I could not get this to work because my
> cached objects would somehow become stale and I've since removed that
> code (from xlogue_layout::get_stub_rtx).  (i.e., does GTY effect
> lifespan of globals, TU statics and static C++ data members?)
>

Yes, I have not noticed the "struct", and agree to remove it.

I just saw every other place where GTY is used it is directly after
"struct" or "static", so my impulse was just to follow that examples.

But neither version actually makes the class GC-able.  Apparently
this class construct is too complicated for the gengtype machinery.
So I am inclined to remove the GTY keyword completely as it gives
you only false security in GC's ability to garbage collect anything
in this class.

>>  /* Constructor for xlogue_layout.  */
>> @@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT
>> stack_
>>    : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
>>      m_stack_align_off_in (stack_align_off_in)
>>  {
>> +  HOST_WIDE_INT offset = stack_align_off_in;
>> +  unsigned i, j;
>> +
>>    memset (m_regs, 0, sizeof (m_regs));
>>    memset (m_stub_names, 0, sizeof (m_stub_names));
>> -
>> -  HOST_WIDE_INT offset = stack_align_off_in;
>> -  unsigned i, j;
>>    for (i = j = 0; i < MAX_REGS; ++i)
>>      {
>>        unsigned regno = REG_ORDER[i];
>> @@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT
>> stack_
>>        m_regs[j].regno    = regno;
>>        m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
>>      }
>> -    gcc_assert (j == m_nregs);
>> +  gcc_assert (j == m_nregs);
>>  }
>
> Aside from my formatting errors,  this would actually be incorrect per
> the GNU coding conventions
> (https://gcc.gnu.org/codingconventions.html#Variable), which is probably
> based off of Effective C++ Item 26.  Obviously, we're still dealing with
> part of this as classical C++ and the rest as if it were C, but I'm
> trying to follow C++ norms in C++ classes and member functions.
>

Well, yes, but I doubt this creates a "cognitive burden on the
programmer", given this is just one line above.

>> @@ -2676,7 +2681,7 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
>>      {
>>        int res = snprintf (name, STUB_NAME_MAX_LEN, "__%s_%u",
>>                STUB_BASE_NAMES[stub], MIN_REGS + n_extra_regs);
>> -      gcc_checking_assert (res <= (int)STUB_NAME_MAX_LEN);
>> +      gcc_checking_assert (res < (int)STUB_NAME_MAX_LEN);
>
> Good catch! Thank you.
>
>> @@ -12634,10 +12573,6 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
>>            && df_regs_ever_live_p (regno)));
>>  }
>>
>> -/* Registers who's save & restore will be managed by stubs called from
>> -   pro/epilogue.  */
>> -static HARD_REG_SET GTY(()) stub_managed_regs;
>> -
>>  /* Return true if register class CL should be an additional allocno
>>     class.  */
>>
>> @@ -12718,10 +12653,17 @@ ix86_save_reg (unsigned int regno, bool
>> maybe_eh_r
>>      }
>>      }
>>
>> -  if (ignore_outlined && cfun->machine->call_ms2sysv
>> -      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>> -    return false;
>> +  if (ignore_outlined && cfun->machine->call_ms2sysv)
>> +    {
>> +      /* Registers who's save & restore will be managed by stubs
>> called from
>> +     pro/epilogue.  */
>> +      HARD_REG_SET stub_managed_regs;
>> +      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
>>
>> +      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>> +    return false;
>> +    }
>> +
>>    if (crtl->drap_reg
>>        && regno == REGNO (crtl->drap_reg)
>>        && !cfun->machine->no_drap_save_restore)
>
> This makes no sense.  The entire purpose of stub_managed_regs is to
> cache the result of xlogue_layout::compute_stub_managed_regs() and this
> would unnecessarily repeat that calculation for each time
> ix86_save_reg() is called.  Since
> xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many
> times, this makes it even worse.Which registers are being saved
> out-of-line and inline MUST be known at the time the stack layout is
> determined.  So stub_managed_regsshould either be left a TU static or
> just moved to struct machine_function.
>
> As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs
> is calling ix86_save_reg (which isn't trivial) more often than it really
> has to, so I've refactored it.
>

((Actually ix86_save_reg is not an expensive function.
It relies entirely on cached information.))

As I told you, it is not good to rely on the pass-manager to run all
passes for one function in a row and moreover cfun is actually a
stack of functions, see push_cfun ().  So a static value will not
reflect the same value as cfun->machine does.

I would like to move that data to i386.h but I think
we should not add new dependencies to i386.h because it
is used everywhere.  The problem is HARD_REG_SET not being
used before in any target header file, and I don't want to be
the first one.

So yes this is not a perfect solution yet, but I still doubt that
a static value is a better solution.

>>
>> -/* Disables out-of-lined msabi to sysv pro/epilogues and emits a
>> warning if
>> -   warn_once is null, or *warn_once is zero.  */
>> -static void disable_call_ms2sysv_xlogues (const char *feature)
>> +/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
>> +static void warn_once_call_ms2sysv_xlogues (const char *feature)
>>  {
>> -  cfun->machine->call_ms2sysv = false;
>> -  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with
>> %s.",
>> -       feature);
>> +  static bool warned_once = false;
>> +  if (!warned_once)
>> +    {
>> +      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
>> +           feature);
>> +      warned_once = true;
>> +    }
>>  }
>>
>
> We probably don't want to suppress all warnings across cases.  I've got
> a new version of this that takes a mask for the "warned":
>
> static void disable_call_ms2sysv_xlogues (const char *feature, int
> warned_mask)
> {
>   static int warned = 0;
>   cfun->machine->call_ms2sysv = false;
>
>   if (!(warned & warned_mask))
>     {
>       warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with
> %s.",
>            feature);
>       warned |= warned_mask;
>     }
> }
>
>

That would be a candidate for a follow-up patch.

Note, that the first parameter ought to be a warning option name, if it
is just refering to a code generation option that is IMHO mis-leading
to the user.

Thanks,
Bernd.

>>  /* When using -fsplit-stack, the allocation routines set a field in
>> @@ -12836,8 +12780,9 @@ ix86_builtin_setjmp_frame_value (void)
>>  /* Fill structure ix86_frame about frame of currently computed
>> function.  */
>>
>>  static void
>> -ix86_compute_frame_layout (struct ix86_frame *frame)
>> +ix86_compute_frame_layout (void)
>>  {
>> +  struct ix86_frame *frame = &cfun->machine->frame;
>>    struct machine_function *m = cfun->machine;
>>    unsigned HOST_WIDE_INT stack_alignment_needed;
>>    HOST_WIDE_INT offset;
>> @@ -12845,46 +12790,41 @@ static void
>>    HOST_WIDE_INT size = get_frame_size ();
>>    HOST_WIDE_INT to_allocate;
>>
>> -  CLEAR_HARD_REG_SET (stub_managed_regs);
>> -
>>    /* m->call_ms2sysv is initially enabled in ix86_expand_call for all
>> 64-bit
>>     * ms_abi functions that call a sysv function.  We now need to
>> prune away
>>     * cases where it should be disabled.  */
>>    if (TARGET_64BIT && m->call_ms2sysv)
>> -  {
>> -    gcc_assert (TARGET_64BIT_MS_ABI);
>> -    gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
>> -    gcc_assert (!TARGET_SEH);
>> +    {
>> +      gcc_assert (TARGET_64BIT_MS_ABI);
>> +      gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
>> +      gcc_assert (!TARGET_SEH);
>> +      gcc_assert (TARGET_SSE);
>> +      gcc_assert (!ix86_using_red_zone ());
>>
>> -    if (!TARGET_SSE)
>> -      m->call_ms2sysv = false;
>> +      if (crtl->calls_eh_return)
>> +    {
>> +      gcc_assert (!reload_completed);
>> +      m->call_ms2sysv = false;
>> +      warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
>> +    }
>>
>> -    /* Don't break hot-patched functions.  */
>> -    else if (ix86_function_ms_hook_prologue (current_function_decl))
>> -      m->call_ms2sysv = false;
>> +      else if (ix86_static_chain_on_stack)
>> +    {
>> +      gcc_assert (!reload_completed);
>> +      m->call_ms2sysv = false;
>> +      warn_once_call_ms2sysv_xlogues ("static call chains");
>> +    }
>>
>> -    /* TODO: Cases not yet examined.  */
>> -    else if (crtl->calls_eh_return)
>> -      disable_call_ms2sysv_xlogues ("__builtin_eh_return");
>> +      /* Finally, compute which registers the stub will manage.  */
>> +      else
>> +        {
>> +      HARD_REG_SET stub_managed_regs;
>> +      unsigned count = xlogue_layout
>> +               ::compute_stub_managed_regs (stub_managed_regs);
>> +      m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
>> +    }
>> +    }
>>
>> -    else if (ix86_static_chain_on_stack)
>> -      disable_call_ms2sysv_xlogues ("static call chains");
>> -
>> -    else if (ix86_using_red_zone ())
>> -      disable_call_ms2sysv_xlogues ("red zones");
>> -
>> -    else if (flag_split_stack)
>> -      disable_call_ms2sysv_xlogues ("split stack");
>> -
>> -    /* Finally, compute which registers the stub will manage. */
>> -    else
>> -      {
>> -    unsigned count = xlogue_layout
>> -             ::compute_stub_managed_regs (stub_managed_regs);
>> -    m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
>> -      }
>> -  }
>> -
>
>> @@ -29320,7 +29265,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx
>> call
>>
>>        /* Set here, but it may get cleared later.  */
>>        if (TARGET_CALL_MS2SYSV_XLOGUES)
>> -    cfun->machine->call_ms2sysv = true;
>> +    {
>> +      if (!TARGET_SSE)
>> +        ;
>> +
>> +      /* Don't break hot-patched functions.  */
>> +      else if (ix86_function_ms_hook_prologue (current_function_decl))
>> +        ;
>> +
>> +      /* TODO: Cases not yet examined.  */
>> +      else if (flag_split_stack)
>> +        warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
>> +
>> +      else
>> +        {
>> +          gcc_assert (!reload_completed);
>> +          cfun->machine->call_ms2sysv = true;
>> +        }
>> +    }
>>      }
>>
>
> Other than the local compute_stub_managed_regs (which is a problem),
> this looks like a very good improvement.  It clarifies conditions that
> will not change over the course of compiling a function (split stack,
> ms_hook_prologue, etc.) and those that can.
>
> After some thought, I've decided that it's not better to use a sorry()
> to filter out -fsplit-stack combined with -march-ms2sysv-xlogues because
> it would break support for a TU that uses both via function attributes.
> Example:
>
> void __attribute__((ms_abi))
> a (void)
> {
>   call_a_sysv_fn ();
>   /* stuff */
> }
>
> void __attribute__((optimize("-fsplit-stack")))
> b(void)
> {
>   /* stuff */
> }
>
>
> So you're fixes for this are better.
>
> Thanks,
> Daniel
>
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-15  2:23       ` Daniel Santos
  2017-05-15 20:41         ` Bernd Edlinger
@ 2017-05-17 18:41         ` Bernd Edlinger
  2017-05-18  7:14           ` Daniel Santos
  1 sibling, 1 reply; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-17 18:41 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches

On 05/15/17 03:39, Daniel Santos wrote:
> On 05/14/2017 11:31 AM, Bernd Edlinger wrote:
>> Hi Daniel,
>>
>> there is one thing I don't understand in your patch:
>> That is, it introduces a static value:
>>
>> /* Registers who's save & restore will be managed by stubs called from
>>      pro/epilogue.  */
>> static HARD_REG_SET GTY(()) stub_managed_regs;
>>
>> This seems to be set as a side effect of ix86_compute_frame_layout,
>> and depends on the register usage of the current function.
>> But values that depend on the current function need usually be
>> attached to cfun->machine, because the passes can run in parallel
>> unless I am completely mistaken, and the stub_managed_regs may
>> therefore be computed from a different function.
>>
>>
>> Bernd.
>
> I should add that if you want to run faster tests just on the ms to sysv
> abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if
> that succeeds run the full testsuite.
>
> Daniel

Hmm, that's funny...

If I use "make check-c RUNTESTFLAGS="ms-sysv.exp" -j8" it seems to work,
but if I omit the -j8 it fails:

make check-c RUNTESTFLAGS="ms-sysv.exp"
...Test Run By ed on Wed May 17 20:38:24 2017
Native configuration is x86_64-pc-linux-gnu

		=== gcc tests ===

Schedule of variations:
     unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file 
for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for 
target.
Using /home/ed/gnu/gcc-trunk/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.
Running 
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp 
...
ERROR: tcl error sourcing 
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp.
ERROR: no such variable
     (read trace on "env(GCC_RUNTEST_PARALLELIZE_DIR)")
     invoked from within
"set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv""
     (file 
"/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp" 
line 154)
     invoked from within
"source 
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
     ("uplevel" body line 1)
     invoked from within
"uplevel #0 source 
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
     invoked from within
"catch "uplevel #0 source $test_file_name""

		=== gcc Summary ===

/home/ed/gnu/gcc-build/gcc/xgcc  version 8.0.0 20170514 (experimental) 
(GCC)

make[2]: Leaving directory `/home/ed/gnu/gcc-build/gcc'
make[1]: Leaving directory `/home/ed/gnu/gcc-build/gcc'



Bernd.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-17 17:43                     ` Bernd Edlinger
@ 2017-05-18  7:05                       ` Daniel Santos
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-18  7:05 UTC (permalink / raw)
  To: gcc-patches

On 05/17/2017 12:41 PM, Bernd Edlinger wrote:
> Apologies if I ruined your patch...

As I said before, I'm the new guy here. :) So when this is done I'll 
rebase my changes.  I have some test stuff to fix and some refactoring 
and refinements to xlogue_layout::compute_stub_managed_regs(). And then 
I'll find a solution to the stub_managed_regs after that.

>>> Index: gcc/config/i386/i386.c
>>> ===================================================================
>>> --- gcc/config/i386/i386.c    (revision 248031)
>>> +++ gcc/config/i386/i386.c    (working copy)
>>> @@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
>>>
>>>   /* Additional registers that are clobbered by SYSV calls.  */
>>>
>>> -unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
>>> +#define NUM_X86_64_MS_CLOBBERED_REGS 12
>>> +static int const x86_64_ms_sysv_extra_clobbered_registers
>>> +         [NUM_X86_64_MS_CLOBBERED_REGS] =
>> Is there a reason you're changing this unsigned to signed int? While
>> AX_REG and such are just preprocessor macros, everywhere else it seems
>> that register numbers are dealt with as unsigned ints.
>>
> I actually there seems to be confusion about "int" vs. "unsigned int"
> for regno, the advantage of int, is that it can contain -1 as a
> exceptional value.  Furthermore there are 3 similar arrays just
> above that also use int:
>
> static int const x86_64_int_parameter_registers[6] =
> {
>     DI_REG, SI_REG, DX_REG, CX_REG, R8_REG, R9_REG
> };
>
> static int const x86_64_ms_abi_int_parameter_registers[4] =
> {
>     CX_REG, DX_REG, R8_REG, R9_REG
> };
>
> static int const x86_64_int_return_registers[4] =
> {
>     AX_REG, DX_REG, DI_REG, SI_REG
> };
>
> /* Additional registers that are clobbered by SYSV calls.  */
>
> #define NUM_X86_64_MS_CLOBBERED_REGS 12
> static int const x86_64_ms_sysv_extra_clobbered_registers
>                    [NUM_X86_64_MS_CLOBBERED_REGS] =
> {
>     SI_REG, DI_REG,
>     XMM6_REG, XMM7_REG,
>     XMM8_REG, XMM9_REG, XMM10_REG, XMM11_REG,
>     XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
> };
>
> So IMHO it looked odd to have one array use a different type in the
> first place.

OK.  I think that when I originally started this I was using elements of 
this array in comparisons and got the signed/unsigned warning and 
changed them.  None of the code gives that warning now however.

>>> @@ -2484,13 +2486,13 @@ class xlogue_layout {
>>>        needs to store registers based upon data in the
>>> machine_function.  */
>>>     HOST_WIDE_INT get_stack_space_used () const
>>>     {
>>> -    const struct machine_function &m = *cfun->machine;
>>> -    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
>>> +    const struct machine_function *m = cfun->machine;
>>> +    unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
>> What is the reason for this change?
>>
> Because a mixture of C and C++ (C wants "struct" machine_function)
> looks ugly, and everywhere else in this module, "m" is a pointer and no
> reference.

I see, consistency with the rest of the file.

>>> -    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
>>> +    gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
>>>       return m_regs[last_reg].offset
>>> -        + (m.call_ms2sysv_pad_out ? 8 : 0)
>>> -        + STUB_INDEX_OFFSET;
>>> +       + (m->call_ms2sysv_pad_out ? 8 : 0)
>>> +       + STUB_INDEX_OFFSET;
>>>     }
>>>
>>>     /* Returns the offset for the base pointer used by the stub. */
>>> @@ -2532,7 +2534,7 @@ class xlogue_layout {
>>>     /* Lazy-inited cache of symbol names for stubs.  */
>>>     char
>>> m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
>>>
>>> -  static const struct xlogue_layout GTY(())
>>> s_instances[XLOGUE_SET_COUNT];
>>> +  static const struct GTY(()) xlogue_layout
>>> s_instances[XLOGUE_SET_COUNT];
>> Hmm, during development I originally had C-style xlogue_layout as a
>> struct and later decided to make it a class and apparently forgot to
>> remove the "struct" here.  None the less, it's bazaar that the GTY()
>> would go in between the "struct" and the "xlogue_layout."  As I said
>> before, I don't fully understand how this GTY works.  Can we just remove
>> the "struct" keyword?
>>
>> Also, if the way I had it was wrong, (and resulted in garbage collection
>> not working right) then perhaps it was the cause of a problem I had with
>> caching symbol rtx objects.  I could not get this to work because my
>> cached objects would somehow become stale and I've since removed that
>> code (from xlogue_layout::get_stub_rtx).  (i.e., does GTY effect
>> lifespan of globals, TU statics and static C++ data members?)
>>
> Yes, I have not noticed the "struct", and agree to remove it.
>
> I just saw every other place where GTY is used it is directly after
> "struct" or "static", so my impulse was just to follow that examples.

Yeah, and not understanding how it worked I was just trying to follow suit.

> But neither version actually makes the class GC-able.  Apparently
> this class construct is too complicated for the gengtype machinery.
> So I am inclined to remove the GTY keyword completely as it gives
> you only false security in GC's ability to garbage collect anything
> in this class.

That's helpful, thanks.  Feel free to nuke it (or I will in my follow up).


>>>   /* Constructor for xlogue_layout.  */
>>> @@ -2639,11 +2643,11 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT
>>> stack_
>>>     : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
>>>       m_stack_align_off_in (stack_align_off_in)
>>>   {
>>> +  HOST_WIDE_INT offset = stack_align_off_in;
>>> +  unsigned i, j;
>>> +
>>>     memset (m_regs, 0, sizeof (m_regs));
>>>     memset (m_stub_names, 0, sizeof (m_stub_names));
>>> -
>>> -  HOST_WIDE_INT offset = stack_align_off_in;
>>> -  unsigned i, j;
>>>     for (i = j = 0; i < MAX_REGS; ++i)
>>>       {
>>>         unsigned regno = REG_ORDER[i];
>>> @@ -2662,11 +2666,12 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT
>>> stack_
>>>         m_regs[j].regno    = regno;
>>>         m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
>>>       }
>>> -    gcc_assert (j == m_nregs);
>>> +  gcc_assert (j == m_nregs);
>>>   }
>> Aside from my formatting errors,  this would actually be incorrect per
>> the GNU coding conventions
>> (https://gcc.gnu.org/codingconventions.html#Variable), which is probably
>> based off of Effective C++ Item 26.  Obviously, we're still dealing with
>> part of this as classical C++ and the rest as if it were C, but I'm
>> trying to follow C++ norms in C++ classes and member functions.
>>
> Well, yes, but I doubt this creates a "cognitive burden on the
> programmer", given this is just one line above.

I don't know the justification for the GNU standard here, but the 
Effective C++ Item 26 is more about efficiency and is probably somewhat 
outdated now that compilers are better at reordering.  It's really aimed 
at objects with constructors. Anyway I didn't mean to get out my hair 
splitting knife. :)

>>> @@ -2676,7 +2681,7 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
>>>       {
>>>         int res = snprintf (name, STUB_NAME_MAX_LEN, "__%s_%u",
>>>                 STUB_BASE_NAMES[stub], MIN_REGS + n_extra_regs);
>>> -      gcc_checking_assert (res <= (int)STUB_NAME_MAX_LEN);
>>> +      gcc_checking_assert (res < (int)STUB_NAME_MAX_LEN);
>> Good catch! Thank you.
>>
>>> @@ -12634,10 +12573,6 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
>>>             && df_regs_ever_live_p (regno)));
>>>   }
>>>
>>> -/* Registers who's save & restore will be managed by stubs called from
>>> -   pro/epilogue.  */
>>> -static HARD_REG_SET GTY(()) stub_managed_regs;
>>> -
>>>   /* Return true if register class CL should be an additional allocno
>>>      class.  */
>>>
>>> @@ -12718,10 +12653,17 @@ ix86_save_reg (unsigned int regno, bool
>>> maybe_eh_r
>>>       }
>>>       }
>>>
>>> -  if (ignore_outlined && cfun->machine->call_ms2sysv
>>> -      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>> -    return false;
>>> +  if (ignore_outlined && cfun->machine->call_ms2sysv)
>>> +    {
>>> +      /* Registers who's save & restore will be managed by stubs
>>> called from
>>> +     pro/epilogue.  */
>>> +      HARD_REG_SET stub_managed_regs;
>>> +      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
>>>
>>> +      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>> +    return false;
>>> +    }
>>> +
>>>     if (crtl->drap_reg
>>>         && regno == REGNO (crtl->drap_reg)
>>>         && !cfun->machine->no_drap_save_restore)
>> This makes no sense.  The entire purpose of stub_managed_regs is to
>> cache the result of xlogue_layout::compute_stub_managed_regs() and this
>> would unnecessarily repeat that calculation for each time
>> ix86_save_reg() is called.  Since
>> xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many
>> times, this makes it even worse.Which registers are being saved
>> out-of-line and inline MUST be known at the time the stack layout is
>> determined.  So stub_managed_regsshould either be left a TU static or
>> just moved to struct machine_function.
>>
>> As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs
>> is calling ix86_save_reg (which isn't trivial) more often than it really
>> has to, so I've refactored it.
>>
> ((Actually ix86_save_reg is not an expensive function.
> It relies entirely on cached information.))
>
> As I told you, it is not good to rely on the pass-manager to run all
> passes for one function in a row and moreover cfun is actually a
> stack of functions, see push_cfun ().  So a static value will not
> reflect the same value as cfun->machine does.

Oh, that's cool!

> I would like to move that data to i386.h but I think
> we should not add new dependencies to i386.h because it
> is used everywhere.  The problem is HARD_REG_SET not being
> used before in any target header file, and I don't want to be
> the first one.
>
> So yes this is not a perfect solution yet, but I still doubt that
> a static value is a better solution.

Well, before I discovered that there was a HARD_REG_SET I had a 
home-cooked bitfield that we could go back to.  I know that you're right 
about "first make it right, then make it fast" and that compulsion to 
make it "absolutely right and fast" is my Achelies' heel.  I'm going to 
focus on the test fixes and other refactoring first.

>>> -/* Disables out-of-lined msabi to sysv pro/epilogues and emits a
>>> warning if
>>> -   warn_once is null, or *warn_once is zero.  */
>>> -static void disable_call_ms2sysv_xlogues (const char *feature)
>>> +/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
>>> +static void warn_once_call_ms2sysv_xlogues (const char *feature)
>>>   {
>>> -  cfun->machine->call_ms2sysv = false;
>>> -  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with
>>> %s.",
>>> -       feature);
>>> +  static bool warned_once = false;
>>> +  if (!warned_once)
>>> +    {
>>> +      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
>>> +           feature);
>>> +      warned_once = true;
>>> +    }
>>>   }
>>>
>> We probably don't want to suppress all warnings across cases.  I've got
>> a new version of this that takes a mask for the "warned":
>>
>> static void disable_call_ms2sysv_xlogues (const char *feature, int
>> warned_mask)
>> {
>>    static int warned = 0;
>>    cfun->machine->call_ms2sysv = false;
>>
>>    if (!(warned & warned_mask))
>>      {
>>        warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with
>> %s.",
>>             feature);
>>        warned |= warned_mask;
>>      }
>> }
>>
>>
> That would be a candidate for a follow-up patch.
>
> Note, that the first parameter ought to be a warning option name, if it
> is just refering to a code generation option that is IMHO mis-leading
> to the user.

Yes, I think this is just wrong.  If I'm going to issue a warning, I 
need a real warning that can be silenced.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-17 18:41         ` Bernd Edlinger
@ 2017-05-18  7:14           ` Daniel Santos
  2017-05-19  5:12             ` Daniel Santos
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Santos @ 2017-05-18  7:14 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches


On 05/17/2017 01:39 PM, Bernd Edlinger wrote:
> On 05/15/17 03:39, Daniel Santos wrote:
>
>> I should add that if you want to run faster tests just on the ms to sysv
>> abi code, you can use make RUNTESTFLAGS="ms-sysv.exp" check and then if
>> that succeeds run the full testsuite.
>>
>> Daniel
> Hmm, that's funny...
>
> If I use "make check-c RUNTESTFLAGS="ms-sysv.exp" -j8" it seems to work,
> but if I omit the -j8 it fails:
>
> make check-c RUNTESTFLAGS="ms-sysv.exp"
> ...Test Run By ed on Wed May 17 20:38:24 2017
> Native configuration is x86_64-pc-linux-gnu
>
> 		=== gcc tests ===
>
> Schedule of variations:
>       unix
>
> Running target unix
> Using /usr/share/dejagnu/baseboards/unix.exp as board description file
> for target.
> Using /usr/share/dejagnu/config/unix.exp as generic interface file for
> target.
> Using /home/ed/gnu/gcc-trunk/gcc/testsuite/config/default.exp as
> tool-and-target-specific interface file.
> Running
> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp
> ...
> ERROR: tcl error sourcing
> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp.
> ERROR: no such variable
>       (read trace on "env(GCC_RUNTEST_PARALLELIZE_DIR)")
>       invoked from within
> "set parallel_dir "$env(GCC_RUNTEST_PARALLELIZE_DIR)/abi-ms-sysv""
>       (file
> "/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
> line 154)
>       invoked from within
> "source
> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
>       ("uplevel" body line 1)
>       invoked from within
> "uplevel #0 source
> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp"
>       invoked from within
> "catch "uplevel #0 source $test_file_name""
>
> 		=== gcc Summary ===
>
> /home/ed/gnu/gcc-build/gcc/xgcc  version 8.0.0 20170514 (experimental)
> (GCC)
>
> make[2]: Leaving directory `/home/ed/gnu/gcc-build/gcc'
> make[1]: Leaving directory `/home/ed/gnu/gcc-build/gcc'

Hmm, that might be something I hadn't actually tried.  And if I run it 
in a directory where I had previously run a multi-job check it doesn't 
blow up (maybe because the directory is already there?)  Due to the 
nature of my test program, I had to break with tradition and implement 
something akin to the test that generates random structs (I forgot what 
that one is called).  It ended up breaking the bastardized 
parallelization scheme, so I had to implement my own re-bastardized 
scheme.  Looks like I can just skip parallelization if 
GCC_RUNTEST_PARALLELIZE_DIR isn't defined.

I have another Solaris test issue on PR 80759 so I'll fix that along 
with it.

Thanks,
Daniel

PS: Oh! it might be due to the difference between -j1 and no -j argument.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-17  2:12                   ` Daniel Santos
  2017-05-17 17:43                     ` Bernd Edlinger
@ 2017-05-18 13:48                     ` Bernd Edlinger
  2017-05-19  3:13                       ` Daniel Santos
  1 sibling, 1 reply; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-18 13:48 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 1869 bytes --]

On 05/17/17 04:01, Daniel Santos wrote:
>>
>> -  if (ignore_outlined && cfun->machine->call_ms2sysv
>> -      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>> -    return false;
>> +  if (ignore_outlined && cfun->machine->call_ms2sysv)
>> +    {
>> +      /* Registers who's save & restore will be managed by stubs
>> called from
>> +     pro/epilogue.  */
>> +      HARD_REG_SET stub_managed_regs;
>> +      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
>>
>> +      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>> +    return false;
>> +    }
>> +
>>    if (crtl->drap_reg
>>        && regno == REGNO (crtl->drap_reg)
>>        && !cfun->machine->no_drap_save_restore)
>
> This makes no sense.  The entire purpose of stub_managed_regs is to
> cache the result of xlogue_layout::compute_stub_managed_regs() and this
> would unnecessarily repeat that calculation for each time
> ix86_save_reg() is called.  Since
> xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many
> times, this makes it even worse.Which registers are being saved
> out-of-line and inline MUST be known at the time the stack layout is
> determined.  So stub_managed_regsshould either be left a TU static or
> just moved to struct machine_function.
>
> As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs
> is calling ix86_save_reg (which isn't trivial) more often than it really
> has to, so I've refactored it.
>

Well, meanwhile I think the stub_managed_regs contain zero information
and need not be saved at all, because it can easily be reconstructed
from  m->call_ms2sysv_extra_regs.

See the attached new version.  Daniel does it work for you?


Bootstrapped and reg-tested on x86_64-pc-linux-gnu with
--target_board=unix\{,-m32\}.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-frame-layout-i386.txt --]
[-- Type: text/plain, Size: 1957 bytes --]

2017-05-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/i386/i386.c (x86_64_ms_sysv_extra_clobbered_registers): Make
	static.
	(xlogue_layout::get_stack_space_used, xlogue_layout::s_instances,
	xlogue_layout::get_instance, logue_layout::xlogue_layout,
	xlogue_layout::get_stub_rtx, sp_valid_at, fp_valid_at,
	choose_basereg): Formatting.
	(xlogue_layout::get_stub_name): Avoid const-cast.
	(xlogue_layout::compute_stub_managed_regs): Rename to...
	(xlogue_layout::count_stub_managed_regs): ...this.
	(xlogue_layout::is_stub_managed_reg): New function.
	(xlogue_layout::m_stub_names): Rename to...
	(xlogue_layout::s_stub_names): ...this.
	(xlogue_layout::STUB_INDEX_OFFSET, xlogue_layout::MIN_REGS,
	xlogue_layout::MAX_REGS, xlogue_layout::MAX_EXTRA_REGS,
	xlogue_layout::VARIANT_COUNT, xlogue_layout::STUB_NAME_MAX_LEN,
	xlogue_layout::s_stub_names): Instantiate statics.
	(stub_managed_regs): Remove.
	(ix86_save_reg): Use xlogue_layout::compute_stub_managed_regs.
	(disable_call_ms2sysv_xlogues): Rename to...
	(warn_once_call_ms2sysv_xlogues): ...this, and warn only once.
	(ix86_initial_elimination_offset, ix86_expand_call): Fix call_ms2sysv
	warning logic.
	(ix86_static_chain): Make sure that ix86_static_chain_on_stack can't
	change after reload_completed.
	(ix86_can_use_return_insn_p): Use the ix86_frame data structure
	directly.
	(ix86_expand_prologue): Likewise.
	(ix86_expand_epilogue): Likewise.
	(ix86_expand_split_stack_prologue): Likewise.
	(ix86_compute_frame_layout): Remove frame parameter ...
	(TARGET_COMPUTE_FRAME_LAYOUT): ... and export it as a target hook.
	(ix86_finalize_stack_realign_flags): Call ix86_compute_frame_layout
	only if necessary.
	(ix86_init_machine_status): Don't set use_fast_prologue_epilogue_nregs.
	(ix86_frame): Move from here ...
	* config/i386/i386.h (ix86_frame): ... to here.
	(machine_function): Remove use_fast_prologue_epilogue_nregs, cache the
	complete ix86_frame data structure instead.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-frame-layout-i386.diff --]
[-- Type: text/x-patch; name="patch-frame-layout-i386.diff", Size: 25862 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 248031)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
+#define NUM_X86_64_MS_CLOBBERED_REGS 12
+static int const x86_64_ms_sysv_extra_clobbered_registers
+		 [NUM_X86_64_MS_CLOBBERED_REGS] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -2474,6 +2476,7 @@ class xlogue_layout {
 
   const char *get_stub_name (enum xlogue_stub stub,
 			     unsigned n_extra_args) const;
+
   /* Returns an rtx for the stub's symbol based upon
        1.) the specified stub (save, restore or restore_ret) and
        2.) the value of cfun->machine->call_ms2sysv_extra_regs and
@@ -2484,13 +2487,13 @@ class xlogue_layout {
      needs to store registers based upon data in the machine_function.  */
   HOST_WIDE_INT get_stack_space_used () const
   {
-    const struct machine_function &m = *cfun->machine;
-    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+    const struct machine_function *m = cfun->machine;
+    unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
-    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+    gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
     return m_regs[last_reg].offset
-	    + (m.call_ms2sysv_pad_out ? 8 : 0)
-	    + STUB_INDEX_OFFSET;
+	   + (m->call_ms2sysv_pad_out ? 8 : 0)
+	   + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -2500,7 +2503,8 @@ class xlogue_layout {
   }
 
   static const struct xlogue_layout &get_instance ();
-  static unsigned compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs);
+  static unsigned count_stub_managed_regs ();
+  static bool is_stub_managed_reg (unsigned regno, unsigned count);
 
   static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
   static const unsigned MIN_REGS = NUM_X86_64_MS_CLOBBERED_REGS;
@@ -2530,9 +2534,10 @@ class xlogue_layout {
   struct reginfo m_regs[MAX_REGS];
 
   /* Lazy-inited cache of symbol names for stubs.  */
-  char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
+  static char s_stub_names[XLOGUE_SET_COUNT][XLOGUE_STUB_COUNT]
+			  [VARIANT_COUNT][STUB_NAME_MAX_LEN];
 
-  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+  static const xlogue_layout s_instances[XLOGUE_SET_COUNT];
 };
 
 const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
@@ -2572,9 +2577,20 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
     R15_REG,	/* 0xe0		0xe8		0xd8		0xe0	*/
 };
 
+/* Instantiate static const values.  */
+const HOST_WIDE_INT xlogue_layout::STUB_INDEX_OFFSET;
+const unsigned xlogue_layout::MIN_REGS;
+const unsigned xlogue_layout::MAX_REGS;
+const unsigned xlogue_layout::MAX_EXTRA_REGS;
+const unsigned xlogue_layout::VARIANT_COUNT;
+const unsigned xlogue_layout::STUB_NAME_MAX_LEN;
+
+/* Initialize xlogue_layout::s_stub_names to zero.  */
+char xlogue_layout::s_stub_names[XLOGUE_SET_COUNT][XLOGUE_STUB_COUNT]
+				[VARIANT_COUNT][STUB_NAME_MAX_LEN];
+
 /* Instantiates all xlogue_layout instances.  */
-const struct xlogue_layout GTY(())
-xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
+const xlogue_layout xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
   xlogue_layout (0, false),
   xlogue_layout (8, false),
   xlogue_layout (0, true),
@@ -2583,7 +2599,8 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
 
 /* Return an appropriate const instance of xlogue_layout based upon values
    in cfun->machine and crtl.  */
-const struct xlogue_layout &xlogue_layout::get_instance ()
+const struct xlogue_layout &
+xlogue_layout::get_instance ()
 {
   enum xlogue_stub_sets stub_set;
   bool aligned_plus_8 = cfun->machine->call_ms2sysv_pad_in;
@@ -2600,50 +2617,54 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
   return s_instances[stub_set];
 }
 
-/* Determine which clobbered registers can be saved by the stub and store
-   them in stub_managed_regs.  Returns the count of registers the stub will
-   save and restore.  */
+/* Determine how many clobbered registers can be saved by the stub.
+   Returns the count of registers the stub will save and restore.  */
 unsigned
-xlogue_layout::compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs)
+xlogue_layout::count_stub_managed_regs ()
 {
   bool hfp = frame_pointer_needed || stack_realign_fp;
-
   unsigned i, count;
   unsigned regno;
 
-  for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; ++i)
+  for (count = i = MIN_REGS; i < MAX_REGS; ++i)
     {
-      regno = x86_64_ms_sysv_extra_clobbered_registers[i];
-      if (regno == BP_REG && hfp)
-	continue;
-      if (!ix86_save_reg (regno, false, false))
-	return 0;
-    }
-
-  for (count = i = 0; i < MAX_REGS; ++i)
-    {
       regno = REG_ORDER[i];
       if (regno == BP_REG && hfp)
 	continue;
       if (!ix86_save_reg (regno, false, false))
 	break;
-      add_to_hard_reg_set (&stub_managed_regs, DImode, regno);
       ++count;
     }
-    gcc_assert (count >= MIN_REGS && count <= MAX_REGS);
-    return count;
+  return count;
 }
 
+/* Determine if register REGNO is a stub managed register given the
+   total COUNT of stub managed registers.  */
+bool
+xlogue_layout::is_stub_managed_reg (unsigned regno, unsigned count)
+{
+  bool hfp = frame_pointer_needed || stack_realign_fp;
+  unsigned i;
+
+  for (i = 0; i < count; ++i)
+    {
+      gcc_assert (i < MAX_REGS);
+      if (REG_ORDER[i] == BP_REG && hfp)
+	++count;
+      else if (REG_ORDER[i] == regno)
+	return true;
+    }
+  return false;
+}
+
 /* Constructor for xlogue_layout.  */
 xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_align_off_in, bool hfp)
   : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
     m_stack_align_off_in (stack_align_off_in)
 {
-  memset (m_regs, 0, sizeof (m_regs));
-  memset (m_stub_names, 0, sizeof (m_stub_names));
-
   HOST_WIDE_INT offset = stack_align_off_in;
   unsigned i, j;
+
   for (i = j = 0; i < MAX_REGS; ++i)
     {
       unsigned regno = REG_ORDER[i];
@@ -2662,14 +2683,14 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
       m_regs[j].regno    = regno;
       m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
     }
-    gcc_assert (j == m_nregs);
+  gcc_assert (j == m_nregs);
 }
 
-const char *xlogue_layout::get_stub_name (enum xlogue_stub stub,
-					  unsigned n_extra_regs) const
+const char *
+xlogue_layout::get_stub_name (enum xlogue_stub stub,
+			      unsigned n_extra_regs) const
 {
-  xlogue_layout *writey_this = const_cast<xlogue_layout*>(this);
-  char *name = writey_this->m_stub_names[stub][n_extra_regs];
+  char *name = s_stub_names[this - s_instances][stub][n_extra_regs];
 
   /* Lazy init */
   if (!*name)
@@ -2676,7 +2697,7 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
     {
       int res = snprintf (name, STUB_NAME_MAX_LEN, "__%s_%u",
 			  STUB_BASE_NAMES[stub], MIN_REGS + n_extra_regs);
-      gcc_checking_assert (res <= (int)STUB_NAME_MAX_LEN);
+      gcc_checking_assert (res < (int)STUB_NAME_MAX_LEN);
     }
 
   return name;
@@ -2684,7 +2705,8 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
 
 /* Return rtx of a symbol ref for the entry point (based upon
    cfun->machine->call_ms2sysv_extra_regs) of the specified stub.  */
-rtx xlogue_layout::get_stub_rtx (enum xlogue_stub stub) const
+rtx
+xlogue_layout::get_stub_rtx (enum xlogue_stub stub) const
 {
   const unsigned n_extra_regs = cfun->machine->call_ms2sysv_extra_regs;
   gcc_checking_assert (n_extra_regs <= MAX_EXTRA_REGS);
@@ -2703,73 +2725,6 @@ struct GTY(()) stack_local_entry {
   struct stack_local_entry *next;
 };
 
-/* Structure describing stack frame layout.
-   Stack grows downward:
-
-   [arguments]
-					<- ARG_POINTER
-   saved pc
-
-   saved static chain			if ix86_static_chain_on_stack
-
-   saved frame pointer			if frame_pointer_needed
-					<- HARD_FRAME_POINTER
-   [saved regs]
-					<- reg_save_offset
-   [padding0]
-					<- stack_realign_offset
-   [saved SSE regs]
-	OR
-   [stub-saved registers for ms x64 --> sysv clobbers
-			<- Start of out-of-line, stub-saved/restored regs
-			   (see libgcc/config/i386/(sav|res)ms64*.S)
-     [XMM6-15]
-     [RSI]
-     [RDI]
-     [?RBX]		only if RBX is clobbered
-     [?RBP]		only if RBP and RBX are clobbered
-     [?R12]		only if R12 and all previous regs are clobbered
-     [?R13]		only if R13 and all previous regs are clobbered
-     [?R14]		only if R14 and all previous regs are clobbered
-     [?R15]		only if R15 and all previous regs are clobbered
-			<- end of stub-saved/restored regs
-     [padding1]
-   ]
-					<- outlined_save_offset
-					<- sse_regs_save_offset
-   [padding2]
-		       |		<- FRAME_POINTER
-   [va_arg registers]  |
-		       |
-   [frame]	       |
-		       |
-   [padding2]	       | = to_allocate
-					<- STACK_POINTER
-  */
-struct ix86_frame
-{
-  int nsseregs;
-  int nregs;
-  int va_arg_size;
-  int red_zone_size;
-  int outgoing_arguments_size;
-
-  /* The offsets relative to ARG_POINTER.  */
-  HOST_WIDE_INT frame_pointer_offset;
-  HOST_WIDE_INT hard_frame_pointer_offset;
-  HOST_WIDE_INT stack_pointer_offset;
-  HOST_WIDE_INT hfp_save_offset;
-  HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT stack_realign_allocate_offset;
-  HOST_WIDE_INT stack_realign_offset;
-  HOST_WIDE_INT outlined_save_offset;
-  HOST_WIDE_INT sse_reg_save_offset;
-
-  /* When save_regs_using_mov is set, emit prologue using
-     move instead of push instructions.  */
-  bool save_regs_using_mov;
-};
-
 /* Which cpu are we scheduling for.  */
 enum attr_cpu ix86_schedule;
 
@@ -2861,7 +2816,7 @@ static unsigned int ix86_function_arg_boundary (ma
 						const_tree);
 static rtx ix86_static_chain (const_tree, bool);
 static int ix86_function_regparm (const_tree, const_tree);
-static void ix86_compute_frame_layout (struct ix86_frame *);
+static void ix86_compute_frame_layout (void);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
 static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
@@ -12293,7 +12248,7 @@ ix86_can_use_return_insn_p (void)
   if (crtl->args.pops_args && crtl->args.size >= 32768)
     return 0;
 
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   return (frame.stack_pointer_offset == UNITS_PER_WORD
 	  && (frame.nregs + frame.nsseregs) == 0);
 }
@@ -12634,10 +12589,6 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 	      && df_regs_ever_live_p (regno)));
 }
 
-/* Registers who's save & restore will be managed by stubs called from
-   pro/epilogue.  */
-static HARD_REG_SET GTY(()) stub_managed_regs;
-
 /* Return true if register class CL should be an additional allocno
    class.  */
 
@@ -12718,9 +12669,13 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_r
 	}
     }
 
-  if (ignore_outlined && cfun->machine->call_ms2sysv
-      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
-    return false;
+  if (ignore_outlined && cfun->machine->call_ms2sysv)
+    {
+      unsigned count = cfun->machine->call_ms2sysv_extra_regs
+		       + xlogue_layout::MIN_REGS;
+      if (xlogue_layout::is_stub_managed_reg (regno, count))
+	return false;
+    }
 
   if (crtl->drap_reg
       && regno == REGNO (crtl->drap_reg)
@@ -12787,8 +12742,7 @@ ix86_can_eliminate (const int from, const int to)
 HOST_WIDE_INT
 ix86_initial_elimination_offset (int from, int to)
 {
-  struct ix86_frame frame;
-  ix86_compute_frame_layout (&frame);
+  struct ix86_frame frame = cfun->machine->frame;
 
   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
     return frame.hard_frame_pointer_offset;
@@ -12818,13 +12772,16 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
-/* Disables out-of-lined msabi to sysv pro/epilogues and emits a warning if
-   warn_once is null, or *warn_once is zero.  */
-static void disable_call_ms2sysv_xlogues (const char *feature)
+/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
+static void warn_once_call_ms2sysv_xlogues (const char *feature)
 {
-  cfun->machine->call_ms2sysv = false;
-  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with %s.",
-	   feature);
+  static bool warned_once = false;
+  if (!warned_once)
+    {
+      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
+	       feature);
+      warned_once = true;
+    }
 }
 
 /* When using -fsplit-stack, the allocation routines set a field in
@@ -12836,8 +12793,9 @@ ix86_builtin_setjmp_frame_value (void)
 /* Fill structure ix86_frame about frame of currently computed function.  */
 
 static void
-ix86_compute_frame_layout (struct ix86_frame *frame)
+ix86_compute_frame_layout (void)
 {
+  struct ix86_frame *frame = &cfun->machine->frame;
   struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
@@ -12845,46 +12803,39 @@ static void
   HOST_WIDE_INT size = get_frame_size ();
   HOST_WIDE_INT to_allocate;
 
-  CLEAR_HARD_REG_SET (stub_managed_regs);
-
   /* m->call_ms2sysv is initially enabled in ix86_expand_call for all 64-bit
    * ms_abi functions that call a sysv function.  We now need to prune away
    * cases where it should be disabled.  */
   if (TARGET_64BIT && m->call_ms2sysv)
-  {
-    gcc_assert (TARGET_64BIT_MS_ABI);
-    gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
-    gcc_assert (!TARGET_SEH);
+    {
+      gcc_assert (TARGET_64BIT_MS_ABI);
+      gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
+      gcc_assert (!TARGET_SEH);
+      gcc_assert (TARGET_SSE);
+      gcc_assert (!ix86_using_red_zone ());
 
-    if (!TARGET_SSE)
-      m->call_ms2sysv = false;
+      if (crtl->calls_eh_return)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
+	}
 
-    /* Don't break hot-patched functions.  */
-    else if (ix86_function_ms_hook_prologue (current_function_decl))
-      m->call_ms2sysv = false;
+      else if (ix86_static_chain_on_stack)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("static call chains");
+	}
 
-    /* TODO: Cases not yet examined.  */
-    else if (crtl->calls_eh_return)
-      disable_call_ms2sysv_xlogues ("__builtin_eh_return");
+      /* Finally, compute which registers the stub will manage.  */
+      else
+	{
+	  unsigned count = xlogue_layout::count_stub_managed_regs ();
+	  m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+	}
+    }
 
-    else if (ix86_static_chain_on_stack)
-      disable_call_ms2sysv_xlogues ("static call chains");
-
-    else if (ix86_using_red_zone ())
-      disable_call_ms2sysv_xlogues ("red zones");
-
-    else if (flag_split_stack)
-      disable_call_ms2sysv_xlogues ("split stack");
-
-    /* Finally, compute which registers the stub will manage.  */
-    else
-      {
-	unsigned count = xlogue_layout
-			 ::compute_stub_managed_regs (stub_managed_regs);
-	m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
-      }
-  }
-
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
   m->call_ms2sysv_pad_in = 0;
@@ -12916,19 +12867,11 @@ static void
      in doing anything except PUSHs.  */
   if (TARGET_SEH)
     m->use_fast_prologue_epilogue = false;
-
-  /* During reload iteration the amount of registers saved can change.
-     Recompute the value as needed.  Do not recompute when amount of registers
-     didn't change as reload does multiple calls to the function and does not
-     expect the decision to change within single iteration.  */
-  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-	   && m->use_fast_prologue_epilogue_nregs != frame->nregs)
+  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
     {
       int count = frame->nregs;
       struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-      m->use_fast_prologue_epilogue_nregs = count;
-
       /* The fast prologue uses move instead of push to save registers.  This
          is significantly longer, but also executes faster as modern hardware
          can execute the moves in parallel, but can't do that for push/pop.
@@ -13145,7 +13088,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the stack pointer is valid for accessing the cfa_offset.  */
 
-static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.sp_valid && !(fs.sp_realigned
@@ -13154,7 +13098,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the frame pointer is valid for accessing the cfa_offset.  */
 
-static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
@@ -13164,9 +13109,10 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 /* Choose a base register based upon alignment requested, speed and/or
    size.  */
 
-static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
-			    HOST_WIDE_INT &base_offset,
-			    unsigned int align_reqested, unsigned int *align)
+static void
+choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+		HOST_WIDE_INT &base_offset,
+		unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
   unsigned int hfp_align;
@@ -14159,6 +14105,7 @@ ix86_finalize_stack_realign_flags (void)
        < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
 	  ? crtl->max_used_stack_slot_alignment
 	  : crtl->stack_alignment_needed));
+  bool recompute_frame_layout_p = false;
 
   if (crtl->stack_realign_finalized)
     {
@@ -14208,8 +14155,12 @@ ix86_finalize_stack_realign_flags (void)
 		&& requires_stack_frame_p (insn, prologue_used,
 					   set_up_by_prologue))
 	      {
+		if (crtl->stack_realign_needed != stack_realign)
+		  recompute_frame_layout_p = true;
 		crtl->stack_realign_needed = stack_realign;
 		crtl->stack_realign_finalized = true;
+		if (recompute_frame_layout_p)
+		  ix86_compute_frame_layout ();
 		return;
 	      }
 	}
@@ -14240,10 +14191,15 @@ ix86_finalize_stack_realign_flags (void)
       df_scan_blocks ();
       df_compute_regs_ever_live (true);
       df_analyze ();
+      recompute_frame_layout_p = true;
     }
 
+  if (crtl->stack_realign_needed != stack_realign)
+    recompute_frame_layout_p = true;
   crtl->stack_realign_needed = stack_realign;
   crtl->stack_realign_finalized = true;
+  if (recompute_frame_layout_p)
+    ix86_compute_frame_layout ();
 }
 
 /* Delete SET_GOT right after entry block if it is allocated to reg.  */
@@ -14372,7 +14328,7 @@ ix86_expand_prologue (void)
   m->fs.sp_valid = true;
   m->fs.sp_realigned = false;
 
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
     {
@@ -15212,7 +15168,7 @@ ix86_expand_epilogue (int style)
   bool restore_stub_is_tail = false;
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   m->fs.sp_realigned = stack_realign_fp;
   m->fs.sp_valid = stack_realign_fp
@@ -15757,7 +15713,7 @@ ix86_expand_split_stack_prologue (void)
   gcc_assert (flag_split_stack && reload_completed);
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET;
 
   /* This is the label we will branch to if we have enough stack
@@ -29320,7 +29276,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
 
       /* Set here, but it may get cleared later.  */
       if (TARGET_CALL_MS2SYSV_XLOGUES)
-	cfun->machine->call_ms2sysv = true;
+	{
+	  if (!TARGET_SSE)
+	    ;
+
+	  /* Don't break hot-patched functions.  */
+	  else if (ix86_function_ms_hook_prologue (current_function_decl))
+	    ;
+
+	  /* TODO: Cases not yet examined.  */
+	  else if (flag_split_stack)
+	    warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
+
+	  else
+	    {
+	      gcc_assert (!reload_completed);
+	      cfun->machine->call_ms2sysv = true;
+	    }
+	}
     }
 
   if (vec_len > 1)
@@ -29455,7 +29428,6 @@ ix86_init_machine_status (void)
   struct machine_function *f;
 
   f = ggc_cleared_alloc<machine_function> ();
-  f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
 
   return f;
@@ -31515,8 +31487,12 @@ ix86_static_chain (const_tree fndecl_or_type, bool
 	     same once we're executing the nested function.  */
 	  if (incoming_p)
 	    {
-	      if (fndecl == current_function_decl)
-		ix86_static_chain_on_stack = true;
+	      if (fndecl == current_function_decl
+		  && !ix86_static_chain_on_stack)
+		{
+		  gcc_assert (!reload_completed);
+		  ix86_static_chain_on_stack = true;
+		}
 	      return gen_frame_mem (SImode,
 				    plus_constant (Pmode,
 						   arg_pointer_rtx, -8));
@@ -52828,6 +52804,9 @@ ix86_run_selftests (void)
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ix86_legitimate_constant_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT ix86_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED ix86_frame_pointer_required
 
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 248031)
+++ gcc/config/i386/i386.h	(working copy)
@@ -2163,10 +2163,6 @@ extern int const dbx_register_map[FIRST_PSEUDO_REG
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
-#define NUM_X86_64_MS_CLOBBERED_REGS \
-  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
-
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
   gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
@@ -2448,9 +2444,76 @@ enum avx_u128_state
 \f
 #define FASTCALL_PREFIX '@'
 \f
+#ifndef USED_FOR_TARGET
+/* Structure describing stack frame layout.
+   Stack grows downward:
+
+   [arguments]
+					<- ARG_POINTER
+   saved pc
+
+   saved static chain			if ix86_static_chain_on_stack
+
+   saved frame pointer			if frame_pointer_needed
+					<- HARD_FRAME_POINTER
+   [saved regs]
+					<- reg_save_offset
+   [padding0]
+					<- stack_realign_offset
+   [saved SSE regs]
+	OR
+   [stub-saved registers for ms x64 --> sysv clobbers
+			<- Start of out-of-line, stub-saved/restored regs
+			   (see libgcc/config/i386/(sav|res)ms64*.S)
+     [XMM6-15]
+     [RSI]
+     [RDI]
+     [?RBX]		only if RBX is clobbered
+     [?RBP]		only if RBP and RBX are clobbered
+     [?R12]		only if R12 and all previous regs are clobbered
+     [?R13]		only if R13 and all previous regs are clobbered
+     [?R14]		only if R14 and all previous regs are clobbered
+     [?R15]		only if R15 and all previous regs are clobbered
+			<- end of stub-saved/restored regs
+     [padding1]
+   ]
+					<- outlined_save_offset
+					<- sse_regs_save_offset
+   [padding2]
+		       |		<- FRAME_POINTER
+   [va_arg registers]  |
+		       |
+   [frame]	       |
+		       |
+   [padding2]	       | = to_allocate
+					<- STACK_POINTER
+  */
+struct GTY(()) ix86_frame
+{
+  int nsseregs;
+  int nregs;
+  int va_arg_size;
+  int red_zone_size;
+  int outgoing_arguments_size;
+
+  /* The offsets relative to ARG_POINTER.  */
+  HOST_WIDE_INT frame_pointer_offset;
+  HOST_WIDE_INT hard_frame_pointer_offset;
+  HOST_WIDE_INT stack_pointer_offset;
+  HOST_WIDE_INT hfp_save_offset;
+  HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
+  HOST_WIDE_INT outlined_save_offset;
+  HOST_WIDE_INT sse_reg_save_offset;
+
+  /* When save_regs_using_mov is set, emit prologue using
+     move instead of push instructions.  */
+  bool save_regs_using_mov;
+};
+
 /* Machine specific frame tracking during prologue/epilogue generation.  */
 
-#ifndef USED_FOR_TARGET
 struct GTY(()) machine_frame_state
 {
   /* This pair tracks the currently active CFA as reg+offset.  When reg
@@ -2520,9 +2583,8 @@ struct GTY(()) machine_function {
   int varargs_fpr_size;
   int optimize_mode_switching[MAX_386_ENTITIES];
 
-  /* Number of saved registers USE_FAST_PROLOGUE_EPILOGUE
-     has been computed for.  */
-  int use_fast_prologue_epilogue_nregs;
+  /* Cached initial frame layout for the current function.  */
+  struct ix86_frame frame;
 
   /* For -fsplit-stack support: A stack local which holds a pointer to
      the stack arguments for a function with a variable number of

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-18 13:48                     ` Bernd Edlinger
@ 2017-05-19  3:13                       ` Daniel Santos
  2017-05-22 18:32                         ` Bernd Edlinger
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Santos @ 2017-05-19  3:13 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches, Uros Bizjak

On 05/18/2017 08:37 AM, Bernd Edlinger wrote:
> On 05/17/17 04:01, Daniel Santos wrote:
>>> -  if (ignore_outlined && cfun->machine->call_ms2sysv
>>> -      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>> -    return false;
>>> +  if (ignore_outlined && cfun->machine->call_ms2sysv)
>>> +    {
>>> +      /* Registers who's save & restore will be managed by stubs
>>> called from
>>> +     pro/epilogue.  */
>>> +      HARD_REG_SET stub_managed_regs;
>>> +      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
>>>
>>> +      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>> +    return false;
>>> +    }
>>> +
>>>     if (crtl->drap_reg
>>>         && regno == REGNO (crtl->drap_reg)
>>>         && !cfun->machine->no_drap_save_restore)
>> This makes no sense.  The entire purpose of stub_managed_regs is to
>> cache the result of xlogue_layout::compute_stub_managed_regs() and this
>> would unnecessarily repeat that calculation for each time
>> ix86_save_reg() is called.  Since
>> xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many
>> times, this makes it even worse.Which registers are being saved
>> out-of-line and inline MUST be known at the time the stack layout is
>> determined.  So stub_managed_regsshould either be left a TU static or
>> just moved to struct machine_function.
>>
>> As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs
>> is calling ix86_save_reg (which isn't trivial) more often than it really
>> has to, so I've refactored it.
>>
> Well, meanwhile I think the stub_managed_regs contain zero information
> and need not be saved at all, because it can easily be reconstructed
> from  m->call_ms2sysv_extra_regs.
>
> See the attached new version.  Daniel does it work for you?

No, I'm not at all comfortable with you making so many seemingly 
unnecessary changes to this code.  (Although I wish I got this much 
feedback during my RFCs! :)  I can accept the changes to 
is/count_stub_managed_reg (with some caveats), but I do not at all see 
the rationale for changing m_stub_names to a static and adding another 
dimension for the object instance -- from an OO standpoint, that's just 
bad design.  Can you please share your rationale for that?

Incidentally, half of the space in that array is wasted and can be 
trimmed since a given instance of xlogue_layout either supports hard 
frame pointers or doesn't, I just never got around to splitting that 
apart.  (The first three enum xlogue_stub values are for without HFP and 
the last three for with.)  Also, if we wanted to further reduce the 
memory footprint of xlogue_layout objects, the offset field of struct 
reginfo could be changed to int, and if we really wanted to go nuts then 
16 bits would work for both of its fields.

So for is/count_stub_managed_reg(s), you are obviously much more 
familiar with gcc, its passes and the i386 backend, but my knowledge 
level makes me very uncomfortable having the result of 
xlogue_layout::is_stub_managed_reg() determined in a way that has the 
(apparent) potential to differ from from the state at the time the last 
call to ix86_compute_frame_layout() was made; I just don't understand 
well enough what all can change in between the last call to 
ix86_compute_frame_layout() and the last call to 
xlogue_layout::is_stub_managed_reg().  I like your 
count_stub_managed_regs() is_stub_managed_regs() from a *performance* 
standpoint (and I know I get too uptight about that kind of thing, so 
appreciate that), but as to the change in scheme, I would have to trust 
you if you assert that this will always behave consistently.

I also want to give you a little background on some of these seemingly 
repetitive computations.  One of my design goals was for the code to be 
relatively easily to adapted to the management of out-of-line 
pro/epilogue stubs for other possible scenarios where there are a lot of 
clobbers and it could be useful.  Granted, I don't have enough knowledge 
of x86 architectures to identify situations other than this one (in 
64-bit Wine) where it could be helpful and I know that x86 push/pops are 
really small.  So theoretically, struct machine_function's 
"call_ms2sysv" could be changed to something like "outline_savres" and 
any combination of clobbered registers for which there is a descent stub 
could be used if it was a good choice.  I also realize that nobody likes 
complexity that isn't being used, and I respect that.  So if you are 
comfortable with this change and you believe you understand how it works 
then I will agree to it, but I'll be trusting you well beyond my 
knowledge level and ability to confidently predict the outcome (probably 
what a programmer hates the most).

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-18  7:14           ` Daniel Santos
@ 2017-05-19  5:12             ` Daniel Santos
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-19  5:12 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches


> PS: Oh! it might be due to the difference between -j1 and no -j argument.

Yes, that's how I missed it.  This flaw isn't exposed with make -j1, but 
is exposed with just make.  Thanks for finding this!

Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-19  3:13                       ` Daniel Santos
@ 2017-05-22 18:32                         ` Bernd Edlinger
  2017-05-23 14:34                           ` Bernd Edlinger
  2017-05-23 23:01                           ` Daniel Santos
  0 siblings, 2 replies; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-22 18:32 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches, Uros Bizjak

On 05/19/17 05:17, Daniel Santos wrote:
> On 05/18/2017 08:37 AM, Bernd Edlinger wrote:
>> On 05/17/17 04:01, Daniel Santos wrote:
>>>> -  if (ignore_outlined && cfun->machine->call_ms2sysv
>>>> -      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>>> -    return false;
>>>> +  if (ignore_outlined && cfun->machine->call_ms2sysv)
>>>> +    {
>>>> +      /* Registers who's save & restore will be managed by stubs
>>>> called from
>>>> +     pro/epilogue.  */
>>>> +      HARD_REG_SET stub_managed_regs;
>>>> +      xlogue_layout::compute_stub_managed_regs (stub_managed_regs);
>>>>
>>>> +      if (in_hard_reg_set_p (stub_managed_regs, DImode, regno))
>>>> +    return false;
>>>> +    }
>>>> +
>>>>     if (crtl->drap_reg
>>>>         && regno == REGNO (crtl->drap_reg)
>>>>         && !cfun->machine->no_drap_save_restore)
>>> This makes no sense.  The entire purpose of stub_managed_regs is to
>>> cache the result of xlogue_layout::compute_stub_managed_regs() and this
>>> would unnecessarily repeat that calculation for each time
>>> ix86_save_reg() is called.  Since
>>> xlogue_layout::compute_stub_managed_regs() calls ix86_save_reg many
>>> times, this makes it even worse.Which registers are being saved
>>> out-of-line and inline MUST be known at the time the stack layout is
>>> determined.  So stub_managed_regsshould either be left a TU static or
>>> just moved to struct machine_function.
>>>
>>> As an aside, I've noticed that xlogue_layout::compute_stub_managed_regs
>>> is calling ix86_save_reg (which isn't trivial) more often than it really
>>> has to, so I've refactored it.
>>>
>> Well, meanwhile I think the stub_managed_regs contain zero information
>> and need not be saved at all, because it can easily be reconstructed
>> from  m->call_ms2sysv_extra_regs.
>>
>> See the attached new version.  Daniel does it work for you?
> 
> No, I'm not at all comfortable with you making so many seemingly 
> unnecessary changes to this code.  (Although I wish I got this much 
> feedback during my RFCs! :)  I can accept the changes to 
> is/count_stub_managed_reg (with some caveats), but I do not at all see 
> the rationale for changing m_stub_names to a static and adding another 
> dimension for the object instance -- from an OO standpoint, that's just 
> bad design.  Can you please share your rationale for that?
> 

Hmm, sorry about that ...
I just thought it would be nice to avoid the const-cast here.

This moved the m_stub_names from all 4 instances to one static
array s_stub_names.  But looking at it again, I think the extra
dimension is not even necessary, because all instances share the
same data, so removing that extra dimension again will be fine.

> Incidentally, half of the space in that array is wasted and can be 
> trimmed since a given instance of xlogue_layout either supports hard 
> frame pointers or doesn't, I just never got around to splitting that 
> apart.  (The first three enum xlogue_stub values are for without HFP and 
> the last three for with.)  Also, if we wanted to further reduce the 
> memory footprint of xlogue_layout objects, the offset field of struct 
> reginfo could be changed to int, and if we really wanted to go nuts then 
> 16 bits would work for both of its fields.
> 
> So for is/count_stub_managed_reg(s), you are obviously much more 
> familiar with gcc, its passes and the i386 backend, but my knowledge 
> level makes me very uncomfortable having the result of 
> xlogue_layout::is_stub_managed_reg() determined in a way that has the 
> (apparent) potential to differ from from the state at the time the last 
> call to ix86_compute_frame_layout() was made; I just don't understand 

I fund it technically difficult to add a HARD_REG_SET to
struct machine_function, and tried to avoid the extra overhead of
calling ix86_save_reg so often, which you also felt uncomfortable with.

So, if you look at compute_stub_managed_regs I first saw that the
first loop can never terminate thru the "return 0", because the
registers in x86_64_ms_sysv_extra_clobbered_registers are guaranteed
to be clobbered.  Then I saw that the bits in stub_managed_regs
are always added in the same sequence, thus the result depends only
on the number call_ms2sysv_extra_regs and hfp so everything is already
available in struct machine_function.


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] [i386] Recompute the frame layout less often
  2017-05-22 18:32                         ` Bernd Edlinger
@ 2017-05-23 14:34                           ` Bernd Edlinger
  2017-05-24  6:49                             ` Daniel Santos
                                               ` (2 more replies)
  2017-05-23 23:01                           ` Daniel Santos
  1 sibling, 3 replies; 29+ messages in thread
From: Bernd Edlinger @ 2017-05-23 14:34 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Daniel Santos, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

Hi,

this is the latest version of my patch.

As already said, it attempts to compute
the frame layout only when relevant data have
changed.

Apologies for doing more clean-up on Daniel's
patch than absolutely necessary, but ...

Bootstrap and reg-tested successfully on
x86_64-pc-linux-gnu with unix\{,-m32\}.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-frame-layout-i386.txt --]
[-- Type: text/plain, Size: 2020 bytes --]

2017-05-23  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/i386/i386.c (x86_64_ms_sysv_extra_clobbered_registers): Make
	static.
	(xlogue_layout::get_stack_space_used, xlogue_layout::s_instances,
	xlogue_layout::get_instance, logue_layout::xlogue_layout,
	sp_valid_at, fp_valid_at, choose_basereg): Formatting.
	(xlogue_layout::get_stub_rtx): Make static.
	(xlogue_layout::get_stub_name): Avoid const-cast, make static.
	(xlogue_layout::compute_stub_managed_regs): Rename to...
	(xlogue_layout::count_stub_managed_regs): ...this.
	(xlogue_layout::is_stub_managed_reg): New function.
	(xlogue_layout::m_stub_names): Rename to...
	(xlogue_layout::s_stub_names): ...this, make static.
	(xlogue_layout::STUB_INDEX_OFFSET, xlogue_layout::MIN_REGS,
	xlogue_layout::MAX_REGS, xlogue_layout::MAX_EXTRA_REGS,
	xlogue_layout::VARIANT_COUNT, xlogue_layout::STUB_NAME_MAX_LEN,
	xlogue_layout::s_stub_names): Instantiate statics.
	(stub_managed_regs): Remove.
	(ix86_save_reg): Use xlogue_layout::compute_stub_managed_regs.
	(disable_call_ms2sysv_xlogues): Rename to...
	(warn_once_call_ms2sysv_xlogues): ...this, and warn only once.
	(ix86_initial_elimination_offset, ix86_expand_call): Fix call_ms2sysv
	warning logic.
	(ix86_static_chain): Make sure that ix86_static_chain_on_stack can't
	change after reload_completed.
	(ix86_can_use_return_insn_p): Use the ix86_frame data structure
	directly.
	(ix86_expand_prologue): Likewise.
	(ix86_expand_epilogue): Likewise.
	(ix86_expand_split_stack_prologue): Likewise.
	(ix86_compute_frame_layout): Remove frame parameter ...
	(TARGET_COMPUTE_FRAME_LAYOUT): ... and export it as a target hook.
	(ix86_finalize_stack_realign_flags): Call ix86_compute_frame_layout
	only if necessary.
	(ix86_init_machine_status): Don't set use_fast_prologue_epilogue_nregs.
	(ix86_frame): Move from here ...
	* config/i386/i386.h (ix86_frame): ... to here.
	(machine_function): Remove use_fast_prologue_epilogue_nregs, cache the
	complete ix86_frame data structure instead.  Remove some_ld_name.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-frame-layout-i386.diff --]
[-- Type: text/x-patch; name="patch-frame-layout-i386.diff", Size: 26252 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 248341)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2425,7 +2425,9 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
+#define NUM_X86_64_MS_CLOBBERED_REGS 12
+static int const x86_64_ms_sysv_extra_clobbered_registers
+		 [NUM_X86_64_MS_CLOBBERED_REGS] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -2472,25 +2474,26 @@ class xlogue_layout {
     return m_regs[reg];
   }
 
-  const char *get_stub_name (enum xlogue_stub stub,
-			     unsigned n_extra_args) const;
+  static const char *get_stub_name (enum xlogue_stub stub,
+				    unsigned n_extra_args);
+
   /* Returns an rtx for the stub's symbol based upon
        1.) the specified stub (save, restore or restore_ret) and
        2.) the value of cfun->machine->call_ms2sysv_extra_regs and
        3.) rather or not stack alignment is being performed.  */
-  rtx get_stub_rtx (enum xlogue_stub stub) const;
+  static rtx get_stub_rtx (enum xlogue_stub stub);
 
   /* Returns the amount of stack space (including padding) that the stub
      needs to store registers based upon data in the machine_function.  */
   HOST_WIDE_INT get_stack_space_used () const
   {
-    const struct machine_function &m = *cfun->machine;
-    unsigned last_reg = m.call_ms2sysv_extra_regs + MIN_REGS - 1;
+    const struct machine_function *m = cfun->machine;
+    unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
-    gcc_assert (m.call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
+    gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
     return m_regs[last_reg].offset
-	    + (m.call_ms2sysv_pad_out ? 8 : 0)
-	    + STUB_INDEX_OFFSET;
+	   + (m->call_ms2sysv_pad_out ? 8 : 0)
+	   + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -2500,7 +2503,8 @@ class xlogue_layout {
   }
 
   static const struct xlogue_layout &get_instance ();
-  static unsigned compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs);
+  static unsigned count_stub_managed_regs ();
+  static bool is_stub_managed_reg (unsigned regno, unsigned count);
 
   static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
   static const unsigned MIN_REGS = NUM_X86_64_MS_CLOBBERED_REGS;
@@ -2530,9 +2534,10 @@ class xlogue_layout {
   struct reginfo m_regs[MAX_REGS];
 
   /* Lazy-inited cache of symbol names for stubs.  */
-  char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
+  static char s_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT]
+			  [STUB_NAME_MAX_LEN];
 
-  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+  static const xlogue_layout s_instances[XLOGUE_SET_COUNT];
 };
 
 const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
@@ -2572,9 +2577,20 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
     R15_REG,	/* 0xe0		0xe8		0xd8		0xe0	*/
 };
 
+/* Instantiate static const values.  */
+const HOST_WIDE_INT xlogue_layout::STUB_INDEX_OFFSET;
+const unsigned xlogue_layout::MIN_REGS;
+const unsigned xlogue_layout::MAX_REGS;
+const unsigned xlogue_layout::MAX_EXTRA_REGS;
+const unsigned xlogue_layout::VARIANT_COUNT;
+const unsigned xlogue_layout::STUB_NAME_MAX_LEN;
+
+/* Initialize xlogue_layout::s_stub_names to zero.  */
+char xlogue_layout::s_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT]
+				[STUB_NAME_MAX_LEN];
+
 /* Instantiates all xlogue_layout instances.  */
-const struct xlogue_layout GTY(())
-xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
+const xlogue_layout xlogue_layout::s_instances[XLOGUE_SET_COUNT] = {
   xlogue_layout (0, false),
   xlogue_layout (8, false),
   xlogue_layout (0, true),
@@ -2583,7 +2599,8 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
 
 /* Return an appropriate const instance of xlogue_layout based upon values
    in cfun->machine and crtl.  */
-const struct xlogue_layout &xlogue_layout::get_instance ()
+const struct xlogue_layout &
+xlogue_layout::get_instance ()
 {
   enum xlogue_stub_sets stub_set;
   bool aligned_plus_8 = cfun->machine->call_ms2sysv_pad_in;
@@ -2600,50 +2617,54 @@ const unsigned xlogue_layout::REG_ORDER[xlogue_lay
   return s_instances[stub_set];
 }
 
-/* Determine which clobbered registers can be saved by the stub and store
-   them in stub_managed_regs.  Returns the count of registers the stub will
-   save and restore.  */
+/* Determine how many clobbered registers can be saved by the stub.
+   Returns the count of registers the stub will save and restore.  */
 unsigned
-xlogue_layout::compute_stub_managed_regs (HARD_REG_SET &stub_managed_regs)
+xlogue_layout::count_stub_managed_regs ()
 {
   bool hfp = frame_pointer_needed || stack_realign_fp;
-
   unsigned i, count;
   unsigned regno;
 
-  for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; ++i)
+  for (count = i = MIN_REGS; i < MAX_REGS; ++i)
     {
-      regno = x86_64_ms_sysv_extra_clobbered_registers[i];
-      if (regno == BP_REG && hfp)
-	continue;
-      if (!ix86_save_reg (regno, false, false))
-	return 0;
-    }
-
-  for (count = i = 0; i < MAX_REGS; ++i)
-    {
       regno = REG_ORDER[i];
       if (regno == BP_REG && hfp)
 	continue;
       if (!ix86_save_reg (regno, false, false))
 	break;
-      add_to_hard_reg_set (&stub_managed_regs, DImode, regno);
       ++count;
     }
-    gcc_assert (count >= MIN_REGS && count <= MAX_REGS);
-    return count;
+  return count;
 }
 
+/* Determine if register REGNO is a stub managed register given the
+   total COUNT of stub managed registers.  */
+bool
+xlogue_layout::is_stub_managed_reg (unsigned regno, unsigned count)
+{
+  bool hfp = frame_pointer_needed || stack_realign_fp;
+  unsigned i;
+
+  for (i = 0; i < count; ++i)
+    {
+      gcc_assert (i < MAX_REGS);
+      if (REG_ORDER[i] == BP_REG && hfp)
+	++count;
+      else if (REG_ORDER[i] == regno)
+	return true;
+    }
+  return false;
+}
+
 /* Constructor for xlogue_layout.  */
 xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_align_off_in, bool hfp)
   : m_hfp (hfp) , m_nregs (hfp ? 17 : 18),
     m_stack_align_off_in (stack_align_off_in)
 {
-  memset (m_regs, 0, sizeof (m_regs));
-  memset (m_stub_names, 0, sizeof (m_stub_names));
-
   HOST_WIDE_INT offset = stack_align_off_in;
   unsigned i, j;
+
   for (i = j = 0; i < MAX_REGS; ++i)
     {
       unsigned regno = REG_ORDER[i];
@@ -2662,14 +2683,14 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
       m_regs[j].regno    = regno;
       m_regs[j++].offset = offset - STUB_INDEX_OFFSET;
     }
-    gcc_assert (j == m_nregs);
+  gcc_assert (j == m_nregs);
 }
 
-const char *xlogue_layout::get_stub_name (enum xlogue_stub stub,
-					  unsigned n_extra_regs) const
+const char *
+xlogue_layout::get_stub_name (enum xlogue_stub stub,
+			      unsigned n_extra_regs)
 {
-  xlogue_layout *writey_this = const_cast<xlogue_layout*>(this);
-  char *name = writey_this->m_stub_names[stub][n_extra_regs];
+  char *name = s_stub_names[stub][n_extra_regs];
 
   /* Lazy init */
   if (!*name)
@@ -2676,7 +2697,7 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
     {
       int res = snprintf (name, STUB_NAME_MAX_LEN, "__%s_%u",
 			  STUB_BASE_NAMES[stub], MIN_REGS + n_extra_regs);
-      gcc_checking_assert (res <= (int)STUB_NAME_MAX_LEN);
+      gcc_checking_assert (res < (int)STUB_NAME_MAX_LEN);
     }
 
   return name;
@@ -2684,7 +2705,8 @@ xlogue_layout::xlogue_layout (HOST_WIDE_INT stack_
 
 /* Return rtx of a symbol ref for the entry point (based upon
    cfun->machine->call_ms2sysv_extra_regs) of the specified stub.  */
-rtx xlogue_layout::get_stub_rtx (enum xlogue_stub stub) const
+rtx
+xlogue_layout::get_stub_rtx (enum xlogue_stub stub)
 {
   const unsigned n_extra_regs = cfun->machine->call_ms2sysv_extra_regs;
   gcc_checking_assert (n_extra_regs <= MAX_EXTRA_REGS);
@@ -2703,73 +2725,6 @@ struct GTY(()) stack_local_entry {
   struct stack_local_entry *next;
 };
 
-/* Structure describing stack frame layout.
-   Stack grows downward:
-
-   [arguments]
-					<- ARG_POINTER
-   saved pc
-
-   saved static chain			if ix86_static_chain_on_stack
-
-   saved frame pointer			if frame_pointer_needed
-					<- HARD_FRAME_POINTER
-   [saved regs]
-					<- reg_save_offset
-   [padding0]
-					<- stack_realign_offset
-   [saved SSE regs]
-	OR
-   [stub-saved registers for ms x64 --> sysv clobbers
-			<- Start of out-of-line, stub-saved/restored regs
-			   (see libgcc/config/i386/(sav|res)ms64*.S)
-     [XMM6-15]
-     [RSI]
-     [RDI]
-     [?RBX]		only if RBX is clobbered
-     [?RBP]		only if RBP and RBX are clobbered
-     [?R12]		only if R12 and all previous regs are clobbered
-     [?R13]		only if R13 and all previous regs are clobbered
-     [?R14]		only if R14 and all previous regs are clobbered
-     [?R15]		only if R15 and all previous regs are clobbered
-			<- end of stub-saved/restored regs
-     [padding1]
-   ]
-					<- outlined_save_offset
-					<- sse_regs_save_offset
-   [padding2]
-		       |		<- FRAME_POINTER
-   [va_arg registers]  |
-		       |
-   [frame]	       |
-		       |
-   [padding2]	       | = to_allocate
-					<- STACK_POINTER
-  */
-struct ix86_frame
-{
-  int nsseregs;
-  int nregs;
-  int va_arg_size;
-  int red_zone_size;
-  int outgoing_arguments_size;
-
-  /* The offsets relative to ARG_POINTER.  */
-  HOST_WIDE_INT frame_pointer_offset;
-  HOST_WIDE_INT hard_frame_pointer_offset;
-  HOST_WIDE_INT stack_pointer_offset;
-  HOST_WIDE_INT hfp_save_offset;
-  HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT stack_realign_allocate_offset;
-  HOST_WIDE_INT stack_realign_offset;
-  HOST_WIDE_INT outlined_save_offset;
-  HOST_WIDE_INT sse_reg_save_offset;
-
-  /* When save_regs_using_mov is set, emit prologue using
-     move instead of push instructions.  */
-  bool save_regs_using_mov;
-};
-
 /* Which cpu are we scheduling for.  */
 enum attr_cpu ix86_schedule;
 
@@ -2861,7 +2816,7 @@ static unsigned int ix86_function_arg_boundary (ma
 						const_tree);
 static rtx ix86_static_chain (const_tree, bool);
 static int ix86_function_regparm (const_tree, const_tree);
-static void ix86_compute_frame_layout (struct ix86_frame *);
+static void ix86_compute_frame_layout (void);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
 static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
@@ -12293,7 +12248,7 @@ ix86_can_use_return_insn_p (void)
   if (crtl->args.pops_args && crtl->args.size >= 32768)
     return 0;
 
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   return (frame.stack_pointer_offset == UNITS_PER_WORD
 	  && (frame.nregs + frame.nsseregs) == 0);
 }
@@ -12634,10 +12589,6 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 	      && df_regs_ever_live_p (regno)));
 }
 
-/* Registers who's save & restore will be managed by stubs called from
-   pro/epilogue.  */
-static HARD_REG_SET GTY(()) stub_managed_regs;
-
 /* Return true if register class CL should be an additional allocno
    class.  */
 
@@ -12718,9 +12669,13 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_r
 	}
     }
 
-  if (ignore_outlined && cfun->machine->call_ms2sysv
-      && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
-    return false;
+  if (ignore_outlined && cfun->machine->call_ms2sysv)
+    {
+      unsigned count = cfun->machine->call_ms2sysv_extra_regs
+		       + xlogue_layout::MIN_REGS;
+      if (xlogue_layout::is_stub_managed_reg (regno, count))
+	return false;
+    }
 
   if (crtl->drap_reg
       && regno == REGNO (crtl->drap_reg)
@@ -12787,8 +12742,7 @@ ix86_can_eliminate (const int from, const int to)
 HOST_WIDE_INT
 ix86_initial_elimination_offset (int from, int to)
 {
-  struct ix86_frame frame;
-  ix86_compute_frame_layout (&frame);
+  struct ix86_frame frame = cfun->machine->frame;
 
   if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM)
     return frame.hard_frame_pointer_offset;
@@ -12818,13 +12772,16 @@ ix86_builtin_setjmp_frame_value (void)
   return stack_realign_fp ? hard_frame_pointer_rtx : virtual_stack_vars_rtx;
 }
 
-/* Disables out-of-lined msabi to sysv pro/epilogues and emits a warning if
-   warn_once is null, or *warn_once is zero.  */
-static void disable_call_ms2sysv_xlogues (const char *feature)
+/* Emits a warning for unsupported msabi to sysv pro/epilogues.  */
+static void warn_once_call_ms2sysv_xlogues (const char *feature)
 {
-  cfun->machine->call_ms2sysv = false;
-  warning (OPT_mcall_ms2sysv_xlogues, "not currently compatible with %s.",
-	   feature);
+  static bool warned_once = false;
+  if (!warned_once)
+    {
+      warning (0, "-mcall-ms2sysv-xlogues is not compatible with %s",
+	       feature);
+      warned_once = true;
+    }
 }
 
 /* When using -fsplit-stack, the allocation routines set a field in
@@ -12836,8 +12793,9 @@ ix86_builtin_setjmp_frame_value (void)
 /* Fill structure ix86_frame about frame of currently computed function.  */
 
 static void
-ix86_compute_frame_layout (struct ix86_frame *frame)
+ix86_compute_frame_layout (void)
 {
+  struct ix86_frame *frame = &cfun->machine->frame;
   struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
@@ -12845,46 +12803,39 @@ static void
   HOST_WIDE_INT size = get_frame_size ();
   HOST_WIDE_INT to_allocate;
 
-  CLEAR_HARD_REG_SET (stub_managed_regs);
-
   /* m->call_ms2sysv is initially enabled in ix86_expand_call for all 64-bit
    * ms_abi functions that call a sysv function.  We now need to prune away
    * cases where it should be disabled.  */
   if (TARGET_64BIT && m->call_ms2sysv)
-  {
-    gcc_assert (TARGET_64BIT_MS_ABI);
-    gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
-    gcc_assert (!TARGET_SEH);
+    {
+      gcc_assert (TARGET_64BIT_MS_ABI);
+      gcc_assert (TARGET_CALL_MS2SYSV_XLOGUES);
+      gcc_assert (!TARGET_SEH);
+      gcc_assert (TARGET_SSE);
+      gcc_assert (!ix86_using_red_zone ());
 
-    if (!TARGET_SSE)
-      m->call_ms2sysv = false;
+      if (crtl->calls_eh_return)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("__builtin_eh_return");
+	}
 
-    /* Don't break hot-patched functions.  */
-    else if (ix86_function_ms_hook_prologue (current_function_decl))
-      m->call_ms2sysv = false;
+      else if (ix86_static_chain_on_stack)
+	{
+	  gcc_assert (!reload_completed);
+	  m->call_ms2sysv = false;
+	  warn_once_call_ms2sysv_xlogues ("static call chains");
+	}
 
-    /* TODO: Cases not yet examined.  */
-    else if (crtl->calls_eh_return)
-      disable_call_ms2sysv_xlogues ("__builtin_eh_return");
+      /* Finally, compute which registers the stub will manage.  */
+      else
+	{
+	  unsigned count = xlogue_layout::count_stub_managed_regs ();
+	  m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+	}
+    }
 
-    else if (ix86_static_chain_on_stack)
-      disable_call_ms2sysv_xlogues ("static call chains");
-
-    else if (ix86_using_red_zone ())
-      disable_call_ms2sysv_xlogues ("red zones");
-
-    else if (flag_split_stack)
-      disable_call_ms2sysv_xlogues ("split stack");
-
-    /* Finally, compute which registers the stub will manage.  */
-    else
-      {
-	unsigned count = xlogue_layout
-			 ::compute_stub_managed_regs (stub_managed_regs);
-	m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
-      }
-  }
-
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
   m->call_ms2sysv_pad_in = 0;
@@ -12916,19 +12867,11 @@ static void
      in doing anything except PUSHs.  */
   if (TARGET_SEH)
     m->use_fast_prologue_epilogue = false;
-
-  /* During reload iteration the amount of registers saved can change.
-     Recompute the value as needed.  Do not recompute when amount of registers
-     didn't change as reload does multiple calls to the function and does not
-     expect the decision to change within single iteration.  */
-  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-	   && m->use_fast_prologue_epilogue_nregs != frame->nregs)
+  else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
     {
       int count = frame->nregs;
       struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-      m->use_fast_prologue_epilogue_nregs = count;
-
       /* The fast prologue uses move instead of push to save registers.  This
          is significantly longer, but also executes faster as modern hardware
          can execute the moves in parallel, but can't do that for push/pop.
@@ -13145,7 +13088,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the stack pointer is valid for accessing the cfa_offset.  */
 
-static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.sp_valid && !(fs.sp_realigned
@@ -13154,7 +13098,8 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 
 /* Determine if the frame pointer is valid for accessing the cfa_offset.  */
 
-static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+static inline bool
+fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
   return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
@@ -13164,9 +13109,10 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE
 /* Choose a base register based upon alignment requested, speed and/or
    size.  */
 
-static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
-			    HOST_WIDE_INT &base_offset,
-			    unsigned int align_reqested, unsigned int *align)
+static void
+choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+		HOST_WIDE_INT &base_offset,
+		unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
   unsigned int hfp_align;
@@ -14159,6 +14105,7 @@ ix86_finalize_stack_realign_flags (void)
        < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
 	  ? crtl->max_used_stack_slot_alignment
 	  : crtl->stack_alignment_needed));
+  bool recompute_frame_layout_p = false;
 
   if (crtl->stack_realign_finalized)
     {
@@ -14208,8 +14155,12 @@ ix86_finalize_stack_realign_flags (void)
 		&& requires_stack_frame_p (insn, prologue_used,
 					   set_up_by_prologue))
 	      {
+		if (crtl->stack_realign_needed != stack_realign)
+		  recompute_frame_layout_p = true;
 		crtl->stack_realign_needed = stack_realign;
 		crtl->stack_realign_finalized = true;
+		if (recompute_frame_layout_p)
+		  ix86_compute_frame_layout ();
 		return;
 	      }
 	}
@@ -14240,10 +14191,15 @@ ix86_finalize_stack_realign_flags (void)
       df_scan_blocks ();
       df_compute_regs_ever_live (true);
       df_analyze ();
+      recompute_frame_layout_p = true;
     }
 
+  if (crtl->stack_realign_needed != stack_realign)
+    recompute_frame_layout_p = true;
   crtl->stack_realign_needed = stack_realign;
   crtl->stack_realign_finalized = true;
+  if (recompute_frame_layout_p)
+    ix86_compute_frame_layout ();
 }
 
 /* Delete SET_GOT right after entry block if it is allocated to reg.  */
@@ -14372,7 +14328,7 @@ ix86_expand_prologue (void)
   m->fs.sp_valid = true;
   m->fs.sp_realigned = false;
 
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
     {
@@ -15212,7 +15168,7 @@ ix86_expand_epilogue (int style)
   bool restore_stub_is_tail = false;
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = m->frame;
 
   m->fs.sp_realigned = stack_realign_fp;
   m->fs.sp_valid = stack_realign_fp
@@ -15757,7 +15713,7 @@ ix86_expand_split_stack_prologue (void)
   gcc_assert (flag_split_stack && reload_completed);
 
   ix86_finalize_stack_realign_flags ();
-  ix86_compute_frame_layout (&frame);
+  frame = cfun->machine->frame;
   allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET;
 
   /* This is the label we will branch to if we have enough stack
@@ -29326,7 +29282,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
 
       /* Set here, but it may get cleared later.  */
       if (TARGET_CALL_MS2SYSV_XLOGUES)
-	cfun->machine->call_ms2sysv = true;
+	{
+	  if (!TARGET_SSE)
+	    ;
+
+	  /* Don't break hot-patched functions.  */
+	  else if (ix86_function_ms_hook_prologue (current_function_decl))
+	    ;
+
+	  /* TODO: Cases not yet examined.  */
+	  else if (flag_split_stack)
+	    warn_once_call_ms2sysv_xlogues ("-fsplit-stack");
+
+	  else
+	    {
+	      gcc_assert (!reload_completed);
+	      cfun->machine->call_ms2sysv = true;
+	    }
+	}
     }
 
   if (vec_len > 1)
@@ -29461,7 +29434,6 @@ ix86_init_machine_status (void)
   struct machine_function *f;
 
   f = ggc_cleared_alloc<machine_function> ();
-  f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
 
   return f;
@@ -31521,8 +31493,12 @@ ix86_static_chain (const_tree fndecl_or_type, bool
 	     same once we're executing the nested function.  */
 	  if (incoming_p)
 	    {
-	      if (fndecl == current_function_decl)
-		ix86_static_chain_on_stack = true;
+	      if (fndecl == current_function_decl
+		  && !ix86_static_chain_on_stack)
+		{
+		  gcc_assert (!reload_completed);
+		  ix86_static_chain_on_stack = true;
+		}
 	      return gen_frame_mem (SImode,
 				    plus_constant (Pmode,
 						   arg_pointer_rtx, -8));
@@ -52834,6 +52810,9 @@ ix86_run_selftests (void)
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ix86_legitimate_constant_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT ix86_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED ix86_frame_pointer_required
 
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 248341)
+++ gcc/config/i386/i386.h	(working copy)
@@ -2163,10 +2163,6 @@ extern int const dbx_register_map[FIRST_PSEUDO_REG
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
-#define NUM_X86_64_MS_CLOBBERED_REGS \
-  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
-
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
   gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
@@ -2448,9 +2444,76 @@ enum avx_u128_state
 \f
 #define FASTCALL_PREFIX '@'
 \f
+#ifndef USED_FOR_TARGET
+/* Structure describing stack frame layout.
+   Stack grows downward:
+
+   [arguments]
+					<- ARG_POINTER
+   saved pc
+
+   saved static chain			if ix86_static_chain_on_stack
+
+   saved frame pointer			if frame_pointer_needed
+					<- HARD_FRAME_POINTER
+   [saved regs]
+					<- reg_save_offset
+   [padding0]
+					<- stack_realign_offset
+   [saved SSE regs]
+	OR
+   [stub-saved registers for ms x64 --> sysv clobbers
+			<- Start of out-of-line, stub-saved/restored regs
+			   (see libgcc/config/i386/(sav|res)ms64*.S)
+     [XMM6-15]
+     [RSI]
+     [RDI]
+     [?RBX]		only if RBX is clobbered
+     [?RBP]		only if RBP and RBX are clobbered
+     [?R12]		only if R12 and all previous regs are clobbered
+     [?R13]		only if R13 and all previous regs are clobbered
+     [?R14]		only if R14 and all previous regs are clobbered
+     [?R15]		only if R15 and all previous regs are clobbered
+			<- end of stub-saved/restored regs
+     [padding1]
+   ]
+					<- outlined_save_offset
+					<- sse_regs_save_offset
+   [padding2]
+		       |		<- FRAME_POINTER
+   [va_arg registers]  |
+		       |
+   [frame]	       |
+		       |
+   [padding2]	       | = to_allocate
+					<- STACK_POINTER
+  */
+struct GTY(()) ix86_frame
+{
+  int nsseregs;
+  int nregs;
+  int va_arg_size;
+  int red_zone_size;
+  int outgoing_arguments_size;
+
+  /* The offsets relative to ARG_POINTER.  */
+  HOST_WIDE_INT frame_pointer_offset;
+  HOST_WIDE_INT hard_frame_pointer_offset;
+  HOST_WIDE_INT stack_pointer_offset;
+  HOST_WIDE_INT hfp_save_offset;
+  HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
+  HOST_WIDE_INT outlined_save_offset;
+  HOST_WIDE_INT sse_reg_save_offset;
+
+  /* When save_regs_using_mov is set, emit prologue using
+     move instead of push instructions.  */
+  bool save_regs_using_mov;
+};
+
 /* Machine specific frame tracking during prologue/epilogue generation.  */
 
-#ifndef USED_FOR_TARGET
 struct GTY(()) machine_frame_state
 {
   /* This pair tracks the currently active CFA as reg+offset.  When reg
@@ -2515,14 +2578,12 @@ enum function_type
 
 struct GTY(()) machine_function {
   struct stack_local_entry *stack_locals;
-  const char *some_ld_name;
   int varargs_gpr_size;
   int varargs_fpr_size;
   int optimize_mode_switching[MAX_386_ENTITIES];
 
-  /* Number of saved registers USE_FAST_PROLOGUE_EPILOGUE
-     has been computed for.  */
-  int use_fast_prologue_epilogue_nregs;
+  /* Cached initial frame layout for the current function.  */
+  struct ix86_frame frame;
 
   /* For -fsplit-stack support: A stack local which holds a pointer to
      the stack arguments for a function with a variable number of

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-22 18:32                         ` Bernd Edlinger
  2017-05-23 14:34                           ` Bernd Edlinger
@ 2017-05-23 23:01                           ` Daniel Santos
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-23 23:01 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches, Uros Bizjak

On 05/22/2017 01:32 PM, Bernd Edlinger wrote:
> On 05/19/17 05:17, Daniel Santos wrote:
>
>> No, I'm not at all comfortable with you making so many seemingly
>> unnecessary changes to this code.  (Although I wish I got this much
>> feedback during my RFCs! :)  I can accept the changes to
>> is/count_stub_managed_reg (with some caveats), but I do not at all see
>> the rationale for changing m_stub_names to a static and adding another
>> dimension for the object instance -- from an OO standpoint, that's just
>> bad design.  Can you please share your rationale for that?
>>
> Hmm, sorry about that ...
> I just thought it would be nice to avoid the const-cast here.

Well remember const-correctness isn't about an object's internal 
(bitwise) state, but it's externally visible (logical) state.  So a 
const member function need not avoid altering it's internal state if the 
externally visible state remains unchanged, such as when caching some 
result or lazy initing.  I have tended to prefer using const_cast for 
this, isolating its use to a single const accessor function (or if () 
block) to leave less room for the data members to be accidentally 
altered in another const member function.  But mutable is generally 
preferred over const_cast, which opens up the danger of accidentally 
modifying an object's logical state (especially by a subsequent edit), 
so using mutable is probably a better practice anyway.

However, ...

> This moved the m_stub_names from all 4 instances to one static
> array s_stub_names.  But looking at it again, I think the extra
> dimension is not even necessary, because all instances share the
> same data, so removing that extra dimension again will be fine.

You are correct!  And I see that you're new patch has already changed 
get_stub_name to a static member function, so great!

>> Incidentally, half of the space in that array is wasted and can be
>> trimmed since a given instance of xlogue_layout either supports hard
>> frame pointers or doesn't, I just never got around to splitting that
>> apart.  (The first three enum xlogue_stub values are for without HFP and
>> the last three for with.)  Also, if we wanted to further reduce the
>> memory footprint of xlogue_layout objects, the offset field of struct
>> reginfo could be changed to int, and if we really wanted to go nuts then
>> 16 bits would work for both of its fields.
>>
>> So for is/count_stub_managed_reg(s), you are obviously much more
>> familiar with gcc, its passes and the i386 backend, but my knowledge
>> level makes me very uncomfortable having the result of
>> xlogue_layout::is_stub_managed_reg() determined in a way that has the
>> (apparent) potential to differ from from the state at the time the last
>> call to ix86_compute_frame_layout() was made; I just don't understand
> I fund it technically difficult to add a HARD_REG_SET to
> struct machine_function, and tried to avoid the extra overhead of
> calling ix86_save_reg so often, which you also felt uncomfortable with.
>
> So, if you look at compute_stub_managed_regs I first saw that the
> first loop can never terminate thru the "return 0", because the
> registers in x86_64_ms_sysv_extra_clobbered_registers are guaranteed
> to be clobbered.  Then I saw that the bits in stub_managed_regs
> are always added in the same sequence, thus the result depends only
> on the number call_ms2sysv_extra_regs and hfp so everything is already
> available in struct machine_function.
>
>
> Thanks
> Bernd.

Yes, I agree with how you have refactored compute_stub_managed_regs 
given your rationale of not adding another header dependency to i386.h.  
It is only the overall scheme of calculating this outside of 
ix86_compute_frame_layout that I cannot validate due to my not having a 
good understanding of what can and cannot change in between the time 
that ix86_compute_frame_layout is last called and the last call to 
is_stub_managed_regs().

As Uros said, my patch set touches a "delicate part of the compiler, 
where lots of code-paths cross each other (and we have had quite some 
hard-to-fix bugs in this area)" 
(https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01924.html).  I wrote it 
the way I did with my understanding of what was safe to do and your 
alterations move it's functionality outside of that understanding.  So 
if you say that this won't break it, then I will have to trust you (and 
the testsuite) on that.

On that note, the tests are undergoing some change and bug fixes and I'm 
planning on adding more tests to validate non-breakage with various 
other stack frame-related options and probably additional tests (and 
test options) triggered by GCC_TEST_RUN_EXPENSIVE or some such.

Daniel



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-23 14:34                           ` Bernd Edlinger
@ 2017-05-24  6:49                             ` Daniel Santos
  2017-06-01 16:06                             ` [PING][PATCH] " Bernd Edlinger
  2017-06-01 18:18                             ` [PATCH] " Uros Bizjak
  2 siblings, 0 replies; 29+ messages in thread
From: Daniel Santos @ 2017-05-24  6:49 UTC (permalink / raw)
  To: Bernd Edlinger, Uros Bizjak; +Cc: gcc-patches

On 05/23/2017 09:31 AM, Bernd Edlinger wrote:
> Hi,
>
> this is the latest version of my patch.
>
> As already said, it attempts to compute
> the frame layout only when relevant data have
> changed.
>
> Apologies for doing more clean-up on Daniel's
> patch than absolutely necessary, but ...
>
> Bootstrap and reg-tested successfully on
> x86_64-pc-linux-gnu with unix\{,-m32\}.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.

OK with me.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PING][PATCH] [i386] Recompute the frame layout less often
  2017-05-23 14:34                           ` Bernd Edlinger
  2017-05-24  6:49                             ` Daniel Santos
@ 2017-06-01 16:06                             ` Bernd Edlinger
  2017-06-01 18:18                             ` [PATCH] " Uros Bizjak
  2 siblings, 0 replies; 29+ messages in thread
From: Bernd Edlinger @ 2017-06-01 16:06 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Daniel Santos, gcc-patches

Ping...

the latest version of this patch was posted here:
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01758.html


Thanks
Bernd.

On 05/23/17 16:31, Bernd Edlinger wrote:
> Hi,
> 
> this is the latest version of my patch.
> 
> As already said, it attempts to compute
> the frame layout only when relevant data have
> changed.
> 
> Apologies for doing more clean-up on Daniel's
> patch than absolutely necessary, but ...
> 
> Bootstrap and reg-tested successfully on
> x86_64-pc-linux-gnu with unix\{,-m32\}.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] [i386] Recompute the frame layout less often
  2017-05-23 14:34                           ` Bernd Edlinger
  2017-05-24  6:49                             ` Daniel Santos
  2017-06-01 16:06                             ` [PING][PATCH] " Bernd Edlinger
@ 2017-06-01 18:18                             ` Uros Bizjak
  2 siblings, 0 replies; 29+ messages in thread
From: Uros Bizjak @ 2017-06-01 18:18 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Daniel Santos, gcc-patches

On Tue, May 23, 2017 at 4:31 PM, Bernd Edlinger
<bernd.edlinger@hotmail.de> wrote:
> Hi,
>
> this is the latest version of my patch.
>
> As already said, it attempts to compute
> the frame layout only when relevant data have
> changed.
>
> Apologies for doing more clean-up on Daniel's
> patch than absolutely necessary, but ...
>
> Bootstrap and reg-tested successfully on
> x86_64-pc-linux-gnu with unix\{,-m32\}.
> Is it OK for trunk?

It is hard to review a patch that mixes cleanup and functionality changes...

LGTM, so OK for trunk.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-06-01 18:18 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-14  9:11 [PATCH] [i386] Recompute the frame layout less often Bernd Edlinger
2017-05-14  9:58 ` Daniel Santos
2017-05-14 10:27   ` Uros Bizjak
2017-05-14 18:11     ` Bernd Edlinger
2017-05-14 20:37       ` Daniel Santos
2017-05-15  2:23       ` Daniel Santos
2017-05-15 20:41         ` Bernd Edlinger
2017-05-15 23:52           ` Daniel Santos
2017-05-16  5:42             ` Daniel Santos
2017-05-16  8:52               ` Bernd Edlinger
2017-05-16 15:11                 ` Daniel Santos
2017-05-16 17:38               ` Ian Lance Taylor via gcc-patches
2017-05-16 20:10                 ` Bernd Edlinger
2017-05-16 21:05                   ` Bernd Edlinger
2017-05-17  2:12                   ` Daniel Santos
2017-05-17 17:43                     ` Bernd Edlinger
2017-05-18  7:05                       ` Daniel Santos
2017-05-18 13:48                     ` Bernd Edlinger
2017-05-19  3:13                       ` Daniel Santos
2017-05-22 18:32                         ` Bernd Edlinger
2017-05-23 14:34                           ` Bernd Edlinger
2017-05-24  6:49                             ` Daniel Santos
2017-06-01 16:06                             ` [PING][PATCH] " Bernd Edlinger
2017-06-01 18:18                             ` [PATCH] " Uros Bizjak
2017-05-23 23:01                           ` Daniel Santos
2017-05-16 21:35                 ` Daniel Santos
2017-05-17 18:41         ` Bernd Edlinger
2017-05-18  7:14           ` Daniel Santos
2017-05-19  5:12             ` Daniel Santos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).