public inbox for gcc-patches@gcc.gnu.org
* Use aligned SSE movs for re-aligned MS ABI pro/epilogues
@ 2016-12-22 20:27 Daniel Santos
  2016-12-22 20:55 ` [PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves Daniel Santos
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Daniel Santos @ 2016-12-22 20:27 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1358 bytes --]

According to the Microsoft 64-bit ABI specification, registers RDI, RSI 
and XMM6-15 are non-volatile and the stack alignment is 16 bytes.  In 
practice, the Windows implementation appears to not be so picky about 
the 16-byte alignment requirement, probably because it never needs to save 
SSE registers and instead just never uses them.  This led to a large list 
(https://bugs.winehq.org/show_bug.cgi?id=27680) of Win64 programs 
violating the ABI with impunity, but crashing in Wine until 
force_align_arg_pointer was added to gcc and used in Wine.

Stack re-alignment was originally done prior to int register saves, but 
was moved to after SSE saves in 2010 to better facilitate 
parallelization, and for simplicity's sake, the stack pointer was 
considered invalid after stack re-alignment and SSE movs were emitted 
unaligned relative to the frame pointer.  But now that forced stack 
re-alignment is the norm for Wine64, Wine always gets the unaligned 
movs.  This patch set fixes the problem while preserving the improved 
parallelization of int register saves from Richard Henderson's 2010 
patch.
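
To illustrate why addressing the SSE save area from the re-aligned stack
pointer permits aligned movs while frame-pointer-relative addressing does
not, here is a minimal sketch.  It is not GCC code: the helper names
align_down and is_sse_aligned are hypothetical, and the addresses used
with them are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling the effect of `and $-align, %rsp`.
// `align` is assumed to be a power of two.
constexpr std::uint64_t align_down(std::uint64_t sp, std::uint64_t align) {
    return sp & ~(align - 1);
}

// True if `addr` is suitable for an aligned 16-byte SSE move.
constexpr bool is_sse_aligned(std::uint64_t addr) {
    return (addr & 15) == 0;
}
```

With an incoming stack pointer that is only 8-byte aligned, say
0x7ffc1238, a slot at a multiple of 16 from the re-aligned stack pointer
passes is_sse_aligned, while a slot at the same distance from the
unadjusted value does not.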

This patchset is a prerequisite to another I'm still refining that 
out-of-lines these pro/epilogues. I'm still pretty new to this project, 
so I hope I haven't missed anything. (No additional failures in tests.)

Daniel Santos

[-- Attachment #2: ChangeLog-aligned-see-movs --]
[-- Type: text/plain, Size: 940 bytes --]


2016-12-21  Daniel Santos  <daniel.santos@pobox.com>

	* config/i386/i386.h (struct machine_frame_state): New fields
	sp_realigned and sp_realigned_offset.

	* config/i386/i386.c
	(struct ix86_frame): New fields stack_realign_allocate_offset and
	stack_realign_offset.
	(ix86_compute_frame_layout): Modify re-alignment calculations.
	(sp_valid_at, fp_valid_at): New inline functions.
	(choose_basereg): New function.
	(choose_baseaddr): Add align parameter, use choose_basereg and modify
	all callers.
	(ix86_emit_save_reg_using_mov, ix86_emit_restore_sse_regs_using_mov):
	Use align parameter of choose_baseaddr to generate aligned SSE movs
	when possible.
	(pro_epilogue_adjust_stack): Modify to track
	machine_frame_state::sp_realigned.
	(ix86_expand_prologue): Modify stack re-alignment code.
	(ix86_emit_leave): Clear machine_frame_state::sp_realigned.
	(ix86_expand_epilogue): Modify validity checks of frame and stack
	pointers.




* [PATCH 2/3] [i386] Keep stack pointer valid after re-alignment.
  2016-12-22 20:27 Use aligned SSE movs for re-aligned MS ABI pro/epilogues Daniel Santos
  2016-12-22 20:55 ` [PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves Daniel Santos
@ 2016-12-22 20:55 ` Daniel Santos
  2016-12-22 22:26 ` [PATCH 3/3] [i386] Use re-aligned stack pointer for aligned SSE movs Daniel Santos
  2 siblings, 0 replies; 4+ messages in thread
From: Daniel Santos @ 2016-12-22 20:55 UTC (permalink / raw)
  To: gcc-patches; +Cc: Daniel Santos

This stage adds the fields sp_realigned and sp_realigned_offset to
struct machine_frame_state and adds the concept of the stack pointer
being re-aligned rather than invalid.  The inline functions sp_valid_at
and fp_valid_at are added to test if a given location relative to the
CFA can be accessed with the stack or frame pointer, respectively.
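
As a rough illustration of how the two predicates partition the frame,
the following standalone sketch mirrors their logic outside of GCC.  The
FrameState struct is a simplified, hypothetical stand-in for the relevant
fields of struct machine_frame_state, and the state is passed explicitly
rather than read from cfun.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the fields of machine_frame_state used here.
struct FrameState {
    bool sp_valid;
    bool fp_valid;
    bool sp_realigned;
    std::int64_t sp_realigned_offset;  // CFA offset the sp was realigned to
};

// The stack pointer can reach cfa_offset unless it has been realigned
// and the offset is smaller than the realignment point.
bool sp_valid_at(const FrameState &fs, std::int64_t cfa_offset) {
    return fs.sp_valid
           && !(fs.sp_realigned && cfa_offset < fs.sp_realigned_offset);
}

// The frame pointer can reach cfa_offset unless a valid, realigned
// stack pointer owns that part of the frame.
bool fp_valid_at(const FrameState &fs, std::int64_t cfa_offset) {
    return fs.fp_valid
           && !(fs.sp_valid && fs.sp_realigned
                && cfa_offset >= fs.sp_realigned_offset);
}
```

With sp_realigned set and sp_realigned_offset equal to 64, an offset of
48 is reachable only through the frame pointer and an offset of 64 only
through the stack pointer.  With sp_realigned clear, both registers
remain usable for any offset, subject to sp_valid and fp_valid.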

Stack allocation prior to re-alignment is modified so that we allocate
only what is needed: if no SSE registers are saved, no extra space is
allocated even though frame.sse_reg_save_offset is increased for
alignment.

As this change only alters how SSE registers are saved, moving the
re-alignment AND should not hinder parallelization of int register saves.
---
 gcc/config/i386/i386.c | 69 ++++++++++++++++++++++++++++++++++++--------------
 gcc/config/i386/i386.h | 12 +++++++++
 2 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7f7389cbe31..b5f9f36094f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12604,6 +12604,24 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset)
   return len;
 }
 
+/* Determine if the stack pointer is valid for accessing the cfa_offset.  */
+
+static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.sp_valid && !(fs.sp_realigned
+			  && cfa_offset < fs.sp_realigned_offset);
+}
+
+/* Determine if the frame pointer is valid for accessing the cfa_offset.  */
+
+static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
+			  && cfa_offset >= fs.sp_realigned_offset);
+}
+
 /* Return an RTX that points to CFA_OFFSET within the stack frame.
    The valid base registers are taken from CFUN->MACHINE->FS.  */
 
@@ -12902,15 +12920,18 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
     {
       HOST_WIDE_INT ooffset = m->fs.sp_offset;
       bool valid = m->fs.sp_valid;
+      bool realigned = m->fs.sp_realigned;
 
       if (src == hard_frame_pointer_rtx)
 	{
 	  valid = m->fs.fp_valid;
+	  realigned = false;
 	  ooffset = m->fs.fp_offset;
 	}
       else if (src == crtl->drap_reg)
 	{
 	  valid = m->fs.drap_valid;
+	  realigned = false;
 	  ooffset = 0;
 	}
       else
@@ -12924,6 +12945,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
 
       m->fs.sp_offset = ooffset - INTVAL (offset);
       m->fs.sp_valid = valid;
+      m->fs.sp_realigned = realigned;
     }
 }
 
@@ -13673,6 +13695,7 @@ ix86_expand_prologue (void)
      this is fudged; we're interested to offsets within the local frame.  */
   m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET;
   m->fs.sp_valid = true;
+  m->fs.sp_realigned = false;
 
   ix86_compute_frame_layout (&frame);
 
@@ -13889,11 +13912,10 @@ ix86_expand_prologue (void)
 	 that we must allocate the size of the register save area before
 	 performing the actual alignment.  Otherwise we cannot guarantee
 	 that there's enough storage above the realignment point.  */
-      if (m->fs.sp_offset != frame.sse_reg_save_offset)
+      allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
+      if (allocate)
         pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-				   GEN_INT (m->fs.sp_offset
-					    - frame.sse_reg_save_offset),
-				   -1, false);
+				   GEN_INT (-allocate), -1, false);
 
       /* Align the stack.  */
       insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
@@ -13901,11 +13923,14 @@ ix86_expand_prologue (void)
 					GEN_INT (-align_bytes)));
 
       /* For the purposes of register save area addressing, the stack
-         pointer is no longer valid.  As for the value of sp_offset,
-	 see ix86_compute_frame_layout, which we need to match in order
-	 to pass verification of stack_pointer_offset at the end.  */
+	 pointer can no longer be used to access anything in the frame
+	 below m->fs.sp_realigned_offset and the frame pointer cannot be
+	 used for anything at or above.  */
+      gcc_assert (m->fs.sp_offset == frame.stack_realign_allocate_offset);
       m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
-      m->fs.sp_valid = false;
+      m->fs.sp_realigned = true;
+      m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+      gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
     }
 
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
@@ -14244,6 +14269,7 @@ ix86_emit_leave (void)
 
   gcc_assert (m->fs.fp_valid);
   m->fs.sp_valid = true;
+  m->fs.sp_realigned = false;
   m->fs.sp_offset = m->fs.fp_offset - UNITS_PER_WORD;
   m->fs.fp_valid = false;
 
@@ -14344,9 +14370,10 @@ ix86_expand_epilogue (int style)
   ix86_finalize_stack_realign_flags ();
   ix86_compute_frame_layout (&frame);
 
-  m->fs.sp_valid = (!frame_pointer_needed
-		    || (crtl->sp_is_unchanging
-			&& !stack_realign_fp));
+  m->fs.sp_realigned = stack_realign_fp;
+  m->fs.sp_valid = stack_realign_fp
+		   || !frame_pointer_needed
+		   || crtl->sp_is_unchanging;
   gcc_assert (!m->fs.sp_valid
 	      || m->fs.sp_offset == frame.stack_pointer_offset);
 
@@ -14396,10 +14423,10 @@ ix86_expand_epilogue (int style)
   /* SEH requires the use of pops to identify the epilogue.  */
   else if (TARGET_SEH)
     restore_regs_via_mov = false;
-  /* If we're only restoring one register and sp is not valid then
+  /* If we're only restoring one register and sp cannot be used then
      using a move instruction to restore the register since it's
      less work than reloading sp and popping the register.  */
-  else if (!m->fs.sp_valid && frame.nregs <= 1)
+  else if (!sp_valid_at (frame.hfp_save_offset) && frame.nregs <= 1)
     restore_regs_via_mov = true;
   else if (TARGET_EPILOGUE_USING_MOVE
 	   && cfun->machine->use_fast_prologue_epilogue
@@ -14424,7 +14451,7 @@ ix86_expand_epilogue (int style)
 	 the stack pointer, if we will restore via sp.  */
       if (TARGET_64BIT
 	  && m->fs.sp_offset > 0x7fffffff
-	  && !(m->fs.fp_valid || m->fs.drap_valid)
+	  && !(fp_valid_at (frame.stack_realign_offset) || m->fs.drap_valid)
 	  && (frame.nsseregs + frame.nregs) != 0)
 	{
 	  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
@@ -14510,6 +14537,7 @@ ix86_expand_epilogue (int style)
 	    }
 	  m->fs.sp_offset = UNITS_PER_WORD;
 	  m->fs.sp_valid = true;
+	  m->fs.sp_realigned = false;
 	}
     }
   else
@@ -14531,10 +14559,11 @@ ix86_expand_epilogue (int style)
 	}
 
       /* First step is to deallocate the stack frame so that we can
-	 pop the registers.  Also do it on SEH target for very large
-	 frame as the emitted instructions aren't allowed by the ABI in
-	 epilogues.  */
-      if (!m->fs.sp_valid
+	 pop the registers.  If the stack pointer was realigned, it needs
+	 to be restored now.  Also do it on SEH target for very large
+	 frame as the emitted instructions aren't allowed by the ABI
+	 in epilogues.  */
+      if (!m->fs.sp_valid || m->fs.sp_realigned
  	  || (TARGET_SEH
 	      && (m->fs.sp_offset - frame.reg_save_offset
 		  >= SEH_MAX_FRAME_SIZE)))
@@ -14562,7 +14591,8 @@ ix86_expand_epilogue (int style)
     {
       /* If the stack pointer is valid and pointing at the frame
 	 pointer store address, then we only need a pop.  */
-      if (m->fs.sp_valid && m->fs.sp_offset == frame.hfp_save_offset)
+      if (sp_valid_at (frame.hfp_save_offset)
+	  && m->fs.sp_offset == frame.hfp_save_offset)
 	ix86_emit_restore_reg_using_pop (hard_frame_pointer_rtx);
       /* Leave results in shorter dependency chains on CPUs that are
 	 able to grok it fast.  */
@@ -14616,6 +14646,7 @@ ix86_expand_epilogue (int style)
      be possible to merge the local stack deallocation with the
      deallocation forced by ix86_static_chain_on_stack.   */
   gcc_assert (m->fs.sp_valid);
+  gcc_assert (!m->fs.sp_realigned);
   gcc_assert (!m->fs.fp_valid);
   gcc_assert (!m->fs.realigned);
   if (m->fs.sp_offset != UNITS_PER_WORD)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 5f5368da96d..72b0d89e22c 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2498,6 +2498,18 @@ struct GTY(()) machine_frame_state
      set, the SP/FP offsets above are relative to the aligned frame
      and not the CFA.  */
   BOOL_BITFIELD realigned : 1;
+
+  /* Indicates that the stack pointer has been realigned and sp_offset
+     rounded up to the nearest alignment boundary.  Unlike `realigned`
+     above, this does not realign the hard frame pointer and is not
+     treated like a new local stack frame.  */
+  BOOL_BITFIELD sp_realigned : 1;
+
+  /* The offset (from the CFA) the stack pointer was realigned to.  When
+     sp_realigned is true, the stack pointer may be used to address
+     memory at or above this offset, but may not be used to address memory
+     below it.  */
+  HOST_WIDE_INT sp_realigned_offset;
 };
 
 /* Private to winnt.c.  */
-- 
2.11.0


* [PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves.
  2016-12-22 20:27 Use aligned SSE movs for re-aligned MS ABI pro/epilogues Daniel Santos
@ 2016-12-22 20:55 ` Daniel Santos
  2016-12-22 20:55 ` [PATCH 2/3] [i386] Keep stack pointer valid after re-alignment Daniel Santos
  2016-12-22 22:26 ` [PATCH 3/3] [i386] Use re-aligned stack pointer for aligned SSE movs Daniel Santos
  2 siblings, 0 replies; 4+ messages in thread
From: Daniel Santos @ 2016-12-22 20:55 UTC (permalink / raw)
  To: gcc-patches; +Cc: Daniel Santos

This step adds new fields to struct ix86_frame to track where we started
the stack re-alignment and what we need to allocate prior to
re-alignment.  In ix86_compute_frame_layout, we do the stack frame
re-alignment computation prior to computing the SSE save area so that
we have an aligned SSE save area.

This change also ensures that the SSE save area is properly aligned when
DRAP is used.
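
The reordered computation can be sketched in isolation as follows.  This
is a simplified model of the hunk in ix86_compute_frame_layout, not the
actual function: the Layout struct, the compute_sse_layout name and the
explicit parameters are hypothetical stand-ins for the ix86_frame fields
and the globals the real code consults.

```cpp
#include <cassert>
#include <cstdint>

// Round x up to a multiple of a (a must be a power of two).
constexpr std::int64_t round_up(std::int64_t x, std::int64_t a) {
    return (x + a - 1) & ~(a - 1);
}

// Hypothetical stand-in for the three ix86_frame fields involved.
struct Layout {
    std::int64_t stack_realign_allocate_offset;
    std::int64_t stack_realign_offset;
    std::int64_t sse_reg_save_offset;
};

// offset: CFA offset just past the integer register saves.
// align_bytes: stack alignment needed, in bytes.
// incoming_boundary_bits: incoming stack boundary, in bits.
Layout compute_sse_layout(std::int64_t offset, int nsseregs,
                          bool realign_fp, bool realign_drap,
                          std::int64_t align_bytes,
                          int incoming_boundary_bits) {
    Layout l{};
    // Without SSE saves, only this much is allocated before the AND.
    l.stack_realign_allocate_offset = offset;
    // Re-alignment now happens before the SSE save area is laid out.
    if (realign_fp)
        offset = round_up(offset, align_bytes);
    l.stack_realign_offset = offset;
    if (nsseregs) {
        if (incoming_boundary_bits >= 128
            || (realign_drap && align_bytes >= 16))
            offset = round_up(offset, 16);
        offset += nsseregs * 16;
        l.stack_realign_allocate_offset = offset;
    }
    l.sse_reg_save_offset = offset;
    return l;
}
```

For example, with offset 24, two SSE registers, frame-pointer
re-alignment to 16 bytes and a 64-bit incoming boundary, the model yields
a stack_realign_offset of 32 and an sse_reg_save_offset of 64, so both
save slots sit at multiples of 16 from the re-alignment point.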
---
 gcc/config/i386/i386.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 792e8ec232d..7f7389cbe31 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2453,7 +2453,7 @@ struct GTY(()) stack_local_entry {
    [saved regs]
 					<- regs_save_offset
    [padding0]
-
+					<- stack_realign_offset
    [saved SSE regs]
 					<- sse_regs_save_offset
    [padding1]          |
@@ -2479,6 +2479,8 @@ struct ix86_frame
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
   HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
@@ -12457,28 +12459,36 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (TARGET_SEH)
     frame->hard_frame_pointer_offset = offset;
 
+  /* When re-aligning the stack frame, but not saving SSE registers, this
+     is the offset we want to allocate memory for.  */
+  frame->stack_realign_allocate_offset = offset;
+
+  /* The re-aligned stack starts here.  Values before this point are not
+     directly comparable with values below this point.  Use sp_valid_at
+     to determine if the stack pointer is valid for a given offset and
+     fp_valid_at for the frame pointer.  */
+  if (stack_realign_fp)
+    offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->stack_realign_offset = offset;
+
   /* Align and set SSE register save area.  */
   if (frame->nsseregs)
     {
       /* The only ABI that has saved SSE registers (Win64) also has a
-	 16-byte aligned default stack, and thus we don't need to be
-	 within the re-aligned local stack frame to save them.  In case
-	 incoming stack boundary is aligned to less than 16 bytes,
-	 unaligned move of SSE register will be emitted, so there is
-	 no point to round up the SSE register save area outside the
-	 re-aligned local stack frame to 16 bytes.  */
-      if (ix86_incoming_stack_boundary >= 128)
+	 16-byte aligned default stack.  However, many programs violate
+	 the ABI, and Wine64 forces stack realignment to compensate.
+
+	 If the incoming stack boundary is at least 16 bytes, or DRAP is
+	 required and the DRAP re-alignment boundary is at least 16 bytes,
+	 then we want the SSE register save area properly aligned.  */
+      if (ix86_incoming_stack_boundary >= 128
+	       || (stack_realign_drap && stack_alignment_needed >= 16))
 	offset = ROUND_UP (offset, 16);
       offset += frame->nsseregs * 16;
+      frame->stack_realign_allocate_offset = offset;
     }
-  frame->sse_reg_save_offset = offset;
 
-  /* The re-aligned stack starts here.  Values before this point are not
-     directly comparable with values below this point.  In order to make
-     sure that no value happens to be the same before and after, force
-     the alignment computation below to add a non-zero value.  */
-  if (stack_realign_fp)
-    offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->sse_reg_save_offset = offset;
 
   /* Va-arg area */
   frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
-- 
2.11.0


* [PATCH 3/3] [i386] Use re-aligned stack pointer for aligned SSE movs
  2016-12-22 20:27 Use aligned SSE movs for re-aligned MS ABI pro/epilogues Daniel Santos
  2016-12-22 20:55 ` [PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves Daniel Santos
  2016-12-22 20:55 ` [PATCH 2/3] [i386] Keep stack pointer valid after re-alignment Daniel Santos
@ 2016-12-22 22:26 ` Daniel Santos
  2 siblings, 0 replies; 4+ messages in thread
From: Daniel Santos @ 2016-12-22 22:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Daniel Santos

This adds an optional `align' parameter to choose_baseaddr allowing the
caller to request an address that is aligned to some boundary.  Then
ix86_emit_save_reg_using_mov and ix86_emit_restore_sse_regs_using_mov
are modified so that optimally aligned memory is used when such a base
register is available.
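
The two-pass selection can be sketched as below.  This is a deliberately
reduced model: the Base enum, the Candidate struct and the fixed
FP > DRAP > SP preference order are assumptions made for illustration,
whereas the real choose_basereg also weighs encoding size when
use_fast_prologue_epilogue is not set.

```cpp
#include <array>
#include <cassert>

enum class Base { None, FP, DRAP, SP };

// Hypothetical description of one candidate base register.
struct Candidate {
    Base reg;
    bool ok;              // register is valid for the requested offset
    unsigned align_bits;  // known alignment of the register, in bits
};

// One selection pass.  When align_requested is non-zero, candidates
// with less alignment are filtered out.  Reports the chosen register's
// alignment through align_out when a register is selected.
Base choose_basereg(const std::array<Candidate, 3> &cands,
                    unsigned align_requested, unsigned *align_out) {
    for (const Candidate &c : cands) {
        if (!c.ok)
            continue;
        if (align_requested && c.align_bits < align_requested)
            continue;
        if (align_out)
            *align_out = c.align_bits;
        return c.reg;
    }
    return Base::None;
}

// Try for the preferred alignment first, then fall back to any valid
// base register.
Base choose_base(const std::array<Candidate, 3> &cands, unsigned *align) {
    Base b = Base::None;
    if (align && *align)
        b = choose_basereg(cands, *align, align);
    if (b == Base::None)
        b = choose_basereg(cands, 0, align);
    return b;
}
```

When the caller asks for 128-bit alignment and only the stack pointer
provides it, the first pass picks the stack pointer even though the frame
pointer comes first in the preference order.  If no candidate qualifies,
the second pass falls back to the usual choice and reports its lower
alignment, which the caller then uses to emit an unaligned mov.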
---
 gcc/config/i386/i386.c | 110 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 87 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b5f9f36094f..e60267a903d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12622,15 +12622,40 @@ static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
 			  && cfa_offset >= fs.sp_realigned_offset);
 }
 
-/* Return an RTX that points to CFA_OFFSET within the stack frame.
-   The valid base registers are taken from CFUN->MACHINE->FS.  */
+/* Choose a base register based upon alignment requested, speed and/or
+   size.  */
 
-static rtx
-choose_baseaddr (HOST_WIDE_INT cfa_offset)
+static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+			    HOST_WIDE_INT &base_offset,
+			    unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
-  rtx base_reg = NULL;
-  HOST_WIDE_INT base_offset = 0;
+  unsigned int hfp_align;
+  unsigned int drap_align;
+  unsigned int sp_align;
+  bool hfp_ok  = fp_valid_at (cfa_offset);
+  bool drap_ok = m->fs.drap_valid;
+  bool sp_ok   = sp_valid_at (cfa_offset);
+
+  hfp_align = drap_align = sp_align = INCOMING_STACK_BOUNDARY;
+
+  /* Filter out any registers that don't meet the requested alignment
+     criteria.  */
+  if (align_reqested)
+    {
+      /* Make sure we weren't given a cfa_offset incongruent with the
+	 align_reqested.  */
+      gcc_assert (!(cfa_offset & (align_reqested / BITS_PER_UNIT - 1)));
+
+      if (m->fs.realigned)
+	hfp_align = drap_align = sp_align = crtl->stack_alignment_needed;
+      else if (m->fs.sp_realigned)
+	sp_align = crtl->stack_alignment_needed;
+
+      hfp_ok = hfp_ok && hfp_align >= align_reqested;
+      drap_ok = drap_ok && drap_align >= align_reqested;
+      sp_ok = sp_ok && sp_align >= align_reqested;
+    }
 
   if (m->use_fast_prologue_epilogue)
     {
@@ -12639,17 +12664,17 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
          while DRAP must be reloaded within the epilogue.  But choose either
          over the SP due to increased encoding size.  */
 
-      if (m->fs.fp_valid)
+      if (hfp_ok)
 	{
 	  base_reg = hard_frame_pointer_rtx;
 	  base_offset = m->fs.fp_offset - cfa_offset;
 	}
-      else if (m->fs.drap_valid)
+      else if (drap_ok)
 	{
 	  base_reg = crtl->drap_reg;
 	  base_offset = 0 - cfa_offset;
 	}
-      else if (m->fs.sp_valid)
+      else if (sp_ok)
 	{
 	  base_reg = stack_pointer_rtx;
 	  base_offset = m->fs.sp_offset - cfa_offset;
@@ -12662,13 +12687,13 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
 
       /* Choose the base register with the smallest address encoding.
          With a tie, choose FP > DRAP > SP.  */
-      if (m->fs.sp_valid)
+      if (sp_ok)
 	{
 	  base_reg = stack_pointer_rtx;
 	  base_offset = m->fs.sp_offset - cfa_offset;
           len = choose_baseaddr_len (STACK_POINTER_REGNUM, base_offset);
 	}
-      if (m->fs.drap_valid)
+      if (drap_ok)
 	{
 	  toffset = 0 - cfa_offset;
 	  tlen = choose_baseaddr_len (REGNO (crtl->drap_reg), toffset);
@@ -12679,7 +12704,7 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
 	      len = tlen;
 	    }
 	}
-      if (m->fs.fp_valid)
+      if (hfp_ok)
 	{
 	  toffset = m->fs.fp_offset - cfa_offset;
 	  tlen = choose_baseaddr_len (HARD_FRAME_POINTER_REGNUM, toffset);
@@ -12691,8 +12716,40 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
 	    }
 	}
     }
-  gcc_assert (base_reg != NULL);
 
+    /* Set the align return value.  */
+    if (align)
+      {
+	if (base_reg == stack_pointer_rtx)
+	  *align = sp_align;
+	else if (base_reg == crtl->drap_reg)
+	  *align = drap_align;
+	else if (base_reg == hard_frame_pointer_rtx)
+	  *align = hfp_align;
+      }
+}
+
+/* Return an RTX that points to CFA_OFFSET within the stack frame and
+   the alignment of address.  If align is non-null, it should point to
+   an alignment value (in bits) that is preferred or zero and will
+   receive the alignment of the base register that was selected.  The
+   valid base registers are taken from CFUN->MACHINE->FS.  */
+
+static rtx
+choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
+{
+  rtx base_reg = NULL;
+  HOST_WIDE_INT base_offset = 0;
+
+  /* If a specific alignment is requested, try to get a base register
+     with that alignment first.  */
+  if (align && *align)
+    choose_basereg (cfa_offset, base_reg, base_offset, *align, align);
+
+  if (!base_reg)
+    choose_basereg (cfa_offset, base_reg, base_offset, 0, align);
+
+  gcc_assert (base_reg != NULL);
   return plus_constant (Pmode, base_reg, base_offset);
 }
 
@@ -12721,13 +12778,13 @@ ix86_emit_save_reg_using_mov (machine_mode mode, unsigned int regno,
   struct machine_function *m = cfun->machine;
   rtx reg = gen_rtx_REG (mode, regno);
   rtx mem, addr, base, insn;
-  unsigned int align;
+  unsigned int align = GET_MODE_ALIGNMENT (mode);
 
-  addr = choose_baseaddr (cfa_offset);
+  addr = choose_baseaddr (cfa_offset, &align);
   mem = gen_frame_mem (mode, addr);
 
-  /* The location is aligned up to INCOMING_STACK_BOUNDARY.  */
-  align = MIN (GET_MODE_ALIGNMENT (mode), INCOMING_STACK_BOUNDARY);
+  /* The location alignment depends upon the base register.  */
+  align = MIN (GET_MODE_ALIGNMENT (mode), align);
   set_mem_align (mem, align);
 
   insn = emit_insn (gen_rtx_SET (mem, reg));
@@ -12767,6 +12824,13 @@ ix86_emit_save_reg_using_mov (machine_mode mode, unsigned int regno,
 	}
     }
 
+  else if (base == stack_pointer_rtx && m->fs.sp_realigned
+	   && cfa_offset >= m->fs.sp_realigned_offset)
+    {
+      gcc_checking_assert (stack_realign_fp);
+      add_reg_note (insn, REG_CFA_EXPRESSION, gen_rtx_SET (mem, reg));
+    }
+
   /* The memory may not be relative to the current CFA register,
      which means that we may need to generate a new pattern for
      use by the unwind info.  */
@@ -14166,7 +14230,7 @@ ix86_expand_prologue (void)
       /* vDRAP is setup but after reload it turns out stack realign
          isn't necessary, here we will emit prologue to setup DRAP
          without stack realign adjustment */
-      t = choose_baseaddr (0);
+      t = choose_baseaddr (0, NULL);
       emit_insn (gen_rtx_SET (crtl->drap_reg, t));
     }
 
@@ -14303,7 +14367,7 @@ ix86_emit_restore_regs_using_mov (HOST_WIDE_INT cfa_offset,
 	rtx mem;
 	rtx_insn *insn;
 
-	mem = choose_baseaddr (cfa_offset);
+	mem = choose_baseaddr (cfa_offset, NULL);
 	mem = gen_frame_mem (word_mode, mem);
 	insn = emit_move_insn (reg, mem);
 
@@ -14340,13 +14404,13 @@ ix86_emit_restore_sse_regs_using_mov (HOST_WIDE_INT cfa_offset,
       {
 	rtx reg = gen_rtx_REG (V4SFmode, regno);
 	rtx mem;
-	unsigned int align;
+	unsigned int align = GET_MODE_ALIGNMENT (V4SFmode);
 
-	mem = choose_baseaddr (cfa_offset);
+	mem = choose_baseaddr (cfa_offset, &align);
 	mem = gen_rtx_MEM (V4SFmode, mem);
 
-	/* The location is aligned up to INCOMING_STACK_BOUNDARY.  */
-	align = MIN (GET_MODE_ALIGNMENT (V4SFmode), INCOMING_STACK_BOUNDARY);
+	/* The location alignment depends upon the base register.  */
+	align = MIN (GET_MODE_ALIGNMENT (V4SFmode), align);
 	set_mem_align (mem, align);
 	emit_insn (gen_rtx_SET (reg, mem));
 
-- 
2.11.0

2016-12-22 20:27 Use aligned SSE movs for re-aligned MS ABI pro/epilogues Daniel Santos
2016-12-22 20:55 ` [PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves Daniel Santos
2016-12-22 20:55 ` [PATCH 2/3] [i386] Keep stack pointer valid after re-alignment Daniel Santos
2016-12-22 22:26 ` [PATCH 3/3] [i386] Use re-aligned stack pointer for aligned SSE movs Daniel Santos
