public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
@ 2017-07-31 11:16 Daniel Santos
  2017-07-31 11:19 ` [PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 11:16 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, H.J. Lu, Martin Liska,
	Rainer Orth, Mike Stump

[-- Attachment #1: Type: text/plain, Size: 1114 bytes --]

When working on the Wine64 project to use aligned SSE MOVs after SP 
realignment and adding -mcall-ms2sysv-xlogues, I overlooked the fact 
that the function body may require a stack alignment greater than 
16-bytes.  This can result in an ICE with -mabi=ms -mavx512f and some 
other cases.  This patch set reworks the strategy for calculating the 
frame layout following normal (inline) integral register saves (at 
frame.reg_save_offset) to the start of the frame for the local function 
(frame.frame_pointer_offset).

I've completed a bootstrap and full regression test with no additional 
failures, but I don't have access to a machine with avx512 extensions.  
I have manually run the tests that need it using the Intel SDE, but I 
haven't been able to validate that my 
check_effective_target_avx512f_runtime code in 
gcc/testsuite/lib/target-supports.exp is correctly enabling the tests 
for pr80969-4*.c.

As an aside note, I still have some rework of the ms-sysv.exp tests that 
I haven't yet to submitted and in which I'm adding more tests for cases 
with uncommon stacks, as in PR 81563.

Thanks,
Daniel

[-- Attachment #2: pr80969.gcc.ChangeLog --]
[-- Type: text/plain, Size: 712 bytes --]

2017-07-23  Daniel Santos  <daniel.santos@pobox.com>

	* config/i386/i386.h (ix86_frame::outlined_save_offset): Remove field.
	(ix86_frame::stack_realign_allocate_offset): Likewise.
	(ix86_frame::stack_realign_allocate): New field.
	(struct machine_frame_state): Modify comments.
	(machine_frame_state::sp_realigned_fp_end): New field.
	(machine_function::call_ms2sysv_pad_out): Remove field.
	* config/i386/i386.c (xlogue_layout::get_stack_space_used): Modify.
	(ix86_compute_frame_layout): Likewise.
	(sp_valid_at): Likewise.
	(fp_valid_at): Likewise.
	(choose_baseaddr): Modify comments.
	(ix86_emit_outlined_ms2sysv_save): Modify.
	(ix86_expand_prologue): Likewise.
	(ix86_expand_epilogue): Modify comments.

[-- Attachment #3: pr80969.gcc.testsuite.ChangeLog --]
[-- Type: text/plain, Size: 354 bytes --]

2017-07-23  Daniel Santos  <daniel.santos@pobox.com>
	* gcc.target/i386/pr80969-1.c: New testcase.
	* gcc.target/i386/pr80969-2a.c: Likewise.
	* gcc.target/i386/pr80969-2.c: Likewise.
	* gcc.target/i386/pr80969-3.c: Likewise.
	* gcc.target/i386/pr80969-4a.c: Likewise.
	* gcc.target/i386/pr80969-4b.c: Likewise.
	* gcc.target/i386/pr80969-4.c: Likewise.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
  2017-07-31 11:19 ` [PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
  2017-07-31 11:19 ` [PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
@ 2017-07-31 11:19 ` Daniel Santos
  2017-08-02 23:28   ` [PATCH 5/6 v2] " Daniel Santos
  2017-07-31 11:19 ` [PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset Daniel Santos
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 11:19 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka; +Cc: Martin Liska, H . J . Lu

The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).

Since the alignment required may be more than 16-bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 54 +++++++++++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e92f322de0c..7e1fc4dfbf5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13273,10 +13273,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
    an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   recieve the alignment of the base register that was selected,
+   irrespective of rather or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14322,35 +14325,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
+     we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+			  + xlogue.get_stub_ptr_offset (), &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
-     layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+      pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+				GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
 						  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, &align);
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-			     GEN_INT (-stack_alloc_size), -1,
-			     m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
     {
       const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
       rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 			     r.regno);
-      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
     }
 
   gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
@@ -14608,8 +14611,8 @@ ix86_expand_prologue (void)
 	 that we must allocate the size of the register save area before
 	 performing the actual alignment.  Otherwise we cannot guarantee
 	 that there's enough storage above the realignment point.  */
-      allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-      if (allocate && !m->call_ms2sysv)
+      allocate = frame.stack_realign_allocate;
+      if (allocate)
         pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
 				   GEN_INT (-allocate), -1, false);
 
@@ -14618,8 +14621,7 @@ ix86_expand_prologue (void)
 					stack_pointer_rtx,
 					GEN_INT (-align_bytes)));
       m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
-      m->fs.sp_realigned = true;
-      m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+      m->fs.sp_realigned_offset = m->fs.sp_offset - allocate;
       /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
 	 Beyond this point, stack access should be done via choose_baseaddr or
 	 by using sp_valid_at and fp_valid_at to determine the correct base
@@ -14627,6 +14629,8 @@ ix86_expand_prologue (void)
 	 and not physical.  */
       gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
       gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
+      m->fs.sp_realigned = true;
+
       /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
 	 is needed to describe where a register is saved using a realigned
 	 stack pointer, so we need to invalidate the stack pointer for that
@@ -14688,7 +14692,7 @@ ix86_expand_prologue (void)
      so probe if the size is non-negative to preserve the protection area.  */
   if (allocate >= 0 && flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
     {
-      /* We expect the registers to be saved when probes are used.  */
+      /* We expect the GP registers to be saved when probes are used.  */
       gcc_assert (int_registers_saved);
 
       if (STACK_CHECK_MOVING_SP)
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
  2017-07-31 11:19 ` [PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
@ 2017-07-31 11:19 ` Daniel Santos
  2017-08-08 19:23   ` [PATCH 6/6 v2] " Daniel Santos
  2017-07-31 11:19 ` [PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al Daniel Santos
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 11:19 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, Mike Stump, Rainer Orth
  Cc: Martin Liska, H . J . Lu

The testcase in the PR is used as a base and relevant variants are added
to test other factors affected by the patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
              stubs.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 ++++
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  26 ++++++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |  26 ++++++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  31 ++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4.c  | 123 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 +++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 +++++++++++++++++++++++++++++
 gcc/testsuite/lib/target-supports.exp      |  66 +++++++++++++++
 8 files changed, 536 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 00000000000..eb8d767a778
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 00000000000..e868d6c7e5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 00000000000..071a90534a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 00000000000..5982981b55c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with alloca (and DRAP).  */
+
+#include <alloca.h>
+
+int a[56];
+volatile int b = -12345;
+volatile const int d = 42;
+
+void foo (int *x, int y, int z)
+{
+}
+
+void (*volatile const foo_noinfo)(int *, int, int) = foo;
+
+int main (int argc, char *argv[]) {
+  int c;
+  int *e = alloca (d);
+  foo_noinfo (e, d, 0);
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    foo_noinfo (e, d, c);
+    a[-(b % 56)] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c b/gcc/testsuite/gcc.target/i386/pr80969-4.c
new file mode 100644
index 00000000000..1ec54d081cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c
@@ -0,0 +1,123 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512 and va_args.  */
+
+#include <stdarg.h>
+#include <assert.h>
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
+__m128i n3 = { 893, -3180 } ;
+int n4 = 324;
+double n5 = 103.3;
+__m128i n6 = { -123, 2 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128i n10 = { 1233, -100 };
+int n11 = 407;
+double n12 = 304.9;
+__m128i n13 = { 233, -110 };
+__m256i n14 = { -1233, 23, 34, -1003 };
+__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m512d e2;
+__m128i e3;
+int e4;
+double e5;
+__m128i e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128i e10;
+int e11;
+double e12;
+__m128i e13;
+__m256i e14;
+__m512i e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
+{
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  e4 = va_arg (va_arglist, int);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128i);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128i);
+  e11 = va_arg (va_arglist, int);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128i);
+  e14 = va_arg (va_arglist, __m256i);
+  e15 = va_arg (va_arglist, __m512i);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+}
+
+void (*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
+
+static void
+__attribute__((noinline))
+foo (__m128 a1, __m512d a2, __m128i a3, ...)
+{
+  va_list va_arglist;
+  int c;
+
+  va_start (va_arglist, a3);
+  bar_noinfo (a1, a2, a3, va_arglist);
+  va_end (va_arglist);
+
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+}
+void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
+
+static void
+avx_test (void)
+{
+  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+       n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4a.c b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
new file mode 100644
index 00000000000..faf263170e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
@@ -0,0 +1,124 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512, va_args, and ms to sysv call.  */
+
+#include <stdarg.h>
+#include <assert.h>
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
+__m128i n3 = { 893, -3180 } ;
+int n4 = 324;
+double n5 = 103.3;
+__m128i n6 = { -123, 2 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128i n10 = { 1233, -100 };
+int n11 = 407;
+double n12 = 304.9;
+__m128i n13 = { 233, -110 };
+__m256i n14 = { -1233, 23, 34, -1003 };
+__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m512d e2;
+__m128i e3;
+int e4;
+double e5;
+__m128i e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128i e10;
+int e11;
+double e12;
+__m128i e13;
+__m256i e14;
+__m512i e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline, sysv_abi))
+bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
+{
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  e4 = va_arg (va_arglist, int);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128i);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128i);
+  e11 = va_arg (va_arglist, int);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128i);
+  e14 = va_arg (va_arglist, __m256i);
+  e15 = va_arg (va_arglist, __m512i);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+}
+
+void __attribute__((sysv_abi))
+(*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
+
+static void
+__attribute__((noinline))
+foo (__m128 a1, __m512d a2, __m128i a3, ...)
+{
+  va_list va_arglist;
+  int c;
+
+  va_start (va_arglist, a3);
+  bar_noinfo (a1, a2, a3, va_arglist);
+  va_end (va_arglist);
+
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+}
+void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
+
+static void
+avx_test (void)
+{
+  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+       n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4b.c b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
new file mode 100644
index 00000000000..9bc8995e58e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
@@ -0,0 +1,124 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512, va_args, and ms to sysv call using save/restore stubs.  */
+
+#include <stdarg.h>
+#include <assert.h>
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
+__m128i n3 = { 893, -3180 } ;
+int n4 = 324;
+double n5 = 103.3;
+__m128i n6 = { -123, 2 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128i n10 = { 1233, -100 };
+int n11 = 407;
+double n12 = 304.9;
+__m128i n13 = { 233, -110 };
+__m256i n14 = { -1233, 23, 34, -1003 };
+__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m512d e2;
+__m128i e3;
+int e4;
+double e5;
+__m128i e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128i e10;
+int e11;
+double e12;
+__m128i e13;
+__m256i e14;
+__m512i e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline, sysv_abi))
+bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
+{
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  e4 = va_arg (va_arglist, int);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128i);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128i);
+  e11 = va_arg (va_arglist, int);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128i);
+  e14 = va_arg (va_arglist, __m256i);
+  e15 = va_arg (va_arglist, __m512i);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+}
+
+void __attribute__((sysv_abi))
+(*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
+
+static void
+__attribute__((noinline))
+foo (__m128 a1, __m512d a2, __m128i a3, ...)
+{
+  va_list va_arglist;
+  int c;
+
+  va_start (va_arglist, a3);
+  bar_noinfo (a1, a2, a3, va_arglist);
+  va_end (va_arglist);
+
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+}
+void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
+
+static void
+avx_test (void)
+{
+  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+       n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5a6562794b2..554ec10e4b1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1642,6 +1642,29 @@ proc check_avx_os_support_available { } {
     }]
 }
 
+# Return 1 if the target OS supports running AVX executables, 0
+# otherwise.  Cache the result.
+
+proc check_avx512_os_support_available { } {
+    return [check_cached_effective_target avx512_os_support_available {
+	# If this is not the right target then we can skip the test.
+	if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+	    expr 0
+	} else {
+	    # Check that OS has AVX512, AVX and SSE saving enabled.
+	    check_runtime_nocache avx512_os_support_available {
+		int main ()
+		{
+		  unsigned int eax, edx;
+
+		  asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
+		  return (eax & 0xe6) != 0xe6;
+		}
+	    } ""
+	}
+    }]
+}
+
 # Return 1 if the target supports executing SSE instructions, 0
 # otherwise.  Cache the result.
 
@@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } {
 	    expr 0
 	} else {
 	    check_runtime_nocache avx2_hw_available {
+		#include <stddef.h>
 		#include "cpuid.h"
 		int main ()
 		{
@@ -1842,6 +1866,37 @@ proc check_avx2_hw_available { } {
     }]
 }
 
+# Return 1 if the target supports executing AVX512 foundation instructions, 0
+# otherwise.  Cache the result.
+
+proc check_avx512f_hw_available { } {
+    return [check_cached_effective_target avx512f_hw_available {
+	# If this is not the right target then we can skip the test.
+	if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
+	    expr 0
+	} else {
+	    check_runtime_nocache avx512f_hw_available {
+		#include <stddef.h>
+		#include "cpuid.h"
+		int main ()
+		{
+		  unsigned int eax, ebx, ecx, edx;
+		  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)
+		      || !(ecx & bit_OSXSAVE))
+		    return 1;
+
+		  if (__get_cpuid_max (0, NULL) < 7)
+		    return 1;
+
+		  __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+		  return !(ebx & bit_AVX512F);
+		}
+	    } ""
+	}
+    }]
+}
+
 # Return 1 if the target supports running SSE executables, 0 otherwise.
 
 proc check_effective_target_sse_runtime { } {
@@ -1928,6 +1983,17 @@ proc check_effective_target_avx2_runtime { } {
     return 0
 }
 
+# Return 1 if the target supports running AVX512f executables, 0 otherwise.
+
+proc check_effective_target_avx512f_runtime { } {
+    if { [check_effective_target_avx512f]
+	 && [check_avx512f_hw_available]
+	 && [check_avx512_os_support_available] } {
+	return 1
+    }
+    return 0
+}
+
 # Return 1 if we are compiling for 64-bit PowerPC but we do not use direct
 # move instructions for moves from GPR to FPR.
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 4/6] [i386] Modify ix86_compute_frame_layout
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
                   ` (3 preceding siblings ...)
  2017-07-31 11:19 ` [PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset Daniel Santos
@ 2017-07-31 11:19 ` Daniel Santos
  2017-07-31 11:19 ` [PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out Daniel Santos
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 11:19 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka; +Cc: Martin Liska, H . J . Lu

These changes affect how the stack frame is calculated from the region
starting at frame.reg_save_offset until frame.frame_pointer_offset,
which includes either the stub save area or the (inline) SSE register
save area and the va_args register save area.

The calculation used when not realigning the stack pointer is the same,
but when when realigning we calculate the 16-byte aligned space needed
in reverse so that the stack realignment boundary at
frame.stack_realign_offset may not necessarily be a multiple of
stack_alignment_needed, but the value of frame.frame_pointer_offset
will. This results in a properly aligned stack for the function body and
avoids wasting stack space.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 116 +++++++++++++++++++++++++++++++++----------------
 gcc/config/i386/i386.h |   2 +-
 2 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e2e9546a27c..e92f322de0c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12874,6 +12874,14 @@ ix86_compute_frame_layout (void)
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
   gcc_assert (preferred_alignment <= stack_alignment_needed);
 
+  /* The only ABI saving SSE regs should be 64-bit ms_abi.  */
+  gcc_assert (TARGET_64BIT || !frame->nsseregs);
+  if (TARGET_64BIT && m->call_ms2sysv)
+    {
+      gcc_assert (stack_alignment_needed >= 16);
+      gcc_assert (!frame->nsseregs);
+    }
+
   /* For SEH we have to limit the amount of code movement into the prologue.
      At present we do this via a BLOCKAGE, at which point there's very little
      scheduling that can be done, which means that there's very little point
@@ -12936,54 +12944,88 @@ ix86_compute_frame_layout (void)
   if (TARGET_SEH)
     frame->hard_frame_pointer_offset = offset;
 
-  /* When re-aligning the stack frame, but not saving SSE registers, this
-     is the offset we want adjust the stack pointer to.  */
-  frame->stack_realign_allocate_offset = offset;
+  /* Calculate the size of the va-arg area (not including padding, if any).  */
+  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
 
-  /* The re-aligned stack starts here.  Values before this point are not
-     directly comparable with values below this point.  Use sp_valid_at
-     to determine if the stack pointer is valid for a given offset and
-     fp_valid_at for the frame pointer.  */
   if (stack_realign_fp)
-    offset = ROUND_UP (offset, stack_alignment_needed);
-  frame->stack_realign_offset = offset;
-
-  if (TARGET_64BIT && m->call_ms2sysv)
     {
-      gcc_assert (stack_alignment_needed >= 16);
-      gcc_assert (!frame->nsseregs);
+      /* We may need a 16-byte aligned stack for the remainder of the
+	 register save area, but the stack frame for the local function
+	 may require a greater alignment if using AVX/2/512.  In order
+	 to avoid wasting space, we first calculate the space needed for
+	 the rest of the register saves, add that to the stack pointer,
+	 and then realign the stack to the boundary of the start of the
+	 frame for the local function.  */
+      HOST_WIDE_INT space_needed = 0;
+      HOST_WIDE_INT sse_reg_space_needed = 0;
 
-      m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-      offset += xlogue_layout::get_instance ().get_stack_space_used ();
-    }
+      if (TARGET_64BIT)
+	{
+	  if (m->call_ms2sysv)
+	    {
+	      m->call_ms2sysv_pad_in = 0;
+	      space_needed = xlogue_layout::get_instance ().get_stack_space_used ();
+	    }
 
-  /* Align and set SSE register save area.  */
-  else if (frame->nsseregs)
-    {
-      /* The only ABI that has saved SSE registers (Win64) also has a
-	 16-byte aligned default stack.  However, many programs violate
-	 the ABI, and Wine64 forces stack realignment to compensate.
+	  else if (frame->nsseregs)
+	    /* The only ABI that has saved SSE registers (Win64) also has a
+	       16-byte aligned default stack.  However, many programs violate
+	       the ABI, and Wine64 forces stack realignment to compensate.  */
+	    space_needed = frame->nsseregs * 16;
+
+	  sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16);
+
+	  /* 64-bit frame->va_arg_size should always be a multiple of 16, but
+	     rounding to be pedantic.  */
+	  space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16);
+	}
+      else
+	space_needed = frame->va_arg_size;
+
+      /* Record the allocation size required prior to the realignment AND.  */
+      frame->stack_realign_allocate = space_needed;
+
+      /* The re-aligned stack starts at frame->stack_realign_offset.  Values
+	 before this point are not directly comparable with values below
+	 this point.  Use sp_valid_at to determine if the stack pointer is
+	 valid for a given offset, fp_valid_at for the frame pointer, or
+	 choose_baseaddr to have a base register chosen for you.
 
-	 If the incoming stack boundary is at least 16 bytes, or DRAP is
-	 required and the DRAP re-alignment boundary is at least 16 bytes,
-	 then we want the SSE register save area properly aligned.  */
-      if (ix86_incoming_stack_boundary >= 128
-	       || (stack_realign_drap && stack_alignment_needed >= 16))
-	offset = ROUND_UP (offset, 16);
-      offset += frame->nsseregs * 16;
-      frame->stack_realign_allocate_offset = offset;
+	 Note that the result of (frame->stack_realign_offset
+	 & (stack_alignment_needed - 1)) may not equal zero.  */
+      offset = ROUND_UP (offset + space_needed, stack_alignment_needed);
+      frame->stack_realign_offset = offset - space_needed;
+      frame->sse_reg_save_offset = frame->stack_realign_offset
+							+ sse_reg_space_needed;
     }
+  else
+    {
+      frame->stack_realign_offset = offset;
 
-  frame->sse_reg_save_offset = offset;
+      if (TARGET_64BIT && m->call_ms2sysv)
+	{
+	  m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
+	  offset += xlogue_layout::get_instance ().get_stack_space_used ();
+	}
 
-  /* Va-arg area */
-  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
-  offset += frame->va_arg_size;
+      /* Align and set SSE register save area.  */
+      else if (frame->nsseregs)
+	{
+	  /* If the incoming stack boundary is at least 16 bytes, or DRAP is
+	     required and the DRAP re-alignment boundary is at least 16 bytes,
+	     then we want the SSE register save area properly aligned.  */
+	  if (ix86_incoming_stack_boundary >= 128
+		  || (stack_realign_drap && stack_alignment_needed >= 16))
+	    offset = ROUND_UP (offset, 16);
+	  offset += frame->nsseregs * 16;
+	}
+      frame->sse_reg_save_offset = offset;
+      offset += frame->va_arg_size;
+    }
 
   /* Align start of frame for local function.  */
-  if (stack_realign_fp
-      || m->call_ms2sysv
-      || offset != frame->sse_reg_save_offset
+  if (m->call_ms2sysv
+      || frame->va_arg_size != 0
       || size != 0
       || !crtl->is_leaf
       || cfun->calls_alloca
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index b08e45f68d4..e58882baee8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2501,7 +2501,7 @@ struct GTY(()) ix86_frame
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
   HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_allocate;
   HOST_WIDE_INT stack_realign_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
@ 2017-07-31 11:19 ` Daniel Santos
  2017-07-31 11:19 ` [PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 11:19 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka; +Cc: Martin Liska, H . J . Lu

When we realign the stack frame (without DRAP), there may be a range of
CFA offsets that should never be touched because they are alignment
padding and any reference to them is almost certainly an error.
Previously, only the offset of where the realigned stack frame starts
was recorded and checked in sp_valid_at and fp_valid_at.

This change adds sp_realigned_fp_last to struct machine_frame_state to
record the last valid offset from which the frame pointer can be used
when the stack pointer is realigned and modifies sp_valid_at and
fp_valid_at to fail an assertion when passed an offset in the "no-man's
land" between these two values.

Comments for struct machine_frame_state incorrectly stated that a
realigned stack pointer could be used to access offsets equal to or
greater than sp_realigned_offset, but it is only valid for offsets that
are greater.  This was the (incorrect) behaviour of sp_valid_at and
fp_valid_at prior to r250587 and this change now corrects the
documentation and adds clarification of the CFA-relative calculation.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 45 ++++++++++++++++++++++++++++++---------------
 gcc/config/i386/i386.h | 18 +++++++++++++-----
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f1486ff3750..690631dfe43 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13102,26 +13102,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset)
   return len;
 }
 
-/* Determine if the stack pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
-static inline bool
+static bool
 sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.sp_valid && !(fs.sp_realigned
-			  && cfa_offset <= fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset)
+    {
+      /* Validate that the cfa_offset isn't in a "no-man's land".  */
+      gcc_assert (cfa_offset <= fs.sp_realigned_fp_last);
+      return false;
+    }
+  return fs.sp_valid;
 }
 
-/* Determine if the frame pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
 static inline bool
 fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
-			  && cfa_offset > fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last)
+    {
+      /* Validate that the cfa_offset isn't in a "no-man's land".  */
+      gcc_assert (cfa_offset >= fs.sp_realigned_offset);
+      return false;
+    }
+  return fs.fp_valid;
 }
 
 /* Choose a base register based upon alignment requested, speed and/or
@@ -14560,6 +14570,9 @@ ix86_expand_prologue (void)
       int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
       gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
+      /* Record last valid frame pointer offset.  */
+      m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+
       /* The computation of the size of the re-aligned stack frame means
 	 that we must allocate the size of the register save area before
 	 performing the actual alignment.  Otherwise we cannot guarantee
@@ -14573,13 +14586,15 @@ ix86_expand_prologue (void)
       insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
 					stack_pointer_rtx,
 					GEN_INT (-align_bytes)));
-      /* For the purposes of register save area addressing, the stack
-	 pointer can no longer be used to access anything in the frame
-	 below m->fs.sp_realigned_offset and the frame pointer cannot be
-	 used for anything at or above.  */
       m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
       m->fs.sp_realigned = true;
       m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+      /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
+	 Beyond this point, stack access should be done via choose_baseaddr or
+	 by using sp_valid_at and fp_valid_at to determine the correct base
+	 register.  Henceforth, any CFA offset should be thought of as logical
+	 and not physical.  */
+      gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
       gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
       /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
 	 is needed to describe where a register is saved using a realigned
@@ -15269,10 +15284,10 @@ ix86_expand_epilogue (int style)
   if (restore_regs_via_mov || frame.nsseregs)
     {
       /* Ensure that the entire register save area is addressable via
-	 the stack pointer, if we will restore via sp.  */
+	 the stack pointer, if we will restore SSE regs via sp.  */
       if (TARGET_64BIT
 	  && m->fs.sp_offset > 0x7fffffff
-	  && !(fp_valid_at (frame.stack_realign_offset) || m->fs.drap_valid)
+	  && sp_valid_at (frame.stack_realign_offset)
 	  && (frame.nsseregs + frame.nregs) != 0)
 	{
 	  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 682745ae06b..ce5bb7f6677 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2512,7 +2512,9 @@ struct GTY(()) ix86_frame
   bool save_regs_using_mov;
 };
 
-/* Machine specific frame tracking during prologue/epilogue generation.  */
+/* Machine specific frame tracking during prologue/epilogue generation.  All
+   values are positive, but since the x86 stack grows downward, are subtratced
+   from the CFA to produce a valid address.  */
 
 struct GTY(()) machine_frame_state
 {
@@ -2550,13 +2552,19 @@ struct GTY(()) machine_frame_state
 
   /* Indicates whether the stack pointer has been re-aligned.  When set,
      SP/FP continue to be relative to the CFA, but the stack pointer
-     should only be used for offsets >= sp_realigned_offset, while
-     the frame pointer should be used for offsets < sp_realigned_offset.
+     should only be used for offsets > sp_realigned_offset, while
+     the frame pointer should be used for offsets <= sp_realigned_fp_last.
      The flags realigned and sp_realigned are mutually exclusive.  */
   BOOL_BITFIELD sp_realigned : 1;
 
-  /* If sp_realigned is set, this is the offset from the CFA that the
-     stack pointer was realigned to.  */
+  /* If sp_realigned is set, this is the last valid offset from the CFA
+     that can be used for access with the frame pointer.  */
+  HOST_WIDE_INT sp_realigned_fp_last;
+
+  /* If sp_realigned is set, this is the offset from the CFA that the stack
+     pointer was realigned, and may or may not be equal to sp_realigned_fp_last.
+     Access via the stack pointer is only valid for offsets that are greater than
+     this value.  */
   HOST_WIDE_INT sp_realigned_offset;
 };
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
                   ` (2 preceding siblings ...)
  2017-07-31 11:19 ` [PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al Daniel Santos
@ 2017-07-31 11:19 ` Daniel Santos
  2017-07-31 13:53   ` Uros Bizjak
  2017-07-31 11:19 ` [PATCH 4/6] [i386] Modify ix86_compute_frame_layout Daniel Santos
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 11:19 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka; +Cc: Martin Liska, H . J . Lu

This value was used in an earlier incarnation of the
-mcall-ms2sysv-xlogues patch set but is now set and never read.  The
value of ix86_frame::sse_reg_save_offset serves the same purpose.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 1 -
 gcc/config/i386/i386.h | 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 690631dfe43..47c5608c3cd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12966,7 +12966,6 @@ ix86_compute_frame_layout (void)
 
       offset += xlogue.get_stack_space_used ();
       gcc_assert (!(offset & 0xf));
-      frame->outlined_save_offset = offset;
     }
 
   /* Align and set SSE register save area.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ce5bb7f6677..1648bdf1556 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2477,8 +2477,7 @@ enum avx_u128_state
 			<- end of stub-saved/restored regs
      [padding1]
    ]
-					<- outlined_save_offset
-					<- sse_regs_save_offset
+					<- sse_reg_save_offset
    [padding2]
 		       |		<- FRAME_POINTER
    [va_arg registers]  |
@@ -2504,7 +2503,6 @@ struct GTY(()) ix86_frame
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT stack_realign_allocate_offset;
   HOST_WIDE_INT stack_realign_offset;
-  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
                   ` (4 preceding siblings ...)
  2017-07-31 11:19 ` [PATCH 4/6] [i386] Modify ix86_compute_frame_layout Daniel Santos
@ 2017-07-31 11:19 ` Daniel Santos
  2017-07-31 13:59   ` Uros Bizjak
  2017-07-31 17:23 ` [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
  2017-08-08 19:31 ` PING " Daniel Santos
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 11:19 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka; +Cc: Martin Liska, H . J . Lu

The -mcall-ms2sysv-xlogues project added the boolean fields
call_ms2sysv_pad_in and call_ms2sysv_pad_out to struct machine_function
to track rather or not an additional 8 bytes of padding was needed for
stack alignment prior to and after the stub save area.  This design was
based upon the faulty assumption the function body would not require a
stack alignment greater than 16 bytes.  This continues to work well for
managing padding prior to the stub save area, but will not work for the
outgoing alignment.

Rather than changing machine_function::call_ms2sysv_pad_out to a larger
type, this patch removes it, thus transferring responsibility for stack
alignment following the stub save area from class xlogue_layout to the
body of ix86_compute_frame_layout.  Since the 64-bit va_arg register
save area is always a multiple of 16-bytes in size (176 for System V ABI
and 96 for Microsoft ABI), the ROUND_UP calculation for the stack offset
at the start of the function body (frame.frame_pointer_offset) will
assure there is enough room for any padding needed to keep the save area
for SSE va_args 16-byte aligned, so no modification is needed for that
calculation.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 18 ++++--------------
 gcc/config/i386/i386.h |  8 ++------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 47c5608c3cd..e2e9546a27c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2491,9 +2491,7 @@ public:
     unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
 
     gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
-    return m_regs[last_reg].offset
-	   + (m->call_ms2sysv_pad_out ? 8 : 0)
-	   + STUB_INDEX_OFFSET;
+    return m_regs[last_reg].offset + STUB_INDEX_OFFSET;
   }
 
   /* Returns the offset for the base pointer used by the stub.  */
@@ -12849,13 +12847,12 @@ ix86_compute_frame_layout (void)
 	{
 	  unsigned count = xlogue_layout::count_stub_managed_regs ();
 	  m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
+	  m->call_ms2sysv_pad_in = 0;
 	}
     }
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
-  m->call_ms2sysv_pad_in = 0;
-  m->call_ms2sysv_pad_out = 0;
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
      except for function prologues, leaf functions and when the defult
@@ -12957,15 +12954,7 @@ ix86_compute_frame_layout (void)
       gcc_assert (!frame->nsseregs);
 
       m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-
-      /* Select an appropriate layout for incoming stack offset.  */
-      const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-
-      if ((offset + xlogue.get_stack_space_used ()) & UNITS_PER_WORD)
-	m->call_ms2sysv_pad_out = 1;
-
-      offset += xlogue.get_stack_space_used ();
-      gcc_assert (!(offset & 0xf));
+      offset += xlogue_layout::get_instance ().get_stack_space_used ();
     }
 
   /* Align and set SSE register save area.  */
@@ -12993,6 +12982,7 @@ ix86_compute_frame_layout (void)
 
   /* Align start of frame for local function.  */
   if (stack_realign_fp
+      || m->call_ms2sysv
       || offset != frame->sse_reg_save_offset
       || size != 0
       || !crtl->is_leaf
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1648bdf1556..b08e45f68d4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2646,17 +2646,13 @@ struct GTY(()) machine_function {
   BOOL_BITFIELD arg_reg_available : 1;
 
   /* If true, we're out-of-lining reg save/restore for regs clobbered
-     by ms_abi functions calling a sysv function.  */
+     by 64-bit ms_abi functions calling a sysv_abi function.  */
   BOOL_BITFIELD call_ms2sysv : 1;
 
   /* If true, the incoming 16-byte aligned stack has an offset (of 8) and
-     needs padding.  */
+     needs padding prior to out-of-line stub save/restore area.  */
   BOOL_BITFIELD call_ms2sysv_pad_in : 1;
 
-  /* If true, the size of the stub save area plus inline int reg saves will
-     result in an 8 byte offset, so needs padding.  */
-  BOOL_BITFIELD call_ms2sysv_pad_out : 1;
-
   /* This is the number of extra registers saved by stub (valid range is
      0-6). Each additional register is only saved/restored by the stubs
      if all successive ones are. (Will always be zero when using a hard
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset
  2017-07-31 11:19 ` [PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset Daniel Santos
@ 2017-07-31 13:53   ` Uros Bizjak
  0 siblings, 0 replies; 22+ messages in thread
From: Uros Bizjak @ 2017-07-31 13:53 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches, Jan Hubicka, Martin Liska, H . J . Lu

On Mon, Jul 31, 2017 at 1:24 PM, Daniel Santos <daniel.santos@pobox.com> wrote:
> This value was used in an earlier incarnation of the
> -mcall-ms2sysv-xlogues patch set but is now set and never read.  The
> value of ix86_frame::sse_reg_save_offset serves the same purpose.

OK as obvious patch.

Thanks,
Uros.

> Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
> ---
>  gcc/config/i386/i386.c | 1 -
>  gcc/config/i386/i386.h | 4 +---
>  2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 690631dfe43..47c5608c3cd 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -12966,7 +12966,6 @@ ix86_compute_frame_layout (void)
>
>        offset += xlogue.get_stack_space_used ();
>        gcc_assert (!(offset & 0xf));
> -      frame->outlined_save_offset = offset;
>      }
>
>    /* Align and set SSE register save area.  */
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ce5bb7f6677..1648bdf1556 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2477,8 +2477,7 @@ enum avx_u128_state
>                         <- end of stub-saved/restored regs
>       [padding1]
>     ]
> -                                       <- outlined_save_offset
> -                                       <- sse_regs_save_offset
> +                                       <- sse_reg_save_offset
>     [padding2]
>                        |                <- FRAME_POINTER
>     [va_arg registers]  |
> @@ -2504,7 +2503,6 @@ struct GTY(()) ix86_frame
>    HOST_WIDE_INT reg_save_offset;
>    HOST_WIDE_INT stack_realign_allocate_offset;
>    HOST_WIDE_INT stack_realign_offset;
> -  HOST_WIDE_INT outlined_save_offset;
>    HOST_WIDE_INT sse_reg_save_offset;
>
>    /* When save_regs_using_mov is set, emit prologue using
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out
  2017-07-31 11:19 ` [PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out Daniel Santos
@ 2017-07-31 13:59   ` Uros Bizjak
  0 siblings, 0 replies; 22+ messages in thread
From: Uros Bizjak @ 2017-07-31 13:59 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches, Jan Hubicka, Martin Liska, H . J . Lu

On Mon, Jul 31, 2017 at 1:24 PM, Daniel Santos <daniel.santos@pobox.com> wrote:
> The -mcall-ms2sysv-xlogues project added the boolean fields
> call_ms2sysv_pad_in and call_ms2sysv_pad_out to struct machine_function
> to track rather or not an additional 8 bytes of padding was needed for
> stack alignment prior to and after the stub save area.  This design was
> based upon the faulty assumption the function body would not require a
> stack alignment greater than 16 bytes.  This continues to work well for
> managing padding prior to the stub save area, but will not work for the
> outgoing alignment.
>
> Rather than changing machine_function::call_ms2sysv_pad_out to a larger
> type, this patch removes it, thus transferring responsibility for stack
> alignment following the stub save area from class xlogue_layout to the
> body of ix86_compute_frame_layout.  Since the 64-bit va_arg register
> save area is always a multiple of 16-bytes in size (176 for System V ABI
> and 96 for Microsoft ABI), the ROUND_UP calculation for the stack offset
> at the start of the function body (frame.frame_pointer_offset) will
> assure there is enough room for any padding needed to keep the save area
> for SSE va_args 16-byte aligned, so no modification is needed for that
> calculation.
>
> Signed-off-by: Daniel Santos <daniel.santos@pobox.com>

LGTM.

OK for mainline.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c | 18 ++++--------------
>  gcc/config/i386/i386.h |  8 ++------
>  2 files changed, 6 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 47c5608c3cd..e2e9546a27c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2491,9 +2491,7 @@ public:
>      unsigned last_reg = m->call_ms2sysv_extra_regs + MIN_REGS - 1;
>
>      gcc_assert (m->call_ms2sysv_extra_regs <= MAX_EXTRA_REGS);
> -    return m_regs[last_reg].offset
> -          + (m->call_ms2sysv_pad_out ? 8 : 0)
> -          + STUB_INDEX_OFFSET;
> +    return m_regs[last_reg].offset + STUB_INDEX_OFFSET;
>    }
>
>    /* Returns the offset for the base pointer used by the stub.  */
> @@ -12849,13 +12847,12 @@ ix86_compute_frame_layout (void)
>         {
>           unsigned count = xlogue_layout::count_stub_managed_regs ();
>           m->call_ms2sysv_extra_regs = count - xlogue_layout::MIN_REGS;
> +         m->call_ms2sysv_pad_in = 0;
>         }
>      }
>
>    frame->nregs = ix86_nsaved_regs ();
>    frame->nsseregs = ix86_nsaved_sseregs ();
> -  m->call_ms2sysv_pad_in = 0;
> -  m->call_ms2sysv_pad_out = 0;
>
>    /* 64-bit MS ABI seem to require stack alignment to be always 16,
>       except for function prologues, leaf functions and when the defult
> @@ -12957,15 +12954,7 @@ ix86_compute_frame_layout (void)
>        gcc_assert (!frame->nsseregs);
>
>        m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
> -
> -      /* Select an appropriate layout for incoming stack offset.  */
> -      const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
> -
> -      if ((offset + xlogue.get_stack_space_used ()) & UNITS_PER_WORD)
> -       m->call_ms2sysv_pad_out = 1;
> -
> -      offset += xlogue.get_stack_space_used ();
> -      gcc_assert (!(offset & 0xf));
> +      offset += xlogue_layout::get_instance ().get_stack_space_used ();
>      }
>
>    /* Align and set SSE register save area.  */
> @@ -12993,6 +12982,7 @@ ix86_compute_frame_layout (void)
>
>    /* Align start of frame for local function.  */
>    if (stack_realign_fp
> +      || m->call_ms2sysv
>        || offset != frame->sse_reg_save_offset
>        || size != 0
>        || !crtl->is_leaf
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 1648bdf1556..b08e45f68d4 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2646,17 +2646,13 @@ struct GTY(()) machine_function {
>    BOOL_BITFIELD arg_reg_available : 1;
>
>    /* If true, we're out-of-lining reg save/restore for regs clobbered
> -     by ms_abi functions calling a sysv function.  */
> +     by 64-bit ms_abi functions calling a sysv_abi function.  */
>    BOOL_BITFIELD call_ms2sysv : 1;
>
>    /* If true, the incoming 16-byte aligned stack has an offset (of 8) and
> -     needs padding.  */
> +     needs padding prior to out-of-line stub save/restore area.  */
>    BOOL_BITFIELD call_ms2sysv_pad_in : 1;
>
> -  /* If true, the size of the stub save area plus inline int reg saves will
> -     result in an 8 byte offset, so needs padding.  */
> -  BOOL_BITFIELD call_ms2sysv_pad_out : 1;
> -
>    /* This is the number of extra registers saved by stub (valid range is
>       0-6). Each additional register is only saved/restored by the stubs
>       if all successive ones are. (Will always be zero when using a hard
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
                   ` (5 preceding siblings ...)
  2017-07-31 11:19 ` [PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out Daniel Santos
@ 2017-07-31 17:23 ` Daniel Santos
  2017-08-01  6:20   ` Uros Bizjak
  2017-08-08 19:31 ` PING " Daniel Santos
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Santos @ 2017-07-31 17:23 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak; +Cc: Jan Hubicka, Martin Liska

Well I just learned how to test 32-bit earlier and I've uncovered a 
problem when running 32-bit tests.  Do you want me to commit the the two 
patches (squashed together) in the mean time?

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
  2017-07-31 17:23 ` [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
@ 2017-08-01  6:20   ` Uros Bizjak
  0 siblings, 0 replies; 22+ messages in thread
From: Uros Bizjak @ 2017-08-01  6:20 UTC (permalink / raw)
  To: Daniel Santos; +Cc: gcc-patches, Jan Hubicka, Martin Liska

On Mon, Jul 31, 2017 at 7:28 PM, Daniel Santos <daniel.santos@pobox.com> wrote:
> Well I just learned how to test 32-bit earlier and I've uncovered a problem
> when running 32-bit tests.  Do you want me to commit the the two patches
> (squashed together) in the mean time?

Yes, please commit two approved patches (with correct ChangeLog).

Uros.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 5/6 v2] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
  2017-07-31 11:19 ` [PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al Daniel Santos
@ 2017-08-02 23:28   ` Daniel Santos
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-08-02 23:28 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka; +Cc: Martin Liska, H . J . Lu

My first version of this patch inited m->fs.sp_realigned_fp_last with
the value of m->fs.sp_offset prior to performing the stack realignment.
I had forgotten, however, that when we're saving GP regs using MOV that
we delay SP modification as long as possible so that the value of
m->fs.sp_offset at this point is correct when we've used push, but
incorrect when we've used mov.

This time I've bootstraped with --enable-checking=yes,rtl
--enable-languages=all and reg tested using the below command to test both 64-
and 32-bit code.

  make -kj8 RUNTESTFLAGS="--target_board=unix/\{,-m32\}" check

Original patch description:

The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).

Since the alignment required may be more than 16-bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 58 ++++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0dc366cf16e..a1f39cd714c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13289,10 +13289,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
    an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   recieve the alignment of the base register that was selected,
+   irrespective of rather or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14338,35 +14341,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
+     we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+			  + xlogue.get_stub_ptr_offset (), &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
-     layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+      pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+				GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
 						  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, &align);
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-			     GEN_INT (-stack_alloc_size), -1,
-			     m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
     {
       const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
       rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 			     r.regno);
-      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
     }
 
   gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
@@ -14621,14 +14624,15 @@ ix86_expand_prologue (void)
       gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
       /* Record last valid frame pointer offset.  */
-      m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+      m->fs.sp_realigned_fp_last = frame.reg_save_offset;
 
       /* The computation of the size of the re-aligned stack frame means
 	 that we must allocate the size of the register save area before
 	 performing the actual alignment.  Otherwise we cannot guarantee
 	 that there's enough storage above the realignment point.  */
-      allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-      if (allocate && !m->call_ms2sysv)
+      allocate = frame.reg_save_offset - m->fs.sp_offset
+		 + frame.stack_realign_allocate;
+      if (allocate)
         pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
 				   GEN_INT (-allocate), -1, false);
 
@@ -14637,8 +14641,8 @@ ix86_expand_prologue (void)
 					stack_pointer_rtx,
 					GEN_INT (-align_bytes)));
       m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
-      m->fs.sp_realigned = true;
-      m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+      m->fs.sp_realigned_offset = m->fs.sp_offset
+					      - frame.stack_realign_allocate;
       /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
 	 Beyond this point, stack access should be done via choose_baseaddr or
 	 by using sp_valid_at and fp_valid_at to determine the correct base
@@ -14646,6 +14650,8 @@ ix86_expand_prologue (void)
 	 and not physical.  */
       gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
       gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
+      m->fs.sp_realigned = true;
+
       /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
 	 is needed to describe where a register is saved using a realigned
 	 stack pointer, so we need to invalidate the stack pointer for that
@@ -14707,7 +14713,7 @@ ix86_expand_prologue (void)
      so probe if the size is non-negative to preserve the protection area.  */
   if (allocate >= 0 && flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
     {
-      /* We expect the registers to be saved when probes are used.  */
+      /* We expect the GP registers to be saved when probes are used.  */
       gcc_assert (int_registers_saved);
 
       if (STACK_CHECK_MOVING_SP)
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 6/6 v2] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
  2017-07-31 11:19 ` [PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
@ 2017-08-08 19:23   ` Daniel Santos
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-08-08 19:23 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Sandra Loosemore, Gerald Pfeifer, Joseph Myers
  Cc: Martin Liska

This update adds documentation for the new effective taregts in addition to a
few existing effective targets that were undocumented.

Changes to lib/target-supports.exp and documentation:
* Add effective-targets avx512f and avx512f_runtime (needed for new
  tests).
* Corrects bug in check_avx2_hw_available.
* Adds documentation for effective-targets avx2, avx2_runtime (both
  missing), avx512f and avx512f_runtime.

The following tests are added.  The testcase in the PR is used as a base
and relevant variants are added to test other factors affected by the
patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
              stubs.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/doc/sourcebuild.texi                   |  12 +++
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 ++++
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  26 ++++++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |  26 ++++++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  31 ++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4.c  | 123 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4a.c | 124 +++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4b.c | 124 +++++++++++++++++++++++++++++
 gcc/testsuite/lib/target-supports.exp      |  66 +++++++++++++++
 9 files changed, 548 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 85af8778167..66f040f212d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1852,6 +1852,18 @@ Target supports compiling @code{avx} instructions.
 @item avx_runtime
 Target supports the execution of @code{avx} instructions.
 
+@item avx2
+Target supports compiling @code{avx2} instructions.
+
+@item avx2_runtime
+Target supports the execution of @code{avx2} instructions.
+
+@item avx512f
+Target supports compiling @code{avx512f} instructions.
+
+@item avx512f_runtime
+Target supports the execution of @code{avx512f} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 00000000000..eb8d767a778
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 00000000000..e868d6c7e5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 00000000000..071a90534a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 00000000000..5982981b55c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with alloca (and DRAP).  */
+
+#include <alloca.h>
+
+int a[56];
+volatile int b = -12345;
+volatile const int d = 42;
+
+void foo (int *x, int y, int z)
+{
+}
+
+void (*volatile const foo_noinfo)(int *, int, int) = foo;
+
+int main (int argc, char *argv[]) {
+  int c;
+  int *e = alloca (d);
+  foo_noinfo (e, d, 0);
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    foo_noinfo (e, d, c);
+    a[-(b % 56)] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c b/gcc/testsuite/gcc.target/i386/pr80969-4.c
new file mode 100644
index 00000000000..1ec54d081cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c
@@ -0,0 +1,123 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512 and va_args.  */
+
+#include <stdarg.h>
+#include <assert.h>
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
+__m128i n3 = { 893, -3180 } ;
+int n4 = 324;
+double n5 = 103.3;
+__m128i n6 = { -123, 2 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128i n10 = { 1233, -100 };
+int n11 = 407;
+double n12 = 304.9;
+__m128i n13 = { 233, -110 };
+__m256i n14 = { -1233, 23, 34, -1003 };
+__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m512d e2;
+__m128i e3;
+int e4;
+double e5;
+__m128i e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128i e10;
+int e11;
+double e12;
+__m128i e13;
+__m256i e14;
+__m512i e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
+{
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  e4 = va_arg (va_arglist, int);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128i);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128i);
+  e11 = va_arg (va_arglist, int);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128i);
+  e14 = va_arg (va_arglist, __m256i);
+  e15 = va_arg (va_arglist, __m512i);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+}
+
+void (*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
+
+static void
+__attribute__((noinline))
+foo (__m128 a1, __m512d a2, __m128i a3, ...)
+{
+  va_list va_arglist;
+  int c;
+
+  va_start (va_arglist, a3);
+  bar_noinfo (a1, a2, a3, va_arglist);
+  va_end (va_arglist);
+
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+}
+void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
+
+static void
+avx_test (void)
+{
+  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+       n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4a.c b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
new file mode 100644
index 00000000000..faf263170e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
@@ -0,0 +1,124 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512, va_args, and ms to sysv call.  */
+
+#include <stdarg.h>
+#include <assert.h>
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
+__m128i n3 = { 893, -3180 } ;
+int n4 = 324;
+double n5 = 103.3;
+__m128i n6 = { -123, 2 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128i n10 = { 1233, -100 };
+int n11 = 407;
+double n12 = 304.9;
+__m128i n13 = { 233, -110 };
+__m256i n14 = { -1233, 23, 34, -1003 };
+__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m512d e2;
+__m128i e3;
+int e4;
+double e5;
+__m128i e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128i e10;
+int e11;
+double e12;
+__m128i e13;
+__m256i e14;
+__m512i e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline, sysv_abi))
+bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
+{
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  e4 = va_arg (va_arglist, int);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128i);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128i);
+  e11 = va_arg (va_arglist, int);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128i);
+  e14 = va_arg (va_arglist, __m256i);
+  e15 = va_arg (va_arglist, __m512i);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+}
+
+void __attribute__((sysv_abi))
+(*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
+
+static void
+__attribute__((noinline))
+foo (__m128 a1, __m512d a2, __m128i a3, ...)
+{
+  va_list va_arglist;
+  int c;
+
+  va_start (va_arglist, a3);
+  bar_noinfo (a1, a2, a3, va_arglist);
+  va_end (va_arglist);
+
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+}
+void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
+
+static void
+avx_test (void)
+{
+  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+       n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4b.c b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
new file mode 100644
index 00000000000..9bc8995e58e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
@@ -0,0 +1,124 @@
+/* { dg-do run { target avx512f_runtime } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512, va_args, and ms to sysv call using save/restore stubs.  */
+
+#include <stdarg.h>
+#include <assert.h>
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
+__m128i n3 = { 893, -3180 } ;
+int n4 = 324;
+double n5 = 103.3;
+__m128i n6 = { -123, 2 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128i n10 = { 1233, -100 };
+int n11 = 407;
+double n12 = 304.9;
+__m128i n13 = { 233, -110 };
+__m256i n14 = { -1233, 23, 34, -1003 };
+__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m512d e2;
+__m128i e3;
+int e4;
+double e5;
+__m128i e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128i e10;
+int e11;
+double e12;
+__m128i e13;
+__m256i e14;
+__m512i e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline, sysv_abi))
+bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
+{
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  e4 = va_arg (va_arglist, int);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128i);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128i);
+  e11 = va_arg (va_arglist, int);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128i);
+  e14 = va_arg (va_arglist, __m256i);
+  e15 = va_arg (va_arglist, __m512i);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+}
+
+void __attribute__((sysv_abi))
+(*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
+
+static void
+__attribute__((noinline))
+foo (__m128 a1, __m512d a2, __m128i a3, ...)
+{
+  va_list va_arglist;
+  int c;
+
+  va_start (va_arglist, a3);
+  bar_noinfo (a1, a2, a3, va_arglist);
+  va_end (va_arglist);
+
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+}
+void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
+
+static void
+avx_test (void)
+{
+  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+       n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5a6562794b2..554ec10e4b1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1642,6 +1642,29 @@ proc check_avx_os_support_available { } {
     }]
 }
 
+# Return 1 if the target OS supports running AVX executables, 0
+# otherwise.  Cache the result.
+
+proc check_avx512_os_support_available { } {
+    return [check_cached_effective_target avx512_os_support_available {
+	# If this is not the right target then we can skip the test.
+	if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+	    expr 0
+	} else {
+	    # Check that OS has AVX512, AVX and SSE saving enabled.
+	    check_runtime_nocache avx512_os_support_available {
+		int main ()
+		{
+		  unsigned int eax, edx;
+
+		  asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
+		  return (eax & 0xe6) != 0xe6;
+		}
+	    } ""
+	}
+    }]
+}
+
 # Return 1 if the target supports executing SSE instructions, 0
 # otherwise.  Cache the result.
 
@@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } {
 	    expr 0
 	} else {
 	    check_runtime_nocache avx2_hw_available {
+		#include <stddef.h>
 		#include "cpuid.h"
 		int main ()
 		{
@@ -1842,6 +1866,37 @@ proc check_avx2_hw_available { } {
     }]
 }
 
+# Return 1 if the target supports executing AVX512 foundation instructions, 0
+# otherwise.  Cache the result.
+
+proc check_avx512f_hw_available { } {
+    return [check_cached_effective_target avx512f_hw_available {
+	# If this is not the right target then we can skip the test.
+	if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
+	    expr 0
+	} else {
+	    check_runtime_nocache avx512f_hw_available {
+		#include <stddef.h>
+		#include "cpuid.h"
+		int main ()
+		{
+		  unsigned int eax, ebx, ecx, edx;
+		  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)
+		      || !(ecx & bit_OSXSAVE))
+		    return 1;
+
+		  if (__get_cpuid_max (0, NULL) < 7)
+		    return 1;
+
+		  __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+		  return !(ebx & bit_AVX512F);
+		}
+	    } ""
+	}
+    }]
+}
+
 # Return 1 if the target supports running SSE executables, 0 otherwise.
 
 proc check_effective_target_sse_runtime { } {
@@ -1928,6 +1983,17 @@ proc check_effective_target_avx2_runtime { } {
     return 0
 }
 
+# Return 1 if the target supports running AVX512f executables, 0 otherwise.
+
+proc check_effective_target_avx512f_runtime { } {
+    if { [check_effective_target_avx512f]
+	 && [check_avx512f_hw_available]
+	 && [check_avx512_os_support_available] } {
+	return 1
+    }
+    return 0
+}
+
 # Return 1 if we are compiling for 64-bit PowerPC but we do not use direct
 # move instructions for moves from GPR to FPR.
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* PING Re: [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
  2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
                   ` (6 preceding siblings ...)
  2017-07-31 17:23 ` [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
@ 2017-08-08 19:31 ` Daniel Santos
  2017-08-22 22:44   ` [PATCH v4 0/4] " Daniel Santos
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Santos @ 2017-08-08 19:31 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, H.J. Lu, Martin Liska,
	Rainer Orth, Mike Stump

[-- Attachment #1: Type: text/plain, Size: 1111 bytes --]

Original message: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02005.html

Patches 2 and 3 have been committed and I have corrected the error in
patch 5.  I configuring with --enable-checking=yes,rtl
--enable-languages=all and retested with
RUNTESTFLAGS="--target_board=unix/\{,-m32\}"  The updated patches fix an
error when using mov instead of push and add documentation for changes
to target-supports.exp.  I have included modified ChangeLogs.

In addition to to fixing the ICE, this patch set makes more efficient
use of stack space in some cases the outgoing stack boundary is > 16
bytes and realignment is necessary.  This adds new tests, some of which
require avx512f (gcc/testsuite/gcc.target/i386/pr80969-4*.c) -- these I
have only tested these using Intel SDE.

Below is an updated list of the patches.

1. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02006.html
2. Committed.
3. Committed.
4. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02009.html
5. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00249.html
6. v2 -- https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00618.html

Thanks,
Daniel

[-- Attachment #2: pr80969.gcc.ChangeLog --]
[-- Type: text/plain, Size: 661 bytes --]

2017-08-08  Daniel Santos  <daniel.santos@pobox.com>

	* config/i386/i386.h (ix86_frame::stack_realign_allocate_offset):
	Remove
	(ix86_frame::stack_realign_allocate): New field.
	(struct machine_frame_state): Modify comments.
	(machine_frame_state::sp_realigned_fp_end): New field.
	* config/i386/i386.c (ix86_compute_frame_layout): Modify.
	(sp_valid_at): Likewise.
	(fp_valid_at): Likewise.
	(choose_baseaddr): Modify comments.
	(ix86_emit_outlined_ms2sysv_save): Modify.
	(ix86_expand_prologue): Likewise.
	* doc/sourcebuild.texi (avx2, avx2_runtime): Add missing items to
	effective-targets.
	(avx512f, avx512f_runtime): Add new items to effective-tarets.

[-- Attachment #3: pr80969.gcc.testsuite.ChangeLog --]
[-- Type: text/plain, Size: 570 bytes --]

2017-08-08  Daniel Santos  <daniel.santos@pobox.com>

	* lib/target-supports.exp (check_avx512_os_support_available): New
	Procedure.
	(check_avx2_hw_available): Modify.
	(check_avx512f_hw_available): New Procedure.
	(check_effective_target_avx512f_runtime): Likewise.
	* gcc.target/i386/pr80969-1.c: New testcase.
	* gcc.target/i386/pr80969-2a.c: Likewise.
	* gcc.target/i386/pr80969-2.c: Likewise.
	* gcc.target/i386/pr80969-3.c: Likewise.
	* gcc.target/i386/pr80969-4a.c: Likewise.
	* gcc.target/i386/pr80969-4b.c: Likewise.
	* gcc.target/i386/pr80969-4.c: Likewise.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
  2017-08-08 19:31 ` PING " Daniel Santos
@ 2017-08-22 22:44   ` Daniel Santos
  2017-08-22 23:23     ` [PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
                       ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Daniel Santos @ 2017-08-22 22:44 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, H.J. Lu, Martin Liska,
	Rainer Orth, Mike Stump, H.J. Lu

I had to fix a few things for x32 compatibility and I this is ready
now.  H.J. tested on machine with avx512 (including x32) and I've tested
both native x32 and normal x86_64 with m64, m32 and mx32 and all is
well.  I've made more changes to the tests so I'm just submitting a
version 2 of the whole patch set.

OK for trunk?

2017-08-22  Daniel Santos  <daniel.santos@pobox.com>

	* config/i386/i386.h (ix86_frame::stack_realign_allocate_offset):
	Remove field.
	(ix86_frame::stack_realign_allocate): New field.
	(struct machine_frame_state): Modify comments.
	(machine_frame_state::sp_realigned_fp_end): New field.
	* config/i386/i386.c (ix86_compute_frame_layout): Rework stack frame
	layout calculation.
	(sp_valid_at): Add assertion to assure no attempt to access invalid
	offset of a realigned stack.
	(fp_valid_at): Likewise.
	(choose_baseaddr): Modify comments.
	(ix86_emit_outlined_ms2sysv_save): Adjust to changes in
	ix86_expand_prologue.
	(ix86_expand_prologue): Modify stack realignment and allocation.
	(ix86_expand_epilogue): Modify comments.

2017-08-22  Daniel Santos  <daniel.santos@pobox.com>

	* gcc.target/i386/pr80969-1.c: New testcase.
	* gcc.target/i386/pr80969-2a.c: Likewise.
	* gcc.target/i386/pr80969-2.c: Likewise.
	* gcc.target/i386/pr80969-3.c: Likewise.
	* gcc.target/i386/pr80969-4a.c: Likewise.
	* gcc.target/i386/pr80969-4b.c: Likewise.
	* gcc.target/i386/pr80969-4.c: Likewise.
	* gcc.target/i386/pr80969-4.h: New header common to pr80969-4*.c


Thanks,
Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at
  2017-08-22 22:44   ` [PATCH v4 0/4] " Daniel Santos
@ 2017-08-22 23:23     ` Daniel Santos
  2017-08-23  3:51     ` [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-08-22 23:23 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, H . J . Lu

When we realign the stack frame (without DRAP), there may be a range of
CFA offsets that should never be touched because they are alignment
padding and any reference to them is almost certainly an error.
Previously, only the offset of where the realigned stack frame starts
was recorded and checked in sp_valid_at and fp_valid_at.

This change adds sp_realigned_fp_last to struct machine_frame_state to
record the last valid offset from which the frame pointer can be used
when the stack pointer is realigned and modifies sp_valid_at and
fp_valid_at to fail an assertion when passed an offset in the "no-man's
land" between these two values.

Comments for struct machine_frame_state incorrectly stated that a
realigned stack pointer could be used to access offsets equal to or
greater than sp_realigned_offset, but it is only valid for offsets that
are greater.  This was the (incorrect) behaviour of sp_valid_at and
fp_valid_at prior to r250587 and this change now corrects the
documentation and adds clarification of the CFA-relative calculation.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 45 ++++++++++++++++++++++++++++++---------------
 gcc/config/i386/i386.h | 18 +++++++++++++-----
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c08ad55fcd9..601e3ef47f6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13177,26 +13177,36 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT offset)
   return len;
 }
 
-/* Determine if the stack pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
-static inline bool
+static bool
 sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.sp_valid && !(fs.sp_realigned
-			  && cfa_offset <= fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset)
+    {
+      /* Validate that the cfa_offset isn't in a "no-man's land".  */
+      gcc_assert (cfa_offset <= fs.sp_realigned_fp_last);
+      return false;
+    }
+  return fs.sp_valid;
 }
 
-/* Determine if the frame pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
 static inline bool
 fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
-			  && cfa_offset > fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last)
+    {
+      /* Validate that the cfa_offset isn't in a "no-man's land".  */
+      gcc_assert (cfa_offset >= fs.sp_realigned_offset);
+      return false;
+    }
+  return fs.fp_valid;
 }
 
 /* Choose a base register based upon alignment requested, speed and/or
@@ -14675,6 +14685,9 @@ ix86_expand_prologue (void)
       int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
       gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
+      /* Record last valid frame pointer offset.  */
+      m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+
       /* The computation of the size of the re-aligned stack frame means
 	 that we must allocate the size of the register save area before
 	 performing the actual alignment.  Otherwise we cannot guarantee
@@ -14688,13 +14701,15 @@ ix86_expand_prologue (void)
       insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
 					stack_pointer_rtx,
 					GEN_INT (-align_bytes)));
-      /* For the purposes of register save area addressing, the stack
-	 pointer can no longer be used to access anything in the frame
-	 below m->fs.sp_realigned_offset and the frame pointer cannot be
-	 used for anything at or above.  */
       m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
       m->fs.sp_realigned = true;
       m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+      /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
+	 Beyond this point, stack access should be done via choose_baseaddr or
+	 by using sp_valid_at and fp_valid_at to determine the correct base
+	 register.  Henceforth, any CFA offset should be thought of as logical
+	 and not physical.  */
+      gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
       gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
       /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
 	 is needed to describe where a register is saved using a realigned
@@ -15392,10 +15407,10 @@ ix86_expand_epilogue (int style)
   if (restore_regs_via_mov || frame.nsseregs)
     {
       /* Ensure that the entire register save area is addressable via
-	 the stack pointer, if we will restore via sp.  */
+	 the stack pointer, if we will restore SSE regs via sp.  */
       if (TARGET_64BIT
 	  && m->fs.sp_offset > 0x7fffffff
-	  && !(fp_valid_at (frame.stack_realign_offset) || m->fs.drap_valid)
+	  && sp_valid_at (frame.stack_realign_offset)
 	  && (frame.nsseregs + frame.nregs) != 0)
 	{
 	  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index f4c96fc5cba..ae94b0c7a01 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2512,7 +2512,9 @@ struct GTY(()) ix86_frame
   bool save_regs_using_mov;
 };
 
-/* Machine specific frame tracking during prologue/epilogue generation.  */
+/* Machine specific frame tracking during prologue/epilogue generation.  All
+   values are positive, but since the x86 stack grows downward, are subtratced
+   from the CFA to produce a valid address.  */
 
 struct GTY(()) machine_frame_state
 {
@@ -2550,13 +2552,19 @@ struct GTY(()) machine_frame_state
 
   /* Indicates whether the stack pointer has been re-aligned.  When set,
      SP/FP continue to be relative to the CFA, but the stack pointer
-     should only be used for offsets >= sp_realigned_offset, while
-     the frame pointer should be used for offsets < sp_realigned_offset.
+     should only be used for offsets > sp_realigned_offset, while
+     the frame pointer should be used for offsets <= sp_realigned_fp_last.
      The flags realigned and sp_realigned are mutually exclusive.  */
   BOOL_BITFIELD sp_realigned : 1;
 
-  /* If sp_realigned is set, this is the offset from the CFA that the
-     stack pointer was realigned to.  */
+  /* If sp_realigned is set, this is the last valid offset from the CFA
+     that can be used for access with the frame pointer.  */
+  HOST_WIDE_INT sp_realigned_fp_last;
+
+  /* If sp_realigned is set, this is the offset from the CFA that the stack
+     pointer was realigned, and may or may not be equal to sp_realigned_fp_last.
+     Access via the stack pointer is only valid for offsets that are greater than
+     this value.  */
   HOST_WIDE_INT sp_realigned_offset;
 };
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
  2017-08-22 22:44   ` [PATCH v4 0/4] " Daniel Santos
  2017-08-22 23:23     ` [PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
@ 2017-08-23  3:51     ` Daniel Santos
  2017-08-23 13:46       ` Uros Bizjak
  2017-08-23  4:17     ` [PATCH 2/4] [i386] Modify ix86_compute_frame_layout Daniel Santos
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Daniel Santos @ 2017-08-23  3:51 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, H . J . Lu, Rainer Orth,
	Mike Stump

Changes to lib/target-supports.exp and documentation:
* Add effective-targets avx512f and avx512f_runtime (needed for new
  tests).
* Corrects bug in check_avx2_hw_available.
* Adds documentation for effective-targets avx2, avx2_runtime (both
  missing), avx512f and avx512f_runtime.

The following tests are added.  The testcase in the PR is used as a base
and relevant variants are added to test other factors affected by the
patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
	      stubs.
pr80969-4.h   Common header for pr80969-4*.c.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/doc/sourcebuild.texi                   |  12 +++
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 ++++
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  27 +++++++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |   8 ++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  32 ++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4.c  |   9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4.h  | 119 +++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr80969-4a.c |   9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4b.c |   9 +++
 gcc/testsuite/lib/target-supports.exp      |  66 ++++++++++++++++
 10 files changed, 307 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.h
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e6313dc031e..0bf4d6afeb6 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1855,6 +1855,18 @@ Target supports compiling @code{avx} instructions.
 @item avx_runtime
 Target supports the execution of @code{avx} instructions.
 
+@item avx2
+Target supports compiling @code{avx2} instructions.
+
+@item avx2_runtime
+Target supports the execution of @code{avx2} instructions.
+
+@item avx512f
+Target supports compiling @code{avx512f} instructions.
+
+@item avx512f_runtime
+Target supports the execution of @code{avx512f} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 00000000000..e0520b45c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target { ! x32 } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 00000000000..f885dee6512
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
+/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 00000000000..baea0796d24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,8 @@
+/* { dg-do run { target { lp64 && avx512f_runtime } } } */
+/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+#include "pr80969-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 00000000000..d902a771cc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,32 @@
+/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
+/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with alloca (and DRAP).  */
+
+#include <alloca.h>
+
+int a[56];
+volatile int b = -12345;
+volatile const int d = 42;
+
+void foo (int *x, int y, int z)
+{
+}
+
+void (*volatile const foo_noinfo)(int *, int, int) = foo;
+
+int main (int argc, char *argv[]) {
+  int c;
+  int *e = alloca (d);
+  foo_noinfo (e, d, 0);
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    foo_noinfo (e, d, c);
+    a[-(b % 56)] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c b/gcc/testsuite/gcc.target/i386/pr80969-4.c
new file mode 100644
index 00000000000..d5026657cd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c
@@ -0,0 +1,9 @@
+/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
+/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512 and va_args.  */
+
+#define CALLEE_ABI ms_abi
+#include "pr80969-4.h"
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.h b/gcc/testsuite/gcc.target/i386/pr80969-4.h
new file mode 100644
index 00000000000..a7ff6456242
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4.h
@@ -0,0 +1,119 @@
+
+#include <stdarg.h>
+#include <assert.h>
+
+#include "avx-check.h"
+
+int a[56];
+int b;
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
+__m128i n3 = { 893, -3180 } ;
+int n4 = 324;
+double n5 = 103.3;
+__m128i n6 = { -123, 2 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128i n10 = { 1233, -100 };
+int n11 = 407;
+double n12 = 304.9;
+__m128i n13 = { 233, -110 };
+__m256i n14 = { -1233, 23, 34, -1003 };
+__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m512d e2;
+__m128i e3;
+int e4;
+double e5;
+__m128i e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128i e10;
+int e11;
+double e12;
+__m128i e13;
+__m256i e14;
+__m512i e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline, CALLEE_ABI))
+bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
+{
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  e4 = va_arg (va_arglist, int);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128i);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128i);
+  e11 = va_arg (va_arglist, int);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128i);
+  e14 = va_arg (va_arglist, __m256i);
+  e15 = va_arg (va_arglist, __m512i);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+}
+
+void __attribute__((CALLEE_ABI))
+(*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
+
+static void
+__attribute__((noinline))
+foo (__m128 a1, __m512d a2, __m128i a3, ...)
+{
+  va_list va_arglist;
+  int c;
+
+  va_start (va_arglist, a3);
+  bar_noinfo (a1, a2, a3, va_arglist);
+  va_end (va_arglist);
+
+  for (; b; b++) {
+    c = b;
+    if (b & 1)
+      c = 2;
+    a[b] = c;
+  }
+}
+void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
+
+static void
+avx_test (void)
+{
+  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+       n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4a.c b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
new file mode 100644
index 00000000000..e5d4cadb045
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
@@ -0,0 +1,9 @@
+/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
+/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512, va_args, and ms to sysv call.  */
+
+#define CALLEE_ABI sysv_abi
+#include "pr80969-4.h"
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4b.c b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
new file mode 100644
index 00000000000..ae8759249eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
@@ -0,0 +1,9 @@
+/* { dg-do run { target { lp64 && avx512f_runtime } } } */
+/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test with avx512, va_args, and ms to sysv call using save/restore stubs.  */
+
+#define CALLEE_ABI sysv_abi
+#include "pr80969-4.h"
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5219fbf4671..4383fd59cd7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1642,6 +1642,29 @@ proc check_avx_os_support_available { } {
     }]
 }
 
+# Return 1 if the target OS supports running AVX executables, 0
+# otherwise.  Cache the result.
+
+proc check_avx512_os_support_available { } {
+    return [check_cached_effective_target avx512_os_support_available {
+	# If this is not the right target then we can skip the test.
+	if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+	    expr 0
+	} else {
+	    # Check that OS has AVX512, AVX and SSE saving enabled.
+	    check_runtime_nocache avx512_os_support_available {
+		int main ()
+		{
+		  unsigned int eax, edx;
+
+		  asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
+		  return (eax & 0xe6) != 0xe6;
+		}
+	    } ""
+	}
+    }]
+}
+
 # Return 1 if the target supports executing SSE instructions, 0
 # otherwise.  Cache the result.
 
@@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } {
 	    expr 0
 	} else {
 	    check_runtime_nocache avx2_hw_available {
+		#include <stddef.h>
 		#include "cpuid.h"
 		int main ()
 		{
@@ -1842,6 +1866,37 @@ proc check_avx2_hw_available { } {
     }]
 }
 
+# Return 1 if the target supports executing AVX512 foundation instructions, 0
+# otherwise.  Cache the result.
+
+proc check_avx512f_hw_available { } {
+    return [check_cached_effective_target avx512f_hw_available {
+	# If this is not the right target then we can skip the test.
+	if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
+	    expr 0
+	} else {
+	    check_runtime_nocache avx512f_hw_available {
+		#include <stddef.h>
+		#include "cpuid.h"
+		int main ()
+		{
+		  unsigned int eax, ebx, ecx, edx;
+		  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)
+		      || !(ecx & bit_OSXSAVE))
+		    return 1;
+
+		  if (__get_cpuid_max (0, NULL) < 7)
+		    return 1;
+
+		  __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+		  return !(ebx & bit_AVX512F);
+		}
+	    } ""
+	}
+    }]
+}
+
 # Return 1 if the target supports running SSE executables, 0 otherwise.
 
 proc check_effective_target_sse_runtime { } {
@@ -1928,6 +1983,17 @@ proc check_effective_target_avx2_runtime { } {
     return 0
 }
 
+# Return 1 if the target supports running AVX512f executables, 0 otherwise.
+
+proc check_effective_target_avx512f_runtime { } {
+    if { [check_effective_target_avx512f]
+	 && [check_avx512f_hw_available]
+	 && [check_avx512_os_support_available] } {
+	return 1
+    }
+    return 0
+}
+
 # Return 1 if we are compiling for 64-bit PowerPC but we do not use direct
 # move instructions for moves from GPR to FPR.
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/4] [i386] Modify ix86_compute_frame_layout
  2017-08-22 22:44   ` [PATCH v4 0/4] " Daniel Santos
  2017-08-22 23:23     ` [PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
  2017-08-23  3:51     ` [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
@ 2017-08-23  4:17     ` Daniel Santos
  2017-08-23  4:18     ` [PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et. al Daniel Santos
  2017-08-23 13:53     ` [PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Uros Bizjak
  4 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-08-23  4:17 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, H . J . Lu

These changes affect how the stack frame is calculated from the region
starting at frame.reg_save_offset until frame.frame_pointer_offset,
which includes either the stub save area or the (inline) SSE register
save area and the va_args register save area.

The calculation used when not realigning the stack pointer is the same,
but when when realigning we calculate the 16-byte aligned space needed
in reverse so that the stack realignment boundary at
frame.stack_realign_offset may not necessarily be a multiple of
stack_alignment_needed, but the value of frame.frame_pointer_offset
will. This results in a properly aligned stack for the function body and
avoids wasting stack space.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 116 +++++++++++++++++++++++++++++++++----------------
 gcc/config/i386/i386.h |   2 +-
 2 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 601e3ef47f6..30e84dd5303 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12960,6 +12960,14 @@ ix86_compute_frame_layout (void)
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
   gcc_assert (preferred_alignment <= stack_alignment_needed);
 
+  /* The only ABI saving SSE regs should be 64-bit ms_abi.  */
+  gcc_assert (TARGET_64BIT || !frame->nsseregs);
+  if (TARGET_64BIT && m->call_ms2sysv)
+    {
+      gcc_assert (stack_alignment_needed >= 16);
+      gcc_assert (!frame->nsseregs);
+    }
+
   /* For SEH we have to limit the amount of code movement into the prologue.
      At present we do this via a BLOCKAGE, at which point there's very little
      scheduling that can be done, which means that there's very little point
@@ -13022,54 +13030,88 @@ ix86_compute_frame_layout (void)
   if (TARGET_SEH)
     frame->hard_frame_pointer_offset = offset;
 
-  /* When re-aligning the stack frame, but not saving SSE registers, this
-     is the offset we want adjust the stack pointer to.  */
-  frame->stack_realign_allocate_offset = offset;
+  /* Calculate the size of the va-arg area (not including padding, if any).  */
+  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
 
-  /* The re-aligned stack starts here.  Values before this point are not
-     directly comparable with values below this point.  Use sp_valid_at
-     to determine if the stack pointer is valid for a given offset and
-     fp_valid_at for the frame pointer.  */
   if (stack_realign_fp)
-    offset = ROUND_UP (offset, stack_alignment_needed);
-  frame->stack_realign_offset = offset;
-
-  if (TARGET_64BIT && m->call_ms2sysv)
     {
-      gcc_assert (stack_alignment_needed >= 16);
-      gcc_assert (!frame->nsseregs);
+      /* We may need a 16-byte aligned stack for the remainder of the
+	 register save area, but the stack frame for the local function
+	 may require a greater alignment if using AVX/2/512.  In order
+	 to avoid wasting space, we first calculate the space needed for
+	 the rest of the register saves, add that to the stack pointer,
+	 and then realign the stack to the boundary of the start of the
+	 frame for the local function.  */
+      HOST_WIDE_INT space_needed = 0;
+      HOST_WIDE_INT sse_reg_space_needed = 0;
 
-      m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-      offset += xlogue_layout::get_instance ().get_stack_space_used ();
-    }
+      if (TARGET_64BIT)
+	{
+	  if (m->call_ms2sysv)
+	    {
+	      m->call_ms2sysv_pad_in = 0;
+	      space_needed = xlogue_layout::get_instance ().get_stack_space_used ();
+	    }
 
-  /* Align and set SSE register save area.  */
-  else if (frame->nsseregs)
-    {
-      /* The only ABI that has saved SSE registers (Win64) also has a
-	 16-byte aligned default stack.  However, many programs violate
-	 the ABI, and Wine64 forces stack realignment to compensate.
+	  else if (frame->nsseregs)
+	    /* The only ABI that has saved SSE registers (Win64) also has a
+	       16-byte aligned default stack.  However, many programs violate
+	       the ABI, and Wine64 forces stack realignment to compensate.  */
+	    space_needed = frame->nsseregs * 16;
+
+	  sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16);
+
+	  /* 64-bit frame->va_arg_size should always be a multiple of 16, but
+	     rounding to be pedantic.  */
+	  space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16);
+	}
+      else
+	space_needed = frame->va_arg_size;
+
+      /* Record the allocation size required prior to the realignment AND.  */
+      frame->stack_realign_allocate = space_needed;
+
+      /* The re-aligned stack starts at frame->stack_realign_offset.  Values
+	 before this point are not directly comparable with values below
+	 this point.  Use sp_valid_at to determine if the stack pointer is
+	 valid for a given offset, fp_valid_at for the frame pointer, or
+	 choose_baseaddr to have a base register chosen for you.
 
-	 If the incoming stack boundary is at least 16 bytes, or DRAP is
-	 required and the DRAP re-alignment boundary is at least 16 bytes,
-	 then we want the SSE register save area properly aligned.  */
-      if (ix86_incoming_stack_boundary >= 128
-	       || (stack_realign_drap && stack_alignment_needed >= 16))
-	offset = ROUND_UP (offset, 16);
-      offset += frame->nsseregs * 16;
-      frame->stack_realign_allocate_offset = offset;
+	 Note that the result of (frame->stack_realign_offset
+	 & (stack_alignment_needed - 1)) may not equal zero.  */
+      offset = ROUND_UP (offset + space_needed, stack_alignment_needed);
+      frame->stack_realign_offset = offset - space_needed;
+      frame->sse_reg_save_offset = frame->stack_realign_offset
+							+ sse_reg_space_needed;
     }
+  else
+    {
+      frame->stack_realign_offset = offset;
 
-  frame->sse_reg_save_offset = offset;
+      if (TARGET_64BIT && m->call_ms2sysv)
+	{
+	  m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
+	  offset += xlogue_layout::get_instance ().get_stack_space_used ();
+	}
 
-  /* Va-arg area */
-  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
-  offset += frame->va_arg_size;
+      /* Align and set SSE register save area.  */
+      else if (frame->nsseregs)
+	{
+	  /* If the incoming stack boundary is at least 16 bytes, or DRAP is
+	     required and the DRAP re-alignment boundary is at least 16 bytes,
+	     then we want the SSE register save area properly aligned.  */
+	  if (ix86_incoming_stack_boundary >= 128
+		  || (stack_realign_drap && stack_alignment_needed >= 16))
+	    offset = ROUND_UP (offset, 16);
+	  offset += frame->nsseregs * 16;
+	}
+      frame->sse_reg_save_offset = offset;
+      offset += frame->va_arg_size;
+    }
 
   /* Align start of frame for local function.  */
-  if (stack_realign_fp
-      || m->call_ms2sysv
-      || offset != frame->sse_reg_save_offset
+  if (m->call_ms2sysv
+      || frame->va_arg_size != 0
       || size != 0
       || !crtl->is_leaf
       || cfun->calls_alloca
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ae94b0c7a01..dad6499ca1d 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2503,7 +2503,7 @@ struct GTY(()) ix86_frame
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
   HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_allocate;
   HOST_WIDE_INT stack_realign_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et. al.
  2017-08-22 22:44   ` [PATCH v4 0/4] " Daniel Santos
                       ` (2 preceding siblings ...)
  2017-08-23  4:17     ` [PATCH 2/4] [i386] Modify ix86_compute_frame_layout Daniel Santos
@ 2017-08-23  4:18     ` Daniel Santos
  2017-08-23 13:53     ` [PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Uros Bizjak
  4 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-08-23  4:18 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Jan Hubicka, H . J . Lu

My first version of this patch inited m->fs.sp_realigned_fp_last with
the value of m->fs.sp_offset prior to performing the stack realignment.
I had forgotten, however, that when we're saving GP regs using MOV that
we delay SP modification as long as possible so that the value of
m->fs.sp_offset at this point is correct when we've used push, but
incorrect when we've used mov.

This has been tested on both x86_64-pc-linux-gnu{,x32} with
--target_board=unix/\{-m64,-mx32,-m32\}.

Original patch description:

The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).

Since the alignment required may be more than 16-bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.

Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
---
 gcc/config/i386/i386.c | 58 ++++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 30e84dd5303..dbc771da8aa 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13359,10 +13359,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
    an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   recieve the alignment of the base register that was selected,
+   irrespective of rather or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14445,35 +14448,35 @@ ix86_emit_outlined_ms2sysv_save (const struct ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
+     we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+			  + xlogue.get_stub_ptr_offset (), &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
-     layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+      pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+				GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
 						  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, &align);
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-			     GEN_INT (-stack_alloc_size), -1,
-			     m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
     {
       const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
       rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 			     r.regno);
-      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+      RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
     }
 
   gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
@@ -14728,14 +14731,15 @@ ix86_expand_prologue (void)
       gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
       /* Record last valid frame pointer offset.  */
-      m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+      m->fs.sp_realigned_fp_last = frame.reg_save_offset;
 
       /* The computation of the size of the re-aligned stack frame means
 	 that we must allocate the size of the register save area before
 	 performing the actual alignment.  Otherwise we cannot guarantee
 	 that there's enough storage above the realignment point.  */
-      allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
-      if (allocate && !m->call_ms2sysv)
+      allocate = frame.reg_save_offset - m->fs.sp_offset
+		 + frame.stack_realign_allocate;
+      if (allocate)
         pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
 				   GEN_INT (-allocate), -1, false);
 
@@ -14744,8 +14748,8 @@ ix86_expand_prologue (void)
 					stack_pointer_rtx,
 					GEN_INT (-align_bytes)));
       m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
-      m->fs.sp_realigned = true;
-      m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+      m->fs.sp_realigned_offset = m->fs.sp_offset
+					      - frame.stack_realign_allocate;
       /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
 	 Beyond this point, stack access should be done via choose_baseaddr or
 	 by using sp_valid_at and fp_valid_at to determine the correct base
@@ -14753,6 +14757,8 @@ ix86_expand_prologue (void)
 	 and not physical.  */
       gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
       gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
+      m->fs.sp_realigned = true;
+
       /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
 	 is needed to describe where a register is saved using a realigned
 	 stack pointer, so we need to invalidate the stack pointer for that
@@ -14814,7 +14820,7 @@ ix86_expand_prologue (void)
      so probe if the size is non-negative to preserve the protection area.  */
   if (allocate >= 0 && flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
     {
-      /* We expect the registers to be saved when probes are used.  */
+      /* We expect the GP registers to be saved when probes are used.  */
       gcc_assert (int_registers_saved);
 
       if (STACK_CHECK_MOVING_SP)
-- 
2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
  2017-08-23  3:51     ` [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
@ 2017-08-23 13:46       ` Uros Bizjak
  2017-08-24  1:32         ` Daniel Santos
  0 siblings, 1 reply; 22+ messages in thread
From: Uros Bizjak @ 2017-08-23 13:46 UTC (permalink / raw)
  To: Daniel Santos
  Cc: gcc-patches, Jan Hubicka, H . J . Lu, Rainer Orth, Mike Stump

On Wed, Aug 23, 2017 at 12:50 AM, Daniel Santos <daniel.santos@pobox.com> wrote:
> Changes to lib/target-supports.exp and documentation:
> * Add effective-targets avx512f and avx512f_runtime (needed for new
>   tests).
> * Corrects bug in check_avx2_hw_available.
> * Adds documentation for effective-targets avx2, avx2_runtime (both
>   missing), avx512f and avx512f_runtime.
>
> The following tests are added.  The testcase in the PR is used as a base
> and relevant variants are added to test other factors affected by the
> patch set.
>
> pr80969-1.c   Base test case.
> pr80969-2.c   With ms to sysv call.
> pr80969-2a.c  With ms to sysv call using stubs.
> pr80969-3.c   With alloca (for DRAP test).
> pr80969-4.c   With va_args passed via va_list
> pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
> pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
>               stubs.
> pr80969-4.h   Common header for pr80969-4*.c.
>
> Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
> ---
>  gcc/doc/sourcebuild.texi                   |  12 +++
>  gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 ++++
>  gcc/testsuite/gcc.target/i386/pr80969-2.c  |  27 +++++++
>  gcc/testsuite/gcc.target/i386/pr80969-2a.c |   8 ++
>  gcc/testsuite/gcc.target/i386/pr80969-3.c  |  32 ++++++++
>  gcc/testsuite/gcc.target/i386/pr80969-4.c  |   9 +++
>  gcc/testsuite/gcc.target/i386/pr80969-4.h  | 119 +++++++++++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr80969-4a.c |   9 +++
>  gcc/testsuite/gcc.target/i386/pr80969-4b.c |   9 +++
>  gcc/testsuite/lib/target-supports.exp      |  66 ++++++++++++++++
>  10 files changed, 307 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c
>
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index e6313dc031e..0bf4d6afeb6 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1855,6 +1855,18 @@ Target supports compiling @code{avx} instructions.
>  @item avx_runtime
>  Target supports the execution of @code{avx} instructions.
>
> +@item avx2
> +Target supports compiling @code{avx2} instructions.
> +
> +@item avx2_runtime
> +Target supports the execution of @code{avx2} instructions.
> +
> +@item avx512f
> +Target supports compiling @code{avx512f} instructions.
> +
> +@item avx512f_runtime
> +Target supports the execution of @code{avx512f} instructions.
> +
>  @item cell_hw
>  Test system can execute AltiVec and Cell PPU instructions.
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c b/gcc/testsuite/gcc.target/i386/pr80969-1.c
> new file mode 100644
> index 00000000000..e0520b45c40
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do run { target { ! x32 } } } */
> +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +int a[56];
> +int b;
> +int main (int argc, char *argv[]) {
> +  int c;
> +  for (; b; b++) {
> +    c = b;
> +    if (b & 1)
> +      c = 2;
> +    a[b] = c;
> +  }
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c b/gcc/testsuite/gcc.target/i386/pr80969-2.c
> new file mode 100644
> index 00000000000..f885dee6512
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
> +/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
> +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +/* Test when calling a sysv func.  */
> +
> +int a[56];
> +int b;
> +
> +static void __attribute__((sysv_abi)) sysv ()
> +{
> +}
> +
> +void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
> +
> +int main (int argc, char *argv[]) {
> +  int c;
> +  sysv_noinfo ();
> +  for (; b; b++) {
> +    c = b;
> +    if (b & 1)
> +      c = 2;
> +    a[b] = c;
> +  }
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
> new file mode 100644
> index 00000000000..baea0796d24
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
> @@ -0,0 +1,8 @@
> +/* { dg-do run { target { lp64 && avx512f_runtime } } } */
> +/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */
> +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +/* Test when calling a sysv func using save/restore stubs.  */
> +
> +#include "pr80969-2.c"
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c b/gcc/testsuite/gcc.target/i386/pr80969-3.c
> new file mode 100644
> index 00000000000..d902a771cc8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
> @@ -0,0 +1,32 @@
> +/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
> +/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
> +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +/* Test with alloca (and DRAP).  */
> +
> +#include <alloca.h>
> +
> +int a[56];
> +volatile int b = -12345;
> +volatile const int d = 42;
> +
> +void foo (int *x, int y, int z)
> +{
> +}
> +
> +void (*volatile const foo_noinfo)(int *, int, int) = foo;
> +
> +int main (int argc, char *argv[]) {
> +  int c;
> +  int *e = alloca (d);
> +  foo_noinfo (e, d, 0);
> +  for (; b; b++) {
> +    c = b;
> +    if (b & 1)
> +      c = 2;
> +    foo_noinfo (e, d, c);
> +    a[-(b % 56)] = c;
> +  }
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.c b/gcc/testsuite/gcc.target/i386/pr80969-4.c
> new file mode 100644
> index 00000000000..d5026657cd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-4.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
> +/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
> +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +/* Test with avx512 and va_args.  */
> +
> +#define CALLEE_ABI ms_abi
> +#include "pr80969-4.h"
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4.h b/gcc/testsuite/gcc.target/i386/pr80969-4.h
> new file mode 100644
> index 00000000000..a7ff6456242
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-4.h
> @@ -0,0 +1,119 @@
> +
> +#include <stdarg.h>
> +#include <assert.h>
> +
> +#include "avx-check.h"
> +
> +int a[56];
> +int b;
> +
> +__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
> +__m512d n2 = { -93.83, 893.318, 3994.3, -39484.0, 830.32, -328.32, 3.14159, 2.99792 };
> +__m128i n3 = { 893, -3180 } ;
> +int n4 = 324;
> +double n5 = 103.3;
> +__m128i n6 = { -123, 2 };
> +__m128d n7 = { -91.387, -8193.518 };
> +__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
> +__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
> +__m128i n10 = { 1233, -100 };
> +int n11 = 407;
> +double n12 = 304.9;
> +__m128i n13 = { 233, -110 };
> +__m256i n14 = { -1233, 23, 34, -1003 };
> +__m512i n15 = { -393, -180, 213.4, 1119.03, -8193.518, -100, 304.9, 2.99792 };
> +__m128d n16 = { 73.0, 63.18 };
> +__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
> +__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
> +
> +__m128 e1;
> +__m512d e2;
> +__m128i e3;
> +int e4;
> +double e5;
> +__m128i e6;
> +__m128d e7;
> +__m256d e8;
> +__m128 e9;
> +__m128i e10;
> +int e11;
> +double e12;
> +__m128i e13;
> +__m256i e14;
> +__m512i e15;
> +__m128d e16;
> +__m256 e17;
> +__m128 e18;
> +
> +static void
> +__attribute__((noinline, CALLEE_ABI))
> +bar (__m128 a1, __m512d a2, __m128i a3, va_list va_arglist)
> +{
> +  e1 = a1;
> +  e2 = a2;
> +  e3 = a3;
> +  e4 = va_arg (va_arglist, int);
> +  e5 = va_arg (va_arglist, double);
> +  e6 = va_arg (va_arglist, __m128i);
> +  e7 = va_arg (va_arglist, __m128d);
> +  e8 = va_arg (va_arglist, __m256d);
> +  e9 = va_arg (va_arglist, __m128);
> +  e10 = va_arg (va_arglist, __m128i);
> +  e11 = va_arg (va_arglist, int);
> +  e12 = va_arg (va_arglist, double);
> +  e13 = va_arg (va_arglist, __m128i);
> +  e14 = va_arg (va_arglist, __m256i);
> +  e15 = va_arg (va_arglist, __m512i);
> +  e16 = va_arg (va_arglist, __m128d);
> +  e17 = va_arg (va_arglist, __m256);
> +  e18 = va_arg (va_arglist, __m128);
> +}
> +
> +void __attribute__((CALLEE_ABI))
> +(*volatile const bar_noinfo) (__m128, __m512d, __m128i, va_list) = bar;
> +
> +static void
> +__attribute__((noinline))
> +foo (__m128 a1, __m512d a2, __m128i a3, ...)
> +{
> +  va_list va_arglist;
> +  int c;
> +
> +  va_start (va_arglist, a3);
> +  bar_noinfo (a1, a2, a3, va_arglist);
> +  va_end (va_arglist);
> +
> +  for (; b; b++) {
> +    c = b;
> +    if (b & 1)
> +      c = 2;
> +    a[b] = c;
> +  }
> +}
> +void (*volatile const foo_noinfo) (__m128, __m512d, __m128i, ...) = foo;
> +
> +static void
> +avx_test (void)
> +{
> +  foo (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
> +       n13, n14, n15, n16, n17, n18);
> +  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
> +  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
> +  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
> +  assert (n4 == e4);
> +  assert (n5 == e5);
> +  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
> +  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
> +  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
> +  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
> +  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
> +  assert (n11 == e11);
> +  assert (n12 == e12);
> +  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
> +  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
> +  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
> +  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
> +  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
> +  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4a.c b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
> new file mode 100644
> index 00000000000..e5d4cadb045
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-4a.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
> +/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
> +/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +/* Test with avx512, va_args, and ms to sysv call.  */
> +
> +#define CALLEE_ABI sysv_abi
> +#include "pr80969-4.h"
> diff --git a/gcc/testsuite/gcc.target/i386/pr80969-4b.c b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
> new file mode 100644
> index 00000000000..ae8759249eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80969-4b.c
> @@ -0,0 +1,9 @@
> +/* { dg-do run { target { lp64 && avx512f_runtime } } } */
> +/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */
> +/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +/* Test with avx512, va_args, and ms to sysv call using save/restore stubs.  */
> +
> +#define CALLEE_ABI sysv_abi
> +#include "pr80969-4.h"
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 5219fbf4671..4383fd59cd7 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -1642,6 +1642,29 @@ proc check_avx_os_support_available { } {
>      }]
>  }
>
> +# Return 1 if the target OS supports running AVX executables, 0
> +# otherwise.  Cache the result.
> +
> +proc check_avx512_os_support_available { } {
> +    return [check_cached_effective_target avx512_os_support_available {
> +       # If this is not the right target then we can skip the test.
> +       if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
> +           expr 0
> +       } else {
> +           # Check that OS has AVX512, AVX and SSE saving enabled.
> +           check_runtime_nocache avx512_os_support_available {
> +               int main ()
> +               {
> +                 unsigned int eax, edx;
> +
> +                 asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
> +                 return (eax & 0xe6) != 0xe6;
> +               }
> +           } ""
> +       }
> +    }]
> +}
> +
>  # Return 1 if the target supports executing SSE instructions, 0
>  # otherwise.  Cache the result.
>
> @@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } {
>             expr 0
>         } else {
>             check_runtime_nocache avx2_hw_available {
> +               #include <stddef.h>

Why is the above include needed?

>                 #include "cpuid.h"
>                 int main ()
>                 {
> @@ -1842,6 +1866,37 @@ proc check_avx2_hw_available { } {
>      }]
>  }
>
> +# Return 1 if the target supports executing AVX512 foundation instructions, 0
> +# otherwise.  Cache the result.
> +
> +proc check_avx512f_hw_available { } {
> +    return [check_cached_effective_target avx512f_hw_available {
> +       # If this is not the right target then we can skip the test.
> +       if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
> +           expr 0
> +       } else {
> +           check_runtime_nocache avx512f_hw_available {
> +               #include <stddef.h>

The above include is not needed.

> +               #include "cpuid.h"
> +               int main ()
> +               {
> +                 unsigned int eax, ebx, ecx, edx;
> +                 if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)
> +                     || !(ecx & bit_OSXSAVE))
> +                   return 1;
> +
> +                 if (__get_cpuid_max (0, NULL) < 7)
> +                   return 1;
> +
> +                 __cpuid_count (7, 0, eax, ebx, ecx, edx);
> +
> +                 return !(ebx & bit_AVX512F);
> +               }
> +           } ""
> +       }
> +    }]
> +}
> +
>  # Return 1 if the target supports running SSE executables, 0 otherwise.
>
>  proc check_effective_target_sse_runtime { } {
> @@ -1928,6 +1983,17 @@ proc check_effective_target_avx2_runtime { } {
>      return 0
>  }
>
> +# Return 1 if the target supports running AVX512f executables, 0 otherwise.
> +
> +proc check_effective_target_avx512f_runtime { } {
> +    if { [check_effective_target_avx512f]
> +        && [check_avx512f_hw_available]
> +        && [check_avx512_os_support_available] } {
> +       return 1
> +    }
> +    return 0
> +}
> +
>  # Return 1 if we are compiling for 64-bit PowerPC but we do not use direct
>  # move instructions for moves from GPR to FPR.
>
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f
  2017-08-22 22:44   ` [PATCH v4 0/4] " Daniel Santos
                       ` (3 preceding siblings ...)
  2017-08-23  4:18     ` [PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et. al Daniel Santos
@ 2017-08-23 13:53     ` Uros Bizjak
  4 siblings, 0 replies; 22+ messages in thread
From: Uros Bizjak @ 2017-08-23 13:53 UTC (permalink / raw)
  To: Daniel Santos
  Cc: gcc-patches, Jan Hubicka, H.J. Lu, Martin Liska, Rainer Orth, Mike Stump

On Wed, Aug 23, 2017 at 12:34 AM, Daniel Santos <daniel.santos@pobox.com> wrote:
> I had to fix a few things for x32 compatibility and I this is ready
> now.  H.J. tested on machine with avx512 (including x32) and I've tested
> both native x32 and normal x86_64 with m64, m32 and mx32 and all is
> well.  I've made more changes to the tests so I'm just submitting a
> version 2 of the whole patch set.
>
> OK for trunk?
>
> 2017-08-22  Daniel Santos  <daniel.santos@pobox.com>
>
>         * config/i386/i386.h (ix86_frame::stack_realign_allocate_offset):
>         Remove field.
>         (ix86_frame::stack_realign_allocate): New field.
>         (struct machine_frame_state): Modify comments.
>         (machine_frame_state::sp_realigned_fp_end): New field.
>         * config/i386/i386.c (ix86_compute_frame_layout): Rework stack frame
>         layout calculation.
>         (sp_valid_at): Add assertion to assure no attempt to access invalid
>         offset of a realigned stack.
>         (fp_valid_at): Likewise.
>         (choose_baseaddr): Modify comments.
>         (ix86_emit_outlined_ms2sysv_save): Adjust to changes in
>         ix86_expand_prologue.
>         (ix86_expand_prologue): Modify stack realignment and allocation.
>         (ix86_expand_epilogue): Modify comments.
>
> 2017-08-22  Daniel Santos  <daniel.santos@pobox.com>
>
>         * gcc.target/i386/pr80969-1.c: New testcase.
>         * gcc.target/i386/pr80969-2a.c: Likewise.
>         * gcc.target/i386/pr80969-2.c: Likewise.
>         * gcc.target/i386/pr80969-3.c: Likewise.
>         * gcc.target/i386/pr80969-4a.c: Likewise.
>         * gcc.target/i386/pr80969-4b.c: Likewise.
>         * gcc.target/i386/pr80969-4.c: Likewise.
>         * gcc.target/i386/pr80969-4.h: New header common to pr80969-4*.c

I went through the patchset one more time, and I didn't find anything
that would stick out. I have a comment in 4/4, but nothing critical.

IMO extensive testsuite would catch possible bug there, so

OK for mainline.

(But please stay around to fix any possible fallout...)

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available
  2017-08-23 13:46       ` Uros Bizjak
@ 2017-08-24  1:32         ` Daniel Santos
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Santos @ 2017-08-24  1:32 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jan Hubicka, H . J . Lu, Rainer Orth, Mike Stump


On 08/23/2017 08:26 AM, Uros Bizjak wrote:

>> @@ -1822,6 +1845,7 @@ proc check_avx2_hw_available { } {
>>             expr 0
>>         } else {
>>             check_runtime_nocache avx2_hw_available {
>> +               #include <stddef.h>
> Why is the above include needed?

It is only needed to #define NULL.  Without the include, I've had this
function fail due to NULL being undefined.

Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-08-23 20:49 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-31 11:16 [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
2017-07-31 11:19 ` [PATCH 1/6] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
2017-07-31 11:19 ` [PATCH 6/6] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
2017-08-08 19:23   ` [PATCH 6/6 v2] " Daniel Santos
2017-07-31 11:19 ` [PATCH 5/6] [i386] Modify SP realignment in ix86_expand_prologue, et. al Daniel Santos
2017-08-02 23:28   ` [PATCH 5/6 v2] " Daniel Santos
2017-07-31 11:19 ` [PATCH 2/6] [i386] Remove ix86_frame::outlined_save_offset Daniel Santos
2017-07-31 13:53   ` Uros Bizjak
2017-07-31 11:19 ` [PATCH 4/6] [i386] Modify ix86_compute_frame_layout Daniel Santos
2017-07-31 11:19 ` [PATCH 3/6] [i386] Remove machine_function::call_ms2sysv_pad_out Daniel Santos
2017-07-31 13:59   ` Uros Bizjak
2017-07-31 17:23 ` [PATCH 0/6] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Daniel Santos
2017-08-01  6:20   ` Uros Bizjak
2017-08-08 19:31 ` PING " Daniel Santos
2017-08-22 22:44   ` [PATCH v4 0/4] " Daniel Santos
2017-08-22 23:23     ` [PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at Daniel Santos
2017-08-23  3:51     ` [PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available Daniel Santos
2017-08-23 13:46       ` Uros Bizjak
2017-08-24  1:32         ` Daniel Santos
2017-08-23  4:17     ` [PATCH 2/4] [i386] Modify ix86_compute_frame_layout Daniel Santos
2017-08-23  4:18     ` [PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et. al Daniel Santos
2017-08-23 13:53     ` [PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).