public inbox for gcc-patches@gcc.gnu.org
* [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (6 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 06/10] i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-26 22:42   ` Jeff Law
  2020-02-15 15:26 ` [PATCH 02/10] i386: Use ix86_output_ssemov for XImode TYPE_SSEMOV H.J. Lu
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

On x86, when AVX and AVX512 are enabled, vector move instructions can
be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):

   0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
   4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2

We prefer VEX encoding over EVEX since VEX is shorter.  Also, AVX512F
alone only supports 512-bit vector moves; 128-bit and 256-bit
EVEX-encoded vector moves additionally require AVX512VL.  Mode
attributes on the x86 vector move patterns indicate the preferred
vector move encoding for the target.  For a vector register to vector
register move, we can use a 512-bit vector move instruction to move a
128-bit/256-bit vector when AVX512VL isn't available.  With AVX512F
and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector
moves if the upper 16 vector registers aren't used.  This patch adds
a function, ix86_output_ssemov, to generate vector moves:

1. If zmm registers are used, use EVEX encoding.
2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, use SSE or VEX
encoding.
3. If xmm16-xmm31/ymm16-ymm31 registers are used:
   a. With AVX512VL, generate AVX512VL vector moves.
   b. Without AVX512VL, do xmm16-xmm31/ymm16-ymm31 register to register
      moves with a zmm register move, as sketched below.
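
For instance, the expected forms for these cases look like this (a
sketch based on the scan-assembler patterns in the pr89346 and pr89229
tests in this series; registers and addressing are illustrative):

   vmovdqa64	%zmm1, %zmm2		# 1: zmm move, always EVEX
   vmovdqa	%ymm1, (%rax)		# 2: xmm16-xmm31/ymm16-ymm31 unused, VEX
   vmovdqa64	%xmm16, %xmm17		# 3a: with AVX512VL
   vmovdqa32	%zmm16, %zmm17		# 3b: without AVX512VL, via zmm move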

Tested on AVX2 and AVX512 with and without --with-arch=native.

gcc/

	PR target/89229
	PR target/89346
	* config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
	* config/i386/i386.c (ix86_get_ssemov): New function.
	(ix86_output_ssemov): Likewise.
	* config/i386/sse.md (VMOVE:mov<mode>_internal): Call
	ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
	check.

gcc/testsuite/

	PR target/89229
	PR target/89346
	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
	* gcc.target/i386/pr89346.c: New test.
---
 gcc/config/i386/i386-protos.h                 |   2 +
 gcc/config/i386/i386.c                        | 274 ++++++++++++++++++
 gcc/config/i386/sse.md                        |  98 +------
 .../gcc.target/i386/avx512vl-vmovdqa64-1.c    |   7 +-
 gcc/testsuite/gcc.target/i386/pr89346.c       |  15 +
 5 files changed, 296 insertions(+), 100 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89346.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 266381ca5a6..39fcaa0ad5f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -38,6 +38,8 @@ extern void ix86_expand_split_stack_prologue (void);
 extern void ix86_output_addr_vec_elt (FILE *, int);
 extern void ix86_output_addr_diff_elt (FILE *, int, int);
 
+extern const char *ix86_output_ssemov (rtx_insn *, rtx *);
+
 extern enum calling_abi ix86_cfun_abi (void);
 extern enum calling_abi ix86_function_type_abi (const_tree);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dac7a3fc5fd..26f8c9494b9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4915,6 +4915,280 @@ ix86_pre_reload_split (void)
 	  && !(cfun->curr_properties & PROP_rtl_split_insns));
 }
 
+/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
+   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
+   TARGET_AVX512VL or it is a register to register move which can
+   be done with zmm register move.  */
+
+static const char *
+ix86_get_ssemov (rtx *operands, unsigned size,
+		 enum attr_mode insn_mode, machine_mode mode)
+{
+  char buf[128];
+  bool misaligned_p = (misaligned_operand (operands[0], mode)
+		       || misaligned_operand (operands[1], mode));
+  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
+		     || EXT_REX_SSE_REG_P (operands[1]));
+  machine_mode scalar_mode;
+
+  const char *opcode = NULL;
+  enum
+    {
+      opcode_int,
+      opcode_float,
+      opcode_double
+    } type = opcode_int;
+
+  switch (insn_mode)
+    {
+    case MODE_V16SF:
+    case MODE_V8SF:
+    case MODE_V4SF:
+      scalar_mode = E_SFmode;
+      break;
+    case MODE_V8DF:
+    case MODE_V4DF:
+    case MODE_V2DF:
+      scalar_mode = E_DFmode;
+      break;
+    case MODE_XI:
+    case MODE_OI:
+    case MODE_TI:
+      scalar_mode = GET_MODE_INNER (mode);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (SCALAR_FLOAT_MODE_P (scalar_mode))
+    {
+      switch (scalar_mode)
+	{
+	case E_SFmode:
+	  if (size == 64 || !evex_reg_p || TARGET_AVX512VL)
+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
+	  else
+	    type = opcode_float;
+	  break;
+	case E_DFmode:
+	  if (size == 64 || !evex_reg_p || TARGET_AVX512VL)
+	    opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
+	  else
+	    type = opcode_double;
+	  break;
+	case E_TFmode:
+	  if (size == 64)
+	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else if (evex_reg_p)
+	    {
+	      if (TARGET_AVX512VL)
+		opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	    }
+	  else
+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else if (SCALAR_INT_MODE_P (scalar_mode))
+    {
+      switch (scalar_mode)
+	{
+	case E_QImode:
+	  if (size == 64)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu8"
+			 : "vmovdqu64")
+		      : "vmovdqa64");
+	  else if (evex_reg_p)
+	    {
+	      if (TARGET_AVX512VL)
+		opcode = (misaligned_p
+			  ? (TARGET_AVX512BW
+			     ? "vmovdqu8"
+			     : "vmovdqu64")
+			  : "vmovdqa64");
+	    }
+	  else
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu8"
+			 : "%vmovdqu")
+		      : "%vmovdqa");
+	  break;
+	case E_HImode:
+	  if (size == 64)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "vmovdqu64")
+		      : "vmovdqa64");
+	  else if (evex_reg_p)
+	    {
+	      if (TARGET_AVX512VL)
+		opcode = (misaligned_p
+			  ? (TARGET_AVX512BW
+			     ? "vmovdqu16"
+			     : "vmovdqu64")
+			  : "vmovdqa64");
+	    }
+	  else
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "%vmovdqu")
+		      : "%vmovdqa");
+	  break;
+	case E_SImode:
+	  if (size == 64)
+	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  else if (evex_reg_p)
+	    {
+	      if (TARGET_AVX512VL)
+		opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	    }
+	  else
+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	  break;
+	case E_DImode:
+	case E_TImode:
+	case E_OImode:
+	  if (size == 64)
+	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else if (evex_reg_p)
+	    {
+	      if (TARGET_AVX512VL)
+		opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	    }
+	  else
+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	  break;
+	case E_XImode:
+	  opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    gcc_unreachable ();
+
+  if (!opcode)
+    {
+      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
+	 registers without AVX512VL by using zmm register move.  */
+      if (!evex_reg_p
+	  || TARGET_AVX512VL
+	  || memory_operand (operands[0], mode)
+	  || memory_operand (operands[1], mode))
+	gcc_unreachable ();
+      size = 64;
+      switch (type)
+	{
+	case opcode_int:
+	  opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  break;
+	case opcode_float:
+	  opcode = misaligned_p ? "%vmovups" : "%vmovaps";
+	  break;
+	case opcode_double:
+	  opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
+	  break;
+	}
+    }
+
+  switch (size)
+    {
+    case 64:
+      snprintf (buf, sizeof (buf), "%s\t{%%g1, %%g0|%%g0, %%g1}",
+		opcode);
+      break;
+    case 32:
+      snprintf (buf, sizeof (buf), "%s\t{%%t1, %%t0|%%t0, %%t1}",
+		opcode);
+      break;
+    case 16:
+      snprintf (buf, sizeof (buf), "%s\t{%%x1, %%x0|%%x0, %%x1}",
+		opcode);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  output_asm_insn (buf, operands);
+  return "";
+}
+
+/* Return the template of the TYPE_SSEMOV instruction to move
+   operands[1] into operands[0].  */
+
+const char *
+ix86_output_ssemov (rtx_insn *insn, rtx *operands)
+{
+  machine_mode mode = GET_MODE (operands[0]);
+  if (get_attr_type (insn) != TYPE_SSEMOV
+      || mode != GET_MODE (operands[1]))
+    gcc_unreachable ();
+
+  enum attr_mode insn_mode = get_attr_mode (insn);
+
+  switch (insn_mode)
+    {
+    case MODE_XI:
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return ix86_get_ssemov (operands, 64, insn_mode, mode);
+
+    case MODE_OI:
+    case MODE_V4DF:
+    case MODE_V8SF:
+      return ix86_get_ssemov (operands, 32, insn_mode, mode);
+
+    case MODE_TI:
+    case MODE_V2DF:
+    case MODE_V4SF:
+      return ix86_get_ssemov (operands, 16, insn_mode, mode);
+
+    case MODE_DI:
+      /* Handle broken assemblers that require movd instead of movq.  */
+      if (!HAVE_AS_IX86_INTERUNIT_MOVQ
+	  && (GENERAL_REG_P (operands[0])
+	      || GENERAL_REG_P (operands[1])))
+	return "%vmovd\t{%1, %0|%0, %1}";
+      else
+	return "%vmovq\t{%1, %0|%0, %1}";
+
+    case MODE_V2SF:
+      if (TARGET_AVX && REG_P (operands[0]))
+	return "vmovlps\t{%1, %d0|%d0, %1}";
+      else
+	return "%vmovlps\t{%1, %0|%0, %1}";
+
+    case MODE_DF:
+      if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
+	return "vmovsd\t{%d1, %0|%0, %d1}";
+      else
+	return "%vmovsd\t{%1, %0|%0, %1}";
+
+    case MODE_V1DF:
+      gcc_assert (!TARGET_AVX);
+      return "movlpd\t{%1, %0|%0, %1}";
+
+    case MODE_SI:
+      return "%vmovd\t{%1, %0|%0, %1}";
+
+    case MODE_SF:
+      if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
+	return "vmovss\t{%d1, %0|%0, %d1}";
+      else
+	return "%vmovss\t{%1, %0|%0, %1}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Returns true if OP contains a symbol reference */
 
 bool
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ee1f138d1af..8f5902292c6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1013,98 +1013,7 @@ (define_insn "mov<mode>_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
-	 in avx512f, so we need to use workarounds, to access sse registers
-	 16-31, which are evex-only. In avx512vl we don't need workarounds.  */
-      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
-	  && (EXT_REX_SSE_REG_P (operands[0])
-	      || EXT_REX_SSE_REG_P (operands[1])))
-	{
-	  if (memory_operand (operands[0], <MODE>mode))
-	    {
-	      if (<MODE_SIZE> == 32)
-		return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
-	      else if (<MODE_SIZE> == 16)
-		return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
-	      else
-		gcc_unreachable ();
-	    }
-	  else if (memory_operand (operands[1], <MODE>mode))
-	    {
-	      if (<MODE_SIZE> == 32)
-		return "vbroadcast<shuffletype>64x4\t{%1, %g0|%g0, %1}";
-	      else if (<MODE_SIZE> == 16)
-		return "vbroadcast<shuffletype>32x4\t{%1, %g0|%g0, %1}";
-	      else
-		gcc_unreachable ();
-	    }
-	  else
-	    /* Reg -> reg move is always aligned.  Just use wider move.  */
-	    switch (get_attr_mode (insn))
-	      {
-	      case MODE_V8SF:
-	      case MODE_V4SF:
-		return "vmovaps\t{%g1, %g0|%g0, %g1}";
-	      case MODE_V4DF:
-	      case MODE_V2DF:
-		return "vmovapd\t{%g1, %g0|%g0, %g1}";
-	      case MODE_OI:
-	      case MODE_TI:
-		return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
-	      default:
-		gcc_unreachable ();
-	      }
-	}
-
-      switch (get_attr_mode (insn))
-	{
-	case MODE_V16SF:
-	case MODE_V8SF:
-	case MODE_V4SF:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return "%vmovups\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovaps\t{%1, %0|%0, %1}";
-
-	case MODE_V8DF:
-	case MODE_V4DF:
-	case MODE_V2DF:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return "%vmovupd\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovapd\t{%1, %0|%0, %1}";
-
-	case MODE_OI:
-	case MODE_TI:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return TARGET_AVX512VL
-		   && (<MODE>mode == V4SImode
-		       || <MODE>mode == V2DImode
-		       || <MODE>mode == V8SImode
-		       || <MODE>mode == V4DImode
-		       || TARGET_AVX512BW)
-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
-		   : "%vmovdqu\t{%1, %0|%0, %1}";
-	  else
-	    return TARGET_AVX512VL ? "vmovdqa64\t{%1, %0|%0, %1}"
-				   : "%vmovdqa\t{%1, %0|%0, %1}";
-	case MODE_XI:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return (<MODE>mode == V16SImode
-		    || <MODE>mode == V8DImode
-		    || TARGET_AVX512BW)
-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
-		   : "vmovdqu64\t{%1, %0|%0, %1}";
-	  else
-	    return "vmovdqa64\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+      return ix86_output_ssemov (insn, operands);
 
     default:
       gcc_unreachable ();
@@ -1113,10 +1022,7 @@ (define_insn "mov<mode>_internal"
   [(set_attr "type" "sselog1,sselog1,ssemov,ssemov")
    (set_attr "prefix" "maybe_vex")
    (set (attr "mode")
-	(cond [(and (eq_attr "alternative" "1")
-		    (match_test "TARGET_AVX512VL"))
-		 (const_string "<sseinsnmode>")
-	       (match_test "TARGET_AVX")
+	(cond [(match_test "TARGET_AVX")
 		 (const_string "<sseinsnmode>")
 	       (ior (not (match_test "TARGET_SSE2"))
 		    (match_test "optimize_function_for_size_p (cfun)"))
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
index 14fe4b84544..db4d9d14875 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
@@ -4,14 +4,13 @@
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\\(\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\\(\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\\(\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\\(\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\nxy\]*\\(.{5,6}(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\nxy\]*\\((?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\nxy\]*\\(.{5,6}(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr89346.c b/gcc/testsuite/gcc.target/i386/pr89346.c
new file mode 100644
index 00000000000..cdc9accf521
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89346.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+#include <immintrin.h>
+
+long long *p;
+volatile __m256i y;
+
+void
+foo (void)
+{
+   _mm256_store_epi64 (p, y);
+}
+
+/* { dg-final { scan-assembler-not "vmovdqa64" } } */
-- 
2.24.1

* [PATCH 03/10] i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 04/10] i386: Use ix86_output_ssemov for TImode TYPE_SSEMOV H.J. Lu
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

There is no need to set the mode attribute to XImode since
ix86_output_ssemov can properly encode ymm16-ymm31 registers with and
without AVX512VL.
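
As an illustration (derived from the new ix86_get_ssemov logic; this
patch adds no test for it), an OImode move between ymm16 and ymm17 is
expected to come out as:

   vmovdqa64	%ymm16, %ymm17		# with AVX512VL
   vmovdqa32	%zmm16, %zmm17		# without AVX512VL, via zmm move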

	PR target/89229
	* config/i386/i386.md (*movoi_internal_avx): Call
	ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
	and TARGET_AVX512VL check.
---
 gcc/config/i386/i386.md | 26 ++------------------------
 1 file changed, 2 insertions(+), 24 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b30e5a51edc..9e9b17d0913 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1925,21 +1925,7 @@ (define_insn "*movoi_internal_avx"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      if (misaligned_operand (operands[0], OImode)
-	  || misaligned_operand (operands[1], OImode))
-	{
-	  if (get_attr_mode (insn) == MODE_XI)
-	    return "vmovdqu32\t{%1, %0|%0, %1}";
-	  else
-	    return "vmovdqu\t{%1, %0|%0, %1}";
-	}
-      else
-	{
-	  if (get_attr_mode (insn) == MODE_XI)
-	    return "vmovdqa32\t{%1, %0|%0, %1}";
-	  else
-	    return "vmovdqa\t{%1, %0|%0, %1}";
-	}
+      return ix86_output_ssemov (insn, operands);
 
     default:
       gcc_unreachable ();
@@ -1948,15 +1934,7 @@ (define_insn "*movoi_internal_avx"
   [(set_attr "isa" "*,avx2,*,*")
    (set_attr "type" "sselog1,sselog1,ssemov,ssemov")
    (set_attr "prefix" "vex")
-   (set (attr "mode")
-	(cond [(ior (match_operand 0 "ext_sse_reg_operand")
-		    (match_operand 1 "ext_sse_reg_operand"))
-		 (const_string "XI")
-	       (and (eq_attr "alternative" "1")
-		    (match_test "TARGET_AVX512VL"))
-		 (const_string "XI")
-	      ]
-	      (const_string "OI")))])
+   (set_attr "mode" "OI")])
 
 (define_insn "*movti_internal"
   [(set (match_operand:TI 0 "nonimmediate_operand" "=!r ,o ,v,v ,v ,m,?r,?Yd")
-- 
2.24.1

* [PATCH 02/10] i386: Use ix86_output_ssemov for XImode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (7 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 01/10] i386: Properly encode vector registers in vector move H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 09/10] i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV H.J. Lu
  2020-02-24 12:55 ` PING^8: [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

	PR target/89229
	* config/i386/i386.md (*movxi_internal_avx512f): Call
	ix86_output_ssemov for TYPE_SSEMOV.
---
 gcc/config/i386/i386.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f14683cd14f..b30e5a51edc 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1902,11 +1902,7 @@ (define_insn "*movxi_internal_avx512f"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      if (misaligned_operand (operands[0], XImode)
-	  || misaligned_operand (operands[1], XImode))
-	return "vmovdqu32\t{%1, %0|%0, %1}";
-      else
-	return "vmovdqa32\t{%1, %0|%0, %1}";
+      return ix86_output_ssemov (insn, operands);
 
     default:
       gcc_unreachable ();
-- 
2.24.1

* [PATCH 06/10] i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (5 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 07/10] i386: Use ix86_output_ssemov for TFmode TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 01/10] i386: Properly encode vector registers in vector move H.J. Lu
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

There is no need to set the mode attribute to XImode since
ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
without AVX512VL.
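
For example (matching the scan-assembler patterns in pr89229-4a.c and
pr89229-4b.c below), an SImode move between xmm16 and xmm17 becomes:

   vmovdqa32	%xmm16, %xmm17		# with AVX512VL
   vmovdqa32	%zmm16, %zmm17		# without AVX512VL, via zmm move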

gcc/

	PR target/89229
	* config/i386/i386.md (*movsi_internal): Call ix86_output_ssemov
	for TYPE_SSEMOV.  Remove ext_sse_reg_operand and TARGET_AVX512VL
	check.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-4a.c: New test.
	* gcc.target/i386/pr89229-4b.c: Likewise.
	* gcc.target/i386/pr89229-4c.c: Likewise.
---
 gcc/config/i386/i386.md                    | 25 ++--------------------
 gcc/testsuite/gcc.target/i386/pr89229-4a.c | 17 +++++++++++++++
 gcc/testsuite/gcc.target/i386/pr89229-4b.c |  6 ++++++
 gcc/testsuite/gcc.target/i386/pr89229-4c.c |  7 ++++++
 4 files changed, 32 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4c.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 03d8078e957..05815c5cf3b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2261,25 +2261,7 @@ (define_insn "*movsi_internal"
       gcc_unreachable ();
 
     case TYPE_SSEMOV:
-      switch (get_attr_mode (insn))
-	{
-	case MODE_SI:
-          return "%vmovd\t{%1, %0|%0, %1}";
-	case MODE_TI:
-	  return "%vmovdqa\t{%1, %0|%0, %1}";
-	case MODE_XI:
-	  return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
-
-	case MODE_V4SF:
-	  return "%vmovaps\t{%1, %0|%0, %1}";
-
-	case MODE_SF:
-	  gcc_assert (!TARGET_AVX);
-          return "movss\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+      return ix86_output_ssemov (insn, operands);
 
     case TYPE_MMX:
       return "pxor\t%0, %0";
@@ -2345,10 +2327,7 @@ (define_insn "*movsi_internal"
      (cond [(eq_attr "alternative" "2,3")
 	      (const_string "DI")
 	    (eq_attr "alternative" "8,9")
-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
-			  (match_operand 1 "ext_sse_reg_operand"))
-		       (const_string "XI")
-		     (match_test "TARGET_AVX")
+	      (cond [(match_test "TARGET_AVX")
 		       (const_string "TI")
 		     (ior (not (match_test "TARGET_SSE2"))
 			  (match_test "optimize_function_for_size_p (cfun)"))
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4a.c b/gcc/testsuite/gcc.target/i386/pr89229-4a.c
new file mode 100644
index 00000000000..fd56f447016
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-4a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern int i;
+
+int
+foo1 (void)
+{
+  register int xmm16 __asm ("xmm16") = i;
+  asm volatile ("" : "+v" (xmm16));
+  register int xmm17 __asm ("xmm17") = xmm16;
+  asm volatile ("" : "+v" (xmm17));
+  return xmm17;
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4b.c b/gcc/testsuite/gcc.target/i386/pr89229-4b.c
new file mode 100644
index 00000000000..023e81253a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-4b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
+
+#include "pr89229-4a.c"
+
+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4c.c b/gcc/testsuite/gcc.target/i386/pr89229-4c.c
new file mode 100644
index 00000000000..bb728082e96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-4c.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
+
+#include "pr89229-4a.c"
+
+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
-- 
2.24.1

* [PATCH 10/10] i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (2 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 05/10] i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 08/10] i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV H.J. Lu
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

There is no need to set the mode attribute to XImode since
ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
without AVX512VL.

Remove ext_sse_reg_operand since it is no longer needed.

	PR target/89229
	* config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call
	ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
	check.
	* config/i386/predicates.md (ext_sse_reg_operand): Removed.
---
 gcc/config/i386/mmx.md        | 29 ++---------------------------
 gcc/config/i386/predicates.md |  5 -----
 2 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index f695831b5b9..7d9db5d352c 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -118,29 +118,7 @@ (define_insn "*mov<mode>_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      switch (get_attr_mode (insn))
-	{
-	case MODE_DI:
-	  /* Handle broken assemblers that require movd instead of movq.  */
-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
-	    return "%vmovd\t{%1, %0|%0, %1}";
-	  return "%vmovq\t{%1, %0|%0, %1}";
-	case MODE_TI:
-	  return "%vmovdqa\t{%1, %0|%0, %1}";
-	case MODE_XI:
-	  return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
-
-	case MODE_V2SF:
-	  if (TARGET_AVX && REG_P (operands[0]))
-	    return "vmovlps\t{%1, %0, %0|%0, %0, %1}";
-	  return "%vmovlps\t{%1, %0|%0, %1}";
-	case MODE_V4SF:
-	  return "%vmovaps\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+      return ix86_output_ssemov (insn, operands);
 
     default:
       gcc_unreachable ();
@@ -189,10 +167,7 @@ (define_insn "*mov<mode>_internal"
      (cond [(eq_attr "alternative" "2")
 	      (const_string "SI")
 	    (eq_attr "alternative" "11,12")
-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
-			  (match_operand 1 "ext_sse_reg_operand"))
-			(const_string "XI")
-		     (match_test "<MODE>mode == V2SFmode")
+	      (cond [(match_test "<MODE>mode == V2SFmode")
 		       (const_string "V4SF")
 		     (ior (not (match_test "TARGET_SSE2"))
 			  (match_test "optimize_function_for_size_p (cfun)"))
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 1119366d54e..71f4cb1193c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -61,11 +61,6 @@ (define_predicate "sse_reg_operand"
   (and (match_code "reg")
        (match_test "SSE_REGNO_P (REGNO (op))")))
 
-;; True if the operand is an AVX-512 new register.
-(define_predicate "ext_sse_reg_operand"
-  (and (match_code "reg")
-       (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
-
 ;; Return true if op is a QImode register.
 (define_predicate "any_QIreg_operand"
   (and (match_code "reg")
-- 
2.24.1

* [PATCH 09/10] i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (8 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 02/10] i386: Use ix86_output_ssemov for XImode TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-24 12:55 ` PING^8: [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

There is no need to set the mode attribute to V16SFmode since
ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
without AVX512VL.
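
For example (per the scan-assembler patterns in pr89229-7a.c and
pr89229-7b.c below; the AVX512VL form is inferred, not tested), an
SFmode register move between xmm16 and xmm17 is expected to be:

   vmovaps	%xmm16, %xmm17		# with AVX512VL
   vmovaps	%zmm16, %zmm17		# without AVX512VL, via zmm move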

gcc/

	PR target/89229
	* config/i386/i386.md (*movsf_internal): Call ix86_output_ssemov
	for TYPE_SSEMOV.  Remove TARGET_PREFER_AVX256, TARGET_AVX512VL
	and ext_sse_reg_operand check.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-7a.c: New test.
	* gcc.target/i386/pr89229-7b.c: Likewise.
	* gcc.target/i386/pr89229-7c.c: Likewise.
---
 gcc/config/i386/i386.md                    | 26 ++--------------------
 gcc/testsuite/gcc.target/i386/pr89229-7a.c | 16 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr89229-7b.c |  6 +++++
 gcc/testsuite/gcc.target/i386/pr89229-7c.c |  6 +++++
 4 files changed, 30 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7c.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 01892992adb..2dcf2d598c3 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3469,24 +3469,7 @@ (define_insn "*movsf_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      switch (get_attr_mode (insn))
-	{
-	case MODE_SF:
-	  if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
-	    return "vmovss\t{%d1, %0|%0, %d1}";
-	  return "%vmovss\t{%1, %0|%0, %1}";
-
-	case MODE_V16SF:
-	  return "vmovaps\t{%g1, %g0|%g0, %g1}";
-	case MODE_V4SF:
-	  return "%vmovaps\t{%1, %0|%0, %1}";
-
-	case MODE_SI:
-	  return "%vmovd\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+      return ix86_output_ssemov (insn, operands);
 
     case TYPE_MMXMOV:
       switch (get_attr_mode (insn))
@@ -3558,12 +3541,7 @@ (define_insn "*movsf_internal"
 		  better to maintain the whole registers in single format
 		  to avoid problems on using packed logical operations.  */
 	       (eq_attr "alternative" "6")
-		 (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
-				  (not (match_test "TARGET_AVX512VL")))
-			     (ior (match_operand 0 "ext_sse_reg_operand")
-				  (match_operand 1 "ext_sse_reg_operand")))
-			  (const_string "V16SF")
-			(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+		 (cond [(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
 			     (match_test "TARGET_SSE_SPLIT_REGS"))
 			  (const_string "V4SF")
 		       ]
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7a.c b/gcc/testsuite/gcc.target/i386/pr89229-7a.c
new file mode 100644
index 00000000000..856115b2f5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-7a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern float d;
+
+void
+foo1 (float x)
+{
+  register float xmm16 __asm ("xmm16") = x;
+  asm volatile ("" : "+v" (xmm16));
+  register float xmm17 __asm ("xmm17") = xmm16;
+  asm volatile ("" : "+v" (xmm17));
+  d = xmm17;
+}
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7b.c b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
new file mode 100644
index 00000000000..93d1e43770c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
+
+#include "pr89229-7a.c"
+
+/* { dg-final { scan-assembler-times "vmovaps\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7c.c b/gcc/testsuite/gcc.target/i386/pr89229-7c.c
new file mode 100644
index 00000000000..e37ff2bf5bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-7c.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
+
+#include "pr89229-7a.c"
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
-- 
2.24.1

* [PATCH 08/10] i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (3 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 10/10] i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 07/10] i386: Use ix86_output_ssemov for TFmode TYPE_SSEMOV H.J. Lu
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

There is no need to set the mode attribute to XImode or V8DFmode since
ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
without AVX512VL.
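
For example, pr89229-6a.c and pr89229-6b.c below check that neither
vmovapd nor zmm registers appear; a DFmode register move between xmm16
and xmm17 is expected to use the EVEX-encoded scalar move, which needs
only AVX512F (exact operand form inferred from the MODE_DF template):

   vmovsd	%xmm16, %xmm16, %xmm17	# with or without AVX512VL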

gcc/

	PR target/89229
	* config/i386/i386.md (*movdf_internal): Call ix86_output_ssemov
	for TYPE_SSEMOV.  Remove TARGET_AVX512F, TARGET_PREFER_AVX256,
	TARGET_AVX512VL and ext_sse_reg_operand check.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-6a.c: New test.
	* gcc.target/i386/pr89229-6b.c: Likewise.
	* gcc.target/i386/pr89229-6c.c: Likewise.
---
 gcc/config/i386/i386.md                    | 44 ++--------------------
 gcc/testsuite/gcc.target/i386/pr89229-6a.c | 16 ++++++++
 gcc/testsuite/gcc.target/i386/pr89229-6b.c |  7 ++++
 gcc/testsuite/gcc.target/i386/pr89229-6c.c |  6 +++
 4 files changed, 32 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6c.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fdf0e5a8802..01892992adb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3307,37 +3307,7 @@ (define_insn "*movdf_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      switch (get_attr_mode (insn))
-	{
-	case MODE_DF:
-	  if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
-	    return "vmovsd\t{%d1, %0|%0, %d1}";
-	  return "%vmovsd\t{%1, %0|%0, %1}";
-
-	case MODE_V4SF:
-	  return "%vmovaps\t{%1, %0|%0, %1}";
-	case MODE_V8DF:
-	  return "vmovapd\t{%g1, %g0|%g0, %g1}";
-	case MODE_V2DF:
-	  return "%vmovapd\t{%1, %0|%0, %1}";
-
-	case MODE_V2SF:
-	  gcc_assert (!TARGET_AVX);
-	  return "movlps\t{%1, %0|%0, %1}";
-	case MODE_V1DF:
-	  gcc_assert (!TARGET_AVX);
-	  return "movlpd\t{%1, %0|%0, %1}";
-
-	case MODE_DI:
-	  /* Handle broken assemblers that require movd instead of movq.  */
-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
-	    return "%vmovd\t{%1, %0|%0, %1}";
-	  return "%vmovq\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+      return ix86_output_ssemov (insn, operands);
 
     default:
       gcc_unreachable ();
@@ -3391,10 +3361,7 @@ (define_insn "*movdf_internal"
 
 	       /* xorps is one byte shorter for non-AVX targets.  */
 	       (eq_attr "alternative" "12,16")
-		 (cond [(and (match_test "TARGET_AVX512F")
-			     (not (match_test "TARGET_PREFER_AVX256")))
-			  (const_string "XI")
-			(match_test "TARGET_AVX")
+		 (cond [(match_test "TARGET_AVX")
 			  (const_string "V2DF")
 			(ior (not (match_test "TARGET_SSE2"))
 			     (match_test "optimize_function_for_size_p (cfun)"))
@@ -3410,12 +3377,7 @@ (define_insn "*movdf_internal"
 
 	       /* movaps is one byte shorter for non-AVX targets.  */
 	       (eq_attr "alternative" "13,17")
-		 (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
-				  (not (match_test "TARGET_AVX512VL")))
-			     (ior (match_operand 0 "ext_sse_reg_operand")
-				  (match_operand 1 "ext_sse_reg_operand")))
-			  (const_string "V8DF")
-			(match_test "TARGET_AVX")
+		 (cond [(match_test "TARGET_AVX")
 			  (const_string "DF")
 			(ior (not (match_test "TARGET_SSE2"))
 			     (match_test "optimize_function_for_size_p (cfun)"))
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6a.c b/gcc/testsuite/gcc.target/i386/pr89229-6a.c
new file mode 100644
index 00000000000..5bc10d25619
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-6a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern double d;
+
+void
+foo1 (double x)
+{
+  register double xmm16 __asm ("xmm16") = x;
+  asm volatile ("" : "+v" (xmm16));
+  register double xmm17 __asm ("xmm17") = xmm16;
+  asm volatile ("" : "+v" (xmm17));
+  d = xmm17;
+}
+
+/* { dg-final { scan-assembler-not "vmovapd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6b.c b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
new file mode 100644
index 00000000000..b248a3726f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
+
+#include "pr89229-6a.c"
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-not "vmovapd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6c.c b/gcc/testsuite/gcc.target/i386/pr89229-6c.c
new file mode 100644
index 00000000000..7a4d254670c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-6c.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
+
+#include "pr89229-6a.c"
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
-- 
2.24.1

* [PATCH 07/10] i386: Use ix86_output_ssemov for TFmode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (4 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 08/10] i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 06/10] i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV H.J. Lu
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

gcc/

	PR target/89229
	* config/i386/i386.md (*movtf_internal): Call ix86_output_ssemov
	for TYPE_SSEMOV.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-5a.c: New test.
	* gcc.target/i386/pr89229-5b.c: Likewise.
	* gcc.target/i386/pr89229-5c.c: Likewise.
---
 gcc/config/i386/i386.md                    | 26 +---------------------
 gcc/testsuite/gcc.target/i386/pr89229-5a.c | 16 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr89229-5b.c | 12 ++++++++++
 gcc/testsuite/gcc.target/i386/pr89229-5c.c |  6 +++++
 4 files changed, 35 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5c.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 05815c5cf3b..fdf0e5a8802 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3154,31 +3154,7 @@ (define_insn "*movtf_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      /* Handle misaligned load/store since we
-         don't have movmisaligntf pattern. */
-      if (misaligned_operand (operands[0], TFmode)
-	  || misaligned_operand (operands[1], TFmode))
-	{
-	  if (get_attr_mode (insn) == MODE_V4SF)
-	    return "%vmovups\t{%1, %0|%0, %1}";
-	  else if (TARGET_AVX512VL
-		   && (EXT_REX_SSE_REG_P (operands[0])
-		       || EXT_REX_SSE_REG_P (operands[1])))
-	    return "vmovdqu64\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovdqu\t{%1, %0|%0, %1}";
-	}
-      else
-	{
-	  if (get_attr_mode (insn) == MODE_V4SF)
-	    return "%vmovaps\t{%1, %0|%0, %1}";
-	  else if (TARGET_AVX512VL
-		   && (EXT_REX_SSE_REG_P (operands[0])
-		       || EXT_REX_SSE_REG_P (operands[1])))
-	    return "vmovdqa64\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovdqa\t{%1, %0|%0, %1}";
-	}
+      return ix86_output_ssemov (insn, operands);
 
     case TYPE_MULTI:
 	return "#";
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5a.c b/gcc/testsuite/gcc.target/i386/pr89229-5a.c
new file mode 100644
index 00000000000..fcb85c366b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-5a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern __float128 d;
+
+void
+foo1 (__float128 x)
+{
+  register __float128 xmm16 __asm ("xmm16") = x;
+  asm volatile ("" : "+v" (xmm16));
+  register __float128 xmm17 __asm ("xmm17") = xmm16;
+  asm volatile ("" : "+v" (xmm17));
+  d = xmm17;
+}
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5b.c b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
new file mode 100644
index 00000000000..37eb83c783b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
+
+extern __float128 d;
+
+void
+foo1 (__float128 x)
+{
+  register __float128 xmm16 __asm ("xmm16") = x; /* { dg-error "register specified for 'xmm16'" } */
+  asm volatile ("" : "+v" (xmm16));
+  d = xmm16;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5c.c b/gcc/testsuite/gcc.target/i386/pr89229-5c.c
new file mode 100644
index 00000000000..529a520133c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-5c.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
+
+#include "pr89229-5a.c"
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
-- 
2.24.1

* [PATCH 04/10] i386: Use ix86_output_ssemov for TImode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
  2020-02-15 15:26 ` [PATCH 03/10] i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 05/10] i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV H.J. Lu
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

There is no need to set the mode attribute to XImode since
ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
without AVX512VL.
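
For example, a TImode move between xmm16 and xmm17 with AVX512VL is
expected to be (inferred from the E_TImode case in ix86_get_ssemov;
pr89229-2a.c below only checks that no zmm register appears):

   vmovdqa64	%xmm16, %xmm17		# with AVX512VL

Without AVX512VL, a TImode value cannot be placed in xmm16-xmm31 at
all, which pr89229-2b.c verifies with a dg-error check.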

gcc/

	PR target/89229
	* config/i386/i386.md (*movti_internal): Call ix86_output_ssemov
	for TYPE_SSEMOV.  Remove ext_sse_reg_operand and TARGET_AVX512VL
	check.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-2a.c: New test.
	* gcc.target/i386/pr89229-2b.c: Likewise.
	* gcc.target/i386/pr89229-2c.c: Likewise.
---
 gcc/config/i386/i386.md                    | 28 +---------------------
 gcc/testsuite/gcc.target/i386/pr89229-2a.c | 15 ++++++++++++
 gcc/testsuite/gcc.target/i386/pr89229-2b.c | 13 ++++++++++
 gcc/testsuite/gcc.target/i386/pr89229-2c.c |  6 +++++
 4 files changed, 35 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2c.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9e9b17d0913..5607d1ecddc 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1955,27 +1955,7 @@ (define_insn "*movti_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      /* TDmode values are passed as TImode on the stack.  Moving them
-	 to stack may result in unaligned memory access.  */
-      if (misaligned_operand (operands[0], TImode)
-	  || misaligned_operand (operands[1], TImode))
-	{
-	  if (get_attr_mode (insn) == MODE_V4SF)
-	    return "%vmovups\t{%1, %0|%0, %1}";
-	  else if (get_attr_mode (insn) == MODE_XI)
-	    return "vmovdqu32\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovdqu\t{%1, %0|%0, %1}";
-	}
-      else
-	{
-	  if (get_attr_mode (insn) == MODE_V4SF)
-	    return "%vmovaps\t{%1, %0|%0, %1}";
-	  else if (get_attr_mode (insn) == MODE_XI)
-	    return "vmovdqa32\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovdqa\t{%1, %0|%0, %1}";
-	}
+      return ix86_output_ssemov (insn, operands);
 
     default:
       gcc_unreachable ();
@@ -2002,12 +1982,6 @@ (define_insn "*movti_internal"
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "0,1")
 		 (const_string "DI")
-	       (ior (match_operand 0 "ext_sse_reg_operand")
-		    (match_operand 1 "ext_sse_reg_operand"))
-		 (const_string "XI")
-	       (and (eq_attr "alternative" "3")
-		    (match_test "TARGET_AVX512VL"))
-		 (const_string "XI")
 	       (match_test "TARGET_AVX")
 		 (const_string "TI")
 	       (ior (not (match_test "TARGET_SSE2"))
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2a.c b/gcc/testsuite/gcc.target/i386/pr89229-2a.c
new file mode 100644
index 00000000000..0cf78039481
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-2a.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+typedef __int128 __m128t __attribute__ ((__vector_size__ (16),
+					 __may_alias__));
+
+__m128t
+foo1 (void)
+{
+  register __int128 xmm16 __asm ("xmm16") = (__int128) -1;
+  asm volatile ("" : "+v" (xmm16));
+  return (__m128t) xmm16;
+}
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2b.c b/gcc/testsuite/gcc.target/i386/pr89229-2b.c
new file mode 100644
index 00000000000..8d5d6c41d30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-2b.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
+
+typedef __int128 __m128t __attribute__ ((__vector_size__ (16),
+					 __may_alias__));
+
+__m128t
+foo1 (void)
+{
+  register __int128 xmm16 __asm ("xmm16") = (__int128) -1; /* { dg-error "register specified for 'xmm16'" } */
+  asm volatile ("" : "+v" (xmm16));
+  return (__m128t) xmm16;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2c.c b/gcc/testsuite/gcc.target/i386/pr89229-2c.c
new file mode 100644
index 00000000000..218da46dcd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-2c.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
+
+#include "pr89229-2a.c"
+
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
-- 
2.24.1

* [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move
@ 2020-02-15 15:26 H.J. Lu
  2020-02-15 15:26 ` [PATCH 03/10] i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV H.J. Lu
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

This patch set was originally submitted in Feb 2019:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html

I broke it into 10 smaller patches for easy review.

On x86, when AVX and AVX512 are enabled, vector move instructions can
be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):

   0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
   4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2

We prefer VEX encoding over EVEX since VEX is shorter.  Also, AVX512F
alone only supports 512-bit vector moves; 128-bit and 256-bit
EVEX-encoded vector moves additionally require AVX512VL.  Mode
attributes on the x86 vector move patterns indicate the preferred
vector move encoding for the target.  For a vector register to vector
register move, we can use a 512-bit vector move instruction to move a
128-bit/256-bit vector when AVX512VL isn't available.  With AVX512F
and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector
moves if the upper 16 vector registers aren't used.  This patch adds
a function, ix86_output_ssemov, to generate vector moves:

1. If zmm registers are used, use EVEX encoding.
2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, use SSE or VEX
encoding.
3. If xmm16-xmm31/ymm16-ymm31 registers are used:
   a. With AVX512VL, generate AVX512VL vector moves.
   b. Without AVX512VL, do xmm16-xmm31/ymm16-ymm31 register to register
      moves with a zmm register move.

Tested on AVX2 and AVX512 with and without --with-arch=native.

H.J. Lu (10):
  i386: Properly encode vector registers in vector move
  i386: Use ix86_output_ssemov for XImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for TImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for TFmode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV

 gcc/config/i386/i386-protos.h                 |   2 +
 gcc/config/i386/i386.c                        | 274 ++++++++++++++++++
 gcc/config/i386/i386.md                       | 212 +-------------
 gcc/config/i386/mmx.md                        |  29 +-
 gcc/config/i386/predicates.md                 |   5 -
 gcc/config/i386/sse.md                        |  98 +------
 .../gcc.target/i386/avx512vl-vmovdqa64-1.c    |   7 +-
 gcc/testsuite/gcc.target/i386/pr89229-2a.c    |  15 +
 gcc/testsuite/gcc.target/i386/pr89229-2b.c    |  13 +
 gcc/testsuite/gcc.target/i386/pr89229-2c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-3a.c    |  17 ++
 gcc/testsuite/gcc.target/i386/pr89229-3b.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-3c.c    |   7 +
 gcc/testsuite/gcc.target/i386/pr89229-4a.c    |  17 ++
 gcc/testsuite/gcc.target/i386/pr89229-4b.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-4c.c    |   7 +
 gcc/testsuite/gcc.target/i386/pr89229-5a.c    |  16 +
 gcc/testsuite/gcc.target/i386/pr89229-5b.c    |  12 +
 gcc/testsuite/gcc.target/i386/pr89229-5c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-6a.c    |  16 +
 gcc/testsuite/gcc.target/i386/pr89229-6b.c    |   7 +
 gcc/testsuite/gcc.target/i386/pr89229-6c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-7a.c    |  16 +
 gcc/testsuite/gcc.target/i386/pr89229-7b.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-7c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89346.c       |  15 +
 26 files changed, 497 insertions(+), 330 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89346.c

-- 
2.24.1

* [PATCH 05/10] i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
  2020-02-15 15:26 ` [PATCH 03/10] i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV H.J. Lu
  2020-02-15 15:26 ` [PATCH 04/10] i386: Use ix86_output_ssemov for TImode TYPE_SSEMOV H.J. Lu
@ 2020-02-15 15:26 ` H.J. Lu
  2020-02-15 15:26 ` [PATCH 10/10] i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV H.J. Lu
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-15 15:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

There is no need to set the mode attribute to XImode since
ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
without AVX512VL.
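
For example (matching the scan-assembler patterns in pr89229-3a.c and
pr89229-3b.c below), a DImode move between xmm16 and xmm17 becomes:

   vmovdqa64	%xmm16, %xmm17		# with AVX512VL
   vmovdqa32	%zmm16, %zmm17		# without AVX512VL, via zmm move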

gcc/

	PR target/89229
	* config/i386/i386.md (*movdi_internal): Call ix86_output_ssemov
	for TYPE_SSEMOV.  Remove ext_sse_reg_operand and TARGET_AVX512VL
	check.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-3a.c: New test.
	* gcc.target/i386/pr89229-3b.c: Likewise.
	* gcc.target/i386/pr89229-3c.c: Likewise.
---
 gcc/config/i386/i386.md                    | 31 ++--------------------
 gcc/testsuite/gcc.target/i386/pr89229-3a.c | 17 ++++++++++++
 gcc/testsuite/gcc.target/i386/pr89229-3b.c |  6 +++++
 gcc/testsuite/gcc.target/i386/pr89229-3c.c |  7 +++++
 4 files changed, 32 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3c.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5607d1ecddc..03d8078e957 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2054,31 +2054,7 @@ (define_insn "*movdi_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      switch (get_attr_mode (insn))
-	{
-	case MODE_DI:
-	  /* Handle broken assemblers that require movd instead of movq.  */
-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
-	    return "%vmovd\t{%1, %0|%0, %1}";
-	  return "%vmovq\t{%1, %0|%0, %1}";
-
-	case MODE_TI:
-	  /* Handle AVX512 registers set.  */
-	  if (EXT_REX_SSE_REG_P (operands[0])
-	      || EXT_REX_SSE_REG_P (operands[1]))
-	    return "vmovdqa64\t{%1, %0|%0, %1}";
-	  return "%vmovdqa\t{%1, %0|%0, %1}";
-
-	case MODE_V2SF:
-	  gcc_assert (!TARGET_AVX);
-	  return "movlps\t{%1, %0|%0, %1}";
-	case MODE_V4SF:
-	  return "%vmovaps\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+      return ix86_output_ssemov (insn, operands);
 
     case TYPE_SSECVT:
       if (SSE_REG_P (operands[0]))
@@ -2164,10 +2140,7 @@ (define_insn "*movdi_internal"
      (cond [(eq_attr "alternative" "2")
 	      (const_string "SI")
 	    (eq_attr "alternative" "12,13")
-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
-			  (match_operand 1 "ext_sse_reg_operand"))
-		       (const_string "TI")
-		     (match_test "TARGET_AVX")
+	      (cond [(match_test "TARGET_AVX")
 		       (const_string "TI")
 		     (ior (not (match_test "TARGET_SSE2"))
 			  (match_test "optimize_function_for_size_p (cfun)"))
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3a.c b/gcc/testsuite/gcc.target/i386/pr89229-3a.c
new file mode 100644
index 00000000000..cb9b071e873
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-3a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
+
+extern long long i;
+
+long long
+foo1 (void)
+{
+  register long long xmm16 __asm ("xmm16") = i;
+  asm volatile ("" : "+v" (xmm16));
+  register long long xmm17 __asm ("xmm17") = xmm16;
+  asm volatile ("" : "+v" (xmm17));
+  return xmm17;
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa64\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3b.c b/gcc/testsuite/gcc.target/i386/pr89229-3b.c
new file mode 100644
index 00000000000..9265fc0354b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-3b.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
+
+#include "pr89229-3a.c"
+
+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3c.c b/gcc/testsuite/gcc.target/i386/pr89229-3c.c
new file mode 100644
index 00000000000..be0ca78a37e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89229-3c.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
+
+#include "pr89229-3a.c"
+
+/* { dg-final { scan-assembler-times "vmovdqa64\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
-- 
2.24.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* PING^8: [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move
  2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
                   ` (9 preceding siblings ...)
  2020-02-15 15:26 ` [PATCH 09/10] i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV H.J. Lu
@ 2020-02-24 12:55 ` H.J. Lu
  10 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-24 12:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jakub Jelinek, Jeffrey Law, Jan Hubicka, Uros Bizjak

On Sat, Feb 15, 2020 at 7:26 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> This patch set was originally submitted in Feb 2019:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html
>
> I broke it into 10 smaller patches for easy review.
>
> On x86, when AVX and AVX512 are enabled, vector move instructions can
> be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
>
>    0:   c5 f9 6f d1             vmovdqa %xmm1,%xmm2
>    4:   62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
>
> We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> indicate target preferences of vector move encoding.  For vector register
> to vector register move, we can use 512-bit vector move instructions to
> move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> if upper 16 vector registers aren't used.  This patch adds a function,
> ix86_output_ssemov, to generate vector moves:
>
> 1. If zmm registers are used, use EVEX encoding.
> 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> will be generated.
> 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
>    a. With AVX512VL, AVX512VL vector moves will be generated.
>    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
>       move will be done with zmm register move.
>
> Tested on AVX2 and AVX512 with and without --with-arch=native.
>
> H.J. Lu (10):
>   i386: Properly encode vector registers in vector move
>   i386: Use ix86_output_ssemov for XImode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for TImode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for TFmode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV
>   i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV
>
>  gcc/config/i386/i386-protos.h                 |   2 +
>  gcc/config/i386/i386.c                        | 274 ++++++++++++++++++
>  gcc/config/i386/i386.md                       | 212 +-------------
>  gcc/config/i386/mmx.md                        |  29 +-
>  gcc/config/i386/predicates.md                 |   5 -
>  gcc/config/i386/sse.md                        |  98 +------
>  .../gcc.target/i386/avx512vl-vmovdqa64-1.c    |   7 +-
>  gcc/testsuite/gcc.target/i386/pr89229-2a.c    |  15 +
>  gcc/testsuite/gcc.target/i386/pr89229-2b.c    |  13 +
>  gcc/testsuite/gcc.target/i386/pr89229-2c.c    |   6 +
>  gcc/testsuite/gcc.target/i386/pr89229-3a.c    |  17 ++
>  gcc/testsuite/gcc.target/i386/pr89229-3b.c    |   6 +
>  gcc/testsuite/gcc.target/i386/pr89229-3c.c    |   7 +
>  gcc/testsuite/gcc.target/i386/pr89229-4a.c    |  17 ++
>  gcc/testsuite/gcc.target/i386/pr89229-4b.c    |   6 +
>  gcc/testsuite/gcc.target/i386/pr89229-4c.c    |   7 +
>  gcc/testsuite/gcc.target/i386/pr89229-5a.c    |  16 +
>  gcc/testsuite/gcc.target/i386/pr89229-5b.c    |  12 +
>  gcc/testsuite/gcc.target/i386/pr89229-5c.c    |   6 +
>  gcc/testsuite/gcc.target/i386/pr89229-6a.c    |  16 +
>  gcc/testsuite/gcc.target/i386/pr89229-6b.c    |   7 +
>  gcc/testsuite/gcc.target/i386/pr89229-6c.c    |   6 +
>  gcc/testsuite/gcc.target/i386/pr89229-7a.c    |  16 +
>  gcc/testsuite/gcc.target/i386/pr89229-7b.c    |   6 +
>  gcc/testsuite/gcc.target/i386/pr89229-7c.c    |   6 +
>  gcc/testsuite/gcc.target/i386/pr89346.c       |  15 +
>  26 files changed, 497 insertions(+), 330 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89346.c
>

PING:

https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00906.html

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-15 15:26 ` [PATCH 01/10] i386: Properly encode vector registers in vector move H.J. Lu
@ 2020-02-26 22:42   ` Jeff Law
  2020-02-27  0:03     ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2020-02-26 22:42 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches; +Cc: Jakub Jelinek, Jan Hubicka, Uros Bizjak

On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> On x86, when AVX and AVX512 are enabled, vector move instructions can
> be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> 
>    0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
>    4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2
> 
> We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> indicate target preferences of vector move encoding.  For vector register
> to vector register move, we can use 512-bit vector move instructions to
> move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> if upper 16 vector registers aren't used.  This patch adds a function,
> ix86_output_ssemov, to generate vector moves:
> 
> 1. If zmm registers are used, use EVEX encoding.
> 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> will be generated.
> 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
>    a. With AVX512VL, AVX512VL vector moves will be generated.
>    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
>       move will be done with zmm register move.
> 
> 
[ ... ]

>  
> +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> +   TARGET_AVX512VL or it is a register to register move which can
> +   be done with zmm register move. */
> +
> +static const char *
> +ix86_get_ssemov (rtx *operands, unsigned size,
> +		 enum attr_mode insn_mode, machine_mode mode)
> +{
> +  char buf[128];
> +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> +		       || misaligned_operand (operands[1], mode));
> +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> +		     || EXT_REX_SSE_REG_P (operands[1]));
> +  machine_mode scalar_mode;
> +
> +  else if (SCALAR_INT_MODE_P (scalar_mode))
> +    {
> +      switch (scalar_mode)
> +	{
> +	case E_QImode:
> +	  if (size == 64)
> +	    opcode = (misaligned_p
> +		      ? (TARGET_AVX512BW
> +			 ? "vmovdqu8"
> +			 : "vmovdqu64")
> +		      : "vmovdqa64");
> +	  else if (evex_reg_p)
> +	    {
> +	      if (TARGET_AVX512VL)
> +		opcode = (misaligned_p
> +			  ? (TARGET_AVX512BW
> +			     ? "vmovdqu8"
> +			     : "vmovdqu64")
> +			  : "vmovdqa64");
> +	    }
> +	  else
> +	    opcode = (misaligned_p
> +		      ? (TARGET_AVX512BW
> +			 ? "vmovdqu8"
> +			 : "%vmovdqu")
> +		      : "%vmovdqa");
> +	  break;
> +	case E_HImode:
> +	  if (size == 64)
> +	    opcode = (misaligned_p
> +		      ? (TARGET_AVX512BW
> +			 ? "vmovdqu16"
> +			 : "vmovdqu64")
> +		      : "vmovdqa64");
> +	  else if (evex_reg_p)
> +	    {
> +	      if (TARGET_AVX512VL)
> +		opcode = (misaligned_p
> +			  ? (TARGET_AVX512BW
> +			     ? "vmovdqu16"
> +			     : "vmovdqu64")
> +			  : "vmovdqa64");
> +	    }
> +	  else
> +	    opcode = (misaligned_p
> +		      ? (TARGET_AVX512BW
> +			 ? "vmovdqu16"
> +			 : "%vmovdqu")
> +		      : "%vmovdqa");
> +	  break;
> +	case E_SImode:
> +	  if (size == 64)
> +	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> +	  else if (evex_reg_p)
> +	    {
> +	      if (TARGET_AVX512VL)
> +		opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> +	    }
> +	  else
> +	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> +	  break;
> +	case E_DImode:
> +	case E_TImode:
> +	case E_OImode:
> +	  if (size == 64)
> +	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +	  else if (evex_reg_p)
> +	    {
> +	      if (TARGET_AVX512VL)
> +		opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +	    }
> +	  else
> +	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> +	  break;
> +	case E_XImode:
> +	  opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +    }
> +  else
> +    gcc_unreachable ();
> +
> +  if (!opcode)
> +    {
> +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> +	 registers without AVX512VL by using zmm register move.  */
So the overall flow control in here is rather convoluted.  I hate the way you
don't set OPCODE above and then do it down here.  I would suggest breaking 
the !opcode bits into its own little function.  Then above in those places
where you do

if (TARGET_AVX512VL)
   opcode = <whatever>;


Instead change those to something like

if (TARGET_AVX512VL)
   opcode = <whatever>;
else
   opcode = new_function (...)

That way opcode is set on every path through the major if-else in this
function.
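
To make that concrete, here's a minimal sketch of the shape I have in
mind (ix86_get_ssemov_zmm is a name I just made up, and I'm glossing
over also forcing SIZE to 64 so the %g/zmm operand forms get used):

static const char *
ix86_get_ssemov_zmm (bool misaligned_p)
{
  /* Reg -> reg move of xmm16-xmm31/ymm16-ymm31 without AVX512VL is
     always done with a zmm register move.  */
  return misaligned_p ? "vmovdqu32" : "vmovdqa32";
}

so that e.g. the E_DImode case becomes:

	  else if (evex_reg_p)
	    opcode = (TARGET_AVX512VL
		      ? (misaligned_p ? "vmovdqu64" : "vmovdqa64")
		      : ix86_get_ssemov_zmm (misaligned_p));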

Second, when I suggested you break the patch up on a per-pattern basis, I
probably should have also said that I would start with just enough of
ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
converted.  That way the mapping from current code to new code is more obvious.
 

As it stands, breaking things into separate patches didn't really help much
because we still have all the complexity of ix86_get_ssemov and
ix86_output_ssemov in patch #1, and that's the code I'm most worried about
verifying we get right, particularly at this stage.  I literally can't take
any patch and map from the old code to the new code without having to
understand all of patch #1.



Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-26 22:42   ` Jeff Law
@ 2020-02-27  0:03     ` H.J. Lu
  2020-02-27  0:24       ` Jeff Law
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-02-27  0:03 UTC (permalink / raw)
  To: Jeffrey Law; +Cc: GCC Patches, Jakub Jelinek, Jan Hubicka, Uros Bizjak

On Wed, Feb 26, 2020 at 2:42 PM Jeff Law <law@redhat.com> wrote:
>
> On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> >
> >    0: c5 f9 6f d1             vmovdqa %xmm1,%xmm2
> >    4: 62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
> >
> > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > indicate target preferences of vector move encoding.  For vector register
> > to vector register move, we can use 512-bit vector move instructions to
> > move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> > and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> > if upper 16 vector registers aren't used.  This patch adds a function,
> > ix86_output_ssemov, to generate vector moves:
> >
> > 1. If zmm registers are used, use EVEX encoding.
> > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > will be generated.
> > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> >    a. With AVX512VL, AVX512VL vector moves will be generated.
> >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> >       move will be done with zmm register move.
> >
> >
> [ ... ]
>
> >
> > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > +   TARGET_AVX512VL or it is a register to register move which can
> > +   be done with zmm register move. */
> > +
> > +static const char *
> > +ix86_get_ssemov (rtx *operands, unsigned size,
> > +              enum attr_mode insn_mode, machine_mode mode)
> > +{
> > +  char buf[128];
> > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > +                    || misaligned_operand (operands[1], mode));
> > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > +                  || EXT_REX_SSE_REG_P (operands[1]));
> > +  machine_mode scalar_mode;
> > +
> > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > +    {
> > +      switch (scalar_mode)
> > +     {
> > +     case E_QImode:
> > +       if (size == 64)
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu8"
> > +                      : "vmovdqu64")
> > +                   : "vmovdqa64");
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = (misaligned_p
> > +                       ? (TARGET_AVX512BW
> > +                          ? "vmovdqu8"
> > +                          : "vmovdqu64")
> > +                       : "vmovdqa64");
> > +         }
> > +       else
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu8"
> > +                      : "%vmovdqu")
> > +                   : "%vmovdqa");
> > +       break;
> > +     case E_HImode:
> > +       if (size == 64)
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu16"
> > +                      : "vmovdqu64")
> > +                   : "vmovdqa64");
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = (misaligned_p
> > +                       ? (TARGET_AVX512BW
> > +                          ? "vmovdqu16"
> > +                          : "vmovdqu64")
> > +                       : "vmovdqa64");
> > +         }
> > +       else
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu16"
> > +                      : "%vmovdqu")
> > +                   : "%vmovdqa");
> > +       break;
> > +     case E_SImode:
> > +       if (size == 64)
> > +         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > +         }
> > +       else
> > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +       break;
> > +     case E_DImode:
> > +     case E_TImode:
> > +     case E_OImode:
> > +       if (size == 64)
> > +         opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +         }
> > +       else
> > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +       break;
> > +     case E_XImode:
> > +       opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +       break;
> > +     default:
> > +       gcc_unreachable ();
> > +     }
> > +    }
> > +  else
> > +    gcc_unreachable ();
> > +
> > +  if (!opcode)
> > +    {
> > +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> > +      registers without AVX512VL by using zmm register move.  */
> So the overall flow control in here is rather convoluted.  I hate the way you
> don't set OPCODE above and then do it down here.  I would suggest breaking
> the !opcode bits into its own little function.  Then above in those places
> where you do
>
> if (TARGET_AVX512VL)
>    opcode = <whatever>;
>
>
> Instead change those to something like
>
> if (TARGET_AVX512VL)
>    opcode = <whatever>;
> else
>    opcode = new_function (...)
>
> That way opcode is set on every path through the major if-else in this
> function.
>
> Second when I suggested you break the patch up on a per-pattern basis, I
> probably should have also said that I would start with the minimal support in
> ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
> converted.  That way the mapping from current code to new code is more obvious.

I will do these.  On x86, different instructions can move vector
registers.  They all do the same thing, but some are preferred over
others, depending on tuning options.
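
For example, all of these copy %xmm1 to %xmm0:

   movaps  %xmm1, %xmm0    # SSE, float domain
   movdqa  %xmm1, %xmm0    # SSE2, integer domain
   vmovaps %xmm1, %xmm0    # AVX, VEX-encoded

and which one is preferred depends on things like code size and whether
the surrounding code is in the integer or floating point domain.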

>
> As it stands the breaking into separate patches didn't really help much because
> we still have all the complexity in ix86_get_ssemov and ix86_output_ssemov in
> patch #1 and that's the code I'm most worried about verifying we get right,
> particularly at this stage.  I literally can't take any patch and map from the
> old code to the new code without having to understand all of patch #1.

The old code is very convoluted and wrong in some cases.  I am trying to
clean it up.  I will update my patches based on your feedback.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-27  0:03     ` H.J. Lu
@ 2020-02-27  0:24       ` Jeff Law
  2020-02-27 14:51         ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2020-02-27  0:24 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Jakub Jelinek, Jan Hubicka, Uros Bizjak

On Wed, 2020-02-26 at 16:02 -0800, H.J. Lu wrote:
> On Wed, Feb 26, 2020 at 2:42 PM Jeff Law <law@redhat.com> wrote:
> > On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> > > 
> > >    0: c5 f9 6f d1             vmovdqa %xmm1,%xmm2
> > >    4: 62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
> > > 
> > > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > > indicate target preferences of vector move encoding.  For vector register
> > > to vector register move, we can use 512-bit vector move instructions to
> > > move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> > > and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> > > if upper 16 vector registers aren't used.  This patch adds a function,
> > > ix86_output_ssemov, to generate vector moves:
> > > 
> > > 1. If zmm registers are used, use EVEX encoding.
> > > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > > will be generated.
> > > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > >    a. With AVX512VL, AVX512VL vector moves will be generated.
> > >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > >       move will be done with zmm register move.
> > > 
> > > 
> > [ ... ]
> > 
> > > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > > +   TARGET_AVX512VL or it is a register to register move which can
> > > +   be done with zmm register move. */
> > > +
> > > +static const char *
> > > +ix86_get_ssemov (rtx *operands, unsigned size,
> > > +              enum attr_mode insn_mode, machine_mode mode)
> > > +{
> > > +  char buf[128];
> > > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > > +                    || misaligned_operand (operands[1], mode));
> > > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > > +                  || EXT_REX_SSE_REG_P (operands[1]));
> > > +  machine_mode scalar_mode;
> > > +
> > > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > > +    {
> > > +      switch (scalar_mode)
> > > +     {
> > > +     case E_QImode:
> > > +       if (size == 64)
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu8"
> > > +                      : "vmovdqu64")
> > > +                   : "vmovdqa64");
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = (misaligned_p
> > > +                       ? (TARGET_AVX512BW
> > > +                          ? "vmovdqu8"
> > > +                          : "vmovdqu64")
> > > +                       : "vmovdqa64");
> > > +         }
> > > +       else
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu8"
> > > +                      : "%vmovdqu")
> > > +                   : "%vmovdqa");
> > > +       break;
> > > +     case E_HImode:
> > > +       if (size == 64)
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu16"
> > > +                      : "vmovdqu64")
> > > +                   : "vmovdqa64");
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = (misaligned_p
> > > +                       ? (TARGET_AVX512BW
> > > +                          ? "vmovdqu16"
> > > +                          : "vmovdqu64")
> > > +                       : "vmovdqa64");
> > > +         }
> > > +       else
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu16"
> > > +                      : "%vmovdqu")
> > > +                   : "%vmovdqa");
> > > +       break;
> > > +     case E_SImode:
> > > +       if (size == 64)
> > > +         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > +         }
> > > +       else
> > > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +       break;
> > > +     case E_DImode:
> > > +     case E_TImode:
> > > +     case E_OImode:
> > > +       if (size == 64)
> > > +         opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +         }
> > > +       else
> > > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +       break;
> > > +     case E_XImode:
> > > +       opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +       break;
> > > +     default:
> > > +       gcc_unreachable ();
> > > +     }
> > > +    }
> > > +  else
> > > +    gcc_unreachable ();
> > > +
> > > +  if (!opcode)
> > > +    {
> > > +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> > > +      registers without AVX512VL by using zmm register move.  */
> > So the overall flow control in here is rather convoluted.  I hate the way
> > you
> > don't set OPCODE above and then do it down here.  I would suggest breaking
> > the !opcode bits into its own little function.  Then above in those places
> > where you do
> > 
> > if (TARGET_AVX512VL)
> >    opcode = <whatever>;
> > 
> > 
> > Instead change those to something like
> > 
> > if (TARGET_AVX512VL)
> >    opcode = <whatever>;
> > else
> >    opcode = new_function (...)
> > 
> > That way opcode is set on every path through the major if-else in this
> > function.
> > 
> > Second when I suggested you break the patch up on a per-pattern basis, I
> > probably should have also said that I would start with the minimal support
> > in
> > ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
> > converted.  That way the mapping from current code to new code is more
> > obvious.
> 
> I will do these.   On x86,  different instructions can move vector
> registers.  They all
> do the same thing.  But some are preferred over others, depending on
> tuning options.
I know.

> 
> > As it stands the breaking into separate patches didn't really help much
> > because
> > we still have all the complexity in ix86_get_ssemov and ix86_output_ssemov
> > in
> > patch #1 and that's the code I'm most worried about verifying we get right,
> > particularly at this stage.  I literally can't take any patch and map from
> > the
> > old code to the new code without having to understand all of patch #1.
> 
> The old code is very convoluted and wrong in some cases.  I am trying to
> clean it up.  I will update my patches based on your feedback.
Thanks.  I was going to try and break those two functions down on my own, but
you're more likely to get it right than I am :-)

jeff
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-27  0:24       ` Jeff Law
@ 2020-02-27 14:51         ` H.J. Lu
  2020-02-29  2:16           ` Jeff Law
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-02-27 14:51 UTC (permalink / raw)
  To: Jeffrey Law; +Cc: GCC Patches, Jakub Jelinek, Jan Hubicka, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 7757 bytes --]

On Wed, Feb 26, 2020 at 4:24 PM Jeff Law <law@redhat.com> wrote:
>
> On Wed, 2020-02-26 at 16:02 -0800, H.J. Lu wrote:
> > On Wed, Feb 26, 2020 at 2:42 PM Jeff Law <law@redhat.com> wrote:
> > > On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > > > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > > > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> > > >
> > > >    0: c5 f9 6f d1             vmovdqa %xmm1,%xmm2
> > > >    4: 62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
> > > >
> > > > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > > > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > > > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > > > indicate target preferences of vector move encoding.  For vector register
> > > > to vector register move, we can use 512-bit vector move instructions to
> > > > move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> > > > and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> > > > if upper 16 vector registers aren't used.  This patch adds a function,
> > > > ix86_output_ssemov, to generate vector moves:
> > > >
> > > > 1. If zmm registers are used, use EVEX encoding.
> > > > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > > > will be generated.
> > > > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > > >    a. With AVX512VL, AVX512VL vector moves will be generated.
> > > >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > > >       move will be done with zmm register move.
> > > >
> > > >
> > > [ ... ]
> > >
> > > > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > > > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > > > +   TARGET_AVX512VL or it is a register to register move which can
> > > > +   be done with zmm register move. */
> > > > +
> > > > +static const char *
> > > > +ix86_get_ssemov (rtx *operands, unsigned size,
> > > > +              enum attr_mode insn_mode, machine_mode mode)
> > > > +{
> > > > +  char buf[128];
> > > > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > > > +                    || misaligned_operand (operands[1], mode));
> > > > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > > > +                  || EXT_REX_SSE_REG_P (operands[1]));
> > > > +  machine_mode scalar_mode;
> > > > +
> > > > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > > > +    {
> > > > +      switch (scalar_mode)
> > > > +     {
> > > > +     case E_QImode:
> > > > +       if (size == 64)
> > > > +         opcode = (misaligned_p
> > > > +                   ? (TARGET_AVX512BW
> > > > +                      ? "vmovdqu8"
> > > > +                      : "vmovdqu64")
> > > > +                   : "vmovdqa64");
> > > > +       else if (evex_reg_p)
> > > > +         {
> > > > +           if (TARGET_AVX512VL)
> > > > +             opcode = (misaligned_p
> > > > +                       ? (TARGET_AVX512BW
> > > > +                          ? "vmovdqu8"
> > > > +                          : "vmovdqu64")
> > > > +                       : "vmovdqa64");
> > > > +         }
> > > > +       else
> > > > +         opcode = (misaligned_p
> > > > +                   ? (TARGET_AVX512BW
> > > > +                      ? "vmovdqu8"
> > > > +                      : "%vmovdqu")
> > > > +                   : "%vmovdqa");
> > > > +       break;
> > > > +     case E_HImode:
> > > > +       if (size == 64)
> > > > +         opcode = (misaligned_p
> > > > +                   ? (TARGET_AVX512BW
> > > > +                      ? "vmovdqu16"
> > > > +                      : "vmovdqu64")
> > > > +                   : "vmovdqa64");
> > > > +       else if (evex_reg_p)
> > > > +         {
> > > > +           if (TARGET_AVX512VL)
> > > > +             opcode = (misaligned_p
> > > > +                       ? (TARGET_AVX512BW
> > > > +                          ? "vmovdqu16"
> > > > +                          : "vmovdqu64")
> > > > +                       : "vmovdqa64");
> > > > +         }
> > > > +       else
> > > > +         opcode = (misaligned_p
> > > > +                   ? (TARGET_AVX512BW
> > > > +                      ? "vmovdqu16"
> > > > +                      : "%vmovdqu")
> > > > +                   : "%vmovdqa");
> > > > +       break;
> > > > +     case E_SImode:
> > > > +       if (size == 64)
> > > > +         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > > +       else if (evex_reg_p)
> > > > +         {
> > > > +           if (TARGET_AVX512VL)
> > > > +             opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > > +         }
> > > > +       else
> > > > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > > +       break;
> > > > +     case E_DImode:
> > > > +     case E_TImode:
> > > > +     case E_OImode:
> > > > +       if (size == 64)
> > > > +         opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > > +       else if (evex_reg_p)
> > > > +         {
> > > > +           if (TARGET_AVX512VL)
> > > > +             opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > > +         }
> > > > +       else
> > > > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > > +       break;
> > > > +     case E_XImode:
> > > > +       opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > > +       break;
> > > > +     default:
> > > > +       gcc_unreachable ();
> > > > +     }
> > > > +    }
> > > > +  else
> > > > +    gcc_unreachable ();
> > > > +
> > > > +  if (!opcode)
> > > > +    {
> > > > +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> > > > +      registers without AVX512VL by using zmm register move.  */
> > > So the overall flow control in here is rather convoluted.  I hate the way
> > > you
> > > don't set OPCODE above and then do it down here.  I would suggest breaking
> > > the !opcode bits into its own little function.  Then above in those places
> > > where you do
> > >
> > > if (TARGET_AVX512VL)
> > >    opcode = <whatever>;
> > >
> > >
> > > Instead change those to something like
> > >
> > > if (TARGET_AVX512VL)
> > >    opcode = <whatever>;
> > > else
> > >    opcode = new_function (...)
> > >
> > > That way opcode is set on every path through the major if-else in this
> > > function.
> > >
> > > Second when I suggested you break the patch up on a per-pattern basis, I
> > > probably should have also said that I would start with the minimal support
> > > in
> > > ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
> > > converted.  That way the mapping from current code to new code is more
> > > obvious.
> >
> > I will do these.   On x86,  different instructions can move vector
> > registers.  They all
> > do the same thing.  But some are preferred over others, depending on
> > tuning options.
> I know.
>
> >
> > > As it stands the breaking into separate patches didn't really help much
> > > because
> > > we still have all the complexity in ix86_get_ssemov and ix86_output_ssemov
> > > in
> > > patch #1 and that's the code I'm most worried about verifying we get right,
> > > particularly at this stage.  I literally can't take any patch and map from
> > > the
> > > old code to the new code without having to understand all of patch #1.
> >
> > The old code is very convoluted and wrong in some cases.  I am trying to
> > clean it up.  I will update my patches based on your feedback.
> Thanks.  I was going to try and break those two functions down on my own, but
> you're more likely to get it right than I am :-)
>

How about this?  If it looks OK, I will post the whole patch set.

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-i386-Properly-encode-vector-registers-in-vector-move.patch --]
[-- Type: text/x-patch, Size: 14902 bytes --]

From 3964b63d5ef086fa7466992f703bc1ec6de085dc Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 12 Feb 2019 13:25:41 -0800
Subject: [PATCH 01/10] i386: Properly encode vector registers in vector move

On x86, when AVX and AVX512 are enabled, vector move instructions can
be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):

   0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
   4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2

We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
and 256-bit vector moves.  Mode attributes on x86 vector move patterns
indicate target preferences of vector move encoding.  For vector register
to vector register move, we can use 512-bit vector move instructions to
move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
if upper 16 vector registers aren't used.  This patch adds a function,
ix86_output_ssemov, to generate vector moves:

1. If zmm registers are used, use EVEX encoding.
2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
will be generated.
3. If xmm16-xmm31/ymm16-ymm31 registers are used:
   a. With AVX512VL, AVX512VL vector moves will be generated.
   b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
      move will be done with zmm register move.

Tested on AVX2 and AVX512 with and without --with-arch=native.

gcc/

	PR target/89229
	PR target/89346
	* config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
	* config/i386/i386.c (ix86_get_ssemov): New function.
	(ix86_output_ssemov): Likewise.
	* config/i386/sse.md (VMOVE:mov<mode>_internal): Call
	ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
	check.

gcc/testsuite/

	PR target/89229
	PR target/89346
	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
	* gcc.target/i386/pr89346.c: New test.
---
 gcc/config/i386/i386-protos.h                 |   2 +
 gcc/config/i386/i386.c                        | 203 ++++++++++++++++++
 gcc/config/i386/sse.md                        |  98 +--------
 .../gcc.target/i386/avx512vl-vmovdqa64-1.c    |   7 +-
 gcc/testsuite/gcc.target/i386/pr89346.c       |  15 ++
 5 files changed, 225 insertions(+), 100 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89346.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 266381ca5a6..39fcaa0ad5f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -38,6 +38,8 @@ extern void ix86_expand_split_stack_prologue (void);
 extern void ix86_output_addr_vec_elt (FILE *, int);
 extern void ix86_output_addr_diff_elt (FILE *, int, int);
 
+extern const char *ix86_output_ssemov (rtx_insn *, rtx *);
+
 extern enum calling_abi ix86_cfun_abi (void);
 extern enum calling_abi ix86_function_type_abi (const_tree);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dac7a3fc5fd..4602149e10c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4915,6 +4915,209 @@ ix86_pre_reload_split (void)
 	  && !(cfun->curr_properties & PROP_rtl_split_insns));
 }
 
+/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
+   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
+   TARGET_AVX512VL or it is a register to register move which can
+   be done with zmm register move. */
+
+static const char *
+ix86_get_ssemov (rtx *operands, unsigned size,
+		 enum attr_mode insn_mode, machine_mode mode)
+{
+  char buf[128];
+  bool misaligned_p = (misaligned_operand (operands[0], mode)
+		       || misaligned_operand (operands[1], mode));
+  bool evex_reg_p = (size == 64
+		     || EXT_REX_SSE_REG_P (operands[0])
+		     || EXT_REX_SSE_REG_P (operands[1]));
+  machine_mode scalar_mode;
+
+  const char *opcode = NULL;
+  enum
+    {
+      opcode_int,
+      opcode_float,
+      opcode_double
+    } type = opcode_int;
+
+  switch (insn_mode)
+    {
+    case MODE_V16SF:
+    case MODE_V8SF:
+    case MODE_V4SF:
+      scalar_mode = E_SFmode;
+      type = opcode_float;
+      break;
+    case MODE_V8DF:
+    case MODE_V4DF:
+    case MODE_V2DF:
+      scalar_mode = E_DFmode;
+      type = opcode_double;
+      break;
+    case MODE_XI:
+    case MODE_OI:
+    case MODE_TI:
+      scalar_mode = GET_MODE_INNER (mode);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  /* NB: To move xmm16-xmm31/ymm16-ymm31 registers without AVX512VL,
+     we can only use zmm register move without memory operand.  */
+  if (evex_reg_p
+      && !TARGET_AVX512VL
+      && GET_MODE_SIZE (mode) < 64)
+    {
+      if (memory_operand (operands[0], mode)
+	   || memory_operand (operands[1], mode))
+	gcc_unreachable ();
+      size = 64;
+      switch (type)
+	{
+	case opcode_int:
+	  opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  break;
+	case opcode_float:
+	  opcode = misaligned_p ? "vmovups" : "vmovaps";
+	  break;
+	case opcode_double:
+	  opcode = misaligned_p ? "vmovupd" : "vmovapd";
+	  break;
+	}
+    }
+  else if (SCALAR_FLOAT_MODE_P (scalar_mode))
+    {
+      switch (scalar_mode)
+	{
+	case E_SFmode:
+	  opcode = misaligned_p ? "%vmovups" : "%vmovaps";
+	  break;
+	case E_DFmode:
+	  opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
+	  break;
+	case E_TFmode:
+	  if (evex_reg_p)
+	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else
+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else if (SCALAR_INT_MODE_P (scalar_mode))
+    {
+      switch (scalar_mode)
+	{
+	case E_QImode:
+	  if (evex_reg_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu8"
+			 : "vmovdqu64")
+		      : "vmovdqa64");
+	  else
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu8"
+			 : "%vmovdqu")
+		      : "%vmovdqa");
+	  break;
+	case E_HImode:
+	  if (evex_reg_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "vmovdqu64")
+		      : "vmovdqa64");
+	  else
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "%vmovdqu")
+		      : "%vmovdqa");
+	  break;
+	case E_SImode:
+	  if (evex_reg_p)
+	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  else
+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	  break;
+	case E_DImode:
+	case E_TImode:
+	case E_OImode:
+	  if (evex_reg_p)
+	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else
+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	  break;
+	case E_XImode:
+	  opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    gcc_unreachable ();
+
+  switch (size)
+    {
+    case 64:
+      snprintf (buf, sizeof (buf), "%s\t{%%g1, %%g0|%%g0, %%g1}",
+		opcode);
+      break;
+    case 32:
+      snprintf (buf, sizeof (buf), "%s\t{%%t1, %%t0|%%t0, %%t1}",
+		opcode);
+      break;
+    case 16:
+      snprintf (buf, sizeof (buf), "%s\t{%%x1, %%x0|%%x0, %%x1}",
+		opcode);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  output_asm_insn (buf, operands);
+  return "";
+}
+
+/* Return the template of the TYPE_SSEMOV instruction to move
+   operands[1] into operands[0].  */
+
+const char *
+ix86_output_ssemov (rtx_insn *insn, rtx *operands)
+{
+  machine_mode mode = GET_MODE (operands[0]);
+  if (get_attr_type (insn) != TYPE_SSEMOV
+      || mode != GET_MODE (operands[1]))
+    gcc_unreachable ();
+
+  enum attr_mode insn_mode = get_attr_mode (insn);
+
+  switch (insn_mode)
+    {
+    case MODE_XI:
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return ix86_get_ssemov (operands, 64, insn_mode, mode);
+
+    case MODE_OI:
+    case MODE_V4DF:
+    case MODE_V8SF:
+      return ix86_get_ssemov (operands, 32, insn_mode, mode);
+
+    case MODE_TI:
+    case MODE_V2DF:
+    case MODE_V4SF:
+      return ix86_get_ssemov (operands, 16, insn_mode, mode);
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Returns true if OP contains a symbol reference */
 
 bool
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ee1f138d1af..8f5902292c6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1013,98 +1013,7 @@ (define_insn "mov<mode>_internal"
       return standard_sse_constant_opcode (insn, operands);
 
     case TYPE_SSEMOV:
-      /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
-	 in avx512f, so we need to use workarounds, to access sse registers
-	 16-31, which are evex-only. In avx512vl we don't need workarounds.  */
-      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
-	  && (EXT_REX_SSE_REG_P (operands[0])
-	      || EXT_REX_SSE_REG_P (operands[1])))
-	{
-	  if (memory_operand (operands[0], <MODE>mode))
-	    {
-	      if (<MODE_SIZE> == 32)
-		return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
-	      else if (<MODE_SIZE> == 16)
-		return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
-	      else
-		gcc_unreachable ();
-	    }
-	  else if (memory_operand (operands[1], <MODE>mode))
-	    {
-	      if (<MODE_SIZE> == 32)
-		return "vbroadcast<shuffletype>64x4\t{%1, %g0|%g0, %1}";
-	      else if (<MODE_SIZE> == 16)
-		return "vbroadcast<shuffletype>32x4\t{%1, %g0|%g0, %1}";
-	      else
-		gcc_unreachable ();
-	    }
-	  else
-	    /* Reg -> reg move is always aligned.  Just use wider move.  */
-	    switch (get_attr_mode (insn))
-	      {
-	      case MODE_V8SF:
-	      case MODE_V4SF:
-		return "vmovaps\t{%g1, %g0|%g0, %g1}";
-	      case MODE_V4DF:
-	      case MODE_V2DF:
-		return "vmovapd\t{%g1, %g0|%g0, %g1}";
-	      case MODE_OI:
-	      case MODE_TI:
-		return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
-	      default:
-		gcc_unreachable ();
-	      }
-	}
-
-      switch (get_attr_mode (insn))
-	{
-	case MODE_V16SF:
-	case MODE_V8SF:
-	case MODE_V4SF:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return "%vmovups\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovaps\t{%1, %0|%0, %1}";
-
-	case MODE_V8DF:
-	case MODE_V4DF:
-	case MODE_V2DF:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return "%vmovupd\t{%1, %0|%0, %1}";
-	  else
-	    return "%vmovapd\t{%1, %0|%0, %1}";
-
-	case MODE_OI:
-	case MODE_TI:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return TARGET_AVX512VL
-		   && (<MODE>mode == V4SImode
-		       || <MODE>mode == V2DImode
-		       || <MODE>mode == V8SImode
-		       || <MODE>mode == V4DImode
-		       || TARGET_AVX512BW)
-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
-		   : "%vmovdqu\t{%1, %0|%0, %1}";
-	  else
-	    return TARGET_AVX512VL ? "vmovdqa64\t{%1, %0|%0, %1}"
-				   : "%vmovdqa\t{%1, %0|%0, %1}";
-	case MODE_XI:
-	  if (misaligned_operand (operands[0], <MODE>mode)
-	      || misaligned_operand (operands[1], <MODE>mode))
-	    return (<MODE>mode == V16SImode
-		    || <MODE>mode == V8DImode
-		    || TARGET_AVX512BW)
-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
-		   : "vmovdqu64\t{%1, %0|%0, %1}";
-	  else
-	    return "vmovdqa64\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+      return ix86_output_ssemov (insn, operands);
 
     default:
       gcc_unreachable ();
@@ -1113,10 +1022,7 @@ (define_insn "mov<mode>_internal"
   [(set_attr "type" "sselog1,sselog1,ssemov,ssemov")
    (set_attr "prefix" "maybe_vex")
    (set (attr "mode")
-	(cond [(and (eq_attr "alternative" "1")
-		    (match_test "TARGET_AVX512VL"))
-		 (const_string "<sseinsnmode>")
-	       (match_test "TARGET_AVX")
+	(cond [(match_test "TARGET_AVX")
 		 (const_string "<sseinsnmode>")
 	       (ior (not (match_test "TARGET_SSE2"))
 		    (match_test "optimize_function_for_size_p (cfun)"))
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
index 14fe4b84544..db4d9d14875 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
@@ -4,14 +4,13 @@
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\\(\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\\(\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\\(\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\\(\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\nxy\]*\\(.{5,6}(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\nxy\]*\\((?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\nxy\]*\\(.{5,6}(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr89346.c b/gcc/testsuite/gcc.target/i386/pr89346.c
new file mode 100644
index 00000000000..cdc9accf521
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89346.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+#include <immintrin.h>
+
+long long *p;
+volatile __m256i y;
+
+void
+foo (void)
+{
+   _mm256_store_epi64 (p, y);
+}
+
+/* { dg-final { scan-assembler-not "vmovdqa64" } } */
-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-27 14:51         ` H.J. Lu
@ 2020-02-29  2:16           ` Jeff Law
  2020-02-29  5:43             ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2020-02-29  2:16 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Jakub Jelinek, Jan Hubicka, Uros Bizjak

On Thu, 2020-02-27 at 06:50 -0800, H.J. Lu wrote:
> 
> How about this?  If it looks OK, I will post the whole patch set.
It's better.  I'm guessing the two cases that were previously handled with
vextract/vbroadcast aren't supposed to happen?  They're caught here IIUC:

> +  /* NB: To move xmm16-xmm31/ymm16-ymm31 registers without AVX512VL,
> +     we can only use zmm register move without memory operand.  */
> +   if (evex_reg_p
> +       && !TARGET_AVX512VL
> +       && GET_MODE_SIZE (mode) < 64)
> +     {
> +       if (memory_operand (operands[0], mode)
> +	   || memory_operand (operands[1], mode))
> +	gcc_unreachable ();
> 

If they truly can't happen, that's fine.  My worry is that I don't see changes
to the operand predicates or constraints that would avoid this case.  Is it
prevented by the mode iterator on the operands?  Again, I just want to make
sure I understand why the vextract/vbroadcast stuff isn't in the new code.

I'm assuming a bit here, namely that the <ssescalarsize> bits in the old code
are mapped correctly to the 32/64 suffixes on the opcodes in the new version.

I'm also assuming that the mapping of the "size" argument to ix86_get_ssemov
onto the operand modifiers g, t, and x is right.  I'm guessing the operand
modifiers weren't needed in the original because we had the actual operand and
could look at it to get the right modifier.  In the evex-but-not-avx512vl case
those are forced to the g modifier, which seems to match the original.
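
(For reference: as I understand it, %x, %t and %g print the xmm, ymm and
zmm name of the same register, so in the evex-but-not-avx512vl case a
template like

   vmovdqa32\t{%g1, %g0|%g0, %g1}

renders in AT&T syntax as something like

   vmovdqa32  %zmm17, %zmm16

even though the operands themselves are xmm16/xmm17.)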

Are we going to need further refinements to ix86_output_ssemov/ix86_get_ssemov?
If so, then I'd suggest the next patch be those patterns which don't require
further refinements to ix86_output_ssemov.

If no further refinements to ix86_output_ssemov/ix86_get_ssemov are required,
then I think you can just send the rest of the pattern changes in a single
unit.

jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-29  2:16           ` Jeff Law
@ 2020-02-29  5:43             ` H.J. Lu
  2020-02-29 12:18               ` H.J. Lu
  2020-03-05 23:43               ` Jeff Law
  0 siblings, 2 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-29  5:43 UTC (permalink / raw)
  To: Jeffrey Law; +Cc: GCC Patches, Jakub Jelinek, Jan Hubicka, Uros Bizjak

On Fri, Feb 28, 2020 at 4:16 PM Jeff Law <law@redhat.com> wrote:
>
> On Thu, 2020-02-27 at 06:50 -0800, H.J. Lu wrote:
> >
> > How about this?  If it looks OK, I will post the whole patch set.
> It's better.  I'm guessing the two cases that were previously handled with
> vextract/vbroadcast aren't supposed to happen?  They're caught here IIUC:
>
> > +  /* NB: To move xmm16-xmm31/ymm16-ymm31 registers without AVX512VL,
> > +     we can only use zmm register move without memory operand.  */
> > +   if (evex_reg_p
> > +       && !TARGET_AVX512VL
> > +       && GET_MODE_SIZE (mode) < 64)
> > +     {
> > +       if (memory_operand (operands[0], mode)
> > +        || memory_operand (operands[1], mode))
> > +     gcc_unreachable ();
> >
>
> If they truly can't happen, that's fine.  My worry is I don't see changes to
> the operand predicates or constraints which would avoid this case.   Is it
> prevented by the mode iterator on the operands?  Again, just want to make sure
> I understand why the vextract/vbroadcast stuff isn't in the new code.

There are no GCC testcases showing that they are ever actually used.  That is
why I removed them and added gcc_unreachable ().

> I'm doing a little assuming that the <ssescalarsize> bits in the old code are
> mapped correctly to the 32/64 suffixes on the opcodes in the new version.
>
> I'm also assuming that mapping of "size" in the argument to ix86_get_ssemov to
> the operand modifiers g, t, and x are right.  I'm guessing the operand
> modifiers weren't needed in the original because we had the actual operand and
> could look at it to get the right modifier.  In the evex, but not avx512vl case
> those are forced to a g modifier which seems to match the original.
>
> Are we going to need further refinements to ix86_output_ssemov/ix86_get_ssemov?
> If so, then I'd suggest the next patch be those patterns which don't require
> further refinements to ix86_output_ssemov.

4 patches don't require changes in ix86_output_ssemov/ix86_get_ssemov:

https://gitlab.com/x86-gcc/gcc/-/commit/426f2464abb80b97b8533f9efa15bbe72e6aa888
https://gitlab.com/x86-gcc/gcc/-/commit/ec5b40d77f7a4424935275f1a7ccedbce83b6f54
https://gitlab.com/x86-gcc/gcc/-/commit/92fdd98234984f86b66fb5403dd828661cd7999f
https://gitlab.com/x86-gcc/gcc/-/commit/f8fa5e571caf6740b36d042d631b4ace11683cd7

I can combine them into a single patch.

The other 5 patches each contain a small change to ix86_output_ssemov:

https://gitlab.com/x86-gcc/gcc/-/commit/b1746392e1d350d689a80fb71b2c72f909c20f30
https://gitlab.com/x86-gcc/gcc/-/commit/14c3cbdbdcc36fa1edea4572b89a039726a4e2bc
https://gitlab.com/x86-gcc/gcc/-/commit/69c8c928b26242116cc261a9d2f6b1265218f1d3
https://gitlab.com/x86-gcc/gcc/-/commit/04335f582f0b281d5f357185d154087997fd7cfd
https://gitlab.com/x86-gcc/gcc/-/commit/64f6a5d6d3405331d9c02aaae0faccf449d6647a

Should I make the changes and submit them for review?

> If no further refinements to ix86_output_ssemov/ix86_get_ssemov are required,
> then I think you can just send the rest of the pattern changes in a single
> unit.
>
> jeff
>

Thanks.

-- 
H.J.


* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-29  5:43             ` H.J. Lu
@ 2020-02-29 12:18               ` H.J. Lu
  2020-03-05 23:43               ` Jeff Law
  1 sibling, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-02-29 12:18 UTC (permalink / raw)
  To: Jeffrey Law; +Cc: GCC Patches, Jakub Jelinek, Jan Hubicka, Uros Bizjak

On Fri, Feb 28, 2020 at 6:15 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Feb 28, 2020 at 4:16 PM Jeff Law <law@redhat.com> wrote:
> >
> > On Thu, 2020-02-27 at 06:50 -0800, H.J. Lu wrote:
> > >
> > > How about this?  If it looks OK, I will post the whole patch set.
> > It's better.  I'm guessing the two cases that were previously handled with
> > vextract/vbroadcast aren't supposed to happen?  They're caught here IIUC:
> >
> > > +  /* NB: To move xmm16-xmm31/ymm16-ymm31 registers without AVX512VL,
> > > +     we can only use zmm register move without memory operand.  */
> > > +   if (evex_reg_p
> > > +       && !TARGET_AVX512VL
> > > +       && GET_MODE_SIZE (mode) < 64)
> > > +     {
> > > +       if (memory_operand (operands[0], mode)
> > > +        || memory_operand (operands[1], mode))
> > > +     gcc_unreachable ();
> > >
> >
> > If they truly can't happen, that's fine.  My worry is I don't see changes to
> > the operand predicates or constraints which would avoid this case.   Is it
> > prevented by the mode iterator on the operands?  Again, just want to make sure
> > I understand why the vextract/vbroadcast stuff isn't in the new code.
>
> There are no GCC testcases to show that they are actually ever used.   That is
> why I removed them and added gcc_unreachable ().

This is covered by the testcases I added:

[hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c
#include <x86intrin.h>

extern __m128 d;

void
foo1 (__m128 x)
{
  register __m128 xmm16 __asm ("xmm16") = x;
  asm volatile ("" : "+v" (xmm16));
  d = xmm16;
}
[hjl@gnu-cfl-2 gcc]$ gcc -O2 -march=skylake-avx512  /tmp/x.c -S
[hjl@gnu-cfl-2 gcc]$ gcc -O2 -march=skylake-avx512 -mno-avx512vl  /tmp/x.c -S
/tmp/x.c: In function ‘foo1’:
/tmp/x.c:8:19: error: register specified for ‘xmm16’ isn’t suitable
for data type
    8 |   register __m128 xmm16 __asm ("xmm16") = x;
      |                   ^~~~~
[hjl@gnu-cfl-2 gcc]$
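
(For reference -- reconstructed from memory rather than pasted from the run
above, so the exact register allocation is an assumption -- the
-march=skylake-avx512 build should emit EVEX-encoded moves along these lines:

foo1:
	vmovaps	%xmm0, %xmm16
	vmovaps	%xmm16, d(%rip)
	ret

while the -mno-avx512vl build is rejected outright, as shown.)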

GCC doesn't allow xmm16-xmm31/ymm16-ymm31 without AVX512VL since
ix86_hard_regno_mode_ok has

      /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
      if (TARGET_AVX512VL
          && (mode == OImode
              || mode == TImode
              || VALID_AVX256_REG_MODE (mode)
              || VALID_AVX512VL_128_REG_MODE (mode)))
        return true;

      /* xmm16-xmm31 are only available for AVX-512.  */
      if (EXT_REX_SSE_REGNO_P (regno))
        return false;

The vextract/vbroadcast stuff is dead code: without AVX512VL the register
allocator can never put a 128-bit or 256-bit value in xmm16-xmm31 in the
first place, so a move between such a register and memory can't reach the
mov<mode>_internal patterns.

> > I'm assuming that the <ssescalarsize> bits in the old code are mapped
> > correctly to the 32/64 suffixes on the opcodes in the new version.
> >
> > I'm also assuming that the mapping of "size" in the argument to
> > ix86_get_ssemov to the operand modifiers g, t, and x is right.  I'm
> > guessing the operand modifiers weren't needed in the original because we
> > had the actual operand and could look at it to get the right modifier.
> > In the evex-but-not-avx512vl case those are forced to a g modifier, which
> > seems to match the original.
> >
> > Are we going to need further refinements to ix86_output_ssemov/ix86_get_ssemov?
> > If so, then I'd suggest the next patch be those patterns which don't require
> > further refinements to ix86_output_ssemov.
>
> 4 patches don't require changes in ix86_output_ssemov/ix86_get_ssemov:
>
> https://gitlab.com/x86-gcc/gcc/-/commit/426f2464abb80b97b8533f9efa15bbe72e6aa888
> https://gitlab.com/x86-gcc/gcc/-/commit/ec5b40d77f7a4424935275f1a7ccedbce83b6f54
> https://gitlab.com/x86-gcc/gcc/-/commit/92fdd98234984f86b66fb5403dd828661cd7999f
> https://gitlab.com/x86-gcc/gcc/-/commit/f8fa5e571caf6740b36d042d631b4ace11683cd7
>
> I can combine them into a single patch.
>
> The other 5 patches each contain a small change to ix86_output_ssemov:
>
> https://gitlab.com/x86-gcc/gcc/-/commit/b1746392e1d350d689a80fb71b2c72f909c20f30
> https://gitlab.com/x86-gcc/gcc/-/commit/14c3cbdbdcc36fa1edea4572b89a039726a4e2bc
> https://gitlab.com/x86-gcc/gcc/-/commit/69c8c928b26242116cc261a9d2f6b1265218f1d3
> https://gitlab.com/x86-gcc/gcc/-/commit/04335f582f0b281d5f357185d154087997fd7cfd
> https://gitlab.com/x86-gcc/gcc/-/commit/64f6a5d6d3405331d9c02aaae0faccf449d6647a
>
> Should I make the changes and submit them for review?

I am preparing the new patch set.

> > If no further refinements to ix86_output_ssemov/ix86_get_ssemov are required,
> > then I think you can just send the rest of the pattern changes in a single
> > unit.
> >
> > jeff
> >

-- 
H.J.


* Re: [PATCH 01/10] i386: Properly encode vector registers in vector move
  2020-02-29  5:43             ` H.J. Lu
  2020-02-29 12:18               ` H.J. Lu
@ 2020-03-05 23:43               ` Jeff Law
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Law @ 2020-03-05 23:43 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Jakub Jelinek, Jan Hubicka, Uros Bizjak

On Fri, 2020-02-28 at 18:15 -0800, H.J. Lu wrote:
> On Fri, Feb 28, 2020 at 4:16 PM Jeff Law <law@redhat.com> wrote:
> > On Thu, 2020-02-27 at 06:50 -0800, H.J. Lu wrote:
> > > How about this?  If it looks OK, I will post the whole patch set.
> > It's better.  I'm guessing the two cases that were previously handled with
> > vextract/vbroadcast aren't supposed to happen?  They're caught here IIUC:
> > 
> > > +  /* NB: To move xmm16-xmm31/ymm16-ymm31 registers without AVX512VL,
> > > +     we can only use zmm register move without memory operand.  */
> > > +   if (evex_reg_p
> > > +       && !TARGET_AVX512VL
> > > +       && GET_MODE_SIZE (mode) < 64)
> > > +     {
> > > +       if (memory_operand (operands[0], mode)
> > > +        || memory_operand (operands[1], mode))
> > > +     gcc_unreachable ();
> > > 
> > 
> > If they truly can't happen, that's fine.  My worry is I don't see changes to
> > the operand predicates or constraints which would avoid this case.   Is it
> > prevented by the mode iterator on the operands?  Again, just want to make
> > sure
> > I understand why the vextract/vbroadcast stuff isn't in the new code.
> 
> There are no GCC testcases to show that they are actually ever used.   That is
> why I removed them and added gcc_unreachable ().
Understood.   

> 
> 4 patches don't require changes in ix86_output_ssemov/ix86_get_ssemov:
> 
> https://gitlab.com/x86-gcc/gcc/-/commit/426f2464abb80b97b8533f9efa15bbe72e6aa888
> https://gitlab.com/x86-gcc/gcc/-/commit/ec5b40d77f7a4424935275f1a7ccedbce83b6f54
> https://gitlab.com/x86-gcc/gcc/-/commit/92fdd98234984f86b66fb5403dd828661cd7999f
> https://gitlab.com/x86-gcc/gcc/-/commit/f8fa5e571caf6740b36d042d631b4ace11683cd7
> 
> I can combine them into a single patch.
That sounds reasonable -- it should be trivial to review.  Then we can work
through the patches that require changes to ix86_output_ssemov.

Thanks for your patience.  I'm juggling a fair amount of stuff right now.

jeff



end of thread

Thread overview: 20+ messages
2020-02-15 15:26 [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
2020-02-15 15:26 ` [PATCH 03/10] i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 04/10] i386: Use ix86_output_ssemov for TImode TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 05/10] i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 10/10] i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 08/10] i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 07/10] i386: Use ix86_output_ssemov for TFmode TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 06/10] i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 01/10] i386: Properly encode vector registers in vector move H.J. Lu
2020-02-26 22:42   ` Jeff Law
2020-02-27  0:03     ` H.J. Lu
2020-02-27  0:24       ` Jeff Law
2020-02-27 14:51         ` H.J. Lu
2020-02-29  2:16           ` Jeff Law
2020-02-29  5:43             ` H.J. Lu
2020-02-29 12:18               ` H.J. Lu
2020-03-05 23:43               ` Jeff Law
2020-02-15 15:26 ` [PATCH 02/10] i386: Use ix86_output_ssemov for XImode TYPE_SSEMOV H.J. Lu
2020-02-15 15:26 ` [PATCH 09/10] i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV H.J. Lu
2020-02-24 12:55 ` PING^8: [PATCH 00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move H.J. Lu
