public inbox for binutils@sourceware.org
* [PATCH] x86: Add -munaligned-vector-move to assembler
@ 2021-10-21 15:44 H.J. Lu
  2021-10-21 16:11 ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: H.J. Lu @ 2021-10-21 15:44 UTC (permalink / raw)
  To: binutils

Unaligned load/store instructions on aligned memory or registers are as
fast as aligned load/store instructions on modern Intel processors.  Add
a command-line option, -munaligned-vector-move, to the x86 assembler to
encode aligned vector load/store instructions as unaligned vector
load/store instructions.
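
For example (mirroring the new testcases below), -munaligned-vector-move
assembles

	movaps %xmm1, %xmm2
	vmovaps %xmm1, %xmm2
	vmovdqa64 %xmm1, %xmm2{%k1}

as

	movups %xmm1, %xmm2
	vmovups %xmm1, %xmm2
	vmovdqu64 %xmm1, %xmm2{%k1}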

gas/

	* NEWS: Mention -munaligned-vector-move.
	* config/tc-i386.c (unaligned_vector_move): New.
	(cpu_flags_match): Skip UNALGNED_VECTOR_MOVE entries without
	-munaligned-vector-move.
	(OPTION_MUNALGNED_VECTOR_MOVE): New.
	(md_longopts): Add -munaligned-vector-move.
	(md_parse_option): Handle -munaligned-vector-move.
	(md_show_usage): Add -munaligned-vector-move.
	* doc/c-i386.texi: Document -munaligned-vector-move.
	* testsuite/gas/i386/i386.exp: Run unaligned-vector-move and
	x86-64-unaligned-vector-move.
	* testsuite/gas/i386/unaligned-vector-move.d: New file.
	* testsuite/gas/i386/unaligned-vector-move.s: Likewise.
	* testsuite/gas/i386/x86-64-unaligned-vector-move.d: Likewise.

opcodes/

	* i386-gen.c (opcode_modifiers): Add UNALGNED_VECTOR_MOVE.
	* i386-opc.h (UNALGNED_VECTOR_MOVE): New.
	(i386_opcode_modifier): Add unaligned_vector_move.
	* i386-opc.tbl: Add UNALGNED_VECTOR_MOVE entries to encode
	aligned vector move with unaligned vector move.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                      |    3 +
 gas/config/tc-i386.c                          |   16 +
 gas/doc/c-i386.texi                           |    6 +
 gas/testsuite/gas/i386/i386.exp               |    2 +
 .../gas/i386/unaligned-vector-move.d          |   22 +
 .../gas/i386/unaligned-vector-move.s          |   15 +
 .../gas/i386/x86-64-unaligned-vector-move.d   |   23 +
 opcodes/i386-gen.c                            |    1 +
 opcodes/i386-opc.h                            |    4 +
 opcodes/i386-opc.tbl                          |   10 +
 opcodes/i386-tbl.h                            | 9139 +++++++++--------
 11 files changed, 4769 insertions(+), 4472 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/unaligned-vector-move.d
 create mode 100644 gas/testsuite/gas/i386/unaligned-vector-move.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-unaligned-vector-move.d

diff --git a/gas/NEWS b/gas/NEWS
index 5de205ecd55..d07d4c15ca8 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,8 @@
 -*- text -*-
 
+* Add a command-line option, -munaligned-vector-move, for x86 target to
+  encode aligned vector move as unaligned vector move.
+
 * Add support for Cortex-R52+ for Arm.
 
 * Add support for Cortex-A510, Cortex-A710, Cortex-X2 for AArch64.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 339f9694948..f7afdf55463 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -800,6 +800,9 @@ static unsigned int no_cond_jump_promotion = 0;
 /* Encode SSE instructions with VEX prefix.  */
 static unsigned int sse2avx;
 
+/* Encode aligned vector move as unaligned vector move.  */
+static unsigned int unaligned_vector_move;
+
 /* Encode scalar AVX instructions with specific vector length.  */
 static enum
   {
@@ -1950,6 +1953,11 @@ cpu_flags_match (const insn_template *t)
   i386_cpu_flags x = t->cpu_flags;
   int match = cpu_flags_check_cpu64 (x) ? CPU_FLAGS_64BIT_MATCH : 0;
 
+  /* Encode aligned vector move as unaligned vector move if asked.  */
+  if (!unaligned_vector_move
+      && t->opcode_modifier.unaligned_vector_move)
+    return 0;
+
   x.bitfield.cpu64 = 0;
   x.bitfield.cpuno64 = 0;
 
@@ -13060,6 +13068,7 @@ const char *md_shortopts = "qnO::";
 #define OPTION_MLFENCE_AFTER_LOAD (OPTION_MD_BASE + 31)
 #define OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH (OPTION_MD_BASE + 32)
 #define OPTION_MLFENCE_BEFORE_RET (OPTION_MD_BASE + 33)
+#define OPTION_MUNALGNED_VECTOR_MOVE (OPTION_MD_BASE + 34)
 
 struct option md_longopts[] =
 {
@@ -13081,6 +13090,7 @@ struct option md_longopts[] =
   {"mindex-reg", no_argument, NULL, OPTION_MINDEX_REG},
   {"mnaked-reg", no_argument, NULL, OPTION_MNAKED_REG},
   {"msse2avx", no_argument, NULL, OPTION_MSSE2AVX},
+  {"munaligned-vector-move", no_argument, NULL, OPTION_MUNALGNED_VECTOR_MOVE},
   {"msse-check", required_argument, NULL, OPTION_MSSE_CHECK},
   {"moperand-check", required_argument, NULL, OPTION_MOPERAND_CHECK},
   {"mavxscalar", required_argument, NULL, OPTION_MAVXSCALAR},
@@ -13381,6 +13391,10 @@ md_parse_option (int c, const char *arg)
       sse2avx = 1;
       break;
 
+    case OPTION_MUNALGNED_VECTOR_MOVE:
+      unaligned_vector_move = 1;
+      break;
+
     case OPTION_MSSE_CHECK:
       if (strcasecmp (arg, "error") == 0)
 	sse_check = check_error;
@@ -13796,6 +13810,8 @@ md_show_usage (FILE *stream)
   fprintf (stream, _("\
   -msse2avx               encode SSE instructions with VEX prefix\n"));
   fprintf (stream, _("\
+  -munaligned-vector-move encode aligned vector move as unaligned vector move\n"));
+  fprintf (stream, _("\
   -msse-check=[none|error|warning] (default: warning)\n\
                           check SSE instructions\n"));
   fprintf (stream, _("\
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 99576ef2953..caf8fe0a0a7 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -316,6 +316,12 @@ Valid @var{CPU} values are identical to the processor list of
 This option specifies that the assembler should encode SSE instructions
 with VEX prefix.
 
+@cindex @samp{-munaligned-vector-move} option, i386
+@cindex @samp{-munaligned-vector-move} option, x86-64
+@item -munaligned-vector-move
+This option specifies that the assembler should encode aligned
+vector move as unaligned vector move.
+
 @cindex @samp{-msse-check=} option, i386
 @cindex @samp{-msse-check=} option, x86-64
 @item -msse-check=@var{none}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 680259b1c4e..378e32b39cb 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -272,6 +272,7 @@ if [gas_32_check] then {
     run_dump_test "evex-wig1-intel"
     run_dump_test "evex-no-scale-32"
     run_dump_test "sse2avx"
+    run_dump_test "unaligned-vector-move"
     run_list_test "inval-avx" "-al"
     run_list_test "inval-avx512f" "-al"
     run_list_test "inval-avx512vl" "-al"
@@ -948,6 +949,7 @@ if [gas_64_check] then {
     run_dump_test "x86-64-evex-wig2"
     run_dump_test "evex-no-scale-64"
     run_dump_test "x86-64-sse2avx"
+    run_dump_test "x86-64-unaligned-vector-move"
     run_list_test "x86-64-inval-avx" "-al"
     run_list_test "x86-64-inval-avx512f" "-al"
     run_list_test "x86-64-inval-avx512vl" "-al"
diff --git a/gas/testsuite/gas/i386/unaligned-vector-move.d b/gas/testsuite/gas/i386/unaligned-vector-move.d
new file mode 100644
index 00000000000..be0d96fd8b2
--- /dev/null
+++ b/gas/testsuite/gas/i386/unaligned-vector-move.d
@@ -0,0 +1,22 @@
+#as: -munaligned-vector-move
+#objdump: -dw
+#name: i386 (Encode aligned vector move as unaligned vector move)
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	0f 10 d1             	movups %xmm1,%xmm2
+ +[a-f0-9]+:	66 0f 10 d1          	movupd %xmm1,%xmm2
+ +[a-f0-9]+:	f3 0f 6f d1          	movdqu %xmm1,%xmm2
+ +[a-f0-9]+:	c5 f8 10 d1          	vmovups %xmm1,%xmm2
+ +[a-f0-9]+:	c5 f9 10 d1          	vmovupd %xmm1,%xmm2
+ +[a-f0-9]+:	c5 fa 6f d1          	vmovdqu %xmm1,%xmm2
+ +[a-f0-9]+:	c5 f8 10 d1          	vmovups %xmm1,%xmm2
+ +[a-f0-9]+:	62 f1 fd 09 10 d1    	vmovupd %xmm1,%xmm2\{%k1\}
+ +[a-f0-9]+:	62 f1 7c 09 10 d1    	vmovups %xmm1,%xmm2\{%k1\}
+ +[a-f0-9]+:	62 f1 7e 09 6f d1    	vmovdqu32 %xmm1,%xmm2\{%k1\}
+ +[a-f0-9]+:	62 f1 fe 09 6f d1    	vmovdqu64 %xmm1,%xmm2\{%k1\}
+#pass
diff --git a/gas/testsuite/gas/i386/unaligned-vector-move.s b/gas/testsuite/gas/i386/unaligned-vector-move.s
new file mode 100644
index 00000000000..b88ae232a38
--- /dev/null
+++ b/gas/testsuite/gas/i386/unaligned-vector-move.s
@@ -0,0 +1,15 @@
+# Encode aligned vector move as unaligned vector move.
+
+	.text
+_start:
+	movaps %xmm1, %xmm2
+	movapd %xmm1, %xmm2
+	movdqa %xmm1, %xmm2
+	vmovaps %xmm1, %xmm2
+	vmovapd %xmm1, %xmm2
+	vmovdqa %xmm1, %xmm2
+	vmovaps %xmm1, %xmm2
+	vmovapd %xmm1, %xmm2{%k1}
+	vmovaps %xmm1, %xmm2{%k1}
+	vmovdqa32 %xmm1, %xmm2{%k1}
+	vmovdqa64 %xmm1, %xmm2{%k1}
diff --git a/gas/testsuite/gas/i386/x86-64-unaligned-vector-move.d b/gas/testsuite/gas/i386/x86-64-unaligned-vector-move.d
new file mode 100644
index 00000000000..410d9478dad
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-unaligned-vector-move.d
@@ -0,0 +1,23 @@
+#source: unaligned-vector-move.s
+#as: -munaligned-vector-move
+#objdump: -dw
+#name: x86-64 (Encode aligned vector move as unaligned vector move)
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	0f 10 d1             	movups %xmm1,%xmm2
+ +[a-f0-9]+:	66 0f 10 d1          	movupd %xmm1,%xmm2
+ +[a-f0-9]+:	f3 0f 6f d1          	movdqu %xmm1,%xmm2
+ +[a-f0-9]+:	c5 f8 10 d1          	vmovups %xmm1,%xmm2
+ +[a-f0-9]+:	c5 f9 10 d1          	vmovupd %xmm1,%xmm2
+ +[a-f0-9]+:	c5 fa 6f d1          	vmovdqu %xmm1,%xmm2
+ +[a-f0-9]+:	c5 f8 10 d1          	vmovups %xmm1,%xmm2
+ +[a-f0-9]+:	62 f1 fd 09 10 d1    	vmovupd %xmm1,%xmm2\{%k1\}
+ +[a-f0-9]+:	62 f1 7c 09 10 d1    	vmovups %xmm1,%xmm2\{%k1\}
+ +[a-f0-9]+:	62 f1 7e 09 6f d1    	vmovdqu32 %xmm1,%xmm2\{%k1\}
+ +[a-f0-9]+:	62 f1 fe 09 6f d1    	vmovdqu64 %xmm1,%xmm2\{%k1\}
+#pass
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index f8de27ef346..42058ccbed4 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -731,6 +731,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (SIB),
   BITFIELD (SSE2AVX),
   BITFIELD (NoAVX),
+  BITFIELD (UNALGNED_VECTOR_MOVE),
   BITFIELD (EVex),
   BITFIELD (Masking),
   BITFIELD (Broadcast),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index f83af0b2bb4..30b96a911cf 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -636,6 +636,9 @@ enum
   /* No AVX equivalent */
   NoAVX,
 
+  /* Encode aligned vector move as unaligned vector move.  */
+  UNALGNED_VECTOR_MOVE,
+
   /* insn has EVEX prefix:
 	1: 512bit EVEX prefix.
 	2: 128bit EVEX prefix.
@@ -761,6 +764,7 @@ typedef struct i386_opcode_modifier
   unsigned int sib:3;
   unsigned int sse2avx:1;
   unsigned int noavx:1;
+  unsigned int unaligned_vector_move:1;
   unsigned int evex:3;
   unsigned int masking:2;
   unsigned int broadcast:3;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index ba9451a4e4e..dedaca6a7bc 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -1086,6 +1086,7 @@ maxps<sse>, 0x0f5f, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf
 maxss<sse>, 0xf30f5f, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
 minps<sse>, 0x0f5d, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 minss<sse>, 0xf30f5d, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movaps<sse>, 0x0f10, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movaps<sse>, 0x0f28, None, <sse:cpu>, D|Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movhlps<sse>, 0x0f12, None, <sse:cpu>, Modrm|<sse:attr>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
 movhps, 0x16, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
@@ -1176,6 +1177,7 @@ maxpd<sse2>, 0x660f5f, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|N
 maxsd<sse2>, 0xf20f5f, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 minpd<sse2>, 0x660f5d, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 minsd<sse2>, 0xf20f5d, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movapd<sse2>, 0x660f10, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movapd<sse2>, 0x660f28, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movhpd, 0x6616, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
 movhpd, 0x6617, None, CpuAVX, Modrm|Vex|Space0F|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex }
@@ -1219,6 +1221,7 @@ cvttsd2si, 0xf20f2c, None, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_
 cvttpd2dq<sse2>, 0x660fe6, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cvttps2dq<sse2>, 0xf30f5b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 maskmovdqu<sse2>, 0x660ff7, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
+movdqa<sse2>, 0xf30f6f, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movdqa<sse2>, 0x660f6f, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movdqu<sse2>, 0xf30f6f, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 movdq2q, 0xf20fd6, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { RegXMM, RegMMX }
@@ -1565,7 +1568,9 @@ vminpd, 0x665d, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No
 vminps, 0x5d, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vminsd, 0xf25d, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vminss, 0xf35d, None, CpuAVX, Modrm|Vex=3|Space0F|VexVVVV|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vmovapd, 0x6610, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovapd, 0x6628, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
+vmovaps, 0x10, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovaps, 0x28, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 // vmovd really shouldn't allow for 64bit operand (vmovq is the right
 // mnemonic for copying between Reg64/Mem64 and RegXMM, as is mandated
@@ -1576,6 +1581,7 @@ vmovd, 0x666e, None, CpuAVX, D|Modrm|Vex=1|Space0F|IgnoreSize|No_bSuf|No_wSuf|No
 vmovd, 0x667e, None, CpuAVX|Cpu64, D|RegMem|Vex=1|Space0F|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { RegXMM, Reg64 }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 vmovddup, 0xf212, None, CpuAVX, Modrm|Vex=2|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM }
+vmovdqa, 0xf36f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovdqa, 0x666f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovdqu, 0xf36f, None, CpuAVX, D|Modrm|Vex|Space0F|VexWIG|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM }
 vmovhlps, 0x12, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
@@ -2799,10 +2805,12 @@ vmaxss, 0xF35F, None, CpuAVX512F, Modrm|EVex=4|Masking=3|Space0F|VexVVVV=1|VexW=
 vminss, 0xF35D, None, CpuAVX512F, Modrm|EVex=4|Masking=3|Space0F|VexVVVV|VexW0|Disp8MemShift=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
 vminss, 0xF35D, None, CpuAVX512F, Modrm|EVex=4|Masking=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
 
+vmovapd, 0x6610, None, CpuAVX512F, D|Modrm|Load|MaskingMorZ|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovapd, 0x6628, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovntpd, 0x662B, None, CpuAVX512F, Modrm|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
 vmovupd, 0x6610, None, CpuAVX512F, D|Modrm|Load|MaskingMorZ|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 
+vmovaps, 0x10, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|UNALGNED_VECTOR_MOVE, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovaps, 0x28, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovntps, 0x2B, None, CpuAVX512F, Modrm|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
 vmovups, 0x10, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
@@ -2811,7 +2819,9 @@ vmovd, 0x666E, None, CpuAVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|IgnoreSi
 
 vmovddup, 0xF212, None, CpuAVX512F, Modrm|Masking=3|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM }
 
+vmovdqa64, 0xF36F, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize|UNALGNED_VECTOR_MOVE, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovdqa64, 0x666F, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vmovdqa32, 0xF36F, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize|UNALGNED_VECTOR_MOVE, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovdqa32, 0x666F, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
 vmovntdq, 0x66E7, None, CpuAVX512F, Modrm|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex }
 vmovdqu32, 0xF36F, None, CpuAVX512F, D|Modrm|MaskingMorZ|Space0F|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }


* Re: [PATCH] x86: Add -munaligned-vector-move to assembler
  2021-10-21 15:44 [PATCH] x86: Add -munaligned-vector-move to assembler H.J. Lu
@ 2021-10-21 16:11 ` Jan Beulich
  2021-10-21 16:44   ` H.J. Lu
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2021-10-21 16:11 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

On 21.10.2021 17:44, H.J. Lu wrote:
> Unaligned load/store instructions on aligned memory or registers are as
> fast as aligned load/store instructions on modern Intel processors.  Add
> a command-line option, -munaligned-vector-move, to the x86 assembler to
> encode aligned vector load/store instructions as unaligned vector
> load/store instructions.

But this doesn't clarify yet what the benefit is. For legacy-encoded ones
it might be the shorter insn encoding, but for the VEX and EVEX ones? And
if encoding size matters, how about modern CPUs' behavior for MOV{A,U}PS
vs MOV{A,U}PD and VMOVDQ{A,U}?
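
For reference, the relevant legacy encodings (with a hypothetical memory
operand in %rax) are:

	movups (%rax), %xmm0	# 0F 10 00     - 3 bytes
	movupd (%rax), %xmm0	# 66 0F 10 00  - 4 bytes
	movdqu (%rax), %xmm0	# F3 0F 6F 00  - 4 bytes

i.e. only switching to the PS forms would actually save a byte.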

> @@ -1950,6 +1953,11 @@ cpu_flags_match (const insn_template *t)
>    i386_cpu_flags x = t->cpu_flags;
>    int match = cpu_flags_check_cpu64 (x) ? CPU_FLAGS_64BIT_MATCH : 0;
>  
> +  /* Encode aligned vector move as unaligned vector move if asked.  */
> +  if (!unaligned_vector_move
> +      && t->opcode_modifier.unaligned_vector_move)
> +    return 0;

New (and effectively redundant) templates just to record this extra flag
look wasteful to me. Couldn't you arrange for this via the Optimize flag
(or some derived logic simply fiddling with the opcodes; the patterns
are sufficiently regular iirc)?
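
For reference, a sketch of that regularity: the aligned and unaligned
forms differ only in the opcode byte (0x28 vs 0x10) or in the mandatory
prefix (66 vs F3), so the conversion could presumably be derived when
emitting the insn rather than by duplicating templates:

	movaps %xmm1, %xmm2	# 0F 28 D1	-> movups: 0F 10 D1
	movapd %xmm1, %xmm2	# 66 0F 28 D1	-> movupd: 66 0F 10 D1
	movdqa %xmm1, %xmm2	# 66 0F 6F D1	-> movdqu: F3 0F 6F D1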

> @@ -13060,6 +13068,7 @@ const char *md_shortopts = "qnO::";
>  #define OPTION_MLFENCE_AFTER_LOAD (OPTION_MD_BASE + 31)
>  #define OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH (OPTION_MD_BASE + 32)
>  #define OPTION_MLFENCE_BEFORE_RET (OPTION_MD_BASE + 33)
> +#define OPTION_MUNALGNED_VECTOR_MOVE (OPTION_MD_BASE + 34)

Did you miss an I here and ...

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -731,6 +731,7 @@ static bitfield opcode_modifiers[] =
>    BITFIELD (SIB),
>    BITFIELD (SSE2AVX),
>    BITFIELD (NoAVX),
> +  BITFIELD (UNALGNED_VECTOR_MOVE),

... here and ...

> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -636,6 +636,9 @@ enum
>    /* No AVX equivalent */
>    NoAVX,
>  
> +  /* Encode aligned vector move as unaligned vector move.  */
> +  UNALGNED_VECTOR_MOVE,

... here?

Jan



* Re: [PATCH] x86: Add -munaligned-vector-move to assembler
  2021-10-21 16:11 ` Jan Beulich
@ 2021-10-21 16:44   ` H.J. Lu
  2021-10-22  6:12     ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: H.J. Lu @ 2021-10-21 16:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Thu, Oct 21, 2021 at 9:11 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 21.10.2021 17:44, H.J. Lu wrote:
> > Unaligned load/store instructions on aligned memory or registers are as
> > fast as aligned load/store instructions on modern Intel processors.  Add
> > a command-line option, -munaligned-vector-move, to the x86 assembler to
> > encode aligned vector load/store instructions as unaligned vector
> > load/store instructions.
>
> But this doesn't clarify yet what the benefit is. For legacy encoded ones

AVX ops take unaligned memory operands, except for the aligned load and
store instructions.  This makes all AVX loads/stores take unaligned memory.
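
For instance (an illustrative fragment, not from the testcases):

	vaddps	(%rax), %ymm0, %ymm1	# AVX arithmetic: any alignment is fine
	vmovaps	(%rax), %ymm0		# faults unless %rax is 32-byte aligned

With -munaligned-vector-move the second instruction is emitted as

	vmovups	(%rax), %ymm0		# accepts any alignment

so all AVX memory accesses in the output tolerate unaligned addresses.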

> it might be the shorter insn encoding, but for the VEX and EVEX ones? And
> if encoding size matters, how about modern CPUs' behavior for MOV{A,U}PS
> vs MOV{A,U}PD and VMOVDQ{A,U}?

They have no performance difference on aligned memory.  Will it help
if it is limited to AVX load/store?

> > @@ -1950,6 +1953,11 @@ cpu_flags_match (const insn_template *t)
> >    i386_cpu_flags x = t->cpu_flags;
> >    int match = cpu_flags_check_cpu64 (x) ? CPU_FLAGS_64BIT_MATCH : 0;
> >
> > +  /* Encode aligned vector move as unaligned vector move if asked.  */
> > +  if (!unaligned_vector_move
> > +      && t->opcode_modifier.unaligned_vector_move)
> > +    return 0;
>
> New (and effectively redundant) templates just to record this extra flag
> look wasteful to me. Couldn't you arrange for this via the Optimize flag
> (or some derived logic simply fiddling with the opcodes; the patterns
> are sufficiently regular iirc)?

I will see what I can do.

> > @@ -13060,6 +13068,7 @@ const char *md_shortopts = "qnO::";
> >  #define OPTION_MLFENCE_AFTER_LOAD (OPTION_MD_BASE + 31)
> >  #define OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH (OPTION_MD_BASE + 32)
> >  #define OPTION_MLFENCE_BEFORE_RET (OPTION_MD_BASE + 33)
> > +#define OPTION_MUNALGNED_VECTOR_MOVE (OPTION_MD_BASE + 34)
>
> Did you miss an I here and ...

Typos will be fixed.

> > --- a/opcodes/i386-gen.c
> > +++ b/opcodes/i386-gen.c
> > @@ -731,6 +731,7 @@ static bitfield opcode_modifiers[] =
> >    BITFIELD (SIB),
> >    BITFIELD (SSE2AVX),
> >    BITFIELD (NoAVX),
> > +  BITFIELD (UNALGNED_VECTOR_MOVE),
>
> ... here and ...
>
> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -636,6 +636,9 @@ enum
> >    /* No AVX equivalent */
> >    NoAVX,
> >
> > +  /* Encode aligned vector move as unaligned vector move.  */
> > +  UNALGNED_VECTOR_MOVE,
>
> ... here?
>
> Jan
>

Thanks.

-- 
H.J.


* Re: [PATCH] x86: Add -munaligned-vector-move to assembler
  2021-10-21 16:44   ` H.J. Lu
@ 2021-10-22  6:12     ` Jan Beulich
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2021-10-22  6:12 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 21.10.2021 18:44, H.J. Lu wrote:
> On Thu, Oct 21, 2021 at 9:11 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 21.10.2021 17:44, H.J. Lu wrote:
>>> Unaligned load/store instructions on aligned memory or registers are as
>>> fast as aligned load/store instructions on modern Intel processors.  Add
>>> a command-line option, -munaligned-vector-move, to the x86 assembler to
>>> encode aligned vector load/store instructions as unaligned vector
>>> load/store instructions.
>>
>> But this doesn't clarify yet what the benefit is. For legacy encoded ones
> 
> AVX ops take unaligned memory operands, except for the aligned load and
> store instructions.  This makes all AVX loads/stores take unaligned memory.

I have to admit that I still don't see the gains of this option, then.
All the more so as only now do I realize that the encoding length is
unaffected here, and would only make a difference ...

>> it might be the shorter insn encoding, but for the VEX and EVEX ones? And
>> if encoding size matters, how about modern CPUs' behavior for MOV{A,U}PS
>> vs MOV{A,U}PD and VMOVDQ{A,U}?
> 
> They have no performance difference on aligned memory.  Will it help
> if it is limited to AVX load/store?

... with these. I don't see any gains to be had if such a further change
were made for AVX insns, let alone _only_ for them.

Jan

