public inbox for binutils@sourceware.org
* [PATCH] x32: Generate 0x67 prefix for VSIB address without base
@ 2019-02-25 14:02 H.J. Lu
  2019-02-25 15:14 ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-25 14:02 UTC
  To: binutils

In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
register so that vector index register won't be sign-extended to 64 bits.
We can't have ADDR_PREFIX_OPCODE prefix for VSIB if there is segment
override since address will be segment override + zero-extended to 64
bits of (base + index * scale + disp).

	PR gas/24263
	* config/tc-i386.c (output_insn): In x32, add 0x67 address size
	prefix for VSIB address without base register and issue an error
	if there is segment override with ADDR_PREFIX_OPCODE prefix.
	* testsuite/gas/i386/ilp32/ilp32.exp: Run x86-64-avx-vsib-inval.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l: New file.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib.d: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib.s: Likewise.
---
 gas/config/tc-i386.c                          |  26 +++
 gas/testsuite/gas/i386/ilp32/ilp32.exp        |   1 +
 .../gas/i386/ilp32/x86-64-avx-vsib-inval.l    |   3 +
 .../gas/i386/ilp32/x86-64-avx-vsib-inval.s    |   3 +
 .../gas/i386/ilp32/x86-64-avx-vsib.d          | 203 ++++++++++++++++++
 .../gas/i386/ilp32/x86-64-avx-vsib.s          |   5 +
 6 files changed, 241 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index d31ee6abdd..cc59516221 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8194,6 +8194,32 @@ output_insn (void)
 	}
       else
 	{
+#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
+	  if (x86_elf_abi == X86_64_X32_ABI
+	      && i.tm.opcode_modifier.vecsib)
+	    {
+	      /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
+		 without base register so that vector index register
+		 won't be sign-extended to 64 bits.  */
+	      if (!i.base_reg)
+		add_prefix (ADDR_PREFIX_OPCODE);
+	      /* In x32, we can't have ADDR_PREFIX_OPCODE prefix for
+	         VSIB if there is segment override since address will
+		 be segment override + zero-extended to 64 bits of
+		 (base + index * scale + disp).  */
+	      if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])
+		{
+		   const seg_entry *seg;
+		   if (i.seg[0])
+		     seg = i.seg[0];
+		   else
+		     seg = i.seg[1];
+		   as_bad (_("can't encode segment `%s' with 32-bit VSIB"),
+			   seg->seg_name);
+		}
+	    }
+#endif
+
 	  for (j = 0, q = i.prefix; j < ARRAY_SIZE (i.prefix); j++, q++)
 	    if (*q)
 	      switch (j)
diff --git a/gas/testsuite/gas/i386/ilp32/ilp32.exp b/gas/testsuite/gas/i386/ilp32/ilp32.exp
index d3a7190ac5..600725aaba 100644
--- a/gas/testsuite/gas/i386/ilp32/ilp32.exp
+++ b/gas/testsuite/gas/i386/ilp32/ilp32.exp
@@ -38,6 +38,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_x32_check] &
     }
 
     run_list_test "reloc64" "--defsym _bad_=1"
+    run_list_test "x86-64-avx-vsib-inval"
 
     set ASFLAGS "$old_ASFLAGS"
 }
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l
new file mode 100644
index 0000000000..c1e75b6d86
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:2: Error: can't encode segment `fs' with 32-bit VSIB
+.*:3: Error: can't encode segment `gs' with 32-bit VSIB
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s
new file mode 100644
index 0000000000..2bb09e6944
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s
@@ -0,0 +1,3 @@
+	.text
+	vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps %ymm12,%gs:0xc(,%ymm15,1),%ymm11
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d
new file mode 100644
index 0000000000..b6e82ee419
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d
@@ -0,0 +1,203 @@
+#as: -I$srcdir/$subdir
+#objdump: -dw
+#name: x86-64 (ILP32) AVX gather
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	64 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,%fs:0xc\(%rax,%ymm15,1\),%ymm11
+#pass
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s
new file mode 100644
index 0000000000..c8ca6e7091
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s
@@ -0,0 +1,5 @@
+.include "../x86-64-avx-gather.s"
+
+	.text
+	.att_syntax
+	vgatherdps	%ymm12,%fs:0xc(%rax,%ymm15,1),%ymm11
-- 
2.20.1


* Re: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-25 14:02 [PATCH] x32: Generate 0x67 prefix for VSIB address without base H.J. Lu
@ 2019-02-25 15:14 ` Jan Beulich
  2019-02-25 15:56   ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2019-02-25 15:14 UTC
  To: H.J. Lu; +Cc: binutils

>>> On 25.02.19 at 15:02, <hjl.tools@gmail.com> wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -8194,6 +8194,32 @@ output_insn (void)
>  	}
>        else
>  	{
> +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> +	  if (x86_elf_abi == X86_64_X32_ABI
> +	      && i.tm.opcode_modifier.vecsib)
> +	    {
> +	      /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> +		 without base register so that vector index register
> +		 won't be sign-extended to 64 bits.  */
> +	      if (!i.base_reg)
> +		add_prefix (ADDR_PREFIX_OPCODE);

Leaving aside the question of how one would go about overriding
this behavior (after all iirc it's not forbidden to use full 64-bit
addresses / base registers in x32) I understand this part, but ...

> +	      /* In x32, we can't have ADDR_PREFIX_OPCODE prefix for
> +	         VSIB if there is segment override since address will
> +		 be segment override + zero-extended to 64 bits of
> +		 (base + index * scale + disp).  */

... I don't understand this: What is it that goes wrong here in
x32 mode? "base" is either zero or a full 64-bit value anyway, so
I'm struggling in the first place to see what (uniform) zero-extension
the comment is talking about. But I also don't understand why, if this
was needed at all, it would affect VSIB addressing only.

Also indentation looks inconsistent here (tabs vs spaces).

> +	      if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])
> +		{
> +		   const seg_entry *seg;
> +		   if (i.seg[0])
> +		     seg = i.seg[0];
> +		   else
> +		     seg = i.seg[1];
> +		   as_bad (_("can't encode segment `%s' with 32-bit VSIB"),
> +			   seg->seg_name);

Please don't emit the % prefix unconditionally, it should not be there
in no-prefix / Intel mode.

Jan



* Re: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-25 15:14 ` Jan Beulich
@ 2019-02-25 15:56   ` H.J. Lu
  2019-02-25 16:18     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-25 15:56 UTC
  To: Jan Beulich; +Cc: Binutils

[-- Attachment #1: Type: text/plain, Size: 3688 bytes --]

On Mon, Feb 25, 2019 at 7:14 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 25.02.19 at 15:02, <hjl.tools@gmail.com> wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -8194,6 +8194,32 @@ output_insn (void)
> >       }
> >        else
> >       {
> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> > +       if (x86_elf_abi == X86_64_X32_ABI
> > +           && i.tm.opcode_modifier.vecsib)
> > +         {
> > +           /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> > +              without base register so that vector index register
> > +              won't be sign-extended to 64 bits.  */
> > +           if (!i.base_reg)
> > +             add_prefix (ADDR_PREFIX_OPCODE);
>
> Leaving aside the question of how one would go about overriding
> this behavior (after all iirc it's not forbidden to use full 64-bit
> addresses / base registers in x32) I understand this part, but ...

You can use a 64-bit base register to avoid the 0x67 prefix.
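
As a rough illustration of what the prefix changes, here is a minimal C
sketch of how one gather lane's address is formed from a dword index.
lane_address and its parameters are hypothetical names made up for this
example, not gas or hardware interfaces. The model assumes the usual rule
that a 64-bit address size sign-extends the index, while a 32-bit address
size (0x67) wraps the sum at 2^32 and zero-extends it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one gather lane's address with a dword index.
   ADDR32 nonzero models the 0x67 address-size prefix.  BASE is 0 when
   the operand has no base register.  No segment override is modeled.  */
uint64_t
lane_address (int addr32, uint64_t base, uint32_t index, unsigned scale,
	      int32_t disp)
{
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);

  if (addr32)
    {
      /* 32-bit address size: the sum wraps at 2^32 and the result is
	 zero-extended to 64 bits.  */
      uint32_t ea = (uint32_t) base + index * scale + (uint32_t) disp;
      return ea;
    }

  /* 64-bit address size: index and displacement are sign-extended
     to 64 bits before being added to the base.  */
  return base
	 + (uint64_t) (int64_t) (int32_t) index * scale
	 + (uint64_t) (int64_t) disp;
}
```

Under this model, lane_address (0, 0, 0x90000000, 1, 0) gives
0xffffffff90000000, which is outside the x32 address space, while the same
lane with addr32 set gives 0x90000000. With a 64-bit base,
lane_address (0, 0x90000000, 0xfffffff0, 1, 0) gives 0x8ffffff0: the
sign-extended index acts as a signed offset, so no prefix is needed.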

> > +           /* In x32, we can't have ADDR_PREFIX_OPCODE prefix for
> > +              VSIB if there is segment override since address will
> > +              be segment override + zero-extended to 64 bits of
> > +              (base + index * scale + disp).  */
>
> ... I don't understand this: What is it that goes wrong here in
> x32 mode? "base" is either zero or a full 64-bit value anyway, so

"base" can be a 32-bit register:

vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11

> I'm struggling in the first place to see what (uniform) zero-extension
> the comment is talking about. But I also don't understand why, if this
> was needed at all, it would affect VSIB addressing only.

The segment base is added AFTER "base + index * scale + disp" has been
computed and zero-extended. So the memory address in

movl %fs:(%eax), %eax

is %fs + zero-extend (%eax), not zero-extend (%fs + %eax).
In x32, GCC avoids 32-bit base/index for TLS:

[hjl@gnu-cfl-1 tmp]$ cat x.i
extern __thread int i __attribute__((__visibility__("hidden")));

int
foo (void)
{
  return i;
}
[hjl@gnu-cfl-1 tmp]$ gcc -S -O2  x.i
[hjl@gnu-cfl-1 tmp]$ cat x.s
.file "x.i"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
movq i@gottpoff(%rip), %rax
movl %fs:(%rax), %eax
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.hidden i
.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 tmp]$ gcc -S -O2  x.i -mx32
[hjl@gnu-cfl-1 tmp]$ cat x.s
.file "x.i"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
movl %fs:0, %eax
addl i@gottpoff(%rip), %eax
movl (%eax), %eax
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.hidden i
.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 tmp]$
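
The difference between the two address computations can be sketched in C.
This is only an arithmetic model under stated assumptions: the function
names are hypothetical, and the thread pointer and TLS offset used in the
note below are made-up illustrative values, not taken from a real process.

```c
#include <assert.h>
#include <stdint.h>

/* Models "movl %fs:(%eax), %eax": the 32-bit offset is zero-extended
   first, and the segment base is added afterwards at 64 bits.  */
uint64_t
seg_then_zext (uint64_t seg_base, uint32_t offset)
{
  return seg_base + (uint64_t) offset;
}

/* Models the x32 sequence "movl %fs:0, %eax; addl off, %eax;
   movl (%eax), %eax": the addition happens in a 32-bit register and
   wraps at 2^32 before the flat access through (%eax).  */
uint64_t
add_in_register (uint32_t thread_pointer, uint32_t offset)
{
  uint32_t sum = thread_pointer + offset;
  return sum;
}
```

With a thread pointer of 0xf7000000 and a TLS offset of -16 (0xfffffff0 as
a 32-bit value), seg_then_zext yields 0x1f6fffff0, which lies above 4 GiB,
while add_in_register yields the intended 0xf6fffff0.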


> Also indentation looks inconsistent here (tabs vs spaces).

Fixed.

> > +           if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])
> > +             {
> > +                const seg_entry *seg;
> > +                if (i.seg[0])
> > +                  seg = i.seg[0];
> > +                else
> > +                  seg = i.seg[1];
> > +                as_bad (_("can't encode segment `%s' with 32-bit VSIB"),
> > +                        seg->seg_name);
>
> Please don't emit the % prefix unconditionally, it should not be there
> in no-prefix / Intel mode.

What % prefix?

[hjl@gnu-4 pr24263]$ cat y.s
.text
vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
vgatherdps %ymm12,%fs:0xc(,%ymm15,1),%ymm11
[hjl@gnu-4 pr24263]$ ./as --x32  -o y.o y.s
y.s: Assembler messages:
y.s:2: Error: can't encode segment `fs' with 32-bit VSIB
y.s:3: Error: can't encode segment `fs' with 32-bit VSIB
[hjl@gnu-4 pr24263]$
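
For readers following the cases, the check in the patch can be restated as
a small standalone C model. struct vsib_operand and check_vsib are
hypothetical names for this sketch and are not part of tc-i386.c. It also
assumes that gas has already recorded a 0x67 prefix when the base is a
32-bit register such as %eax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vsib_check { VSIB_OK, VSIB_OK_ADDR32, VSIB_SEG_ERROR };

/* Hypothetical, simplified view of a VSIB memory operand.  */
struct vsib_operand
{
  bool x32;		/* Assembling for the x32 ABI.  */
  bool has_base;	/* A base register is present.  */
  bool base_is_32bit;	/* Base is e.g. %eax (assumed to imply 0x67).  */
  const char *seg;	/* Segment override name, or NULL.  */
};

enum vsib_check
check_vsib (const struct vsib_operand *op)
{
  bool addr32 = op->has_base && op->base_is_32bit;

  /* The patch only changes behavior for x32.  */
  if (!op->x32)
    return addr32 ? VSIB_OK_ADDR32 : VSIB_OK;

  /* No base register: force 32-bit addressing.  */
  if (!op->has_base)
    addr32 = true;

  /* 0x67 together with a segment override is rejected.  */
  if (addr32 && op->seg != NULL)
    return VSIB_SEG_ERROR;

  return addr32 ? VSIB_OK_ADDR32 : VSIB_OK;
}
```

In this model both lines of y.s above map to VSIB_SEG_ERROR, whereas the
%fs:0xc(%rax,%ymm15,1) form from x86-64-avx-vsib.s maps to VSIB_OK because
its 64-bit base needs no 0x67 prefix.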

-- 
H.J.

[-- Attachment #2: 0001-x32-Generate-0x67-prefix-for-VSIB-address-without-ba.patch --]
[-- Type: text/x-patch, Size: 21531 bytes --]

From d187d424a2d62ec02876b56f67d750566f10831f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sun, 24 Feb 2019 18:38:33 -0800
Subject: [PATCH] x32: Generate 0x67 prefix for VSIB address without base

In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
register so that vector index register won't be sign-extended to 64 bits.
We can't have ADDR_PREFIX_OPCODE prefix for VSIB if there is segment
override since address will be segment override + zero-extended to 64
bits of (base + index * scale + disp).

	PR gas/24263
	* config/tc-i386.c (output_insn): In x32, add 0x67 address size
	prefix for VSIB address without base register and issue an error
	if there is segment override with ADDR_PREFIX_OPCODE prefix.
	* testsuite/gas/i386/ilp32/ilp32.exp: Run x86-64-avx-vsib-inval.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l: New file.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib.d: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-avx-vsib.s: Likewise.
---
 gas/config/tc-i386.c                          |  26 +++
 gas/testsuite/gas/i386/ilp32/ilp32.exp        |   1 +
 .../gas/i386/ilp32/x86-64-avx-vsib-inval.l    |   3 +
 .../gas/i386/ilp32/x86-64-avx-vsib-inval.s    |   3 +
 .../gas/i386/ilp32/x86-64-avx-vsib.d          | 204 ++++++++++++++++++
 .../gas/i386/ilp32/x86-64-avx-vsib.s          |   6 +
 6 files changed, 243 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index d31ee6abdd..e9f16291a0 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8194,6 +8194,32 @@ output_insn (void)
 	}
       else
 	{
+#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
+	  if (x86_elf_abi == X86_64_X32_ABI
+	      && i.tm.opcode_modifier.vecsib)
+	    {
+	      /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
+		 without base register so that vector index register
+		 won't be sign-extended to 64 bits.  */
+	      if (!i.base_reg)
+		add_prefix (ADDR_PREFIX_OPCODE);
+	      /* In x32, we can't have ADDR_PREFIX_OPCODE prefix for
+		 VSIB if there is segment override since address will
+		 be segment override + zero-extended to 64 bits of
+		 (base + index * scale + disp).  */
+	      if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])
+		{
+		  const seg_entry *seg;
+		  if (i.seg[0])
+		    seg = i.seg[0];
+		  else
+		    seg = i.seg[1];
+		  as_bad (_("can't encode segment `%s' with 32-bit VSIB"),
+			  seg->seg_name);
+		}
+	    }
+#endif
+
 	  for (j = 0, q = i.prefix; j < ARRAY_SIZE (i.prefix); j++, q++)
 	    if (*q)
 	      switch (j)
diff --git a/gas/testsuite/gas/i386/ilp32/ilp32.exp b/gas/testsuite/gas/i386/ilp32/ilp32.exp
index d3a7190ac5..600725aaba 100644
--- a/gas/testsuite/gas/i386/ilp32/ilp32.exp
+++ b/gas/testsuite/gas/i386/ilp32/ilp32.exp
@@ -38,6 +38,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_x32_check] &
     }
 
     run_list_test "reloc64" "--defsym _bad_=1"
+    run_list_test "x86-64-avx-vsib-inval"
 
     set ASFLAGS "$old_ASFLAGS"
 }
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l
new file mode 100644
index 0000000000..c1e75b6d86
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.l
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:2: Error: can't encode segment `fs' with 32-bit VSIB
+.*:3: Error: can't encode segment `gs' with 32-bit VSIB
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s
new file mode 100644
index 0000000000..2bb09e6944
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib-inval.s
@@ -0,0 +1,3 @@
+	.text
+	vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps %ymm12,%gs:0xc(,%ymm15,1),%ymm11
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d
new file mode 100644
index 0000000000..71db0ea53a
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.d
@@ -0,0 +1,204 @@
+#as: -I$srcdir/$subdir
+#objdump: -dw
+#name: x86-64 (ILP32) AVX gather
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,0xc\(%eax,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	64 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,%fs:0xc\(%rax,%ymm15,1\),%ymm11
+#pass
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s
new file mode 100644
index 0000000000..a72d0fdfe1
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-avx-vsib.s
@@ -0,0 +1,6 @@
+.include "../x86-64-avx-gather.s"
+
+	.text
+	.att_syntax
+	vgatherdps	%ymm12,0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps	%ymm12,%fs:0xc(%rax,%ymm15,1),%ymm11
-- 
2.20.1



* Re: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-25 15:56   ` H.J. Lu
@ 2019-02-25 16:18     ` Jan Beulich
  2019-02-25 16:54       ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2019-02-25 16:18 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

>>> On 25.02.19 at 16:55, <hjl.tools@gmail.com> wrote:
> On Mon, Feb 25, 2019 at 7:14 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 25.02.19 at 15:02, <hjl.tools@gmail.com> wrote:
>> > --- a/gas/config/tc-i386.c
>> > +++ b/gas/config/tc-i386.c
>> > @@ -8194,6 +8194,32 @@ output_insn (void)
>> >       }
>> >        else
>> >       {
>> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
>> > +       if (x86_elf_abi == X86_64_X32_ABI
>> > +           && i.tm.opcode_modifier.vecsib)
>> > +         {
>> > +           /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
>> > +              without base register so that vector index register
>> > +              won't be sign-extended to 64 bits.  */
>> > +           if (!i.base_reg)
>> > +             add_prefix (ADDR_PREFIX_OPCODE);
>>
>> Leaving aside the question of how one would go about overriding
>> this behavior (after all iirc it's not forbidden to use full 64-bit
>> addresses / base registers in x32) I understand this part, but ...
> 
> You can use a 64-bit base register to avoid 0x67 prefix,

I don't understand - this is about the no-base-register case. I was
referring to the 64-bit base register case merely for comparison
purposes. Don't forget that in the qword-index-element case you
now actively truncate index elements.
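
To make the truncation concrete, here is a minimal C sketch of the
per-element address for the no-base case, as described in this thread.
It is an illustration only, not gas code: the function names are
hypothetical, and the model (a dword index sign-extended under 64-bit
addressing, the whole sum wrapped to 32 bits under the 0x67 prefix) is
an assumption taken from the discussion rather than from the manuals.

```c
#include <stdint.h>

/* Hypothetical model: 64-bit address size, no base register.
   A dword index element is sign-extended before scaling.  */
uint64_t
vsib_addr64_no_base (int32_t index, uint32_t scale, int32_t disp)
{
  return (uint64_t) ((int64_t) index * (int64_t) scale + (int64_t) disp);
}

/* Hypothetical model: 0x67 prefix (32-bit address size), no base
   register.  Only the low 32 bits of the index element take part,
   and the sum wraps at 2^32 before being zero-extended.  */
uint64_t
vsib_addr32_no_base (uint64_t index_bits, uint32_t scale, int32_t disp)
{
  uint32_t ea = (uint32_t) index_bits * scale + (uint32_t) disp;
  return (uint64_t) ea;
}
```

In this model a dword index of -8 gives 0xfffffffffffffff8 without the
prefix and 0xfffffff8 with it, which is what the patch is after.  A
qword index of 0x100000010, however, is cut down to 0x10 by the second
function, which is the truncation mentioned above.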

>> > +           /* In x32, we can't have ADDR_PREFIX_OPCODE prefix for
>> > +              VSIB if there is segment override since address will
>> > +              be segment override + zero-extended to 64 bits of
>> > +              (base + index * scale + disp).  */
>>
>> ... I don't understand this: What is it that goes wrong here in
>> x32 mode? "base" is either zero or a full 64-bit value anyway, so
> 
> "base" can be a 32-bit register:
> 
> vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11

Oh, sorry, I mixed up base and segment base (because of the
mention of a segment override here).

>> I'm struggling in the first place what (uniform) zero-extension the
>> comment is talking about. But I also don't understand why, if this
>> was needed at all, it would affect VSIB addressing only.
> 
> Segment override is applied AFTER "base + index * scale + disp".
> So memory address in
> 
> movl %fs:(%eax), %eax
> 
> is %fs + zero-extend (%eax), not zero-extend (%fs + %eax).

Sure.
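
For reference, the two orderings quoted above can be sketched in C as
follows.  This is only an illustration under stated assumptions: the
function names are hypothetical, the segment base is passed in as a
plain 64-bit value, and the effective address is modelled as wrapping
at 2^32 when the 0x67 prefix is in effect.

```c
#include <stdint.h>

/* Ordering described in the thread: the 32-bit effective address is
   zero-extended first, then the segment base is added.  */
uint64_t
seg_plus_zext_ea (uint64_t seg_base, uint32_t base, uint32_t index,
		  uint32_t scale, int32_t disp)
{
  uint32_t ea = base + index * scale + (uint32_t) disp;
  return seg_base + (uint64_t) ea;
}

/* The other ordering, shown only for comparison: the segment base is
   added first and the sum is then wrapped to 32 bits.  */
uint64_t
zext_of_seg_plus_ea (uint64_t seg_base, uint32_t base, uint32_t index,
		     uint32_t scale, int32_t disp)
{
  uint32_t ea = base + index * scale + (uint32_t) disp;
  return (uint64_t) (uint32_t) (seg_base + ea);
}
```

With a segment base of 0x1000 and a 32-bit base register holding
0xfffffff0 (that is, -16 as a 32-bit value), the first function
returns 0x100000ff0 and the second returns 0xff0.  The two results
differ whenever the 32-bit part is meant as a negative offset from
the segment base.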

> In x32, GCC avoids 32-bit base/index for TLS:
>[...]

I'm sorry, but this still doesn't make me see what's wrong with
segment overrides in x32 mode, and only with VSIB addressing.
 
>> > +           if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])
>> > +             {
>> > +                const seg_entry *seg;
>> > +                if (i.seg[0])
>> > +                  seg = i.seg[0];
>> > +                else
>> > +                  seg = i.seg[1];
>> > +                as_bad (_("can't encode segment `%s' with 32-bit VSIB"),
>> > +                        seg->seg_name);
>>
>> Please don't emit the % prefix unconditionally, it should not be there
>> in no-prefix / Intel mode.
> 
> What % prefix?
> 
> [hjl@gnu-4 pr24263]$ cat y.s
> .text
> vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
> vgatherdps %ymm12,%fs:0xc(,%ymm15,1),%ymm11
> [hjl@gnu-4 pr24263]$ ./as --x32  -o y.o y.s
> y.s: Assembler messages:
> y.s:2: Error: can't encode segment `fs' with 32-bit VSIB
> y.s:3: Error: can't encode segment `fs' with 32-bit VSIB
> [hjl@gnu-4 pr24263]$

Oops, sorry - I should have asked for a % prefix to be added outside
of no-prefix mode.

Jan



* Re: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-25 16:18     ` Jan Beulich
@ 2019-02-25 16:54       ` H.J. Lu
  2019-02-25 22:55         ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-25 16:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Feb 25, 2019 at 8:18 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 25.02.19 at 16:55, <hjl.tools@gmail.com> wrote:
> > On Mon, Feb 25, 2019 at 7:14 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 25.02.19 at 15:02, <hjl.tools@gmail.com> wrote:
> >> > --- a/gas/config/tc-i386.c
> >> > +++ b/gas/config/tc-i386.c
> >> > @@ -8194,6 +8194,32 @@ output_insn (void)
> >> >       }
> >> >        else
> >> >       {
> >> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> >> > +       if (x86_elf_abi == X86_64_X32_ABI
> >> > +           && i.tm.opcode_modifier.vecsib)
> >> > +         {
> >> > +           /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> >> > +              without base register so that vector index register
> >> > +              won't be sign-extended to 64 bits.  */
> >> > +           if (!i.base_reg)
> >> > +             add_prefix (ADDR_PREFIX_OPCODE);
> >>
> >> Leaving aside the question of how one would go about overriding
> >> this behavior (after all iirc it's not forbidden to use full 64-bit
> >> addresses / base registers in x32) I understand this part, but ...
> >
> > You can use a 64-bit base register to avoid 0x67 prefix,
>
> I don't understand - this is about the no-base-register case. I was
> referring to the 64-bit base register case merely for comparison
> purposes. Don't forget that in the qword-index-element case you
> now actively truncate index elements.

Without a base register, the 0x67 prefix will be added for VSIB.

> >> > +           /* In x32, we can't have ADDR_PREFIX_OPCODE prefix for
> >> > +              VSIB if there is segment override since address will
> >> > +              be segment override + zero-extended to 64 bits of
> >> > +              (base + index * scale + disp).  */
> >>
> >> ... I don't understand this: What is it that goes wrong here in
> >> x32 mode? "base" is either zero or a full 64-bit value anyway, so
> >
> > "base" can be a 32-bit register:
> >
> > vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
>
> Oh, sorry, I mixed up base and segment base (because of the
> mention of a segment override here).

An explicit segment override is disallowed with a 32-bit address in x32.

> >> I'm struggling in the first place what (uniform) zero-extension the
> >> comment is talking about. But I also don't understand why, if this
> >> was needed at all, it would affect VSIB addressing only.
> >
> > Segment override is applied AFTER "base + index * scale + disp".
> > So memory address in
> >
> > movl %fs:(%eax), %eax
> >
> > is %fs + zero-extend (%eax), not zero-extend (%fs + %eax).
>
> Sure.
>
> > In x32, GCC avoids 32-bit base/index for TLS:
> >[...]
>
> I'm sorry, but this still doesn't make me see what's wrong with
> segment overrides in x32 mode, and only with VSIB addressing.

I will add a check for the other cases.

> >> > +           if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])
> >> > +             {
> >> > +                const seg_entry *seg;
> >> > +                if (i.seg[0])
> >> > +                  seg = i.seg[0];
> >> > +                else
> >> > +                  seg = i.seg[1];
> >> > +                as_bad (_("can't encode segment `%s' with 32-bit VSIB"),
> >> > +                        seg->seg_name);
> >>
> >> Please don't emit the % prefix unconditionally, it should not be there
> >> in no-prefix / Intel mode.
> >
> > What % prefix?
> >
> > [hjl@gnu-4 pr24263]$ cat y.s
> > .text
> > vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
> > vgatherdps %ymm12,%fs:0xc(,%ymm15,1),%ymm11
> > [hjl@gnu-4 pr24263]$ ./as --x32  -o y.o y.s
> > y.s: Assembler messages:
> > y.s:2: Error: can't encode segment `fs' with 32-bit VSIB
> > y.s:3: Error: can't encode segment `fs' with 32-bit VSIB
> > [hjl@gnu-4 pr24263]$
>
> Oops, sorry - I should have asked for a % prefix to be added outside
> of no-prefix mode.

Sure.


-- 
H.J.


* Re: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-25 16:54       ` H.J. Lu
@ 2019-02-25 22:55         ` H.J. Lu
  2019-02-26  4:35           ` V2: " H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-25 22:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

[-- Attachment #1: Type: text/plain, Size: 4213 bytes --]

Here is the updated patch.  Tested with glibc, GCC, binutils and
CPU 2000.

-- 
H.J.

[-- Attachment #2: 0001-x32-Generate-0x67-prefix-for-VSIB-address-without-ba.patch --]
[-- Type: text/x-patch, Size: 22192 bytes --]

From 18abbd9770fb021195cd3950e4a1104de5158d1b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sun, 24 Feb 2019 18:38:33 -0800
Subject: [PATCH] x32: Generate 0x67 prefix for VSIB address without base

In x32, add the ADDR_PREFIX_OPCODE (0x67) prefix for a VSIB address
without a base register so that the vector index register is
zero-extended to 64 bits.  The ADDR_PREFIX_OPCODE prefix can't be
combined with a segment override, since the final address would be the
segment base plus (base + index * scale + disp) zero-extended to 64 bits.

	PR gas/24263
	* config/tc-i386.c (output_insn): In x32, add 0x67 address size
	prefix for VSIB address without base register and issue an error
	if there is segment override with ADDR_PREFIX_OPCODE prefix.
	* testsuite/gas/i386/ilp32/ilp32.exp: Run x86-64-seg-inval.
	* testsuite/gas/i386/ilp32/x86-64-seg-inval.l: New file.
	* testsuite/gas/i386/ilp32/x86-64-seg-inval.s: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg.d: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg.s: Likewise.
---
 gas/config/tc-i386.c                          |  24 ++
 gas/testsuite/gas/i386/ilp32/ilp32.exp        |   1 +
 .../gas/i386/ilp32/x86-64-seg-inval.l         |   7 +
 .../gas/i386/ilp32/x86-64-seg-inval.s         |   8 +
 gas/testsuite/gas/i386/ilp32/x86-64-seg.d     | 207 ++++++++++++++++++
 gas/testsuite/gas/i386/ilp32/x86-64-seg.s     |   9 +
 6 files changed, 256 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg.d
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index d31ee6abdd..7f53441a38 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8141,6 +8141,30 @@ output_insn (void)
 	  i.prefix[LOCK_PREFIX] = 0;
 	}
 
+#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
+      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
+	{
+	  /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
+	     without base register so that vector index register will
+	     be zero-extended to 64 bits.  */
+	  if (!i.base_reg && i.tm.opcode_modifier.vecsib)
+	    add_prefix (ADDR_PREFIX_OPCODE);
+	  /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
+	     segment override since final address will be segment
+	     base + zero-extended (base + index * scale + disp).  */
+	  if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])
+	    {
+	      const seg_entry *seg;
+	      if (i.seg[0])
+		seg = i.seg[0];
+	      else
+		seg = i.seg[1];
+	      as_bad (_("can't encode segment `%s%s' with 32-bit address"),
+		      register_prefix, seg->seg_name);
+	    }
+	}
+#endif
+
       /* Since the VEX/EVEX prefix contains the implicit prefix, we
 	 don't need the explicit prefix.  */
       if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
diff --git a/gas/testsuite/gas/i386/ilp32/ilp32.exp b/gas/testsuite/gas/i386/ilp32/ilp32.exp
index d3a7190ac5..eba03ae891 100644
--- a/gas/testsuite/gas/i386/ilp32/ilp32.exp
+++ b/gas/testsuite/gas/i386/ilp32/ilp32.exp
@@ -38,6 +38,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_x32_check] &
     }
 
     run_list_test "reloc64" "--defsym _bad_=1"
+    run_list_test "x86-64-seg-inval"
 
     set ASFLAGS "$old_ASFLAGS"
 }
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
new file mode 100644
index 0000000000..be11a408df
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
@@ -0,0 +1,7 @@
+.*: Assembler messages:
+.*:3: Error: can't encode segment `%fs' with 32-bit address
+.*:4: Error: can't encode segment `%gs' with 32-bit address
+.*:5: Error: can't encode segment `%fs' with 32-bit address
+.*:6: Error: can't encode segment `%fs' with 32-bit address
+.*:7: Error: can't encode segment `%gs' with 32-bit address
+.*:8: Error: can't encode segment `%gs' with 32-bit address
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
new file mode 100644
index 0000000000..c5097e7cab
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
@@ -0,0 +1,8 @@
+	.text
+	.allow_index_reg
+	vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps %ymm12,%gs:0xc(,%ymm15,1),%ymm11
+	movl	%fs:(%eax), %eax
+	movl	%fs:(,%eax,1), %eax
+	movl	%gs:(,%eiz,1), %eax
+	movl	%gs:(%eip), %eax
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg.d b/gas/testsuite/gas/i386/ilp32/x86-64-seg.d
new file mode 100644
index 0000000000..86e5526676
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg.d
@@ -0,0 +1,207 @@
+#as: -I$srcdir/$subdir
+#objdump: -dw
+#name: x86-64 (ILP32) segment
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	64 8b 04 25 00 00 00 00 	mov    %fs:0x0,%eax
+ +[a-f0-9]+:	65 8b 05 00 00 00 00 	mov    %gs:0x0\(%rip\),%eax        # [a-f0-9]+ <_start\+0x[a-f0-9]+>
+ +[a-f0-9]+:	65 8b 00             	mov    %gs:\(%rax\),%eax
+ +[a-f0-9]+:	67 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,0xc\(%eax,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	64 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,%fs:0xc\(%rax,%ymm15,1\),%ymm11
+#pass
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg.s b/gas/testsuite/gas/i386/ilp32/x86-64-seg.s
new file mode 100644
index 0000000000..7ad33e498c
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg.s
@@ -0,0 +1,9 @@
+.include "../x86-64-avx-gather.s"
+
+	.text
+	.att_syntax
+	movl	%fs:0, %eax
+	movl	%gs:(%rip), %eax
+	movl	%gs:(%rax), %eax
+	vgatherdps	%ymm12,0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps	%ymm12,%fs:0xc(%rax,%ymm15,1),%ymm11
-- 
2.20.1



* V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-25 22:55         ` H.J. Lu
@ 2019-02-26  4:35           ` H.J. Lu
  2019-02-26 11:41             ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-26  4:35 UTC (permalink / raw)
  To: Jan Beulich, Binutils

On Mon, Feb 25, 2019 at 02:54:28PM -0800, H.J. Lu wrote:
> Here is the updated patch.  Tested with glibc, GCC, binutils and SPEC
> CPU 2000.
> 

This version of the patch changes the error into a warning by default,
since GCC currently generates such code.


H.J.
---
In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
register so that vector index register will be zero-extended to 64 bits.
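
To illustrate the paragraph above, here is a minimal C sketch of how one
32-bit index element of a base-less VSIB operand is turned into an
address, with and without the 0x67 prefix.  The helper names vsib_addr64
and vsib_addr32 are hypothetical and are not part of gas; this only
models the address arithmetic as I understand it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not gas code.  64-bit addressing (no 0x67 prefix):
   the 32-bit index element is sign-extended to 64 bits before scaling.  */
static uint64_t
vsib_addr64 (int32_t index, unsigned scale, int32_t disp)
{
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return (uint64_t) ((int64_t) index * scale + disp);
}

/* Hypothetical model, not gas code.  32-bit addressing (0x67 prefix):
   the sum wraps at 32 bits and is then zero-extended to 64 bits.  */
static uint64_t
vsib_addr32 (int32_t index, unsigned scale, int32_t disp)
{
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  uint32_t ea = (uint32_t) index * scale + (uint32_t) disp;
  return ea;
}
```

With an index element of 0x80000000, a valid x32 pointer value, the
first helper yields an address above 4GB while the second stays within
the 32-bit address space.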

An ADDR_PREFIX_OPCODE prefix (32-bit address) can't be combined with a
segment override, since the final address will be the segment base plus
(base + index * scale + disp) zero-extended to 64 bits.  But GCC, see

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502

generates

	movl	$24, %edx
	movl	%fs:(%edx), %ecx

instead of

	movl	%fs:24, %ecx

So by default only a warning is issued:

Warning: segment `%fs' override with 32-bit address

-moperand-check=error turns the warning into an error and
-moperand-check=none suppresses it:

Error: can't encode segment `%fs' with 32-bit address
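
The segment-override restriction described above can be sketched in C as
follows.  This is only an illustration under stated assumptions: the
helper names linear_addr32 and linear_addr64 are hypothetical, and the
segment base 0xf7000000 and the offset -8 in the usage note are made-up
values, not taken from the bug report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not gas code.  With the 0x67 prefix the effective
   address is computed in 32 bits, zero-extended, and only then added to
   the 64-bit segment base.  */
static uint64_t
linear_addr32 (uint64_t seg_base, uint32_t base, uint32_t index,
	       unsigned scale, int32_t disp)
{
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  uint32_t ea = base + index * scale + (uint32_t) disp;
  return seg_base + ea;
}

/* Hypothetical model, not gas code.  Without the 0x67 prefix the whole
   sum is computed in 64 bits, so a sign-extended negative offset in a
   64-bit register moves the address below the segment base.  */
static uint64_t
linear_addr64 (uint64_t seg_base, uint64_t base, uint64_t index,
	       unsigned scale, int32_t disp)
{
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return seg_base + base + index * scale + (uint64_t) (int64_t) disp;
}
```

For example, with a segment base of 0xf7000000 and an offset of -8,
%fs:(%eax) with %eax = 0xfffffff8 resolves to 0x1f6fffff8, above 4GB,
while %fs:(%rax) with %rax = -8 resolves to 0xf6fffff8.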

	PR gas/24263
	* config/tc-i386.c (output_insn): In x32, add 0x67 address size
	prefix for VSIB address without base register.  Issue a warning
	or an error if there is segment override with ADDR_PREFIX_OPCODE
	prefix.
	* testsuite/gas/i386/ilp32/ilp32.exp: Run x86-64-seg-inval.
	* testsuite/gas/i386/ilp32/x86-64-seg-inval.l: New file.
	* testsuite/gas/i386/ilp32/x86-64-seg-inval.s: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg-warn.d: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg-warn.e: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg.d: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg.s: Likewise.
---
 gas/config/tc-i386.c                          |  30 +++
 gas/testsuite/gas/i386/ilp32/ilp32.exp        |   1 +
 .../gas/i386/ilp32/x86-64-seg-inval.l         |   7 +
 .../gas/i386/ilp32/x86-64-seg-inval.s         |   9 +
 .../gas/i386/ilp32/x86-64-seg-warn.d          |  17 ++
 .../gas/i386/ilp32/x86-64-seg-warn.e          |   7 +
 gas/testsuite/gas/i386/ilp32/x86-64-seg.d     | 207 ++++++++++++++++++
 gas/testsuite/gas/i386/ilp32/x86-64-seg.s     |   9 +
 8 files changed, 287 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg.d
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index d31ee6abdd..df7c152cc4 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8141,6 +8141,36 @@ output_insn (void)
 	  i.prefix[LOCK_PREFIX] = 0;
 	}
 
+#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
+      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
+	{
+	  /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
+	     without base register so that vector index register will
+	     be zero-extended to 64 bits.  */
+	  if (!i.base_reg && i.tm.opcode_modifier.vecsib)
+	    add_prefix (ADDR_PREFIX_OPCODE);
+	  /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
+	     segment override since final address will be segment
+	     base + zero-extended (base + index * scale + disp).  */
+	  if (operand_check != check_none
+	      && i.prefix[ADDR_PREFIX]
+	      && i.prefix[SEG_PREFIX])
+	    {
+	      const seg_entry *seg;
+	      if (i.seg[0])
+		seg = i.seg[0];
+	      else
+		seg = i.seg[1];
+	      if (operand_check == check_error)
+		as_bad (_("can't encode segment `%s%s' with 32-bit address"),
+			register_prefix, seg->seg_name);
+	      else
+		as_warn (_("segment `%s%s' override with 32-bit address"),
+			 register_prefix, seg->seg_name);
+	    }
+	}
+#endif
+
       /* Since the VEX/EVEX prefix contains the implicit prefix, we
 	 don't need the explicit prefix.  */
       if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
diff --git a/gas/testsuite/gas/i386/ilp32/ilp32.exp b/gas/testsuite/gas/i386/ilp32/ilp32.exp
index d3a7190ac5..fe1e9ea5df 100644
--- a/gas/testsuite/gas/i386/ilp32/ilp32.exp
+++ b/gas/testsuite/gas/i386/ilp32/ilp32.exp
@@ -38,6 +38,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_x32_check] &
     }
 
     run_list_test "reloc64" "--defsym _bad_=1"
+    run_list_test "x86-64-seg-inval" "-moperand-check=error"
 
     set ASFLAGS "$old_ASFLAGS"
 }
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
new file mode 100644
index 0000000000..7ec3f4d14b
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
@@ -0,0 +1,7 @@
+.*: Assembler messages:
+.*:4: Error: can't encode segment `%fs' with 32-bit address
+.*:5: Error: can't encode segment `%gs' with 32-bit address
+.*:6: Error: can't encode segment `%fs' with 32-bit address
+.*:7: Error: can't encode segment `%fs' with 32-bit address
+.*:8: Error: can't encode segment `%gs' with 32-bit address
+.*:9: Error: can't encode segment `%gs' with 32-bit address
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
new file mode 100644
index 0000000000..8117c68ec2
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
@@ -0,0 +1,9 @@
+	.text
+	.allow_index_reg
+_start:
+	vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps %ymm12,%gs:0xc(,%ymm15,1),%ymm11
+	movl	%fs:(%eax), %eax
+	movl	%fs:(,%eax,1), %eax
+	movl	%gs:(,%eiz,1), %eax
+	movl	%gs:(%eip), %eax
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d
new file mode 100644
index 0000000000..7c317c2d6b
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d
@@ -0,0 +1,17 @@
+#source: x86-64-seg-inval.s
+#warning_output: x86-64-seg-warn.e
+#objdump: -dw
+#name: x86-64 (ILP32) segment (warning)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	64 67 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,%fs:0xc\(%eax,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	65 67 c4 22 1d 92 1c 3d 0c 00 00 00 	vgatherdps %ymm12,%gs:0xc\(,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	64 67 8b 00          	mov    %fs:\(%eax\),%eax
+ +[a-f0-9]+:	64 67 8b 04 05 00 00 00 00 	mov    %fs:0x0\(,%eax,1\),%eax
+ +[a-f0-9]+:	65 67 8b 04 25 00 00 00 00 	mov    %gs:0x0\(,%eiz,1\),%eax
+ +[a-f0-9]+:	65 67 8b 05 00 00 00 00 	mov    %gs:0x0\(%eip\),%eax        # [a-f0-9]+ <_start\+0x[a-f0-9]+>
+#pass
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e
new file mode 100644
index 0000000000..f5a030f220
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e
@@ -0,0 +1,7 @@
+.*: Assembler messages:
+.*:4: Warning: segment `%fs' override with 32-bit address
+.*:5: Warning: segment `%gs' override with 32-bit address
+.*:6: Warning: segment `%fs' override with 32-bit address
+.*:7: Warning: segment `%fs' override with 32-bit address
+.*:8: Warning: segment `%gs' override with 32-bit address
+.*:9: Warning: segment `%gs' override with 32-bit address
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg.d b/gas/testsuite/gas/i386/ilp32/x86-64-seg.d
new file mode 100644
index 0000000000..86e5526676
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg.d
@@ -0,0 +1,207 @@
+#as: -I$srcdir/$subdir
+#objdump: -dw
+#name: x86-64 (ILP32) segment
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	64 8b 04 25 00 00 00 00 	mov    %fs:0x0,%eax
+ +[a-f0-9]+:	65 8b 05 00 00 00 00 	mov    %gs:0x0\(%rip\),%eax        # [a-f0-9]+ <_start\+0x[a-f0-9]+>
+ +[a-f0-9]+:	65 8b 00             	mov    %gs:\(%rax\),%eax
+ +[a-f0-9]+:	67 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,0xc\(%eax,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	64 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,%fs:0xc\(%rax,%ymm15,1\),%ymm11
+#pass
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg.s b/gas/testsuite/gas/i386/ilp32/x86-64-seg.s
new file mode 100644
index 0000000000..7ad33e498c
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg.s
@@ -0,0 +1,9 @@
+.include "../x86-64-avx-gather.s"
+
+	.text
+	.att_syntax
+	movl	%fs:0, %eax
+	movl	%gs:(%rip), %eax
+	movl	%gs:(%rax), %eax
+	vgatherdps	%ymm12,0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps	%ymm12,%fs:0xc(%rax,%ymm15,1),%ymm11
-- 
2.20.1

* Re: V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-26  4:35           ` V2: " H.J. Lu
@ 2019-02-26 11:41             ` Jan Beulich
  2019-02-26 13:24               ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2019-02-26 11:41 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

>>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
> In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> register so that vector index register will be zero-extended to 64 bits.
> 
> We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> segment override since address will be segment base + zero-extended to 64
> bits of (base + index * scale + disp).  But GCC:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502 

Neither above nor in the bug you explain what's wrong with the
segment override plus address size override in x32 mode. Since you
keep using the same wording with just slight alterations, it must be
something very obvious to you, but entirely un-obvious to me. Is
this related to the desire of using both negative and positive
offsets into TLS, where (obviously I would say) there's not going
to be any wrapping at the 4Gb boundary? If so, I'd say the TLS
usage model is broken, but it's not the assembler that should
prevent use of otherwise valid constructs. Whether full 64-bit
addresses (and hence full non-zero %fs/%gs bases with no
wrapping at the 4Gb boundary) is intended is the programmer's
choice, not something the assembler should enforce unconditionally.
Optionally emitting a warning is acceptable, but then this shouldn't
be tied to any other, more generically applicable warnings.

In any event, if this is to stay, then at least the code comment
needs to be quite a bit more clear - "we can't have" is not enough
without explicitly saying why that is.

> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -8141,6 +8141,36 @@ output_insn (void)
>  	  i.prefix[LOCK_PREFIX] = 0;
>  	}
>  
> +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
> +	{
> +	  /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> +	     without base register so that vector index register will
> +	     be zero-extended to 64 bits.  */
> +	  if (!i.base_reg && i.tm.opcode_modifier.vecsib)
> +	    add_prefix (ADDR_PREFIX_OPCODE);

Just to re-state: There needs to be a way to override this behavior.
And this is already leaving aside that making this the default from
now on has a fair risk of breaking currently working code. (Note
that this is not to say that I can't see that the change will also
help currently broken code.)

> +	  /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
> +	     segment override since final address will be segment
> +	     base + zero-extended (base + index * scale + disp).  */
> +	  if (operand_check != check_none
> +	      && i.prefix[ADDR_PREFIX]
> +	      && i.prefix[SEG_PREFIX])
> +	    {
> +	      const seg_entry *seg;
> +	      if (i.seg[0])
> +		seg = i.seg[0];
> +	      else
> +		seg = i.seg[1];
> +	      if (operand_check == check_error)
> +		as_bad (_("can't encode segment `%s%s' with 32-bit address"),
> +			register_prefix, seg->seg_name);
> +	      else
> +		as_warn (_("segment `%s%s' override with 32-bit address"),
> +			 register_prefix, seg->seg_name);
> +	    }
> +	}
> +#endif
> +
>        /* Since the VEX/EVEX prefix contains the implicit prefix, we
>  	 don't need the explicit prefix.  */
>        if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)



* Re: V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-26 11:41             ` Jan Beulich
@ 2019-02-26 13:24               ` H.J. Lu
  2019-02-26 14:46                 ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-26 13:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> > register so that vector index register will be zero-extended to 64 bits.
> >
> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> > segment override since address will be segment base + zero-extended to 64
> > bits of (base + index * scale + disp).  But GCC:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502
>
> Neither above nor in the bug you explain what's wrong with the
> segment override plus address size override in x32 mode. Since you

X32 relies on 0x67 prefix to zero-extend address to 64 bits:

zero-extended (base + index * scale + disp)

With segment override, we got

segment base + zero-extended (base + index * scale + disp)

instead of

zero-extended (segment base + base + index * scale + disp)

When base + index * scale + disp is negative, we get the wrong
address.

VSIB address in vgatherdps is

base + sign-extend(index) * scale + disp

With segment override, we got

segment base + zero-extended (base + sign-extend(index) * scale + disp)
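To make the difference concrete, here is a minimal C sketch of the two
formulas above.  The function names and the example values are made up
for illustration; they are not taken from gas or from the psABI.

```c
#include <stdint.h>

/* With a 0x67 prefix plus a segment override, the offset is truncated
   to 32 bits and zero-extended first; the segment base is added
   afterwards, in 64 bits.  */
uint64_t
addr_with_prefix (uint64_t seg_base, int64_t offset)
{
  return seg_base + (uint32_t) offset;
}

/* What x32 code would want instead: the whole sum, segment base
   included, wrapped to 32 bits.  */
uint64_t
addr_wanted (uint64_t seg_base, int64_t offset)
{
  return (uint32_t) (seg_base + (uint64_t) offset);
}
```

With a hypothetical segment base of 0xf7fa0000 and an offset of -8,
addr_with_prefix gives 0x1f7f9fff8, which is above 4G, while
addr_wanted gives 0xf7f9fff8.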

175.vpr in SPEC CPU 2000:

VPR FPGA Placement and Routing Program Version 4.00-spec
Source completed August 19, 1997.


General Options:
The circuit will be placed but not routed.

Placer Options:
User annealing schedule selected with:
Initial Temperature: 5
Exit (Final) Temperature: 0.005
Temperature Reduction factor (alpha_t): 0.9412
Number of moves in the inner loop is (num_blocks)^4/3 * 2
Placement cost type is linear congestion.
Placement will be performed once.
Placement channel width factor = 100.
Exponent used in placement cost: 1
Initial random seed: 1

Reading the FPGA architectural description from arch.in.
Successfully read arch.in.
Pins per clb: 6.  Pads per row/column: 2.
Subblocks per clb: 1.  Subblock LUT size: 4.
Fc value is fraction of tracks in a channel.
Fc_output: 1.  Fc_input: 1.  Fc_pad: 1.
Switch block type: Subset.
Distinct types of segments: 3.
Distinct types of user-specified switches: 3.

Reading the circuit netlist from net.in.
Warning:  logic block #368 (n_n13961) has only 1 pin.
Pin is an output -- may be a constant generator.  Non-fatal, but check this.
Successfully read net.in.
8527 blocks, 8445 nets, 1 global nets.
8383 clbs, 62 inputs, 82 outputs.
The circuit will be mapped into a 92 x 92 array of clbs.


Program received signal SIGSEGV, Segmentation fault.
0x004158fd in try_place.isra ()
(gdb) disass 0x004158fd,+32
Dump of assembler code from 0x4158fd to 0x41591d:
=> 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12
   0x00415907 <try_place.isra.5+7527>: vandps %ymm12,%ymm7,%ymm0
   0x0041590c <try_place.isra.5+7532>: vpslld $0x2,%ymm1,%ymm10
   0x00415911 <try_place.isra.5+7537>: vmovdqa 0x1cbe7(%rip),%ymm13        # 0x432500
   0x00415919 <try_place.isra.5+7545>: inc    %eax
   0x0041591b <try_place.isra.5+7547>: vpaddd %ymm5,%ymm10,%ymm14
End of assembler dump.
(gdb) p/x $ymm15
$1 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
    0x8000000000000000, 0x8000000000000000, 0x8000000000000000,
    0x8000000000000000}, v32_int8 = {0x10, 0x30, 0xfa, 0xf7, 0x24, 0x30, 0xfa,
    0xf7, 0x38, 0x30, 0xfa, 0xf7, 0x4c, 0x30, 0xfa, 0xf7, 0x60, 0x30, 0xfa,
    0xf7, 0x74, 0x30, 0xfa, 0xf7, 0x88, 0x30, 0xfa, 0xf7, 0x9c, 0x30, 0xfa,
    0xf7}, v16_int16 = {0x3010, 0xf7fa, 0x3024, 0xf7fa, 0x3038, 0xf7fa,
    0x304c, 0xf7fa, 0x3060, 0xf7fa, 0x3074, 0xf7fa, 0x3088, 0xf7fa, 0x309c,
    0xf7fa}, v8_int32 = {0xf7fa3010, 0xf7fa3024, 0xf7fa3038, 0xf7fa304c,
    0xf7fa3060, 0xf7fa3074, 0xf7fa3088, 0xf7fa309c}, v4_int64 = {
    0xf7fa3024f7fa3010, 0xf7fa304cf7fa3038, 0xf7fa3074f7fa3060,
    0xf7fa309cf7fa3088}, v2_int128 = {0xf7fa304cf7fa3038f7fa3024f7fa3010,
    0xf7fa309cf7fa3088f7fa3074f7fa3060}}
(gdb)

Here the indexes are 0xf7fa3010, ....  Before my fix, they were sign-extended
to 0xfffffffff7fa3010, which is an invalid address in x32.
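The same arithmetic as a small C sketch, using the first index and the
0xc displacement from the dump.  The function names are hypothetical,
and the model only covers a dword index without base register; the
cast to int32_t relies on the usual two's complement wraparound.

```c
#include <stdint.h>

/* Default 64-bit addressing: the dword index is sign-extended to
   64 bits before scaling.  */
uint64_t
vsib_addr64 (uint32_t index, unsigned scale, int32_t disp)
{
  return (uint64_t) ((int64_t) (int32_t) index * scale + disp);
}

/* With the 0x67 prefix: the effective address is computed in 32 bits
   and then zero-extended to 64 bits.  */
uint64_t
vsib_addr32 (uint32_t index, unsigned scale, int32_t disp)
{
  return (uint32_t) (index * scale + (uint32_t) disp);
}
```

For index 0xf7fa3010, scale 1 and displacement 0xc, vsib_addr64 yields
0xfffffffff7fa301c and vsib_addr32 yields 0xf7fa301c.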

> keep using the same wording with just slight alterations, it must be
> something very obvious to you, but entirely un-obvious to me. Is
> this related to the desire of using both negative and positive
> offsets into TLS, where (obviously I would say) there's not going
> to be any wrapping at the 4Gb boundary? If so, I'd say the TLS

It won't wrap for x32.

> usage model is broken, but it's not the assembler that should

No, it is not.  Please read "ILP32 Programming Model" in x86-64 psABI.

> prevent use of otherwise valid constructs. Whether full 64-bit

Assembly is correct for 64-bit mode.  Since it doesn't work for
x32 when offset is negative, we should at least give a warning.

> addresses (and hence full non-zero %fs/%gs bases with no
> wrapping at the 4Gb boundary) is intended is the programmer's
> choice, not something the assembler should enforce unconditionally.
> Optionally emitting a warning is acceptable, but then this shouldn't
> be tied to any other, more generically applicable warnings.

Binutils, including linker, does a few things special for x32 to deal
with address limitation.  This is just one of them.

> In any event, if this is to stay, then at least the code comment
> needs to be quite a bit more clear - "we can't have" is not enough
> without explicitly saying why that is.
>
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -8141,6 +8141,36 @@ output_insn (void)
> >         i.prefix[LOCK_PREFIX] = 0;
> >       }
> >
> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> > +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
> > +     {
> > +       /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> > +          without base register so that vector index register will
> > +          be zero-extended to 64 bits.  */
> > +       if (!i.base_reg && i.tm.opcode_modifier.vecsib)
> > +         add_prefix (ADDR_PREFIX_OPCODE);
>
> Just to re-state: There needs to be a way to override this behavior.
> And this is already leaving aside that making this the default from
> now on has a fair risk of breaking currently working code. (Note
> that this is not to say that I can't see that the change will also
> help currently broken code.)

Please see above.  If VSIB index is below 2G, my fix doesn't change
anything.  If VSIB index is above 2G, the program crashes before my fix.

> > +       /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
> > +          segment override since final address will be segment
> > +          base + zero-extended (base + index * scale + disp).  */
> > +       if (operand_check != check_none
> > +           && i.prefix[ADDR_PREFIX]
> > +           && i.prefix[SEG_PREFIX])
> > +         {
> > +           const seg_entry *seg;
> > +           if (i.seg[0])
> > +             seg = i.seg[0];
> > +           else
> > +             seg = i.seg[1];
> > +           if (operand_check == check_error)
> > +             as_bad (_("can't encode segment `%s%s' with 32-bit address"),

How about just

segment `%s%s' override with 32-bit address

> > +                     register_prefix, seg->seg_name);
> > +           else
> > +             as_warn (_("segment `%s%s' override with 32-bit address"),
> > +                      register_prefix, seg->seg_name);
> > +         }
> > +     }
> > +#endif
> > +
> >        /* Since the VEX/EVEX prefix contains the implicit prefix, we
> >        don't need the explicit prefix.  */
> >        if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
>
>
>


-- 
H.J.

* Re: V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-26 13:24               ` H.J. Lu
@ 2019-02-26 14:46                 ` Jan Beulich
  2019-02-26 16:08                   ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2019-02-26 14:46 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

>>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:
> On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
>> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
>> > register so that vector index register will be zero-extended to 64 bits.
>> >
>> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
>> > segment override since address will be segment base + zero-extended to 64
>> > bits of (base + index * scale + disp).  But GCC:
>> >
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502 
>>
>> Neither above nor in the bug you explain what's wrong with the
>> segment override plus address size override in x32 mode. Since you
> 
> X32 relies on 0x67 prefix to zero-extend address to 64 bits:
> 
> zero-extended (base + index * scale + disp)
> 
> With segment override, we got
> 
> segment base + zero-extended (base + index * scale + disp)
> 
> instead of
> 
> zero-extended (segment base + base + index * scale + disp)
> 
> When base + index * scale + disp is negative, we get the wrong
> address.
> 
> VSIB address in vgatherdps is
> 
> base + sign-extend(index) * scale + disp
> 
> With segment override, we got
> 
> segment base + zero-extended (base + sign-extend(index) * scale + disp)

Right. But whether that's what the programmer wanted we don't
know. Also please consider the qword index forms as well, plus
the dword index forms with scaling factor 2, 4, or 8 (allowing for
effective indexes up to 35 bits wide).

All of this would be acceptable if address space was limited to 4Gb
for x32, but that's not the case according to my reading of the
chapter in the psABI.
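As a rough illustration of the scaled dword index case, consider the
following C sketch.  The function names and the example value are
invented, and it models only the index * scale term.

```c
#include <stdint.h>

/* 64-bit addressing: sign-extended dword index times scale, kept in
   full width.  */
uint64_t
scaled_index64 (int32_t index, unsigned scale)
{
  return (uint64_t) ((int64_t) index * scale);
}

/* 32-bit addressing (0x67 prefix): the product wraps at 32 bits.  */
uint64_t
scaled_index32 (int32_t index, unsigned scale)
{
  return (uint32_t) ((uint32_t) index * scale);
}
```

With index 0x40000000 and scale 8, scaled_index64 gives 0x200000000,
whereas scaled_index32 wraps to 0.  Code relying on the former would
change behavior once the prefix gets added implicitly.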

> 175.vpr in SPEC CPU 2000:
>[...]
> Program received signal SIGSEGV, Segmentation fault.
> 0x004158fd in try_place.isra ()
> (gdb) disass 0x004158fd,+32
> Dump of assembler code from 0x4158fd to 0x41591d:
> => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12

Okay, this is the special case of the index register actually holding
addresses. What about the case where the displacement is the base
address, and the index register does indeed hold indexes?

>> keep using the same wording with just slight alterations, it must be
>> something very obvious to you, but entirely un-obvious to me. Is
>> this related to the desire of using both negative and positive
>> offsets into TLS, where (obviously I would say) there's not going
>> to be any wrapping at the 4Gb boundary? If so, I'd say the TLS
> 
> It won't wrap for x32.
> 
>> usage model is broken, but it's not the assembler that should
> 
> No, it is not.  Please read "ILP32 Programming Model" in x86-64 psABI.

I trust you that you follow what is written there. The question
though is whether it wasn't a mistake to permit negative offsets in
the first place.

>> prevent use of otherwise valid constructs. Whether full 64-bit
> 
> Assembly is correct for 64-bit mode.  Since it doesn't work for
> x32 when offset is negative, we should at least give a warning.

Well, yes, since the ABI can't reasonably be changed, emitting a
warning looks like the only option now. But as said, please don't tie
this to that pre-existing one, not least because that is also what
is going to control the lack-of-disambiguating-suffix diagnostic in
AT&T mode, the change for which I hope to submit at some point over
the next several months (now that I've mostly completed the prereqs
you had set for this).

>> addresses (and hence full non-zero %fs/%gs bases with no
>> wrapping at the 4Gb boundary) is intended is the programmer's
>> choice, not something the assembler should enforce unconditionally.
>> Optionally emitting a warning is acceptable, but then this shouldn't
>> be tied to any other, more generically applicable warnings.
> 
> Binutils, including linker, does a few things special for x32 to deal
> with address limitation.  This is just one of them.

But are there pre-existing cases where in order to make one
thing work a different thing got deliberately broken?

>> In any event, if this is to stay, then at least the code comment
>> needs to be quite a bit more clear - "we can't have" is not enough
>> without explicitly saying why that is.
>>
>> > --- a/gas/config/tc-i386.c
>> > +++ b/gas/config/tc-i386.c
>> > @@ -8141,6 +8141,36 @@ output_insn (void)
>> >         i.prefix[LOCK_PREFIX] = 0;
>> >       }
>> >
>> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
>> > +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
>> > +     {
>> > +       /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
>> > +          without base register so that vector index register will
>> > +          be zero-extended to 64 bits.  */
>> > +       if (!i.base_reg && i.tm.opcode_modifier.vecsib)
>> > +         add_prefix (ADDR_PREFIX_OPCODE);
>>
>> Just to re-state: There needs to be a way to override this behavior.
>> And this is already leaving aside that making this the default from
>> now on has a fair risk of breaking currently working code. (Note
>> that this is not to say that I can't see that the change will also
>> help currently broken code.)
> 
> Please see above.  If VSIB index is below 2G, my fix doesn't change
> anything.  If VSIB index is above 2G, the program crashes before my fix.

Right, and I didn't question that you indeed fix one
specific case. I just can't help thinking that you do so by breaking
other cases, as per above. And I am of the opinion that it is
the compiler (or assembly programmer) who ought to
explicitly request 32-bit addressing (e.g. by way of using the
addr32 prefix) in this specific example of yours.

>> > +       /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
>> > +          segment override since final address will be segment
>> > +          base + zero-extended (base + index * scale + disp).  */
>> > +       if (operand_check != check_none
>> > +           && i.prefix[ADDR_PREFIX]
>> > +           && i.prefix[SEG_PREFIX])
>> > +         {
>> > +           const seg_entry *seg;
>> > +           if (i.seg[0])
>> > +             seg = i.seg[0];
>> > +           else
>> > +             seg = i.seg[1];
>> > +           if (operand_check == check_error)
>> > +             as_bad (_("can't encode segment `%s%s' with 32-bit address"),
> 
> How about just
> 
> segment `%s%s' override with 32-bit address

That's slightly better text indeed.

Jan


* Re: V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-26 14:46                 ` Jan Beulich
@ 2019-02-26 16:08                   ` H.J. Lu
  2019-02-26 16:16                     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-26 16:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 26, 2019 at 6:45 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:
> > On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
> >> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> >> > register so that vector index register will be zero-extended to 64 bits.
> >> >
> >> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> >> > segment override since address will be segment base + zero-extended to 64
> >> > bits of (base + index * scale + disp).  But GCC:
> >> >
> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502
> >>
> >> Neither above nor in the bug you explain what's wrong with the
> >> segment override plus address size override in x32 mode. Since you
> >
> > X32 relies on 0x67 prefix to zero-extend address to 64 bits:
> >
> > zero-extended (base + index * scale + disp)
> >
> > With segment override, we got
> >
> > segment base + zero-extended (base + index * scale + disp)
> >
> > instead of
> >
> > zero-extended (segment base + base + index * scale + disp)
> >
> > When base + index * scale + disp is negative, we get the wrong
> > address.
> >
> > VSIB address in vgatherdps is
> >
> > base + sign-extend(index) * scale + disp
> >
> > With segment override, we got
> >
> > segment base + zero-extended (base + sign-extend(index) * scale + disp)
>
> Right. But whether that's what the programmer wanted we don't
> know. Also please consider the qword index forms as well, plus
> the dword index forms with scaling factor 2, 4, or 8 (allowing for
> effective indexes up to 35 bits wide).
>
> All of this would be acceptable if address space was limited to 4Gb
> for x32, but that's not the case according to my reading of the
> chapter in the psABI.

10.4 Kernel Support
Kernel should limit stack and addresses returned from system calls
between 0x00000000 to 0xffffffff.

> > 175.vpr in SPEC CPU 2000:
> >[...]
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x004158fd in try_place.isra ()
> > (gdb) disass 0x004158fd,+32
> > Dump of assembler code from 0x4158fd to 0x41591d:
> > => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12
>
> Okay, this is the special case of the index register actually holding
> addresses. What about the case where the displacement is the base
> address, and the index register holds indeed indexes?

I will fix it.

> >> keep using the same wording with just slight alterations, it must be
> >> something very obvious to you, but entirely un-obvious to me. Is
> >> this related to the desire of using both negative and positive
> >> offsets into TLS, where (obviously I would say) there's not going
> >> to be any wrapping at the 4Gb boundary? If so, I'd say the TLS
> >
> > It won't wrap for x32.
> >
> >> usage model is broken, but it's not the assembler that should
> >
> > No, it is not.  Please read "ILP32 Programming Model" in x86-64 psABI.
>
> I trust you that you follow what is written there. The question
> though is whether it wasn't a mistake to permit negative offsets in
> the first place.

Negative offset is by design.

> >> prevent use of otherwise valid constructs. Whether full 64-bit
> >
> > Assembly is correct for 64-bit mode.  Since it doesn't work for
> > x32 when offset is negative, we should at least give a warning.
>
> Well, yes, since the ABI can't reasonably be changed, emitting a
> warning looks like the only option now. But as said, please don't tie
> this to that pre-existing one, not the least because that's also what

Existing code will get a warning.

> is going to control the lack-of-disambiguating-suffix diagnostic in
> AT&T mode, the change for which I hope to submit at some point over the
> next several months (now that I've mostly completed the prereqs
> you had set for this).
>
> >> addresses (and hence full non-zero %fs/%gs bases with no
> >> wrapping at the 4Gb boundary) is intended is the programmer's
> >> choice, not something the assembler should enforce unconditionally.
> >> Optionally emitting a warning is acceptable, but then this shouldn't
> >> be tied to any other, more generically applicable warnings.
> >
> > Binutils, including linker, does a few things special for x32 to deal
> > with address limitation.  This is just one of them.
>
> But are there pre-existing cases where in order to make one
> thing work a different thing got deliberately broken?

It works only if offset isn't negative.
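
A minimal sketch of the negative-offset case, using the two formulas quoted earlier in this message. The function names are hypothetical, and the assertion models the psABI 10.4 limit quoted above (addresses below 4G), which is an assumption about where the segment base lives.

```c
#include <assert.h>
#include <stdint.h>

/* What the hardware computes with 0x67 plus a segment override:
   segment base + zero-extended (32-bit offset).  */
static uint64_t
seg_addr_hw (uint64_t seg_base, int32_t offset)
{
  return seg_base + (uint32_t) offset;
}

/* What x32 code would need instead:
   zero-extended (segment base + offset), wrapping at 4G.  */
static uint64_t
seg_addr_wanted (uint64_t seg_base, int32_t offset)
{
  assert (seg_base <= UINT32_MAX);	/* assumed x32 address limit */
  return (uint32_t) (seg_base + (uint64_t) (int64_t) offset);
}
```

For a segment base of 0x10000 and offset 8 both give 0x10008. For offset -8 the first gives 0x10000fff8 and the second gives 0xfff8.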

> >> In any event, if this is to stay, then at least the code comment
> >> needs to be quite a bit more clear - "we can't have" is not enough
> >> without explicitly saying why that is.
> >>
> >> > --- a/gas/config/tc-i386.c
> >> > +++ b/gas/config/tc-i386.c
> >> > @@ -8141,6 +8141,36 @@ output_insn (void)
> >> >         i.prefix[LOCK_PREFIX] = 0;
> >> >       }
> >> >
> >> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> >> > +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
> >> > +     {
> >> > +       /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> >> > +          without base register so that vector index register will
> >> > +          be zero-extended to 64 bits.  */
> >> > +       if (!i.base_reg && i.tm.opcode_modifier.vecsib)
> >> > +         add_prefix (ADDR_PREFIX_OPCODE);
> >>
> >> Just to re-state: There needs to be a way to override this behavior.
> >> And this is already leaving aside that making this the default from
> >> now on has a fair risk of breaking currently working code. (Note
> >> that this is not to say that I can't see that the change will also
> >> help currently broken code.)
> >
> > Please see above.  If VSIB index is below 2G, my fix doesn't change
> > anything.  If VSIB index is above 2G, the program crashes before my fix.
>
> Right, and I didn't put under question that you indeed fix one
> specific case. I just can't help thinking that you do so by breaking
> other cases, as per above. And I am of the opinion that it ought
> to be the compiler (or assembly programmer) who ought to
> explicitly request 32-bit addressing (e.g. by way of using the
> addr32 prefix) in this specific example of yours.
>
> >> > +       /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
> >> > +          segment override since final address will be segment
> >> > +          base + zero-extended (base + index * scale + disp).  */
> >> > +       if (operand_check != check_none
> >> > +           && i.prefix[ADDR_PREFIX]
> >> > +           && i.prefix[SEG_PREFIX])
> >> > +         {
> >> > +           const seg_entry *seg;
> >> > +           if (i.seg[0])
> >> > +             seg = i.seg[0];
> >> > +           else
> >> > +             seg = i.seg[1];
> >> > +           if (operand_check == check_error)
> >> > +             as_bad (_("can't encode segment `%s%s' with 32-bit address"),
> >
> > How about just
> >
> > segment `%s%s' override with 32-bit address
>
> That's slightly better text indeed.
>
> Jan
>


-- 
H.J.


* Re: V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-26 16:08                   ` H.J. Lu
@ 2019-02-26 16:16                     ` Jan Beulich
  2019-02-26 20:33                       ` V3: " H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2019-02-26 16:16 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

>>> On 26.02.19 at 17:07, <hjl.tools@gmail.com> wrote:
> On Tue, Feb 26, 2019 at 6:45 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:
>> > On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >>
>> >> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
>> >> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
>> >> > register so that vector index register will be zero-extended to 64 bits.
>> >> >
>> >> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
>> >> > segment override since address will be segment base + zero-extended to 64
>> >> > bits of (base + index * scale + disp).  But GCC:
>> >> >
>> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502 
>> >>
>> >> Neither above nor in the bug you explain what's wrong with the
>> >> segment override plus address size override in x32 mode. Since you
>> >
>> > X32 relies on 0x67 prefix to zero-extend address to 64 bits:
>> >
>> > zero-extended (base + index * scale + disp)
>> >
>> > With segment override, we got
>> >
>> > segment base + zero-extended (base + index * scale + disp)
>> >
>> > instead of
>> >
>> > zero-extended (segment base + base + index * scale + disp)
>> >
>> > When base + index * scale + disp is negative, we get the wrong
>> > address.
>> >
>> > VSIB address in vgatherdps is
>> >
>> > base + sign-extend(index) * scale + disp
>> >
>> > With segment override, we got
>> >
>> > segment base + zero-extended (base + sign-extend(index) * scale + disp)
>>
>> Right. But whether that's what the programmer wanted we don't
>> know. Also please consider the qword index forms as well, plus
>> the dword index forms with scaling factor 2, 4, or 8 (allowing for
>> effective indexes up to 35 bits wide).
>>
>> All of this would be acceptable if address space was limited to 4Gb
>> for x32, but that's not the case according to my reading of the
>> chapter in the psABI.
> 
> 10.4 Kernel Support
> Kernel should limit stack and addresses returned from system calls
> between 0x00000000 to 0xffffffff.

Hmm, if that's indeed the case, despite it - according to my
interpretation - contradicting 10.2's wording, and despite it
being an unnecessary restriction imo, then ...

>> > 175.vpr in SPEC CPU 2000:
>> >[...]
>> > Program received signal SIGSEGV, Segmentation fault.
>> > 0x004158fd in try_place.isra ()
>> > (gdb) disass 0x004158fd,+32
>> > Dump of assembler code from 0x4158fd to 0x41591d:
>> > => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12
>>
>> Okay, this is the special case of the index register actually holding
>> addresses. What about the case where the displacement is the base
>> address, and the index register holds indeed indexes?
> 
> I will fix it.

... there's nothing to fix here, I think.

Jan



* V3: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-26 16:16                     ` Jan Beulich
@ 2019-02-26 20:33                       ` H.J. Lu
  2019-02-27  8:26                         ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2019-02-26 20:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

[-- Attachment #1: Type: text/plain, Size: 3374 bytes --]

On Tue, Feb 26, 2019 at 8:16 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 17:07, <hjl.tools@gmail.com> wrote:
> > On Tue, Feb 26, 2019 at 6:45 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:
> >> > On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>
> >> >> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
> >> >> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> >> >> > register so that vector index register will be zero-extended to 64 bits.
> >> >> >
> >> >> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> >> >> > segment override since address will be segment base + zero-extended to 64
> >> >> > bits of (base + index * scale + disp).  But GCC:
> >> >> >
> >> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502
> >> >>
> >> >> Neither above nor in the bug you explain what's wrong with the
> >> >> segment override plus address size override in x32 mode. Since you
> >> >
> >> > X32 relies on 0x67 prefix to zero-extend address to 64 bits:
> >> >
> >> > zero-extended (base + index * scale + disp)
> >> >
> >> > With segment override, we got
> >> >
> >> > segment base + zero-extended (base + index * scale + disp)
> >> >
> >> > instead of
> >> >
> >> > zero-extended (segment base + base + index * scale + disp)
> >> >
> >> > When base + index * scale + disp is negative, we get the wrong
> >> > address.
> >> >
> >> > VSIB address in vgatherdps is
> >> >
> >> > base + sign-extend(index) * scale + disp
> >> >
> >> > With segment override, we got
> >> >
> >> > segment base + zero-extended (base + sign-extend(index) * scale + disp)
> >>
> >> Right. But whether that's what the programmer wanted we don't
> >> know. Also please consider the qword index forms as well, plus
> >> the dword index forms with scaling factor 2, 4, or 8 (allowing for
> >> effective indexes up to 35 bits wide).
> >>
> >> All of this would be acceptable if address space was limited to 4Gb
> >> for x32, but that's not the case according to my reading of the
> >> chapter in the psABI.
> >
> > 10.4 Kernel Support
> > Kernel should limit stack and addresses returned from system calls
> > between 0x00000000 to 0xffffffff.
>
> Hmm, if that's indeed the case, despite it - according to my
> interpretation - contradicting 10.2's wording, and despite it
> being an unnecessary restriction imo, then ...
>
> >> > 175.vpr in SPEC CPU 2000:
> >> >[...]
> >> > Program received signal SIGSEGV, Segmentation fault.
> >> > 0x004158fd in try_place.isra ()
> >> > (gdb) disass 0x004158fd,+32
> >> > Dump of assembler code from 0x4158fd to 0x41591d:
> >> > => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12
> >>
> >> Okay, this is the special case of the index register actually holding
> >> addresses. What about the case where the displacement is the base
> >> address, and the index register holds indeed indexes?
> >
> > I will fix it.
>
> ... there's nothing to fix here, I think.
>

Here is the updated patch.  I added VecSIBQword to mark VSIB instructions
with Qword indices.  The 0x67 prefix is now added only for a VSIB address
with Dword indices and neither base register nor symbol, so that Dword
indices will be zero-extended to 64 bits, unless -moperand-check=none is
passed to the assembler.
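
As a rough illustration of the rule described above (not the code in the attached patch), the decision can be sketched as follows. The struct, its field names and the function name are hypothetical; only VecSIBQword and -moperand-check=none come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a VSIB memory operand; these field names are
   illustrative and are not the ones used in tc-i386.c.  */
struct vsib_operand
{
  bool has_base_reg;	/* a base register is present */
  bool has_symbol_disp;	/* the displacement refers to a symbol */
  bool qword_index;	/* the template is marked VecSIBQword */
};

/* Return true if the 0x67 address size prefix would be added under the
   rule described above.  CHECK_NONE models -moperand-check=none.  */
static bool
wants_addr32_prefix (const struct vsib_operand *op, bool check_none)
{
  assert (op != NULL);
  if (check_none)
    return false;
  return !op->qword_index && !op->has_base_reg && !op->has_symbol_disp;
}
```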

-- 
H.J.

[-- Attachment #2: 0001-x32-Generate-0x67-prefix-for-VSIB-address-if-needed.patch --]
[-- Type: application/x-patch, Size: 57813 bytes --]


* Re: V3: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-26 20:33                       ` V3: " H.J. Lu
@ 2019-02-27  8:26                         ` Jan Beulich
  2019-02-27 18:09                           ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2019-02-27  8:26 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils

>>> On 26.02.19 at 21:33, <hjl.tools@gmail.com> wrote:
> Here is the updated patch.  I added VecSIBQword to mark VSIB instructions
> with Qword indices.  The 0x67 prefix is now added only for a VSIB address
> with Dword indices and neither base register nor symbol, so that Dword
> indices will be zero-extended to 64 bits, unless -moperand-check=none is
> passed to the assembler.

A couple of questions still remain:

1) What about a scale factor other than 1? Arguably this is difficult to
use with neither base nor O_symbol displacement, but it's not
impossible. As said before, _if_ qword indices are to be special cased,
I think such scale factors should be, too.

2) Given the wording you had quoted from psABI section 10.4, I did
suggest that special casing of qword indexes may then not be
necessary at all. Could you clarify why you (now) think otherwise?

3) Does the logic work not only with a specified displacement of zero,
but also without any displacement at all? The abort() invocations you
add make me uncertain of this, and the test cases you add don't
cover the case.

4) Why "else if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])"
instead of just if()? Isn't this diagnostic equally applicable to all the
VecSIB cases?

5) In the comment following "case O_constant", could you add
"assuming that the index register actually holds addresses" or
something along these lines? Similarly the other comment is still as
vague as it was before; as said I really think it lacks sufficient
clarity as to the "why", i.e. the non-wrapping behavior at 4Gb
should be mentioned explicitly rather than be implied.

6) You still use the existing operand_check to control the diagnostic.
This being a very special case which one may want to disable without
also disabling diagnostics for other, more generic operand checks,
don't you agree that it should be separately controllable?

7) Would you mind addressing the previously raised point of it (in
my opinion) really being the compiler's / assembly programmer's job
to enforce 32-bit addressing here?

Jan



* Re: V3: [PATCH] x32: Generate 0x67 prefix for VSIB address without base
  2019-02-27  8:26                         ` Jan Beulich
@ 2019-02-27 18:09                           ` H.J. Lu
  0 siblings, 0 replies; 15+ messages in thread
From: H.J. Lu @ 2019-02-27 18:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Feb 27, 2019 at 12:26 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 21:33, <hjl.tools@gmail.com> wrote:
> > Here is the updated patch.  I added VecSIBQword to mark VSIB instructions
> > with Qword indices.  The 0x67 prefix is now added only for a VSIB address
> > with Dword indices and neither base register nor symbol, so that Dword
> > indices will be zero-extended to 64 bits, unless -moperand-check=none is
> > passed to the assembler.
>
> A couple of questions still remain:
>
> 1) What about a scale factor other than 1? Arguably this is difficult to
> use with neither base nor O_symbol displacement, but it's not
> impossible. As said before, _if_ qword indices are to be special cased,
> I think such scale factors should be, too.
>
> 2) Given the wording you had quoted from psABI section 10.4, I did
> suggest that special casing of qword indexes may then not be
> necessary at all. Could you clarify why you (now) think otherwise?
>
> 3) Does the logic work not only with a specified displacement of zero,
> but also without any displacement at all? The abort() invocations you
> add make me uncertain of this, and the test cases you add don't
> cover the case.
>
> 4) Why "else if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])"
> instead of just if()? Isn't this diagnostic equally applicable to all the
> VecSIB cases?
>
> 5) In the comment following "case O_constant", could you add
> "assuming that the index register actually holds addresses" or
> something along these lines? Similarly the other comment is still as
> vague as it was before; as said I really think it lacks sufficient
> clarity as to the "why", i.e. the non-wrapping behavior at 4Gb
> should be mentioned explicitly rather than be implied.
>
> 6) You still use the existing operand_check to control the diagnostic.
> This being a very special case which one may want to disable without
> also disabling diagnostics for other, more generic operand checks,
> don't you agree that it should be separately controllable?
>
> 7) Would you mind addressing the previously raised point of it (in
> my opinion) really being the compiler's / assembly programmer's job
> to enforce 32-bit addressing here?
>

Good point.  I withdraw my patch.  I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89523


-- 
H.J.


Thread overview: 15+ messages
2019-02-25 14:02 [PATCH] x32: Generate 0x67 prefix for VSIB address without base H.J. Lu
2019-02-25 15:14 ` Jan Beulich
2019-02-25 15:56   ` H.J. Lu
2019-02-25 16:18     ` Jan Beulich
2019-02-25 16:54       ` H.J. Lu
2019-02-25 22:55         ` H.J. Lu
2019-02-26  4:35           ` V2: " H.J. Lu
2019-02-26 11:41             ` Jan Beulich
2019-02-26 13:24               ` H.J. Lu
2019-02-26 14:46                 ` Jan Beulich
2019-02-26 16:08                   ` H.J. Lu
2019-02-26 16:16                     ` Jan Beulich
2019-02-26 20:33                       ` V3: " H.J. Lu
2019-02-27  8:26                         ` Jan Beulich
2019-02-27 18:09                           ` H.J. Lu
